rsonpath

Yet another rsonpath cheat sheet.

Introduction

rsonpath is a blazing fast JSONPath query engine written in Rust. It allows to run structure-aware JSONPath queries on JSON data as easily as one would query a string with regular expressions. The JSONPath standard defines a query expression syntax for selecting and extracting JSON values from within a given JSON value. rsonpy is a lightweight Python wrapper around the Rust library.

To learn fundamentals about rsonpath, please refer to the rsonbook, and specifically the JSONPath reference.

Use cases

Rsonpath is made to filter documents ahead of parsing/loading. Think of it like a way to select a small part of the documents before loading it into memory. It will shine in situations where you have a JSON file too big to hold in memory, but you nevertheless want to load parts of it.

Rsonpath does not load anything into memory. It only works on the raw JSON, not on an in-memory data structure.

jqlang vs. rsonpath

If you are asking what’s wrong with jq, the authors provide concise answers per Why choose rsonpath?:

The most popular tool for working with JSON data is jq, but it has a few shortcomings:

  • it is extremely slow

  • it has a massive memory overhead

  • its query language is non-standard.

To be clear, jq is a great and well-tested tool, and rq does not directly compete with it. If one could describe jq as a “sed or awk for JSON”, then rq would be a “grep for JSON”: It does not allow you to slice and reorganize JSON data like jq, but instead outclasses it on the filtering and querying applications.

Examples

Unwrap

It is typical for HTTP JSON API responses to not start directly with a collection of data items, because a typical response also includes other metadata. In this spirit, when connecting pipeline elements of JSON processors, input data mostly needs to be edited into a variant suitable for storing into a database, likely also transitioning from a top-level object to a top-level list.

The following expression unwraps the actual collection which is nested within the top-level records element.

expression: $.records
type: rson
Example

Input data

{
  "message-source": "community",
  "message-type": "mixed-pickles",
  "records": [
    {"foo": "bar"},
    {"baz": "qux"}
  ]
}

Output data

[
  {"foo": "bar"},
  {"baz": "qux"}
]

Transformation definition

# Tikray collection-level transformation definition.
# Includes a Moksha/rsonpath transformation rule for unwrapping.
# https://tikray.readthedocs.io/
---
meta:
  type: tikray-collection
  version: 1
pre:
  rules:
  - expression: $.records
    type: rson