Add a filter to handle newline-delimited JSON #226

Closed
@tho

I would like to propose adding a new JQnl (or JQLines) filter to the script package. The filter would process newline-delimited JSON data by applying a JQ query to each JSON object in the input. It should work with pretty-printed and compact (jq -c; one JSON object per line) JSON input.

Newline-delimited JSON is widely used for streaming data where each line represents a self-contained JSON object. The current JQ method, unlike the jq command line utility, only processes a single JSON object from the input, making it difficult to process streams of data.

Use cases

  1. Log Analysis

    Application logs are often output as JSON objects, one per line. With JQnl, users could extract and transform specific fields from log files, filtering for specific log levels, error messages, etc.

  2. Processing Paginated API Results

    When dealing with APIs that return paginated results, JQnl would allow processing the paginated responses as part of a script pipeline.

  3. Data ETL Workflows

    For Extract-Transform-Load workflows where each record is a separate JSON object, the method would streamline the processing of large data sets: the transformation is applied to each record as it is read from the stream, so the whole input does not have to be loaded into memory before the JQ filter runs.

Concrete example for the Log Analysis use case

Extract all warning and error messages from a JSON log file, e.g. the output of slog.

/tmp/log.json - The mixed compact-prettyprint-compact format is intentional for illustration purposes.

{"time": "2025-03-17T18:04:26.534789-07:00", "level": "INFO", "msg": "info message"}
{
        "time": "2025-03-17T18:04:26.534946-07:00",
        "level": "WARN",
        "msg": "warn message"
}
{"time": "2025-03-17T18:04:26.534953-07:00", "level": "ERROR", "msg": "error message"}

jq - command line utility for reference

$ cat /tmp/log.json | jq 'select(.level=="WARN" or .level=="ERROR") | .msg'
"warn message"
"error message"

JQ - script.Stdin().JQ(os.Args[1]).Stdout()

$ cat /tmp/log.json | ./scriptJQ 'select(.level=="WARN" or .level=="ERROR") | .msg'
$ # no output, since `JQ` only processes the first JSON object

JQnl - script.Stdin().JQnl(os.Args[1]).Stdout()

$ cat /tmp/log.json | ./scriptJQnl 'select(.level=="WARN" or .level=="ERROR") | .msg'
"warn message"
"error message"

Sample implementation of JQnl:

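// JQnl applies the JQ query to every JSON value in the pipe's input and
// writes each result on its own line. Input values may be compact or
// pretty-printed.
func (p *Pipe) JQnl(query string) *Pipe {
	return p.Filter(func(r io.Reader, w io.Writer) error {
		// Parse and compile the query once, before reading any input.
		q, err := gojq.Parse(query)
		if err != nil {
			return err
		}

		code, err := gojq.Compile(q)
		if err != nil {
			return err
		}

		// Decode one JSON value at a time, so the whole input is never
		// held in memory.
		dec := json.NewDecoder(r)
		for dec.More() {
			var input interface{}
			err := dec.Decode(&input)
			if err != nil {
				return err
			}

			// Run the query against this value and emit every result.
			iter := code.Run(input)
			for {
				v, ok := iter.Next()
				if !ok {
					break
				}
				// gojq reports runtime errors as values from the iterator.
				if err, ok := v.(error); ok {
					return err
				}
				result, err := gojq.Marshal(v)
				if err != nil {
					return err
				}
				_, err = fmt.Fprintln(w, string(result))
				if err != nil {
					return err
				}
			}
		}

		return nil
	})
}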
