JQ and Find – Edit Files Matching Precondition


I have a fair number of JSON files, each with a version property whose value differs from file to file. I want to modify only the files that match a certain predicate. This is a follow-up to in-place editing with find and jq.

This matters to me because if I modify a couple of hundred files and open a PR, I don't want anything to change other than what I intended. For example, if the indentation changes, it looks like the entire file has been modified, and it is a lot harder for a reviewer to see the actual difference.

To illustrate what I mean, place the following two files in a folder named json:

{
    "identifier": "1",
    "version" : "1.0"
}

Note the space after "version" (before the colon); it is intentional.

{
    "identifier": "2",
    "version": "2.0"
}

The indentation in these snippets is 4 spaces (the default in VS Code).

Run the following script, pointed at the json folder:

find json \
    -name '*.json' \
    -type f -exec sh -c '
    tmpfile=$(mktemp)
    for pathname do
        cp -- "$pathname" "$tmpfile" &&
        jq "select(.version == \"2.0\").identifier |= \"\"" <"$tmpfile" >"$pathname"
    done
    rm -f "$tmpfile"' sh {} +

The outcome I expect is that only the file with version 2.0 gets modified, and that nothing in it changes except the value of identifier. The actual result is this:

The first file has its original identifier (expected), but the indentation is now 2 spaces (not expected) and the space after version is gone (not expected):

{
  "identifier": "1",
  "version": "1.0"
}

The second file has an empty identifier (expected), but the indentation is now 2 spaces (not expected):

{
  "identifier": "",
  "version": "2.0"
}

As far as I can tell, jq cannot preserve indentation, so to limit the scope of the question, I'm only interested in how to leave files with non-matching versions untouched.

I have managed to solve this using grep in conjunction with the answer to the question linked above, but are there more efficient ways using only jq? A bonus would be to retain all of the original file's indentation, if possible.

find json \
    -name '*.json' \
    -type f -exec sh -c '
    for pathname do
        if grep "\"version\": \"2.0\"" "$pathname" 1> /dev/null; then
            tmpfile=$(mktemp)
            cp -- "$pathname" "$tmpfile" &&
            jq "select(.version == \"2.0\").identifier |= \"\"" <"$tmpfile" >"$pathname"
            rm -f "$tmpfile"
        fi
    done' sh {} +

Best Answer

Using grep on JSON files might work, but it relies on the files being formatted exactly as expected. It is also non-trivial to test for a non-empty value of the identifier key at the same time as identifying version 2.0, given that the ordering of keys in a generic JSON document is not fixed. So, yes, it is better to use jq for this.
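To make the formatting dependence concrete, here is a small sketch. The file names a.json and b.json and the scratch directory are made up for illustration. Both files carry version 2.0, but only one of them is spelled the way the grep pattern from the question expects:

```shell
# Hypothetical sample files in a scratch directory.
dir=$(mktemp -d)
printf '{\n    "identifier": "2",\n    "version": "2.0"\n}\n' >"$dir/a.json"
# Same data, but with a space before the colon after "version".
printf '{\n    "identifier": "3",\n    "version" : "2.0"\n}\n' >"$dir/b.json"

# -l lists the files that match the pattern used in the question.
matched=$(grep -l '"version": "2.0"' "$dir"/*.json)
printf '%s\n' "$matched"

rm -rf -- "$dir"
```

Only a.json is listed, even though b.json is an equally valid version 2.0 document.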

The task is only to touch JSON files that need to change. These are files where version has the value 2.0 and where identifier is non-empty.

With -e, jq sets its exit status from the last value it outputs: zero if that value is neither false nor null, and 1 if it is. We can use this to test whether the current file is to be modified or not. With any(), we can check whether any of the selected input objects has a non-empty identifier value:

jq -e 'any(select(.version == "2.0"); .identifier != "")'

This will exit with an exit status of zero ("success") if the current JSON document needs modifications.
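As a quick way to experiment with that exit status, the sketch below wraps the test in a shell function. The name needs_update is made up here and is not part of jq:

```shell
# needs_update FILE
# Exits with status 0 when FILE has version "2.0" and a non-empty
# identifier, and with status 1 when the jq expression yields false.
# The function name is only for illustration.
needs_update() {
    jq -e 'any(select(.version == "2.0"); .identifier != "")' "$1" >/dev/null
}
```

Used as the condition of an if statement, for example `if needs_update "$pathname"; then ...; fi`, it lets the shell skip every file that either has a different version or already has an empty identifier.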

As part of your find command:

find json \
    -name '*.json' \
    -type f -exec sh -c '
    tmpfile=$(mktemp)
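    for pathname do
        # Only rewrite files with version 2.0 and a non-empty identifier.
        if jq -e "any(select(.version == \"2.0\"); .identifier != \"\")" "$pathname" >/dev/null
        then
            # Copy first, then let jq write the result back to the original path.
            cp -- "$pathname" "$tmpfile" &&
            jq "select(.version == \"2.0\").identifier |= \"\"" <"$tmpfile" >"$pathname"
        fi
    done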
    rm -f "$tmpfile"' sh {} +

Note that any document changed by that second call to jq is rewritten in full, so its indentation and other whitespace may change in addition to the identifier key's value. This does not affect the JSON document from a parser's perspective, but it could cause tools that aren't JSON-aware to report further changes to the file.

If you want to write JSON with four spaces of indentation, then add --indent 4 to the second invocation of jq.
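A minimal sketch of that rewrite step with --indent 4 follows. The function name blank_identifier is hypothetical, and writing through a temporary file before overwriting is an added precaution rather than something the commands above do:

```shell
# blank_identifier FILE
# Sets identifier to "" where version is "2.0" and writes FILE back
# with four spaces of indentation. The function name is illustrative.
blank_identifier() {
    tmpfile=$(mktemp) || return
    # Write to a temporary file first, so a jq failure leaves FILE untouched.
    if jq --indent 4 'select(.version == "2.0").identifier |= ""' "$1" >"$tmpfile"
    then
        cp -- "$tmpfile" "$1"
    fi
    rm -f -- "$tmpfile"
}
```

This only makes the output match files that were already indented with four spaces. Other whitespace differences, such as a space before a colon, are still normalized by jq.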
