Curl JSON encoded in UTF-8

character encodingcygwin;jsonunicode

Important note: I'm using Cygwin


I retrieve JSON from a file that I alter before sending it to a server using curl. By now everything is ok except one thing, when the server receive the JSON, every special characters (with accents, etc) aren't well encoded. I suppose this was due to the fact that my JSON wasn't encoded in UTF-8 before being sent but I don't manage to convert it.

Here is the code:

sed -i 's/\r//g' $file
fileContent=`cat $file`

result=$(jq -c -j ".docs[$docIndex] + { \"_rev\": \"$rev\" }"<<<"$fileContent")            result="{\"docs\":[$result]}"

result=$result | sed 's/\r//g'
result=$result | iconv -t "UTF-8"
s=$(curl -H "Content-Type: application/json; charset=UTF-8" -H "Cache-Control: no-cache" -d "$result" $2/$3/_bulk_docs --silent)

My bash script and my JSON files are both encoded in UTF-8.
My LANG variable seems to be UTF-8. I checked with this: [[ $LANG =~ UTF-8$ ]] && echo "Uses UTF-8 encoding.."

Any idea?


Update

Here is the full script:

#!/bin/bash

# $1 = directory containing JSON files
# $2 = server url (+ port)
# $3 = database name

#Loop on every file of the given directory
for file in "$1"/*; do

    #Try to insert every document of a specific JSON file and retrieve their status (200 = OK; 409 = conflict)
    allStatus=$(curl -H "Content-Type: application/json" -H "Cache-Control: no-cache" --data-binary "@$file" $2/$3/_bulk_docs --silent | jq '.[] |.status' | tr -d '\r')
    docIndex=0

    #Loop on every status (one status = one document)
    while IFS=' ' read -ra statusArray; do
      for status in "${statusArray[@]}"; do

        #Remove unwanted windows characters of the file
        sed -i 's/\r//g' $file
        fileContent=`cat $file`

        #Retrieve the id of the current document
        id=`jq -r -j ".docs[$docIndex]._id"<<<"$fileContent" | tr -d '\r'`

        if [ "$status" = "409" ]
        then 
            #Retrieve the revision of the current document and add it to the document 
            rev=$(curl -X GET --header 'Accept: application/json' $2/$3/$id?revs=true --silent | jq -r -j '._rev' | tr -d '\r')
            result=$(jq -c -j ".docs[$docIndex] + { \"_rev\": \"$rev\" }"<<<"$fileContent")

            #Wrap the document inside {"docs":[]}
            result="{\"docs\":[$result]}"

            #Remove unwanted windows characters before sending the document (again)
            result=$result | sed 's/\r//g'
            result=$result | iconv -t "UTF-8"
            s=$(curl -H "Content-Type: application/json; charset=UTF-8" -H "Cache-Control: no-cache" -d "$result" $2/$3/_bulk_docs --silent)

            #Echo result
            echo $s
        else
          #Status should be 200.
          echo 'status: '$status ' - ' $id
        fi

        docIndex=$((docIndex+1))

      done
    done <<< "$allStatus"
done

This script aims to upsert documents inside a NoSQL database. If first try to insert and if if fails it retrieves a property (the revision) of the document, append to it and then retry once.

I know this script could be improved but this is my first bash script and it won't be used in production (only for testing, etc.).

Best Answer

You can use this in Bash on Cygwin, as long as you at least have Python 2.7 installed:

utf8_result="$(echo "$result" | python -c "import sys;[sys.stdout.write((i).decode('utf-8')) for i in sys.stdin]")"
Related Question