Openweathermap json to csv

csvjson

I need to convert some json containing forecast data to csv (for use in gnuplot). Have tried a couple of json2csv utilities

json sample:

{"cod":"200","message":0.006,"cnt":40,"list":[{"dt":1519333200,"main":{"temp":271.62,"temp_min":271.62,"temp_max":272.921,"pressure":1028.3,"sea_level":1037.2,"grnd_level":1028.3,"humidity":88,"temp_kf":-1.3},"weather":[{"id":800,"main":"Clear","description":"clear sky","icon":"01n"}],"clouds":{"all":0},"wind":{"speed":4.86,"deg":78.004},"rain":{},"snow":{},"sys":{"pod":"n"},"dt_txt":"2018-02-22 21:00:00"},{"dt":1519344000,"main":{"temp":271.22,"temp_min":271.22,"temp_max":272.193,"pressure":1028.11,"sea_level":1037.04,"grnd_level":1028.11,"humidity":100,"temp_kf":-0.98},"weather":[{"id":800,"main":"Clear","description":"clear sky","icon":"01n"}],"clouds":{"all":0},"wind":{"speed":4.52,"deg":80.0016},"rain":{},"snow":{},"sys":{"pod":"n"},"dt_txt":"2018-02-23 00:00:00"},

I've managed to use json2csv to flatten the data (and cut the first few unneeded fields):

1519333200 271.62 271.62 272.921 1028.3 1037.2 1028.3 88 -1.3 800 "Clear" "clear sky" "01n" 0 4.86 78.004 "{}" "{}" "n"
"2018-02-22 21:00:00" 1519344000 271.22 271.22 272.193 1028.11 1037.04 1028.11 100 -0.98 800 "Clear" "clear sky" "01n" 0 4.52 80.0016 "{}" "{}"
"n" "2018-02-23 00:00:00" 

I need either creating the new lines after the plain text date (sed??) or ideally flattening the json directly to csv

Best Answer

Regular expressions and sed aren't a great match for this kind of task, but it's fairly simple using jq, a JSON processing tool:

$ jq -r '.list[]|[.dt, .main[]] | @csv' < data.json
1519333200,271.62,271.62,272.921,1028.3,1037.2,1028.3,88,-1.3
1519344000,271.22,271.22,272.193,1028.11,1037.04,1028.11,100,-0.98

jq takes in a JSON data stream and filters it to extract the data you want, analogously to sed but for structured data rather than text. This command takes each element of the "list" array (.list[]) and plucks out the "dt" field (.dt) of the object inside along with everything in the "main" object too (.main[]), putting the whole lot into an array and converting that to a CSV row (@csv).


If you're interested in particular fields, or you want to reorder them, you can list them out too:

$ jq -r '.list[]|[.dt, .main.temp_min, .main.humidity] | @csv' < data.json
1519333200,271.62,88
1519344000,271.22,100

Just list out each one you care about and ignore the rest.


If you want to get into other parts of the data, you can put that in there too: .clouds will look into the clouds field, or you can read the wind speed with .wind.speed - always a dot before the name, like you were writing JavaScript. The jq manual is quite comprehensive for whatever custom behaviour you need.

Related Question