SQL Server – Import poorly formatted CSV/JSON to SQL Server

import, json, sql-server, sql-server-2016

I am trying to parse Android notification logs into a database. The columns are uneven because different notifications omit certain attributes. Essentially, it is a CSV file of key-value pairs. To put it another way, it is a really poorly formatted (& syntactically incorrect) JSON file. Maybe it would be easier to show you…

{"packageName":"com.automatic.mustang","postTime":1489795785973,"systemTime":1489795786083,"offset":-14400000,"isOngoing":false}
{"packageName":"com.enhance.gameservice","postTime":1489040766066,"systemTime":1489040766136,"offset":-18000000,"isOngoing":false}
{"packageName":"com.automatic.mustang","postTime":1489795785973,"systemTime":1489795786083,"offset":-14400000,"isOngoing":false}
{"packageName":"com.enhance.gameservice","postTime":1489040766066,"systemTime":1489040766136,"offset":-18000000,"isOngoing":false}

I have truncated them for the sake of brevity, but they do not line up toward the end of the rows. Some text values are quoted, some are not. All of the keys are quoted. I can use string functions to trim the extraneous characters once I get the columns sorted, but therein lies the rub: How do I sort the values into columns? I can do this in about 4 different programming languages, but I am just learning T-SQL.

How can I parse this into SQL Server (2016) in a single query? Is it even possible?

I could import it into a temporary table & then transfer it to the permanent one, but I would like a single query to digest the info, since this will run daily (maybe more often). The magic would be if I could declare a second-level delimiter, since every value is preceded by a colon; declaring the colon as the delimiter just breaks the data the other way. Argh!

The data is ragged throughout; even notifications from the same app can be drastically different. The way I would do it in PHP, C, JavaScript, etc. would be to loop through the keys until I hit the ones I need, grab the values following them (or the whole pairs), then trim the extraneous characters. I just don't know how to do that in T-SQL, & the online docs leave something to be desired.
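As it turns out, that loop-over-keys idea has a direct T-SQL analogue: OPENJSON called without a WITH clause returns one row per key/value pair, so "grab the pairs I need" becomes a WHERE filter. A minimal sketch against a single (shortened) log line:

```sql
-- Sketch (SQL Server 2016+, database compatibility level 130 or higher).
-- OPENJSON with no schema clause emits [key], [value] and [type] columns,
-- one row per attribute -- much like iterating over the pairs in PHP or JS.
DECLARE @line nvarchar(max) =
    N'{"packageName":"com.automatic.mustang","postTime":1489795785973,"isOngoing":false}';

SELECT [key], [value]
FROM OPENJSON(@line)
WHERE [key] IN (N'packageName', N'postTime');
```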

The expected result would be something like this:

Package Name | Post Time | System Time | Offset | NotificationText
-----------------------------------------------------------------
automatic    | 1489795   | 786083      | -1400  | this is the note
gameservice  | 1489756   | 786090      | -1400  | this is the note
automatic    | 1489799   | 786045      | -1400  | this is the note

The big problem is that the info I need is scattered throughout the rows. I don't need much of it, but it is buried.

Best Answer

You can use the technique shown in Loading line-delimited JSON files in SQL Server 2016 on the SQL Server Database Engine Blog, written by Jovan Popovic.

For example, given a file import.txt:

{"packageName":"com.automatic.mustang","postTime":1489795785973,"systemTime":1489795786083,"offset":-14400000,"isOngoing":false}
{"packageName":"com.enhance.gameservice","postTime":1489040766066,"systemTime":1489040766136,"offset":-18000000,"isOngoing":false}
{"packageName":"com.automatic.mustang","postTime":1489795785973,"systemTime":1489795786083,"offset":-14400000,"isOngoing":false}
{"packageName":"com.enhance.gameservice","postTime":1489040766066,"systemTime":1489040766136,"offset":-18000000,"isOngoing":false}

and a format file import.fmt:

13.0
1
1 SQLCHAR 0 0 "\r\n" 1 json ""

The file can be bulk imported and parsed with:

SELECT
    Parsed.packageName,
    Parsed.postTime,
    Parsed.systemTime,
    Parsed.offset,
    Parsed.isOngoing
FROM OPENROWSET
(
    BULK 'C:\Temp\import.txt',
    FORMATFILE = 'C:\Temp\import.fmt'
) AS BulkLog
CROSS APPLY OPENJSON(BulkLog.[json])
WITH
(
    packageName varchar(50),
    postTime    bigint,
    systemTime  bigint,
    offset      bigint,
    isOngoing   bit
) AS Parsed;

Giving:

packageName              postTime       systemTime     offset     isOngoing
------------------------ -------------- -------------- ---------- ---------
com.automatic.mustang    1489795785973  1489795786083  -14400000  0
com.enhance.gameservice  1489040766066  1489040766136  -18000000  0
com.automatic.mustang    1489795785973  1489795786083  -14400000  0
com.enhance.gameservice  1489040766066  1489040766136  -18000000  0
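If maintaining a separate format file is a nuisance, the same parse can be sketched without one, assuming the file fits comfortably in a single NVARCHAR(MAX) and uses \r\n line endings: read the whole file with SINGLE_CLOB, stitch the lines into one JSON array, and hand that to OPENJSON.

```sql
-- Sketch: no format file; assumes \r\n line endings and no blank lines.
DECLARE @raw nvarchar(max);

SELECT @raw = BulkColumn
FROM OPENROWSET(BULK 'C:\Temp\import.txt', SINGLE_CLOB) AS F;

-- Turn {..}\r\n{..} into [{..},{..}] so OPENJSON sees one array.
DECLARE @json nvarchar(max) =
    N'[' + REPLACE(@raw, NCHAR(13) + NCHAR(10), N',') + N']';

-- If the file ends with a trailing newline, drop the dangling comma.
SET @json = REPLACE(@json, N',]', N']');

SELECT
    Parsed.packageName,
    Parsed.postTime,
    Parsed.systemTime,
    Parsed.offset,
    Parsed.isOngoing
FROM OPENJSON(@json)
WITH
(
    packageName varchar(50),
    postTime    bigint,
    systemTime  bigint,
    offset      bigint,
    isOngoing   bit
) AS Parsed;
```

The OPENROWSET/format-file route above is preferable for large files, since it streams line by line instead of materializing the whole file in a variable.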