Sql-server – Full text index for a column that stores json which contains guid – Is this a good idea, space-wise

full-text-searchindexsql server

I'm considering adding a full text index for a column whose values are json strings. These jsons usually contain Guid (or UUID) values.

The index goal is to allow fast search over both "normal" words (e.g., some user generated text) and guids.

Some more info about this column, called RequestContent:

  • I actually have another column, ResponseContent, which the following applies for as well, so that I presume that an answer for this question would apply for it as well.

This column is essentially used as a log for HTTP Post requests' body, so that each incoming POST request body is logged into that column. There are hundreds of web services whose requests bodies are logged, and there is no relationship between two different web services' requests.

For example, one request could look like:

{ "orderId": "607bbc9b-9c0e-4921-b5c3-55f68891e619", "status": 0, "createdOn": 16011560804216 ... }

While another could look like:

{ "claimStatus": 1, "actionStatus": 8, "senderId": "807acc23-9dde-4aaa-b5c3-55df5431e619", "documents": [ { "documentTitle": "some free text" }] ... }

Any json may contain fields of any kind: number, short string fields (like firstName, or Guids, etc.) or long string fields (usually containing hundreds of words of free text, but sometimes even large Base64 encoded data), boolean, nested objects, arrays. If it matters, it would be safe to say that most requests (I'd say 80%) contain number and short string fields.

This column's type is ntext (yes, it's old), but I can recreate it as any other type that I'd like.

Is it a good idea to build a full text index for this column? My concern is that the index would contain a row for each unique Guid, which would result in consuming an unfeasible disk space.

EDIT:
@David Spillett asked:

Could you extract the parts that you want indexed into a persisted
computed column that you can then add a full text index for?

It's technically feasible, but I'm looking for an answer for the current use case.

Best Answer

... a full text index for a column whose values are json ... usually contain Guid (or UUID) values.

Is it a good idea to build a full text index for this column?

I would say "No".

A full text index makes all of the words in the field available for searching. I don't think that's what you want. You want to be able to search for those Guid/Uuid values and, possibly, specific Request Types.

My preference would be to pull these values out into separate fields in their own right and index them.

Failing that, use a JSON-aware index type, which will be better, but still slower than the separate fields.