Why Store Text in an SQL Image Field?

sql servert-sql

I've been asked to get some info out of the database of a system we use. I've found what I'm looking for, but its left me… confused.

For some reason, the developers have the system storing text in an SQL Image field. I suspected this, but wasn't entirely sure, until I took the Hex output from my select statement, put it into a Hex to ASCII converter, and hey presto, it's exactly the kind of message I'm looking for.

What I'd like to know is: why? Is there some legitimate reason why someone would store text in an Image field?

Given the quite serious and private nature of these messages, the only thing I can think of is that someone was seriously mistaken in thinking that storing the text in an Image field would somehow make it more secure? Or is there really just something obvious I'm missing?

EDIT: Covered in the comments, but: At minimum, this text will occasionally likely include accented characters and such. Plus, my manager just let me know that the software was written using SQL Server 2000, or maybe even a version or two earlier. Possibly the simple answer is that there wasn't an analogue to nvarchar or similar back when it was written?

Best Answer

It seems like there are several possible choices, but without going back to the original developers of your system, there's no way to be sure. Some of the possibilities include:

  • Conversion from another system. If the developers wrote their software against database engine X (Oracle, MySQL, Access, SQL Anywhere, etc.) and then converted their system to SQL Server 2000, it is possible their conversion tools / methodology was not "smart" enough to use the right data type for their data.
  • Improper UTF-8 encoding. As mentioned by others, it could be that the programmers didn't know how to encode UTF-8 or other non-ASCII data, and chose Image as a fallback storage method.
  • Data obfuscation. While not an encryption method, storing text in an image field, as you note, makes it non-trivial to view. Perhaps the programmers didn't want to take the time/effort to figure out a true encryption method, and thought the Image data type was "good enough" to hide the text from DBAs like yourself.
  • Lack of Database expertise. I've seen more than one case where the programmer chose some "default" data type for a column without really understanding what they were doing. In at least one case, it was because they used their programming language's IDE to define the database, rather than developing the schema through the engine's preferred database design tools. This can often lead to interesting data types that aren't appropriate for the data or choosing deprecated data types, etc.
  • Data column "sharing." It is possible the programmers, for whatever reason, are using this image column to store text in some cases, pictures in other cases? It is not best practice by any means, but I've seen systems where similar shortcuts were taken.
  • Source data conversion. I don't know where your source text originates. It is possible that the application isn't inserting raw text into an image field, but is uploading a file that contains text in human-readable form. Some document formats have human-readable text inside them, along with control codes. I recall that back in the day, WordPerfect documents stored data in a binary file type, but the actual text was mostly human readable if you opened the file in a text editor. The formatting control codes made lots of random "garbage" in the data, but you could extract the text if you wanted to. If your application uploads files like these, then the raw text might still be viewable in the column. In which case, an "image" data type isn't wrong.
  • Feature change over time. It is possible the Image data type was the appropriate type at one time in the design of the application, but the purpose changed over time with no change to the underlying database? Maybe they uploaded files originally, but later switched to inserting text instead. Or maybe they inserted text initially, but planned to eventually upload files instead of text. This is lazy programming; the schema should be changed over time to match the data, but again, this happens often enough...