I receive image files as XML data, each byte of the image being a node with its decimal value, e.g. for this example .png file, , the xml I get is:
DECLARE @xml XML = N'<?xml version="1.0" encoding="utf-16" standalone="yes"?>
<XmlData>
<Element>
<id>Test</id>
<image>
<Element>137</Element><Element>80</Element><Element>78</Element><Element>71</Element><Element>13</Element><Element>10</Element><Element>26</Element><Element>10</Element><Element>0</Element><Element>0</Element><Element>0</Element><Element>13</Element><Element>73</Element><Element>72</Element><Element>68</Element><Element>82</Element><Element>0</Element><Element>0</Element><Element>0</Element><Element>20</Element><Element>0</Element><Element>0</Element><Element>0</Element><Element>20</Element><Element>8</Element><Element>6</Element><Element>0</Element><Element>0</Element><Element>0</Element><Element>141</Element><Element>137</Element><Element>29</Element><Element>13</Element><Element>0</Element><Element>0</Element><Element>0</Element><Element>4</Element><Element>103</Element><Element>65</Element><Element>77</Element><Element>65</Element><Element>0</Element><Element>0</Element><Element>177</Element><Element>143</Element><Element>11</Element><Element>252</Element><Element>97</Element><Element>5</Element><Element>0</Element><Element>0</Element><Element>0</Element><Element>9</Element><Element>112</Element><Element>72</Element><Element>89</Element><Element>115</Element><Element>0</Element><Element>0</Element><Element>14</Element><Element>193</Element><Element>0</Element><Element>0</Element><Element>14</Element><Element>193</Element><Element>1</Element><Element>184</Element><Element>145</Element><Element>107</Element><Element>237</Element><Element>0</Element><Element>0</Element><Element>0</Element><Element>24</Element><Element>116</Element><Element>69</Element><Element>88</Element><Element>116</Element><Element>83</Element><Element>111</Element><Element>102</Element><Element>116</Element><Element>119</Element><Element>97</Element><Element>114</Element><Element>101</Element><Element>0</Element><Element>112</Element><Element>97</Element><Element>105</Element><Element>110</Element><Element>116</Element><Element>46</Element><Element>110</Element><Element>101</Element><Element>116</Element><Element>32</Element><Element>52</Element><Element>46</Element><Element>48</Element><Element>46</Element><Element>54</Element><Element>252</Element><Element>140</Element><Element>99</Element><Element>223</Element><Element>0</Element><Element>0</Element><Element>0</Element><Element>108</Element><Element>73</Element><Element>68</Element><Element>65</Element><Element>84</Element><Element>56</Element><Element>79</Element><Element>99</Element><Element>24</Element><Element>5</Element><Element>184</Element><Element>192</Element><Element>114</Element><Element>32</Element><Element>254</Element><Element>14</Element><Element>196</Element><Element>30</Element><Element>96</Element><Element>30</Element><Element>21</Element><Element>192</Element><Element>126</Element><Element>32</Element><Element>254</Element><Element>15</Element><Element>196</Element><Element>9</Element><Element>96</Element><Element>30</Element><Element>21</Element><Element>192</Element><Element>48</Element><Element>55</Element><Element>80</Element><Element>7</Element><Element>136</Element><Element>29</Element><Element>208</Element><Element>240</Element><Element>121</Element><Element>32</Element><Element>6</Element><Element>25</Element><Element>216</Element><Element>142</Element><Element>36</Element><Element>6</Element><Element>195</Element><Element>34</Element><Element>64</Element><Element>140</Element><Element>23</Element><Element>128</Element><Element>98</Element><Element>19</Element><Element>164</Element><Element>153</Element><Element>88</Element><Element>60</Element><Element>27</Element><Element>136</Element><Element>241</Element><Element>130</Element><Element>213</Element><Element>64</Element><Element>12</Element><Element>242</Element><Element>34</Element><Element>50</Element><Element>126</Element><Element>15</Element><Element>196</Element><Element>32</Element><Element>205</Element><Element>215</Element><Element>145</Element><Element>196</Element><Element>96</Element><Element>56</Element><Element>3</Element><Element>136</Element><Element>73</Element><Element>6</Element><Element>32</Element><Element>141</Element><Element>32</Element><Element>3</Element><Element>71</Element><Element>147</Element><Element>13</Element><Element>249</Element><Element>128</Element><Element>234</Element><Element>6</Element><Element>250</Element><Element>0</Element><Element>113</Element><Element>5</Element><Element>16</Element><Element>43</Element><Element>128</Element><Element>121</Element><Element>163</Element><Element>0</Element><Element>1</Element><Element>24</Element><Element>24</Element><Element>0</Element><Element>127</Element><Element>60</Element><Element>48</Element><Element>197</Element><Element>152</Element><Element>102</Element><Element>243</Element><Element>130</Element><Element>0</Element><Element>0</Element><Element>0</Element><Element>0</Element><Element>73</Element><Element>69</Element><Element>78</Element><Element>68</Element><Element>174</Element><Element>66</Element><Element>96</Element><Element>130</Element>
</image>
</Element>
</XmlData>'
In binary:
SELECT * FROM OPENROWSET(BULK 'C:\test.png', SINGLE_BLOB) AS q;
==========
BulkColumn
----------
0x89504E470D0A1A0A0000000D49484452000000140000001408060000008D891D0D0000000467414D410000B18F0BFC6105000000097048597300000EC100000EC101B8916BED0000001874455874536F667477617265007061696E742E6E657420342E302E36FC8C63DF0000006C49444154384F631805B8C07220FE0EC41E601E15C07E20FE0FC409601E15C030375007881DD0F079200619D88E2406C322408C17806213A499583C1B88F182D5400CF222327E0FC420CDD791C4603803884906208D200347930DF980EA06FA007105102B8079A300011818007F3C30C59866F3820000000049454E44AE426082
How to I retrieve the image file from the xml as varbinary?
I've asked a similar question not long ago, so I tried the following query, but the resulting binary data is incorrect:
SELECT r.c.value('id[1]', 'varchar(50)') AS id,
CONVERT(VARBINARY(MAX), (SELECT (t.u.value('.','tinyint')) FROM r.c.nodes('image/Element') AS t(u) FOR XML PATH(''))) AS image
FROM @xml.nodes('/XmlData/Element') AS r(c);
=============
id image
-------------
Test 0x31003300370038003000370038003700310031003300310030003200360031003000300030003000310033003700330037003200360038003800320030003000300032003000300030003000320030003800360030003000300031003400310031003300370032003900310033003000300030003400310030003300360035003700370036003500300030003100370037003100340033003100310032003500320039003700350030003000300039003100310032003700320038003900310031003500300030003100340031003900330030003000310034003100390033003100310038003400310034003500310030003700320033003700300030003000320034003100310036003600390038003800310031003600380033003100310031003100300032003100310036003100310039003900370031003100340031003000310030003100310032003900370031003000350031003100300031003100360034003600310031003000310030003100310031003600330032003500320034003600340038003400360035003400320035003200310034003000390039003200320033003000300030003100300038003700330036003800360035003800340035003600370039003900390032003400350031003800340031003900320031003100340033003200320035003400310034003100390036003300300039003600330030003200310031003900320031003200360033003200320035003400310035003100390036003900390036003300300032003100310039003200340038003500350038003000370031003300360032003900320030003800320034003000310032003100330032003600320035003200310036003100340032003300360036003100390035003300340036003400310034003000320033003100320038003900380031003900310036003400310035003300380038003600300032003700310033003600320034003100310033003000320031003300360034003100320032003400320033003400350030003100320036003100350031003900360033003200320030003500320031003500310034003500310039003600390036003500360033003100330036003700330036003300320031003400310033003200330037003100310034003700310033003200340039003100320038003200330034003600320035003000300031003100330035003100360034003300310032003800310032003100310036003300300031003200340032003400300031003200370036003000340038003100390037003100350032003100300032003200340033003100330030003000300030003000370033003600390037003800360038003100370034003600360039003600310033003000
Best Answer
This is close but missing a few pieces. You extract into rows of
TINYINT
the decimal value from each<Element>
in the XML (e.g. 137, 80, 78, etc), but then theFOR XML PATH('')
converts them back into strings and concatenates them, leaving you with a UTF-16 encoded string of "1378078...". Converting that intoVARBINARY
just turns each string digit -- "1", "3", "7", "8", etc -- into its binary / hex Code Point:Instead, you need to do the following:
TINYINT
"137" into hex /BINARY
"0x89"VARCHAR
, but without the leading "0x" (this requires using theCONVERT
function, notCAST
, so that you can specify the "style" of2
)FOR XML PATH('')
puts everything together in a string in the form of89504E470D0A1A0A0000...
, then you need to apply the "style" of2
again in theCONVERT(VARBINARY(MAX), ...
so that it knows that89504E470D0A1A0A0000...
is merely0x89504E470D0A1A0A0000...
without the leading "0x".Putting those pieces into your query, we get the following:
And that returns the following for the
image
field:Something to consider:
The method of sending binary data via
is probably the worst / least-efficient method possible. I realize that you said that you are receiving this info in this format and so probably are not to blame for this and have no control over it. However, just so everyone understands what is going on (please see UPDATE section below), each byte of the binary data is:
<Element>
and</Element>
tags { why not<Byte>
? } ( 19 characters )How does this work out? Well, only 10 decimal values ( 0 - 9 ) are 1 character / 2 bytes. Another 90 ( 10 - 99 ) are 2 characters / 4 bytes, while the remaining 156 values ( 100 - 255 ) are 3 characters / 6 bytes. So the majority of possible values take up the full 6 bytes, and only a small minority take up the minimum 2 bytes. This means that the average space taken up per each original byte is probably between 2 and 3 characters / 4 - 6 bytes (I guess they call that "5" in some places ;-) ? ).
For your particular example data, you can run the following query to see the breakdown:
That returns:
Now we can multiply each
ElementCount
by (ElementLength * 2
) to get the number of bytes. And we need to factor in the<Element>
tags, which is again 38 bytes per each of the 238 original bytes:And to put that into more concrete terms, we should compare that to the original binary size to get a sense of the bloat:
Meaning, if you are sent a 1 MB image (not unreasonable), that will be represented by 42.25 MB of XML. Yikes! (please see UPDATE section below)
But before anyone starts complaining about XML, this is not the fault of XML, which while being an undeniably bloated format, can do much better than this. XML (at least natively in SQL Server) supports the ability to handle binary data using Base64 encoding/decoding. For example, using the same test PNG binary value, we can convert that to a string in XML using the
FOR XML PATH
clause:which gives us:
And doing a
DATALENGTH(N'iVBORw0KGg...')
gives us 640 bytes, and including the<PngImage>
tag (DATALENGTH(N'<PngImage>iVBORw0KGg...</PngImage>')
) gives us a grand total of 682 bytes, not characters, but total bytes (only 2.87 times larger than the original) as compared to the 10,056 total bytes of the current method.And how much effort does it take to get the
VARBINARY(MAX)
value out of that Base64 encoded string? Absolutely NONE:Returns:
So, you might want to mention this to whomever is sending you this data. And Base64 encoding is fairly standardized and easy to encode/decode in most languages.
UPDATE
In researching something else I found that the XML datatype in SQL Server is optimized fairly well. It actually does not store the full text of the XML document. At the very least, it creates a dictionary (i.e. array) of unique element and attribute names, assigns each one a number, and uses that number to reference them in the document. This saves an incredible amount of space. So in this particular XML document, not only are the 19 characters of each
<Element></Element>
element within the<image>
node not repeated (which is 38 bytes in UTF-16 encoding), but in the main dictionary the nameElement
only shows up once even though it is used at two different levels of the structure.The optimized structure can only be seen when saving the document to an XML field of a table and then viewing the data page via
DBCC PAGE
. Viewing that shows some amount of bytes used for document overhead, the dictionary of element names (containing "Element" only once), and the 238 elements (representing the bytes of the PNG), each element having 5 bytes of overhead.I re-checked the size of each variation, this time checking the size directly for the original
@xml
variable (as found in the Question) and then converting that to bothNVARCHAR
and toVARCHAR
, and then doing similar checks to the Base64 encoded XML via the following change the 2nd to last test above:And the results:
As you can see, the size varies based on how it is stored. The XML in the Question originally came from a file. So that file, probably regular ASCII / ANSI text file, was 5094 bytes. And if it was stored in a
VARCHAR
field it would be that same size. If the file was saved with a UTF-16 encoding, then it was actually 10,188 bytes and would be the same size if stored in anNVARCHAR
field. But that same XML document, stored in SQL Server in aXML
field or variable, is only 2295 bytes! That is kinda cool :-). Although, 2295 bytes is still about 10 times larger than the original PNG file.But the Base64 encoding still holds up as the best way to store and transmit the binary value in XML. The exact same PNG file is only 690 bytes when stored in a
XML
field or variable. It is also only 708 bytes if that XML were stored in anNVARCHAR
field or UTF-16 text file, and only 354 bytes if stored in aVARCHAR
field or ASCII text file.