Hexadecimal format and storage on a computer hard drive: does it store in half the bytes?

hard drive, hexadecimal, storage

Allow me to preface this by saying that I am not a computer specialist; more than anything, I am simply curious about the subject.

In a conversation with a computer science specialist, I was told that a string of decimal digits, such as 73829182093, could be stored on a hard drive in half the bytes otherwise needed by utilizing a hexadecimal system. According to the specialist, a string of six decimal digits could be stored in 3 bytes, because each digit can be represented by a hex digit, which is only 4 bits in size. Is this correct with regard to storage on a hard drive? Note, I am referring to storage on a hard drive, not the memory needed to display the information.
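
If I've understood the claim correctly, the scheme would look something like the following rough Python sketch, which packs two decimal digits into each byte (the helper name is my own invention):

    def pack_digits(s: str) -> bytes:
        """Pack a string of decimal digits two per byte (4 bits each)."""
        if len(s) % 2:  # pad to an even number of digits
            s = "0" + s
        return bytes((int(hi) << 4) | int(lo) for hi, lo in zip(s[::2], s[1::2]))

    packed = pack_digits("73829182093")
    print(len("73829182093"), "digits ->", len(packed), "bytes")  # 11 digits -> 6 bytes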

My previous understanding was that, on modern hard drives, all information is stored in binary form (0s and 1s), in blocks of 8 bits, and that hexadecimal is used merely to make that information easier to display, so humans aren't required to read through long strings of bits.

If this is true, does it mean that, under such hexadecimal storage, a block of 8 bits on a hard drive would encode two half-bytes of data, instead of a full 8 bits for a character like the letter "M"? Or is each half-byte actually stored on the drive using a full 8 bits, with the rest simply omitted when displayed?

Thank you.

Best Answer

My previous understanding was that, on modern hard drives, all information is stored in binary form (0s and 1s), in blocks of 8 bits, and that hexadecimal is used merely to make that information easier to display, so humans aren't required to read through long strings of bits.

That's 100% correct. Hexadecimal is merely a representation of data; there's nothing special about the nature of hexadecimal compared to other formats. It doesn't enable data compression or anything like that.

I think what your friend was referring to is the difference between representing numbers as character strings versus representing numbers as numbers.

For unsigned integers -- a representation of whole numbers, in bits (zeroes and ones), from 0 up to a certain fixed maximum -- the largest number that N bits can represent is 2^N - 1, assuming you start at 0.

So, if you have 8 bits (a.k.a. 1 byte), you can represent every number from 0 to 255 without losing information; each of the 256 possible patterns of those eight bits unambiguously stands for one number from 0 to 255, inclusive. (Or from 1 to 256, if you prefer -- it doesn't matter, though computers conventionally start at 0.)

If you have 16 bits (2 bytes), you can represent every number from 0 to 65535 (that's 2^16 - 1). 32 bits, every number from 0 to 4294967295. 64 bits, every number from 0 to 18446744073709551615 (about 1.8 × 10^19).
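
You can check those ranges yourself with a couple of lines of Python:

    for n_bits in (8, 16, 32, 64):
        print(f"{n_bits:2} bits: 0 to {2**n_bits - 1:,}")

    #  8 bits: 0 to 255
    # 16 bits: 0 to 65,535
    # 32 bits: 0 to 4,294,967,295
    # 64 bits: 0 to 18,446,744,073,709,551,615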

You might know from algebra that 2^N is an exponential function. That means that even though 64 bits is only eight times as many bits as 8 bits, the maximum doesn't merely grow eightfold -- 255*8 is only 2040! -- it explodes to approximately 18400000000000000000, next to which 2040 is a vanishingly small number. And a 64-bit integer can represent EVERY number from 0 all the way up to that maximum.

One interesting implication of storing integers this way is that the programmer must decide in advance how big the storage needs to be, which in turn determines the maximum number a given integer can hold. If you try to store a number bigger than the storage can handle, you get something called overflow. This happens, for example, if you have an 8-bit integer set to 255 and you ask the computer to add 1 to it. You can't represent 256 within an integer whose range is 0 to 255! What usually happens is that the value "wraps around" back to the start, and becomes 0.
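
Python's own integers never overflow, but a minimal sketch that mimics what an 8-bit register does is just arithmetic modulo 256:

    value = 255
    value = (value + 1) % 256   # what an 8-bit unsigned add does in hardware
    print(value)                # 0 -- it wrapped around to the start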

There are programs that perform math in a mode called "arbitrary precision," automatically resizing their storage to grow bigger and bigger depending on how big the numbers being handled are; for example, if you multiplied 255 by 100000, the answer (25500000) would no longer fit in 8 bits, or in 16 bits, but would fit within a 32-bit integer. If an input or a math operation produced a number larger than the maximum for a 64-bit integer, even more space would be allocated for it.
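
Python's built-in int is itself arbitrary-precision, so you can watch the needed storage grow:

    x = 255 * 100000
    print(x)                # 25500000
    print(x.bit_length())   # 25 -- too big for 16 bits, fits in 32

    y = 2**64               # one past the 64-bit unsigned maximum
    print(y.bit_length())   # 65 -- Python just allocates more space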


HOWEVER -- if you represent numbers as a character string, then each digit takes up as much space as a letter in written prose. "ASDF" and "1234" take up exactly the same space. "OneTwoThreeFourFive" (19 characters) takes up the same space as "1234567890123456789". The space required grows linearly with the number of digits (or letters, or characters generally) you have. That's because each character position can hold any of the many characters in the character set, and digits are just characters in that set. One specific sequence of zeroes and ones produces the character "3", a different sequence produces "4", and so on.

Typically characters are stored in either 8 or 16 bits, but some character encodings take up a variable number of bits depending on the character (like UTF-8), while others always take up a larger, fixed number of bits (like UTF-32).
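
For example, using Python's standard codec names, the same character can occupy a different number of bytes under different encodings:

    print(len("M".encode("utf-8")))      # 1 byte
    print(len("M".encode("utf-16-le")))  # 2 bytes
    print(len("M".encode("utf-32-le")))  # 4 bytes
    print(len("é".encode("utf-8")))      # 2 bytes -- UTF-8 is variable-width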

If each character takes 8 bits, "OneTwoThreeFourFive" and "1234567890123456789" both take up 152 bits. But the number 1234567890123456789 fits within a 64-bit unsigned integer, which... only consumes 64 bits. That's a savings of 88 bits! And we didn't even use any "data compression" tricks like Zip, 7-Zip, or RAR.
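
Here is that comparison made concrete, assuming 8-bit characters and using Python's int.to_bytes for the raw 64-bit form:

    n = 1234567890123456789
    as_text = str(n).encode("ascii")   # one byte per digit
    as_int = n.to_bytes(8, "little")   # packed as a 64-bit unsigned integer
    print(len(as_text) * 8)            # 152 bits
    print(len(as_int) * 8)             # 64 bits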
