Your ix_hugetable looks quite useless because:

- it duplicates the clustered index (the PK)
- the INCLUDE makes no difference, because a clustered index already INCLUDEs all non-key columns (non-key values at the lowest leaf level = INCLUDEd = what a clustered index is)

In addition:

- added or fk should be the leading column
- with id first, the index is of little use

Try changing the clustered key to (added, fk, id) and dropping ix_hugetable. You've already tried (fk, added, id). If nothing else, you'll save a lot of disk space and index maintenance.
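The shape of this argument can be sketched outside SQL Server. Below is a minimal Python/SQLite stand-in (not the asker's environment; the data and names mirror the question's hugetable but are otherwise hypothetical): a WITHOUT ROWID table stores rows in the primary-key B-tree itself, roughly like a clustered index, so once the key leads with the filtered column, a range query is answered straight from that B-tree with no secondary index needed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# WITHOUT ROWID stores the rows inside the PRIMARY KEY B-tree itself,
# roughly analogous to a SQL Server clustered index on (added, fk, id).
con.execute("""
    CREATE TABLE hugetable (
        added INTEGER,
        fk    INTEGER,
        id    INTEGER,
        value INTEGER,
        PRIMARY KEY (added, fk, id)
    ) WITHOUT ROWID
""")

# A range filter on the leading key column becomes a seek on the
# clustered B-tree; no duplicate ix_hugetable is required.
plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT fk, value FROM hugetable WHERE added BETWEEN 1 AND 10"
).fetchall()
print(plan[0][3])  # e.g. SEARCH hugetable USING PRIMARY KEY (added>? AND added<?)
```

The exact plan wording varies by SQLite version, but the point is that the seek happens on the primary key, not on an extra index.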
Another option might be to try the FORCE ORDER hint with the table order both ways and no JOIN/INDEX hints. I try not to use JOIN/INDEX hints personally, because they remove options from the optimiser. Many years ago I was told (at a seminar with a SQL guru) that the FORCE ORDER hint can help when you join a huge table to a small table: YMMV 7 years later...
Oh, and let us know where the DBA lives so we can arrange for some percussion adjustment
Edit, after 02 Jun update
The 4th column (value) is not part of the non-clustered index, so the engine has to fall back to the clustered index to fetch it. Try changing the NC index to INCLUDE the value column, so the query doesn't have to go to the clustered index for it at all:
create nonclustered index ix_hugetable on dbo.hugetable (
fk asc, added asc
) include(value)
Note: if value is not nullable, then COUNT(value) is semantically the same as COUNT(*). But SUM needs the actual value, not mere existence. As an example, if you change COUNT(value) to COUNT(DISTINCT value) without changing the index, it should break the query again, because it has to process value as a value rather than as existence.
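The NULL semantics behind that note are easy to check. A quick sketch using SQLite from Python (a stand-in for SQL Server; the aggregate semantics are the same, and the sample data is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (value INTEGER)")
con.executemany("INSERT INTO t (value) VALUES (?)",
                [(1,), (1,), (2,), (None,)])

# COUNT(*) counts rows, COUNT(value) skips NULLs, and
# COUNT(DISTINCT value) must examine each actual value to de-duplicate.
counts = con.execute(
    "SELECT COUNT(*), COUNT(value), COUNT(DISTINCT value) FROM t"
).fetchone()
print(counts)  # (4, 3, 2)
```

With no NULLs in the column, the first two counts would be identical, which is exactly why the optimiser can sometimes answer COUNT(value) from existence alone.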
The query needs 3 columns: added, fk and value. The first 2 are filtered/joined on, so they are key columns; value is only used in the aggregate, so it can be INCLUDEd. A classic use of a covering index.
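The covering behaviour can be seen in miniature with SQLite from Python. SQLite has no INCLUDE clause, so appending value to the index key is used here as an approximation; the plan reports a covering index when the query never has to touch the base table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE hugetable (
        id INTEGER PRIMARY KEY, fk INTEGER, added INTEGER, value INTEGER
    )
""")
# SQLite lacks INCLUDE, so value is appended to the key as a rough
# equivalent of: create index ix_hugetable (fk, added) include(value)
con.execute("CREATE INDEX ix_hugetable ON hugetable (fk, added, value)")

plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT SUM(value) FROM hugetable WHERE fk = 1 AND added >= 5"
).fetchall()
print(plan[0][3])  # e.g. SEARCH hugetable USING COVERING INDEX ix_hugetable ...
```

"COVERING INDEX" in the plan is SQLite's way of saying the same thing as a SQL Server covering index: every column the query needs lives in the index leaf, so there are no lookups back to the clustered data.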
Ever is a big word, but, in general, no, I wouldn't put a varchar(2000) field into an INCLUDE.
And yeah, the way that data is stored at the page level can seriously impact performance of the index, depending on how the index is used.
The thing is, the more rows of data you can cram into a page, the fewer pages have to be accessed, and the faster your system is, for the most part. Adding a really large column means less information is stored on each page, so in the event of range seeks or scans, more pages have to be read to retrieve the data, seriously slowing things down.
To know for sure if this is an issue on your query, or on your system, you'd have to monitor the reads, especially the number of pages that the query uses.
Best Answer
There are 2 bits of information. This means 2 fields. It is that simple.
In practical terms, selecting 2 columns is zero extra complexity. Having to read a large field to parse one bit out is unnecessary complexity and performance overhead.
A couple of SO questions on storing salts with hashes
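For illustration, here is a minimal sketch of the two-column approach in Python (the PBKDF2 parameters and schema are hypothetical, not taken from any of those questions): the salt and the hash each get their own column, so verification never has to parse one value out of the other.

```python
import hashlib
import hmac
import os
import sqlite3

def hash_password(password, salt=None):
    # Two pieces of information -> two values; generate a fresh salt
    # when none is supplied (i.e. when creating the user).
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE users (name TEXT PRIMARY KEY, salt BLOB, pw_hash BLOB)")

salt, digest = hash_password("s3cret")
con.execute("INSERT INTO users VALUES (?, ?, ?)", ("alice", salt, digest))

# Verification: read both columns, re-hash with the stored salt.
# No string splitting, no delimiter conventions, no parsing overhead.
stored_salt, stored_hash = con.execute(
    "SELECT salt, pw_hash FROM users WHERE name = ?", ("alice",)
).fetchone()
ok = hmac.compare_digest(hash_password("s3cret", stored_salt)[1], stored_hash)
print(ok)  # True
```

Storing the pair in one concatenated field forces every consumer to agree on an encoding and split it apart on every login, which is exactly the unnecessary complexity the answer is talking about.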