When hash indexes should be used in Oracle

indexoracle

Hash indexes are usefull when you have a large table with URLs and you need to query by them. So, a solution would be to have an additional column "url_crc32" that will be filled with hashed URL value via trigger on inserts. An index on url_crc32 would be definitely faster than index on URL column that is a type of text.

It's not very common case to query data by URL. The more frequent case would be to query text data by fragment and hash indexes are useless in such case. So, I'm curious do you use such hash indexes, and if so then when do you use? IMHO Oracle does not have native hash indexes, so that must be done manually.

Best Answer

You'd use a hash value as an identifier for rows of data whenever that's "practical", i.e. you actually need a simple identifier for all your rows, there are no candidate (natural) key fields that are practical (too wide for example), and an ordinary (sequential for instance) generated identifier doesn't cut it (for instance you need that row identifier to be "global" - if the same row was created in two distinct databases, they should have the same identifier).

(One non-trivial example of such a thing would be Git. Each object stored in a git repository is uniquely identified by a SHA-1 hash, which is pretty handy to refer unambiguously to a given commit.)

If you need to use something like that, indeed you'll need to create it yourself with Oracle (adding that hash as a column to the table and indexing it, or with a function-based index).

You could go even further by creating your very own index type with application domain indexes if a plain hash isn't good enough – full-text indexing is, I believe, implemented this way.

There is something built-in to Oracle that is hash-based and doesn't actually need an index to get fast row retrieval: hash clusters. You can theoretically retrieve the target row with as little as a single-block I/O (which normal index+table lookups can't match, and even IOTs can't match unless the table is really small). Do read the When to Use Hash Clusters docs though, they are quite peculiar, and you need a good key (one or more columns) in the first place to use them.