Postgresql – How do i store hash indices on the database

postgresql

I'm storing (references to) images in a database. I want to have reverse image search, so I wish to store a multidimensional hash of the image – for example width to nearest 10px, height to nearest 10px, and perhaps a downsampled version of the image.

Originally I was going to store these hashes on disk, but then I realised I may be able to store in database in a much more organised (and perhaps performant) manner.

Best Answer

There are several considerations:

  • Consistency: if you store everything in the database, consistency between database entries and the hash are automatically guaranteed by the transaction system. That simplifies architecture and development.

  • Read/Write Performance: reading bigger data from the database performs much worse than reading files from the file system.

  • Backup/Maintenance: large databases are more difficult to back up than large file systems.

  • Search performance: unless you build special GiST index support for operators on these hashes (which requires C programming and dealing with the guts of PostgreSQL), you won't be able to efficiently search for the hashes in the database.