Postgresql – How to model data which requires index on all columns

database-designnosqlpostgresqlrdbms

I have following use case where there are bunch of IDs (GUID) which are mapped to each other. I need to lookup all the IDs given any single ID. For example, let's say my data has following columns:

uid, id1, id2, id3, id4, id5

How to support the following queries:

1. SELECT uid FROM table WHERE id1=x;
2. SELECT id1, id2, id3, id4 FROM table WHERE uid=xyz;

The number of IDs can change (I have to add new columns).

I can create indexes on all columns but that would not be very effective. Should I model this data differently?
Is there any NoSQL database which can help model this use-case?

To give more business context for clarity, we have data coming from multiple data sources which have their own custom_id. But they all identify the same person (UID). Thats why we need to map all the data source ids to our own internal UID. Hence the question.

  • The number of columns have max bound of 20-25.
  • They are nullable.
  • The number of id's can be different per row (thats why nullable).
  • Datatypes are STRING for all.
  • UID will be primary key.

Best Answer

You can model your table as a dictionary with two columns:

  1. UID
  2. ID_VALUE

Both would be indexed with dedicated index.

You can then add as many ID_VALUE rows for any UID and search for an UID given ID_VALUE.

Queries would then look like

SELECT UID FROM table WHERE ID_VALUE=x;
SELECT ID_VALUE FROM table WHERE UID=xyz;

Second query would return more than one row.


EDIT

I need to lookup all the id's given any single id

As long as you have all IDs bound by a common column, in this case UID, this should be possible. Consider following query:

SELECT t2.id_value
  FROM table t1
      ,table t2
 WHERE t1.uid = t2.uid
   AND t1.id_value = x

This will return all id's given a single id X.

It really doesn't matter what are the values of ID_VALUE. They may be other UIDs pointing to this one or any other arbitrary number. If you want to search of any ID_VALUE including the UID value, you can include this UID value into the set too so you will have one row where UID = ID_VALUE.

EXAMPLE

Let's say you have following data in your model:

UID   ID1   ID2  ID3
--------------------
  1     2     3    4
  2     1     4
  3     4
  4     2     3

You can model this data in a dictionary table like so:

UID   ID_VALUE
--------------
  1          2
  1          3
  1          4
  2          1
  2          4
  3          4
  4          2
  4          3

If you want to include the UID in the list so the last query in my example would return ALL of IDs including the UID, you would then have:

UID   ID_VALUE
--------------
  1          2
  1          3
  1          4
  1          1 <---
  2          1
  2          4
  2          2 <---
  3          4
  3          3 <---
  4          2
  4          3
  4          4 <---

I've marked the rows added to satisfy this requirement. Executing the query:

SELECT t2.uid
      ,t2.id_value
  FROM table t1
      ,table t2
 WHERE t1.uid = t2.uid
   AND t1.id_value = 1

Will result in:

UID   ID_VALUE
--------------
  1          2
  1          3
  1          4
  1          1 
  2          1
  2          4
  2          2

This will return all IDs given any single id as per requirement. Note that ID_VALUE of 1 can be found in both sets with UID = 1 and UID = 2, both are returned.

This would be equivalent to executing this query on your flat design:

SELECT * FROM table WHERE UID=1 OR ID1=1 OR ID2=1 OR ID3=1