PostgreSQL – How to model data which requires an index on all columns

database-design nosql postgresql rdbms

I have the following use case: there is a bunch of IDs (GUIDs) which are mapped to each other, and I need to look up all the IDs given any single ID. For example, let's say my data has the following columns:

uid, id1, id2, id3, id4, id5

How can I support the following queries?

1. SELECT uid FROM table WHERE id1=x;
2. SELECT id1, id2, id3, id4 FROM table WHERE uid=xyz;

The number of IDs can change (I would have to add new columns).

I can create indexes on all columns, but that would not be very efficient. Should I model this data differently?
Is there a NoSQL database that can help model this use case?

To give more business context: we have data coming from multiple data sources, each with its own custom_id, but they all identify the same person (UID). That's why we need to map all the data source IDs to our own internal UID. Hence the question.

The number of columns has an upper bound of 20-25.
They are nullable.
The number of IDs can differ per row (that's why they are nullable).
The data type is STRING for all of them.
UID will be the primary key.

Best Answer

You can model your table as a dictionary with two columns:

UID
ID_VALUE

Each column would have its own dedicated index.

You can then add as many ID_VALUE rows as you need for any UID, and find the UID for a given ID_VALUE.

The queries would then look like this:

SELECT UID FROM table WHERE ID_VALUE=x;
SELECT ID_VALUE FROM table WHERE UID=xyz;

The second query can return more than one row.
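
As a minimal sketch of this two-column layout, the snippet below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server. The table name id_map (TABLE is a reserved word in PostgreSQL), the index names, the helper functions and the sample values are all assumptions made for illustration, not part of the original answer.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE id_map (
        uid      TEXT NOT NULL,
        id_value TEXT NOT NULL
    );
    -- One dedicated index per column, as the answer suggests.
    CREATE INDEX id_map_uid_idx      ON id_map (uid);
    CREATE INDEX id_map_id_value_idx ON id_map (id_value);
""")

# Hypothetical sample data: two source IDs for u1, one for u2.
conn.executemany(
    "INSERT INTO id_map (uid, id_value) VALUES (?, ?)",
    [("u1", "crm-17"), ("u1", "billing-42"), ("u2", "crm-99")],
)

def uids_for(id_value):
    """Return the UIDs mapped to a given source ID."""
    cur = conn.execute("SELECT uid FROM id_map WHERE id_value = ?", (id_value,))
    return [row[0] for row in cur]

def ids_for(uid):
    """Return every source ID mapped to a given UID, sorted for stable output."""
    cur = conn.execute(
        "SELECT id_value FROM id_map WHERE uid = ? ORDER BY id_value", (uid,)
    )
    return [row[0] for row in cur]

print(uids_for("billing-42"))
print(ids_for("u1"))
```

Adding a new data source under this layout means inserting rows rather than adding a column and another index.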

EDIT

"I need to lookup all the id's given any single id"

As long as all IDs are tied together by a common column (UID in this case), this is possible. Consider the following query:

SELECT t2.id_value
  FROM table t1
      ,table t2
 WHERE t1.uid = t2.uid
   AND t1.id_value = x

This will return all IDs given a single ID x.

It really doesn't matter what the values of ID_VALUE are. They may be other UIDs pointing to this one, or any other arbitrary values. If you want to be able to search by any ID_VALUE including the UID value itself, you can add the UID value to the set too, so that each UID has one row where UID = ID_VALUE.

EXAMPLE

Let's say you have the following data in your model:

UID   ID1   ID2  ID3
--------------------
  1     2     3    4
  2     1     4
  3     4
  4     2     3

You can model this data in a dictionary table like so:

UID   ID_VALUE
--------------
  1          2
  1          3
  1          4
  2          1
  2          4
  3          4
  4          2
  4          3
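
The conversion from the flat rows to the dictionary rows above can be sketched in a few lines of Python. The function name to_dictionary_rows and the use of None for an empty (NULL) column are assumptions for illustration; the data is the example data shown above.

```python
# Flat rows from the example: (UID, ID1, ID2, ID3); None marks a NULL column.
flat_rows = [
    (1, 2, 3, 4),
    (2, 1, 4, None),
    (3, 4, None, None),
    (4, 2, 3, None),
]

def to_dictionary_rows(rows):
    """Emit one (uid, id_value) pair per non-NULL ID column."""
    pairs = []
    for uid, *ids in rows:
        for id_value in ids:
            if id_value is not None:
                pairs.append((uid, id_value))
    return pairs

dictionary_rows = to_dictionary_rows(flat_rows)
for uid, id_value in dictionary_rows:
    print(uid, id_value)
```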

If you want the self-join query from the EDIT section to return all IDs including the UID itself, add one row per UID where ID_VALUE equals the UID. You would then have:

UID   ID_VALUE
--------------
  1          2
  1          3
  1          4
  1          1 <---
  2          1
  2          4
  2          2 <---
  3          4
  3          3 <---
  4          2
  4          3
  4          4 <---

I've marked the rows added to satisfy this requirement. Executing the query:

SELECT t2.uid
      ,t2.id_value
  FROM table t1
      ,table t2
 WHERE t1.uid = t2.uid
   AND t1.id_value = 1

Will result in:

UID   ID_VALUE
--------------
  1          2
  1          3
  1          4
  1          1 
  2          1
  2          4
  2          2

This returns all IDs given any single ID, as per the requirement. Note that an ID_VALUE of 1 appears in both the UID = 1 set and the UID = 2 set, so both sets are returned.
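
To check this result, here is a small sketch that loads the twelve example rows into an in-memory SQLite database (a stand-in for PostgreSQL) and runs the same self-join, written with explicit JOIN syntax. The table name id_map and the function name related_ids are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE id_map (uid INTEGER NOT NULL, id_value INTEGER NOT NULL)")

# Example rows, including the extra rows where uid = id_value.
rows = [
    (1, 2), (1, 3), (1, 4), (1, 1),
    (2, 1), (2, 4), (2, 2),
    (3, 4), (3, 3),
    (4, 2), (4, 3), (4, 4),
]
conn.executemany("INSERT INTO id_map (uid, id_value) VALUES (?, ?)", rows)

def related_ids(id_value):
    """Self-join: find every UID that contains id_value, then list all rows of those UIDs."""
    sql = """
        SELECT t2.uid, t2.id_value
          FROM id_map t1
          JOIN id_map t2 ON t1.uid = t2.uid
         WHERE t1.id_value = ?
         ORDER BY t2.uid, t2.id_value
    """
    return conn.execute(sql, (id_value,)).fetchall()

result = related_ids(1)
for uid, id_value in result:
    print(uid, id_value)
```

The ORDER BY is only there to make the output deterministic; the set of rows is the same seven rows listed above.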

This would be equivalent to executing this query on your flat design:

SELECT * FROM table WHERE UID=1 OR ID1=1 OR ID2=1 OR ID3=1
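
The claimed equivalence can be checked with a short sketch, again using SQLite as a stand-in for PostgreSQL. The table names flat and id_map and the function matching_uids are assumptions for illustration. The dictionary table is filled from the flat table with UNION ALL, including the row where uid = id_value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flat (uid INTEGER PRIMARY KEY, id1 INTEGER, id2 INTEGER, id3 INTEGER);
    INSERT INTO flat VALUES (1, 2, 3, 4), (2, 1, 4, NULL), (3, 4, NULL, NULL), (4, 2, 3, NULL);

    CREATE TABLE id_map (uid INTEGER NOT NULL, id_value INTEGER NOT NULL);
    INSERT INTO id_map
        SELECT uid, uid FROM flat
        UNION ALL SELECT uid, id1 FROM flat WHERE id1 IS NOT NULL
        UNION ALL SELECT uid, id2 FROM flat WHERE id2 IS NOT NULL
        UNION ALL SELECT uid, id3 FROM flat WHERE id3 IS NOT NULL;
""")

def matching_uids(x):
    """Return the UIDs matched by x in the flat design and in the dictionary design."""
    flat = conn.execute(
        "SELECT uid FROM flat WHERE uid = ? OR id1 = ? OR id2 = ? OR id3 = ?",
        (x, x, x, x),
    ).fetchall()
    dictionary = conn.execute(
        "SELECT DISTINCT uid FROM id_map WHERE id_value = ?", (x,)
    ).fetchall()
    return sorted(r[0] for r in flat), sorted(r[0] for r in dictionary)

flat_uids, dict_uids = matching_uids(1)
print(flat_uids, dict_uids)
```

Note that the flat query needs one OR term (and, for speed, one index) per ID column, while the dictionary query stays the same no matter how many data sources are added.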

Related Solutions

PostgreSQL insert into table (not origin) based on a condition on fields on different tables

You seem to be under the impression that some kind of automatic "row number" would exist. That is not the case. Unlike rows in a spreadsheet, rows in a relational database table have no natural order.

This query should do the job, but it relies on the contents of name_1 and name_2 to make the connection. If you want to rely on a row number, you have to add an actual column for that.

INSERT INTO table2 (id1, id2, score1, score2, comment_string)
SELECT t1.id1, t1.id2, t1.score1, t1.score2
      ,CASE WHEN t3.name_1 IS NULL
            AND  t4.name_2 IS NULL THEN 'removed_because ...'
       END AS comment_string  -- no ELSE branch, so NULL otherwise
FROM   table_1 t1
LEFT   JOIN table_3 t3 USING (name_1)
LEFT   JOIN table_4 t4 USING (name_2)
ORDER  BY id1; -- undeclared in Q

This is based on the assumption that table_3.name_1 and table_4.name_2 are unique. Otherwise, the query could create a "proxy cross join", possibly multiplying rows if there are several matches.
More about this caveat in this related answer on SO.
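
The row-multiplication caveat can be reproduced with a tiny sketch. It uses SQLite in place of PostgreSQL and reuses the names table_1, table_3 and name_1 from the query above; the column types and the sample value 'alice' are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id1 INTEGER, name_1 TEXT);
    CREATE TABLE table_3 (name_1 TEXT);

    INSERT INTO table_1 VALUES (1, 'alice');
    -- name_1 is NOT unique in table_3: two matches for the single table_1 row.
    INSERT INTO table_3 VALUES ('alice'), ('alice');
""")

joined = conn.execute("""
    SELECT t1.id1
      FROM table_1 t1
      LEFT JOIN table_3 t3 USING (name_1)
""").fetchall()

# One source row comes out once per match in table_3.
print(len(joined))
```

With a unique constraint on table_3.name_1 the second insert would be rejected and the join could return at most one row per table_1 row.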

How to store list of unknown size in database

You should create another table with three columns:

Start City [PK]
End City [PK]
Midway City [PK]

This gives you a three-column (composite) primary key. If, for example, there are 4 cities between city_start and city_finish, add 4 rows to your table like this:

(city_start,city_finish,midway_city_one)
(city_start,city_finish,midway_city_two)
(city_start,city_finish,midway_city_three)
(city_start,city_finish,midway_city_four)

Now you know that on the way from city_start to city_finish you will visit midway_city_one.
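
A minimal sketch of such a table, using SQLite in place of PostgreSQL, might look like the following. The table name route_stop, the column names and the function midway_cities are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE route_stop (
        start_city  TEXT NOT NULL,
        end_city    TEXT NOT NULL,
        midway_city TEXT NOT NULL,
        PRIMARY KEY (start_city, end_city, midway_city)
    )
""")

stops = ["midway_city_one", "midway_city_two", "midway_city_three", "midway_city_four"]
conn.executemany(
    "INSERT INTO route_stop VALUES ('city_start', 'city_finish', ?)",
    [(city,) for city in stops],
)

def midway_cities(start, end):
    """List the midway cities of a route, sorted by name for stable output."""
    cur = conn.execute(
        "SELECT midway_city FROM route_stop"
        " WHERE start_city = ? AND end_city = ? ORDER BY midway_city",
        (start, end),
    )
    return [row[0] for row in cur]

# The composite primary key rejects the same midway city twice on one route.
try:
    conn.execute(
        "INSERT INTO route_stop VALUES ('city_start', 'city_finish', 'midway_city_one')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(midway_cities("city_start", "city_finish"), duplicate_rejected)
```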
