I have following use case where there are bunch of IDs (GUID) which are mapped to each other. I need to lookup all the IDs given any single ID. For example, let's say my data has following columns:
uid, id1, id2, id3, id4, id5
How to support the following queries:
1. SELECT uid FROM table WHERE id1=x;
2. SELECT id1, id2, id3, id4 FROM table WHERE uid=xyz;
The number of IDs can change (I have to add new columns).
I can create indexes on all columns but that would not be very effective. Should I model this data differently?
Is there any NoSQL database which can help model this use-case?
To give more business context for clarity, we have data coming from multiple data sources which have their own custom_id. But they all identify the same person (UID). Thats why we need to map all the data source ids to our own internal UID. Hence the question.
- The number of columns have max bound of 20-25.
- They are nullable.
- The number of id's can be different per row (thats why nullable).
- Datatypes are STRING for all.
- UID will be primary key.
Best Answer
You can model your table as a dictionary with two columns:
Both would be indexed with dedicated index.
You can then add as many ID_VALUE rows for any UID and search for an UID given ID_VALUE.
Queries would then look like
Second query would return more than one row.
EDIT
As long as you have all IDs bound by a common column, in this case UID, this should be possible. Consider following query:
This will return all id's given a single id X.
It really doesn't matter what are the values of ID_VALUE. They may be other UIDs pointing to this one or any other arbitrary number. If you want to search of any ID_VALUE including the UID value, you can include this UID value into the set too so you will have one row where UID = ID_VALUE.
EXAMPLE
Let's say you have following data in your model:
You can model this data in a dictionary table like so:
If you want to include the UID in the list so the last query in my example would return ALL of IDs including the UID, you would then have:
I've marked the rows added to satisfy this requirement. Executing the query:
Will result in:
This will return all IDs given any single id as per requirement. Note that ID_VALUE of 1 can be found in both sets with UID = 1 and UID = 2, both are returned.
This would be equivalent to executing this query on your flat design: