I want to be able to keep track of the _id index values that get the MongoDB collection uses when I export the MongoDB collection (via CSV) and import the data into a PostgreSQL DB table (basically for archiving older data not frequently reported on from a MongoDB collection to a PostgreSQL DB table).
That said, if I ever wanted to add the records back into MongoDB for any reason, I want to ensure that the _id MongoDB collection values stay aligned with other collections in MongoDB and tables in PostgreSQL.
MongoDB uses a BSON object 24 character hash for its index (_id) and I want to be able to archive the data from MongoDBto a PostgreSQL DB using that _id value. It looks like it's possible to set a PostgreSQL table column datatype to character(24) and set it to the primary key as a hash using CREATE INDEX idx_hash ON 'pg_table_name_goes_here' USING HASH (_id);
rather than requiring it to be a BSON like I originally though I would have to do, but will that work in a typical PostgreSQL table JOIN? and is there a better way to do this?
To put it another way, I'm taking a fair amount of data (about 5 to 8+ Million records a month) that is originating from a MongoDB and trying to archive previous/older data into PostgreSQL. So I want to keep that archived data in PostgreSQL exactly (or as close as possible) the same, in the event I might want to move the archived PostgreSQL back into some kind of NoSQL datastore in the future. In the meantime, I want to make querying the data in PostgreSQL as fast as possible. I want identical index columns / values.
Is this possible (using PostgreSQL v12 and MongoDB v4.4)?
I've been going down some pretty deep rabbit holes and finding a lot of weird and bizarre things about PostgreSQL. I've traditionally used the MySQL family of Relational DBs and PostgreSQL has some great features but it's been a little more awkward to get the hang of than I expected. I stumbled across this video series, and it makes sense: idx and seems to fit my use case as well as line up with your above comment. FWIW I did find this: pgbson but the complexity for deployments doesn't seem worth it.
Best Answer
Values are stored in columns in a relational database, not "in an index". To store such a character (aka "string") value, create a column of type
varchar(24)
to hold the_id
value from MongoDB.If you want to ensure uniqueness, then create a unique B-Tree index on the column or defined it as the primary key.
Something along the lines:
The fact that the value is computed by a "hash algorithm" in MongoDB has nothing to do with a "hash index" in Postgres.