SQLAlchemy – How to Bulk Insert Subclasses

postgresql · python · sqlalchemy

I am trying to bulk insert SQLAlchemy subclasses into their parent table and their respective child tables, e.g. a fruits table with an apples table: inserting an Apple should insert a row into the fruits table, get the generated id of that row, and use it for the row in the apples table.
This works when inserting one row at a time, but I need bulk insertion because of performance.

I have tried bulk insertion, which failed; single-row insertion works. The problem is that this data is not unique except for the auto-generated row id, so it would be very hard to bulk insert into the parent table first and then bulk insert into the subclass table, matching the rows up and mapping the ids by hand.
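For reference, here is a minimal sketch of the kind of joined-table inheritance mapping involved; the table names and columns are illustrative rather than my real schema, and it assumes a Flask-SQLAlchemy db object:

    from flask_sqlalchemy import SQLAlchemy

    db = SQLAlchemy()

    class Fruit(db.Model):
        __tablename__ = "fruits"
        id = db.Column(db.Integer, primary_key=True)  # auto-generated
        type = db.Column(db.String(50))                # discriminator column
        __mapper_args__ = {"polymorphic_on": type, "polymorphic_identity": "fruit"}

    class Apple(Fruit):
        __tablename__ = "apples"
        # joined-table inheritance: the child's primary key is also a
        # foreign key pointing at the parent row in fruits
        id = db.Column(db.Integer, db.ForeignKey("fruits.id"), primary_key=True)
        brand = db.Column(db.String(50))
        picked_date = db.Column(db.Date)
        color = db.Column(db.String(50))
        sub_type = db.Column(db.String(50))
        __mapper_args__ = {"polymorphic_identity": "apple"}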

    # row-by-row insert: works, but far too slow for large lists
    for data in apple_list:
        db.session.add(Apple(
            brand=data["brand"],
            picked_date=data["picked_date"],
            type=data["type"],
            color=data["color"],
            sub_type=data["sub_type"],
        ))

What I want is something more like this bulk insertion:

    db.session.bulk_insert_mappings(model_classes['Apple'], apple_list)

The actual result is that the bulk insertion errors out, because the rows are missing the primary key (which doubles as the foreign key) that links each row in the fruits table to its row in the apples table.

I expect it to insert without any errors and to populate both tables, just like the row-by-row version does.

Best Answer

I figured it out: SQLAlchemy's bulk_insert_mappings has a parameter called return_defaults.

WARNING, this is straight from the docs:

"return_defaults – when True, rows that are missing values which generate defaults, namely integer primary key defaults and sequences, will be inserted one at a time, so that the primary key value is available. In particular this will allow joined-inheritance and other multi-table mappings to insert correctly without the need to provide primary key values ahead of time; however, Session.bulk_insert_mappings.return_defaults greatly reduces the performance gains of the method overall. If the rows to be inserted only refer to a single table, then there is no reason this flag should be set as the returned default information is not used."

So then all you have to do is this:

    db.session.bulk_insert_mappings(model_classes['Apple'], apple_list, return_defaults=True)

It's still a lot faster than calling db.session.add in a loop.
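Putting it together, a minimal end-to-end sketch, assuming a joined-inheritance Apple model like the one in the question (the sample data here is made up):

    from datetime import date

    apple_list = [
        {"brand": "Acme", "picked_date": date(2019, 7, 1),
         "type": "apple", "color": "red", "sub_type": "gala"},
        {"brand": "Acme", "picked_date": date(2019, 7, 2),
         "type": "apple", "color": "green", "sub_type": "granny smith"},
    ]

    db.session.bulk_insert_mappings(Apple, apple_list, return_defaults=True)
    db.session.commit()

    # with return_defaults=True the generated primary keys are written back
    # into the dictionaries, which is what lets the subclass-table insert
    # find them, and they are available afterwards:
    print(apple_list[0]["id"])

Per the warning quoted above, if you are only inserting into a single-table model, leave return_defaults off and keep the full speed of the bulk path.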