Since you are using mongoimport
to do these upserts without --upsertFields
, it will be using the _id index to do the upsert (see the note in the docs). That means that it will be scanning that index, and if you are including the _id
field in the json dump that should be fine in terms of that particular search as long as that index is in memory. If you are not including the _id
, then that will mean a full scan.
You can alter this behavior by using the aforementioned upsertFields
option, though you will want to make sure those fields you pick are indexed (a compound index of the fields used rather than _id).
The overhead from the other indexes is actually probably just the overhead that occurs when you are updating several indexes as part of a data load.
Finally, if you are doing this repeatedly and intend to keep going, I would recommend creating your own tool to do this and having more control over the entire process. While mongoimport
is a fine choice for simple tasks, more complex imports are better off handled by a more complex and customizable tool.
Your problem is that when you add a new column to the_person
, its child, the_person_two
will have this field appended at the end of columns list (4th position), so after has_default
column. See:
db=> \d temp_person
Column | Type | Modifiers
-----------+-------------------+-----------------------------------------------------------------
person_id | integer | not null default nextval('temp_person_person_id_seq'::regclass)
name | character varying |
foo | text |
db=> \d temp_person_two
Column | Type | Modifiers
-------------+----------------------+-----------------------------------------------------------------
person_id | integer | not null default nextval('temp_person_person_id_seq'::regclass)
name | character varying |
has_default | character varying(4) | not null default 'en'::character varying
foo | text |
So, when you execute this:
INSERT INTO temp_person_two VALUES ( NEW.* );
PostgreSQL will actually understand that you want to insert on the first three columns of temp_person_two
(as NEW.*
will expand to three values), generating something similar to this:
INSERT INTO temp_person_two(person_id,name,has_default)
VALUES ( NEW.person_id, NEW.name, NEW.foo );
So, temp_person_two.has_default
will get the value of NEW.foo
, which is NULL
in your case.
The solution is to simply expand the column names:
INSERT INTO temp_person_two(person_id,name,foo)
VALUES ( NEW.person_id, NEW.name, NEW.foo );
or, you could also use this:
INSERT INTO temp_person_two(person_id,name,foo)
VALUES ( NEW.* );
But this is weak, as any changes on column positions may break your statements, so I'd recommend the first one.
EDIT:
So the conclusion and the lesson learned here is:
Always explicitly type the names of the columns and the values when issuing an INSERT command, in fact, when issuing any SQL command at all... =D
This will save you a lot of time solving problems like that in future.
Best Answer
Prepare the values to insert with a recursive CTE that pre-calculates the
id
s and defines an order. Then insert it in that order:The beauty of the solution is that this uses the sequence behind
a.id
(a_id_seq
in the example), so the sequence automatically has the correct value after we are done.This solution assumes that the data in
a_source
are correct, i.e., do not contain cycles.