I'm loading the Panama Papers into Neo4j using this tutorial, and the last step, loading the edges, has been running for 12 hours. This is the import statement:
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:/path/all_edges.csv' AS csvLine
MATCH (n1 { id: toInt(csvLine.node_1)}),(n2 { id: toInt(csvLine.node_2)})
CREATE (n1)-[:ACCOC {role: csvLine.rel_type}]->(n2)
The edges file is 89MB. Is this normal, or how can I check on the status of this process?
Best Answer
The query itself is problematic.
It's not using labels in the match pattern, which means it will have to perform an AllNodesScan to find both n1 and n2. So that's happening twice per row in your CSV file. An EXPLAIN of the query would have shown you this in the query plan.
You need to add labels to the match patterns for n1 and n2, which would at least get you to two NodeByLabelScans per row, but to make this performant you also need an index on the relevant label for the id property.
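As a rough sketch of what that could look like: the label Node below is purely hypothetical, since the question doesn't show which labels the node import created, and 12345 is just a placeholder id. The index syntax is the older form that matches the toInt / USING PERIODIC COMMIT era of the question; newer Neo4j versions use CREATE INDEX FOR (n:Node) ON (n.id) and toInteger instead. Run each statement separately.

```cypher
// Statement 1: index the lookup property.
// `Node` is a hypothetical label; substitute the label(s) your node import used.
CREATE INDEX ON :Node(id);

// Statement 2 (optional): inspect the plan without executing anything.
// 12345 is a placeholder id. Once the index is online, the plan should use
// an index seek instead of AllNodesScan or NodeByLabelScan.
EXPLAIN MATCH (n:Node { id: 12345 }) RETURN n;

// Statement 3: the same import, with labels on both MATCH patterns
// so each lookup can use the index.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:/path/all_edges.csv' AS csvLine
MATCH (n1:Node { id: toInt(csvLine.node_1) })
MATCH (n2:Node { id: toInt(csvLine.node_2) })
CREATE (n1)-[:ACCOC { role: csvLine.rel_type }]->(n2);
```

If your nodes were imported under several different labels, each label would need its own index, and the import would have to match on the right label for each end of the relationship.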