Neo4j taking 12 hours to load

neo4j

I'm loading the Panama Papers into Neo4j using this tutorial, and the last step, loading the edges, has been running for 12 hours. This is the import statement:

USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM 'file:/path/all_edges.csv' AS csvLine
MATCH (n1 { id: toInt(csvLine.node_1)}),(n2 { id: toInt(csvLine.node_2)})
CREATE (n1)-[:ACCOC {role: csvLine.rel_type}]->(n2)

The edges file is 89MB. Is this normal, or how can I check on the status of this process?

Best Answer

The query itself is problematic.

It's not using labels in the match pattern, which means it has to perform an AllNodesScan to find n1 and another to find n2. So that's two full scans of the graph for every row in your CSV file. An EXPLAIN of the query would have shown you this in the query plan.
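To see this for yourself, you can prefix a query with EXPLAIN, which returns the plan without executing anything. The following is only a sketch: it uses the same unlabeled pattern as the import, with literal ids (1 and 2) and a placeholder role standing in for the CSV values.

```cypher
// EXPLAIN only plans the query; nothing is created.
// The ids 1 and 2 and the role 'example' are placeholders, not real data.
EXPLAIN
MATCH (n1 { id: 1 }), (n2 { id: 2 })
CREATE (n1)-[:ACCOC { role: 'example' }]->(n2)
```

Look for AllNodesScan operators in the resulting plan; each one means every node in the graph is being checked.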

You need to add labels to the match patterns for n1 and n2. That alone would at least get you down to two NodeByLabelScans per row, but to make this performant you also need an index on the id property for the relevant label, so that each lookup becomes an index seek instead of a scan.
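As a rough sketch of that fix: I don't know which label your nodes were created with, so :Node below is a hypothetical stand-in; substitute whatever label the tutorial's node import actually used. The index syntax shown is the older form that matches the toInt / USING PERIODIC COMMIT style of your query.

```cypher
// Assumption: the nodes carry a label, here called :Node (hypothetical name).
// Run this first, as its own statement, and let the index come online.
// Newer Neo4j versions use: CREATE INDEX FOR (n:Node) ON (n.id)
CREATE INDEX ON :Node(id);

// Then run the import with labeled match patterns so the index can be used.
// (toInt is kept from the original query; newer versions call it toInteger.)
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:/path/all_edges.csv' AS csvLine
MATCH (n1:Node { id: toInt(csvLine.node_1) })
MATCH (n2:Node { id: toInt(csvLine.node_2) })
CREATE (n1)-[:ACCOC { role: csvLine.rel_type }]->(n2);
```

The index has to be created in a separate statement because schema changes and data writes can't share a transaction. If your nodes have several different labels, you'd need an index per label and a match pattern that names the right one for each side.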