The CSV file import facility in Neo4J is interesting in that it allows you to run Cypher queries iteratively over your dataset. This gives us a lot of flexibility and relieves us of the need to transform our data into a Neo4J-specific format. We can export tables as they are, foreign keys and all, and reconstruct the relationships during import. The key to doing this successfully is adding nodes and relationships on demand with the MERGE clause.
There are already some quite good resources on CSV import in Neo4J. Instead of repeating them we’ll just drop the links here:
- http://neo4j.com/docs/stable/query-load-csv.html
- http://jexp.de/blog/2014/06/load-csv-into-neo4j-quickly-and-successfully/
- http://blog.graphenedb.com/blog/2015/01/13/importing-data-into-neo4j-via-csv/
- http://jexp.de/blog/2014/06/using-load-csv-to-import-git-history-into-neo4j/
To recap the main points:
- Set up constraints and indexes before importing; without them the import queries will be very slow (a minimal sketch follows this list).
- Use USING PERIODIC COMMIT to speed up the importing process.
- Make sure your heap is big enough, especially if committing multiple queries. It is recommended that the entire dataset fits in the cache (see http://neo4j.com/docs/stable/configuration-caches.html#_file_buffer_cache).
- Values are read as strings and must be converted before insertion.
- You can provide defaults with the COALESCE function.
- Use MERGE and MATCH to add data duplicated across tables or in denormalized tables.
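To make the first and last few points concrete, here is a minimal sketch. The :USER label and the twitterid column are taken from the examples further down, while the followers column is made up purely for illustration. The constraint is created once, in its own query, before running the import:

CREATE CONSTRAINT ON (u :USER) ASSERT u.twitterid IS UNIQUE

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file://blabla.csv" AS csvLine
// toInt converts the string value, coalesce provides a default for null fields
MERGE (u :USER { twitterid: toInt(csvLine.twitterid) })
SET u.followers = toInt(coalesce(csvLine.followers, "0"))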
With these main points in mind we’ll look at some examples and tricks.
MERGE instead of CREATE UNIQUE
In general it seems better to use MERGE rather than CREATE UNIQUE. The latter clause tends to have some surprising corner cases, while MERGE on the whole behaves far more predictably. It is important to remember, though, that MERGE will create every node in a pattern if the whole pattern doesn’t match (as long as no constraints are violated). Often that is not what you want; instead you should MATCH the existing parts of the pattern and apply MERGE only to the part you intend to create.
In this example we don’t have a uniqueness constraint on the twitter_name property, so including the user nodes in the MERGE pattern would create duplicate :USER nodes. Instead we MATCH the users first and MERGE only the relationship.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file://blabla.csv" AS csvLine
MATCH (uf :USER { twitter_name: csvLine.follower_name })
MATCH (u :USER { twitter_name: csvLine.twitter_name })
MERGE (uf)-[:FOLLOWS]->(u)
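For contrast, here is roughly what the variant to avoid would look like (with the same LOAD CSV preamble as above). Without a uniqueness constraint on twitter_name, this MERGE creates a fresh pair of :USER nodes every time the complete pattern is not already in the graph:

MERGE (uf :USER { twitter_name: csvLine.follower_name })-[:FOLLOWS]->(u :USER { twitter_name: csvLine.twitter_name })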
Skipping incomplete rows
This is rather straightforward with the WHERE keyword, but it is made more complicated than it needs to be because WHERE has to be attached to another clause. The most straightforward way to introduce one is with a WITH statement, but as WITH separates parts of the query we need to parse the whole CSV line and bind its values there; accessing the csvLine object after the WITH won’t work.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file://blabla.csv" AS csvLine
WITH toInt(csvLine.twitterid) AS twitterid, toInt(csvLine.seat) AS seat, csvLine.firstname AS firstname,
     csvLine.lastname AS lastname, csvLine.party AS party, csvLine.region AS region, csvLine.type AS type
WHERE twitterid IS NOT NULL
MERGE (u :USER { seat: seat, firstname: firstname, lastname: lastname, party: party, region: region,
     type: type, twitterid: twitterid })
Accumulating relationship strength
In many cases the rows in a CSV file describe a list of relationships, where duplicate lines indicate a recurring, and thus stronger, relationship. Here we’d rather not create multiple relationships and have to count them to get the strength between two nodes, but instead accumulate the repetitions in a property on a single relationship. This is easy using the ON MATCH and ON CREATE parts of a MERGE clause.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file://blabla.csv" AS csvLine
MATCH (u :USER { twitterid: toInt(csvLine.twitterid) })
MERGE (h :HASHTAG { text: csvLine.text })
MERGE (u)-[t :TWEETED]->(h)
ON MATCH SET t.count = t.count + 1
ON CREATE SET t.count = 1
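With the count accumulated on the relationship itself, the strength can then be read straight off it, for example:

MATCH (u :USER)-[t :TWEETED]->(h :HASHTAG)
RETURN u.twitterid, h.text, t.count
ORDER BY t.count DESC
LIMIT 10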