Getting Started

Installation

Upload the jar and connection details

  1. Download the most recent version of the connector from the AWS Glue connector repository on GitHub.

  2. Upload the connector file, neo4j-aws-glue-<version>.jar, to an Amazon S3 bucket.

  3. Store the connection details for Neo4j or AuraDB in AWS Secrets Manager as key/value pairs with the keys user and password.

  4. Copy the Secret ARN. It is required later when the connection references the secret.

Note that this guide uses the key names user and password instead of the names from the environment file (neo4j_user and neo4j_password).
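The key/value layout of the secret can be sketched as a small JSON payload. This is a minimal illustration, not part of the connector: the helper name build_secret_string and the placeholder credentials are hypothetical, and only the key names user and password come from this guide.

```python
import json


def build_secret_string(user: str, password: str) -> str:
    """Return a JSON payload with the keys `user` and `password`.

    Hypothetical helper for illustration only. The key names follow this
    guide (user/password), not the environment file names
    (neo4j_user/neo4j_password).
    """
    if not user or not password:
        raise ValueError("both user and password are required")
    return json.dumps({"user": user, "password": password})


if __name__ == "__main__":
    # Placeholder credentials; replace with the values for your instance.
    print(build_secret_string("neo4j", "example-password"))
```

A string of this shape is what a key/value secret holds. If you create the secret programmatically rather than in the console, it could be supplied as the secret string, for example through the SecretString argument of the boto3 Secrets Manager create_secret call.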

Create a custom connector

In the AWS Glue console:

  1. Select Create custom connector.

  2. Give the connector a name, for example Neo4j-Connector-for-AWS-Glue.

  3. For Connector S3 URL, provide the link to the connector jar in the S3 bucket (uploaded in the previous section).

  4. Provide the class name org.neo4j.jdbc.Neo4jDriver.

  5. Provide the JDBC base URL jdbc:neo4j+s://<aura>?enableSQLTranslation=true. The enableSQLTranslation=true parameter enables the JDBC driver’s built-in translator, which translates SQL commands to Cypher®.

  6. Set the URL parameter delimiter to &.
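How the base URL and the & delimiter fit together can be sketched as follows. This is an illustrative helper under stated assumptions: build_jdbc_url and the host name in the usage example are hypothetical, and the only URL parameter taken from this guide is enableSQLTranslation.

```python
def build_jdbc_url(host: str, params: dict[str, str], delimiter: str = "&") -> str:
    """Assemble a JDBC base URL of the form used in this guide.

    Hypothetical helper: `host` stands in for the <aura> placeholder, and
    `delimiter` mirrors the URL parameter delimiter configured on the
    custom connector.
    """
    base = f"jdbc:neo4j+s://{host}"
    if not params:
        return base
    # The first parameter follows "?"; further ones are joined by the delimiter.
    query = delimiter.join(f"{key}={value}" for key, value in params.items())
    return f"{base}?{query}"


if __name__ == "__main__":
    # Placeholder host name; replace with the address of your instance.
    print(build_jdbc_url("example.databases.neo4j.io",
                         {"enableSQLTranslation": "true"}))
```

Because the base URL already contains a ? followed by a parameter, any further parameters appended to it have to be separated with &, which is why the delimiter is set explicitly.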

Create a connection

A connection can only be created after the custom connector from the previous step exists. Use the exact key names user and password from the secret stored in AWS Secrets Manager, because the connection reads its credentials from those properties.

  1. Provide a connection name, for example aws-glue-connection-to-neo4j-auradb.

  2. For connection credential type, select <default>.

  3. For AWS Secret, select Secret from this account.

  4. Provide the name of the secret that stores the credentials created earlier.

  5. Expand Additional Options and provide the following additional parameters as key/property pairs.

     Key         Property

     user        ${user}
     password    ${password}
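The ${user} and ${password} properties are placeholders that refer to the keys of the secret. The substitution itself is performed by AWS Glue; the sketch below only illustrates the mapping, using Python's string.Template, which happens to share the ${name} syntax. The function name resolve_parameters and the sample values are hypothetical.

```python
from string import Template


def resolve_parameters(parameters: dict[str, str], secret: dict[str, str]) -> dict[str, str]:
    """Replace ${key} placeholders with the matching values from a secret.

    Illustration only: this mimics the idea of placeholder resolution and is
    not how AWS Glue is implemented. A placeholder without a matching secret
    key raises KeyError, which is the failure you would expect if the secret
    used other key names such as neo4j_user.
    """
    return {key: Template(value).substitute(secret)
            for key, value in parameters.items()}


if __name__ == "__main__":
    additional_options = {"user": "${user}", "password": "${password}"}
    # Placeholder secret content with the key names used in this guide.
    secret = {"user": "neo4j", "password": "example-password"}
    print(resolve_parameters(additional_options, secret))
```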

Quickstart

The following Quickstarts demonstrate how to use the AWS Glue Connector with the Visual ETL Editor to import a virtual data set and to export a subset of the data to Parquet files on S3. The Quickstarts assume that you have followed the Getting Started guide to install the connector and configure a connection to the database. If your connection has a different name, change the connection name in the steps accordingly.

Create the blueprint Schema in AuraDB/Neo4j

Use Query or Browser to create a blueprint schema that describes the nodes, relationships and properties to be imported.

CREATE (m:Movie{title:'The Matrix', released:1999})
CREATE (p:Person{name:'Keanu Reeves', born:'1964'})
CREATE (p)-[:ACTED_IN{roles: ['Neo']}]->(m)

Importing nodes into AuraDB/Neo4j

  1. In the AWS Glue console, select Visual ETL and give the job a name (for example "Import data from Parquet on S3 into AuraDB").

  2. Select Amazon S3 from the list of sources, then select S3 source type, S3 location and Browse to S3 to locate the files to import.

  3. Select the Parquet files containing the nodes for import first.

  4. Select Infer schema.

  5. From the list of targets select Neo4j-Connector-for-AWS-Glue.

  6. For the target properties, provide the connection aws-glue-connection-to-neo4j-auradb.

  7. Set the table name to the label of the nodes to be imported, for example Movie.

  8. The schema is displayed. If you want to drop any columns, add a Transform - DropFields step (or another suitable transformation). If the data comes from another Neo4j database, drop the v$id column.

  9. Add a flow for each set of import files and its table name. For nodes, these flows may be added in parallel.

  10. Start the import job.
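The effect of the DropFields step in the steps above can be sketched on a plain list of column names. This is an illustration only: in the Visual ETL Editor the transform is configured in the console, and the helper drop_fields and the sample column names are hypothetical.

```python
def drop_fields(columns: list[str], to_drop: tuple[str, ...] = ("v$id",)) -> list[str]:
    """Return the columns that remain after removing the unwanted ones.

    Hypothetical helper mirroring what a DropFields step does to a schema.
    The default drops v$id, the column this guide recommends removing when
    the data originates from another Neo4j database.
    """
    unwanted = set(to_drop)
    # Preserve the original column order for the remaining fields.
    return [column for column in columns if column not in unwanted]


if __name__ == "__main__":
    # Sample schema for Movie nodes exported from another Neo4j database.
    print(drop_fields(["v$id", "title", "released"]))
```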

Importing relationships into AuraDB/Neo4j

Once the nodes have been imported, you can import the relationships. You can either add these as additional steps that run in series after the node import steps, or create a new import job.

  1. In the AWS Glue console, select Visual ETL and give a name to the job (for example "Import relationship data from Parquet on S3 into AuraDB").

  2. Select Amazon S3 from the list of sources, then select S3 source type, S3 location and Browse to S3 to locate the files to import.

  3. Select the Parquet files containing the nodes and relationships to import.

  4. Use a transform such as SQL Query to create the required output pattern, which in this example is `(v$person_id, roles, v$movie_id)`.

  5. From the list of targets select Neo4j-Connector-for-AWS-Glue.

  6. For the target properties, provide the connection aws-glue-connection-to-neo4j-auradb.

  7. Set the table name to the (virtual) relationship table, in this example Person_ACTED_IN_Movie.

  8. Start the import job.
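The naming used in the steps above can be sketched as two small helpers: one for the virtual relationship table name and one for a SQL statement that produces the `(v$person_id, roles, v$movie_id)` output pattern. Several things here are assumptions rather than facts from this guide: the input column names person_id and movie_id, the source alias myDataSource, and the generalization of the table and column naming from the single Person_ACTED_IN_Movie example. Check them against your own data and job before relying on them.

```python
def relationship_table_name(start_label: str, rel_type: str, end_label: str) -> str:
    """Build a virtual relationship table name such as Person_ACTED_IN_Movie.

    Assumption: the pattern <start label>_<type>_<end label> is inferred
    from the single example in this guide.
    """
    return f"{start_label}_{rel_type}_{end_label}"


def quote(identifier: str) -> str:
    """Quote an identifier with backticks so that names containing $ are valid."""
    return "`" + identifier.replace("`", "``") + "`"


def build_relationship_query(start_col: str, end_col: str, properties: list[str],
                             start_label: str, end_label: str,
                             source_alias: str = "myDataSource") -> str:
    """Build a SELECT that renames the id columns to v$<label>_id.

    Hypothetical sketch: start_col, end_col and source_alias describe your
    input data and job, not anything defined by the connector.
    """
    start_id = "v$" + start_label.lower() + "_id"
    end_id = "v$" + end_label.lower() + "_id"
    select_items = [f"{quote(start_col)} AS {quote(start_id)}"]
    select_items += [quote(prop) for prop in properties]
    select_items.append(f"{quote(end_col)} AS {quote(end_id)}")
    return f"SELECT {', '.join(select_items)} FROM {source_alias}"


if __name__ == "__main__":
    print(relationship_table_name("Person", "ACTED_IN", "Movie"))
    print(build_relationship_query("person_id", "movie_id", ["roles"],
                                   "Person", "Movie"))
```

Keeping the id columns first and last, with the relationship properties in between, matches the order of the output pattern shown in step 4.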

Exporting from AuraDB/Neo4j

  1. In the AWS Glue console, select Visual ETL and give the job a name (for example "Export data from AuraDB to Parquet on S3").

  2. Select Neo4j-Connector-for-AWS-Glue from the list of Sources.

  3. For the source properties, provide the connection aws-glue-connection-to-neo4j-auradb.

  4. Set the Table name to the label of the nodes to be exported, for example Movie.

  5. The Data preview should populate with a sample of the data.

  6. From the list of targets, select Amazon S3, then set the S3 location by browsing to the location where the Parquet files will be stored.

  7. The schema is displayed. If you want to drop any columns, add a Transform - DropFields step (or another suitable transformation). If the data is destined for another Neo4j database, drop the v$id column.

  8. Add a flow for each table name to export and its target location. For nodes, these flows may be added in parallel.

  9. Start the export job.