What is Graph Data Science?
Most relational databases store data points in tables and query them with SQL. Graph databases, such as Neo4j, focus instead on the relationships between those data points. Graph Data Science is a data science environment for Neo4j that helps data scientists with one of their most time-consuming tasks: data analysis, also called Exploratory Data Analysis.
Data scientists who analyse datasets are often looking for hidden relationships in the data. Much of the predictive power in data science and machine learning lies in the relationships within one or more datasets. With Graph Data Science they can visualise these relationships in Neo4j and gain insight into the data much more easily.
Neo4j’s Lead Product Manager and Data Scientist Alicia Frame explains: “A common misconception in data science is that more data increases accuracy and reduces false positives. In reality, many data science models overlook the most predictive elements within data – the connections and structures that lie within. Neo4j for Graph Data Science was conceived for this purpose – to improve the predictive accuracy of machine learning, or answer previously unanswerable analytics questions, using the relationships inherent within existing data.”
GDS Plugin
The image above shows several software components of Neo4j. Graph Data Science is enabled by connecting the GDS plugin to the central Neo4j database. The next step is visualising the data and its relationships in Neo4j Bloom. A data scientist can use the graph visuals in Neo4j Bloom to present results to their team in an accessible way.
Neo4j for Graph Data Science is a set of tools with the following components:
- the Neo4j Graph Data Science™ (GDS) Library to support data scientists with their Exploratory Data Analysis.
- the Neo4j Database for data storage.
- the Neo4j Bloom visualisation tool to present the results of a data science project to a team of non-technical people.
The GDS Library is the successor to the Neo4j Graph Algorithms Library, and the two cannot be used in the same installation at the same time. A Migration Guide is available on the Neo4j website.
Cypher
Most relational databases use Structured Query Language (SQL) to perform operations on the data in the database. SQL is a declarative language: a query states what result is wanted rather than how to compute it. Neo4j uses Cypher as its declarative query language, and its syntax has a lot in common with SQL. The Cypher code used with the old Graph Algorithms library is not compatible with the new GDS library, so existing queries have to be rewritten when you switch, and the two plugins cannot run side by side.
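As a rough illustration of that incompatibility, the sketch below holds the same PageRank call in both styles as Python strings. The `Page` label, the `LINKS` relationship type, the `name` property and the graph name `pages` are hypothetical. The `algo.*` form follows the old Graph Algorithms library, and the `gds.*` form assumes GDS 1.x (later GDS releases renamed `gds.graph.create` to `gds.graph.project`), so check the Migration Guide for the version you run.

```python
# Hypothetical schema: (:Page)-[:LINKS]->(:Page). All names are illustrative only.

# Old Graph Algorithms library: label and relationship type are passed per call.
OLD_PAGERANK = """
CALL algo.pageRank.stream('Page', 'LINKS', {iterations: 20, dampingFactor: 0.85})
YIELD nodeId, score
RETURN algo.asNode(nodeId).name AS page, score
ORDER BY score DESC
"""

# GDS library (1.x assumed): project a named in-memory graph first, then run on it.
NEW_PROJECT_GRAPH = "CALL gds.graph.create('pages', 'Page', 'LINKS')"
NEW_PAGERANK = """
CALL gds.pageRank.stream('pages', {maxIterations: 20, dampingFactor: 0.85})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS page, score
ORDER BY score DESC
"""


def procedure_namespace(query: str) -> str:
    """Return the namespace ('algo' or 'gds') of the first CALL in a query."""
    call = query.strip().split("CALL", 1)[1].strip()
    return call.split(".", 1)[0]
```

A helper like `procedure_namespace` could be used to flag stored queries that still call the old `algo` namespace before switching plugins.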
Download
The GDS plugin can be downloaded free of charge. It requires a working Neo4j environment.
If you are new to Neo4j you can:
- install Neo4j Desktop for free on your own computer
- create a free Neo4j Sandbox online
- deploy a self-hosted solution in the cloud, on one of the three major cloud providers: AWS, Azure or Google Cloud Platform
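Once one of these environments is running with the plugin installed, a quick check is to ask the database for the GDS version. The sketch below assumes the official `neo4j` Python driver is installed; the Bolt URI, user name and password are placeholders to replace with your own settings.

```python
def gds_version(session):
    """Return the installed GDS version string using an open Neo4j session.

    `session` is any object with a `run(query)` method whose result offers
    `single()`, such as a session from the official `neo4j` Python driver.
    """
    record = session.run("RETURN gds.version() AS version").single()
    return record["version"]


def check_local_install(uri="bolt://localhost:7687", user="neo4j",
                        password="<your-password>"):
    """Connect with placeholder URI and credentials and report the GDS version."""
    from neo4j import GraphDatabase  # third-party driver: pip install neo4j

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            return gds_version(session)
    finally:
        driver.close()
```

If the plugin is missing, the `gds.version()` call should fail with an error about an unknown function, which is a hint to revisit the plugin installation.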