Basic traversing of a graph with the Gremlin language

In previous blog posts, I explained the use case, how to load data into an Azure Cosmos DB graph database with the bulk executor library, how to write your first queries with the Gremlin language, and how to manage vertices’ properties. This blog post explains how to jump from a vertex to an adjacent vertex using the existing edges.

You’ll have to learn a few additional operations to traverse a graph. The first pair is inE and outE. These two operations select all the edges that respectively end at and start from the selected vertex. The example below shows the result (in green) of the outE operation for a given vertex.

The following query returns all the edges labelled participates that start from the vertex representing a given karateka.

g.V().has("karateka", "fullName", "Charlier, Alice").outE("participates")

The result of this query is a list of edges. From this result, we can see that the starting vertex (outV) is always the given karateka and that each edge links to a bout (inV).

[
{
"id": "4a33a619-593a-469e-88ff-983e951149ed",
"label": "participates",
"type": "edge",
"inVLabel": "bout",
"outVLabel": "karateka",
"inV": "bout.120",
"outV": "karateka.70",
"properties": {
"color": "aka"
}
},
{
"id": "517e4b17-95ea-4ad0-a74c-45c083587a46",
"label": "participates",
"type": "edge",
"inVLabel": "bout",
"outVLabel": "karateka",
"inV": "bout.116",
"outV": "karateka.70",
"properties": {
"color": "aka"
}
},
… (many more objects)
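To make the semantics of outE and inE concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the Gremlin language or the Cosmos DB client: edges are plain dictionaries that reuse the outV/inV fields of the JSON above, and all ids are hypothetical.

```python
from typing import Optional

# Hypothetical edges, shaped like the JSON returned above:
# "outV" is the vertex the edge starts from, "inV" the vertex it ends at.
EDGES = [
    {"id": "e1", "label": "participates", "outV": "karateka.1", "inV": "bout.1"},
    {"id": "e2", "label": "participates", "outV": "karateka.1", "inV": "bout.2"},
    {"id": "e3", "label": "participates", "outV": "karateka.2", "inV": "bout.1"},
    {"id": "e4", "label": "wins", "outV": "karateka.2", "inV": "bout.1"},
]


def out_e(vertex_id: str, label: Optional[str] = None) -> list:
    """Toy outE(label): the edges starting from vertex_id, optionally filtered by label."""
    return [e for e in EDGES
            if e["outV"] == vertex_id and (label is None or e["label"] == label)]


def in_e(vertex_id: str, label: Optional[str] = None) -> list:
    """Toy inE(label): the edges ending at vertex_id, optionally filtered by label."""
    return [e for e in EDGES
            if e["inV"] == vertex_id and (label is None or e["label"] == label)]


# Like g.V("karateka.1").outE("participates"): every edge starts at karateka.1.
print([e["id"] for e in out_e("karateka.1", "participates")])  # ['e1', 'e2']
# Like g.V("bout.1").inE(): all edges ending at bout.1, whatever their label.
print([e["id"] for e in in_e("bout.1")])  # ['e1', 'e3', 'e4']
```

The label filter is optional in the toy functions, mirroring how outE() and inE() can be called with or without a label.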

The same kind of operations exists for selecting the vertices at the start or the end of a selected edge: outV selects the vertex where the edge starts, and inV selects the vertex where it ends.

The following query starts from a karateka, jumps to the edges labelled participates, and then jumps to all the vertices at the end of these edges.

g.V()
.has("karateka", "fullName", "Charlier, Alice")
.outE("participates")
.inV()

The result of this query is a list of bouts (based on our previously defined graph schema).

[
{
"id": "bout.120",
"label": "bout",
"type": "vertex",
"properties": {
"pk": [
{
"id": "bout.120|pk",
"value": 1
}
]
}
},
{
"id": "bout.116",
"label": "bout",
"type": "vertex",
"properties": {
"pk": [
{
"id": "bout.116|pk",
"value": 1
}
]
}
},
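The chain outE().inV() can be sketched with the same kind of toy Python model (again not the Gremlin API; the ids are hypothetical). Each step maps the current list of elements to a new list: vertices to edges, then edges to the vertex at one of their ends.

```python
# Hypothetical edges, using the same outV/inV convention as the JSON above.
EDGES = [
    {"label": "participates", "outV": "karateka.1", "inV": "bout.1"},
    {"label": "participates", "outV": "karateka.1", "inV": "bout.2"},
    {"label": "participates", "outV": "karateka.2", "inV": "bout.1"},
]


def out_e(vertex_ids, label):
    """Toy outE(label): for each current vertex, the edges that start there."""
    return [e for v in vertex_ids for e in EDGES
            if e["outV"] == v and e["label"] == label]


def in_v(edges):
    """Toy inV(): for each edge, the vertex at its end (its "inV" field)."""
    return [e["inV"] for e in edges]


def out_v(edges):
    """Toy outV(): for each edge, the vertex at its start (its "outV" field)."""
    return [e["outV"] for e in edges]


# Like g.V("karateka.1").outE("participates").inV(): the bouts of karateka.1.
print(in_v(out_e(["karateka.1"], "participates")))  # ['bout.1', 'bout.2']
# outV() on the same edges leads back to the starting karateka, once per edge.
print(out_v(out_e(["karateka.1"], "participates")))  # ['karateka.1', 'karateka.1']
```

Note that outV() on these edges does not deduplicate: karateka.1 comes back once per edge.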

Most of the time, you don’t really want to select the edges: they are just a means to go from one vertex to an adjacent vertex. For convenience, Gremlin supports two operations, out and in. The operation out is equivalent to outE followed by inV, and in is equivalent to inE followed by outV.

The following drawing shows that, starting from the black vertex and using an in operation, you directly select the three adjacent vertices.
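The equivalence between out and outE().inV() (and between in and inE().outV()) can be checked on a toy in-memory graph. The following Python sketch is not the Gremlin API: the ids are hypothetical, and the shortcuts are implemented with adjacency indexes that jump straight from vertex to vertex, then compared against the edge-based chains.

```python
from collections import defaultdict

# Hypothetical "participates" edges, each pointing from a karateka to a bout.
EDGES = [
    {"label": "participates", "outV": "karateka.1", "inV": "bout.1"},
    {"label": "participates", "outV": "karateka.2", "inV": "bout.1"},
    {"label": "participates", "outV": "karateka.1", "inV": "bout.2"},
]

# Adjacency indexes: (vertex, label) -> neighbouring vertices, in edge order.
OUT_ADJ = defaultdict(list)  # neighbours reached through outgoing edges
IN_ADJ = defaultdict(list)   # neighbours reached through incoming edges
for e in EDGES:
    OUT_ADJ[(e["outV"], e["label"])].append(e["inV"])
    IN_ADJ[(e["inV"], e["label"])].append(e["outV"])


# Edge-based steps, as in the previous queries.
def out_e(vs, label):
    return [e for v in vs for e in EDGES if e["outV"] == v and e["label"] == label]


def in_e(vs, label):
    return [e for v in vs for e in EDGES if e["inV"] == v and e["label"] == label]


def in_v(edges):
    return [e["inV"] for e in edges]


def out_v(edges):
    return [e["outV"] for e in edges]


# Vertex-to-vertex shortcuts: the edges are never returned to the caller.
def out(vs, label):
    """Toy out(label): same vertices as outE(label).inV()."""
    return [w for v in vs for w in OUT_ADJ[(v, label)]]


def in_(vs, label):
    """Toy in(label): same vertices as inE(label).outV(). (`in` is a Python keyword.)"""
    return [w for v in vs for w in IN_ADJ[(v, label)]]


print(out(["karateka.1"], "participates"))  # ['bout.1', 'bout.2']
print(in_(["bout.1"], "participates"))      # ['karateka.1', 'karateka.2']
```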

A good use of graph traversal is to find the names of all the karatekas who have defeated a given karateka. To write this query, we first select the initial karateka, then follow the edges labelled loses to reach all the bouts where this karateka has been defeated. From this list of bouts, we just need to follow the edges labelled wins back to their starting vertex to find the winners.

The following query applies this pattern:

g.V()
.has("karateka", "fullName", "Charlier, Alice")
.out("loses")
.in("wins")

The result of this query is a list of karatekas. If we just want to display their names, we can add a values operation.

g.V()
.has("karateka", "fullName", "Charlier, Alice")
.out("loses")
.in("wins")
.values("fullName")

The result will be a list of strings (results have been anonymized for GDPR reasons):

[
"B****, Rania",
"H****, Maëlie",
"P****, Denisa"
]
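The "who defeated this karateka?" pattern, out("loses") followed by in("wins") and then values("fullName"), can be sketched on a toy in-memory graph in Python. This is not the Gremlin API; the vertices, names and edges are hypothetical, with both loses and wins edges pointing from a karateka to a bout, as the queries above imply.

```python
# Hypothetical vertices with their properties (bouts carry none here).
VERTICES = {
    "karateka.1": {"fullName": "Karateka A"},
    "karateka.2": {"fullName": "Karateka B"},
    "karateka.3": {"fullName": "Karateka C"},
    "bout.1": {}, "bout.2": {}, "bout.3": {},
}
# Hypothetical edges as (start vertex, label, end vertex), karateka -> bout.
EDGES = [
    ("karateka.1", "loses", "bout.1"), ("karateka.2", "wins", "bout.1"),
    ("karateka.1", "loses", "bout.2"), ("karateka.3", "wins", "bout.2"),
    ("karateka.1", "wins", "bout.3"), ("karateka.2", "loses", "bout.3"),
]


def out(vs, label):
    """Toy out(label): follow outgoing edges with this label."""
    return [dst for v in vs for (src, lbl, dst) in EDGES if src == v and lbl == label]


def in_(vs, label):
    """Toy in(label): follow incoming edges with this label back to their start."""
    return [src for v in vs for (src, lbl, dst) in EDGES if dst == v and lbl == label]


def values(vs, key):
    """Toy values(key): the property value of each vertex that has it."""
    return [VERTICES[v][key] for v in vs if key in VERTICES[v]]


# Bouts lost by karateka.1, then the winners of those bouts.
winners = in_(out(["karateka.1"], "loses"), "wins")
print(winners)                      # ['karateka.2', 'karateka.3']
print(values(winners, "fullName"))  # ['Karateka B', 'Karateka C']
```

The bout won by karateka.1 (bout.3) never appears, because the traversal only follows the loses edges of the starting karateka.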

We can go a bit further and check whether these karatekas have themselves been defeated, and by whom. To achieve this, I apply the exact same pattern again, following the edges loses and wins from each of these karatekas.

g.V()
.has("karateka", "fullName", "Charlier, Alice")
.out("loses")
.in("wins")
.out("loses")
.in("wins")
.values("fullName")

The result is also a list of strings:

[
"D****, Louise",
"F****, Julia",
"F****, Julia",
"S****, Kiara",
"K****, Lee-lou",
"Z****, Manon",
"E****, Lena",
"A****, Ezdina",
"B****, Margaux",
"B****, Zineb",
"B****, Rania",
"D****, Louise"
]
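Duplicates appear because each intermediate result follows every matching edge independently, so a vertex reachable through two different paths is returned twice. The following toy Python sketch (not the Gremlin API; all ids are hypothetical) reproduces this: k4 beat both of k1's conquerors, and k5 beat one of them twice.

```python
# Hypothetical edges as (start vertex, label, end vertex), karateka -> bout.
# k1 lost to k2 and k3; k4 beat k2 and k3; k5 beat k2 in two different bouts.
EDGES = [
    ("k1", "loses", "b1"), ("k2", "wins", "b1"),
    ("k1", "loses", "b2"), ("k3", "wins", "b2"),
    ("k2", "loses", "b3"), ("k4", "wins", "b3"),
    ("k3", "loses", "b4"), ("k4", "wins", "b4"),
    ("k2", "loses", "b5"), ("k5", "wins", "b5"),
    ("k2", "loses", "b6"), ("k5", "wins", "b6"),
]


def out(vs, label):
    """Toy out(label): one result per matching outgoing edge, no merging."""
    return [dst for v in vs for (src, lbl, dst) in EDGES if src == v and lbl == label]


def in_(vs, label):
    """Toy in(label): one result per matching incoming edge, no merging."""
    return [src for v in vs for (src, lbl, dst) in EDGES if dst == v and lbl == label]


level1 = in_(out(["k1"], "loses"), "wins")  # who defeated k1
level2 = in_(out(level1, "loses"), "wins")  # who defeated them
print(level1)  # ['k2', 'k3']
print(level2)  # ['k4', 'k5', 'k5', 'k4']
```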

As you can see, the first and last names are identical, and so are the second and third. The reason is that each of these two karatekas has defeated one of the three karatekas listed above twice (or two of them once each). It’s really important to understand that Gremlin doesn’t automatically deduplicate the vertices. If you want to do so, just use the function dedup:

g.V()
.has("karateka", "fullName", "Charlier, Alice")
.out("loses")
.in("wins")
.out("loses")
.in("wins")
.dedup()
.values("fullName")

Note that I applied the function dedup to the vertices and not to the property fullName. This avoids treating as duplicates two different karatekas who are merely homonyms.
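The difference between deduplicating vertices and deduplicating names can be shown with a toy Python model (not the Gremlin API). Here two hypothetical vertices, k4 and k6, share the same fullName, and the toy dedup keeps the first occurrence of each item.

```python
# Hypothetical vertices: k4 and k6 are two *different* karatekas with the same name.
FULL_NAME = {"k4": "Doe, Jane", "k5": "Roe, Ann", "k6": "Doe, Jane"}


def dedup(items):
    """Toy dedup(): keep the first occurrence of each item, in traversal order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def values(vs):
    """Toy values("fullName"): the name of each vertex."""
    return [FULL_NAME[v] for v in vs]


traversed = ["k4", "k5", "k5", "k6", "k4"]  # vertices reached by some traversal

# dedup() then values(): one name per distinct karateka, homonyms kept.
print(values(dedup(traversed)))  # ['Doe, Jane', 'Roe, Ann', 'Doe, Jane']
# values() then dedup(): the two homonyms collapse into a single name.
print(dedup(values(traversed)))  # ['Doe, Jane', 'Roe, Ann']
```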

The next blog post will be about step modulators… and how to return a result where the selected vertices are not the last vertices traversed by the query!
