First steps with Gremlin to query a graph loaded in Azure Cosmos DB

In previous blog posts, I explained the use case and how to load data into an Azure Cosmos DB graph database with the bulk executor library. This post explains how to perform basic queries with Gremlin on this dataset.

You can use two tools to query a graph hosted by Azure Cosmos DB. The first tool is embedded in the Azure Portal and is named Data Explorer.

cosmos-data-explorer

The second tool is a freely available extension for Visual Studio Code named Azure Cosmos DB. This extension lets you browse and query your MongoDB databases, both locally and in the cloud, using scrapbooks. It also lets you write queries in Gremlin and display the results as a graph or as JSON documents.

cosmos-visual-code-extension

Before we go further, a little bit of theory about the Gremlin query language.

  • Each query starts with g
  • V() stands for vertices and returns one or more vertices
  • E() stands for edges and returns one or more edges
  • hasLabel("label") filters vertices or edges based on their label (type)
  • hasId("id") filters vertices or edges based on their id (which must be unique)
  • has("propertyName", "value") filters vertices based on the value of any property
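To make the semantics of these filter steps concrete, here is a minimal sketch in plain Python. It is only a toy in-memory model, not how Cosmos DB or Gremlin actually executes a traversal. The sample vertices (including the ids karateka.2 and bout.1, and the name attached to karateka.1) and the helper functions has_label, has_id and has are hypothetical and exist only for this illustration.

```python
# Toy model: each vertex is a dict with an id, a label and optional properties.
# The sample data below is made up for the illustration.
vertices = [
    {"id": "karateka.1", "label": "karateka", "fullName": "Charlier, Alice"},
    {"id": "karateka.2", "label": "karateka", "fullName": "Charlier, Clémence"},
    {"id": "bout.1", "label": "bout"},
]


def has_label(items, label):
    """Mimics hasLabel("label"): keep the items of the given type."""
    return [v for v in items if v["label"] == label]


def has_id(items, vertex_id):
    """Mimics hasId("id"): keep the item with this id (ids are unique)."""
    return [v for v in items if v["id"] == vertex_id]


def has(items, name, value):
    """Mimics has("propertyName", "value"): keep items whose property matches."""
    return [v for v in items if v.get(name) == value]


# Roughly g.V().hasLabel("karateka")
karatekas = has_label(vertices, "karateka")

# Roughly g.V().hasLabel("karateka").has("fullName", "Charlier, Alice")
alice = has(karatekas, "fullName", "Charlier, Alice")

# Roughly g.V().hasId("bout.1")
bout = has_id(vertices, "bout.1")
```

Each helper takes a list and returns a narrower list, which is the same chaining idea as in a Gremlin traversal: every step filters what the previous step produced.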

With these few elements, it’s already possible to write some interesting queries. The first two queries retrieve all the bouts and all the karatekas, respectively:

g.V().hasLabel("bout")
g.V().hasLabel("karateka")

If you want to retrieve a specific karateka and you know her id, you can use either of the two variants below. Each returns a single result.

g.V().hasId("karateka.1") 
g.V("karateka.1")

The graph view doesn’t really add any value here:

cosmos-query-id-graph

But the JSON view confirms that when a vertex is returned, each of its properties is returned as well!

cosmos-query-id-json

Most of the time, you don’t know the id of a vertex and you’ll need to search through the graph to find it. In the first example, we’re looking for a karateka named Alice, and in the second, we’re looking for two karatekas.

g.V().has("karateka", "fullName", "Charlier, Alice") 
g.V().has("karateka", "fullName", 
   within(["Charlier, Alice", "Charlier, Clémence"]))

The first query is identical in terms of result to the following query:

g.V().has("fullName", "Charlier, Alice")

But the first version performs better. By specifying that you’re looking for a karateka, you let the engine skip all the vertices that don’t have the label karateka. It’s a best practice to always specify this information when possible.
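The reasoning behind this advice can be illustrated with a small Python sketch. It assumes a naive scan over an in-memory list, which is a simplification: the real cost in Cosmos DB depends on its indexing and query engine. The sample vertices and the find helper are hypothetical.

```python
# Hypothetical sample data: two karatekas and three bouts.
vertices = [
    {"id": "karateka.1", "label": "karateka", "fullName": "Charlier, Alice"},
    {"id": "karateka.2", "label": "karateka", "fullName": "Charlier, Clémence"},
    {"id": "bout.1", "label": "bout"},
    {"id": "bout.2", "label": "bout"},
    {"id": "bout.3", "label": "bout"},
]


def find(items, name, value, label=None):
    """Naive scan; returns (matches, number of property comparisons)."""
    comparisons = 0
    matches = []
    for v in items:
        if label is not None and v["label"] != label:
            continue  # discarded on the label alone, property never inspected
        comparisons += 1
        if v.get(name) == value:
            matches.append(v)
    return matches, comparisons


# Roughly has("karateka", "fullName", ...) versus has("fullName", ...)
with_label, n_with = find(vertices, "fullName", "Charlier, Alice", label="karateka")
without_label, n_without = find(vertices, "fullName", "Charlier, Alice")

print(with_label == without_label, n_with, n_without)  # prints: True 2 5
```

Both calls return the same vertex, but the version that gives the label inspects the fullName property of two vertices instead of five.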

In the next blog posts, we’ll see how to add or remove properties on a vertex or an edge.
