Counting items with the SQL API on Azure Cosmos DB

Azure Cosmos DB is Microsoft’s globally distributed, horizontally partitioned, multi-model database service. The terms “Multi-model database service” mean that you can seamlessly access the content of your database from different APIs. One of these API is named SQL and is a kind of mix between JSON and SQL. If you’re used to ANSI-SQL, it’s probably the easiest API to start with.

This SQL API supports the keyword COUNT but it’s not necessary the keyword that will be the more useful when you want to know how many elements are available in an array.

Before going further, to illustrate this article, we’ll need to setup a database and a collection (named families), you can achieve this with the UI of the portal. Then load a few items in your collection. To achieve this task, click on “New document” on the tab documents of your collection, paste the json corresponding to the Andersen family from this page, click on “Save”. Then click again on “New document” and paste the content of the json related to the Wakefield family from the same page. Don’t forget to click on “Save”. Your two families are now part of the database’s content and we’ll be able to query them.

The first example is about how counting the children by family. In ANSI-SQL, the approach would probably be to select the familyId then perform a count distinct child grouping by familyId. With the SQL API of CosmosDB, you’ll need to take another approach and use the keyword ARRAY_LENGTH. This keyword returns the number of elements of the specified array expression. In our case the array expression is just the array containing the children of a family. Our select will become:

SELECT
f.id AS FamilyId,
ARRAY_LENGTH(f.children) AS ChildrenCount
FROM
families f

The result is effectively a list of FamilyId and ChidrenCountcosmosdb-count-001

We can do something a bit more complex and check how many pets have each child. To succeed we’ll to start from the children level and then take the length of the sub-array pets. This can be done, using the keywords JOIN and IN to force an iteration on existing arrays. In this case we set the set the granularity of the query to be a child and for each of them we’re retrieving her familyId, givenName and count the pets.

SELECT
f.id as FamilyId,
c.givenName,
ARRAY_LENGTH(c.pets) as PetsCount
FROM
families f
JOIN c IN f.children

If you want to improve the rendering of the result, you can use the coalesce operator ??. If a value is not returned by the query then the result can be replaced by another value. In this case, if the givenName is not returned we’ll use the firstName and if the array of pets is not existing then we’ll set the count of pets to zero.

SELECT
f.id as FamilyId,
c.givenName ?? c.firstName,
ARRAY_LENGTH(c.pets) ?? 0 as PetsCount
FROM
families f
JOIN c IN f.children

cosmosdb-count-002

Last question to answer, how many pets have we got in our database? To answer this kind of question, you’ll need to perform an aggregation. The idea will be to know how many pets do each child have and then sum the result that we obtained for each child. Translated in SQL:

SELECT
SUM(ARRAY_LENGTH(c.pets)) as PetsCount
FROM
families f
JOIN c IN f.children

cosmosdb-count-003

Another way to achieve the same result is to change the granularity of the query and start from the pets. In this case, we’ll need to count distinct pets and we can finally use the keyword COUNT.

SELECT
COUNT(p) as PetsCount
FROM
families f
JOIN c IN f.children
JOIN p IN c.pets

cosmosdb-count-003

Advertisements

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s