Azure Cosmos DB is Microsoft’s globally distributed, horizontally partitioned, multi-model database service. The terms “Multi-model database service” mean that you can seamlessly access the content of your database from different APIs. One of these API is named SQL and is a kind of mix between JSON and SQL. If you’re used to ANSI-SQL, it’s probably the easiest API to start with.
This SQL API supports the keyword COUNT but it’s not necessary the keyword that will be the more useful when you want to know how many elements are available in an array.
Before going further, to illustrate this article, we’ll need to setup a database and a collection (named families), you can achieve this with the UI of the portal. Then load a few items in your collection. To achieve this task, click on “New document” on the tab documents of your collection, paste the json corresponding to the Andersen family from this page, click on “Save”. Then click again on “New document” and paste the content of the json related to the Wakefield family from the same page. Don’t forget to click on “Save”. Your two families are now part of the database’s content and we’ll be able to query them.
The first example is about how counting the children by family. In ANSI-SQL, the approach would probably be to select the familyId then perform a count distinct child grouping by familyId. With the SQL API of CosmosDB, you’ll need to take another approach and use the keyword
ARRAY_LENGTH. This keyword returns the number of elements of the specified array expression. In our case the array expression is just the array containing the children of a family. Our select will become:
SELECT f.id AS FamilyId, ARRAY_LENGTH(f.children) AS ChildrenCount FROM families f
The result is effectively a list of
We can do something a bit more complex and check how many pets have each child. To succeed we’ll to start from the
children level and then take the length of the sub-array
pets. This can be done, using the keywords
IN to force an iteration on existing arrays. In this case we set the set the granularity of the query to be a child and for each of them we’re retrieving her familyId, givenName and count the pets.
SELECT f.id as FamilyId, c.givenName, ARRAY_LENGTH(c.pets) as PetsCount FROM families f JOIN c IN f.children
If you want to improve the rendering of the result, you can use the coalesce operator
??. If a value is not returned by the query then the result can be replaced by another value. In this case, if the givenName is not returned we’ll use the firstName and if the array of pets is not existing then we’ll set the count of pets to zero.
SELECT f.id as FamilyId, c.givenName ?? c.firstName, ARRAY_LENGTH(c.pets) ?? 0 as PetsCount FROM families f JOIN c IN f.children
Last question to answer, how many pets have we got in our database? To answer this kind of question, you’ll need to perform an aggregation. The idea will be to know how many pets do each child have and then sum the result that we obtained for each child. Translated in SQL:
SELECT SUM(ARRAY_LENGTH(c.pets)) as PetsCount FROM families f JOIN c IN f.children
Another way to achieve the same result is to change the granularity of the query and start from the pets. In this case, we’ll need to count distinct pets and we can finally use the keyword
SELECT COUNT(p) as PetsCount FROM families f JOIN c IN f.children JOIN p IN c.pets