Weird JSON structure and search for a value with Power Query/M

In a recent project, I was facing a really weird JSON structure. Each element of the JSON was more or less a dictionary with sparse keys. It means that the field names were different from an element to the other and that I was not really interested by the field name but more by the value of this field, independently of the field name.

The structure of the JSON was the following: an array, with anonymous elements having different fields.

[
	{
		"field1" : "0",
		"field2" : "Foo"
	},
	{
		"field1" : "1",
		"field3" : "2",
		"field4" : "Foo" 
	},
	{
		"field4" : "3",
		"field5" : "4",
		"field6" : "5" 
	},
	{
		"field5" : "Foo"
	}
]

From this JSON, I needed to filter out all the elements without at least one of the value of any field set to “Foo”. From the previous example, it means that the third element should have been discarded (no “Foo” value).

When facing JSON files, most of the time, we transform the list of records to a table and expand the records. Using the UI, it will generate the following code:

let
    source = Json.Document(
        File.Contents("C:\Users\cedri\Desktop\sample.json"))
    , tableFromSource = Table.FromList(
        source
        , Splitter.SplitByNothing(), null, null, ExtraValues.Error)
    , expandedFields = Table.ExpandRecordColumn(
         tableFromSource
         , "Column1"
         , {"field1", "field2", "field3", "field4"}, {"field1", "field2", "field3", "field4"})
in
    expandedFields 

The first issue that we can immediately spot, is the lack of field5 in my final table. Indeed the UI doesn’t check the fourth element and don’t propose this field during the expand operation.

I could use it by hand, but in my case the name of the fields was completely random from one record to the other and I could not enlist all of them. even if I could overcome this issue, it will still be complex to iterate over all the columns and check the available values!

We definitively need another approach to solve this. In fact, we can use the dynamic structure of the record at our advantage, no need to expand it and transform the record into a static structure.

source = Json.Document(
        File.Contents("C:\Users\cedri\Desktop\sample.json")
    )
, tableFromSource = Table.FromList(
        source
        , Splitter.SplitByNothing(), null, null, ExtraValues.Error
    )

Using the function Record.FieldValues we can extract all the values from the different fields of the record. We’ll add this of values as a new column next to the record.

, addListOfValues = Table.AddColumn(
    tableFromSource
    , "ListOfValues", each Record.FieldValues([Column1])
)

Once we’ve this list of values, it’s straightforward to check if the value “Foo” is contained in this list, we just need to use the function List.Contains.

, addContainsFoo = Table.AddColumn(
    addListOfValues
    , "isContainingFoo"
    , each List.Contains([ListOfValues], "Foo")
    , type logical
)

We just have to filter on our new column to discard all the elements without a Foo value.

The full code is available here under:

let
   source = Json.Document(File.Contents("C:\Users\cedri\Desktop\sample.json")),
   tableFromSource = Table.FromList(source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
   addListOfValues = Table.AddColumn(tableFromSource, "ListOfValues", each Record.FieldValues([Column1])),
   addContainsFoo = Table.AddColumn(addListOfValues, "isContainingFoo", each List.Contains([ListOfValues], "Foo"), type logical),
   filteredRows = Table.SelectRows(addContainsFoo, each ([isContainingFoo] = true))
in
   filteredRows

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s