Hi,
I have a connection to a data source that is regularly updated with user-entered data. One of the columns it brings in contains data in the following format:
{"Label1":"Value1","Label2":"Value2","Label3":"Value3"}
I need to transform this to the following:
Row | Label1 | Label2 | Label3 |
1 | Value1 | Value2 | Value3 |
The issue with this column is that, while the format structure is the same, the labels and values are variable.
So the next row could contain the following:
{"Label3":"Value3","Label1":"Value1","Label4":"Value4"}
So the data set should now look like:
Row | Label1 | Label2 | Label3 | Label4 |
1 | Value1 | Value2 | Value3 | |
2 | Value1 | | Value3 | Value4 |
Power BI's JSON parser was working fantastically for this task until it encountered duplicate keys within the field (a user-entered error, but there's no validation control on the source):
{"Label3":"Value3","Label3":"Value3","Label4":"Value4"}
It also returned errors if the field was blank.
Any ideas on a better way to do the transform?
Cheers,
Andrew.
I would use Replace Values to remove the { and } and then a series of Split, Unpivot and Pivot steps to get to what you want. A Remove Duplicates step at the right point would take care of your duplicates.
Here's a demo to get you started:
let
    Source = Table.FromRows(
        Json.Document(
            Binary.Decompress(
                Binary.FromText("i45WMlTSUaqOUfJJTErNMYxRsopRCkvMKU0FMnWgokYIUSOEqDFCFMisVYrViVYyQjILVV4Hrw0mCFETmFnGBM3CLophViwA", BinaryEncoding.Base64),
                Compression.Deflate)),
        let _t = ((type text) meta [Serialized.Text = true]) in type table [Key = _t, Dynamic = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"Key", Int64.Type}, {"Dynamic", type text}}),
    #"Replaced Value" = Table.ReplaceValue(#"Changed Type", "{", "", Replacer.ReplaceText, {"Dynamic"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value", "}", "", Replacer.ReplaceText, {"Dynamic"}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Replaced Value1", "Dynamic", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Dynamic.1", "Dynamic.2", "Dynamic.3"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter", {{"Dynamic.1", type text}, {"Dynamic.2", type text}, {"Dynamic.3", type text}}),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type1", {"Key"}, "Attribute", "Value"),
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Unpivoted Other Columns", "Value", Splitter.SplitTextByEachDelimiter({":"}, QuoteStyle.Csv, false), {"Value.1", "Value.2"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1", {{"Value.1", type text}, {"Value.2", type text}}),
    #"Removed Other Columns" = Table.SelectColumns(#"Changed Type2", {"Key", "Value.1", "Value.2"}),
    #"Removed Duplicates" = Table.Distinct(#"Removed Other Columns"),
    #"Pivoted Column" = Table.Pivot(#"Removed Duplicates", List.Distinct(#"Removed Duplicates"[Value.1]), "Value.1", "Value.2")
in
    #"Pivoted Column"
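For reference, the same strip/split/unpivot/de-duplicate/pivot pipeline can be sketched outside Power Query. This is a rough Python equivalent, not the actual query; the Key/Dynamic layout and the sample labels are assumptions matching the demo above:

```python
# Sketch of the pipeline: strip the braces, split on commas, split each
# pair on ":", collapse duplicates, then pivot into one column per label.

def pivot_dynamic(rows):
    """rows: list of (key, raw_text) pairs, e.g. (1, '{"Label1":"Value1"}')."""
    records = {}
    columns = []
    for key, raw in rows:
        body = raw.strip("{}")
        pairs = body.split(",") if body else []   # blank fields yield no pairs
        record = {}
        for pair in pairs:
            label, _, value = pair.partition(":")
            label = label.strip(' "')
            value = value.strip(' "')
            record[label] = value                 # duplicate labels collapse here
            if label not in columns:
                columns.append(label)             # new labels become new columns
        records[key] = record
    # Pivot: one row per key, one column per label seen anywhere in the data.
    return [{"Row": k, **{c: rec.get(c, "") for c in columns}}
            for k, rec in records.items()]

demo = [
    (1, '{"Label1":"Value1","Label2":"Value2","Label3":"Value3"}'),
    (2, '{"Label3":"Value3","Label1":"Value1","Label4":"Value4"}'),
    (3, '{"Label3":"Value3","Label3":"Value3","Label4":"Value4"}'),
]
```

Because duplicates are collapsed while building each record, row 3 ends up with a single Label3 value rather than an error.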
Hi Mike,
Worked some more on your solution and it got me most of the way there.
The problem I was encountering is that, even though the field had multiple delimiting commas, for some reason Power BI would only split it into two columns (I don't think this is expected behaviour, but I couldn't find out why it was happening).
By using this code
SplitByDelimiter = (table, column, delimiter) =>
    let
        // Count the delimiters in the first row to work out how many columns to split into
        Count = List.Count(List.Select(Text.ToList(Table.Column(table, column){0}), each _ = delimiter)) + 1,
        Names = List.Transform(List.Numbers(1, Count), each column & "." & Text.From(_)),
        Types = List.Transform(Names, each {_, type text}),
        Split = Table.SplitColumn(table, column, Splitter.SplitTextByDelimiter(delimiter), Names),
        Typed = Table.TransformColumnTypes(Split, Types)
    in
        Typed,
it allowed the split to continue until all comma-delimited pairs had been processed.
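The idea behind that function can be sketched in Python: count delimiter occurrences to decide how many columns the split must produce, instead of hard-coding the column list. (One caveat with the M version above: it inspects only the first row; this sketch scans every row, which is safer if later rows contain more pairs. The function name and signature here are illustrative.)

```python
def split_by_delimiter(values, delimiter=","):
    """values: list of strings; returns one equal-length list per input string."""
    # The widest row determines the column count, like Count in the M code.
    count = max(v.count(delimiter) for v in values) + 1
    # Pad short rows with empty strings so every row has the same width.
    return [(v.split(delimiter) + [""] * count)[:count] for v in values]
```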
Thanks sincerely for the prompt response and assist.
Cheers,
Andrew
Thanks Mike.
To clarify the example from my original description, there are three problems in the transformation that I need to overcome.
1. There is no logical order to the data in the field, but it needs to be transformed under the correct columns.
For example :-
Field 1 - {"Column1":"Value1","Column2":"Value2","Column3":"Value3"}
Field 2 - {"Column3":"Value3","Column1":"Value1","Column4":"Value4"}
In both Field 1 and Field 2, "Value3" needs to end up in the "Column3" column.
2. Duplicate values in the fields imported
For example :-
Field 3 - {"Column3":"Value3","Column3":"Value3","Column4":"Value4"}
One "Value3" should end up in "Column3" column.
3. More data can be added to the field on a new refresh, so new columns need to be created automatically.
For example :-
Field 4 - {"Column1":"Value1","Column2":"Value2","Column3":"Value3","Column4":"Value4","Column5":"Value5"}
A new "Column5" needs to be created and "Value5" appropriately located.
All these examples should create a table resulting in something like this:
Row | Column1 | Column2 | Column3 | Column4 | Column5 |
1 | Value1 | Value2 | Value3 | | |
2 | Value1 | | Value3 | Value4 | |
3 | | | Value3 | Value4 | |
4 | Value1 | Value2 | Value3 | Value4 | Value5 |
The JSON converter in Power BI handled all of this brilliantly, except when it encountered duplicates as per example 2.