Don't miss your chance to take the Fabric Data Engineer (DP-700) exam on us!
Learn moreWe've captured the moments from FabCon & SQLCon that everyone is talking about, and we are bringing them to the community, live and on-demand. Starts on April 14th. Register now
I'll post the simple version of my problem first, then follow with the full explanation in case anyone cares and/or has suggestions to help me zoom out and conceptualize the whole thing differently..
====================================
Simple version:
I have a table with mixed records, lists, and values and need to build a table summarizing what data types are contained in each column.
In other words, I want to transform this:
| Column1 | Column2 | Column3 | Column4 |
| [List] | [Record] | text | null |
| null | [List] | [List] | [Record] |
| [List] | [List] | text | [Record] |
| ... | ... | ... | ... |
Into this:
| Column1 | Column2 | Column3 | Column4 |
| {"L"} | {"R","L"} | {"L","V"} | {"R"} |
or this:
| ColumnName | Types |
Column1 | {"L"} |
| Column2 | {"R","L"} |
| Column3 | {"L","V"} |
| Column4 | {"R"} |
| ... | ... |
The following code builds the list for each individual column, but I'm having trouble changing it into a function that iterates over the whole table:
//Get a list containing all of the values in the column
ColumnContents = Table.Column(TableToExpand, ColumnName),
//Iterate over each value and create a list of unique types: record = "R", list = "L", blank or null = {}, everything else = "V"
ColumnTypes = List.Distinct(List.Combine(List.Transform(ColumnContents, each
if _ is record then {"R"}
else if Value.Is(_ , type list) then {"L"}
else if _ = null or _ = "" then {}
else {"V"} ))),
Any suggestions?
====================================================================
Full version with context:
I'm importing a large dataset via API. Any given column might have mixed data types that need to be separated out. I'm not concerned about mixed numeric/text/logical/etc., but records/tables/lists/values must be placed into their own columns so I can eventually work with their contents.
I'm using a pair of functions (posted below) to (1) split columns with mixed data types into multiple columns containing a single type each, (2) expand records in the resultant table into new columns, and (3) expand lists to new rows.
The end result converts this:
| Main Index | Col1 | Col2 | Col3 | Col4 |
| 1 | [List] | null | text3.1 | null |
| 2 | [List] | [Record] | text3.2 | [Record] |
| 3 | null | [List] | [Record] | [Record] |
| 4 | [List] | [List] | text3.4 | [Record] |
| 5 | null | [Record] | [Record] | text4.5 |
| 6 | ... | ... | ... | ... |
into something like this:
| MI | Col1 | Col1.I% | Col2.rs.Val | Col2.rs.what | Col2.ls | Col2ls.I% | Col3.rs.flds | Col3.rs.split | Col3.vs | Col4.rs.yep | Col4.rs.this | Col4.rs.entry | Col4.vs |
| 1 | C1_1.1 | 1 | null | null | null | 1 | null | null | text3.1 | null | null | null | null |
| 1 | C1_1.2 | 1 | null | null | null | 2 | null | null | null | null | null | null | null |
| 2 | C1_2.1 | 2 | C2.Val2 | C2.what2 | null | 3 | null | null | text3.2 | C4.yep.2 | C4.this.2 | C4.entry.2 | null |
| 2 | C1_2.2 | 2 | null | null | null | 4 | C3.flds.2 | C3.split.2 | null | null | null | null | null |
| 2 | C1_2.3 | 2 | null | null | null | 5 | null | null | null | null | null | null | null |
| 3 | null | 2 | null | null | C2_3.1 | 6 | null | null | null | C4.yep.3 | null | C4.entry.3 | null |
| 3 | null | 3 | null | null | C2_3.2 | 6 | C3.flds.3 | C3.split.5 | null | null | null | null | null |
| 4 | C1_4.1 | 4 | null | null | C2_4.1 | 7 | null | null | text3.44 | null | C4.this.4 | C4.entry.4 | null |
| 4 | null | 4 | null | null | C2_4.2 | 7 | null | null | null | null | null | null | null |
| 4 | C1_4.2 | 4 | null | null | null | 8 | null | null | null | null | null | null | null |
| 5 | null | 5 | C2.Val.5 | null | null | 9 | C3.flds.5 | C3.split.5 | null | null | null | null | text4.5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
All fine and good so far. I wind up with a table with all lists expanded to new rows, records expanded to new columns, and static values separated in their own columns. Except now some of those new columns have mixed lists/records/values of their own (i.e., the red & green values in the above table can be records, lists, or static values) and I have to run the function again over the new table, and again on the resultant table, and again, and again, and again until there's nothing but values left.
As you'd expect, it gets crazy slow looping through a large dataset over and over like that. I'm currently splitting the static values off into their own tables after each pass so it only examines columns that need it, but I have to parse for their data types multiple times for each column to do so. I'm trying to improve performance by summarizing the data types in a separate function and then only referencing the remaining columns on the next pass. In other words, I'm trying to do the table transform illustrated in the "simplified" section above.
I've tried a couple of iterations of List.Distinct(List.Combine(List.Transform(..., List.Generate(..., etc., but I'm having trouble getting the parameters right (I keep winding up with "Cannot convert x to type y..." errors).
So my questions then are
__________func ExpandAllConsolidated:____________
let Source = (TableToExpand as table, optional ColName as text, optional ColNumber as number) =>
let
//If the column number is missing, make it 0
ColumnNumber =
if (ColNumber=null) then
if ColName<>null then Table.Schema(TableToExpand){[Name=ColName]}[Position]
else 0
else ColNumber,
//Supplying a ColName parameter overrides the column-finding logic
ColumnName =
if (ColName<>null) then ColName else
//Find the column name relating to the column number
Table.ColumnNames(TableToExpand){ColumnNumber},
//Get a list containing all of the values in the column
ColumnContents = Table.Column(TableToExpand, ColumnName),
//Iterate over each value and list unique types found: record = "R", list = "L", blank or null = {}, everything else = "V"
ColumnTypes = List.Distinct(List.Combine(List.Transform(ColumnContents, each
if _ is record then {"R"}
else if Value.Is(_ , type list) then {"L"}
else if _ = null or _ = "" then {}
else {"V"} ))),
//Does the column contain more than one type?
ColumnIsMixed = List.Count(ColumnTypes) > 1,
//If the column is all records then get a list of all of the columns in the (sub)table
//Then get a distinct list of all of these column names
ColumnsToExpand =
if ColumnTypes = {"R"}
then List.Distinct(List.Combine(List.Transform(ColumnContents, each
if _ is record then Record.FieldNames(_)
else {})))
else {},
//Is there anything to expand in this column?
CanExpandCurrentColumn = if ColumnsToExpand <> null then List.Count(ColumnsToExpand)>0 else false,
//If so, then determine names for new columns
NewColumnNames = if ColumnsToExpand <> null then List.Transform(ColumnsToExpand, each ColumnName & "." & _) else {},
//Check if column is all lists
ColumnsToExtract = if ColumnTypes = {"L"} then {ColumnName} else {},
//Is the current column expandable to new rows?
CanExtractCurrentColumn = if ColumnsToExtract <> null then List.Count(ColumnsToExtract)>0 else false,
//If current column will wind up being deleted, keep column number the same. Otherwise, increment by one
NextColumnNumber = if CanExpandCurrentColumn or ColumnIsMixed then ColumnNumber else ColumnNumber+1,
//Expand records to new columns
RecordXform =
if CanExpandCurrentColumn
then Table.ExpandRecordColumn(TableToExpand, ColumnName, ColumnsToExpand, NewColumnNames)
else TableToExpand,
//Expand lists to new rows
ListXform =
if CanExtractCurrentColumn then
let //Add index column to help recombine list entries at a later step
ListXformIndex = Table.AddIndexColumn(TableToExpand, ColumnName & ".I%", 1, 1, Int64.Type),
//Expand lists to new rows
ListXformExpand = Table.ExpandListColumn(ListXformIndex, ColumnName)
in ListXformExpand
else RecordXform,
// Split mixed columns into new columns dedicated to each data type
MixedXform = if ColumnIsMixed then SplitOne(TableToExpand,ColumnName,ColumnTypes) else ListXform, //func SplitOne found below
//Manipulate column based on type
TableExpansion =
if ColumnIsMixed then MixedXform
else try
if ColumnTypes = {"R"} then RecordXform
else if ColumnTypes = {"L"} then ListXform
else TableToExpand
otherwise TableToExpand,
//Iterate over each column
OutputTable =
try
if NextColumnNumber>(Table.ColumnCount(TableExpansion)-1) then TableExpansion //Stop if we've reached the end of the table
else ExpandAllConsolidated(TableExpansion, null, NextColumnNumber) //"ExpandAllConsolidated" = this function
otherwise TableExpansion,
//Clean up empty cells
OutputTableFixed = Table.ReplaceValue( OutputTable,"",null,Replacer.ReplaceValue,Table.ColumnNames(OutputTable))
in OutputTableFixed
in Source
__________func SplitOne:____________
let
Source = (TableToExpand as table, ColumnName as text, ColumnTypes as list) =>
let
//split records to new column
mixedColRec =
try
if List.Contains(ColumnTypes,"R") then Table.AddColumn(TableToExpand, Text.Combine({ColumnName,".rs"}), each
if Record.Field(_, ColumnName) = null or Record.Field(_, ColumnName) = "" then null
else if Value.Is(Record.Field(_, ColumnName), type record) then Record.Field(_, ColumnName)
else null )
else TableToExpand
otherwise TableToExpand,
//split lists to new column
mixedColList =
if List.Contains(ColumnTypes,"L")
then Table.AddColumn(TableToExpand, Text.Combine({ColumnName,".ls"},""), each
if Record.Field(_, ColumnName) = null or Record.Field(_, ColumnName) = ""
then null
else if Value.Is(Record.Field(_, ColumnName), type list)
then Record.Field(_,ColumnName)
else null)
else mixedColRec,
//split values to new columns
mixedColVal =
if List.Contains(ColumnTypes,"V") then Table.AddColumn(mixedColList, Text.Combine({ColumnName,".vs"},""), each
if Record.Field(_, ColumnName) = null or Record.Field(_, ColumnName) = "" then null
else if not Value.Is(Record.Field(_, ColumnName), type list) and not Value.Is(Record.Field(_, ColumnName), type record) then Record.Field(_,ColumnName)
else null)
else mixedColList,
//delete otriginal column
mixedColDel = Table.RemoveColumns(mixedColVal,ColumnName)
in mixedColDel
in Source
Following up re: a potentially more efficient approach, here's a general outline of what I was thinking (without the proper grammar, obviously):
let Source = (TableToExpand as table) =>
let
ColumnList = Table.ColumnNames(TableToExpand),
ColumnContents = Table.Column(TableToExpand, each ColumnList),
//List types
ColumnTypesOneCol = List.Distinct(List.Combine(List.Transform(ColumnContents, each
if _ is record then {"R"}
else if Value.Is(-- , type list) then {"L"}
else if _ = null or _ = "" then {}
else {"V"} ))),
//Accumulate types
ColumnTypesAllCols = /*[[iterate ColumnTypesOneCol over the whole table - this is what the original post was about]]*/
//Get lists of all columns of each type
StaticColumns = /*[[Filter ColumnList for ColumnTypesAllCols{DataTypes} = {"V"}]]*/
RecordColumns = /*[[Filter ColumnList for ColumnTypesAllCols{DataTypes} = {"R"}]]*/
ListColumns = /*[[Filter ColumnList for ColumnTypesAllCols{DataTypes} = {"L"}]]*/
MixedColumns = /*[[Filter ColumnList for List.Count(ColumnTypesAllCols){DataTypes} > 1]]*/
//Expand all columns with only records/values
RecordXform = Table.ExpandRecordColumn(TableToExpand, /*[[_each_ RecordColumns]]*/, RecordColumns, [[NewRecordColumnNames]])
//Expand all columns with only lists
ListXform = let //Add index column to help recombine list entries at a later step
ListXformIndex = Table.AddIndexColumn(RecordXform, ColumnName & ".I%", 1, 1, Int64.Type),
//Expand lists to new rows
ListXformExpand = Table.ExpandListColumn(ListXformIndex, /*[[_each_ ListColumns]]*/ )
in RecordXform
//Split apart columns of mixed type
MixedXform = SplitOne(ListXform,/*[[_each_ MixedColumns]]*/,ColumnTypes) else ListXform
//Iterate over each column
OutputTable =
if If Table.ColumnCount(TableToExpand) = Table.ColumnCount(StaticColumns) then MixedXform //Stop if all columns query hold only static values
else ThisQuery(MixedXform) //re-run query
in OutputTable
in Source
If you have recently started exploring Fabric, we'd love to hear how it's going. Your feedback can help with product improvements.
A new Power BI DataViz World Championship is coming this June! Don't miss out on submitting your entry.
Share feedback directly with Fabric product managers, participate in targeted research studies and influence the Fabric roadmap.
| User | Count |
|---|---|
| 5 | |
| 4 | |
| 3 | |
| 2 | |
| 2 |
| User | Count |
|---|---|
| 8 | |
| 7 | |
| 6 | |
| 6 | |
| 5 |