Get certified in Microsoft Fabric—for free! For a limited time, the Microsoft Fabric Community team will be offering free DP-600 exam vouchers. Prepare now
I have a single csv dataset that looks like below. I have multiple csv files like this which i want to combine once i figure out the correct transformation for a single file. Here is the Sample Excel File containing a single csv data :
into this expected Output :
The Input data has broadly 4 sections :
1] The 1st 3 columns are used to create `Questions`, `Answers` & `Label` columns.
2] a section of Percentage columns
3] a section of Stat testing columns
4] a section of 1 columns
I want to `Merge` each sets of columns `R1...R10` and then `Unpivot` (`Stack`) them one above the other as shown in the output. The `R1...R10` columns count may vary each time, but they will be the same for each section i.e. for `Percentage columns`, `Stat testing columns` & `1 columns`.
While googling, i am came across an Imke Feldmann post on Merging and unpivoting columns dynamically using a custom function. However, i am unable to select and merge the different sections of columns dynamically. Also, i found Bill Szysz's post which is somewhat similar and trying to figure out a way to modify his 2nd solution i.e. Combine Method on a similar post. Here is Bill Szysz's Combine Method solution.
let
Source = Table.PromoteHeaders(Excel.CurrentWorkbook(){[Name="UglyData"]}[Content], [PromoteAllScalars=true]),
Lists = Table.FromColumns({List.Transform({0..((List.Count(Table.ColumnNames(Source))-6)/5)-1}, each List.Range(Table.ColumnNames(Source), _*5+6, 5)) }),
AlmostReady = Table.Combine(Table.AddColumn(Lists, "Columns", each Table.FromColumns({Table.Column(Source, Table.ColumnNames(Source){0})} & {List.Repeat({[Column1]{0}}, Table.RowCount(Source))} & List.Transform(List.Skip([Column1], 1), each Table.Column(Source, _)), {Table.ColumnNames(Source){0}, "Name"} & List.Transform(List.Skip([Column1], 1), each Text.BeforeDelimiter(_, "_")) ) )[Columns]),
Ready = Table.AddColumn(AlmostReady, "Total", each List.Sum(List.Skip(Record.ToList(_), 2)))
in
Ready
The problem is, Bill has used fixed column and row counts ( 5 & 6) whereas my columns and rows count could change. for e.g. today there are 10 sets of columns per section, tomorrow there may be 5 or 20. But the columns count will always be the same across each section i.e. for Percentage section, Stats section and Number section. Also, the sequence of the Columns may change but maintain the same sequence across the sections:
e.g:
Percentage section | | Stats section | | Numbers section |
R1 | R3 | R2 | R5 | R4 | | R1 | R3 | R2 | R5 | R4 | | R1 | R3 | R2 | R5 | R4 |
This data is tricky and beyond my amateur skills. Is this possible to do?
@ImkeF, @AlB, Jimmy801, @shaowu459
Solved! Go to Solution.
according to discussion, final code is:
let
Source = Table.PromoteHeaders(Csv.Document(File.Contents("C:\Users\Clark\Desktop\SampleCSV.csv"),[Delimiter=",", Columns=36, Encoding=1252, QuoteStyle=QuoteStyle.None])),
title = Table.ColumnNames(Source),
count = List.Count(List.Select(title,each Text.StartsWith(_,title{5}))),
num = (List.Count(title)-3)/count,
Group = Table.SelectRows(
Table.Combine(
Table.Group(
Source,
"Que",
{"n",each
let
header={"Que","Question","Answer","Label","Agg"}&List.Range(title,4,num-1),
a=Table.ToRows(_),
b=List.FirstN(a{0},2)&{a{1}{1}},
c=List.TransformMany(
List.Skip(a,2),
each List.Split(List.Skip(_,3),num),
(m,n)=>b&List.Range(m,1,2)&List.Skip(n)) in #table(header,c)
},
0,
(x,y)=>Byte.From(Text.Length(y)>0)
)[n]),
each Text.Length([Label])>0
)
in
Group
Morning, please try and check this one.
let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
Group = Table.SelectRows(Table.Combine(Table.Group(Source,"Que",{"n",each let header={"Que","Question","Answer","Label","Agg"}&List.Range(Table.ColumnNames(Source),4,10),a=Table.ToRows(_), b=List.FirstN(a{0},2)&{a{1}{1}}, c=List.TransformMany(List.Skip(a,2),each List.Split(List.Skip(_,3),11),(m,n)=>b&List.Range(m,1,2)&List.Skip(n)) in #table(header,c)},0,(x,y)=>Byte.From(y<>null))[n]),each [Label]<>null)
in
Group
@shaowu459 Thanks for your reply.
I am getting this error:
i then tried promoting the headers before the grouping, but it showed all rows empty, filtered on Label.
This is some brilliant piece of code! However, as i try to follow along, could you confirm, if you are hardcoding the 1st set of columns? e.g. here :
.... & List.Range(Table.ColumnNames(Source),4,10), ....
The first part of column names is hard coded {"Que","Question","Answer","Label","Agg"}, according to your description, i think this part will not change.
When i import the csv file, the Source looks like this :
How about change Source in my code to Table.PromoteHeaders(Source)?
Group = Table.SelectRows(Table.Combine(Table.Group(Table.PromoteHeaders(Source),"Que",{"n",each let header={"Que","Question","Answer","Label","Agg"}&List.Range(Table.ColumnNames(Table.PromoteHeaders(Source)),4,10),a=Table.ToRows(_), b=List.FirstN(a{0},2)&{a{1}{1}}, c=List.TransformMany(List.Skip(a,2),each List.Split(List.Skip(_,3),11),(m,n)=>b&List.Range(m,1,2)&List.Skip(n)) in #table(header,c)},0,(x,y)=>Byte.From(y<>null))[n]),each [Label]<>null)
This is how i get it (same as before) after making your suggested change :
according to discussion, final code is:
let
Source = Table.PromoteHeaders(Csv.Document(File.Contents("C:\Users\Clark\Desktop\SampleCSV.csv"),[Delimiter=",", Columns=36, Encoding=1252, QuoteStyle=QuoteStyle.None])),
title = Table.ColumnNames(Source),
count = List.Count(List.Select(title,each Text.StartsWith(_,title{5}))),
num = (List.Count(title)-3)/count,
Group = Table.SelectRows(
Table.Combine(
Table.Group(
Source,
"Que",
{"n",each
let
header={"Que","Question","Answer","Label","Agg"}&List.Range(title,4,num-1),
a=Table.ToRows(_),
b=List.FirstN(a{0},2)&{a{1}{1}},
c=List.TransformMany(
List.Skip(a,2),
each List.Split(List.Skip(_,3),num),
(m,n)=>b&List.Range(m,1,2)&List.Skip(n)) in #table(header,c)
},
0,
(x,y)=>Byte.From(Text.Length(y)>0)
)[n]),
each Text.Length([Label])>0
)
in
Group
Genius! 🙂
Very clean and elegant code. Thank you very much.
I grouped table by column "Que", the error message you get says there is no column named "Que", please check your database, is the first column named "Que"?
Hi @Anonymous
If I understand correctly, you already have a solution that works only partially, with some issues. If so, can you post it here so that we can take it as base and try to fix/improve it?
Please mark the question solved when done and consider giving a thumbs up if posts are helpful.
Contact me privately for support with any larger-scale BI needs, tutoring, etc.
Cheers
@AlB, i dont have a partially created solution though i was trying to use and modify @ImkeF custom function and Bill Szysz's 2nd solution i.e. `Combine Method` on a similar post. Here is Bill Szysz solution.
let
Source = Table.PromoteHeaders(Excel.CurrentWorkbook(){[Name="UglyData"]}[Content], [PromoteAllScalars=true]),
Lists = Table.FromColumns({List.Transform({0..((List.Count(Table.ColumnNames(Source))-6)/5)-1}, each List.Range(Table.ColumnNames(Source), _*5+6, 5)) }),
AlmostReady = Table.Combine(Table.AddColumn(Lists, "Columns", each Table.FromColumns({Table.Column(Source, Table.ColumnNames(Source){0})} & {List.Repeat({[Column1]{0}}, Table.RowCount(Source))} & List.Transform(List.Skip([Column1], 1), each Table.Column(Source, _)), {Table.ColumnNames(Source){0}, "Name"} & List.Transform(List.Skip([Column1], 1), each Text.BeforeDelimiter(_, "_")) ) )[Columns]),
Ready = Table.AddColumn(AlmostReady, "Total", each List.Sum(List.Skip(Record.ToList(_), 2)))
in
Ready
The problem is, he uses fixed column and row counts (5 & 6) whereas my columns and rows count could change. for e.g. today there are 10 sets of columns per section, tomorrow there may be 5 or 20. But the columns count will always be the same for each Percentage section, Stats section and Number section. Also, the sequence of the Columns may change but will be same across the sections:
e.g:
Percentage section | | Stats section | | Numbers section |
R1 | R3 | R2 | R5 | R4 | | R1 | R3 | R2 | R5 | R4 | | R1 | R3 | R2 | R5 | R4 |
Hi @Anonymous ,
you can make it dynamic with this solution:
let
fnStackDynamic =
(Partition as table, NoOfColumns as number) =>
let
#"Unpivoted Other Columns1" = Table.UnpivotOtherColumns(Partition, {"Que", "Que Ans Lbl", "Agg"}, "Attribute", "Value"),
#"Filtered Rows2" = Table.SelectRows(#"Unpivoted Other Columns1", each not Text.StartsWith([Attribute], "Column")),
#"Added Index" = Table.AddIndexColumn(#"Filtered Rows2", "Index2", 0, 1, Int64.Type),
#"Inserted Integer-Division" = Table.AddColumn(#"Added Index", "RowIndex", each Number.IntegerDivide([Index2], (NoOfColumns-6)/3), Int64.Type),
#"Extracted Text Before Delimiter" = Table.TransformColumns(#"Inserted Integer-Division", {{"Attribute", each Text.BeforeDelimiter(_, "_"), type text}}),
#"Removed Columns" = Table.RemoveColumns(#"Extracted Text Before Delimiter",{"Index2"}),
#"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[Attribute]), "Attribute", "Value")
in
#"Pivoted Column",
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Filled Down" = Table.FillDown(Source,{"Column1"}),
#"Promoted Headers" = Table.PromoteHeaders(#"Filled Down", [PromoteAllScalars=true]),
ColumnNames = Table.ColumnNames(#"Promoted Headers"),
NoOfColumns = List.Count(ColumnNames),
#"Replaced Value" = Table.ReplaceValue(#"Promoted Headers", null,"",Replacer.ReplaceValue, ColumnNames),
#"Filtered Rows1" = Table.SelectRows(#"Replaced Value", each Text.StartsWith([Que Ans Lbl], "Label")),
#"Grouped Rows" = Table.Group(#"Filtered Rows1", {"Que"}, {{"Partition", each _}}, GroupKind.Local),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each fnStackDynamic([Partition], NoOfColumns)),
ConvertedRows = Table.RemoveColumns(#"Added Custom",{"Partition"}),
Custom1 = #"Promoted Headers",
#"Grouped Rows1" = Table.Group(Custom1, {"Que"}, {{"First2Rows", each [Question=[Que Ans Lbl]{0}, Answer=[Que Ans Lbl]{1}]}}),
#"Expanded First2Rows1" = Table.ExpandRecordColumn(#"Grouped Rows1", "First2Rows", {"Question", "Answer"}, {"Question", "Answer"}),
#"Merged Queries" = Table.NestedJoin(#"Expanded First2Rows1", {"Que"}, ConvertedRows, {"Que"}, "Custom", JoinKind.LeftOuter),
#"Expanded Custom1" = Table.ExpandTableColumn(#"Merged Queries", "Custom", {"Custom"}, {"Custom"}),
#"Expanded Custom" = Table.ExpandTableColumn(#"Expanded Custom1", "Custom", List.Range(ColumnNames, 1,2) & List.Range(List.RemoveFirstN(ColumnNames,3), 1, (NoOfColumns -3) / 3-1 ))
in
#"Expanded Custom"
Please also see file attached.
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
Hi Imke,
Thanks for your reply.
I am getting following error:
It seems to come from this step where the custom function gets applied:
Check out the October 2024 Power BI update to learn about new features.
Learn from experts, get hands-on experience, and win awesome prizes.
User | Count |
---|---|
69 | |
61 | |
18 | |
16 | |
13 |