Hello,
I have to deal with .csv files containing millions of rows, extracted from a building management software.
The data are already unpivoted, which is why there are so many rows.
In each row, the first field is the timestamp and the second is the value (comma separated).
The issue concerns the "attribute": it is not repeated on each row. Instead, it appears on its own line at the beginning of each set, as in the minimal example below:
Attribute_1
Date_1, value_1
Date_2, value_2
Date_3, value_3
#linebreak
Attribute_2
Date_1, value_1
Date_2, value_2
Date_3, value_3
My goal is to transform the previous format into this:
Date_1, value_1, Attribute_1
Date_2, value_2, Attribute_1
Date_3, value_3, Attribute_1
Date_1, value_1, Attribute_2
Date_2, value_2, Attribute_2
Date_3, value_3, Attribute_2
Thanks for your help.
Hi @Atiroocky
This is how you can achieve that. I used the sample you shared with us, and after a few transformations I got a table in your target format.
How did I achieve that?
1) Create a conditional column with the following logic: if Column2 is blank, output the value of Column1 (the attribute name); otherwise output null. If you prefer a custom column, you can use the following code:
if [Column2] = "" then [Column1] else null
2) Fill Down the Attribute
Right-click the header of your new "Attribute" column.
Select Fill > Down.
All the null values will be replaced by the attribute name from the row above, propagating down until the next attribute is found.
3) Exclude the attribute and linebreak lines: pick a column that should always be filled on data rows (here Column2) and filter out the null or empty values.
And ta-da, you have the right table.
If the table is too heavy for this transformation, you can use a dataflow instead of Power BI Desktop.
If you need more help, do not hesitate to ask.
PS: this is the generated code:
let
    Source = Csv.Document(File.Contents("C:\Users\dgodfroid\Desktop\test.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),
    //Attribute rows are the ones with a blank Column2
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom.1", each if [Column2] = "" then [Column1] else null),
    //Propagate each attribute down to its data rows
    #"Filled Down" = Table.FillDown(#"Added Custom",{"Custom.1"}),
    //Drop the attribute and #linebreak rows, which have a blank Column2
    #"Filtered Rows" = Table.SelectRows(#"Filled Down", each ([Column2] <> ""))
in
    #"Filtered Rows"
Here's another method which relies on Grouping. It seems to run faster on your sample data, but you should test both methods against your large data set.
I started by pasting your sample data into a *.csv file.
Please read the code comments to better understand the algorithm.
let
    Source = Csv.Document(File.Contents(FullPathToCSV),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    //Remove rows with #linebreak
    #"Filtered Rows" = Table.SelectRows(Source, each ([Column1] <> "#linebreak")),
    //Group by Attribute, then add the Attribute column and remove the attribute row
    //Note: an attribute row is identified by the blank in Column2, as it comes from a CSV file
    #"Group by Attribute" = Table.Group(#"Filtered Rows", "Column2", {
        {"Rows", (t)=> [a = t{0}[Column1],
                        b = Table.AddColumn(Table.Skip(t), "Attribute", each a)][b],
            //change the types according to the actual types;
            //for example, Column1 might be date and Column2 might be number or similar
            type table[Column1=any, Column2=any, Attribute=text]}
        }, GroupKind.Local, (x,y)=> Number.From(x = "" and y = "")),
    //Remove grouping column
    #"Removed Columns" = Table.RemoveColumns(#"Group by Attribute", {"Column2"}),
    //Expand grouped rows and rename
    #"Expanded Rows" = Table.ExpandTableColumn(#"Removed Columns", "Rows",
        {"Column1", "Column2", "Attribute"},
        {"Date", "Value", "Attribute"})
in
    #"Expanded Rows"
Please note that if you really want a comma-separated result, as you show in your question, you will need to add a Merge Columns step to the above, which you can easily do from the UI:
Table.CombineColumns(#"Expanded Rows",{"Date", "Value", "Attribute"},Combiner.CombineTextByDelimiter(", ", QuoteStyle.None),"Merged")
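If you go that route, the merge step would become the final expression of the query above, along these lines ("Merged" is just the default column name the UI assigns, and this assumes Date and Value are still text, as they come straight from the CSV):

    //Combine the three columns into a single comma-separated text column
    #"Merged Columns" = Table.CombineColumns(#"Expanded Rows",
        {"Date", "Value", "Attribute"},
        Combiner.CombineTextByDelimiter(", ", QuoteStyle.None), "Merged")
in
    #"Merged Columns"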
Thank you @Cookistador, you nailed it!
Processing the large amount of data was very quick.
Just note that in the first step, with the conditional column, if you want the "equals blank" condition, the value field of the condition MUST be left empty. If you type something in it and then erase it, you can no longer validate the condition; in that case you have to delete the condition and create a new one without typing anything in the value field.