Solved: Handle CSV files where the # of duplicate column n...

Syndicate_Admin · ‎08-06-2021

I have a CSV file that can contain duplicate column names, which are automatically sequenced when brought into the Query.

eg. CSV file has:

Key

Label

Component

Loaded with column names:

Key

Label

Label_1

Label_2

Component

Component_3

Component_4

The sequence number can change whenever additional columns are added. For instance, if a fourthLabel column is on the file, the load will impact both Label and COmponent column names, e.g.

Key

Label

Label_1

Label_2

Label_3

Component

Component_4

Component_5

I figured out how to merge the columns in the query, but whenever additional columns are added I have to go into the Advanced Editor and add the new column and rename other columns.

I am wondering if there is a way to set up logic to loop thru the column names and include all columns with the same starting column name into one merged column.

So in the initial example:

Merged Labels would include Label, Label_1, Label_2

Merged Components would include Component, Component_3, Component_4

Then when a fourth Label column shows on the file,

Merged Labels would include Label, Label_1, Label_2, Label_3

Merged Components would include Component, Component_4, Component_5

Any ideas?

Thanks

Jakinta · ‎08-07-2021

Hope this helps...

let
    Source = Table.FromColumns ( {{1..10}, {11..20}, {21..30}, {31..40}, {41..50}}, {"Label", "Label1", "Label2", "Component", "Component1"}),
    Labels = List.Select( Table.ColumnNames(Source), each Text.Contains(_, "Label") ),
    Comps = List.Select( Table.ColumnNames(Source), each Text.Contains(_, "Component") ),
    ToText = Table.TransformColumnTypes( Source, List.Transform( Table.ColumnNames(Source), each {_, type text})),
    MergedLabels = Table.CombineColumns(ToText,Labels,Combiner.CombineTextByDelimiter("~", QuoteStyle.None),"Merged Labels"),
    MergedComps = Table.CombineColumns(MergedLabels,Comps,Combiner.CombineTextByDelimiter("~", QuoteStyle.None),"Merged Components")
in
    MergedComps

View solution in original post

dmanning · ‎08-06-2021

Here is the full query. The lines in red are the ones I have to modify whenever the # of duplicate columns names change.

let
Source = Csv.Document(File.Contents(DataFolder2 & "\DSS_JIRA_Vitech.csv"),[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.Csv]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
Output=Table.RenameColumns(#"Promoted Headers",List.Transform(Table.ColumnNames(#"Promoted Headers"), each {_, Text.Replace(_,"/","")})),
#"Renamed Columns" = Table.RenameColumns(#"Output",{{"Custom field (Story Points)", "Story Points"}
, {"Custom field (Drop Date)", "Drop Date"},{"Custom field (Epic Link)", "Feature Link"}
,{"Custom field (Epic Name)", "Feature Name"}}),
#"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"Story Points", Int64.Type}}),
#"Sprint-Current: added" = Table.AddColumn(#"Changed Type", "Sprint-Current", each if [Sprint] = "" then "Unassigned" else [Sprint]),
#"Label: merge columns" = Table.AddColumn(#"Sprint-Current: added","Merged Labels"
, each Text.Combine(List.Select({[Labels], [Labels_1], [Labels_2], [Labels_3], [Labels_4], [Labels_5], [Labels_6]}, each _<> "" and _ <> null),"~")),
#"Component: merge columns" = Table.AddColumn(#"Label: merge columns","Merged Components"
, each Text.Combine(List.Select({[Components], [Components_11], [Components_12], [Components_13]}, each _<> "" and _ <> null),"~")),
#"Label-Primary: added" = Table.AddColumn(#"Component: merge columns", "Label-Primary",
each if Text.PositionOf([Merged Components],"MSS")>-1 then "MSS"
else if Text.PositionOf([Merged Components],"ESS")>-1 then "ESS"
else if Text.PositionOf([Merged Components],"BSS")>-1 then "BSS"
else if Text.PositionOf([Merged Components],"SalesCenter")>-1 then "BSS"
else if [Merged Components] = "" then "Unassigned"
else "Unknown"),
#"JIRA Source: added" = Table.AddColumn(#"Label-Primary: added", "JIRA Source", each "Vitech"),
#"Story Points: null to zero" = Table.ReplaceValue(#"JIRA Source: added",null,0,Replacer.ReplaceValue,{"Story Points"}),
#"Removed Other Columns" = Table.SelectColumns(#"Story Points: null to zero",{"Issue Type", "Issue key", "Summary", "Status", "Story Points", "Sprint-Current", "Label-Primary", "Feature Link", "Feature Name", "Created", "Updated", "Resolved", "Resolution", "JIRA Source","Merged Labels","Merged Components"})
in
#"Removed Other Columns"

Jakinta · ‎08-07-2021

Hope this helps...

let
    Source = Table.FromColumns ( {{1..10}, {11..20}, {21..30}, {31..40}, {41..50}}, {"Label", "Label1", "Label2", "Component", "Component1"}),
    Labels = List.Select( Table.ColumnNames(Source), each Text.Contains(_, "Label") ),
    Comps = List.Select( Table.ColumnNames(Source), each Text.Contains(_, "Component") ),
    ToText = Table.TransformColumnTypes( Source, List.Transform( Table.ColumnNames(Source), each {_, type text})),
    MergedLabels = Table.CombineColumns(ToText,Labels,Combiner.CombineTextByDelimiter("~", QuoteStyle.None),"Merged Labels"),
    MergedComps = Table.CombineColumns(MergedLabels,Comps,Combiner.CombineTextByDelimiter("~", QuoteStyle.None),"Merged Components")
in
    MergedComps

dmanning · ‎08-11-2021

This did help. I had to do some additional work because I needed to remove the extra delimiters generated for when the columns had no value, but I ended up with something that provided the desired results.

Thanks for your support.

AlexisOlson · ‎08-06-2021

What does the code for your merge step look like? Feel free to paste the M code for the whole query.

Handle CSV files where the # of duplicate column names can change

Helpful resources

Join our Fabric User Panel

Power BI Monthly Update - June 2025

Fabric Community Update - June 2025

Join the #PBI10 DataViz contest

Handle CSV files where the # of duplicate column names can change

Helpful resources

Join our Fabric User Panel

Power BI Monthly Update - June 2025

Fabric Community Update - June 2025