Solved: Re: Dynamic Columns Specification in CSV.Document

CMMaliniJoshi · ‎03-08-2023

Dear All

Kindly help me with a workaround in power query for below mentioned situation.

I have to combine two reports in CSV. Both of the CSVs have top three rows which I need to remove first.

File1 contains below mentioned fields: (Hypothetical field names)

Date,Sales,Product,Type,No_of_inst,EMI,Last_EMI_Dt,Cust_Type_Code,Customer_Type

File2 contains below mentioned fields:

Date,Sales,Product,Type,DP,Stock_St_Dt,Cust_Type_Code,Cust_Type,

2. Both of the CSVs have got different no. of columns. File1 – 9 Columns, File2 – 8 Columns

3. The no. of columns always reduce or increase every time new reports are provided to me.

4. Out of all the columns in these CSVs , I require only specific columns. I have prepared a list of required in fields in excel. The list of required fields can increase or reduce in future. At present required fields (Hypothetical) are as under:

Date,Sales,Product,Type,No_of_inst,Customer_Type (From File1)

Date,Sales,Product,Type,DP,Cust_Type (From File2)

Some of the fields are required from File1 and some from File2.

Combined list of required fields is as under:

Date,Sales,Product,Type,No_of_inst,Customer_Type (Cust_Type), DP

5. Some of the columns have same names in both of the CSVs. But some columns have same kind of data but different names in CSVs.

Date,Sales,Product,Type (File1)

Date,Sales,Product,Type (File2)

Above fields have same names in both of the CSVs.

Customer_Type (From File1) and Cust_Type (From File2)

Above field is having different field names but they are having same data. So the data in these columns should appear in only 1 column in the output.

6. I cannot mention specific no. of columns while using CSV.Document's "Columns" argument, because no. of columns can increase or reduce next time. I cannot skip to provide “Columns” argument. If I do so, the CSV.Document retrieves only two columns based on data in First 3 rows which are actually redundant and have to be removed as mentioned in point no. 1 above.

7. I cannot provide a list for "Columns" argument in CSV.Document because I have to remove top 3 rows before specifying columns. When I tried to provide a list in "Columns" argument in CSV.Document, it named the columns wrongly because the sequence of field names in my list differs with sequence of field names in CSV files and the sequence of field names can again change in future CSVs.

8. The field names in the CSVs are not proper for final output. Hence, I have to rename some / all of the required field names as under:

Current_Field_Name	Required_Field_Name
Date	Report_Date
Sales	Sales_Amount
Product	Product_Description
Type	Product_Type
No_of_inst	No_of_EMIs
Customer_Type(Cust_Type)	Constitution
DP	Allowed_Amount

I will maintain above list in excel so that I can dynamically update field names. I intend to use Table.ToRows or List.Zip but I am not sure how to use it as I am facing constraints as mentioned in foregoing points.

I request you all to help me with a solution which can take care all of the above requirements.

Regards

v-jingzhang · ‎03-13-2023

Hi @CMMaliniJoshi

I create a sample according to your description. You may download my sample file to see the solution. The files I used are in the zip file.

Please notice: in "Field Names" query, you need to have multiple rows for columns that have same kind of data but different names in CSVs.

In "RequiredFields of Report1(Report2)", it should have all required field names in a single column.

For each report data query, we can provide a large enough number (e.g. 20 in my sample) for the "Columns" parameter in Csv.Document. This number should be larger than the possible maximum no. of columns in the future. Then remove top 3 rows, promote the first row as headers, select only required columns and rename these columns. Then append two queries into a single query to combine the data.

Change column data types in the combined query after you get all data for futher modeling/calculation.

Best Regards,
Community Support Team _ Jing
If this post helps, please Accept it as Solution to help other members find it.

View solution in original post

wdx223_Daniel · ‎03-08-2023

think it's better to provide some sample to test, and make you easily to deploy the code to your real data.

CMMaliniJoshi · ‎03-08-2023

Dear wdx223_daniel

Thanks for taking interest. I am providing herewith some sample data. Hope it helps. My actual data is company provided hence, I cannot share here.

Report 1

ABC CO. LTD, CENTRAL BRANCH
REPORT AS ON 01.02.2023

Date,Sales,Product,Type,No_of_inst,EMI,Last_EMI_Dt,Cust_Type_Code,Customer_Type
20.Feb.25,2269.96,Product1,Short_Term,119,6145,31.Jan.35,1,Individual
06.Feb.24,2090.39,Product2,Short_Term,110,7023,30.Apr.33,4,Company
09.Apr.24,58.07,Product3,Short_Term,107,6099,31.Mar.33,4,Company
10.Jul.24,720.39,Product4,Short_Term,109,5539,31.Aug.33,2,Proprietorship
01.Sep.24,1283.18,Product5,Short_Term,113,9856,28.Feb.34,3,Partnership
16.Aug.25,2149.1,Product3,Short_Term,119,7637,31.Jul.35,4,Company
29.Oct.24,2147.55,Product1,Short_Term,110,9566,31.Dec.33,1,Individual
19.Mar.24,736.56,Product5,Short_Term,111,5259,30.Jun.33,4,Company
29.Nov.25,894.57,Product1,Short_Term,117,6581,31.Aug.35,4,Company
06.Jun.24,682.96,Product3,Short_Term,109,6413,31.Jul.33,3,Partnership
19.Oct.24,2025.6,Product1,Short_Term,115,8197,31.May.34,1,Individual
06.Aug.25,1521.47,Product1,Short_Term,111,9744,30.Nov.34,3,Partnership
19.Jan.25,1425.18,Product4,Short_Term,106,8167,30.Nov.33,3,Partnership
07.Jan.24,215.61,Product4,Short_Term,113,8674,30.Jun.33,2,Proprietorship
15.May.24,2206.33,Product5,Short_Term,109,9000,30.Jun.33,4,Company

Report2

ABC CO. LTD, CENTRAL BRANCH
REPORT AS ON 01.02.2023

Date,Sales,Product,Type,DP,Stock_St_Dt,Cust_Type_Code,Cust_Type
24.Feb.25,41.05,Product3,Long_Term,12,16.Mar.25,1,Individual
12.May.25,1929.81,Product1,Long_Term,23,01.Jun.25,4,Company
24.May.24,2475.21,Product3,Long_Term,34,13.Jun.24,1,Individual
26.Feb.24,1472.55,Product1,Long_Term,45,17.Mar.24,4,Company
11.Aug.25,1196.43,Product1,Long_Term,56,31.Aug.25,1,Individual
01.Aug.25,2462.81,Product1,Long_Term,67,21.Aug.25,4,Company
05.Oct.25,689,Product3,Long_Term,78,25.Oct.25,2,Proprietorship
25.Nov.25,896.54,Product1,Long_Term,89,15.Dec.25,3,Partnership
06.Nov.25,1975.55,Product5,Long_Term,100,26.Nov.25,2,Proprietorship
18.May.24,562.39,Product5,Long_Term,111,07.Jun.24,1,Individual
02.Nov.24,1271.88,Product1,Long_Term,122,22.Nov.24,3,Partnership
03.Mar.24,1673.21,Product5,Long_Term,133,23.Mar.24,3,Partnership
26.Feb.24,190.09,Product3,Long_Term,144,17.Mar.24,3,Partnership
01.Jun.25,1780.44,Product4,Long_Term,155,21.Jun.25,3,Partnership
08.Mar.24,1840.18,Product4,Long_Term,166,28.Mar.24,1,Individual

v-jingzhang · ‎03-13-2023

Hi @CMMaliniJoshi

I create a sample according to your description. You may download my sample file to see the solution. The files I used are in the zip file.

Please notice: in "Field Names" query, you need to have multiple rows for columns that have same kind of data but different names in CSVs.

In "RequiredFields of Report1(Report2)", it should have all required field names in a single column.

For each report data query, we can provide a large enough number (e.g. 20 in my sample) for the "Columns" parameter in Csv.Document. This number should be larger than the possible maximum no. of columns in the future. Then remove top 3 rows, promote the first row as headers, select only required columns and rename these columns. Then append two queries into a single query to combine the data.

Change column data types in the combined query after you get all data for futher modeling/calculation.

Best Regards,
Community Support Team _ Jing
If this post helps, please Accept it as Solution to help other members find it.

CMMaliniJoshi · ‎03-14-2023

Dear v-jingzhang

Thanks a lot for taking interest. The solution is definitely useful and it is on the lines of my requirement. However, kindly help / advise on -

1. Last step on each of Report1 / Report2 quereis (Reproduced hereunder). Since, I am new to Power Query I am unable to understand this constructoin.

"= Table.TransformColumnNames(#"Removed Other Columns", each let vPosition = List.PositionOf(#"Current Field Names", _) in if vPosition >= 0 then #"Required Field Names"{vPosition} else _)"

This step is super cool. Please advise how it works.

2. The queries are providing me the solution which I need except one point. As advised by you, the columns argument in CSV.Document has to be provided as an input. It would be much helpful if there is a way to make it dynamic / automize the counting of all columns before retrieval of data from binary file CSV. Please advise if you can provide solution for that.

Regards

v-jingzhang · ‎03-15-2023

Hi @CMMaliniJoshi

1. You could first refer to the following docs to learn about the expressions and functions I used in that statement.

Table.TransformColumnNames

List.PositionOf

M Language Let

M Language Conditionals

Table.TransformColumnNames intends to transform each column name in the query one by one. Before transforming, the variable "vPosition" will try to find if a name exists in the #"Current Field Names" list. If it exists, it will return its offset in the list. If it doesn't exist, it will return -1 by default. Then the "if" conditional statement will return different values according to the vPosition's result. When vPosition is larger than or equal to 0, this means the original column names exists in #"Current Field Names" list, so it returns the corresponding position's name from #"Required Field Names" list. When vPosition is smaller than 0, this means the original column name doesn't exist, so it returns its name as it is in case of any error.

2. For the second point, I'm sorry I don't have any solution. We can only know the no. of columns after removing top 3 rows after the retrieval of data from CSV file. However the columns argument needs to be provided when the file is parsed as CSV file, so I have no idea how to get the no. of columns dynamically before Power Query parses the file as CSV.

Best Regards,
Jing

CMMaliniJoshi · ‎03-16-2023

Dear Jing,

Thank you very much for your excellent explanation for point no. 1. Since there is no option known for point no. 2, I wish to conclude this query to be marked as resolved as rest of my requirements are complete as per your nice solution. Thanks a lot for helping me.

At this point, i have one last request if you can guide me for below mentioned point :

Why do we need to use a "let" and "in" as used in your solution above in last step. Can't it accept it by specifying "each" followed by rest of the statement / function.

With best regards

v-jingzhang · ‎03-16-2023

Hi @CMMaliniJoshi

A "let" and "in" is not a must. You can use below code, which works the same. I used "let" and "int" because it can store the result in a variable and the variable can be used in the rest of the statement directly. This can avoid reevaluating it several times. This is my habit.

= Table.TransformColumnNames(#"Removed Other Columns", each if List.PositionOf(#"Current Field Names", _) >= 0 then #"Required Field Names"{List.PositionOf(#"Current Field Names", _)} else _)

To mark the thread as resolved, you can click the "Accept as Solution" button on my first reply. Thank you!

Best regards,

Jing

CMMaliniJoshi · ‎03-17-2023

Dear Jing

Thanks a lot for your guidance.

With best regards

Dynamic Columns Specification in CSV.Document

Helpful resources

FabCon & SQLCon – Barcelona 2026

Data Days 2026

Power BI DataViz World Championships - June 2026

Get Fabric or SQL Certified for Free.

Dynamic Columns Specification in CSV.Document

Helpful resources

FabCon & SQLCon – Barcelona 2026

Data Days 2026

Power BI DataViz World Championships - June 2026