Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
Anonymous
Not applicable

Data Inconsistency excel files different structure - build a model

Hi Guys,

I need to build a model using excel files which has different structure into a unique structure.

 

My client has a bunch of excel files coming from an APP (CostX) 

The excel files have different structure for example:

 

  • Different descriptions for the same item
  • Bunch of spaces
  • has no headers
  • The address is in the first rown then the second row start the columns (name, description of the item, cost, etc)
  • subtotals in the middle then some spaces
  • has no pattern for these inconsistencies 

The challenge here is the build a model in power bi to automate when the users save these files in the client's database.

 

Basically I need to build a model to retrieve all these excel files into a unique pattern to build the dashboards, 

 

any idea or help would be very appreciate, thank you so much guys.

 

Fabricio,

1 ACCEPTED SOLUTION
Anonymous
Not applicable

HI @Anonymous ,

For large amount of randomly structure files, I'd like to suggest you do these transform/conversation by external tools instead of M query code. (notice: these M query codes of transform are complex, low performance and hard to maintain, you need to manually fix these steps codes every time major data structure changes)

In fact, current I haven't found any simple solutions to automatically handle fully randomly data structure formatting, maybe you can try to machine learning?

Guide to File Formats for Machine Learning: Columnar, Training, Inferencing, and the Feature Store 

Regards,

Xiaoxin Sheng

View solution in original post

6 REPLIES 6
Anonymous
Not applicable

HI @Anonymous,

>>Different descriptions for the same item

For this scenario, I'd like to suggest you create a dictionary table or use 'fuzzy match' feature to mapping different descriptions of similar items.

Automatically create data dictionary for your Power Pivot model

Fuzzy Match with Power Query 

>>Bunch of spaces

You can take a look at the following blog about use 'Text.Split' function to remove space characters into values:

Clean WhiteSpace in PowerQuery 

>>has no headers

In my opinion, I'd like to suggest you use 'Promoted Headers' function to define the table header if your header is stored in first row of table records or use 'Table.RenameColumns' with 'List.zip' and 'Table.ColumnNames' functions to achieve auto change default table names with default value:

	Table.RenameColumns(PreviousStep,List.Zip({Table.ColumnNames(PreviousStep), {"Columname1","Columname2","Columname3","Columname4","Columname5"}}))

Notice: list.zip function expression should have similar count of items in left and right parameters.

>>The address is in the first rown then the second row start the columns (name, description of the item, cost, etc)

You can add an index column to your table, then you can add a custom column to extract the next/previous odd/even index records(current index +/- 1) then remove empty row to display address and detail descriptions at the same row.

>>subtotals in the middle then some spaces

I'd like to suggest you remove these blank columns and subtotal column.  Power bi will aggregation your records based on its category and legends fields when you use them in visuals, so the subtotal fields are not necessarily.

Work with aggregates (sum, average, and so on) in the Power BI service 

I hope the above description will help with your scenarios.

Regards,

Xiaoxin Sheng

Anonymous
Not applicable

Hi @Anonymous 

 

thanks for your message

 

I'll try to use the fuzzy Match,

 

for you solution, I can do manually for one or two excel files but do you think would be good if I have hundreds of excel files? I don't know excel how many but I know this app where I'm getting the excel files provides all in different patterns, for example It doesn't provide exctely the same strucure of spaces or  columns names and etc,

 

so I need somehow to automate this ETL,

 

Fabricio,

Anonymous
Not applicable

HI @Anonymous ,

For large amount of randomly structure files, I'd like to suggest you do these transform/conversation by external tools instead of M query code. (notice: these M query codes of transform are complex, low performance and hard to maintain, you need to manually fix these steps codes every time major data structure changes)

In fact, current I haven't found any simple solutions to automatically handle fully randomly data structure formatting, maybe you can try to machine learning?

Guide to File Formats for Machine Learning: Columnar, Training, Inferencing, and the Feature Store 

Regards,

Xiaoxin Sheng

Hi @Anonymous ,

 

You need to create a function with Power query that will allow you to automate the treatment of each of the files in a block check the post below:

 

https://www.poweredsolutions.co/2019/02/19/parameters-and-functions-in-power-bi-power-query-custom-functions/

 

The only question here is that you probably need to apply this to a folder or a sharepoint folder to get all the files as a parameter in your custom function.

 

Regards,

MFelix


Regards

Miguel Félix


Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

Check out my blog: Power BI em Português



MFelix
Super User
Super User

Hi @Anonymous,

When you refer yhqt the fike as inconsistency is that every download is different or the files are always in the same format although is not a table format?

If the files are always downloaded in the same format you can make the query to automate the cleaning of the files to become a table format.

If you can share a sample I can make a small query to exemplify what can be done.

Regards,
MFelix

Regards

Miguel Félix


Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

Check out my blog: Power BI em Português



Anonymous
Not applicable

Hi @MFelix 

 

Thanks for your email,

 

Please see below 2 samples, both of the files are regarding to the same Project and the app provides this files in excel but in different format.

 

https://drive.google.com/file/d/17hJ32NxKC3Rz1f5ze7nxMtiJ472eJLWq/view?usp=sharing

 

https://drive.google.com/open?id=1Mbrh7IkGkxTAPrFPYp2BaFw8h4QiT8PD

 

Would be a good solution if the query can do the ETL and bring into the model in just one shape for all the following files.

 

Fabricio

Helpful resources

Announcements
July 2025 community update carousel

Fabric Community Update - July 2025

Find out what's new and trending in the Fabric community.

July PBI25 Carousel

Power BI Monthly Update - July 2025

Check out the July 2025 Power BI update to learn about new features.