The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.
Good morning everyone,
I need some help manipulating data. I’ve tried but get stuck at several steps—maybe my approach is wrong. I still consider myself a beginner.
My goal is to have a dashboard that automatically pulls all the XLS files located in a specific folder.
I’m sure all the XLS files in that folder are set up the same way and all have the same sheet name from which to take the data (sheet "BUDGET").
Each XLS file corresponds to a different project.
Attached is a link to access my Google Drive where there are 2 example files:
Now I explain step by step what I need to do.
General premise: I need the system to always work. For example, if PBI encounters an error in a specific file, I prefer that file to be excluded from the analysis so that the dashboard shows fewer but only useful items. Later I will fix the problematic XLS file.
Load from a specific folder all XLS files inside it (consider that new files will be added over time, so I need to be able to refresh in PBI so it automatically includes any new files).
The system must only consider sheets named exactly "BUDGET". It must exclude all sheets named differently (including ones named e.g. "BUDGET_OLD", "BUDGET CLIENT", or others).
I need to do the following steps to further process the data:
Remove all empty rows
Specify that the first row should NOT be used as header
Column 1 must be named "Client"
Column 2 must be named "Project"
Column 3 must be named "Zone"
Column 4 must be named "Cat_Code"
Column 5 must be named "Subcat_Code_Unique"
Column 6 must be named "Subcat_Code"
Column 7 must be named "Subcat_Description"
Column 8 must be named "UM"
Column 10 must be named "AAA CONTRACT"
Column 12 must be named "AAA BUDGET"
Column 14 must be named "AAA COST".
Filling "Subcat_Code_Unique": I temporarily filter to fill values in column 5.
The temporary filter must consider values from column 6 ("Subcat_Code") and keep only rows where the value is numeric, excluding nulls, text, or symbols.
Then write in column 5 ("Subcat_Code_Unique") the concatenation of columns (1 + "" + 2 + "" + 3 + "" + 4 + "" + 6).
By temporarily filtering column 6 for values with a letter followed by a closing parenthesis, I can determine the values in column 7 that correspond to "Cat_Description".
These values are common across the different XLS files.
I also need the "Cat_Code" values to be consistent across all files read. Meaning each XLS file contains the same codes, and in the dashboard I want to filter—for example for Cat_Code "A"—all values from different projects.
The columns "AAA CONTRACT" and "AAA BUDGET" contain direct values entered in each XLS cell; differently, the next column "AAA COST" corresponds to the sum of subsequent columns which have costs for different subcontractors Sub1, Sub2, Sub3, etc. The "AAA COST" column thus reports row by row the total costs of each Sub.
In the dashboard, I need to relate, for each Sub (Sub1, Sub2, Sub3, ...), the corresponding costs linked to each Subcat.
So I want to analyze data showing, for each project, costs divided by Cat and Subcat, and be able to see costs per Sub divided by Zone, Cat, and Subcat.
Finally, I need to exclude any additional columns after those named below the generic Sub heading "Sub".
I don’t know if this is easy or even possible to do.
I would be very grateful if you could help me.
Best
Marco
Hi guys,
I'm here once again.
I followed your advice and I bought the book "Collect, Combine, and Transform Data Using Power Query in Excel and Power BI".
Very helpful and thank you all for suggest.
Now I did something on PBI with that xls files.
I little modified the XLS files to be easyer for me and for PBI to manage them.
But now I'm here becouse I stille have some problems.
Here you can find XLS Files and PBIX files.
google drive file
I don't know if I use the best way to reach my goal but I'm confident that this is a good path.
if you download and open the PBIX file you will see that all it's ok.
But I have a little problem that I didn't resolve.
I repeat that my goal is to have data from different XLS files in the folder.
XLS are all made with same columns from 1 to AAA COSTS. After that we can have different number of columns.
and the same is obviously for rows.
The problem is that I need to sum AAA CONTRACT, AAA BUDGET and AAA COSTS also if inside the colums at the right of AAA COSTS will have no data.
infact if you look at XLS files and data inside PBI dashboard you will see that I loose some data when I applied the step "Unpivoted" inside the Power Query.
To see what data I'm missing, you need to keep an eye on the Subcat_Description column where: for both the XLS file TSC_LIM and the TSC_NOL file, you'll see that the row corresponding to the "var1" entry is missing. In fact, if you notice, in correspondence with this row, in the columns major than AAA COSTS there are no numeric values.
Now I don't know how to proceed, how to make PBI take the lines in question into account.
I kyndly ask if someone of you can help me to modify the Power Query steps or suggest me the correct way to resolve my problem.
Thank you in advance.
Best
Marco
Hi @Marco_88
Based on my understanding of your requirements, I have listed the points below. Please take a look.
It looks like you're losing some rows (like "var1") in Power BI because of how the subcontractor columns are being unpivoted. Basically, if a row has no values in those columns (the ones after "AAA COSTS"), Power BI might just drop it during the unpivot step.
To avoid that, try unpivoting the columns in a way that keeps all rows even the ones where everything is blank. One easy way is to use Unpivot Other Columns and make sure you're not accidentally filtering anything out. You can also replace the nulls with zeros afterward if that helps keep things clean.
This way, all your data stays intact, even if some rows have no subcontractor costs, and your dashboard reflects the full picture.
If I misunderstand your needs or you still have problems on it, please feel free to let us know.
Hi @Marco_88
Hope everything’s going smoothly on your end. I wanted to check if the issue got sorted. if you have any other issues please reach community.
Hi guys,
I'm here once again.
I followed your advice and I bought the book "Collect, Combine, and Transform Data Using Power Query in Excel and Power BI".
Very helpful and thank you all for suggest.
Now I did something on PBI with that xls files.
I little modified the XLS files to be easyer for me and for PBI to manage them.
But now I'm here becouse I stille have some problems.
Here you can find XLS Files and PBIX files.
I don't know if I use the best way to reach my goal but I'm confident that this is a good path.
if you download and open the PBIX file you will see that all it's ok.
But I have a little problem that I didn't resolve.
I repeat that my goal is to have data from different XLS files in the folder.
XLS are all made with same columns from 1 to AAA COSTS. After that we can have different number of columns.
and the same is obviously for rows.
The problem is that I need to sum AAA CONTRACT, AAA BUDGET and AAA COSTS also if inside the colums at the right of AAA COSTS will have no data.
infact if you look at XLS files and data inside PBI dashboard you will see that I loose some data when I applied the step "Unpivoted" inside the Power Query.
To see what data I'm missing, you need to keep an eye on the Subcat_Description column where: for both the XLS file TSC_LIM and the TSC_NOL file, you'll see that the row corresponding to the "var1" entry is missing. In fact, if you notice, in correspondence with this row, in the columns major than AAA COSTS there are no numeric values.
Now I don't know how to proceed, how to make PBI take the lines in question into account.
I kyndly ask if someone of you can help me to modify the Power Query steps or suggest me the correct way to resolve my problem.
Thank you in advance.
Best
Marco
Hi @Marco_88
Thank you for reaching out to the Microsoft Fabric Forum Community.
Their suggestions are very useful and accurate. Please try the recommended steps, and if the issue persists, feel free to reach out here. We are happy to assist you.
Thanks.
Hi @Marco_88
I wanted to check if you had the opportunity to review the information provided by users. Please feel free to contact us if you have any further questions.
@Marco_88 I would be surprised to see anyone document all those steps for you, so instead I highly recommend getting Gil Raviv's book called, "Collect, Combine, and Transform Data Using Power Query in Excel and Power BI." It's one of the most useful resources I've bought to guide on accomplishing what you're looking to.
Hi bchager,
thanks for the advice, I wasn't familiar with this book. I just bought it, so I hope to be independent soon. I'm sorry I asked so many questions, but it was important to me that the reader could get the pig picture.
Hi @bchager
I was about going to say the same as well. OP is asking for a lot in a single post.
Hi @Marco_88
When connecting to a folder, you'll see a list of all available files—including all file types and even temporary files. The Content column will have a double down-arrow icon next to its name. Clicking this icon prompts Power Query to automatically apply transformations, which might not always align with your specific needs. Note: Prior to clicking this column, make sure to select .xlsx files only and remove those with $ in the filename (temporary files).
Go to Transform Sample File. Any transformations applied there will automatically be applied to all Excel files in the folder. You can undo steps like promoted headers, add a concatenated column, select columns, or apply filters as needed.
Now, go back to Query1 and locate the Invoke Custom Function step. Modify each #"Transform File"([Content]) to try #"Transform File"([Content]) otherwise null. This change ensures that if the automatically created custom function returns an error when invoked, the row will return null instead of causing the query to fail.
@danextian Good advice!
@Marco_88 I also recommend using Copilot to help you accomplish what you're looking to in the Transform Sample File query. You can upload sample data and tell Copilot exactly what data transformation steps need to be applied using M code and it will write it for you. I've recently done the same on raw data that needed many transformations before being usable, and it's brilliant. You may have to tweak the code here and there or tell Copilot where it still doesn't look right, but it should ultimately get your data to where it needs to be.