Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, the Microsoft Fabric Community team will be offering free DP-600 exam vouchers. Prepare now

Reply
twister8889
Helper V
Helper V

Improve performance to refresh data power query - multiples files

Hi guys,

 

I have an import to power bi, the folder that contains multiples excel files, in this folder has 300 files, 400 MB the total size. The transformantion is very simple, but this is taking a lot of time to refresh. Any suggestion to improve the performance? In my dashboard I need to show sum ( cash + stock ) by month for each day of month year.

The excel is one by day for example: Excel1_01012020; Excel2_02012020 (ddmmyyyy)

 

1- Combine and edit method script

let
Source = Excel.Workbook(#"Sample File Parameter1", null, true),
Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(Sheet1_Sheet, [PromoteAllScalars=true]),
#"Kept First Rows" = Table.FirstN(#"Promoted Headers",each [Clients] <> null),
#"Removed Other Columns" = Table.SelectColumns(#"Kept First Rows",{"Clients", "Stock", "Cash"}),
#"Changed Type" = Table.TransformColumnTypes(#"Removed Other Columns",{{"Stock", Int64.Type}, {"Cash", Int64.Type}, {"Clients", type text}}),
#"Removed Bottom Rows" = Table.RemoveLastN(#"Changed Type",1)
in
#"Removed Bottom Rows"

 

2-Other queries script

let
Source = Folder.Files("C:\Users\User01\Desktop\Folder1\New Folder Performance"),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File from New Folder Performance", each #"Transform File from New Folder Performance"([Content])),
#"Removed Other Columns" = Table.SelectColumns(#"Invoke Custom Function1",{"Name", "Transform File from New Folder Performance"}),
#"Expanded Transform File from New Folder Performance" = Table.ExpandTableColumn(#"Removed Other Columns", "Transform File from New Folder Performance", {"Clients", "Stock", "Cash"}, {"Clients", "Stock", "Cash"}),
#"Extracted Text Between Delimiters" = Table.TransformColumns(#"Expanded Transform File from New Folder Performance", {{"Name", each Text.BetweenDelimiters(_, "_", "_"), type text}}),
#"Inserted Literal" = Table.AddColumn(#"Extracted Text Between Delimiters", "Date", each "01/01/2020", type text),
#"Changed Type" = Table.TransformColumnTypes(#"Inserted Literal",{{"Date", type date}}),
#"Inserted Month Name" = Table.AddColumn(#"Changed Type", "Month Name", each Date.MonthName([Date]), type text),
#"Inserted Year" = Table.AddColumn(#"Inserted Month Name", "Year", each Date.Year([Date]), Int64.Type),
#"Inserted Merged Column" = Table.AddColumn(#"Inserted Year", "Month", each Text.Combine({[Month Name], Text.From([Year], "pt-PT")}, "-"), type text),
#"Inserted Day" = Table.AddColumn(#"Inserted Merged Column", "Day", each Date.Day([Date]), Int64.Type)
in
#"Inserted Day"

 

I'm trying to undestand how can improve the refresh to stop taking hours to load data.

Table.buffer? Where use this? Another way to import the data? I appreciate your help

 

1 ACCEPTED SOLUTION

Hello @twister8889 

 

then your problem is not Power Query and your code but your infrastructure, and on this regard I cannot help you. You have to talk to your administrators. 

Something about your code... I cannot see that you are creating your date-columns out of the filename but you create a static date column by using this syntax

#"Inserted Literal" = Table.AddColumn(#"Extracted Text Between Delimiters", "Date", each "01/01/2020", type text),

 You add a new column named "date" and adding a text-value. Then you are using the date-column for creating all your other columns like year etc. As @edhans  is also mentioning is that in a datamodel is better to avoid redundant data and this means that is better to create a date-dimension with all your data (like year) and then connect to your fact table through your date-column.

About your performance you can try this code... maybe it's faster

let
Source = Folder.Files("C:\Users\User01\Desktop\Folder1\New Folder Performance"),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File from New Folder Performance", each #"Transform File from New Folder Performance"([Content])),
#"Removed Other Columns" = Table.SelectColumns(#"Invoke Custom Function1",{"Name", "Transform File from New Folder Performance"}),
#"Expanded Transform File from New Folder Performance" = Table.Buffer(Table.ExpandTableColumn(#"Removed Other Columns", "Transform File from New Folder Performance", {"Clients", "Stock", "Cash"}, {"Clients", "Stock", "Cash"})),
#"Extracted Text Between Delimiters" = Table.TransformColumns(#"Expanded Transform File from New Folder Performance", {{"Name", each Text.BetweenDelimiters(_, "_", "_"), type text}}),
#"Inserted Literal" = Table.AddColumn(#"Extracted Text Between Delimiters", "Date", each "01/01/2020", type text),
#"Changed Type" = Table.TransformColumnTypes(#"Inserted Literal",{{"Date", type date}}),
#"Inserted Month Name" = Table.AddColumn(#"Changed Type", "Month Name", each Date.MonthName([Date]), type text),
#"Inserted Year" = Table.AddColumn(#"Inserted Month Name", "Year", each Date.Year([Date]), Int64.Type),
#"Inserted Merged Column" = Table.AddColumn(#"Inserted Year", "Month", each Text.Combine({[Month Name], Text.From([Year], "pt-PT")}, "-"), type text),
#"Inserted Day" = Table.AddColumn(#"Inserted Merged Column", "Day", each Date.Day([Date]), Int64.Type)
in
#"Inserted Day"


If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too

Have fun

Jimmy

View solution in original post

6 REPLIES 6
Jimmy801
Community Champion
Community Champion

Hello @twister8889 

 

cannot see really a problem with your code. However you can try this. Maybe it helps. Beforehand some other question. You are extracting the date of your file names but then adding a new column with "01/01/2020" and calculating out from your static date the columns Year, Monthname etc. I cannot see any reason for this. You could do all this steps within 2 steps.

Now here the code.. maybe its faster

let
Source = Folder.Files("C:\Users\User01\Desktop\Folder1\New Folder Performance"),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File from New Folder Performance", each #"Transform File from New Folder Performance"([Content])),
#"Removed Other Columns" = Table.Buffer(Table.SelectColumns(#"Invoke Custom Function1",{"Name", "Transform File from New Folder Performance"})),
#"Expanded Transform File from New Folder Performance" = Table.ExpandTableColumn(#"Removed Other Columns", "Transform File from New Folder Performance", {"Clients", "Stock", "Cash"}, {"Clients", "Stock", "Cash"}),
#"Extracted Text Between Delimiters" = Table.TransformColumns(#"Expanded Transform File from New Folder Performance", {{"Name", each Text.BetweenDelimiters(_, "_", "_"), type text}}),
#"Inserted Literal" = Table.AddColumn(#"Extracted Text Between Delimiters", "Date", each "01/01/2020", type text),
#"Changed Type" = Table.TransformColumnTypes(#"Inserted Literal",{{"Date", type date}}),
#"Inserted Month Name" = Table.AddColumn(#"Changed Type", "Month Name", each Date.MonthName([Date]), type text),
#"Inserted Year" = Table.AddColumn(#"Inserted Month Name", "Year", each Date.Year([Date]), Int64.Type),
#"Inserted Merged Column" = Table.AddColumn(#"Inserted Year", "Month", each Text.Combine({[Month Name], Text.From([Year], "pt-PT")}, "-"), type text),
#"Inserted Day" = Table.AddColumn(#"Inserted Merged Column", "Day", each Date.Day([Date]), Int64.Type)
in
#"Inserted Day"


If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too

Have fun

Jimmy

First of all, thank you for your answer...

I need the split date for use in my visualization table, I need to show values by Day, Month Name-Year. So the date is in the file name, I extracted the date and split it.  ( make sense now? )

I don't understand how to do this: You could do all these steps within 2 steps


I created this question considering the company environment, I'm using the VPN and the folder with excels is in the shared folder, and this is taking hours to refresh. Now I tried to reproduce the excel structure in my laptop, with a local folder with 400 MB size, each excel file 464 KB and 928 items, and this is taking 1 minute and 20 seconds.

 

My problem is the VPN, network?.... other things are impacting me? Because as you are said, I cannot see another approach to improve the logic or power query code.

Hello @twister8889 

 

then your problem is not Power Query and your code but your infrastructure, and on this regard I cannot help you. You have to talk to your administrators. 

Something about your code... I cannot see that you are creating your date-columns out of the filename but you create a static date column by using this syntax

#"Inserted Literal" = Table.AddColumn(#"Extracted Text Between Delimiters", "Date", each "01/01/2020", type text),

 You add a new column named "date" and adding a text-value. Then you are using the date-column for creating all your other columns like year etc. As @edhans  is also mentioning is that in a datamodel is better to avoid redundant data and this means that is better to create a date-dimension with all your data (like year) and then connect to your fact table through your date-column.

About your performance you can try this code... maybe it's faster

let
Source = Folder.Files("C:\Users\User01\Desktop\Folder1\New Folder Performance"),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File from New Folder Performance", each #"Transform File from New Folder Performance"([Content])),
#"Removed Other Columns" = Table.SelectColumns(#"Invoke Custom Function1",{"Name", "Transform File from New Folder Performance"}),
#"Expanded Transform File from New Folder Performance" = Table.Buffer(Table.ExpandTableColumn(#"Removed Other Columns", "Transform File from New Folder Performance", {"Clients", "Stock", "Cash"}, {"Clients", "Stock", "Cash"})),
#"Extracted Text Between Delimiters" = Table.TransformColumns(#"Expanded Transform File from New Folder Performance", {{"Name", each Text.BetweenDelimiters(_, "_", "_"), type text}}),
#"Inserted Literal" = Table.AddColumn(#"Extracted Text Between Delimiters", "Date", each "01/01/2020", type text),
#"Changed Type" = Table.TransformColumnTypes(#"Inserted Literal",{{"Date", type date}}),
#"Inserted Month Name" = Table.AddColumn(#"Changed Type", "Month Name", each Date.MonthName([Date]), type text),
#"Inserted Year" = Table.AddColumn(#"Inserted Month Name", "Year", each Date.Year([Date]), Int64.Type),
#"Inserted Merged Column" = Table.AddColumn(#"Inserted Year", "Month", each Text.Combine({[Month Name], Text.From([Year], "pt-PT")}, "-"), type text),
#"Inserted Day" = Table.AddColumn(#"Inserted Merged Column", "Day", each Date.Day([Date]), Int64.Type)
in
#"Inserted Day"


If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too

Have fun

Jimmy

Hello @Jimmy801 

Still, about the column date 01/01/2020. I tried to Extracted Text Between Delimiters _ and _ because the filename is Excel1_01012020_Team. So I tried to use the add column by example, and after I needed to transform this into a date, month, monthname, year, day. However I'm testing now, and I don't have the expected result, maybe I need to do this another way.


I think that the problem is the 'structure' that I'm using VPN, because in my laptop and local folder the refresh it's not so much slow ( minor than 2 minutes )

 

I understand about the star schema, and about the creation dim date, but this request is to count the rows by excel (date, year, month, and day). Imagining that the power query is ok, I don't know if it's necessary to create the date dim, but I will create it to test in the company environment.

@twister8889 You are putting everything in one big flat table. That isn't efficient for Power BI. Just have your date as a pure date in that table. Then have a date table that is separate that will have day, month, name, etc. May MANY different fields. By putting everything in one table is hugely inefficent. Power BI expects a Star Schema. Below is a code to my date table, and the importance of a star schema.

 

Microsoft Guidance on Importance of Star Schema

And to show you the importance of this, Amir Netz, the person largely responsible for the existence of Power BI in its current implementation, explains why flat tables are bad for Power BI.

 

Creating a Dynamic Date Table in Power Query

 

Doing these will both improve the performance of the DAX model itself and speed up the refresh time as you can get rid of all of thse Add Column steps and radically reduce the size of the data being loaded.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

Hello, @edhans thanks for your answer...

 

I will try to implement this solution, dim date. However, in my case, I think the problem is my structure. I will create this solution with two tables, dim date and fact file with a count of rows, and I go back here with feedback

Helpful resources

Announcements
September Hackathon Carousel

Microsoft Fabric & AI Learning Hackathon

Learn from experts, get hands-on experience, and win awesome prizes.

October NL Carousel

Fabric Community Update - October 2024

Find out what's new and trending in the Fabric Community.