Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
georgec96
Helper II
Helper II

Power BI dataflow very slow

Hi guys, I am new to Power BI so sorry in advance if this is a silly question.

 

I have a Power BI dataflow as my data source which is about 4 million rows and 15 columns.

 

I can connect successfully to the dataflow however this is extremely slow and I'm not sure if this is to be expected or a problem with my settings.

1. Power BI Desktop takes ages to import the data which is approx 1.6 GB and that takes around 10-15 mins to load, also every time I do something in Power Query once I press close and apply it downloads that whole data again.

Capture.PNG

 

2. Power Query is really slow on simple queries such as filtering, adding, and removing columns ( each query takes around 2-3 mins to process)

 

Thank you in advance 😁

1 ACCEPTED SOLUTION
KNP
Super User
Super User

Hi @georgec96

 

Another option to consider, if you need all that data, is to use direct query for dataflows.

https://docs.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-directquery

 

I've had to use this recently for a large dataset and it works well. It then gives you the option to build aggregation tables in your model to improve performance but also keep the detailed data when necessary.

https://docs.microsoft.com/en-us/power-bi/transform-model/aggregations-advanced#:~:text=Aggregations....

 

Caveat, this will only work with premium or premium per user as you need to be able to turn on the 'Enhanced compute engine settings' for the Dataflow.

 

KNP_0-1644260638886.png

Hope this helps.

 

Have I solved your problem?
Please click Accept as Solution so I don't keep coming back to this post, oh yeah, others may find it useful also ;).
chrome-9xf-Zagzel-B

If you found this post helpful, please give Kudos.
It gives me a sense of instant gratification and, if you give me Kudos enough times, magical unicorns will appear on your screen.
If you find my signature vaguely amusing, please give Kudos.
KIfp67uy-Sr
Proud to be a Super User!PBI-Super-User-Rank-30x30-1x

View solution in original post

8 REPLIES 8
dkuhne
Regular Visitor

Did you find a soultion to this?

 

I am the same as you - Using Dataflows is extremly slow. Every single transformation results in all the data redownloading for each applied step, very slowly. 

I think it relates to API download limits, but thats just a guess.

 

Only solution I can think of, is to do all your transformations in the PowerBi service prior to completing the Dataflow. Then you only have to download it once in your PBIX file

bobmclaren
Regular Visitor

These suggestions are helpful, but I would like to understand better what PBI Desktop is re-downloading the data from my dataflow every time I apply a transformation.  If I am performing a simple operation such as adding two columns together to create a new column, why is PBI re-downloading the data?

@bobmclaren 

 

When you create a calculated column or a measure in Power BI, the data is not re-downloaded because these operations are performed at the Data Model level rather than in Power Query.  Since the calculated column is created at the Data Model level, it does not require Power Query to re-download the data because it has not made changes to the source data as such.

 

In contrast, when adding a new column using Power Query, access to all the data to perform transformations / manipulations is required.  Therefore, with each new change in Power Query, once you hit "Apply" or "Close & Apply," Power Query will need to re-download all of the data again.

 

Importantly, the benefit of making transformations in Power Query is that they do not slow down the Data Model view.  If anything, the transformations / manipulations via Power Query often lead to much faster processing in the front end and far greater system efficiencies / speed.

 

In summary, when you create a calculated column or a measure, the operation is performed at the Data Model level and does not require Power Query to re-download the data.   However, it will slow down the speed / efficiencies in the front end (subject to how much data your Data Model has).  Conversley, when you use Power Query to do the changes, you have much greater speeds / efficiences, but whenever you make  these changes, you then need to "re-download" the data.

 

Hope this helps.

 

Theo 🙂

 

If I have posted a response that resolves your question, please accept it as a solution to formally close the post.

Also, if you are as passionate about Power BI, DAX and data as I am, please feel free to reach out if you have any questions, queries, or if you simply want to connect and talk to another data geek!

Want to connect?www.linkedin.com/in/theoconias

Thank you for your response Theo,

 

I do undertand that changes to things like measures in Power BI do not require a re-download and that makes sense to me.  What I am having trouble understanding is, when I am applying transformations in Power Query, why is the existing data in my preview being re-downloaded?  Today, I created a new column in Power Query by combining two columns, which triggered a re-download of the dataflow.  When I realized that I didn't like the column name that I chose, I simply changed the name of the column, and it re-downloaded all the data again.  This is before I have even clicked "Apply" or "Close and Apply".  This makes for a very slow and frustrating design experience no matter how small I shrink my dataset.  I don't understand why it is necessary, and it leads me to think that I am either doing something wrong, or perhaps the technology (dataflows) is just new, and they haven't had a chance to optimize it yet.  

KNP
Super User
Super User

Hi @georgec96

 

Another option to consider, if you need all that data, is to use direct query for dataflows.

https://docs.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-directquery

 

I've had to use this recently for a large dataset and it works well. It then gives you the option to build aggregation tables in your model to improve performance but also keep the detailed data when necessary.

https://docs.microsoft.com/en-us/power-bi/transform-model/aggregations-advanced#:~:text=Aggregations....

 

Caveat, this will only work with premium or premium per user as you need to be able to turn on the 'Enhanced compute engine settings' for the Dataflow.

 

KNP_0-1644260638886.png

Hope this helps.

 

Have I solved your problem?
Please click Accept as Solution so I don't keep coming back to this post, oh yeah, others may find it useful also ;).
chrome-9xf-Zagzel-B

If you found this post helpful, please give Kudos.
It gives me a sense of instant gratification and, if you give me Kudos enough times, magical unicorns will appear on your screen.
If you find my signature vaguely amusing, please give Kudos.
KIfp67uy-Sr
Proud to be a Super User!PBI-Super-User-Rank-30x30-1x
georgec96
Helper II
Helper II

@TheoC,  Thanks alot for your input, that was extremely helpful, I have one more question though.

 

Would a dataset (either as an import or direct query) be quicker than a dataflow when it comes to loading and refreshing times?

Hi @georgec96 I think the theoretical view is that using data flows will be quicker because the transformations occur in the data load process. However, if there is limited transformation and the data itself is quite clean, I would say this theory may be challenged. Take a look here at this link from Reza. He goes into it a bit but the comments on the blog are brilliant too: https://radacad.com/how-to-use-dataflow-to-make-the-refresh-of-power-bi-solution-faster 

 

I hope this has been of some.use!

Theo 🙂

 

 

If I have posted a response that resolves your question, please accept it as a solution to formally close the post.

Also, if you are as passionate about Power BI, DAX and data as I am, please feel free to reach out if you have any questions, queries, or if you simply want to connect and talk to another data geek!

Want to connect?www.linkedin.com/in/theoconias

TheoC
Super User
Super User

Hi @georgec96 

 

A few things that can help:

 

  1. If you can, reduce the number of records you are using to undertake transformation / cleansing processes.  If you can reduce it to say 20,000 records, you will find that the process to undertake the transformations will be far quicker.  Once you have successfully completed these, then you can add the population of records for the load.
  2. Power BI takes a lot more time processing text-based fields than numerical fields.  Two things to consider:
    1. Although 15 columns isn't much, remove what you don't need.
    2. If you have records with numerical values sitting in Text-based format, convert them to numerical values.
  3. The other thing to consider is the nature of your data. If your historical data remains that way and does not require retrospective change (e.g. you're not dealing with something like insurance claim data that the status may not be closed until 5-6 months from now, etc.), you might use parameters to limit the volume of data you apply in your refresh (i.e. if you have 2 years worth of data and only the last month is subject to change, you might create a parameter to only refresh the last month rather than the last 2 years).

Check out this link as well: https://docs.microsoft.com/en-us/power-bi/guidance/power-bi-optimization

 

There are a lot of ways you can reduce the load times and refresh rates. Hopefully the above will help guide you in the right direction.

 

All the best.

Theo

 

If I have posted a response that resolves your question, please accept it as a solution to formally close the post.

Also, if you are as passionate about Power BI, DAX and data as I am, please feel free to reach out if you have any questions, queries, or if you simply want to connect and talk to another data geek!

Want to connect?www.linkedin.com/in/theoconias

Helpful resources

Announcements
Europe Fabric Conference

Europe’s largest Microsoft Fabric Community Conference

Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.

Power BI Carousel June 2024

Power BI Monthly Update - June 2024

Check out the June 2024 Power BI update to learn about new features.

RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.