Hi,
I have set up a dataflow that refreshes incrementally because of the size of the data (many millions of rows). It refreshes in 30 seconds to a minute every day (very happy with that).
Now I would also like to incrementally refresh the Power BI report/dataset that uses this dataflow.
However, if I do that, the refresh in the Power BI service times out after 5 hours, or it does not seem to work at all.
What am I doing wrong? Do you have any experience setting this up?
I have seen a response from May this year about a similar issue, where a responder said something along the lines of: incrementally refresh either the dataflow or the dataset, not both, as that apparently "makes no sense". To me, that response makes no sense.
If I need to pull, say, 100 million rows into a dataflow, it needs to be refreshed incrementally so that I only pull a small portion of it every day.
And when I then want to use this dataflow in a Power BI report/dataset, I don't see any valid reason why I would need to pull the entire 100 million rows every day. That is very wasteful.
After a week of trial and error I finally figured it out. I hope this helps other people in a similar situation.
Issue: A dataset refreshed incrementally against a dataflow is either slow or does not work at all on larger models.
Cause: Query folding is unavailable by default when refreshing against a dataflow.
Fix: In the settings of the dataflow that your dataset uses, go to Settings -> Enhanced compute engine settings -> select the "On" option ("Optimized" is selected by default).
Details: Query folding is normally available when you import tables from a SQL database into your dataset. The same is true when you import tables into your dataflow. However, when you connect a dataset to a dataflow, "View native query" is always greyed out: query folding is switched off by default.
You have to tick "Turn on the enhanced compute engine for this dataflow" for each dataflow against which you plan to run an incremental refresh. This makes query folding, and with it the correct use of the StartDate/EndDate parameters in your dataset, work: the dataflow now understands what the dataset actually wants and can serve only the selected rows.
In my case, a fairly large dataset that could not refresh within 5 hours now refreshes within 40 seconds (incremental refresh on TransactionDate plus LMDT change detection are both used).
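For context, the filter that has to fold looks roughly like the query below. This is a sketch only: the navigation steps are what the Power BI dataflow connector typically generates, but the workspace/dataflow IDs, the "Transactions" entity, and the TransactionDate column are illustrative, and Power BI Desktop's built-in incremental refresh expects the parameters to be named RangeStart and RangeEnd (referred to as StartDate/EndDate above).

```powerquery
// Illustrative dataset query against a dataflow entity.
// With the enhanced compute engine on, the date filter below can fold to the
// dataflow, so each refresh pulls only the rows inside the partition window
// instead of downloading the whole entity first.
let
    Source       = PowerPlatform.Dataflows(null),
    Workspaces   = Source{[Id = "Workspaces"]}[Data],
    Workspace    = Workspaces{[workspaceId = "..."]}[Data],
    Dataflow     = Workspace{[dataflowId = "..."]}[Data],
    Transactions = Dataflow{[entity = "Transactions", version = ""]}[Data],
    Filtered     = Table.SelectRows(
        Transactions,
        each [TransactionDate] >= RangeStart and [TransactionDate] < RangeEnd
    )
in
    Filtered
```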
Thank you for your input.
The report has to be in working condition sooner or later, so if your suggestion is the only option, I will take it (or just revert to using a pbix file for both reporting and ETL).
As a next step I am thinking of raising a support ticket to get an official view from a Microsoft team. I wonder what their take on this would be, and I will post it here for others who may experience the same issue.
A secondary step is to test it on our Power BI Premium setup and see whether the behaviour is the same (out of curiosity).
One thing to keep in mind: Dataflow incremental refresh is a horrible black box. You have no control over anything.
Dataset incremental refresh follows the general SSAS partition refresh pattern and is highly controllable via XMLA.
Interesting, that is a good point. Thank you.
I don't see any valid reason why I would need to pull the entire 100 million rows every day. That is very wasteful.
Sort of. Consider eliminating the dataflow and only using the semantic model.
However if I do that the refresh in the power bi service times out after 5 hours
The limit for a partition refresh is 5 hours or 10 GB, whichever comes first. Use bootstrapping to create the partitions without filling them, and then refresh them individually, one by one.
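To make "refresh them individually" concrete: once the partitions exist, you can process a single one over the XMLA endpoint with a TMSL refresh command (for example from SSMS). The database, table, and partition names below are hypothetical; the partition names are whatever the incremental refresh policy generated.

```json
{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "SalesModel",
        "table": "Transactions",
        "partition": "202301"
      }
    ]
  }
}
```

Running one such command per partition keeps each operation well under the 5-hour/10 GB limit that a single monolithic refresh can hit.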
Troubleshoot incremental refresh and real-time data - Power BI | Microsoft Learn
Thank you for your quick response.
Regarding partitions: the incremental refresh is set up in the same way in both the dataset and the dataflow (by month, refresh the last 10 days, and use LMDT on top to grab just the changes).
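For reference, a policy like that is stored on the table as a refreshPolicy object, which you can see by scripting the table over the XMLA endpoint. The sketch below is a trimmed, illustrative version under that assumption: the period counts are made up, the sourceExpression (the table's full M query) is elided, and the pollingExpression is a hypothetical detect-data-changes query over the LMDT column.

```json
{
  "refreshPolicy": {
    "policyType": "basic",
    "rollingWindowGranularity": "month",
    "rollingWindowPeriods": 60,
    "incrementalGranularity": "day",
    "incrementalPeriods": 10,
    "pollingExpression": "let MaxLMDT = List.Max(Transactions[LMDT]) in MaxLMDT"
  }
}
```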
The total data size is around 4.5GB.
The reason I try to go via a dataflow is to limit the load on the on-premises server (which is the performance bottleneck in my case), i.e. get the raw tables/data into the Power BI service as fast as possible, then do the additional processing in the service, with the added benefit that dataflows are available to other users on the service.
The dataflow import works much faster than the original import via a fully fledged pbix report file. I was, and still am, hoping I can use the dataflow/dataset this way.
So the original question still stands: is it possible to incrementally refresh a pbix model that is connected to an incrementally refreshed dataflow?
Nothing is stopping me from setting up both parts with incremental refresh, but the dataset refresh behaviour so far seems odd; there is something I am still not understanding here.
My recommendation in that case would be to not use incremental refresh in the dataset, and instead swallow your engineering pride and re-ingest the dataflow each time.