The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
Our data warehouse Sales mart has several fact tables and dimension tables, so in building the Power BI dataset(s) to support Sales reporting we need to make a choice about the scope of what to include. This seems to boil down to options on a spectrum between:
In theory the first option is most flexible in supporting the widest range of reports without having to duplicate logic, and allows reports to contain content from different facts (essential for calculating things like conversion rates). However, it's more fragile than a more modular approach because if one table breaks in the data refresh, all the reports fail.
This seems to be a trade-off of efficiency and flexibility on one hand vs fragility on the other.
Is this the right way to see it and are there approaches that can get the best of both - i.e. avoid the trade-off.
Solved! Go to Solution.
Understanding and optimizing dataflows refresh - Power BI | Microsoft Docs
It's slightly unclear as to whether this is just incremental refreshes or all refreshes but I think combined with the above confirms it:
"In any of these refresh scenarios, if a refresh fails the data is not updated, which means that your data might be stale until the latest refresh completes, or you refresh it manually and it completes without error. Refresh occurs at a partition or entity, so if an incremental refresh fails, or an entity has an error, then the entire refresh transaction will not occur. Said another way, if a partition (incremental refresh policy) or entity fails for a particular dataflow, the entire refresh operation is set to fail, and no data gets updated."
If you move all your power query logic into modular data flows then they can refresh separately.
Your golden dataset then has minimal power query for each table to reference the relevant dataflows.
In the event that a data flow fails to refresh your dataset will still refresh but be left with old data from the failed data flow but the rest will be fine.
Thanks @bcdobbs
Do you have a link to any documentation on how dataflows handle a failed refresh in terms of the dataset they contain?
Still looking for some documentation however test below appears to confirm my original suggestion. My understanding is that under the covers data flows are persisted as text files in a data lake so assuming old test files are only over written after a full refresh. (I beleive if you hook up your own data lake to it you can see the files).
Test setup:
1) Two dataflows each referencing a different table stored in SQL Server (each just has one entity but they could be modular):
2) They both have a a step that does a distinct on a named column eg:
4) "TableOneLabel" is renamed in SQL database to "TableOneNewLabel"
5) Both data flows are refreshed and Table One gets a refresh error:
6) Dataset is refreshed. This completes with no errors and original data is present:
Understanding and optimizing dataflows refresh - Power BI | Microsoft Docs
It's slightly unclear as to whether this is just incremental refreshes or all refreshes but I think combined with the above confirms it:
"In any of these refresh scenarios, if a refresh fails the data is not updated, which means that your data might be stale until the latest refresh completes, or you refresh it manually and it completes without error. Refresh occurs at a partition or entity, so if an incremental refresh fails, or an entity has an error, then the entire refresh transaction will not occur. Said another way, if a partition (incremental refresh policy) or entity fails for a particular dataflow, the entire refresh operation is set to fail, and no data gets updated."
@Anonymous You've effectively hit the nail on the head for the most part. The new DirectQuery for Live datasets capability has the potential to provide the best of both worlds so I would check that out if you have not already.
Thanks @Greg_Deckler - appreciate you taking the time to respond and help me understand my options better
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Power BI update to learn about new features.
User | Count |
---|---|
121 | |
85 | |
75 | |
55 | |
46 |
User | Count |
---|---|
134 | |
124 | |
78 | |
64 | |
63 |