Hi,
I am wondering what the best approach would be for a customer that has a data warehouse and wants to use Power BI. Their only source is the data warehouse, and several subject areas have overlapping dimensions (like DIM_CALENDAR, DIM_TIME, ...).
One solution architecture would be to set up the data pipeline as follows:
Dataflows -> Dataset -> Reports (where the dataset is based on several dataflow objects, in this case one dataflow per dimension/fact table from the data warehouse)
or simply
Dataset -> Reports (where each dataset imports tables that could be used in other datasets)
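For concreteness, the only real difference between the two options is the source step of each table's Power Query. A minimal sketch in M, assuming placeholder server, database, and dataflow names (the workspace/dataflow GUIDs would come from the actual tenant):

// Option 1: the dataset reads DIM_CALENDAR from a dataflow entity
let
    // All dataflows the signed-in user can access
    Source = PowerBI.Dataflows(null),
    // Navigation uses workspace/dataflow GUIDs; placeholders below
    Workspace    = Source{[workspaceId = "<workspace-guid>"]}[Data],
    Dataflow     = Workspace{[dataflowId = "<dataflow-guid>"]}[Data],
    DIM_CALENDAR = Dataflow{[entity = "DIM_CALENDAR", version = ""]}[Data]
in
    DIM_CALENDAR

// Option 2: the dataset reads the same table straight from the warehouse
let
    Source       = Sql.Database("dw-server", "DataWarehouse"),
    DIM_CALENDAR = Source{[Schema = "dbo", Item = "DIM_CALENDAR"]}[Data]
in
    DIM_CALENDAR

Everything downstream of that step (modelling, measures, reports) is identical either way.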
The customer's requirement is that everything be kept simple, so they just create datasets without even considering dataflows. However, I often read that you should create dataflows before creating datasets to keep all the logic in one place. Their view is that all logic should stay in the data warehouse, and that a dataflow only overcomplicates things when the source is already a data warehouse.
Are there any guidelines on this?
Thank you in advance!
Cheers,
Henny
Hi @HSpeelman ,
When you mention Datasets -> Reports, do you mean every report will have its own dataset, or are you referring to a shared dataset (i.e. you create one 'master' published dataset and then multiple reports connect to that one dataset via a live connection)?
This article explains the problems of having every report connected to a different dataset and recommends the concept of a golden dataset (i.e. shared dataset):
https://exceleratorbi.com.au/new-power-bi-reports-golden-dataset/
I believe this works fine for small organisations. However, once you grow larger, our team found issues with the golden dataset approach as well. It soon became unmanageable as more tables were added, refresh times became too slow, and if one developer was editing the golden dataset, no one else could work on it, which was inefficient.
Due to that, we decided to go for a dataflow -> dataset -> report model.
This provided us with the following benefits:
1) Multiple developers can work on different dataflows at the same time (for example, one person can work on the 'contracts' dataflow, another on the 'cost' dataflow).
2) We could control refreshes better, for example refreshing the cost dataflow every 3 hours, but the contract dataflow only once a month, etc.
I think dataflows are most useful when you have the same table (like a master date table) that you want to reuse across multiple reports.
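As a concrete example, that master date table can live entirely in one dataflow, built with a few lines of M, so every dataset consumes the identical calendar. A minimal sketch (the date range below is arbitrary):

let
    // One row per day across a fixed range (placeholder dates)
    StartDate = #date(2015, 1, 1),
    EndDate   = #date(2030, 12, 31),
    DayCount  = Duration.Days(EndDate - StartDate) + 1,
    Dates     = List.Dates(StartDate, DayCount, #duration(1, 0, 0, 0)),
    AsTable   = Table.FromList(Dates, Splitter.SplitByNothing(), {"Date"}),
    Typed     = Table.TransformColumnTypes(AsTable, {{"Date", type date}}),
    // Add the usual calendar attributes
    WithYear  = Table.AddColumn(Typed, "Year", each Date.Year([Date]), Int64.Type),
    WithMonth = Table.AddColumn(WithYear, "Month", each Date.Month([Date]), Int64.Type),
    Result    = Table.AddColumn(WithMonth, "MonthName", each Date.MonthName([Date]), type text)
in
    Result

If the date logic ever changes, you change it in one dataflow and every downstream dataset picks it up on its next refresh.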
This article explains the dataflow -> dataset -> report architecture quite well:
Power BI Architecture for Multi-Developer - RADACAD
So I don't think there is a one-size-fits-all approach to this. It really depends on the organisation's size, how Power BI is governed there, etc.
But I think those two articles will give you a good basis for deciding whether to go for a shared dataset approach, a dataflow -> dataset approach, or neither, and just stick with each report having its own dataset.
Hi @m_alireza ,
Very well explained answer! I actually posted a similar question (Multi-developer architecture for Power BI - Microsoft Fabric Community) and was wondering if you, by any chance, had an answer to the below.
I've decided to proceed with the dataflow -> dataset -> multiple reports model.
So my question is whether ALL my tables (both big and small) should be in separate dataflows, or only the small/lookup tables (like calendar or product tables)? I cannot see any size limitations and therefore can't understand why everyone recommends only small tables as dataflows.