Reply
Anonymous
Not applicable

Reports using one "golden dataset" or multiple certified datasets (combined via composite models)?

Hello - We are trying to implement an enterprise BI best-practice setup at our company.

 

Our sysadmin has developed about 60 dataflows, most of which connect via APIs to our ERP and CRM systems. All of these dataflows flow into a single dataset, which he views as the master dataset that everyone should connect to. A post by @MattAllington sort of addresses this setup.

 

However, a video titled "Modern Enterprise BI" from the 2020 summit seems to show another way. In their demo they build separate departmental datasets (such as one for Finance and one for Sales) and then bring them together as a composite model.

 

Which of these is the preferred approach?

 

 

1 ACCEPTED SOLUTION
TomMartens
Super User

Hey @Anonymous ,

 

This is an interesting question; here is my take.

 

I consider it a good idea to use dataflows, as this makes the entities contained in a dataflow reusable.

Now I'm wondering whether your sysadmin is also a domain expert in all the business areas these dataflows touch. If not, you might consider taking over responsibility for developing individual dataflows, or at least providing input to some of the entities. Here I'm talking about the role of a data steward: responsibility for the content of an entity from a business perspective. Since dataflows are a toolset that does not require a deep understanding of IT development concepts like concurrency, async vs. sync, or inheritance, I often recommend that data stewardship also include developing the dataflows. This benefits "citizen data engineering". Of course, this separation comes with a burden: spreading and adhering to best practices across different teams to meet agreed quality standards. I consider this the price of becoming more agile.

Now the dataset ...

I consider it a good idea to have a single dataset, as it depicts the enterprise in its entirety. But an enterprise dataset takes time to develop, and if we need a dataset now, we might create our own. In the coming years there will be much debate about these approaches; it reminds me of the debate between the acolytes of Inmon (the enterprise data warehouse) and Kimball (one business process after the other). This is why composite data models, and the departmental models they combine, exist.

Creating a composite data model comes with a price tag (see "DirectQuery for Power BI datasets and Azure Analysis Services (preview)" on the Microsoft Power BI blog), as report users need Build permission on the underlying datasets.
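As a side note, granting that Build permission can be automated. Here is a minimal sketch using the Power BI REST API ("Post Dataset User" on a workspace dataset), where the `ReadExplore` access right corresponds to Read + Build; the workspace/dataset IDs and user principal are placeholders, and the endpoint shape should be verified against the official API reference:

```python
# Sketch: granting Build permission on a dataset via the Power BI REST API.
# The "ReadExplore" access right corresponds to Read + Build; endpoint shape
# per the public "Datasets - Post Dataset User" API (verify against the docs).

def build_permission_payload(user_upn: str) -> dict:
    """Request body granting Build (ReadExplore) access to one user."""
    return {
        "identifier": user_upn,          # user principal name (UPN)
        "principalType": "User",
        "datasetUserAccessRight": "ReadExplore",  # Read + Build
    }

def dataset_user_url(group_id: str, dataset_id: str) -> str:
    """URL for the Post Dataset User call on a workspace (group) dataset."""
    return (
        "https://api.powerbi.com/v1.0/myorg/"
        f"groups/{group_id}/datasets/{dataset_id}/users"
    )

# Usage (requires an Azure AD bearer token with Dataset.ReadWrite.All):
#   requests.post(dataset_user_url(workspace_id, dataset_id),
#                 headers={"Authorization": f"Bearer {token}"},
#                 json=build_permission_payload("report.author@contoso.com"))
```

In practice you would grant Build to a security group rather than individual users, so the permission follows team membership.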

On top of that, calculation items from one dataset combined with measures from another dataset can return unexpected results.

Of course, a composite model has benefits: its components can be distributed across different capacities (assuming Premium or Premium Per User is available). This is useful if certain data must reside in a particular data center for compliance reasons, because it must not cross borders for storage. Another reason is spreading datasets across capacities to overcome performance bottlenecks caused by concurrency (harvesting the benefits of distributed computing).

In my experience, composite models are not something to aim for in themselves; that is, do not break an existing model apart just to have one.

 

Hopefully this helps you make your decision.

 

Regards,

Tom



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany


Anonymous
Not applicable

Excellent, thank you @TomMartens. I will certainly take all of this into consideration as we move forward in the coming days and weeks.
