Reply
Anonymous
Not applicable

Reports using one "golden dataset" or multiple certified datasets (combined via composite models)?

Hello - We are trying to implement an enterprise BI best-practice setup at our company.

 

Our sysadmin has developed about 60 dataflows, most of which connect via APIs to our ERP and CRM systems. All of these dataflows flow into a single dataset, which he views as the master dataset that everyone should connect to. A post by @MattAllington sort of addresses this setup.

 

However, a video titled "Modern Enterprise BI" from the 2020 summit seems to show another way. In their demo they build separate departmental datasets (such as one for Finance and one for Sales) and then bring them together as a composite model.

 

Which of these is the preferred approach?

 

 

1 ACCEPTED SOLUTION
TomMartens
Super User

Hey @Anonymous ,

 

This is an interesting question; here is my take.

 

I consider it a good idea to use dataflows, as this makes the entities contained in a dataflow reusable.

Now I'm wondering whether your sysadmin is also a domain expert in all the business areas these dataflows touch. If not, you might consider taking over responsibility for developing individual dataflows, or at least providing input to some of the entities. Here I'm talking about the role of a data steward: responsibility for the content of an entity from a business perspective. Since dataflows are a toolset that does not require a deep understanding of IT development concepts like concurrency, async vs. sync, or inheritance, I often recommend that data stewardship also include developing the dataflows. This benefits "citizen data engineering". Of course, this separation comes with a burden: spreading and adhering to best practices across different teams to meet agreed quality standards. I consider this the price of becoming more agile.

Now the dataset ...

I consider it a good idea to have a single dataset, as it depicts the enterprise in its entirety. But an enterprise dataset takes time to develop, and if we need a dataset now, we might create our own. In the coming years there will be much debate about these approaches; it reminds me of the debate between the acolytes of Inmon (the enterprise data warehouse) and Kimball (one business process after the other). This is why composite data models, and the departmental models they combine, exist.

Creating a composite data model comes with a price tag (see "DirectQuery for Power BI datasets and Azure Analysis Services (preview)" on the Microsoft Power BI blog), as report users need Build permission on the underlying datasets.
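As a side note, granting that Build permission can be automated. Here is a minimal sketch using the Power BI REST API ("Post Dataset User" on a workspace dataset), where the `ReadExplore` access right corresponds to Read + Build; the workspace/dataset IDs and user principal are placeholders, and the endpoint shape should be verified against the official API reference:

```python
# Sketch: granting Build permission on a dataset via the Power BI REST API.
# The "ReadExplore" access right corresponds to Read + Build; endpoint shape
# per the public "Datasets - Post Dataset User" API (verify against the docs).

def build_permission_payload(user_upn: str) -> dict:
    """Request body granting Build (ReadExplore) access to one user."""
    return {
        "identifier": user_upn,          # user principal name (UPN)
        "principalType": "User",
        "datasetUserAccessRight": "ReadExplore",  # Read + Build
    }

def dataset_user_url(group_id: str, dataset_id: str) -> str:
    """URL for the Post Dataset User call on a workspace (group) dataset."""
    return (
        "https://api.powerbi.com/v1.0/myorg/"
        f"groups/{group_id}/datasets/{dataset_id}/users"
    )

# Usage (requires an Azure AD bearer token with Dataset.ReadWrite.All):
#   requests.post(dataset_user_url(workspace_id, dataset_id),
#                 headers={"Authorization": f"Bearer {token}"},
#                 json=build_permission_payload("report.author@contoso.com"))
```

In practice you would grant Build to a security group rather than individual users, so the permission follows team membership.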

On top of that, calculation items from one dataset combined with measures from another dataset can return unexpected results.

Of course, a composite model has benefits: its components can be distributed across different capacities (assuming Premium or Premium Per User is available). This is useful if certain data must reside in a particular data center for compliance reasons, because it must not cross borders for storage. Another reason is spreading datasets across capacities to overcome performance bottlenecks caused by concurrency (harvesting the benefits of distributed computing).

In my experience, composite models are not something to aim for in themselves; that is, do not break an existing model apart just to have one.

 

Hopefully this helps you make your decision.

 

Regards,

Tom



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany


Anonymous
Not applicable

Excellent, thank you @TomMartens. I will certainly take all of this into consideration as we move forward in the coming days and weeks.
