dskrobowski
Helper I

When to Use a Dataflow vs a Golden Dataset

Hello, I'm struggling to understand the utility of Power BI dataflows over using a "golden" dataset (one promoted and shared for use in multiple reports). I haven't yet found a post that pits these two against each other directly.

 

Assume that I have an average-quality data-warehouse function in my org. Assume also that I don't care about sharing prepared data with other parts of the Power Platform; I'm only interested in optimal Power BI architecture and implementation.

 

Here are my thoughts. Please feel free to correct me where I'm wrong.

  1. Both Power BI dataflows and Power BI datasets contain the data itself. Despite some calling datasets merely the "semantic model on top of the data," my understanding is that a dataset does contain this semantic-model layer but that it also holds (in the case of import mode) the VertiPaq-compressed database. This is why datasets consume space and are subject to size limits in your Power BI tenant.
  2. Both dataflows and "golden" datasets solve the problem of reducing queries against the original data sources. For the latter (as long as you're using import mode), promoting, sharing, and reusing the dataset is the solution, hence the "golden" name. The only savings I can see from using dataflows for query reduction arises once you have multiple "golden" datasets for different business solutions, and I feel I'd need a lot of them before that mattered. Obviously, if your golden datasets use DirectQuery, then querying dataflows will reduce load on the original source.
  3. Assuming that I'm using import mode, why would I want to mirror my data twice: once in the dataflow and again in the compressed dataset?
  4. If you're already using import mode, then introducing dataflows will either needlessly mirror your data or (if your report uses DirectQuery against the dataflow) unnecessarily prevent you from using many DAX functions in the dataset layer, due to DAX limitations for DirectQuery sources.
  5. Given the above, and given that I want to leverage the speed and flexibility of compressed columnar data and full DAX, I can't see why I would consider dataflows.

 

All guidance is greatly appreciated.

1 ACCEPTED SOLUTION
lbendlin
Super User

Here's my biased opinion: dataflows are glorified CSV/Parquet files. They can be useful when you have a slow, poorly performing data source and want to shield your developers from it. The dataflow data can then be spooled efficiently into a dataset.
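For context, "spooling" dataflow output into an import-mode dataset just means connecting to the dataflow from Power Query. A minimal sketch in M, following the navigation shape Power BI Desktop generates for the dataflows connector (the workspace, dataflow, and entity names here are placeholders, not anything from this thread):

```m
// Power Query (M): load a dataflow entity into an import-mode dataset.
// "Sales Workspace", "Staging Dataflow", and "FactSales" are hypothetical names.
let
    Source = PowerPlatform.Dataflows(null),
    Workspaces = Source{[Id = "Workspaces"]}[Data],
    Workspace = Workspaces{[workspaceName = "Sales Workspace"]}[Data],
    Dataflow = Workspace{[dataflowName = "Staging Dataflow"]}[Data],
    FactSales = Dataflow{[entity = "FactSales", version = ""]}[Data]
in
    FactSales
```

The dataset refresh then reads the dataflow's stored output instead of hitting the slow original source, which is the shielding effect described above.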

 

A golden dataset is not only a bunch of tables; it also represents the data model (although that can be circumvented). If well maintained (with some incremental refresh and selective partition refresh sprinkled in) it can be a great starting point for a unified, company-wide data model. Compared to professional data-stewardship tools it is more of a toy, though.
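For reference, the incremental refresh mentioned above hinges on two reserved datetime parameters, RangeStart and RangeEnd, which the service substitutes when generating and refreshing partitions. A minimal filter step might look like this (the server, database, table, and column names are hypothetical):

```m
// Power Query (M): filter step that enables incremental refresh.
// RangeStart and RangeEnd are the reserved Power BI parameters;
// "SalesDb", "Sales", and "OrderDate" are placeholder names.
let
    Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    Filtered = Table.SelectRows(
        Sales,
        each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
    )
in
    Filtered
```

Keeping this filter foldable (so it translates into a WHERE clause at the source) is what makes selective partition refresh cheap.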

 

Golden datasets come with their own baggage - specifically, all the access issues around dataset chaining, but also the design limitations of composite data models.

 

Dataflows do nothing for Power BI report consumers.  To create a pleasant experience for report users you should favor (import mode) datasets over dataflows, and accept the added cost for developers.

 

Recently someone asked if they should use an incremental dataset on top of an incremental dataflow. That's on the same level as running DirectQuery against dataflows.


Thanks for sharing your perspective!
