LearninPowerBI
New Member

Question about how Dataflows and Dataset work

Hello fellow PowerBI users,

I just started using Power BI dataflows and have some questions that I am hoping someone can answer.

I have two dataflows, Dataflow A and Dataflow B. Dataflow A connects to SQL Server 'XYZ' and a PostgreSQL DB 'PDB'. Dataflow B connects to a few tables from Dataflow A, some additional Oracle tables, and an Excel file. I noticed that Dataflow B also connects to SQL Server 'XYZ' directly and brings in a few tables that it could have taken from Dataflow A; I am not sure why the developer chose to connect to the SQL Server separately. I verified that the tables and underlying data are the same in both places. I have now been tasked with optimizing Dataflow B, as it is consuming a lot of resources on the Premium capacity. To do that, I need a few things clarified.

 

1. Can we have a standalone dataflow, i.e. one with no dataset being populated at the end?

2. Is the data from a dataflow saved anywhere? I believe yes, in Azure Gen2 storage.

3. When we trigger a refresh of a dataflow that uses another dataflow (in my case, a refresh of Dataflow B), does it also run the other dataflow (Dataflow A), or will it simply get the data from Gen2 storage, where the output of Dataflow A is saved? (Related to my second question above.)

4. If I have both dataflows set to refresh every 2 hours, will there be any conflict?

5. Will using tables from the existing dataflow, instead of connecting to the database tables again, help reduce overall refresh time and capacity resource utilization?

 

Thanks!

1 ACCEPTED SOLUTION
lbendlin
Super User

1. yes, but why would you do that?

2. yes, Parquet files in the Azure cloud. Not accessible directly.

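3. no, refreshing Dataflow B does not run Dataflow A; it reads the output that Dataflow A has already stored.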

4. define "conflict". The two dataflows will potentially be out of sync. But refreshes of dataflows and datasets do not affect the "current" data until the refresh completes successfully, at which point the new data is swapped in. If the refresh fails, the "current" data continues to be available.

5. Depends.  The purpose of a dataflow is to shield you (the developer) from slow data sources. It does nothing for your report users, and is not very useful if you have a high performance data source.
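
The swap-on-success behaviour described in point 4 can be pictured with a small toy model. This is only an illustrative Python sketch, not how the Power BI service is implemented; the class `RefreshableTable` and its `refresh` method are hypothetical names made up for this example.

```python
class RefreshableTable:
    """Toy model of swap-on-success refresh (hypothetical; not a Power BI API)."""

    def __init__(self, rows):
        # The "current" data that consumers see right now.
        self.current = list(rows)

    def refresh(self, load):
        """Build a new snapshot with load(); publish it only if load() succeeds."""
        try:
            staged = list(load())  # all work happens on a staged copy
        except Exception:
            return False  # failure: self.current is left untouched
        self.current = staged  # success: the new snapshot is swapped in
        return True


def failing_load():
    # Stand-in for a source that is unavailable during the refresh.
    raise RuntimeError("source unavailable")


table = RefreshableTable([1, 2, 3])

ok_failed = table.refresh(failing_load)
after_failure = list(table.current)   # still the old rows

ok_success = table.refresh(lambda: [4, 5, 6])
after_success = list(table.current)   # the new rows
```

In this model a reader of `table.current` never sees a half-built result: it gets either the previous snapshot or the complete new one. It also suggests why two dataflows on the same schedule can be out of sync, since each one swaps in its own snapshot at its own completion time.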


2 REPLIES 2
LearninPowerBI
New Member

Thank you!

