etane
Helper II

How To: Distinct Data Storage and Data Transformation Dataflows

Hello.

 

I am using a Fabric workspace, and I am trying to create reporting from a data source that is still a work in progress.

 

Currently, I have a single Dataflow pulling in 10 tables and merging them into a single table.  However, whenever a modification is applied to the data source, the Dataflow stops refreshing until I fix whatever parameter is now missing or changed in the data source.

 

I thought to create 10 Dataflows, each pulling one table from the data source, and then merge and transform the data in an 11th dataflow linked to the first 10.  That way, if one of the tables is preventing a refresh, it is easy to isolate which table is the culprit, and that table will not prevent the other tables from refreshing.  However, I just learned that I cannot transform a linked dataflow, so this plan isn't working out.

 

So, is there another method that would let me separate the data source load from the data transformation, so I can locate where data source changes occur before the transformation step?

 

Thanks.

 

FYI: the data source is a data warehouse connected to the Service via a MySQL connection.

4 REPLIES
etane
Helper II

I found an easier way to solve the issue.  Dataflow Gen2 allows me to pull in tables from all the other Dataflows and then perform transformations.  So, I load the data into the first 10 dataflows, then use Dataflow Gen2 as the 11th dataflow to combine them all.

And for the dataflow that does not refresh, which in my case is usually a temperamental Data Gateway issue, I use Power Automate to refresh dataflows that failed to refresh.
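
For reference, a query in the combining Dataflow Gen2 that pulls a table from one of the other dataflows looks roughly like the sketch below. This is only an illustration: the workspace, dataflow, and table names are placeholders, and the navigation steps are the ones the Dataflows connector typically generates (the exact connector function can vary by setup).

    let
        Source = PowerPlatform.Dataflows(null),
        Workspaces = Source{[Id = "Workspaces"]}[Data],
        Workspace = Workspaces{[workspaceName = "My Workspace"]}[Data],
        Dataflow = Workspace{[dataflowName = "Dataflow 1"]}[Data],
        // "Orders" stands in for one of the 10 staged tables
        Orders = Dataflow{[entity = "Orders", version = ""]}[Data]
    in
        Orders

One such query per staged table, plus the merge logic, keeps all transformation work in the 11th dataflow while the first 10 dataflows only load data.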

v-veshwara-msft
Community Support

Hi @etane ,
Thanks for posting in the Microsoft Fabric Community.
To address the issue of separating Data Storage and Data Transformation Dataflows, you can slightly modify the approaches you previously mentioned as follows:

Approach 1: Isolated Queries in a Single Dataflow
Instead of multiple dataflows, use a single dataflow with isolated queries for each table, configured to output as separate entities.
Steps:
1. Modify the Current Dataflow:
Instead of merging the 10 tables within the same query, treat each table as a separate query in the dataflow. Do not merge them; just load each query as a raw table (see the sketch after these steps).
2. Output Separate Entities:
In the dataflow, configure each query to output as a separate table.
Add a Lakehouse or Warehouse as the destination to load these tables.
3. Create a Transformation Dataflow:
Use a second dataflow to pull the loaded tables from the Lakehouse/Warehouse.
Merge and transform these tables in the second dataflow.
4. Handle Schema Changes:
When a schema change occurs in the MySQL source, only the query for the affected table will fail.
You can fix that specific query without affecting the other queries or the transformations in the second dataflow.
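
As an illustration, each raw-load query could be as minimal as the sketch below; the server, database, schema, and table names are placeholders for the MySQL warehouse.

    // Query "orders_raw": extract only, no merges or transformations
    let
        Source = MySQL.Database("myserver.example.com", "warehouse_db"),
        orders_raw = Source{[Schema = "warehouse", Item = "orders"]}[Data]
    in
        orders_raw

Repeating this pattern for each of the 10 tables keeps every source table in its own query, so a schema change only breaks the single query that reads the affected table.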



Approach 2: Using Individual Dataflows for Each Table
Steps:
1. Individual Dataflows for Table Loading
Create 10 separate dataflows, each for pulling one table from the MySQL data warehouse.
These dataflows will only extract data and load it into Fabric without any transformations.
2. Staging Layer
Save the outputs of these dataflows into a staging Lakehouse or Warehouse within Fabric.
Use the staging area as a temporary holding point for raw data.
3. Transformation Layer
Create a new dataflow or Data Pipeline to combine, merge, and transform the data from the staging area (a merge sketch is shown after these steps).
Since the staging tables are static within Fabric, the transformations will not break due to changes in the source schema.
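
Assuming the staged tables have already been brought into the transformation dataflow as queries named orders and customers (hypothetical names), the merge step might look like this sketch:

    let
        // Left-join customers onto orders by customer_id (column names are placeholders)
        Joined = Table.NestedJoin(orders, {"customer_id"}, customers, {"customer_id"}, "customer", JoinKind.LeftOuter),
        // Expand only the customer columns needed for reporting
        Result = Table.ExpandTableColumn(Joined, "customer", {"customer_name", "region"})
    in
        Result

Because this query only reads the staged tables, a source schema change surfaces in the storage layer first instead of breaking the merge directly.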



Please let us know if you face any issues while implementing these approaches.

If this post helps, please consider accepting it as a solution to help others find it easily; a kudos would also be appreciated.

 
Best regards,
Vinay.

 

 

etane
Helper II

@v-veshwara-msft  Thank you for your detailed reply!

 

I am new to Fabric and will have to learn how to use a Lakehouse and how to set up a staging area.

 

I did try pulling in a couple of tables via individual dataflows into a Lakehouse.  I then pulled the two dataflows into a single dataflow in a different workspace and merged the two tables.  However, I am getting the same cannot-transform-linked-tables prompt.  Maybe I am not setting up the data in the staging area correctly?

 

I am searching for tutorials.  If you know of some, please share.

etane
Helper II

Ah, shortly after posting this question, I think I found the solution.

 

I pulled in another Dataflow, the same one as Dataflow 1, and then I linked all of Dataflows 2 - 10 to the copy of Dataflow 1.

Then, I can merge all of tables 2 - 10 into table 1.

 

Edit: This method doesn't work.  The Dataflow saves, but it doesn't refresh.  I get the "Error: On-Prem execution not supported for entity... since it refers to another entity" message.
