JoshT
Advocate II

Dataflows - linked table refreshes in different workspaces

I've been experimenting with what happens when a dataflow in workspace B contains a linked table that references a dataflow in workspace A. Here's what I did and what I found:

 

  • Create the dataflow in workspace A with some sample data (I was using a simple user table) and refresh it
  • Create the dataflow in workspace B with a linked table referencing the table in workspace A, and don't refresh it
  • Connect to the workspace B dataflow from Power BI Desktop and refresh the semantic model: the table contains the data from the workspace A dataflow (i.e. I didn't need to refresh the workspace B dataflow)
  • Add new data to the source of the workspace A dataflow and refresh the workspace A dataflow, but not the workspace B dataflow
  • Refresh in Power BI Desktop: the table does not contain the new data
  • Refresh the workspace B dataflow, then refresh in Power BI Desktop: the table now contains the new data

 

The documentation on linked tables clearly states that 'linked tables simply point to the tables in other dataflows, and don't copy or duplicate the data'. However, since the dataflow in workspace B returns an older version of the data until it is itself refreshed, we can surmise that one of two things is happening:

  • dataflow B is in fact copying the data and maintaining its own copy (i.e. the documentation is wrong)
  • dataflow B is using a pointer as the documentation says, but different versions of dataflow A are maintained and we're pointing to an older version
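
To make the reasoning concrete, here is a toy Python model that replays the steps above against three possible behaviours: a live pointer (what the documentation seems to imply), a copy taken on refresh, and a pointer to a stored version. Everything in it is hypothetical: the class names, the sample rows, and the assumption that the copy or version is first captured when the linked table is created. It says nothing about how Power BI is actually implemented; it only shows which behaviours are consistent with the observations.

```python
from dataclasses import dataclass, field


@dataclass
class SourceDataflow:
    """Toy stand-in for the dataflow in workspace A (hypothetical model)."""
    pending: list = field(default_factory=list)   # rows added at the source, not yet refreshed
    versions: list = field(default_factory=list)  # one stored snapshot per refresh

    def add_rows(self, *rows):
        self.pending.extend(rows)

    def refresh(self):
        latest = self.versions[-1] if self.versions else []
        self.versions.append(latest + self.pending)
        self.pending = []

    def current(self):
        return self.versions[-1]


class LivePointer:
    """Linked table always reads whatever A stored most recently."""
    def __init__(self, source):
        self.source = source

    def refresh(self):
        pass  # nothing to do: reads always go straight to A

    def read(self):
        return list(self.source.current())


class CopyOnRefresh:
    """Hypothesis 1: B keeps its own copy (assumed taken at creation and on refresh)."""
    def __init__(self, source):
        self.source = source
        self.copy = list(source.current())

    def refresh(self):
        self.copy = list(self.source.current())

    def read(self):
        return list(self.copy)


class VersionedPointer:
    """Hypothesis 2: B stores only a reference to one stored version of A."""
    def __init__(self, source):
        self.source = source
        self.version = len(source.versions) - 1

    def refresh(self):
        self.version = len(self.source.versions) - 1

    def read(self):
        return list(self.source.versions[self.version])


def replay(linked_cls):
    """Replay the experiment and return what Power BI Desktop would see each time."""
    a = SourceDataflow()
    a.add_rows("alice", "bob")
    a.refresh()                # create and refresh dataflow A
    b = linked_cls(a)          # create dataflow B, do not refresh it
    seen = [b.read()]          # first Desktop refresh
    a.add_rows("carol")
    a.refresh()                # new data in A, refresh A only
    seen.append(b.read())      # second Desktop refresh
    b.refresh()                # refresh dataflow B
    seen.append(b.read())      # third Desktop refresh
    return seen


# What was actually observed in the experiment, using the sample rows above.
OBSERVED = [["alice", "bob"], ["alice", "bob"], ["alice", "bob", "carol"]]

if __name__ == "__main__":
    for cls in (LivePointer, CopyOnRefresh, VersionedPointer):
        print(cls.__name__, "matches observations:", replay(cls) == OBSERVED)
```

In this model only the live pointer disagrees with what was observed. The copy and the versioned pointer produce identical results, which is why the experiment alone cannot tell those two hypotheses apart.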

I understand that downstream queries might need a refresh, but based on the documentation I would expect linked tables to simply reflect updates in the source. Since there are no efficiency benefits and, as other posts have pointed out, using linked tables requires dataflow B users to have access to dataflow A as well (preventing us from creating a 'presentation layer' of dataflows), why would anyone use linked entities in a different workspace rather than simply disabling the load and creating a query like 'let Source = LinkedTable in Source' to provide that data?

1 ACCEPTED SOLUTION
johnbasha33
Super User

@JoshT 

Your observations raise some interesting points about the behavior of linked tables in Power BI dataflows across different workspaces. Let's address your findings and concerns:

1. **Documentation Accuracy**: It's possible that the documentation on linked tables may not fully capture the nuances of their behavior, particularly when used across workspaces. Microsoft's documentation is generally reliable, but there may be scenarios or edge cases where the behavior deviates from what's described.

2. **Data Copy vs. Pointer**: The behavior you observed could suggest that dataflow B might be maintaining its own copy of the data from dataflow A, rather than just pointing to it. This would contradict the documentation's statement that linked tables don't copy or duplicate the data. However, without access to the underlying implementation details of Power BI's dataflows, it's challenging to conclusively determine the exact mechanism at play.

3. **Versioning and Refresh**: Another possibility is that dataflow B is indeed using a pointer to dataflow A, but it's referencing a specific version or snapshot of dataflow A. This would explain why refreshing dataflow A doesn't automatically propagate changes to dataflow B until the latter is refreshed explicitly.

4. **Usage and Best Practices**: Given the complexities and potential limitations of linked tables, it's worth considering whether they're the most appropriate solution for your scenario. As you noted, there may be little efficiency benefit compared to directly querying the source dataflow. Additionally, managing access permissions across multiple workspaces can add complexity.

5. **Alternative Approaches**: Disabling the load of linked entities and creating custom queries to reference dataflows directly may indeed offer more flexibility and control over data access and refresh behavior. This approach allows you to explicitly manage the flow of data and updates without relying on the behavior of linked tables.

In conclusion, while linked tables provide a convenient way to reference data across different workspaces in Power BI dataflows, their behavior and limitations warrant careful consideration. It's essential to understand how they operate in your specific scenario and evaluate alternative approaches to ensure optimal data management and performance.

Did I answer your question? Mark my post as a solution! Appreciate your Kudos !!


2 REPLIES 2

Thanks for your response @johnbasha33, it helps to know I'm not going mad. I would imagine the most likely scenario is that the data is being copied and the docs should clarify this; otherwise we would probably have seen some functionality around versioning of dataflows.

 

My main reason for asking this question was to help one of our users with an architectural decision. The conclusion seems to be that there simply isn't any reason for, or advantage to, using linked tables across workspaces, unless the workspace happens to be in Premium anyway: no performance benefits, no additional functionality.
