In other words, what is the advantage of referencing tables in OneLake vs. referencing Delta Parquet tables that are stored in ADLS Gen2 via a shortcut? This is a situation where one team wants to use Azure Synapse (not Fabric) on ADLS Gen2 and another team wants to use that same ADLS Gen2 data in Fabric via shortcuts. Is the Fabric team missing any features that would be available if that same data were in OneLake in this scenario?
Thanks for your response. The scenario is that we have two teams: one on Fabric and another on Azure. They want to share the same set of data. Does it make any difference to the Fabric team if that data is in ADLSgen2 and read via shortcuts, or if it is in OneLake? In other words is the Fabric team losing any features by having that data in ADLSgen2 vs. having it in OneLake? Let me know if you need further clarification.
Hi,
Thanks for explaining your ask.
Yes, there are some features that the Fabric team would be losing by having the data in ADLS Gen2 and reading it via shortcuts vs. having it in OneLake.
Performance: OneLake tables are optimized for performance, especially for queries that involve joins and aggregations. The Fabric team may experience slower performance for queries that involve joins and aggregations if they are reading the data from ADLS Gen2 via shortcuts.
Security: OneLake tables can be secured with role-based access control (RBAC), which makes it easier to manage who has access to the data. The Fabric team would need to manage permissions to the ADLS Gen2 storage account and the Delta Parquet tables if they are reading the data from ADLS Gen2 via shortcuts. This could be more complex than managing permissions to OneLake tables, especially if there are multiple teams accessing the data.
Governance: OneLake tables can be governed with policies, which makes it easier to ensure that the data is used in a compliant manner. The Fabric team would need to develop and implement their own policies to govern the use of the data in ADLS Gen2 if they are reading the data from ADLS Gen2 via shortcuts. This could be more complex than using the built-in policies in OneLake.
In addition to these features, OneLake also offers a number of other advantages, such as:
Overall, OneLake is a more comprehensive and robust data management platform than ADLS Gen2. If you are planning to use Fabric to access the data, it is recommended to use OneLake to store and manage the data.
If the Fabric team needs the best performance, security, and governance features, then the data can be stored in OneLake. However, if the Fabric team is only concerned with reading the data, and does not need the additional features of OneLake, then it is possible to read the data from ADLS Gen2 via shortcuts.
Hope this helps. Please let me know if you have any further queries.
I have attached some links. Please refer to them for more information:
https://www.infoworld.com/article/3704608/understanding-onelake-and-lakehouses-in-microsoft-fabric.h...
https://radacad.com/saying-yes-to-fabric-onelake-shortcuts-no-to-duplicate#:~:text=For%20example%2C%...
Thank you! This is exactly the type of response I was looking for.
One difference that I can think of is that by default all tables written from Fabric are v-ordered Delta Lake tables. With ADLS Gen2, since there is no default per se, you might be using some other table format like standard Parquet which has less functionality than v-ordered Delta Lake tables. But if you are writing to ADLS Gen2 v-ordered Delta Lake tables, then from my understanding that's essentially the same thing as "Fabric Tables in OneLake".
One thing I do not know for sure is if you are able to use Direct Lake Mode from Power BI on a v-ordered Delta Lake table coming into your Lakehouse via a shortcut to ADLS Gen2.
Thanks for your response. This was actually a follow-up question that I had. I have tried creating a shortcut to a CSV file that I had in ADLSgen2, and that worked fine. But I had to load that into a Delta Parquet table in Fabric in order for it to be used by M365. I would assume that I am duplicating the data, in that respect, by having the data on ADLSgen2 in a format other than Delta Parquet.
BTW, in our case we are only reading the ADLSgen2 data and not writing to it. Thanks.
Raw files like CSVs in a Lakehouse (whether they come from a shortcut to ADLS Gen2 or S3, or sit directly in the Lakehouse itself) cannot be queried using the SQL Endpoint or Warehouse. To do this, your data needs to be stored in a structured table format like Delta Lake (the default in Fabric).
If the Azure team that manages the ADLS Gen2 data will not put the data into a structured table format, then you will likely need to do so in your Lakehouse. The ideal scenario, if they are the data owners, would be for them to perform that transformation and create the structured table so that you could create a shortcut to their structured table instead of the CSV.
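As a rough illustration of the distinction above: a Delta Lake table is a folder of Parquet data files plus a `_delta_log` transaction log directory, whereas a CSV drop is just files. The sketch below guesses which case a folder falls into. It assumes the folder is reachable as an ordinary file path (for example through a mount), and the helper names `classify_folder` and `needs_conversion` are made up for this example; they are not part of any Fabric API.

```python
from pathlib import Path


def classify_folder(folder: str) -> str:
    """Guess the storage format of a folder from its contents.

    Hypothetical helper for illustration only; not a Fabric API.
    """
    root = Path(folder)
    if not root.is_dir():
        return "missing"
    # Delta Lake keeps its transaction log in a _delta_log subdirectory.
    if (root / "_delta_log").is_dir():
        return "delta"
    # Otherwise look at the file extensions found anywhere under the folder.
    suffixes = {p.suffix.lower() for p in root.rglob("*") if p.is_file()}
    if ".parquet" in suffixes:
        return "parquet"
    if ".csv" in suffixes:
        return "csv"
    return "other"


def needs_conversion(folder: str) -> bool:
    """True when the folder holds plain Parquet or CSV files rather than a Delta table."""
    return classify_folder(folder) in {"parquet", "csv"}
```

If `needs_conversion` returns `True` for a shortcut target, that is the case discussed here: someone (ideally the data owner) has to write the data out as a Delta table before it shows up as a queryable table.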
Thanks! I was thinking that for Fabric to automatically create the dataset for use by M365, it needs to have Delta Parquet tables. Is that correct?
So, if the tables in ADLSgen2 are in a structured format other than Delta Parquet, they would need to be converted to Delta Parquet in Fabric in order to have the integration with M365, thus duplicating the data in the process. Do I have that right?
I believe you are correct. However, I have not seen any documentation that clarifies whether Direct Lake Mode in Power BI is possible when leveraging v-ordered Delta Parquet tables via a shortcut to ADLS Gen2, or whether the v-ordered Delta Parquet tables must truly live in OneLake.
The closest thing I have found is the graphic below, which seems to indicate that as long as you have a Fabric/Premium capacity, a Lakehouse, and v-ordered Delta Parquet tables, it should work.
Thanks, daxman!
Hi @mwjones61 ,
Thanks for using Fabric Community.
As I understand it, you are trying to compare OneLake and shortcuts here. However, shortcuts are objects in OneLake that point to data in other storage locations. You can create shortcuts in lakehouses and Kusto Query Language (KQL) databases.
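To make the "shortcut points to another location" idea concrete, here is a minimal sketch of the two addresses involved in this scenario: the ADLS Gen2 URI the Azure Synapse team would keep using, and the OneLake-style URI under which the same data would appear in a Lakehouse once a shortcut exists. The URI patterns reflect my understanding of the ABFS addressing conventions; the function names and all account, container, workspace, and lakehouse names are placeholders, not values from this thread.

```python
def adls_uri(account: str, container: str, path: str) -> str:
    """ABFS URI for data stored directly in an ADLS Gen2 account."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.strip('/')}"


def onelake_table_uri(workspace: str, lakehouse: str, table: str) -> str:
    """ABFS URI for a table (native or shortcut) in a Fabric Lakehouse.

    Assumes the workspace and lakehouse are addressed by name; all names
    used with this helper are hypothetical.
    """
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/Tables/{table}"
    )


# The Synapse team reads the source location directly.
source = adls_uri("contosolake", "raw", "/sales/orders/")

# The Fabric team reads the same data through the shortcut in their Lakehouse.
via_shortcut = onelake_table_uri("SalesWS", "SalesLH", "orders")
```

The data is stored once, at `source`; the shortcut only gives Fabric a second address for it.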
There are many advantages of shortcuts in OneLake:
Shortcuts in Fabric offer a number of advantages:
Please do let us know if you have any further queries.
You can refer to the documentation: link