Solved: Re: What is the practical difference between refer...

mwjones61 · ‎09-30-2023

In other words, what is the advantage of referencing tables in OneLake vs. referencing Delta Parquet tables via shortcut that are stored in ADLS gen2? This is a situation where one team wants to use Azure Synapse (not Fabric) on ADLS gen2 and another team wants to use that same ADLS gen2 data in Fabric via shortcuts. Is the Fabric team missing any features that would be available if that same data were in OneLake in this scenario?

mwjones61 · ‎10-02-2023

Thanks for your response. The scenario is that we have two teams: one on Fabric and another on Azure. They want to share the same set of data. Does it make any difference to the Fabric team if that data is in ADLSgen2 and read via shortcuts, or if it is in OneLake? In other words is the Fabric team losing any features by having that data in ADLSgen2 vs. having it in OneLake? Let me know if you need further clarification.

v-nikhilan-msft · ‎10-02-2023

Hi,
Thanks for explaining your ask.

Yes, there are some features that the Fabric team would be losing by having the data in ADLS Gen2 and reading it via shortcuts vs. having it in OneLake.

Performance: OneLake tables are optimized for performance, especially for queries that involve joins and aggregations. The Fabric team may experience slower performance for queries that involve joins and aggregations if they are reading the data from ADLS Gen2 via shortcuts.

Security: OneLake tables can be secured with role-based access control (RBAC), which makes it easier to manage who has access to the data. The Fabric team would need to manage permissions to the ADLS Gen2 storage account and the Delta Parquet tables if they are reading the data from ADLS Gen2 via shortcuts. This could be more complex than managing permissions to OneLake tables, especially if there are multiple teams accessing the data.

Governance: OneLake tables can be governed with policies, which makes it easier to ensure that the data is used in a compliant manner. The Fabric team would need to develop and implement their own policies to govern the use of the data in ADLS Gen2 if they are reading the data from ADLS Gen2 via shortcuts. This could be more complex than using the built-in policies in OneLake.

In addition to these features, OneLake also offers a number of other advantages, such as:

Data sharing: OneLake makes it easy to share data with other teams and applications, both within and outside of your organization.
Data governance: OneLake provides a number of built-in data governance features, such as RBAC, auditing, and data lineage.
Data quality: OneLake provides a number of data quality features, such as data profiling and data cleansing.

Overall, OneLake is a more comprehensive and robust data management platform than ADLS Gen2. If you are planning to use Fabric to access the data, it is recommended to use OneLake to store and manage the data.

If the Fabric team needs the best performance, security, and governance features, then the data can be stored in OneLake. However, if the Fabric team is only concerned with reading the data, and does not need the additional features of OneLake, then it is possible to read the data from ADLS Gen2 via shortcuts.

Hope this helps. Please let me know if you have any further queries.
I have attached some links. Please refer to them for more information:
https://www.infoworld.com/article/3704608/understanding-onelake-and-lakehouses-in-microsoft-fabric.h...
https://radacad.com/saying-yes-to-fabric-onelake-shortcuts-no-to-duplicate#:~:text=For%20example%2C%...

mwjones61 · ‎10-02-2023

Thank you! This is exactly the type of response I was looking for.

daxman · ‎10-02-2023

One difference that I can think of is that by default all tables written from Fabric are v-ordered Delta Lake tables. With ADLS Gen2, since there is no default per se, you might be using some other table format like standard Parquet which has less functionality than v-ordered Delta Lake tables. But if you are writing to ADLS Gen2 v-ordered Delta Lake tables, then from my understanding that's essentially the same thing as "Fabric Tables in OneLake".
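
To make the Delta vs. plain Parquet distinction concrete, here is a minimal sketch. It relies on the fact that a Delta table keeps its transaction log in a `_delta_log` subfolder, while plain Parquet is just loose `.parquet` files. The function name `classify_table_folder` is hypothetical, and it assumes the folder is reachable as a local or mounted path; for ADLS Gen2 you would list the folder through a storage SDK instead. It does not tell you whether the files are v-ordered.

```python
from pathlib import Path


def classify_table_folder(folder: str) -> str:
    """Return 'delta', 'parquet', or 'other' for a table folder on a local or mounted path."""
    root = Path(folder)
    # A Delta table keeps its transaction log in a _delta_log subdirectory.
    if (root / "_delta_log").is_dir():
        return "delta"
    # Without a log, loose *.parquet files are just plain Parquet.
    if any(root.rglob("*.parquet")):
        return "parquet"
    return "other"
```

A folder that comes back as "parquet" would still need to be converted to Delta before Fabric treats it as a table.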

One thing I do not know for sure is if you are able to use Direct Lake Mode from Power BI on a v-ordered Delta Lake table coming into your Lakehouse via a shortcut to ADLS Gen2.

mwjones61 · ‎10-02-2023

Thanks for your response. This was actually a follow-up question that I had. I have tried creating a shortcut to a CSV file that I had in ADLSgen2, and that worked fine. But I had to load that into a Delta Parquet table in Fabric in order for it to be used by M365. I would assume that I am duplicating the data, in that respect, by having the data on ADLSgen2 in a format other than Delta Parquet.

BTW, in our case we are only reading the ADLSgen2 data and not writing to it. Thanks.

daxman · ‎10-02-2023

Unstructured data like CSVs in a Lakehouse (from either a shortcut to ADLS Gen2, S3, or even directly in the Lakehouse itself) cannot be queried using the SQL Endpoint or Warehouse. To do this, your data needs to be stored in a structured table format like Delta Lake (the default in Fabric).

If the Azure team that manages the ADLS Gen2 data will not put the data into a structured table format, then you will likely need to do so in your Lakehouse. Since they are the data owners, the ideal scenario would be for them to perform that transformation and create the structured table, so that you could create a shortcut to their structured table instead of the CSV.
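
If you do end up converting the CSV yourself, a rough sketch of that step in a Fabric notebook could look like the following. The helper name `csv_to_delta` and the example path and table name are assumptions for illustration; `spark` is the session object that a notebook normally provides, and the reader/writer calls are standard PySpark.

```python
def csv_to_delta(spark, csv_path: str, table_name: str) -> str:
    """Read a CSV file and save it as a Delta table; return the table name."""
    df = (
        spark.read
        .option("header", True)       # first row holds column names
        .option("inferSchema", True)  # let Spark guess column types
        .csv(csv_path)
    )
    # Writing in Delta format is what turns the file into a queryable table.
    df.write.format("delta").mode("overwrite").saveAsTable(table_name)
    return table_name
```

For example, `csv_to_delta(spark, "Files/adls_shortcut/sales.csv", "sales")`, where both the path and the table name are made up. Note that this writes a second copy of the data into the Lakehouse, which is the duplication discussed in this thread.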

mwjones61 · ‎10-02-2023

Thanks! I was thinking that for Fabric to automatically create the dataset for use by M365, it needs to have Delta Parquet tables. Is that correct?

So, if the tables in ADLSgen2 are in a structured format other than Delta Parquet, they would need to be converted to Delta Parquet in Fabric in order to have the integration with M365, thus duplicating the data in the process. Do I have that right?

daxman · ‎10-02-2023

I believe you are correct; however, I have not seen any documentation that clarifies whether Direct Lake Mode in Power BI is possible when leveraging v-ordered Delta Parquet tables via shortcut to ADLS Gen2, or whether the v-ordered Delta Parquet tables must truly live in OneLake.

The closest thing I have found is the graphic below, which seems to indicate that as long as you have a Fabric/Premium capacity, a Lakehouse, and v-ordered Delta Parquet tables, it should work.

mwjones61 · ‎10-02-2023

Thanks, daxman!

v-nikhilan-msft · ‎10-02-2023

Hi @mwjones61,
Thanks for using Fabric Community.
As I understand it, you are trying to compare OneLake and shortcuts here. But shortcuts are objects in OneLake that point to data in other storage locations. You can create shortcuts in lakehouses and Kusto Query Language (KQL) databases.

Shortcuts in OneLake offer a number of advantages:

Data sharing: Shortcuts allow you to share data between different users and applications without having to copy the data. This can help to improve data consistency and reduce data duplication.
Data virtualization: Shortcuts can be used to create virtual data sets that combine data from different sources. This can make it easier to access and analyze data from different sources.
Data governance: Shortcuts can be used to implement data governance policies. For example, you can use shortcuts to restrict access to data to certain users or applications.
Performance: Shortcuts can help to improve performance by reducing the need to move data between different systems.
Cost savings: Shortcuts can help to reduce data storage and processing costs by reducing the need to copy data between different systems.
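
To illustrate the point that a shortcut is a OneLake object pointing at external storage, the sketch below builds the two addresses under which the same table could be reached: directly in ADLS Gen2 (as the Synapse team would) and through a lakehouse shortcut in OneLake (as the Fabric team would). The function names and every account, container, workspace, and lakehouse name are hypothetical; the URI shapes follow the documented `abfss` patterns for ADLS Gen2 and OneLake, and the shortcut is assumed to sit under the lakehouse `Tables` folder.

```python
def adls_uri(account: str, container: str, path: str) -> str:
    """Address of the data where it physically lives in ADLS Gen2."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.strip('/')}"


def onelake_uri(workspace: str, lakehouse: str, path: str) -> str:
    """Address of the same data as seen through a lakehouse shortcut in OneLake."""
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/{path.strip('/')}"
    )


# Hypothetical names: one physical table, two addresses, no copy.
physical = adls_uri("contosolake", "curated", "sales/orders")
virtual = onelake_uri("Analytics", "SalesLakehouse", "Tables/orders")
```

Only the second address goes through OneLake; the bytes stay in the ADLS Gen2 account either way.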

Could you please clarify the ask? I am not able to understand which two things you are trying to compare.
Please let us know if you have any further queries.
You can refer to the documentation: link

What is the practical difference between referencing data via shortcut in ADLS gen2 vs. OneLake?