Hi,
I'm trying to run Maintenance on Lakehouse tables that are in OneLake. However, when I try to run Maintenance on a table with only the Optimize and V-Order flags selected, I receive the following error:
I have even tried running the operation from a Spark SQL notebook, and it failed with a similar error. When I put in the full URL for the OneLake path and workspace, it errored again, saying that the operation could not be completed on Dataverse tables.
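For reference, the notebook attempt was along these lines (the table name and OneLake path here are placeholders, not my real ones):

# Rough sketch of the notebook attempt; table name and path are placeholders.
# Optimize by table name, with V-Order:
spark.sql("OPTIMIZE my_table VORDER")

# Optimize by the full OneLake path, which errors the same way for Dataverse-linked tables:
spark.sql(
    "OPTIMIZE delta.`abfss://<workspace>@onelake.dfs.fabric.microsoft.com/"
    "<lakehouse>.Lakehouse/Tables/my_table` VORDER"
)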
Please let me know how this can be resolved.
Thanks,
Russ
Hi @rtolsma,
Are you able to provide screenshots of the Lakehouse folder structure and of where the table you are trying to run maintenance on is located? That would be helpful for context, since you are mentioning Dataverse but the error is referencing ADLS Gen2.
The Lakehouse is a OneLake lakehouse and was created via Fabric Link. When I tried to run Optimize on a single table in the Lakehouse using Spark SQL, it errored, indicating that tables in the Dataverse cannot be maintained.

My issue is that my client tried to run a single query to drop a view, and it ran for 5 minutes without completing; since then, performance has degraded for every query. When I inspected one of the highly transacted tables exported from D365 F&O, I could see hundreds of 10 MB files, and I can see how that would cause performance issues. I want to optimize the table, but this fails both through the Fabric Maintenance operation and through a notebook.

I hope this provides clarity. Ultimately, I'm trying to improve the client's experience with Fabric: right now they are on an F8 SKU and are seeing poor query performance plus high background CU usage for the data coming from Fabric Link.
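For anyone checking their own tables, the fragmentation is easy to see from a notebook (the table name is a placeholder):

# DESCRIBE DETAIL reports a Delta table's file count and total size;
# a large numFiles with a ~10 MB average file size is the small-file problem.
spark.sql("DESCRIBE DETAIL my_table").select("numFiles", "sizeInBytes").show()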
Hi @rtolsma ,
I believe the issue is that the Dataverse data is not actually managed by Fabric; the tables are pointers to your Dataverse environment, so the error, while annoying, is correct.
Interesting. So why do I see storage consumption in Fabric? When I look at the Fabric Capacity Metrics app, I see consumption continuing to grow, and the OneLake storage for this workspace is at 1.5 TB. There needs to be some way to maintain this?
That is very odd. When you click on the table and view the source files, what does it show you?
I see several hundred 9-10 MB files, as per the screenshot below:
Can you right-click and show the path of a single file?
Is there a _delta_log folder in the directory?
Yes, there is, and it contains approximately 130+ .json files along with some .checkpoint.parquet files.
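If it helps anyone else, the log contents can be counted from a notebook like this (the path is a placeholder for the table's OneLake location):

from notebookutils import mssparkutils

# List the Delta transaction log and count commit files vs. checkpoints
# (placeholder path; substitute the real workspace and lakehouse names).
log_path = ("abfss://<workspace>@onelake.dfs.fabric.microsoft.com/"
            "<lakehouse>.Lakehouse/Tables/my_table/_delta_log")
entries = mssparkutils.fs.ls(log_path)
print(sum(1 for e in entries if e.name.endswith(".json")), "commit files")
print(sum(1 for e in entries if e.name.endswith(".checkpoint.parquet")), "checkpoints")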
Crazy idea: find the location of the Delta table in OneLake and create a new table from that path, then try to VACUUM and OPTIMIZE.
Thanks for the idea. Seems like I'm just going to have to copy the data as you suggested, likely to a warehouse or another lakehouse, and then optimize for queries from there. Thanks!
The CREATE TABLE should not duplicate the data:
spark.sql(f"""create table {table_name}
using delta
location '{delta_folder_path}'"""
)
Since it points to an existing Delta location, it should simply create a metadata entry in the Fabric Lakehouse catalog. Could you humor me and show me the properties of one of the tables?
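If a screenshot is a hassle, the equivalent properties can be pulled in a notebook (the table name is a placeholder):

# Shows the table's provider and, importantly, its Location; for a
# metadata-only table this should be the original Delta path, not a copy.
spark.sql("DESCRIBE TABLE EXTENDED my_table").show(truncate=False)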
Hey richbenmintz,
I'm fighting this same issue and wanted to share the results of creating a new table that points to the existing files.
I created the table using this:
CREATE TABLE mserp_taxtransbientitydhs
USING DELTA
LOCATION 'abfss://UAT_Dataverse@onelake.dfs.fabric.microsoft.com/dataverse_operationstm_cds2_workspace_xxxxxxxxxxxxxxxxxxxxxxxxxxx.Lakehouse/Tables/mserp_taxtransbientity'
Unfortunately, it still gives the error: MethodNotAllowed, "This operation is not supported through shortcuts of account type Dataverse."
Was worth a shot! Thanks for all your suggestions on this one.
Darin
It would be interesting to try, but I'm pretty sure OPTIMIZE would need to optimize (i.e. rewrite) the physical, underlying Parquet files.

To get files better optimized for querying (if the Dataverse-managed files are not), my best bet would be to create a physical copy in a Lakehouse (OneLake) and run OPTIMIZE on the copy.
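A minimal sketch of that copy-then-optimize approach, with placeholder table names:

# Materialize a physical copy of the shortcut table as a regular
# Lakehouse table, then optimize the copy (names are placeholders).
spark.sql("""
    CREATE TABLE my_table_copy
    USING DELTA
    AS SELECT * FROM my_table
""")
spark.sql("OPTIMIZE my_table_copy VORDER")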
Here are the properties for this table -- definitely similar.
It is a shortcut; that would be the only difference.
Yup, makes sense that this is a shortcut into ADLS Gen2 storage that the Fabric instance doesn't have access to, so it's impossible to maintain. But I didn't realize that shortcuts would also increase OneLake storage... It seems like I'm doing exactly what we were told was the benefit, no data duplication, yet it appears I'm consuming both Dataverse storage and OneLake storage.
5 ways to get your Dataverse Data into Microsoft Fabric / OneLake - DEV Community
This blog describes different ways of syncing Dataverse data into Fabric.
Just to be sure, which option in the blog article (link above) are you using?
There are two main options (besides traditional ETL using dataflows, pipelines, or notebooks): Link to Fabric and Azure Synapse Link.
Could you confirm which option you are using?
Anyway, I'm surprised that you're seeing OneLake storage being consumed, although I don't have extensive personal experience with the Link to Fabric or Azure Synapse Link options. Can you confirm that you haven't set up any notebooks/dataflows/pipelines which copy data physically into OneLake?
I notice that your data storage in OneLake was first empty, then there was some data from October 15, before it started increasing rapidly from October 24. Is October 24 the day you activated the link? Or did some other process start generating data on that date, some process which physically stores data in OneLake, in the workspace called Fabric?

Regarding your screenshot showing that the OneLake data is stored in a workspace called Fabric: is this the workspace which contains the Dataverse Lakehouse, or is this another workspace?
There is also a Yammer forum for the Dataverse Link functionality; it covers both the Link to Fabric and Azure Synapse Link options. I think this is the link: Viva Engage : Dynamics 365 and Power Platform Preview Programs (although I cannot confirm 100% that this is the real link to that forum).
Thanks for the references! I'll do some more digging into the actual data, but yes, the inventrans table, as an example, is sitting at 3 GB of Delta Lake files. I'm part of the D365 preview community groups on Yammer, so I may post my questions there and discuss with the Fabric Link product team.
Thanks for all your help as well!
I think the shortcuts can show with "xGB" (for example 3 GB) in Fabric, and the files can show in "view files", but the data should actually be stored in Dataverse if you are indeed using the Link to Fabric option.
But the shortcuts should not show as storage in the Fabric Capacity Metrics app (I think), because the data should be stored in Dataverse when using the Link to Fabric option.
Is there some other (non-shortcut) data in the workspace called Fabric, i.e. some data which is actually stored physically in that workspace?
Best of luck in getting to the root of it! 😀
As mentioned by @richbenmintz, OneLake File Explorer can perhaps provide some insights into this case.
Azure Storage Explorer is another option which provides a different view into the OneLake storage: Integrate OneLake with Azure Storage Explorer - Microsoft Fabric | Microsoft Learn