Hi,
I'm currently using a dataflow that takes data from an on-prem solution and sends it to a Lakehouse (LH) in Fabric.
Once I do this, I get a file with an odd text combination in its name, ending with .parquet, while still being in Delta format.
What is this file? What is it used for? Can I delete it? Do I use this later rather than the initial Delta file that I get from the actual dataflow?
Thanks in advance.
Hey! Where exactly are you seeing this file? How are you accessing this file?
I'm seeing the file in the LH where I send my data via the dataflow. The following is the start of the parquet file that's created,
so I'm also accessing it via the LH.
Obviously, the original file that I had didn't have a name that resembled the one in the picture.
A dataflow effectively creates a Delta Parquet table, but it doesn't have a way to create a plain file in the Lakehouse yet.
How did you create this Lakehouse? Is this the staging Lakehouse that automatically gets created by the system when you have a Dataflow Gen2?
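For context, the output of a dataflow destination is meant to be queried as a Delta table rather than read file by file. Here is a minimal sketch of reading that table back from a Fabric notebook, assuming the destination Lakehouse is attached as the default lakehouse (so `spark` is already provided) and using a placeholder table name:

```python
# Minimal sketch: read the table a Dataflow Gen2 wrote to a Lakehouse.
# Assumes a Fabric notebook with the destination Lakehouse attached as the
# default lakehouse, where the SparkSession `spark` is pre-created.
# "my_dataflow_output" is a placeholder table name, not one from this thread.

df = spark.read.format("delta").load("Tables/my_dataflow_output")

df.printSchema()  # column names come from the Delta table metadata
df.show(10)       # the underlying parquet part files are never read directly
```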
I've been trying to repro this behavior with some parquet files from the NYC Taxi data, but haven't been able to repro this. Do you think that you could share some repro steps with some publicly available sample data or some sample data that you could share?
Sure, so I'm connecting the dataflow to an on-prem server.
Once this is done, I'm selecting the destination to be the LH that I've more or less just manually created outside of the dataflow.
Each time I now publish the dataflow, a new parquet file is created, which is peculiar.
What should be prefaced is that my original file is not a parquet file, if that's what you are wondering.
What is the intention behind these parquet files being created? And what exactly is a Delta Parquet file? Do we ever use these in production? Moreover, what is the string in the title? What does it represent?
Unfortunately I'm unable to share the data that I'm using as of now, due to NDAs.
Hi @Anonymous ,
We haven't heard from you since the last response and were just checking back to see if you have a resolution yet. Please let us know if you have any further queries.
Are you able to create another Lakehouse and try to replicate this scenario by loading to this other Lakehouse?
Yes, I get the same parquet files.
I still don't understand what they are for, or why they are created. Can you detail that, please?
Here I've sent a screendump of the type of file that is created each time the dataflow is published.
The file name ends with .parquet, and moreover the header is gone.
Additionally, the data order is mismatched, or perhaps lost, compared to before. I have not investigated this in more detail, but the data does not look "similar" with regard to order.
Hi @Anonymous ,
Thanks for the update. Apologies for the issue you have been facing.
The screenshot you shared is from the Dataflows Staging Lakehouse, which is not intended to be consumed by end users. The files you see are files that the dataflow engine stores in that Lakehouse to perform transformations faster. This Lakehouse will be hidden in the next couple of weeks and should never be used by users themselves.
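To show why the file names look like random strings: a Delta table is stored as a set of parquet part files with generated names plus a _delta_log folder, and row order across those part files is not guaranteed, which is also why the data can look reordered. A rough sketch for peeking at that layout from a Fabric notebook, assuming the Lakehouse is attached and using a placeholder table name:

```python
# Rough sketch: inspect how a Delta table is laid out in Lakehouse storage.
# Assumes a Fabric notebook with the Lakehouse attached; mssparkutils is the
# built-in file-system utility there. "my_dataflow_output" is a placeholder.

from notebookutils import mssparkutils

# A Delta table folder contains GUID-named *.parquet part files (the "odd
# text combinations" from the thread) plus a _delta_log/ transaction log.
# These files are internals: query the table instead of reading them, and
# don't rely on row order, which is not preserved across part files.
for entry in mssparkutils.fs.ls("Tables/my_dataflow_output"):
    print(entry.name, entry.size)
```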
Hope this helps. Please let us know if you have any further queries.
Thanks for the clarifications, no worries, I'll just go ahead and manually remove the files for now.
Could you also clarify the difference between the DataflowStagingLakehouse and any other Lakehouse?
Hi @Anonymous ,
In a new workspace, when an end user first launches a Dataflow Gen2, a set of staging artifacts, such as a new Lakehouse and a new Data Warehouse, is automatically created for the whole workspace. These artifacts are used for a set of functionalities available to Dataflow Gen2. The Lakehouse is called the DataflowStagingLakehouse.
A user can see these staging artifacts in the workspace list and in lineage, and can interact with them.
If a user deletes or modifies any of these staging artifacts, the workspace can end up in a state where these artifacts can no longer be used because of misconfiguration.
Hope this helps. Please let me know if you have any other questions.
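For the practical difference: the staging items are simply extra items that sit in the workspace next to the Lakehouses you create yourself, so one way to tell them apart is to list the workspace items. A rough sketch using the Fabric REST API as I understand it; the workspace ID and token below are placeholders, and token acquisition is left out:

```python
# Rough sketch: list workspace items and flag the auto-created staging
# artifacts. Endpoint and response shape are my reading of the Fabric REST
# "List Items" API; workspace_id and token are placeholders.

import requests

workspace_id = "<workspace-guid>"        # placeholder
token = "<bearer-token-for-fabric-api>"  # placeholder, e.g. from azure-identity

resp = requests.get(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/items",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

for item in resp.json().get("value", []):
    # Staging artifacts typically show up with names like
    # DataflowsStagingLakehouse / DataflowsStagingWarehouse.
    if "Staging" in item.get("displayName", ""):
        print(item["type"], item["displayName"])
```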