
Anonymous
Not applicable

Parquet file from On-prem in Fabric

Hi,
I'm currently using a dataflow that takes data from an on-prem solution and sends it to a Lakehouse (LH) in Fabric.
When I do this, I get a file whose name is an odd text combination ending in .parquet, while the table is still in Delta format.

What is this file? What is it used for? Can I delete it? Should I use it later, rather than the initial Delta table that I get from the actual dataflow?

Thanks in advance

1 ACCEPTED SOLUTION

Hi @Anonymous ,

In a new workspace, when an end user first runs a Dataflow Gen2, a set of staging artifacts, such as a new Lakehouse and a new Data Warehouse, is automatically created for the whole workspace. These artifacts support a set of functionalities available to Dataflow Gen2; the Lakehouse is called the DataflowsStagingLakehouse.

Users can see these staging artifacts in the workspace list and lineage view, and can interact with them.

If a user deletes or modifies any of these staging artifacts, the workspace can end up in a misconfigured state in which it can no longer use them.

Hope this helps. Please let me know if you have any other questions.


11 REPLIES
miguel
Community Admin

Hey! Where exactly are you seeing this file? How are you accessing this file?

Anonymous
Not applicable

I'm seeing the file in the LH where I send my data via the dataflow. The following is the start of the Parquet file that's created.

I'm also accessing it via the LH.
Obviously, the original file that I had didn't have a name that resembled the one in the picture.

[Screenshot: RF_consultant_0-1698670828431.png]

A dataflow effectively creates a Delta Parquet table, but it doesn't yet have a way to create a plain file in the Lakehouse.
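For context, a Delta table is essentially a folder of Parquet data files plus a `_delta_log` subfolder of JSON commit records; a reader uses the log, not a directory listing, to decide which Parquet files are live. A minimal sketch of that layout (file names and contents are illustrative, built with only the Python standard library):

```python
import json
import tempfile
import uuid
from pathlib import Path

# Sketch of a Delta table's on-disk layout: GUID-named Parquet data
# files alongside a _delta_log folder of JSON commit records.
root = Path(tempfile.mkdtemp()) / "MyTable"
(root / "_delta_log").mkdir(parents=True)

# Data file: Spark-style part-file name with an embedded GUID.
data_file = root / f"part-00000-{uuid.uuid4()}.c000.snappy.parquet"
data_file.write_bytes(b"")  # placeholder; a real file holds columnar data

# Commit record: the log tells readers which Parquet files are "live".
commit = {"add": {"path": data_file.name, "dataChange": True}}
log_file = root / "_delta_log" / "00000000000000000000.json"
log_file.write_text(json.dumps(commit))

# A Delta reader resolves live data files from the log entries.
live_files = [
    json.loads(line)["add"]["path"]
    for line in log_file.read_text().splitlines()
    if "add" in json.loads(line)
]
print(live_files == [data_file.name])  # True
```

This is why the file names in the Lakehouse look like random strings: the GUID keeps concurrent writers from colliding, and the name is only meaningful to the Delta log.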

How did you create this Lakehouse? Is this the staging Lakehouse that the system automatically creates when you have a Dataflow Gen2?

 

I've been trying to reproduce this behavior with some Parquet files from the NYC Taxi data, but haven't been able to. Could you share some repro steps, either with publicly available sample data or with sample data that you can share?

Anonymous
Not applicable

Sure, so I'm connecting the dataflow to an on-prem server.
Once this is done, I'm selecting the destination to be the LH that I've more or less just manually created outside of the dataflow.

Each time I now publish the dataflow, a new Parquet file is created, which is peculiar.

What should be prefaced is that my original file is not a Parquet file, if that's what you are wondering.

What is the intention behind these Parquet files being created? What exactly is a Delta Parquet file? Do we ever use these in production? Moreover, what is the string in the title? What does it represent?
Unfortunately, I'm unable to share the data that I'm using as of now, due to NDAs.

Hi @Anonymous ,
We haven't heard from you since the last response and were just checking back to see if you have a resolution yet. Please let us know if you have any further queries.

Are you able to create another Lakehouse and try to replicate this scenario by loading to this other Lakehouse?

Anonymous
Not applicable

Yes, I get the same Parquet files.
I still don't understand what they are for or why they are created. Can you detail that, please?

[Screenshot: RF_consultant_3-1698824580377.png]

Here I've included a screen dump of the type of file that is created each time the dataflow is published.

[Screenshot: RF_consultant_4-1698824580376.png]

The file name ends in parquet, and the header is gone.
Additionally, the data order is mismatched, or perhaps lost from before; I have not investigated this in more detail, but the data does not look similar in terms of order.

Hi @Anonymous ,
Thanks for the update. Apologies for the issue you have been facing.

The screenshot you shared is from the Dataflows Staging Lakehouse, which is not intended to be consumed by end users. The files you see are files that the dataflow engine stores in that Lakehouse to perform transformations faster. This Lakehouse will be hidden in the next couple of weeks and should never be used by users themselves.
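On the earlier question about the string in the title: the "odd text combination" is typically Spark's part-file naming scheme for Delta/Parquet data files. A hedged sketch of how such a name breaks down (the example name is illustrative, not taken from the screenshots):

```python
import re

# Spark writes Delta/Parquet data files with names like
# part-00000-<uuid>.c000.snappy.parquet, where the pieces are:
#   part-<partition index>-<random uuid>.c<file counter>.<codec>.parquet
PART_FILE = re.compile(
    r"part-(?P<index>\d+)-(?P<uuid>[0-9a-f-]{36})"
    r"\.c(?P<counter>\d+)\.(?P<codec>\w+)\.parquet$"
)

# Illustrative file name, shaped like the ones in the staging Lakehouse:
name = "part-00000-1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d.c000.snappy.parquet"
m = PART_FILE.match(name)
print(m.group("codec"))  # snappy
```

The UUID portion is random per write, which is why a new file appears on every publish of the dataflow.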

Hope this helps. Please let us know if you have any further queries.

 

Anonymous
Not applicable

Thanks for the clarifications. No worries, I'll just go ahead and manually remove the files for now.

Anonymous
Not applicable

Could you also clarify the difference between the DataflowsStagingLakehouse and any other Lakehouse?

Hi @Anonymous ,

In a new workspace, when an end user first runs a Dataflow Gen2, a set of staging artifacts, such as a new Lakehouse and a new Data Warehouse, is automatically created for the whole workspace. These artifacts support a set of functionalities available to Dataflow Gen2; the Lakehouse is called the DataflowsStagingLakehouse.

Users can see these staging artifacts in the workspace list and lineage view, and can interact with them.

If a user deletes or modifies any of these staging artifacts, the workspace can end up in a misconfigured state in which it can no longer use them.

Hope this helps. Please let me know if you have any other questions.
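As a practical illustration of telling the staging artifacts apart from your own Lakehouses: the staging items are distinguished only by their reserved display names. The item list below is a hypothetical example (the names and shape are assumptions based on the defaults described above, not output from any Fabric API):

```python
# Hypothetical workspace item list; names/types are assumptions
# for illustration, based on the default staging artifact names.
items = [
    {"displayName": "SalesLakehouse", "type": "Lakehouse"},
    {"displayName": "DataflowsStagingLakehouse", "type": "Lakehouse"},
    {"displayName": "DataflowsStagingWarehouse", "type": "Warehouse"},
]

def staging_artifacts(items):
    """Return items that look like Dataflow Gen2 staging artifacts."""
    return [i for i in items if i["displayName"].startswith("DataflowsStaging")]

print([i["displayName"] for i in staging_artifacts(items)])
# ['DataflowsStagingLakehouse', 'DataflowsStagingWarehouse']
```

Anything matching those staging names should be left alone; everything else is an ordinary user-created Lakehouse like the one the dataflow writes its output table to.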
