
Scott_Powell
Advocate III

Can Dataflow Gen2 read from lakehouse "Files" as a folder?

Hi, I've uploaded several hundred daily Power BI audit logs into a Lakehouse under the Files area. I'd like to use a Dataflow Gen2 to combine the files and then reshape them. Is there a way to do this?

 

If I use a Lakehouse connection I seem to be able to select an individual file, but not the whole folder.

If I use a Folder connection, it wants to send everything through an on-prem gateway (which makes no sense).

 

Appreciate any insights - thanks in advance!

Scott

1 ACCEPTED SOLUTION

=Lakehouse.Contents(null)

Then drill down.

sorry for my poor English.


9 REPLIES
ZachRoberts
Frequent Visitor

Hi @Scott_Powell 

 

Were you able to get this to work? I'm trying to do the same thing: I have a ton of files loaded into a folder in the lakehouse and would like to use a dataflow to shape them.


Thanks,

Zach

Hi @ZachRoberts, unfortunately I did not find a way to do this. If you do, please let me know; we could really use this.

 

Thanks,

Scott

Hi @Scott_Powell ,

 

I wasn't able to figure out loading the files in my folder through a Dataflow, but I ended up using a notebook to ingest the files into a table and went from there.

 

Below are the notebook details if you want to give it a shot:

 

The code below loads all files from within a folder and removes the first row of each file. (I'm not sure whether your files are delimited; in the example below mine are "|" delimited.) In the step where you assign column names, you don't have to do this for every column: you can select just the columns you want to load, as long as you provide the appropriate column index, starting from 0.

 

Cell 1

from pyspark.sql.functions import split, row_number, input_file_name
from pyspark.sql import Window

# Read all files from the directory into a DataFrame
df = spark.read.text("Files/Expense/*.txt")

# Add a column for the input file name
df_with_filename = df.withColumn("filename", input_file_name())

# Add a row number to each row, partitioned by the input file name.
# Note: ordering by "value" assumes the header line sorts before the data
# rows within each file (spark.read.text does not preserve line order).
windowSpec = Window.partitionBy("filename").orderBy("value")
df_with_rownum = df_with_filename.withColumn("rownum", row_number().over(windowSpec))

# Filter out the first row from each file
df_filtered = df_with_rownum.filter(df_with_rownum.rownum > 1)

# Split the value column by the | delimiter
df_split = df_filtered.withColumn("split_values", split(df_filtered["value"], "\|"))

# If you know the number of columns and want to give them names, you can do so
# For example, if there are two columns:
df_final = df_split.select(
    df_split["split_values"].getItem(0).alias("Constant"),
    df_split["split_values"].getItem(1).alias("BatchId"),
)

# Show the resulting DataFrame
df_final.show()

 

Cell 2

df_final.write.mode("overwrite").format("delta").saveAsTable("TableName")
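If you want to sanity-check the parsing logic before running the Spark job, the same per-file steps (drop the header line, split on "|", keep selected column indexes) can be sketched in plain Python. This is just a minimal illustration with invented sample data; the `parse_file` helper and the column layout are assumptions, not part of the notebook above.

```python
# Minimal pure-Python sketch of the notebook's per-file logic (assumption:
# "|"-delimited text files whose first line is a header, as in the example above).
def parse_file(lines, columns):
    """Drop the header line, split the rest on '|', keep the given column indexes."""
    rows = []
    for line in lines[1:]:  # skip the first (header) row of each file
        parts = line.split("|")
        rows.append(tuple(parts[i] for i in columns))
    return rows

# Two hypothetical sample "files" (contents invented for illustration)
file_a = ["Constant|BatchId|Amount", "C1|B100|10.50", "C1|B101|7.25"]
file_b = ["Constant|BatchId|Amount", "C2|B200|3.00"]

# Combine the files, keeping columns 0 and 1 (Constant, BatchId)
combined = parse_file(file_a, [0, 1]) + parse_file(file_b, [0, 1])
print(combined)  # [('C1', 'B100'), ('C1', 'B101'), ('C2', 'B200')]
```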

 

 

yjh
Helper II

Hi @yjh, I realize I wasn't clear at all. I'm not trying to load them into Power BI Desktop; I'm trying to use a Dataflow Gen2 to load them into tables in the Lakehouse.

 

Thanks,

Scott

=Lakehouse.Contents(null)

Then drill down.

sorry for my poor English.

This does not work for multiple files in the same folder though, right? You can only select a single file using this method, as far as I can tell.

You can use the Combine Files experience, the same as with other file system views (SharePoint, data lake, folder, and others). If you're seeing any issues with this approach, please create a separate topic/thread so we can take a closer look at it.

The method is the same.
