Hi, I've uploaded several hundred daily Power BI audit logs into a Lakehouse under the Files area. I'd like to use a Dataflow Gen2 to combine the files and then reshape them. Is there a way to do this?
If I use a Lakehouse connection, I seem to be able to select an individual file, but not the whole folder.
If I use a Folder connection, it wants to send everything through an on-prem gateway (which makes no sense).
Appreciate any insights - thanks in advance!
Scott
Were you able to get this to work? I'm trying to do the same: I have a ton of files loaded into a folder in a lakehouse and would like to use a dataflow to shape them.
Thanks,
Zach
Hi @ZachRoberts, unfortunately I did not find a way to do this. If you do, please let me know; we could really use this.
Thanks,
Scott
Hi @Scott_Powell ,
I wasn't able to figure out loading the files in my folder through a Dataflow, but I ended up using a notebook to ingest the files into a table and went from there.
Below are the notebook details if you want to give it a shot:
The code below loads all files from within a folder and removes the first row of each file (not sure if your files are delimited, but in the example below mine are |-delimited). In the step where you provide the column names, you don't have to do this for every column: you can select just the columns you want to load, as long as you provide the appropriate column number, starting from 0.
Cell 1
from pyspark.sql.functions import split, row_number, input_file_name
from pyspark.sql import Window
# Read all files from the directory into a DataFrame
df = spark.read.text("Files/Expense/*.txt")
# Add a column for the input file name
df_with_filename = df.withColumn("filename", input_file_name())
# Add a row number to each row, partitioned by the input file name
# (ordering by the row text, which assumes the header sorts first within each file)
windowSpec = Window.partitionBy("filename").orderBy("value")
df_with_rownum = df_with_filename.withColumn("rownum", row_number().over(windowSpec))
# Filter out the first row from each file
df_filtered = df_with_rownum.filter(df_with_rownum.rownum > 1)
# Split the value column by the | delimiter
df_split = df_filtered.withColumn("split_values", split(df_filtered["value"], r"\|"))
# If you know the number of columns and want to give them names, you can do so
# For example, if there are two columns:
df_final = df_split.select(
    df_split["split_values"].getItem(0).alias("Constant"),
    df_split["split_values"].getItem(1).alias("BatchId"),
)
# Show the resulting DataFrame
df_final.show()
Cell 2
df_final.write.mode("overwrite").format("delta").saveAsTable("TableName")
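If your files each start with a header row, a shorter variant is to let Spark's CSV reader handle the delimiter and the header directly. The sketch below assumes exactly that (a header row containing the same Constant/BatchId names, plus the same path and table name used above) and sidesteps the ordering assumption noted in Cell 1:
# Sketch, assuming each file's first row is a header with the column names;
# header=True drops that row per file, so no row_number/orderBy step is needed
df = (
    spark.read
        .option("header", True)       # first row of each file becomes the column names
        .option("sep", "|")           # pipe-delimited
        .csv("Files/Expense/*.txt")   # same folder wildcard as Cell 1
)
# Keep only the columns you need, then land the result as a Delta table
df.select("Constant", "BatchId").write.mode("overwrite").format("delta").saveAsTable("TableName")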
Hi @yjh, I realize I wasn't clear at all. I'm not trying to load them into Power BI Desktop; I'm trying to use a Dataflow Gen2 to load them into tables in the Lakehouse.
Thanks,
Scott
=Lakehouse.Contents(null)
Then drill down through the navigation table to your workspace, lakehouse, and the folder under Files.
Sorry for my poor English.
This does not work for multiple files in the same folder though, right? You can only select a single file using this method, as far as I can tell.
You can use the Combine Files experience the same as with other file system views (SharePoint, data lake, folder, and others): once you've drilled down to the folder, the combine button on the Content column expands and appends all of the files. If you're seeing any issues with this approach, please create a separate topic/thread so we can take a closer look at it.
The method is the same.