surbhinijhara
Regular Visitor

Ingest a list of Parquet files with complex data types from Storage blob into a Lakehouse

Hello,

 

Problem: Unable to ingest from Azure Storage blob -> Lakehouse

 

I have a list of Parquet files in a multi-folder structure in an Azure Storage blob container.

I understand that the data pipeline Copy activity does not support Parquet complex types when writing into Lakehouse tables. Ref:

But I am trying to ingest them as files only, then process them and store a flattened structure in Lakehouse tables.

 

However, I still get the error; it appears that the type is checked when reading from the source, irrespective of whether a Lakehouse table or file is chosen as the destination.

What is an appropriate way to ingest the Parquet files with complex types?

 

Error that I receive:

ErrorCode=UnsupportedParquetComplexType,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=,Source=,''Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Parquet complex types of STRUCT, LIST and MAP are not supported.

 

Sample Azure Folder structure:

1. Container name: sampletelemery

2. Folders: year=25/month=01/day=01

There are a bunch of files in the day folder.

 

If I use a Dataflow, I get the storage blob metadata in columns, where the Name column contains the Parquet file name. Do I then read just those columns and process the files one by one? Is that the way to go?

(Screenshot: Dataflow preview showing the blob metadata columns)

Any directions will be useful. Thanks.

 

1 ACCEPTED SOLUTION
nilendraFabric
Super User

Hello @surbhinijhara 

 

Have you tried creating a notebook and then invoking it from the pipeline, or scheduling it as required?

 

One way to bring Parquet files that contain structures like LIST, MAP, or STRUCT into a lakehouse is to use a notebook with Spark, rather than a pipeline activity that enforces type checks. You can read these files, flatten or transform their complex columns, then write them into the lakehouse. For a multi-folder structure such as year=25/month=01/day=01, you can specify wildcards in your Spark read path.

 

Give it a try:

 

df = spark.read.parquet("abfss://sampletelemery@<yourstorageaccount>.dfs.core.windows.net/year=25/month=01/day=01/*.parquet")
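
To pull in every day under a month at once, you can widen the same path with wildcards (the account name and folder levels below are placeholders to adjust to your layout):

# Wildcard at the day level reads all day folders under month=01
df = spark.read.parquet("abfss://sampletelemery@<yourstorageaccount>.dfs.core.windows.net/year=25/month=01/day=*/*.parquet")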

 

from pyspark.sql.functions import col, explode

# Example of flattening a nested array column
df_flat = df.withColumn("exploded_items", explode(col("someArrayColumn")))
# Continue transformations as needed...

# Write as Delta table to the Tables section
df_flat.write.mode("overwrite").format("delta").saveAsTable("your_lakehouse_table")

# Or write as Parquet files to the Files section
df_flat.write.mode("overwrite").format("parquet").save("Files/my_parquet_folder")
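
If your schema also has STRUCT or MAP columns, the same flattening idea applies; the column names below (deviceInfo, tags) are only placeholders for whatever your files actually contain:

# Hypothetical columns: deviceInfo (STRUCT), tags (MAP) - replace with your own schema
# Flatten a STRUCT column by promoting its nested fields to top-level columns
df_struct = (
    df
    .withColumn("device_model", col("deviceInfo.model"))
    .withColumn("device_firmware", col("deviceInfo.firmware"))
    .drop("deviceInfo")
)

# Flatten a MAP column by exploding it into key/value rows
df_map = df_struct.select("*", explode(col("tags")).alias("tag_key", "tag_value")).drop("tags")

Once the columns are flat, the same write calls above apply.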

hope this helps

 

Thanks


5 REPLIES 5
surbhinijhara
Regular Visitor

Thanks, @nilendraFabric 

surbhinijhara
Regular Visitor

Thanks, @nilendraFabric!

So essentially it means that I cannot use Data Factory here, i.e. neither a data pipeline nor a Dataflow. Instead, I should write custom code using a notebook to load the data.

 

Below is the piece of code that I have used (similar to yours, except that I need to read from Blob Storage, not Data Lake). I did not need to use the explode function, and I could still load the raw data into Lakehouse tables as well as files. Can you comment on whether that is fine, or do you have another input? Thanks again.

 

# Welcome to your new notebook
# Type here in the cell editor to add code

# Azure Blob Storage credentials
storage_account_name = "<account-name>"
storage_account_key = "<key>"
container_name = "<container-name>"
blob_path = "<folder-path>/*.parquet"

# Configure Spark to access Azure Blob Storage
spark.conf.set(f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net", storage_account_key)

# Path to the Parquet files on Azure Blob Storage
parquet_file_path = f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net/{blob_path}"

# Load the Parquet files into a DataFrame
df = spark.read.parquet(parquet_file_path)

# Write as Delta table to the Tables section
#df.write.mode("overwrite").format("delta").saveAsTable("<lakehouse_table>")

# Or write as Parquet files to the Files section
df.write.mode("overwrite").format("parquet").save("Files/<folder>")
 
 

 

It looks great. Try to use Key Vault for secret storage.
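
For instance, rather than hard-coding the account key, the notebook can fetch it from a Key Vault secret (the vault URL and secret name below are placeholders, and this assumes the mssparkutils credentials helper is available in your Fabric workspace):

# Sketch only: vault URL and secret name are placeholders
from notebookutils import mssparkutils

storage_account_key = mssparkutils.credentials.getSecret(
    "https://<your-key-vault>.vault.azure.net/",
    "<storage-key-secret-name>"
)

# Then configure Spark exactly as before, without the key in plain text
spark.conf.set(f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net", storage_account_key)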

Thanks @nilendraFabric 
