A_monged
Frequent Visitor

Pipeline Copy activity CSV to Parquet issue

I have a CSV file in my Lakehouse. When I use the Copy activity to move it and change the format to Parquet, I see columns with null values, although the CSV source is configured correctly (with escape and quote characters).

I tried both loading the file into a Lakehouse table and reading it as Parquet with Spark, and in both cases the null columns appear.

 

[Screenshot: destination.png]

[Screenshot: source.png]

 

1 ACCEPTED SOLUTION

I found the issue: many rows with null values had been intentionally added to the CSV, and after partitioning, those null rows were the first ones displayed, so the columns looked empty.
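To see how a preview can make populated columns look empty, here is a minimal sketch in plain Python (standard library only, not Fabric or Spark) that compares how many non-empty values each column has in the first few rows versus the whole file. The sample data and the `preview_rows` cutoff are hypothetical, chosen to mimic a file whose leading rows are blank.

```python
import csv
import io

# Hypothetical sample: a header, several intentionally empty rows, then real data.
SAMPLE = "id,name,city\n" + ",,\n" * 5 + "1,Ann,Vienna\n2,Bob,Graz\n"


def non_empty_counts(rows, header):
    """Count non-empty values per column across the given rows."""
    counts = {col: 0 for col in header}
    for row in rows:
        for col, value in zip(header, row):
            if value.strip():
                counts[col] += 1
    return counts


def compare_preview_to_full(text, preview_rows=5):
    """Return per-column counts for the first `preview_rows` rows and for all rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)
    preview = non_empty_counts(rows[:preview_rows], header)
    full = non_empty_counts(rows, header)
    return preview, full


preview, full = compare_preview_to_full(SAMPLE)
print("first rows:", preview)  # every column looks empty in this sample
print("whole file:", full)     # the data is actually there further down
```

In a Spark notebook, `df.show()` displays only the first 20 rows by default, so a comparable check is to count non-null values per column over the whole DataFrame, e.g. `df.filter(df["name"].isNotNull()).count()`, where `name` is a placeholder column name.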


8 REPLIES
Anonymous
Not applicable

Hi @A_monged ,

 

Thanks for the reply from lbendlin.

 

I used PySpark statements in a notebook to convert CSV files from the Lakehouse to Parquet files:

# Start (or reuse) a SparkSession; in a Fabric notebook one is already available as `spark`
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("CSV_to_Parquet").getOrCreate()

# load CSV file
df = spark.read.format("csv").option("header","true").load("Files/orders/2019.csv")
df.show()

# transform to Parquet form and save it
parquet_file_path = "Files/test.parquet"
df.write.parquet(parquet_file_path)

 

This works fine, as shown below; my file does not contain null values:

[Screenshot of the result]

 

The error may occur because the data types and schema of the CSV data do not match the data types and schema expected in the Parquet output; for example, a value that cannot be converted to a column's declared type may end up as null.
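As a rough illustration of how a type mismatch can surface as nulls, the sketch below applies a hypothetical schema to CSV rows in plain Python and turns any value that fails to convert into `None`. This is similar in spirit to Spark's default PERMISSIVE read mode, which sets malformed fields to null, but it is not Spark's actual parser; the column names, types, and sample values are assumptions.

```python
import csv
import io
from datetime import date

# Hypothetical target schema: column name -> converter function.
SCHEMA = {"order_id": int, "amount": float, "order_date": date.fromisoformat}

SAMPLE = (
    "order_id,amount,order_date\n"
    "1,19.99,2019-01-05\n"
    "2,n/a,05/01/2019\n"  # values that do not match the declared types
)


def apply_schema(text, schema):
    """Convert each field with its declared type; failed conversions become None."""
    reader = csv.DictReader(io.StringIO(text))
    typed_rows = []
    for record in reader:
        typed = {}
        for col, convert in schema.items():
            try:
                typed[col] = convert(record[col])
            except (TypeError, ValueError):
                typed[col] = None  # mismatch: the value is replaced with null
        typed_rows.append(typed)
    return typed_rows


for row in apply_schema(SAMPLE, SCHEMA):
    print(row)
```

The second row keeps its `order_id` but loses `amount` and `order_date`, which is the kind of partial-null output a schema mismatch can produce.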

 

Alternatively, you can use the same method I used to convert the CSV file to a Parquet file.

 

If you want to save it as a table, you can use the following syntax:

df.write.mode("overwrite").saveAsTable("parquetTestTable")

[Screenshot of the saved table]

 

If you have any other questions, please feel free to contact me.

 

Best Regards,
Yang
Community Support Team

 

If any post helps, please consider accepting it as the solution to help other members find it more quickly.
If I misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!

I tried your approach but still got the same null columns. Attached are the table output and sample rows of the raw CSV file:

[Screenshots: as you showed but still nulls.png, quote character.png]

lbendlin
Super User

Does your CSV file contain quoted row delimiters?

The file uses a comma as the delimiter and " as the quote character:

[Screenshot: quote character.png]

So commas are quoted. But what about linefeeds in your data? Are they quoted?
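To illustrate why quoted linefeeds matter, the sketch below compares naive line splitting with Python's `csv` module on a hypothetical record whose quoted field contains a linefeed. A reader that treats every newline as a row delimiter splits that record in two, which can shift values into the wrong columns or leave fields empty.

```python
import csv
import io

# Hypothetical data: the second field of row 1 contains a quoted linefeed.
SAMPLE = 'id,comment,city\n1,"first line\nsecond line",Vienna\n2,plain,Graz\n'


def naive_rows(text):
    """Split on every newline, ignoring quotes (how a quote-unaware reader behaves)."""
    return [line.split(",") for line in text.splitlines()]


def quote_aware_rows(text):
    """Let the csv module honor quotes, so embedded linefeeds stay in one field."""
    return list(csv.reader(io.StringIO(text)))


print(len(naive_rows(SAMPLE)))        # 4: header + 3 fragments
print(len(quote_aware_rows(SAMPLE)))  # 3: header + 2 records
```

In Spark, CSV records that span several lines inside quotes are handled by the CSV reader's `multiLine` option; without it, each physical line is parsed as a separate record.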


Anonymous
Not applicable

Hi @A_monged ,

 

Thanks for the reply from lbendlin.

 

To deal with the null values, an efficient option is Dataflow Gen2. In Dataflow Gen2 you can transform the data, for example by removing columns that contain null values, and then set the destination to the Lakehouse so that only data without those null values is written there. This keeps the unwanted nulls out of the Lakehouse table.
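If you prefer to stay in a notebook, the same kind of cleanup can be done in code. Since the accepted solution traced the problem to blank rows rather than blank columns, the relevant step here is dropping rows whose fields are all empty. The sketch below is a plain-Python stand-in (not Dataflow Gen2 and not Spark) with hypothetical sample data; in PySpark, `df.na.drop(how="all")` performs a comparable filter, dropping rows in which every value is null.

```python
import csv
import io


def drop_empty_rows(text):
    """Return the header plus only the rows that have at least one non-empty field."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    kept = [row for row in reader if any(field.strip() for field in row)]
    return header, kept


# Hypothetical sample: blank rows mixed in with real records.
SAMPLE = "id,name\n,\n1,Ann\n,\n,\n2,Bob\n"

header, rows = drop_empty_rows(SAMPLE)
print(header)  # ['id', 'name']
print(rows)    # [['1', 'Ann'], ['2', 'Bob']]
```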

[Screenshot: Dataflow Gen2]

 

For more information on using Dataflow Gen2, you can refer to these official documents:

mslearn-fabric

Create your first Microsoft Fabric dataflow - Microsoft Fabric | Microsoft Learn

 

If you have any other questions, please feel free to contact me.

 

Best Regards,
Yang
Community Support Team

 


There are no linefeeds in my data.
