arpost
Advocate V

How do I just write a CSV file to a lakehouse with a notebook?

Greetings, all. I'm exploring Fabric notebooks and have a question. I am trying to write a simple CSV file to a lakehouse using the following code:

spark.createDataFrame(dataframe).write.mode("overwrite").csv(LakehousePath + FilePath + FileName_NoType + '_altered')

When I write this, however, I get a folder with files in it rather than a single .csv file:

 

[Screenshot: the lakehouse Files area showing a folder of part files rather than a single CSV]

 

How do I just save a normal csv file with the name I've chosen?

6 REPLIES
v-huijiey-msft
Community Support

Hi @arpost,

 

Thanks for the reply from @frithjof_v.

 

The reason you get a folder containing files instead of a single .csv file is that Apache Spark, by default, processes data in a distributed fashion and writes the result as multiple part files rather than a single file.
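
For reference (this workaround is sketched here, not quoted from this reply), a common way to get a single file out of Spark is to coalesce the DataFrame to one partition, write it, and then rename the part file Spark produces. A minimal sketch, assuming the mssparkutils helpers that ship with Fabric notebooks; "Files/export" and "report.csv" are hypothetical names, and df is the Spark DataFrame being exported:

# Minimal sketch; paths and file names below are hypothetical examples.
tmp_dir = "Files/export/_tmp_report"      # temporary Spark output folder
final_path = "Files/export/report.csv"    # desired single-file name

# Coalesce to one partition so Spark emits exactly one part file
df.coalesce(1).write.mode("overwrite").option("header", True).csv(tmp_dir)

# Locate the part file and move/rename it to the final name
part_file = [f.path for f in mssparkutils.fs.ls(tmp_dir) if f.name.startswith("part-")][0]
mssparkutils.fs.mv(part_file, final_path)

# Remove the temporary folder and its leftover marker files (True = recursive)
mssparkutils.fs.rm(tmp_dir, True)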

 

Besides the alternative provided by @frithjof_v, why not use the no-code option of uploading the file to the lakehouse? That approach is easier and faster.

[Screenshot: the lakehouse upload option]

 

If you have any other questions, please feel free to contact me.

 

Best Regards,
Yang
Community Support Team

 

If any post helps, please consider accepting it as the solution to help other members find it more quickly.
If I have misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!

@v-huijiey-msft, this is for an automated export process, so a manual upload of data wouldn't apply.

Hi @arpost,

 

I would like to understand why the upload option wouldn't work for your export scenario. I often use this option to upload a CSV file and then work with it; here are my test steps:

 

First, here's my CSV file.

[Screenshot: the sample CSV file]

 

I created a new, empty lakehouse.

[Screenshot: the newly created, empty lakehouse]

 

I used the upload option to upload the CSV file to the lakehouse; after a successful upload it looks like the images below.

[Screenshots: uploading the CSV file and the uploaded file shown under Files]

 

To use it, you can select the Load to table option and then work with the resulting table; a short sketch of reading that table from a notebook follows the screenshots below.

[Screenshots: the Load to table option and the resulting table]
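
For example, once Load to table has created a table, it can be read back in a notebook. A minimal sketch; "my_table" is a hypothetical table name, and the lakehouse is assumed to be attached as the notebook's default:

# Minimal sketch: read the table created by "Load to table".
# "my_table" is a hypothetical name; the lakehouse is assumed to be
# the notebook's default lakehouse.
df = spark.read.table("my_table")
df.show(5)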

 

If you have any other questions, please feel free to contact me.

 

Best Regards,
Yang
Community Support Team

 

If any post helps, please consider accepting it as the solution to help other members find it more quickly.
If I have misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!

@v-huijiey-msft, when running an automated process that generates and/or alters data, there isn't a place for a manual upload, because the data being produced and exported to a file is generated by code rather than by a human. This can then be tied into and invoked by a data pipeline as an intermediary step to do things like (1) pull data, (2) create a file and upload it to the lakehouse, and (3) send the file via email.

 

In my scenario, I don't need to manually drop a file somewhere but instead need the process to produce the file and drop it so it can be picked up.

frithjof_v
Resident Rockstar

One alternative is to convert to Pandas:

 

 

# Sample data
data = [
    ("Alice", 34),
    ("Bob", 45),
    ("Catherine", 29)
]

# Define the schema
columns = ["Name", "Age"]

# Create DataFrame
df = spark.createDataFrame(data, columns)

# Convert to Pandas dataframe
df_pandas = df.toPandas()

# Write DataFrame to CSV file
df_pandas.to_csv('/lakehouse/default/Files/myFile.csv', header=True, index=False)

 

 

Read and write data with Pandas - Microsoft Fabric | Microsoft Learn

 

Keep in mind that Pandas cannot handle the same volume of data as Spark, so converting to Pandas should probably only be done when the data is below a certain size (I have no idea what that "limit" is or what the consequence of exceeding it would be).
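
One defensive pattern, purely as a sketch (the threshold below is an arbitrary made-up number, not a documented limit), is to count the rows before converting:

# Sketch only: guard the toPandas() conversion behind a row-count check.
# MAX_ROWS is an arbitrary threshold; tune it to your driver's memory budget.
MAX_ROWS = 1_000_000

if df.count() <= MAX_ROWS:
    df.toPandas().to_csv('/lakehouse/default/Files/myFile.csv', header=True, index=False)
else:
    raise ValueError("DataFrame too large to convert to Pandas safely")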

 

Perhaps, if you are going to use a Pandas dataframe before saving to CSV, you might as well just use Pandas dataframes everywhere in your notebook (instead of Spark dataframes); then you don't need to convert.
I don't know for certain, but perhaps that would make sense.

At least when you are working with small amounts of data, I think you can just go with Pandas, as in the sketch below.
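
A minimal sketch of that all-Pandas approach ("input.csv", "output.csv", and the Age column are hypothetical; /lakehouse/default/Files/ is the same mount point used in the example above):

import pandas as pd

# Sketch only: skip Spark entirely for small data.
df = pd.read_csv('/lakehouse/default/Files/input.csv')
df["Age"] = df["Age"] + 1   # example transformation on a hypothetical column
df.to_csv('/lakehouse/default/Files/output.csv', index=False)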

 

I found some other information/discussion about the topic; however, I didn't find a simpler solution:

writing spark dataframe as CSV to a repo - Databricks Community - 60003

scala - Write single CSV file using spark-csv - Stack Overflow

Write in Single CSV file - Databricks Community - 29551

How to Save PySpark Dataframe to a Single Output File | Engineering for Data Science

Solved: Simply writing a dataframe to a CSV file (non-part... - Databricks Community - 27818

Solved: How do I create a single CSV file from multiple pa... - Databricks Community - 29962

python - Write to a CSV file using Microsoft Fabric - Stack Overflow

Here is some more information/discussion regarding Pandas: https://www.reddit.com/r/MicrosoftFabric/s/ZXrtR0nbvk
