smpa01
Super User

Programmatically write files in Delta

How can I programmatically write files (create if not exists + write/overwrite daily) in the Files section of the lakehouse? The following did not work, where I am trying to create a .txt file from a string (not a dataframe):

import os

# content to write
content = 'lorem ipsum dolores'
# base path
base_path = "Files/materialized/"
# desired file name
query_file = "daily.txt"
# full file path
file_path = base_path + query_file
# create the folder if it does not exist
os.makedirs(os.path.dirname(file_path), exist_ok=True)
# write the content
with open(file_path, 'w') as file:
    file.write(content)

Also, which file extension is optimal (from a Delta Lake compression perspective, and hence more performant)? I need to log a string to the file daily and then read that string back from the file.

 

I have also tried the following, but I realized that it creates a folder called f1 with multiple part files and a _SUCCESS file. Is there any way to control the file name so the output is a single f1.txt? The Copy activity in a pipeline creates a single file with the exact desired name at the destination. Can I do in a notebook what is achievable in a pipeline?

 

from pyspark.sql import Row

# content to write
content = 'lorem ipsum dolores'
# build a single-row dataframe from the string
df = spark.createDataFrame([Row(value=content)])
# write the single-row dataframe
df.write.mode("overwrite").parquet("Files/f1")

Thank you in advance.

 

4 REPLIES
frithjof_v
Super User

"The file name control is an important aspect of my workflow, hence I cant stick writing to files using codes in notebook. I don't know if there is a lakehouse api that lets you write(upload) a file with developer created content such that the file name could be exactly same as dev desired."

 

This can be done with code similar to what I used in the previous comment.

 

It can also be done with Pandas. 

 

I think the ADLS Gen2 API can also be used; it supposedly works with OneLake. Here is an example of someone who used the API to connect to OneLake from Power Automate. I guess you can use the API from any client, not just Power Automate. https://www.linkedin.com/pulse/how-call-onelake-api-from-power-automate-enterprise-app-nigel-smith-4...
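
To illustrate, here is a rough sketch of my own (not from the linked article) of what an upload through the ADLS Gen2 API against the OneLake endpoint could look like in Python. It assumes the azure-storage-file-datalake and azure-identity packages are installed; the workspace and lakehouse names are placeholders.

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# OneLake exposes an ADLS Gen2-compatible DFS endpoint
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)

# The "file system" is the workspace; the path goes through the lakehouse item.
# "MyWorkspace" and "MyLakehouse" are placeholder names.
file_system = service.get_file_system_client("MyWorkspace")
file_client = file_system.get_file_client("MyLakehouse.Lakehouse/Files/materialized/daily.txt")

content = "lorem ipsum dolores"
file_client.upload_data(content, overwrite=True)  # the file keeps exactly this name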

 

PowerShell might also be an option:

https://learn.microsoft.com/en-us/fabric/onelake/onelake-powershell

 

There is also the OneLake explorer, where we can interact with OneLake as a folder structure on our local machine.

smpa01
Super User

@frithjof_v it worked. Many thanks for this; I just tried it out. This is exactly what I had in mind.

frithjof_v
Super User

I was able to use this code to write to a simple text file:

 

 

import os

# Define the sentence you want to write
sentence = "This is the sentence that will be written to the text file."

# Specify the folder path and file name
folder_base_path = "/lakehouse/default/Files/"
folder_relative_path = "sentence_files"
file_name = "output.txt"
folder_path = os.path.join(folder_base_path, folder_relative_path)

# Combine the folder path and file name
file_path = os.path.join(folder_path, file_name)

# Create the directory if it doesn't exist
os.makedirs(folder_path, exist_ok=True)

# Open the file in write mode and write the sentence
with open(file_path, "w") as file:
    file.write(sentence)

print(f"Sentence written to {file_path}")

 

 

The folder_base_path will depend on whether your notebook has a default lakehouse or if you are just mounting lakehouses to your notebook.

In the code I show above, the folder_base_path assumes that the notebook has a default lakehouse.

If you don't want to use a default lakehouse, then you will need to mount a lakehouse instead. However, if you don't have any specific requirements, I would say just use a default lakehouse for your notebook.

 

https://learn.microsoft.com/en-us/fabric/data-engineering/lakehouse-notebook-explore#switch-lakehous...

 

https://fabric.guru/how-to-mount-a-lakehouse-and-identify-the-mounted-lakehouse-in-fabric-notebook
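
If you do go the mounting route, a minimal sketch (my addition; notebookutils is preloaded in Fabric notebooks, and the workspace, lakehouse and mount point names are placeholders) could look like this:

import os

# Mount a lakehouse at a chosen mount point (placeholder workspace/lakehouse names)
notebookutils.fs.mount(
    "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/MyLakehouse.Lakehouse",
    "/mylakehouse",
)

# Resolve the local path of the mount so plain Python file I/O can be used
local_base = notebookutils.fs.getMountPath("/mylakehouse")
file_path = os.path.join(local_base, "Files", "sentence_files", "output.txt")

os.makedirs(os.path.dirname(file_path), exist_ok=True)
with open(file_path, "w") as file:
    file.write("written through a mounted lakehouse")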

 

By default, PySpark creates a folder with multiple files. I guess this is because PySpark uses distributed processing on multiple worker nodes.

 

If you want to write a (not too big) dataframe to a single file, I think the easiest way is to use Pandas.

 

https://community.fabric.microsoft.com/t5/Data-Engineering/How-do-I-just-write-a-CSV-file-to-a-lakeh...

 

https://www.reddit.com/r/MicrosoftFabric/s/J38eNFH1gw
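
As a small sketch of that Pandas approach (my addition; it assumes the notebook has a default lakehouse mounted at /lakehouse/default and that df is the small Spark dataframe from the question):

import os

# Target folder for the single output file (assumes a default lakehouse)
target_folder = "/lakehouse/default/Files/materialized"
os.makedirs(target_folder, exist_ok=True)

# Convert the (small) Spark dataframe to Pandas and write exactly one file
# with the exact name we want
pdf = df.toPandas()
pdf.to_csv(os.path.join(target_folder, "f1.csv"), index=False)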

smpa01
Super User

I usually write to delta tables, not to files. Writing to a table creates the table with the exact desiredTblName. I was hoping the same would happen with files, but to my surprise it creates a folder in the Files section named desiredName (which I was hoping would be the file name), and the actual file names are Spark-generated.

 

I have also observed that if you manually upload a file, or use a pipeline to copy a parquet (or any other available format) file to a sink/destination, the system writes it with the exact desiredName.

 

Control over the file name is an important aspect of my workflow, hence I can't stick to writing files using code in a notebook. I don't know if there is a lakehouse API that lets you write (upload) a file with developer-created content such that the file name is exactly what the developer wants. If there is, I want to try it out, but for now I have split my code to let the pipeline handle that part.

