The ultimate Microsoft Fabric, Power BI, Azure AI, and SQL learning event: Join us in Stockholm, September 24-27, 2024.
Save €200 with code MSCUST on top of early bird pricing!
Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started
We are continuously entering data into an excel file. We ingested the excel data into the lakehouse. How can we keep loading the new data added to the excel file into the lakehouse?
Solved! Go to Solution.
You could do it in 2 ways:
Something like this:
import pandas
import os
df=pandas.read_excel("abfss://d5723636-d844-4bf7-bfc2-8ad519fdef14@onelake.dfs.fabric.microsoft.com/8e81a236-b5a9-4a94-8565-1d0a84955e16/Files/Raw/Excel/SomeExcelData.xlsx",sheet_name=0)
display(df)
I am trying do something similar but instead I am uploading a new file each day with the previous days sales and I want to append that to an existing table in a lake house. Is this possible?
I'm pretty sure that's possible.
I guess it will be an advantage if you name the files in a consistent manner and include the date in the file name.
Let's say you put all the files in the same folder. One new file each day.
If you e.g. use a Dataflow Gen2 which is scheduled to run each day, I think it should be possible to use the actual day as a variable in your query, to point the query to the correct file.
E.g. use something like Date.From(DateTimeZone.UtcNow()) to determine the actual date, and then create a variable based on the actual date in order to create a "file name variable" which you can use in the Dataflow Gen2 query to identify the correct file in the source folder.
Then you can run this Dataflow Gen2 each day and append the query output to the Lakehouse table.
You could probably also do more advanced stuff like anti join merging with the existing data in the Lakehouse table, if you need to ensure that you are not entering the same data twice.
(Then I think you could also just replace the Excel file every day, and use the same name of the Excel file every day, irrespective of date.) However the merge operation may be more "costly" in terms of resource consumption (CU) especially if the Excel file or Lakehouse table grows large.
You can probably do the same in a Notebook.
And maybe in a Data pipeline also, however I don't have experience with Data pipeline.
Perhaps this preview feature could be helpful for this kind of requirement. However at the moment this is just a preview feature, so I don't think it's recommended to use it for production environment. https://learn.microsoft.com/en-us/fabric/data-factory/pipeline-storage-event-triggers
Thank you very much
Thanks AndyDC
You could do it in 2 ways:
Something like this:
import pandas
import os
df=pandas.read_excel("abfss://d5723636-d844-4bf7-bfc2-8ad519fdef14@onelake.dfs.fabric.microsoft.com/8e81a236-b5a9-4a94-8565-1d0a84955e16/Files/Raw/Excel/SomeExcelData.xlsx",sheet_name=0)
display(df)
or you could use Dataflows Gen2 to connect to the Excel document and save to a Lakehouse table. Either choose the Overwrite or Append option in the Dataflow destination depending on how your data in the Excel document is being updated. You could then schedule the Dataflow to run.
Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.
Check out the August 2024 Fabric update to learn about new features.