
NEK
Regular Visitor

Excel incremental load into lakehouse

We are continuously entering data into an Excel file, and we have already ingested that Excel data into the lakehouse. How can we keep loading the new data added to the Excel file into the lakehouse?

1 ACCEPTED SOLUTION
AndyDDC
Memorable Member

You could do it in 2 ways:

 

  • Use a Notebook to import the Excel data into a dataframe, load it to the Lakehouse table (I assume you have a Delta table?), and then schedule the Notebook to run.

Something like this:

 

import pandas

# read the first sheet of the Excel file directly from OneLake
df = pandas.read_excel("abfss://d5723636-d844-4bf7-bfc2-8ad519fdef14@onelake.dfs.fabric.microsoft.com/8e81a236-b5a9-4a94-8565-1d0a84955e16/Files/Raw/Excel/SomeExcelData.xlsx", sheet_name=0)

display(df)

# append to the Lakehouse Delta table (the table name here is a placeholder)
spark.createDataFrame(df).write.mode("append").saveAsTable("SomeExcelData")

View solution in original post

6 REPLIES
bbrennan12345
Regular Visitor

I am trying to do something similar, but instead I am uploading a new file each day with the previous day's sales, and I want to append that to an existing table in a lakehouse. Is this possible?

I'm pretty sure that's possible. 

 

I guess it will be an advantage if you name the files in a consistent manner and include the date in the file name.

 

Let's say you put all the files in the same folder. One new file each day.

 

If you use e.g. a Dataflow Gen2 that is scheduled to run each day, it should be possible to use the current day as a variable in your query, to point the query to the correct file.

 

E.g. use something like Date.From(DateTimeZone.UtcNow()) to determine the current date, then build a "file name variable" from it which you can use in the Dataflow Gen2 query to identify the correct file in the source folder.

 

Then you can run this Dataflow Gen2 each day and append the query output to the Lakehouse table.

 

You could probably also do more advanced things, like an anti join merge with the existing data in the Lakehouse table, if you need to ensure that you are not loading the same data twice.

(In that case I think you could also just replace the Excel file every day, using the same file name every day, irrespective of date.) However, the merge operation may be more "costly" in terms of resource consumption (CU), especially if the Excel file or Lakehouse table grows large.
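As a minimal sketch of the anti join idea in pandas: keep only the incoming rows whose key is not already present in the existing table. The column names and sample data below are made up for illustration.

```python
import pandas as pd

# Hypothetical data: "existing" stands in for what is already in the
# Lakehouse table, "incoming" for today's Excel extract.
existing = pd.DataFrame({"order_id": [1, 2], "amount": [10.0, 20.0]})
incoming = pd.DataFrame({"order_id": [2, 3], "amount": [20.0, 30.0]})

# Left anti join: mark each incoming row as matched or not, then keep
# only the rows whose order_id does not exist in the existing table.
marked = incoming.merge(existing[["order_id"]], on="order_id",
                        how="left", indicator=True)
new_rows = marked[marked["_merge"] == "left_only"].drop(columns="_merge")

print(new_rows)  # only order_id 3 remains; order_id 2 is a duplicate
```

Only the genuinely new rows would then be appended to the Lakehouse table, which avoids double-loading at the cost of the extra merge step.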

 

 

You can probably do the same in a Notebook.
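For the Notebook route, the date-based file-name idea might look like this in Python; the "Sales_" prefix and .xlsx extension are assumptions for illustration, not from the thread.

```python
from datetime import datetime, timezone

def daily_file_name(when=None, prefix="Sales_", ext=".xlsx"):
    """Build the expected file name for a given UTC date.

    The prefix and extension are placeholders; adjust to match your
    actual naming convention.
    """
    when = when or datetime.now(timezone.utc)
    return f"{prefix}{when:%Y-%m-%d}{ext}"

# A scheduled Notebook could then point pandas.read_excel at
# "<lakehouse Files path>/" + daily_file_name() and append the result
# to the Lakehouse table.
print(daily_file_name(datetime(2024, 9, 1, tzinfo=timezone.utc)))
# prints "Sales_2024-09-01.xlsx"
```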

 

And maybe in a Data pipeline also, however I don't have experience with Data pipeline.

 

 

Perhaps this preview feature could also be helpful for this kind of requirement. However, since it is still in preview, I don't think it's recommended for production environments: https://learn.microsoft.com/en-us/fabric/data-factory/pipeline-storage-event-triggers

Thank you very much

NEK
Regular Visitor

Thanks AndyDDC

AndyDDC
Memorable Member

You could do it in 2 ways:

  • Use a Notebook to import the Excel data into a dataframe, load it to the Lakehouse table (I assume you have a Delta table?), and then schedule the Notebook to run (the code is shown in the accepted solution above).

  • Use Dataflows Gen2 to connect to the Excel document and save to a Lakehouse table. Choose either the Overwrite or Append option in the Dataflow destination, depending on how the data in the Excel document is being updated. You can then schedule the Dataflow to run.

 

[Screenshot attachment: AndyDDC_0-1699525832357.png]

 
