esraaE
Frequent Visitor

extract data into OneLake using Fabric

I have a collection of PDFs from various sources and need to extract data into OneLake using Fabric.
How can I do that? Is there any documentation, or are there steps I should follow?

1 ACCEPTED SOLUTION
NandanHegde
Super User

Assuming the sources are supported by either Dataflow Gen2 or data pipelines, you can extract data from the PDFs with Dataflow Gen2.

The blog post below explains how:

https://datasharkx.wordpress.com/2023/12/03/read-and-import-data-from-pdf-file-using-msft-fabric/




----------------------------------------------------------------------------------------------
Nandan Hegde (MSFT Data MVP)
LinkedIn Profile : www.linkedin.com/in/nandan-hegde-4a195a66
GitHub Profile : https://github.com/NandanHegde15
Twitter Profile : @nandan_hegde15
MSFT MVP Profile : https://mvp.microsoft.com/en-US/MVP/profile/8977819f-95fb-ed11-8f6d-000d3a560942
Topmate : https://topmate.io/nandan_hegde
Blog : https://datasharkx.wordpress.com


3 REPLIES
How can I write to my Fabric Lakehouse from external apps or systems?
We are currently integrating multiple systems and need to push files directly into the Fabric Lakehouse. However, we would like to avoid using notebooks or pipelines for this process. Additionally, we anticipate adding more systems in the future and want to ensure the solution can scale accordingly.
Could you please advise on the best approach to achieve this?
Anonymous
Not applicable

Hi @esraaE,

First, extract the data from the PDFs; you might use a tool or library for that.

After extracting the data, you may need to clean it and convert it into a suitable format (for example, CSV or JSON) to load into OneLake.
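
As a rough illustration of these two steps, here is a minimal sketch. It assumes the third-party pypdf package for text extraction (any PDF library would do), and the function names and the `source, page, text` column layout are made up for this example. Real PDFs with tables or scanned pages will likely need more work than this.

```python
import csv
import io


def extract_page_text(pdf_path):
    """Return the text of each page in a PDF, one string per page."""
    # Assumption: the third-party pypdf package is installed (pip install pypdf).
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def pages_to_csv(source_name, pages):
    """Serialize page texts to CSV with the columns source, page, text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "page", "text"])
    for number, text in enumerate(pages, start=1):
        # Collapse runs of whitespace so each page sits on one CSV row.
        writer.writerow([source_name, number, " ".join(text.split())])
    return buffer.getvalue()
```

You could call `pages_to_csv("invoice.pdf", extract_page_text("invoice.pdf"))` for each file (the file name here is only a placeholder) and then load the resulting CSV text into OneLake.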

 

Here is some information about connecting to OneLake:

How do I connect to OneLake? - Microsoft Fabric | Microsoft Learn

Options to get data into the Lakehouse - Microsoft Fabric | Microsoft Learn

Access OneLake with Python - Microsoft Fabric | Microsoft Learn
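
To give an idea of what the Python route can look like, here is a minimal sketch that uploads a file to a lakehouse through the OneLake DFS endpoint. It assumes the azure-identity and azure-storage-file-datalake packages are installed and that the signed-in identity has write access to the workspace. The helper names and the workspace and lakehouse parameters are placeholders for this example, so please check the linked articles for the details of your environment.

```python
ONELAKE_ACCOUNT_URL = "https://onelake.dfs.fabric.microsoft.com"


def lakehouse_file_path(lakehouse_name, relative_path):
    """Build the path of a file under the Files area of a lakehouse."""
    return f"{lakehouse_name}.Lakehouse/Files/{relative_path.lstrip('/')}"


def upload_to_onelake(workspace_name, lakehouse_name, relative_path, data):
    """Upload bytes or text to <workspace>/<lakehouse>.Lakehouse/Files/<path>."""
    # Assumption: azure-identity and azure-storage-file-datalake are installed
    # and DefaultAzureCredential can find a login (Azure CLI, managed identity, ...).
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        ONELAKE_ACCOUNT_URL, credential=DefaultAzureCredential()
    )
    # In OneLake the Fabric workspace plays the role of the file system (container).
    file_system = service.get_file_system_client(workspace_name)
    file_client = file_system.get_file_client(
        lakehouse_file_path(lakehouse_name, relative_path)
    )
    file_client.upload_data(data, overwrite=True)
```

For example, `upload_to_onelake("MyWorkspace", "MyLakehouse", "pdf_extracts/invoice.csv", csv_text)` would write the file under the lakehouse's Files folder, where all of the names are hypothetical. Because this runs from any Python process, it does not need a notebook or pipeline inside Fabric.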

 

Regards,

Nono Chen

If this post helps, please consider accepting it as the solution to help other members find it more quickly.
