esraaE
Frequent Visitor

Extract data into OneLake using Fabric

I have a collection of PDFs from various sources and need to extract their data into OneLake using Fabric.
How can I do that? Is there any documentation, or are there steps I should follow?

1 ACCEPTED SOLUTION
NandanHegde
Super User

Assuming the sources are supported by either Dataflow Gen2 or data pipelines, you can extract data from PDFs via Dataflow Gen2.

The blog post below walks through the steps:

https://datasharkx.wordpress.com/2023/12/03/read-and-import-data-from-pdf-file-using-msft-fabric/




----------------------------------------------------------------------------------------------
Nandan Hegde (MSFT Data MVP)
LinkedIn Profile : www.linkedin.com/in/nandan-hegde-4a195a66
GitHUB Profile : https://github.com/NandanHegde15
Twitter Profile : @nandan_hegde15
MSFT MVP Profile : https://mvp.microsoft.com/en-US/MVP/profile/8977819f-95fb-ed11-8f6d-000d3a560942
Topmate : https://topmate.io/nandan_hegde
Blog :https://datasharkx.wordpress.com


3 REPLIES
How can I write to my Fabric Lakehouse from external apps or systems?
We are currently integrating multiple systems and need to push files directly into the Fabric Lakehouse, but we would like to avoid using notebooks or pipelines for this. We also expect to add more systems in the future, so the solution needs to scale accordingly.
Could you please advise on the best approach?
Anonymous
Not applicable

Hi @esraaE 

First, extract the data from the PDFs. You might consider using a PDF-parsing tool or library for this.
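
As a minimal sketch of this step in Python, assuming the third-party `pypdf` package (not mentioned in this thread) and PDFs that actually contain a text layer, the functions below return the text of each page. `extract_pdf_pages` and `extract_folder` are hypothetical helper names; scanned, image-only PDFs would return empty text and need OCR instead.

```python
from pathlib import Path


def extract_pdf_pages(pdf_path: str) -> list[str]:
    """Return the extracted text of each page in a PDF, one string per page.

    Assumes the third-party `pypdf` package (pip install pypdf).
    """
    path = Path(pdf_path)
    if not path.is_file():
        # Fail early with a clear error before touching the PDF library.
        raise FileNotFoundError(f"PDF not found: {path}")

    # Imported lazily so the existence check above runs even without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() may return an empty string for image-only pages.
    return [page.extract_text() or "" for page in reader.pages]


def extract_folder(folder: str) -> dict[str, list[str]]:
    """Extract every *.pdf file in a folder; keys are the file names."""
    return {p.name: extract_pdf_pages(str(p)) for p in sorted(Path(folder).glob("*.pdf"))}
```

Keeping extraction separate from cleaning makes it easier to swap in a different library (or an OCR step) for PDFs from sources with a different layout.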

After extracting the data, you may need to clean it and convert it into a format suitable for loading into OneLake, such as CSV or JSON.
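
One possible way to do the cleaning and conversion, sketched with the Python standard library only. The rule that two or more consecutive spaces separate columns is an assumption that suits simple tabular PDFs, not a general solution, and the helper names (`text_to_rows`, `rows_to_csv`, `rows_to_json`) are hypothetical.

```python
import csv
import io
import json
import re


def text_to_rows(page_text: str) -> list[list[str]]:
    """Split extracted text into rows, treating runs of 2+ whitespace chars as column gaps.

    Heuristic assumption: check the splitting against your own documents.
    """
    rows = []
    for line in page_text.splitlines():
        line = line.strip()
        if not line:
            continue  # drop blank lines left over from the page layout
        rows.append(re.split(r"\s{2,}", line))
    return rows


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialize rows as CSV text (quoting handled by the csv module)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def rows_to_json(rows: list[list[str]]) -> str:
    """Use the first row as the header and emit a JSON array of objects."""
    if not rows:
        return "[]"
    header, *body = rows
    return json.dumps([dict(zip(header, row)) for row in body])


if __name__ == "__main__":
    sample = "Invoice   Amount\nA-001     120.50\n\nA-002     99.00\n"
    rows = text_to_rows(sample)
    print(rows_to_csv(rows))
    print(rows_to_json(rows))
```

CSV keeps the files small and easy to load into a Lakehouse table; JSON is handier when different PDFs produce different columns.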

Here is some information about connecting to OneLake:

How do I connect to OneLake? - Microsoft Fabric | Microsoft Learn

Options to get data into the Lakehouse - Microsoft Fabric | Microsoft Learn

Access OneLake with Python - Microsoft Fabric | Microsoft Learn
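
OneLake exposes an Azure Data Lake Storage (ADLS) Gen2-compatible endpoint, so files can be written to a lakehouse with the Azure Storage SDK rather than a notebook or pipeline. The sketch below assumes the `azure-storage-file-datalake` and `azure-identity` packages are installed, that the signed-in identity has write access to the workspace, and uses hypothetical names (`MyWorkspace`, `SalesLH`, `upload_to_onelake`); it is an illustration, not an official recipe.

```python
from pathlib import PurePosixPath

# ADLS Gen2-compatible OneLake endpoint; the workspace acts as the file system.
ONELAKE_URL = "https://onelake.dfs.fabric.microsoft.com"


def lakehouse_file_path(lakehouse: str, relative_path: str) -> str:
    """Build the OneLake path of a file under a lakehouse's Files area."""
    rel = PurePosixPath(relative_path.lstrip("/"))
    return str(PurePosixPath(f"{lakehouse}.Lakehouse") / "Files" / rel)


def upload_to_onelake(workspace: str, lakehouse: str, relative_path: str, data: bytes) -> None:
    """Upload bytes to a lakehouse's Files area, overwriting any existing file.

    Assumes azure-storage-file-datalake and azure-identity are installed.
    """
    # Imported lazily so the path helper above works without the Azure SDK.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(account_url=ONELAKE_URL, credential=DefaultAzureCredential())
    file_system = service.get_file_system_client(file_system=workspace)
    file_client = file_system.get_file_client(lakehouse_file_path(lakehouse, relative_path))
    file_client.upload_data(data, overwrite=True)


# Example call (hypothetical workspace and lakehouse names):
# upload_to_onelake("MyWorkspace", "SalesLH", "pdf_extracts/invoices.csv", b"Invoice,Amount\n")
```

Because this only needs HTTPS and a Microsoft Entra ID credential, the same pattern could plausibly be run from an external application or service, which may also fit the question above about pushing files without notebooks or pipelines.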

Regards,

Nono Chen

If this post helps, please consider accepting it as the solution to help other members find it more quickly.
