I have a collection of PDFs from various sources and need to extract their data and load it into OneLake using Fabric.
How can I do that? Is there any documentation, or are there steps I should follow?
Assuming the sources are supported by either Dataflow Gen2 or data pipelines, you can extract data from PDF files with Dataflow Gen2.
The blog post below explains how:
https://datasharkx.wordpress.com/2023/12/03/read-and-import-data-from-pdf-file-using-msft-fabric/
Hi @esraaE
First, extract the data from the PDFs; you might consider using a tool or library for this.
After extracting the data, you may need to clean it and convert it into a suitable format (for example, CSV or JSON) before loading it into OneLake.
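As a rough illustration of the clean-and-convert step, the Python sketch below turns text that has already been pulled out of a PDF into rows and writes them to a CSV file. It assumes that each table row sits on its own line and that columns are separated by two or more spaces; the function names `rows_from_text` and `write_csv` are hypothetical, and the extraction itself (for example with a PDF library such as pypdf) is not shown.

```python
from __future__ import annotations

import csv
import re
from pathlib import Path

# Assumption: rows are newline-separated and columns are separated by two or
# more spaces in the extracted text. Real PDFs vary, so adjust this pattern.
COLUMN_SPLIT = re.compile(r"\s{2,}")


def rows_from_text(text: str) -> list[list[str]]:
    """Split extracted PDF text into rows of cell strings."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # drop blank lines left over from the page layout
        rows.append(COLUMN_SPLIT.split(line))
    return rows


def write_csv(rows: list[list[str]], path: str | Path) -> Path:
    """Write rows to a UTF-8 CSV file (creating parent folders) and return its path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return path
```

For example, `rows_from_text("INV-001  2024-01-05  120.50")` returns `[["INV-001", "2024-01-05", "120.50"]]`. Keeping parsing and writing separate makes it easy to swap in a different delimiter rule, or JSON output, per PDF source.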
Here is some information about connecting to OneLake:
How do I connect to OneLake? - Microsoft Fabric | Microsoft Learn
Options to get data into the Lakehouse - Microsoft Fabric | Microsoft Learn
Access OneLake with Python - Microsoft Fabric | Microsoft Learn
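Once a CSV exists, one way to land it in OneLake from a Fabric notebook is sketched below. It assumes the notebook has a default lakehouse attached, which Fabric mounts at `/lakehouse/default`. The helper names, the `pdf_extracts` folder, and any workspace or lakehouse names are hypothetical; `onelake_abfss_path` only builds an `abfss://` URI in the `<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Files/...` form, which Spark in Fabric can read from or write to.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Host name of OneLake's ADLS Gen2-compatible endpoint.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_abfss_path(workspace: str, lakehouse: str, relative_path: str) -> str:
    """Build an ABFS URI for a file under a lakehouse's Files area.

    Assumes workspace and lakehouse names without spaces or special
    characters; otherwise they may need escaping (or use their IDs).
    """
    relative_path = relative_path.lstrip("/")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/Files/{relative_path}"


def copy_to_lakehouse_files(local_file: str | Path, subfolder: str,
                            files_root: str | Path = "/lakehouse/default/Files") -> Path:
    """Copy a local file into the Files area of the default lakehouse.

    files_root defaults to the assumed mount point of the notebook's
    default lakehouse; pass another root to test the function locally.
    """
    target_dir = Path(files_root) / subfolder
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 returns the path of the new file when the target is a directory.
    return Path(shutil.copy2(local_file, target_dir))
```

In a notebook, `copy_to_lakehouse_files("/tmp/invoices.csv", "pdf_extracts")` would place the file under the lakehouse's `Files/pdf_extracts` folder. Copying rather than moving keeps the local working copy available for debugging the extraction.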
Regards,
Nono Chen