I have a collection of PDFs from various sources and need to extract data into OneLake using Fabric.
How can I do that? Is there any documentation or a set of steps I should follow?
Assuming the sources are supported by either Dataflow Gen2 or data pipelines, you can extract data from PDFs with Dataflow Gen2.
The blog post below explains how:
https://datasharkx.wordpress.com/2023/12/03/read-and-import-data-from-pdf-file-using-msft-fabric/
Hi @esraaE
First, extract the data from the PDFs; a dedicated tool or library can do this.
After extracting the data, you may need to clean it and convert it into a suitable format (for example, CSV or JSON) before loading it into OneLake.
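As a rough sketch of the extract, clean, and convert steps, the Python below reads text from a PDF and writes matching lines out as CSV. It assumes the third-party `pypdf` package is installed, and it assumes table columns are separated by two or more spaces, which will not hold for every PDF. The function names and the sample header are hypothetical.

```python
import csv
import io
import re


def extract_pdf_lines(pdf_path):
    """Return the non-empty text lines of a PDF (assumes `pip install pypdf`)."""
    from pypdf import PdfReader  # third-party; imported lazily so the rest runs without it

    lines = []
    for page in PdfReader(pdf_path).pages:
        text = page.extract_text() or ""
        lines.extend(line.strip() for line in text.splitlines() if line.strip())
    return lines


def lines_to_csv(lines, header):
    """Split each line on runs of 2+ whitespace characters and serialize the rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for line in lines:
        cells = [cell.strip() for cell in re.split(r"\s{2,}", line.strip())]
        if len(cells) == len(header):  # cleaning step: skip lines that do not fit the layout
            writer.writerow(cells)
    return buffer.getvalue()
```

For example, `lines_to_csv(extract_pdf_lines("invoice.pdf"), ["item", "qty", "price"])` would keep only the lines that split into three cells; page footers and headings that do not match are dropped.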
Here is some information about connecting to OneLake:
How do I connect to OneLake? - Microsoft Fabric | Microsoft Learn
Options to get data into the Lakehouse - Microsoft Fabric | Microsoft Learn
Access OneLake with Python - Microsoft Fabric | Microsoft Learn
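For the Python route, a minimal sketch of uploading a converted file might look like the following. It assumes the `azure-identity` and `azure-storage-file-datalake` packages are installed and that your signed-in identity has access to the workspace. The workspace name, lakehouse name, and file path are hypothetical placeholders, and the helper functions are illustrative only.

```python
ONELAKE_URL = "https://onelake.dfs.fabric.microsoft.com"


def lakehouse_file_path(lakehouse_name, relative_path):
    """Build the path of a file under a lakehouse's Files area."""
    return f"{lakehouse_name}.Lakehouse/Files/{relative_path.lstrip('/')}"


def upload_to_onelake(workspace_name, lakehouse_name, relative_path, data):
    """Upload bytes or text to OneLake; the workspace plays the role of the file system."""
    # Third-party imports are lazy so the path helper works without the Azure SDK.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url=ONELAKE_URL, credential=DefaultAzureCredential()
    )
    file_system = service.get_file_system_client(workspace_name)
    file_client = file_system.get_file_client(
        lakehouse_file_path(lakehouse_name, relative_path)
    )
    file_client.upload_data(data, overwrite=True)
```

A call such as `upload_to_onelake("MyWorkspace", "MyLakehouse", "pdf/invoices.csv", csv_text)` would then write the CSV into the lakehouse's Files area, where it can be loaded into a table.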
Regards,
Nono Chen
If this post helps, please consider accepting it as the solution to help other members find it more quickly.