I have a collection of PDFs from various sources and need to extract data into OneLake using Fabric.
How can I do that? Is there any documentation or a set of steps I should follow?
Assuming the sources are supported by either Dataflow Gen2 or data pipelines, you can extract data from the PDFs with Dataflow Gen2.
The blog post below explains how:
https://datasharkx.wordpress.com/2023/12/03/read-and-import-data-from-pdf-file-using-msft-fabric/
Hi @esraaE
First, extract the data from the PDFs; you might consider using a tool or library for this.
After extracting the data, you may need to clean it and convert it into a suitable format (for example, CSV or JSON) before loading it into OneLake.
Here is some information about connecting to OneLake:
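As a rough illustration of the extract, clean, and convert steps above, here is a minimal Python sketch. It assumes the PDFs contain tables and that the third-party pdfplumber library is installed; the function names (extract_rows_from_pdf, clean_rows, write_csv) are made up for this example and are not part of Fabric.

```python
import csv
from pathlib import Path


def extract_rows_from_pdf(pdf_path):
    """Collect table rows from every page (assumes pdfplumber is installed)."""
    import pdfplumber  # third-party; imported lazily so the helpers below work without it

    rows = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            table = page.extract_table()  # None when no table is detected on the page
            if table:
                rows.extend(table)
    return rows


def clean_rows(rows):
    """Trim whitespace, turn missing cells into "", and drop fully empty rows."""
    cleaned = []
    for row in rows:
        cells = [(cell or "").strip() for cell in row]
        if any(cells):
            cleaned.append(cells)
    return cleaned


def write_csv(rows, csv_path):
    """Write the cleaned rows to a UTF-8 CSV file and return its path."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return Path(csv_path)
```

A typical call would be write_csv(clean_rows(extract_rows_from_pdf("report.pdf")), "report.csv"), where both file names are placeholders. PDFs that hold free text rather than tables would need a different extraction step.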
How do I connect to OneLake? - Microsoft Fabric | Microsoft Learn
Options to get data into the Lakehouse - Microsoft Fabric | Microsoft Learn
Access OneLake with Python - Microsoft Fabric | Microsoft Learn
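To give an idea of what the Python route in the last link can look like, here is a minimal sketch that uploads a local file to a lakehouse through the OneLake DFS endpoint. It assumes the azure-identity and azure-storage-file-datalake packages are installed and that you are signed in with an identity that can write to the workspace; the helper names and the workspace, lakehouse, and file names are placeholders for this example.

```python
ONELAKE_URL = "https://onelake.dfs.fabric.microsoft.com"


def lakehouse_file_path(lakehouse, relative_path):
    """Build the path of a file under the Files area of a lakehouse."""
    return f"{lakehouse}.Lakehouse/Files/{relative_path.lstrip('/')}"


def upload_to_onelake(local_path, workspace, lakehouse, relative_path):
    """Upload one local file; the workspace name acts as the file system."""
    # Third-party packages, imported lazily so the path helper works without them.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url=ONELAKE_URL,
        credential=DefaultAzureCredential(),
    )
    file_system = service.get_file_system_client(workspace)
    file_client = file_system.get_file_client(
        lakehouse_file_path(lakehouse, relative_path)
    )
    with open(local_path, "rb") as handle:
        file_client.upload_data(handle, overwrite=True)  # replaces an existing file
```

For example, upload_to_onelake("report.csv", "MyWorkspace", "MyLakehouse", "pdf_extracts/report.csv") would place the file under Files/pdf_extracts in that lakehouse, provided those (hypothetical) names exist in your tenant.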
Regards,
Nono Chen
If this post helps, please consider accepting it as the solution so that other members can find it more quickly.