The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredEnhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends September 15. Request your voucher.
Hey guys
So I watched some tutorials and read some Microsoft answer how to start with the data import via PQ.
When I selcted the file which contains the pdf´s, I click on transform, followed by clicking the down pointing double arrow at the first colum "content" to get to the combine window.
Now I got the problem, that the suggested "Tables" dont fit the way I need different text passages to be extracted.
I got the option to choose the complete pdf with "Page001" - but what to do then?
How can I select the passages I need?
The main project I want to clear is, to get a self refreshing diagram, based on the "everytime a file is created" synchronisation from onedrive. Maybe someone can share some light here please, thanks a lot.
Solved! Go to Solution.
Hi @_V_ ,
Please share more details about the issue, as it might help in resolving it accurately.
Thank You.
Hi @_V_ ,
We haven’t received a response from you in some time. To help address the issue, could you share some sample data when possible?
Thank you.
Hi @_V_ ,
If your issue is resolved, that’s good to hear. But if you're still facing any difficulties or running into new challenges, feel free to let me know.
Thank You.
Hi @_V_ ,
Thank you for your response. Sorry to hear you’re still encountering errors. Could you provide a bit more information about the issue? For instance.
1. Does the error occur at a specific step? Or
2. Does it appear after you paste the M code?
With these details, I can troubleshoot more effectively or update the logic if needed.
Thank you for your patience let me know so I can assist further.
Regards,
Yugandhar.
Hi @_V_ ,
Please share more details about the issue, as it might help in resolving it accurately.
Thank You.
Hey @_V_ ,
Perfect! I can see exactly what you need from the weigh slip. Looking at your Zimmermann Recycling document, I can identify the key fields:
Target Data Points:
Extraction Strategy: Since this is a structured document, here's the approach:
The German text actually helps - terms like "Bauschutt", "Beton", "Asphalt" are consistent identifiers you can use for classification.
Best Regards,
Jainesh Poojara | Power BI Developer
Jainesh,
I know what youre talking about, but I really dont get it done since Im completely new to this.
The problem with the delivery type is a field above; its either "ab Werk" which means taken of or its "Anlieferung" which would be "brought to us". Anyways, I guess this could be resolved later with the diagram, since we do not offer the materials we take for sale. So a materialtype can only be on or the other.
Ive taken some pics to show you my nooby moves, could you please have a look again?
I really appreciate your time and efforts.
After choosing the folder, I hit transform data
Then I use the double arrow next to content
and now I have to approch the filters?
Then I choose the page, since it has all informations
Hey @_V_ ,
Looking at your screenshots, you're on the right track! Here are the next steps to clean up your data:
After selecting Page001:
Quick tip: Don't worry about the delivery type complexity now - you can always add a calculated column later to mark sellable vs non-sellable materials.
Your approach is correct, just need these cleanup steps before loading the data!
Best Regards,
Jainesh Poojara | Power BI Developer
Jainesh hey
I dont get it as I need it.
If I choose to delete all null-values in a column, it automatically deletes every following row of other columns; so I can only do this on sone columns which do not affect my needed values.
Also, even if every weighing pdf is built the same way, I get the net weight in different columns for instance. This is really stressing and I can see, why paid apps exist to be honest....
Is there a possibility that you have a look into this if I send you a couple of those files? Just to see if a pro has the same problem
Hey @_V_,
I completely understand your frustration! You've hit two very common Power Query challenges that even pros deal with regularly. Let me help you solve both issues:
This happens because Power Query removes the entire row when ANY selected column has nulls. Here's how to fix it:
Better approach:
Or use conditional filtering:
This is the classic "inconsistent PDF structure" problem. Here's the pro approach:
Solution - Unpivot and merge:
Did it work? ✔ Give a Kudo • Mark as Solution – help others too!
Best regards,
Jainesh Poojara / Power BI Developer
Hi @_V_ ,
Thank you for providing the context and screenshots they're very helpful. I also appreciate @jaineshp support and valuable insights throughout this discussion. You're on the right track, and it's common to face some challenges with PDFs in Power Query. Even if the layout looks consistent, small formatting changes can cause issues.
Here are a few suggestions:
Helpful Reference: Power Query Exact Online Premium connector - Power Query | Microsoft Learn
Hope this helps, It covers both table and text extraction.
Regards,
Yugandhar.
Hey good morning Yugandhar
Ok, but I dont know where to set the option "extract text" instead of parts/tables;
Ill have a look into your link.
Thanks so far
Edit: Does this contain a paid plugin? Is there no way to get this done without paying extra stuff....?
Edit2: Thanks again for your big help Jainesh
I still dont get it done; sometimes the net weight isnt even listed in the standard Page layout.
I can imagine, merging values into one column makes it even harder, since the source of the weight is always in a different row.
Meaning: I get then 1 column with all the weights and I wont be able to tell which one is net, which is tara and so on. Im done with this, Ill have a look who has the cheapest scraper app
Hello @_V_ ,
You're absolutely right Power Query doesn’t label a button as Extract Text. What we mean is when you select Page001 during the Combine Files step, you're effectively importing the entire page's raw text which then allows you to apply text functions like Text.BetweenDelimiters() to extract exactly what you need.
And just to clarify you don’t need any paid plugin to get this working. Everything can be done using Power Query in Excel or Power BI Desktop with built in features.
Thank You.
Like said before, its missing some net weight data and even if everything would be there, its too complicated to me to accomplish.
I completely understand your concern. Just to add to what we’ve discussed so far, here are a couple of community threads that are quite similar to your situation
Solved: Import Full PDF into Power BI instead of a few tab... - Microsoft Fabric Community
Solved: Multiple PDFs Different Tables - Microsoft Fabric Community
These threads may not exactly match your case, but they touch on the same challenges with PDF structure, data inconsistency, and parsing. It might be helpful to check them out for some additional insights.
Thanks for your patience.
Those links do not help me.
Thanks
How do I REPLACE a cells value, if the cell above meets certain conditions?
Since the textfilters deletes whole rows, I need the replace funktion.
So what I need is:
If a cell is equal "Tara (t)" then replace it with "/" and the content of the cell below with "/"
Can you tell me that please?
We've successfully applied the transformation as requested:
Replaced "Tara (t)" with "/"
Also replaced the value immediately following "Tara (t)" with "/"
FYI:
Please use this M Code in the Advanced Editor.
let
Source = Table.FromRows({
{"Gross weight"},
{"Tara (t)"},
{"13,88"},
{"Net weight"},
{"Tara (t)"},
{"14,32"},
{"Other"}
}, {"A"}),
AddIndex = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
MarkTara = Table.AddColumn(AddIndex, "Replace", each
if [A] = "Tara (t)" then 1 else 0),
MarkNext = Table.AddColumn(MarkTara, "ReplaceNext", each
try MarkTara[Replace]{[Index]-1} otherwise 0),
Combined = Table.AddColumn(MarkNext, "Final", each
if [Replace] = 1 or [ReplaceNext] = 1 then "/" else [A]),
Output = Table.SelectColumns(Combined, {"Final"})
in
Output
Regards,
Yugandhar.
Hello @_V_ ,
Everything appears to be in order now. If you need any more help, just let us know.
Thanks.
Hi @_V_ ,
You're on the right track. After expanding "Content" and seeing "Page001", you'll need to use text functions to extract specific passages:
For auto-refresh: Once extraction works, set up scheduled refresh in Power BI/Excel and your OneDrive sync should handle the file updates automatically.
What specific text sections are you trying to extract?That would help suggest the right text functions to use.
Best Regards,
Jainesh Poojara | Power BI Developer