Hi all, I have two columns below where there are multiple “tenants” in one column and multiple “areas” in another. I want to extract all the “SA Majeste” and “PWGSC” entities from the Tenant column such that in the new first column, I get:
etc. and their corresponding areas—formatted as “(XX,XXX.00)”—in the new second column:
The SA Majeste or PWGSC entities are not always in second place. They could be in third, fourth, or nth place. There is a pattern for extraction, but I have had no luck figuring out the M code for it. Unfortunately, all the data comes from an online PDF, so I have no control over how the data is structured; otherwise I would have normalized it to make the table analysis-friendly.
@seedrs91, is there any other data format available for the download besides PDF, such as XML or HTML?
Unfortunately not. It is only available in PDF.
As you said, there is no apparent pattern here, and coming up with one, translating it into regex, and then running JS within Power Query seems like a difficult task (from a performance standpoint too).
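If the tenants and areas do turn out to line up by position, the matching logic itself is fairly small. Below is a minimal JavaScript sketch of that idea, not a tested solution for this PDF. It assumes each cell lists its entries one per line and that the nth tenant corresponds to the nth area; the function name `extractTenantAreas` and the sample values are hypothetical.

```javascript
// Assumption: each cell holds one entry per line, and the n-th tenant
// corresponds to the n-th area. The real PDF extract may be laid out differently.

// Tenant names of interest, matched case-insensitively anywhere in the entry.
const TENANT_PATTERN = /SA Majeste|PWGSC/i;
// Areas written like "(12,345.00)": parentheses, thousands separators, two decimals.
const AREA_PATTERN = /^\(\d{1,3}(?:,\d{3})*\.\d{2}\)$/;

// Hypothetical helper: returns one { tenant, area } row per matching tenant.
function extractTenantAreas(tenantCell, areaCell) {
  const splitLines = (text) =>
    text.split("\n").map((s) => s.trim()).filter(Boolean);
  const tenants = splitLines(tenantCell);
  const areas = splitLines(areaCell);

  const rows = [];
  tenants.forEach((tenant, i) => {
    if (!TENANT_PATTERN.test(tenant)) return; // skip other tenants
    const area = areas[i];
    // Keep the area only if it exists and has the expected format.
    const validArea = area !== undefined && AREA_PATTERN.test(area);
    rows.push({ tenant, area: validArea ? area : null });
  });
  return rows;
}

// Made-up sample values, for illustration only.
const sampleTenants = "Acme Ltd\nSA Majeste la Reine\nPWGSC";
const sampleAreas = "(1,200.00)\n(15,340.00)\n(980.50)";
console.log(extractTenantAreas(sampleTenants, sampleAreas));
```

Because the lookup is by index, it does not matter whether the entity sits in second, third, or nth place. The approach breaks down if the two cells do not contain the same number of entries, in which case the area comes back as `null`.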
One thing comes to mind: if you use Node.js, there is a package called pdf2json. You could create a Node project and use pdf2json to convert the PDF to JSON first, then pass the JSON to Power BI. I am not 100% sure it would give you what you need, though.
Or, if you have direct access to the data vendor, you can ask them to format the data with a delimiter between tenants and their corresponding fields before making it available to clients. I did exactly this in a similar situation in the past.
Sorry, I'm still a beginner, so may I ask what pdf2json does that helps with my goal?
@smpa01 wrote:
Or, if you have direct access to the data vendor, you can ask them to format the data with a delimiter between tenants and their corresponding fields before making it available to clients. I did exactly this in a similar situation in the past.
Apologies, but this won't be possible. This is a bid in which the vendor doesn't give preferential treatment to any bidder.