Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI DataViz World Championships are on! With four chances to enter, you could win a spot in the LIVE Grand Finale in Las Vegas. Show off your skills.

Reply
seedrs91
Frequent Visitor

Very tricky text extraction

Hi all, I have two columns below where there are multiple “tenants” in one column and multiple “areas” in another. I want to extract all the “SA Majeste” and “PWGSC” entities from the Tenant column such that in the new first column, I get:

 

  • SA MAJESTE LA REINE ( 81001133)
  • SA MAJESTE LA REINE (81001086

 

etc. and their corresponding areas—formatted as “(XX,XXX.00)”—in the new second column:

 

  • 6,075
  • 7,453

 

The SA Majeste or PWGSC entities are not always in second place. They could be in third, fourth or nth place. Thus, there is a pattern for extraction, but I had no luck figuring out the M code for this extraction. Unfortunately, all the data comes from a PDF online, so I have no control over how the data are structured…I’d have normalized the data to make the table analysis-friendly otherwise.

 

seedrs91_0-1689190600254.jpeg

4 REPLIES 4
smpa01
Super User
Super User

@seedrs91 is there any other data format avaialble other than PDF for the download? such as XML/HTML?

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs

Unfortunately not, It is only available in PDF.

Like you said there is no apparent pattern to this and trying to come up with a pattern and then translate that into regex and then using js within power query seems like a difficult task for this (from performance standpoint too).

There is one thing that comes to my mind; if use node.js; there is a package called pdf2json. So you can create a node project and utilize pdf2json to convert that to a json first and then pass the json to power bi. I am not 100% sure though if it would still give you what you need.

 

Or if you have direct access to data vendor, you can ask them to format the data with desired delimiter between tenants and corresponding fields before making it avaialble to clients. I did exactly this in similar situation like you at past.

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs

Sorry, I'm still a beginner so may I ask what is it that pdf2json does that helps with my goal?

 


@smpa01 wrote:

 

Or if you have direct access to data vendor, you can ask them to format the data with desired delimiter between tenants and corresponding fields before making it avaialble to clients. I did exactly this in similar situation like you at past.


Apologies but this won't be possible. This is a bid whereby the vendor doesn't provide preferntial treatment to any bidder.

Helpful resources

Announcements
Feb2025 Sticker Challenge

Join our Community Sticker Challenge 2025

If you love stickers, then you will definitely want to check out our Community Sticker Challenge!

Jan NL Carousel

Fabric Community Update - January 2025

Find out what's new and trending in the Fabric community.

Top Kudoed Authors