PaulDBrown
Community Champion

Scrape only relevant images from web

Good morning,

 

I'm hoping someone can help a real beginner in M. I've been reading up on web scraping from various sources to see how to scrape image URLs from websites. I've been trying my luck with the following link: http://classicyachtinfo.com/yachts/a-day-at-the-races/ 

 

And this is the code I have so far:

 

let
    Source = 
     Web.BrowserContents("http://classicyachtinfo.com/yachts/a-day-at-the-races"),
    Image = 
     Html.Table(
      Source, 
      {{
       "Image", 
       "img", 
       each [Attributes][src]}})
       
in
    Image

 

 

The thing is this code brings in all images on the page.

fecth image 2.JPG

 

 

Is there a way to restrict the image URL import to the relevant ones? For example, from this page it would be the image related to the yacht itself (no ads etc.):

 

yacht image.jpg

 

I have been trying to use RowSelector but keep getting an error. (My knowledge of M is far too basic to work my way out of the error, and I have tried a few things over the last hour...)

 

Thanks for any help!





Did I answer your question? Mark my post as a solution!
In doing so, you are also helping me. Thank you!

Proud to be a Super User!
Paul on Linkedin.






4 REPLIES
edhans
Super User

I'm not sure how to get that specific image as I don't know how you'd know the file name. However, I did put together a list of all images on that page that have the word "yacht" in them, excluding ads.

 

I started with your code, then:

  1. I converted everything to lower case. Power Query is case sensitive, so Yacht and yacht are different. This makes it easier to search.
  2. I inserted a new column holding the text that appears after "classicyachtinfo.com/wp-content/". That seemed to eliminate the ads from Marine Insurance, the WP pixel trackers, etc.
  3. Then I filtered that final column for the word "yacht."

I got 4 results, but I'm still not sure which image is the one you are looking for, or whether it is any of them. This approach assumes the file names contain the word you are searching for. The image might be called "sailboatbybeach" on that site. 🤷‍♂️

 

let
    Source = 
     Web.BrowserContents("http://classicyachtinfo.com/yachts/a-day-at-the-races"),
    Image = 
     Html.Table(
      Source, 
      {{
       "Image", 
       "img", 
       each [Attributes][src]}}),
    #"Lowercased Text" = Table.TransformColumns(Image,{{"Image", Text.Lower, type text}}),
    #"Inserted Text After Delimiter" = Table.AddColumn(#"Lowercased Text", "Text After Delimiter", each Text.AfterDelimiter([Image], "classicyachtinfo.com/wp-content/"), type text),
    #"Filtered Rows" = Table.SelectRows(#"Inserted Text After Delimiter", each Text.Contains([Text After Delimiter], "yacht"))
in
    #"Filtered Rows"
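
A possible variation on the query above, offered as a sketch rather than a tested fix: lowercasing the Image column also rewrites the URLs themselves, and URL paths can be case sensitive on some servers. The version below leaves the scraped URLs untouched and lowercases only a temporary copy for the comparison. The Keyword and SitePrefix step names are illustrative assumptions, as is the null check for images that have no src attribute.

```powerquery
let
    Source =
        Web.BrowserContents("http://classicyachtinfo.com/yachts/a-day-at-the-races"),
    Image =
        Html.Table(Source, {{"Image", "img", each [Attributes][src]}}),
    // Assumed search term; swap in "races" or any other lower-case word.
    Keyword = "yacht",
    // Only URLs under this path are treated as belonging to the site itself.
    SitePrefix = "classicyachtinfo.com/wp-content/",
    #"Filtered Rows" =
        Table.SelectRows(
            Image,
            each [Image] <> null
                // Compare against a lower-cased copy; the Image column keeps its original casing.
                and Text.Contains(
                    Text.AfterDelimiter(Text.Lower([Image]), SitePrefix),
                    Keyword))
in
    #"Filtered Rows"
```

Text.AfterDelimiter returns an empty string when the prefix is not found, so ad and tracker URLs hosted elsewhere drop out of the result in the same way as in the original three-step version.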




Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

@edhans 

 

Thank you for taking the time to give this a shot. Unfortunately the images rendered are not what I am looking for:

 

M result.JPG

 

I actually think my original code does not include the images I am looking for. If I go into the webpage's code, what I'm trying to get at is the following:

 

2019-11-07.png

 

or...

 

2019-11-07 (1).png

 

which is why I started playing around with "RowSelector", but as I say, I know veeeery little about M and even less about CSS selectors.

The thing is, I need to write queries for a number of webpages to import image URLs (and the pages will of course all have different structures), so I'm trying to understand the coding pattern to apply to each website (which means providing the selector needed to reach a particular segment of the webpage's code).
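
As a rough illustration of that pattern, the sketch below runs Html.Table against a small inline HTML string instead of a live page. The markup and the "ad" and "gallery" class names are invented for the example; on a real site the container selector would have to be read from the page source. The first step narrows the column selector to images inside one container, and the second adds the RowSelector option so that two columns stay aligned per image.

```powerquery
let
    // Hypothetical markup standing in for a real page; class names are assumptions.
    SampleHtml =
        "<div class=""ad""><img src=""http://ads.example/banner.jpg"" alt=""ad""></div>"
        & "<div class=""gallery"">"
        & "<img src=""http://example.com/a.jpg"" alt=""first"">"
        & "<img src=""http://example.com/b.jpg"" alt=""second"">"
        & "</div>",
    // A descendant selector limits matches to <img> elements inside .gallery.
    GalleryImages =
        Html.Table(
            SampleHtml,
            {{"Image", ".gallery img", each [Attributes][src]}}),
    // RowSelector names the element that defines one row; each column selector
    // is then resolved once per row.
    GalleryWithAlt =
        Html.Table(
            SampleHtml,
            {
                {"Image", ".gallery img", each [Attributes][src]},
                {"Alt", ".gallery img", each [Attributes][alt]}
            },
            [RowSelector = ".gallery img"])
in
    GalleryWithAlt
```

Under these assumptions the banner inside the "ad" container should not appear in either result. For a different website, only the selector strings should need to change.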

 

Thanks again!

 

 











If you change my final filter to look for "races" instead of "yacht", are those not the two images you are referring to?




@edhans 

 

Thanks again for your suggestion. I did eventually succeed in importing the image URLs, albeit after tweaking the M code for the different websites. Just for general information, I imported about 3,500 different URLs from about 1,650 different pages on 4 different websites. The whole process probably took over 6 hours (in fact I started the queries at around 20:00, checked well past midnight and it was more or less halfway there...). 
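
For reference, one way a multi-page import like this can be structured is sketched below. It is not the query actually used here: GetImages is a hypothetical helper, and the Pages table holds a single placeholder URL where the real list of pages would go.

```powerquery
let
    // Hypothetical helper: returns every <img> src found on one page.
    GetImages = (url as text) as table =>
        Html.Table(
            Web.BrowserContents(url),
            {{"Image", "img", each [Attributes][src]}}),
    // Placeholder list of pages; replace with the real page URLs.
    Pages =
        Table.FromColumns(
            {{"http://classicyachtinfo.com/yachts/a-day-at-the-races"}},
            {"Page"}),
    // Table.Buffer forces the page to be read inside the try, so a page that
    // fails to load yields null instead of stopping the whole refresh.
    WithImages =
        Table.AddColumn(
            Pages,
            "Images",
            each try Table.Buffer(GetImages([Page])) otherwise null),
    // One output row per image, with the source page kept alongside it.
    Expanded = Table.ExpandTableColumn(WithImages, "Images", {"Image"})
in
    Expanded
```

Each call to Web.BrowserContents loads and renders a full page, which is consistent with a run over many hundreds of pages taking hours.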

 

I subsequently decided I was also interested in importing some text from each of those pages, but I have given up trying to work out the code for the selectors needed...
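
For the text case, a minimal sketch under stated assumptions: when the third item of a column definition is left out, Html.Table returns the text content of each matched element. The inline HTML and the "entry-title" and "entry-content" class names below are invented for illustration; the real pages would need their own selectors, taken from the page source.

```powerquery
let
    // Hypothetical markup; the class names are assumptions, not taken from the real sites.
    SampleHtml =
        "<article>"
        & "<h1 class=""entry-title"">A Day at the Races</h1>"
        & "<div class=""entry-content"">"
        & "<p>First paragraph.</p>"
        & "<p>Second paragraph.</p>"
        & "</div>"
        & "</article>",
    // No transform supplied, so each column holds the element's text content.
    Title = Html.Table(SampleHtml, {{"Title", "h1.entry-title"}}),
    Paragraphs = Html.Table(SampleHtml, {{"Text", ".entry-content p"}})
in
    Paragraphs
```

Replacing SampleHtml with the output of Web.BrowserContents for a page would apply the same selectors to live content.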

 

Thanks again!










