Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more

Reply
inzaghi1230
Frequent Visitor

Power BI Screen Scraping

Hi I am in my second week of working in Power BI. I am wanting to retrieve all listing data from RealEstate.co.nz and I have got to the following stage (please see code below), however I am having questions for the further step. Please help. Thank you.

 

  1. What shall I do to transform the current result to a meaning full table eg. Make the column “FindListingHeader” the table header however remove the duplication and make the “FindListingDetail” the data in the table. I have tried to use Transpose, but it doesn’t get me a proper table.
  2. How do I make the query to go to Next Page on the website. There are roughly about 50 pages.
  3. How do I get rid of “</p” in the description field, my Replace Value didn’t seem to work very well. Also the field doesn’t display the full text although I have already change the type to text.

 

let
    Source =Table.FromColumns({Lines.FromBinary(Web.Contents("http://www.realestate.co.nz/residential/search/districts/237/property_types/1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C50"),null,null,1252)}),
    #"Renamed Columns to HTML" = Table.RenameColumns(Source,{{"Column1", "HTML"}}),
    #"Add Conditional Column Store Header" = Table.AddColumn(#"Renamed Columns to HTML", "FindListingHeader", each if Text.Contains([HTML], "id=""listing-") then "ListingNo" else if Text.Contains([HTML], "itemprop=""name""") then "Slogan" else if Text.Contains([HTML], "itemprop=""description""") then "Description" else if Text.Contains([HTML], "itemprop=""streetAddress""") then "StreetAddress" else if Text.Contains([HTML], "itemprop=""addressLocality""") then "Suburb" else if Text.Contains([HTML], "class=""price""") then "Price" else if Text.Contains([HTML], "Bedrooms") then "Bedrooms" else if Text.Contains([HTML], "Bathrooms") then "Bathrooms" else null ),
    #"Add Conditional Column Store Details" = Table.AddColumn(#"Add Conditional Column Store Header", "FindListingDetails", each if Text.Contains([HTML], "id=""listing-") then [HTML] else if Text.Contains([HTML], "itemprop=""name""") then [HTML] else if Text.Contains([HTML], "itemprop=""description""") then [HTML] else if Text.Contains([HTML], "itemprop=""addressLocality""") then [HTML] else if Text.Contains([HTML], "itemprop=""streetAddress""") then [HTML] else if Text.Contains([HTML], "class=""price""") then [HTML] else if Text.Contains([HTML], "Bedrooms</h6></li>") then [HTML] else if Text.Contains([HTML], "Bathrooms</h6></li>") then [HTML] else null ),
    #"Changed Data Type to Text" = Table.TransformColumnTypes(#"Add Conditional Column Store Details",{{"FindListingDetails", type text}, {"FindListingHeader", type text}}),
    #"Trimmed Text" = Table.TransformColumns(#"Changed Data Type to Text",{{"FindListingDetails", Text.Trim}}),
    #"Remove <span itemprop=""name"">" = Table.ReplaceValue(#"Trimmed Text","<span itemprop=""name"">","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove <p itemprop=""description"">" = Table.ReplaceValue(#"Remove <span itemprop=""name"">","<p itemprop=""description"">","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove itemprop=""addressLocality"">" = Table.ReplaceValue(#"Remove <p itemprop=""description"">","<span class=""location"" itemprop=""addressLocality"">","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove <span itemprop=""streetAddress"">" = Table.ReplaceValue(#"Remove itemprop=""addressLocality"">","<span itemprop=""streetAddress"">","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove <div class=""price"">" = Table.ReplaceValue(#"Remove <span itemprop=""streetAddress"">","<div class=""price"">","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove <li><h6>" = Table.ReplaceValue(#"Remove <div class=""price"">","<li><h6>","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove </h6></li>" = Table.ReplaceValue(#"Remove <li><h6>","</h6></li>","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove </span>" = Table.ReplaceValue(#"Remove </h6></li>","</span>","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove </div>" = Table.ReplaceValue(#"Remove </span>","</div>","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove </p" = Table.ReplaceValue(#"Remove </div>","</p","",Replacer.ReplaceText,{"HTML"}),
    #"Remove """ = Table.ReplaceValue(#"Remove </p","""","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove <div id=listing-" = Table.ReplaceValue(#"Remove ""","<div id=listing-","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove class=listing" = Table.ReplaceValue(#"Remove <div id=listing-"," class=listing featuredListing itemscope itemtype=http://schema.org/Residence data-gtm={","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove ," = Table.ReplaceValue(#"Remove class=listing",",","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove }" = Table.ReplaceValue(#"Remove ,","}","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove >" = Table.ReplaceValue(#"Remove }",">","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Filtered Out Null and Blank on Details" = Table.SelectRows(#"Remove >", each [FindListingDetails] <> null and [FindListingDetails] <> ""),
    #"Removed Columns HTML" = Table.RemoveColumns(#"Filtered Out Null and Blank on Details",{"HTML"})
in
    #"Removed Columns HTML"

 

1 ACCEPTED SOLUTION

6 REPLIES 6
Phil_Seamark
Microsoft Employee
Microsoft Employee

Hi @inzaghi1230,

 

This should be achieveable.  Any chance you can share your PBIX file to fine tune?

 

PM me a link if that suits.


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Hi Phil,

 

Thanks for your reply. However I can not seem to attach the pbix file here. Actually if you start a new report and paste my code into the Advance Editor, you will get exactly what I have here.

 

The website I am wanting to get data is www.realestate.co.nz . At our workplace, we want to get an idea on the housing supply by taking snapshot on the property listings on a daily basis. Once we build up our data history, then we could look into the average asking price or property type on the supply side.

 

Really new to Power BI, but it's really powerful and I really want to dive into it.

 

Thanks for the help.

Hi @inzaghi1230,

 

If you upload to a One Drive, or Drop Box you can probably generate a link to share.


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Hi Phil,

 

Here is the link to one drive. https://1drv.ms/u/s!Au_vRMalV-Pujmc4UtPjcsO0LNmn

 

Please let me know if any problems. Once again, thank you for your help. Cheers.

 

 

Hi @inzaghi1230

 

Have you read these walkthroughs?

 

http://blogs.adatis.co.uk/callumgreen/post/Loop-through-Multiple-Web-Pages-using-Power-Query

 

and about URL Parameterization 

 

http://radacad.com/custom-functions-made-easy-in-power-bi-desktop 

 


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Thanks Phil. Exactly what I want! Cheers.

Helpful resources

Announcements
Power BI DataViz World Championships

Power BI Dataviz World Championships

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!

December 2025 Power BI Update Carousel

Power BI Monthly Update - December 2025

Check out the December 2025 Power BI Holiday Recap!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.