Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Next up in the FabCon + SQLCon recap series: The roadmap for Microsoft SQL and Maximizing Developer experiences in Fabric. All sessions are available on-demand after the live show. Register now

Reply
mike_asplin
Helper V
Helper V

Mystified by results of web query

I have written some code to extract the data from a website, but the results are different to what I see in the browser. I noticed this as i have a function running thorugh pages 1 to 22, but I have loads of duplicates

 

Take this page

https://www.legionellacontrol.org.uk/search/?c=11&q=&r=&p=18 

 

If i just paste that as a web date source in Power query I get completely different companies. See excel here

Legionella 

 

this is with absolutely no processing. How can this be?  is this some clever dynamic listing on the website to stop people scrapping the data?  Any advice appreciated

 

Power query is returning which are not the companies on that web page

Water Logic Works Ltd
Coral Environmental Ltd
Titan Water Limited
CLIRA Limited
Trident Water Solutions Limited
Brighter Compliance Limited
Direct-Tech Solutions Limited
Lucion Services Limited
HLA Services Limited
Riverside Environmental Services Limited
Acorn Environment Services Limited
Environmental Essentials UK Ltd T/A EE UK Ltd
H2O Environmental Services Ltd
Aqua B Group/LTM Compliance
Bureau Veritas UK Limited
9 REPLIES 9
v-kpoloju-msft
Community Support
Community Support

Hi @mike_asplin

Thank you for reaching out to the Microsoft Fabric Community Forum. Also, thanks to @Natarajan_M@ronrsnfld,  for those inputs on this thread.

Has your issue been resolved? If the response provided by the community member @Natarajan_M@ronrsnfld, addressed your query, could you please confirm? It helps us ensure that the solutions provided are effective and beneficial for everyone.

Hope this helps clarify things and let me know what you find after giving these steps a try happy to help you investigate this further.

Thank you for using the Microsoft Community Forum.

Hi @mike_asplin

Just wanted to follow up. If the shared guidance worked for you, that’s wonderful hopefully it also helps others looking for similar answers. If there’s anything else you'd like to explore or clarify, don’t hesitate to reach out.

Thank you.

ronrsnfld
Super User
Super User

Please note that if, while viewing the table in the PQ UI, you "Refresh Preview", the listed companies will change. That probably explains why what you see is different in Excel vs PQ. Possibly a function of the web site as the query refreshes.

Natarajan_M
Solution Sage
Solution Sage

Hi @mike_asplin ,

I was able to extract the results from the webpage using the web connector.


Data : 

Natarajan_M_3-1772158159210.png

 



let
    StartPage = 1,
  
    EndPage = 100, 
    
    PageList = {StartPage..EndPage},
    PageTable = Table.FromList(PageList, Splitter.SplitByNothing(), {"PageNumber"}),
    
    AddURLColumn = Table.AddColumn(
        PageTable, 
        "Source URL", 
        each "https://www.legionellacontrol.org.uk/search/?c=11&q=&r=&p=" & Text.From([PageNumber])
    ),
    
    fxScrapePage = (url as text) as table =>
    let
        
        Source = Web.Contents(url),
        
        ExtractData = Html.Table(
            Source, 
            {
                {"Company Name", "a[href*='/company/?name=']"}, 
                {"Profile Link", "a[href*='/company/?name=']", each [Attributes][href]?}, 
                {"Phone", "a[href^='tel:']"}, 
                {"Email", "a[href^='mailto:']", each Text.Replace([Attributes][href], "mailto:", "")}, 
                {"Website", "a:contains('Visit Website')", each [Attributes][href]?} 
            }, 
            [RowSelector=".row"]
        ),
        
        CleanCompanyName = Table.ReplaceValue(ExtractData, "MORE INFORMATION", null, Replacer.ReplaceValue, {"Company Name"}),
        FinalFilter = Table.SelectRows(CleanCompanyName, each ([Company Name] <> null))
    in
        FinalFilter,

 
    GetData = Table.AddColumn(
        AddURLColumn, 
        "ScrapedData", 
        each fxScrapePage([Source URL])
    ),
    
    ExpandedData = Table.ExpandTableColumn(
        GetData, 
        "ScrapedData", 
        {"Company Name", "Profile Link", "Phone", "Email", "Website"}
    ),
    
  
    FinalData = Table.SelectRows(ExpandedData, each ([Company Name] <> null)),
    #"Removed Columns" = Table.RemoveColumns(FinalData,{"PageNumber", "Source URL"})


in
    #"Removed Columns"

 
File : webdata.pbix

Thanks 
If this response was helpful in any way, I’d gladly accept a kudo.
Please mark it as the correct solution. It helps other community members find their way faster

For example I found Aquafine (UK) Ltd on page 3 on the website, but its on your p18. Maybe they have a random generator to mix up the reuslts each time you search as companies are not in alphabetical order so as to not favour companies begining with A. Is that a thing on a website?

 

I searched 3 times and Aquafine (UK) Ltd is on P3 everytime on the website

 

In your code its not there

Company Name

Aqualogic Limited
Aquate Consultancy Ltd
Chlorination and Chemical Cleaning Services Ltd t/a C3 Water
Churchill Environmental Services Limited
First Environment Limited
Frankham Risk Management Services Limited
GEM Water Limited
General Environmental Services Limited
Guardian Hygiene Services Limited
Maddison Water Hygiene Limited
Maplin Services Limited
Nant Limited
Normec Latis Scientific Limited
Orion Building Engineering Services Ltd
TWC (Services) Limited

 

Hi @mike_asplin , Just observed the order of the values are not consistent between browsers 
I suspect the values will change based on the browser and the PBI desktop webview is based on Microsoft Edge WebView2 (Chromium-based).

In Chrome Personal profile 1:

Natarajan_M_0-1772206538058.png

in chrome personal profile 2 :

Natarajan_M_3-1772207161184.png

 


In Edge : 

Natarajan_M_1-1772206614209.png


the values keep changing everytime we refresh in the powerquery layer.

in pbi : 

Natarajan_M_2-1772206825527.png

 


Thanks 
If this response was helpful in any way, I’d gladly accept a kudo.
Please mark it as the correct solution. It helps other community members find their way faster

So the website is randomly sorting the results each time its queried?

 

I guess that means its impossible to scrape the data as each time you change pages the output is randomised?

Hi @mike_asplin , 
What I meant is that the sorting is inconsistent across different browsers; it varies depending on the profile (the account from which you are accessing it even with in chrome if i use diffrent accounts its giving diffrent results). Additionally, within the Power BI Webview, the order changes dynamically with each refresh, altering every time you refresh the page.

Thanks.
If this response was helpful in any way, I’d gladly accept a kudo.
Please mark it as the correct solution. It helps other community members find their way faster







My question was more why is the url in power query for page 18 producing a different result to putting the same url for page 18 into a web browser. I have never seen this behaviour before as should be the same data.  Plus the fact that createing the url in power query is returning duplicate pages for different page numbers. 

 

I appreciate you showing how to extract the html long hand, but actually power query does a good job extracting the fields its just the wrong page. 

 

Your code has the same issue.

 

If you comapre the comapnies on this url https://www.legionellacontrol.org.uk/search/?c=11&q=&r=&p=18 

 

to your pbix for the same page completely different companies and wierdly different to my model for the same page.  its like they are taking the page number and randonly picking a page.  What is even wierder is if I scroll through all 22 pages of that url on the website and check the top company on each page none of them are in the list below. Its like the power query url is going somewhere completely different.

 

Company Name

Rock Compliance Limited
Reigate Environmental Services Ltd
Scottish Water Property
Water Environmental Treatment Limited
GEM Water Limited
Aquafine (UK) Ltd
Water Logic Works Ltd
Rochester Midland Corporation Limited
Pure Drop Water Systems Management Ltd
Chemical Treatment Services Ltd
H2O Cost Reduction Services Ltd t/a H2O Legionella Control Services
Pennington Choices Limited
H2O Solutions
Life Environmental Services Limited
HYDROP Environmental Consultancy Services Limited

 

Helpful resources

Announcements
New to Fabric survey Carousel

New to Fabric Survey

If you have recently started exploring Fabric, we'd love to hear how it's going. Your feedback can help with product improvements.

Power BI DataViz World Championships carousel

Power BI DataViz World Championships - June 2026

A new Power BI DataViz World Championship is coming this June! Don't miss out on submitting your entry.

Join our Fabric User Panel

Join our Fabric User Panel

Share feedback directly with Fabric product managers, participate in targeted research studies and influence the Fabric roadmap.

March Power BI Update Carousel

Power BI Community Update - March 2026

Check out the March 2026 Power BI update to learn about new features.