Making Retrieval of Web Data More Robust

codemonkey · ‎08-01-2017

Hi All,

I retrieve data from a number of web pages as per the standard WebPage(Web.Contents...mechanism.

I refresh this data every week, but sometimes the structure in the source pages changes. For example, one week, a source page might have 3 tables on it, where I am interested in the 3rd table. But the next week, due to changes in advertising sections, there might be 6 tables, meaning I am then interested in the 5th table.

I know the characteristics of the table I am interested in. I know, for example, column names that are specific to that table only.

In another language, I would therefore loop through all the tables and select the table of interest based on the presence of one of its unique column names.

In M, I've achieve something similar as follows:

let
    Source = Web.Page(Web.Contents("http://www.website.htm")),
    Tables = Table.SelectRows(Source, each [Source] = "Table"),
    ExpandedTable = Table.ExpandTableColumn(Tables,"Data",{"MyTargetColumn1","MyTargetColumn2","MyTargetColumn3","MyTargetColumn4"}),
    FilteredTable = Table.RemoveMatchingRows(ExpandedTable, {[MyTargetColumn1 = null]}, "MyTargetColumn1")
in
    FilteredTable

This works perfectly in terms of end result, but I'm wondering if there is a more efficient way to do this. For example, one of the drawbacks of the above method is that I'm expanding the contents of all the tables before removing rows that have a null value in my key target column. The tables are small, so it's not that big a performance hit. But ideally, I'd want to identify the target table and only expand it before applying filters.

Any suggestions on how I can achieve this more efficient approach?

v-yulgu-msft · ‎08-03-2017

Hi @codemonkey,

If you only want to load data from one or some specific tables from web page, you can select those tables in Navigator pane first when you connect to data source via Web connector.

Regards,
Yuliana Gu

Community Support Team _ Yuliana Gu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

codemonkey · ‎08-03-2017

Sorry, I should have been clearer. When I said a number of pages, I meant hundreds, each one called up using a different date parameter in the URL, so I don't want to do this manually through the interface. I want to do it via code so that when I update the main date start/end parameter, it will excecute against all pages in that date range, regardless of layout.

As mentioned, the existing code works, i.e. it extracts the correct data regardless of whether the target table number changes from one day to the next. The question is: can I make this solution more efficient? More specifically, is there any way to avoid expanding all tables on a page to identify the desired target table?

Wayne

Making Retrieval of Web Data More Robust

Helpful resources

Power BI Monthly Update - July 2025

Join our Fabric User Panel

Fabric Community Update - June 2025

How to Get Your Question Answered Quickly

Party with Power BI’s own Guy in a Cube

Making Retrieval of Web Data More Robust

Helpful resources

Power BI Monthly Update - July 2025

Join our Fabric User Panel

Fabric Community Update - June 2025

How to Get Your Question Answered Quickly