Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
codemonkey
Frequent Visitor

Making Retrieval of Web Data More Robust

Hi All,

 

I retrieve data from a number of web pages as per the standard WebPage(Web.Contents...mechanism.

 

I refresh this data every week, but sometimes the structure in the source pages changes. For example, one week, a source page might have 3 tables on it, where I am interested in the 3rd table. But the next week, due to changes in advertising sections, there might be 6 tables, meaning I am then interested in the 5th table.

 

I know the characteristics of the table I am interested in. I know, for example, column names that are specific to that table only. 

 

In another language, I would therefore loop through all the tables and select the table of interest based on the presence of one of its unique column names.

 

In M, I've achieve something similar as follows:

 

 

let
    Source = Web.Page(Web.Contents("http://www.website.htm")),
    Tables = Table.SelectRows(Source, each [Source] = "Table"),
    ExpandedTable = Table.ExpandTableColumn(Tables,"Data",{"MyTargetColumn1","MyTargetColumn2","MyTargetColumn3","MyTargetColumn4"}),
    FilteredTable = Table.RemoveMatchingRows(ExpandedTable, {[MyTargetColumn1 = null]}, "MyTargetColumn1")
in
    FilteredTable

This works perfectly in terms of end result, but I'm wondering if there is a more efficient way to do this. For example, one of the drawbacks of the above method is that I'm expanding the contents of all the tables before removing rows that have a null value in my key target column. The tables are small, so it's not that big a performance hit. But ideally, I'd want to identify the target table and only expand it before applying filters.

 

Any suggestions on how I can achieve this more efficient approach?

 

2 REPLIES 2
v-yulgu-msft
Microsoft Employee
Microsoft Employee

Hi @codemonkey,

 

If you only want to load data from one or some specific tables from web page, you can select those tables in Navigator pane first when you connect to data source via Web connector.

5.PNG

 

Regards,
Yuliana Gu

Community Support Team _ Yuliana Gu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Sorry, I should have been clearer. When I said a number of pages, I meant hundreds, each one called up using a different date parameter in the URL, so I don't want to do this manually through the interface. I want to do it via code so that when I update the main date start/end parameter, it will excecute against all pages in that date range, regardless of layout.

 

As mentioned, the existing code works, i.e. it extracts the correct data regardless of whether the target table number changes from one day to the next. The question is: can I make this solution more efficient? More specifically, is there any way to avoid expanding all tables on a page to identify the desired target table?

 

Wayne

Helpful resources

Announcements
July PBI25 Carousel

Power BI Monthly Update - July 2025

Check out the July 2025 Power BI update to learn about new features.

Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.