Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
codemonkey
Frequent Visitor

Making Retrieval of Web Data More Robust

Hi All,

 

I retrieve data from a number of web pages as per the standard WebPage(Web.Contents...mechanism.

 

I refresh this data every week, but sometimes the structure in the source pages changes. For example, one week, a source page might have 3 tables on it, where I am interested in the 3rd table. But the next week, due to changes in advertising sections, there might be 6 tables, meaning I am then interested in the 5th table.

 

I know the characteristics of the table I am interested in. I know, for example, column names that are specific to that table only. 

 

In another language, I would therefore loop through all the tables and select the table of interest based on the presence of one of its unique column names.

 

In M, I've achieve something similar as follows:

 

 

let
    Source = Web.Page(Web.Contents("http://www.website.htm")),
    Tables = Table.SelectRows(Source, each [Source] = "Table"),
    ExpandedTable = Table.ExpandTableColumn(Tables,"Data",{"MyTargetColumn1","MyTargetColumn2","MyTargetColumn3","MyTargetColumn4"}),
    FilteredTable = Table.RemoveMatchingRows(ExpandedTable, {[MyTargetColumn1 = null]}, "MyTargetColumn1")
in
    FilteredTable

This works perfectly in terms of end result, but I'm wondering if there is a more efficient way to do this. For example, one of the drawbacks of the above method is that I'm expanding the contents of all the tables before removing rows that have a null value in my key target column. The tables are small, so it's not that big a performance hit. But ideally, I'd want to identify the target table and only expand it before applying filters.

 

Any suggestions on how I can achieve this more efficient approach?

 

2 REPLIES 2
v-yulgu-msft
Employee
Employee

Hi @codemonkey,

 

If you only want to load data from one or some specific tables from web page, you can select those tables in Navigator pane first when you connect to data source via Web connector.

5.PNG

 

Regards,
Yuliana Gu

Community Support Team _ Yuliana Gu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Sorry, I should have been clearer. When I said a number of pages, I meant hundreds, each one called up using a different date parameter in the URL, so I don't want to do this manually through the interface. I want to do it via code so that when I update the main date start/end parameter, it will excecute against all pages in that date range, regardless of layout.

 

As mentioned, the existing code works, i.e. it extracts the correct data regardless of whether the target table number changes from one day to the next. The question is: can I make this solution more efficient? More specifically, is there any way to avoid expanding all tables on a page to identify the desired target table?

 

Wayne

Helpful resources

Announcements
LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.