Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
Hi All,
I retrieve data from a number of web pages as per the standard WebPage(Web.Contents...mechanism.
I refresh this data every week, but sometimes the structure in the source pages changes. For example, one week, a source page might have 3 tables on it, where I am interested in the 3rd table. But the next week, due to changes in advertising sections, there might be 6 tables, meaning I am then interested in the 5th table.
I know the characteristics of the table I am interested in. I know, for example, column names that are specific to that table only.
In another language, I would therefore loop through all the tables and select the table of interest based on the presence of one of its unique column names.
In M, I've achieve something similar as follows:
let Source = Web.Page(Web.Contents("http://www.website.htm")), Tables = Table.SelectRows(Source, each [Source] = "Table"), ExpandedTable = Table.ExpandTableColumn(Tables,"Data",{"MyTargetColumn1","MyTargetColumn2","MyTargetColumn3","MyTargetColumn4"}), FilteredTable = Table.RemoveMatchingRows(ExpandedTable, {[MyTargetColumn1 = null]}, "MyTargetColumn1") in FilteredTable
This works perfectly in terms of end result, but I'm wondering if there is a more efficient way to do this. For example, one of the drawbacks of the above method is that I'm expanding the contents of all the tables before removing rows that have a null value in my key target column. The tables are small, so it's not that big a performance hit. But ideally, I'd want to identify the target table and only expand it before applying filters.
Any suggestions on how I can achieve this more efficient approach?
Hi @codemonkey,
If you only want to load data from one or some specific tables from web page, you can select those tables in Navigator pane first when you connect to data source via Web connector.
Regards,
Yuliana Gu
Sorry, I should have been clearer. When I said a number of pages, I meant hundreds, each one called up using a different date parameter in the URL, so I don't want to do this manually through the interface. I want to do it via code so that when I update the main date start/end parameter, it will excecute against all pages in that date range, regardless of layout.
As mentioned, the existing code works, i.e. it extracts the correct data regardless of whether the target table number changes from one day to the next. The question is: can I make this solution more efficient? More specifically, is there any way to avoid expanding all tables on a page to identify the desired target table?
Wayne
User | Count |
---|---|
98 | |
89 | |
82 | |
70 | |
67 |
User | Count |
---|---|
114 | |
104 | |
101 | |
72 | |
65 |