Hi,
I am trying to web scrape information from a list of URLs. I set the query up from one URL and then invoke a function on the rest of the URLs. The problem is that some of the URLs no longer exist, and I keep getting errors that prevent me from expanding the table. If a URL does not exist anymore, I need to be able to flag it so I can take action on it.
Below is a screenshot of where I am stuck and what happens if I continue despite the errors. Any help on how to proceed, or how to set the query up to ignore the errors, would be highly appreciated. Thank you!
Hi @Anonymous,
There are several ways you can approach your requirements. First of all, you can extract the error codes following Chris Webb's technique here.
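The core of that idea is wrapping the page fetch in M's `try` expression so a failing URL produces an error record you can inspect, instead of an error that blocks expanding the table. Here is a minimal sketch; the URLs, the `Url` column name, and the step names are placeholders for your own setup:

```m
// Hypothetical sketch: flag dead URLs instead of erroring out.
let
    // Placeholder URL list - substitute your own source of URLs.
    Urls = {"https://example.com/page1", "https://example.com/missing-page"},
    Source = Table.FromList(Urls, Splitter.SplitByNothing(), {"Url"}),

    // `try` turns each fetch into a record with HasError / Value / Error
    // fields rather than raising an error during table expansion.
    AddAttempt = Table.AddColumn(Source, "Attempt",
        each try Web.Contents([Url])),

    // Expose a simple flag column so failing URLs are easy to act on.
    AddHasError = Table.AddColumn(AddAttempt, "UrlFailed",
        each [Attempt][HasError])
in
    AddHasError
```

You can then filter on the `UrlFailed` column, or drill into `[Attempt][Error]` to extract the error message and reason, which is essentially what the error-code technique above does.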
Another approach is to split the successful and erroneous pages. Start with your current query (the one in your screenshot) as a base query. From there, create two reference queries (by right-clicking the query and selecting Reference). Rename the first reference Results and the second Errors. In the Results query, remove all errors by selecting Home --> Remove Rows --> Remove Errors. In the Errors query, keep all the errors by selecting Home --> Keep Rows --> Keep Errors.
From here, you can keep transforming the Errors query to highlight the errors.
Note that the main caveat of this approach is that the web scraping will be conducted twice due to the two reference queries. There are other more complex ways to achieve your solution. I may be able to send you an example if you can share the sample PBIX file.
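In M terms, those two ribbon actions correspond to `Table.RemoveRowsWithErrors` and `Table.SelectRowsWithErrors`. A rough sketch of the two reference queries, assuming the base query is named `BaseQuery` and the expanded web column is called `Data` (both names are placeholders for your own query):

```m
// Results query - keep only rows whose "Data" column expanded cleanly.
let
    Source = BaseQuery,
    Results = Table.RemoveRowsWithErrors(Source, {"Data"})
in
    Results

// Errors query (a separate query in the editor) - keep only the rows
// whose fetch failed, so you can report on the dead URLs.
let
    Source = BaseQuery,
    Errors = Table.SelectRowsWithErrors(Source, {"Data"})
in
    Errors
```

Note that each `let ... in` expression above would live in its own query in the Power Query editor; they are shown together here only for comparison.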