Dear Community,
I want to integrate a web scraping script written in Python with Microsoft Fabric, so it automatically ingests the extracted data to my data warehouse.
I tried to run my script after installing Selenium and other library dependencies. However, the script fails when scrolling or clicking buttons to load dynamic HTML elements.
I was wondering if there is any workaround for this problem so I can keep the hosting in Fabric.
If not, which Microsoft service do you recommend so the script can execute browser actions successfully and the data can still be ingested seamlessly into Fabric's warehouse? (I was thinking of Azure DevOps Pipelines, but I'm not sure that would be the optimal option.)
Thanks in advance!
Hi imbusto,
We wanted to check in regarding your query, as we have not heard back from you. If you have resolved the issue, sharing the solution with the community would be greatly appreciated and could help others encountering similar challenges.
If you found our response useful, kindly mark it as the accepted solution and provide kudos to guide other members.
Thank you.
Hi imbusto,
Thank you @nilendraFabric for the response.
We would like to inquire if the solution offered by @nilendraFabric has resolved your issue. If you have discovered an alternative approach, we encourage you to share it with the community to assist others facing similar challenges.
Should you find the response helpful, please mark it as the accepted solution and add kudos. This recognition benefits other members seeking solutions to related queries.
Thank you.
Hello @imbusto
Running Selenium in Fabric notebooks has limitations because the environment lacks a browser for WebDriver to interact with directly. While headless mode or remote Selenium services (e.g., LambdaTest) might be workarounds, they require additional configuration and may not fully resolve all issues.
Instead of running Selenium directly in Fabric, you can use a remote Selenium service like LambdaTest or BrowserStack. These services allow you to execute browser automation tasks remotely and retrieve the results back into your Fabric environment.
You could even consider running Selenium scripts in an external environment (e.g., an Azure Virtual Machine or a local machine) and then transferring the scraped data to Microsoft Fabric via APIs.
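To make the "scrape externally, then transfer" idea concrete, here is a minimal sketch of the hand-off step using only the Python standard library. The endpoint URL, auth token, and function names are illustrative assumptions, not a real Fabric API: in practice you would target whatever HTTP entry point you set up (an Azure Function, an Event Hub, or a pipeline trigger) to land the data where Fabric can pick it up.

```python
import json
import urllib.request

def build_ingest_payload(rows):
    """Wrap scraped rows in a simple JSON envelope for ingestion.

    The envelope shape (rowCount + rows) is an illustrative convention,
    not a Fabric requirement.
    """
    return json.dumps({"rowCount": len(rows), "rows": rows}).encode("utf-8")

def send_to_fabric(endpoint_url, rows, token):
    """POST the payload to a hypothetical ingestion endpoint.

    endpoint_url and token are placeholders; substitute the URL and
    credential of whatever service you stand up in front of Fabric.
    """
    req = urllib.request.Request(
        endpoint_url,
        data=build_ingest_payload(rows),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

The scraping script on the VM would call `send_to_fabric` after each run; on the Fabric side, a Data Factory pipeline or notebook can then pick the landed JSON up on a schedule.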
You could also explore Microsoft Power Automate for simpler web scraping tasks that don't require extensive coding.
After scraping, the extracted data can be sent to Fabric's data warehouse via APIs. Alternatively, Fabric's Data Factory lets you create pipelines that ingest data dynamically from various sources, including APIs. This could involve storing raw JSON responses in a lakehouse before transforming them into structured tables in the warehouse.
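The "raw JSON to structured table" step above can be sketched in plain Python. This is a generic flattening helper I'm adding for illustration (the function name and key-separator convention are my own assumptions, not part of any Fabric API); in a Fabric notebook you would typically feed the resulting flat dicts into a Spark or pandas DataFrame before writing to the warehouse.

```python
import json

def flatten_record(record, parent_key="", sep="_"):
    """Flatten one nested JSON object into a single-level dict
    whose keys map naturally onto warehouse columns."""
    items = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, prefixing child keys
            items.update(flatten_record(value, new_key, sep))
        else:
            items[new_key] = value
    return items

# Example: one raw API response stored in the lakehouse
raw = json.loads('{"id": 7, "product": {"name": "widget", "price": 9.5}}')
row = flatten_record(raw)
# row has flat, column-ready keys: id, product_name, product_price
```

Flattening up front keeps the warehouse schema stable even when the scraped pages return nested structures.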
https://community.fabric.microsoft.com/t5/Data-Engineering/Installing-webdriver-for-selenium/m-p/429...
Hope this is helpful.
Thanks