Dear Community,
I want to integrate a web scraping script written in Python with Microsoft Fabric, so it automatically ingests the extracted data to my data warehouse.
I tried to run my script after installing Selenium and other library dependencies. However, the script fails when scrolling or clicking buttons to load dynamic HTML elements.
I was wondering if there is any workaround for this problem so I can keep the hosting in Fabric.
If not, which Microsoft service do you recommend so the script can execute browser actions successfully and the data can still be ingested seamlessly into Fabric's warehouse? (I was thinking of Azure DevOps Pipelines, but I'm not sure that would be the optimal option.)
Thanks in advance!
Hi imbusto,
We wanted to check in regarding your query, as we have not heard back from you. If you have resolved the issue, sharing the solution with the community would be greatly appreciated and could help others encountering similar challenges.
If you found our response useful, kindly mark it as the accepted solution and provide kudos to guide other members.
Thank you.
Hi imbusto,
Thank you @nilendraFabric for the response.
We would like to inquire if the solution offered by @nilendraFabric has resolved your issue. If you have discovered an alternative approach, we encourage you to share it with the community to assist others facing similar challenges.
Should you find the response helpful, please mark it as the accepted solution and add kudos. This recognition benefits other members seeking solutions to related queries.
Thank you.
Hello @imbusto
Running Selenium in Fabric notebooks has limitations because the environment lacks a browser for WebDriver to interact with directly. While headless mode or remote Selenium services (e.g., LambdaTest) might be workarounds, they require additional configuration and may not fully resolve all issues.
Instead of running Selenium directly in Fabric, you can use a remote Selenium service like LambdaTest or BrowserStack. These services allow you to execute browser automation tasks remotely and retrieve the results back into your Fabric environment.
You could even consider running Selenium scripts in an external environment (e.g., an Azure Virtual Machine or a local machine) and then transferring the scraped data to Microsoft Fabric via APIs.
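To make the "scrape externally, then transfer" idea concrete, here is a minimal sketch of the hand-off step using only the Python standard library. The endpoint URL, auth token, and function names are illustrative assumptions, not a real Fabric API: in practice you would target whatever HTTP entry point you set up (an Azure Function, an Event Hub, or a pipeline trigger) to land the data where Fabric can pick it up.

```python
import json
import urllib.request

def build_ingest_payload(rows):
    """Wrap scraped rows in a simple JSON envelope for ingestion.

    The envelope shape (rowCount + rows) is an illustrative convention,
    not a Fabric requirement.
    """
    return json.dumps({"rowCount": len(rows), "rows": rows}).encode("utf-8")

def send_to_fabric(endpoint_url, rows, token):
    """POST the payload to a hypothetical ingestion endpoint.

    endpoint_url and token are placeholders; substitute the URL and
    credential of whatever service you stand up in front of Fabric.
    """
    req = urllib.request.Request(
        endpoint_url,
        data=build_ingest_payload(rows),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

The scraping script on the VM would call `send_to_fabric` after each run; on the Fabric side, a Data Factory pipeline or notebook can then pick the landed JSON up on a schedule.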
You could also explore Microsoft Power Automate for simpler web scraping tasks that don't require extensive coding.
After scraping, the extracted data can be sent to Fabric's data warehouse via APIs. Alternatively, Fabric's Data Factory lets you create pipelines that ingest data dynamically from various sources, including APIs. This could involve storing raw JSON responses in a lakehouse before transforming them into structured tables in the warehouse.
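The "raw JSON to structured table" step above can be sketched in plain Python. This is a generic flattening helper I'm adding for illustration (the function name and key-separator convention are my own assumptions, not part of any Fabric API); in a Fabric notebook you would typically feed the resulting flat dicts into a Spark or pandas DataFrame before writing to the warehouse.

```python
import json

def flatten_record(record, parent_key="", sep="_"):
    """Flatten one nested JSON object into a single-level dict
    whose keys map naturally onto warehouse columns."""
    items = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, prefixing child keys
            items.update(flatten_record(value, new_key, sep))
        else:
            items[new_key] = value
    return items

# Example: one raw API response stored in the lakehouse
raw = json.loads('{"id": 7, "product": {"name": "widget", "price": 9.5}}')
row = flatten_record(raw)
# row has flat, column-ready keys: id, product_name, product_price
```

Flattening up front keeps the warehouse schema stable even when the scraped pages return nested structures.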
https://community.fabric.microsoft.com/t5/Data-Engineering/Installing-webdriver-for-selenium/m-p/429...
Hope this is helpful.
Thanks