I'd like to create a data pipeline and run PySpark code directly from a GitHub repo. Is that possible?
Hello @lchinelli,
We hope you're doing well. Could you please confirm whether your issue has been resolved or if you're still facing challenges? Your update will be valuable to the community and may assist others with similar concerns.
Thank you.
Hello @lchinelli,
Thank you for reaching out to the Microsoft Fabric Forum Community.
I’ve reproduced your scenario in Microsoft Fabric and achieved the desired outcome. You can run PySpark code directly from a GitHub repo by using a Fabric Notebook that fetches the script with requests.get() and executes it with exec(). That notebook can then be triggered inside a Data Factory pipeline using a Notebook activity.
How It Works:
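As a minimal sketch of the fetch-and-run approach described above (the function name and URL placeholders are mine, not from the original post; replace them with your own repo details):

```python
import requests

def run_remote_script(raw_url: str, namespace: dict) -> None:
    """Download a Python script from a raw GitHub URL and execute it."""
    resp = requests.get(raw_url, timeout=30)
    resp.raise_for_status()  # fail fast if the file cannot be fetched
    # Execute the downloaded source in the supplied namespace.
    exec(resp.text, namespace)

# In a Fabric notebook cell you would call something like (hypothetical URL):
# run_remote_script(
#     "https://raw.githubusercontent.com/<user>/<repo>/main/job.py",
#     globals(),  # exposes the notebook's `spark` session to the script
# )
```

Passing `globals()` matters: it lets the downloaded script see the `spark` session that Fabric injects into the notebook.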
Example GitHub Code Used:
# PySpark script as stored in the GitHub repo; `spark` is the
# SparkSession that Fabric provides in the notebook session.
data = [
    ("Microsoft Fabric", 2025),
    ("Power BI", 2024),
    ("Synapse", 2023),
]
columns = ["Product", "Year"]
df = spark.createDataFrame(data, columns)
df.show()
Here’s a successful pipeline run in Microsoft Fabric using a notebook that fetches a PySpark script from GitHub:
If this information is helpful, please use “Accept as solution” and give kudos, so other community members can resolve similar issues more efficiently.
Thank you.
Is it possible to run code from other folders by importing it into a main.py or a main.ipynb? I ask because my code is OOP.
Hello @lchinelli,
Yes, it is possible to run modular, object-oriented PySpark code split across multiple files and folders (just like in any OOP project), both within Microsoft Fabric Notebooks and from a main.py.
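One common pattern for this (a sketch, not from the original post: the package name, file layout, and class are hypothetical) is to copy or download your module files into a local folder, append that folder to sys.path, and then import your classes normally:

```python
import os
import sys
import tempfile

# Stand-in for files fetched from your GitHub repo. In a notebook you would
# download each .py file (e.g. with requests) and write it here instead.
pkg_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(pkg_dir, "myproj"))

module_src = (
    "class Cleaner:\n"
    "    def strip_all(self, rows):\n"
    "        return [r.strip() for r in rows]\n"
)
with open(os.path.join(pkg_dir, "myproj", "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "myproj", "cleaner.py"), "w") as f:
    f.write(module_src)

# Make the folder importable, then use the class as in any OOP project.
sys.path.append(pkg_dir)
from myproj.cleaner import Cleaner

print(Cleaner().strip_all(["  a ", " b"]))  # prints ['a', 'b']
```

The same idea works from a main.py driver: as long as the package folder is on sys.path, your imports behave exactly as they do locally.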
Thank you.
As far as I know, you cannot run programming code such as PySpark directly from a GitHub repo; the GitHub integration is for CI/CD. By the way, why do you need to do this?
Instead, I would use Dataflow Gen2 or Python notebooks.
For better version control, and to import modules from other folders.
Do you mean run a notebook from a GitHub repo using a GitHub workflow? If so then absolutely.
I wrote a post that shows how to do it with Azure DevOps; you can port the logic over:
https://www.kevinrchant.com/2025/01/31/authenticate-as-a-service-principal-to-run-a-microsoft-fabric...
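The linked post authenticates as a service principal and then calls the Fabric REST API to run the notebook. A minimal sketch of that second step, assuming the on-demand item job endpoint (the function names and IDs here are mine; check the Fabric REST API documentation for the exact contract):

```python
import requests

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def run_notebook_job_url(workspace_id: str, notebook_id: str) -> str:
    """Build the on-demand job endpoint for a Fabric notebook item."""
    return (
        f"{FABRIC_API}/workspaces/{workspace_id}"
        f"/items/{notebook_id}/jobs/instances?jobType=RunNotebook"
    )

def trigger_notebook(token: str, workspace_id: str, notebook_id: str):
    # `token` is an access token obtained for the service principal.
    resp = requests.post(
        run_notebook_job_url(workspace_id, notebook_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    return resp  # the job runs asynchronously; poll its status separately
```

In a GitHub workflow you would acquire the token in a prior step (e.g. via a federated credential) and pass it into a script like this.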