<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to run a pyspark code directly from a Github Repo? in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4776271#M11249</link>
    <description>&lt;P&gt;Hello &lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1313967"&gt;@lchinelli&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;We hope you're doing well. Could you please confirm whether your issue has been resolved or if you're still facing challenges? Your update will be valuable to the community and may assist others with similar concerns.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Thank you.&lt;/P&gt;</description>
    <pubDate>Fri, 25 Jul 2025 10:14:59 GMT</pubDate>
    <dc:creator>v-ssriganesh</dc:creator>
    <dc:date>2025-07-25T10:14:59Z</dc:date>
    <item>
      <title>How to run a pyspark code directly from a Github Repo?</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4768448#M11040</link>
      <description>&lt;P&gt;I'd like to create a data pipeline and run PySpark code directly from a GitHub repo. Is that possible?&lt;/P&gt;</description>
      <pubDate>Fri, 18 Jul 2025 12:15:54 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4768448#M11040</guid>
      <dc:creator>lchinelli</dc:creator>
      <dc:date>2025-07-18T12:15:54Z</dc:date>
    </item>
    <item>
      <title>Re: How to run a pyspark code directly from a Github Repo?</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4768768#M11055</link>
      <description>&lt;P&gt;Do you mean run a notebook from a GitHub repo using a GitHub workflow? If so then absolutely.&lt;BR /&gt;&lt;BR /&gt;I did a post that shows how you can do it with Azure DevOps, you can port the logic over:&lt;BR /&gt;&lt;A href="https://www.kevinrchant.com/2025/01/31/authenticate-as-a-service-principal-to-run-a-microsoft-fabric-notebook-from-azure-devops/" target="_blank"&gt;https://www.kevinrchant.com/2025/01/31/authenticate-as-a-service-principal-to-run-a-microsoft-fabric-notebook-from-azure-devops/&lt;/A&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 18 Jul 2025 15:56:01 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4768768#M11055</guid>
      <dc:creator>KevinChant</dc:creator>
      <dc:date>2025-07-18T15:56:01Z</dc:date>
    </item>
    <item>
      <title>Re: How to run a pyspark code directly from a Github Repo?</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4768988#M11064</link>
      <description>&lt;P&gt;As far as I know, you cannot run programming code such as PySpark directly from a GitHub repo; the GitHub repo integration is meant for CI/CD. By the way, why do you need to do this?&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would use Dataflow Gen2 or Python notebooks instead.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 18 Jul 2025 23:06:39 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4768988#M11064</guid>
      <dc:creator>BhaveshPatel</dc:creator>
      <dc:date>2025-07-18T23:06:39Z</dc:date>
    </item>
    <item>
      <title>Re: How to run a pyspark code directly from a Github Repo?</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4770143#M11093</link>
      <description>&lt;P&gt;Hello &lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1313967"&gt;@lchinelli&lt;/a&gt;,&lt;BR /&gt;Thank you for reaching out to the Microsoft Fabric Forum Community.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;I’ve reproduced your scenario in Microsoft Fabric and achieved the desired outcome. You can run PySpark code directly from a GitHub repo by using a Fabric notebook that fetches the script at run time with requests.get() and runs it with exec(). The notebook can then be triggered from a Data Factory pipeline through a Notebook activity.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;How It Works:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Your .py file is stored in GitHub (public or private).&lt;/LI&gt;
&lt;LI&gt;The Fabric notebook reads and executes that code using the raw GitHub URL.&lt;/LI&gt;
&lt;LI&gt;A pipeline triggers the notebook and runs the code.&lt;/LI&gt;
&lt;/UL&gt;
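&lt;P&gt;The notebook step described above might look like the following minimal sketch. The helper names fetch_text and run_remote_script, the SCRIPT_URL value, and the injectable fetch argument are assumptions made for illustration; they are not taken from the original notebook. Access to a private repository would additionally need an authenticated request, which is not shown here.&lt;/P&gt;

```python
def fetch_text(url, timeout=30):
    """Download a text file and return its contents, raising on HTTP errors."""
    import requests  # imported here so run_remote_script can be tried with a stand-in fetcher

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def run_remote_script(url, namespace, fetch=fetch_text):
    """Fetch a script and execute it inside the given namespace dictionary."""
    source = fetch(url)
    # Compiling with the URL as the filename makes tracebacks point at the remote file.
    exec(compile(source, url, "exec"), namespace)
    return namespace


# Hypothetical raw URL; replace the placeholder segments with your own repository details.
SCRIPT_URL = "https://raw.githubusercontent.com/OWNER/REPO/main/script.py"

# In a Fabric notebook cell, passing globals() exposes the session's `spark` to the script:
# run_remote_script(SCRIPT_URL, globals())
```

&lt;P&gt;Passing the notebook's globals() as the namespace is one possible design choice: it lets the fetched script use the notebook's existing spark session, and anything the script defines stays available to later cells.&lt;/P&gt;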
&lt;P&gt;&lt;BR /&gt;Example GitHub Code Used:&lt;/P&gt;
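&lt;LI-CODE lang="markup"&gt;# `spark` is not defined in this script; it relies on the SparkSession
# that the Fabric notebook session already provides.
data = [
    ("Microsoft Fabric", 2025),
    ("Power BI", 2024),
    ("Synapse", 2023),
]
columns = ["Product", "Year"]

df = spark.createDataFrame(data, columns)
df.show()&lt;/LI-CODE&gt;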
&lt;P&gt;&lt;BR /&gt;Here’s a successful pipeline run in Microsoft Fabric using a notebook that fetches a PySpark script from GitHub:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="vssriganesh_0-1753082115969.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1284740i1981D53A7F32109C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="vssriganesh_0-1753082115969.png" alt="vssriganesh_0-1753082115969.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Mon, 21 Jul 2025 07:19:13 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4770143#M11093</guid>
      <dc:creator>v-ssriganesh</dc:creator>
      <dc:date>2025-07-21T07:19:13Z</dc:date>
    </item>
    <item>
      <title>Re: How to run a pyspark code directly from a Github Repo?</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4770621#M11106</link>
      <description>&lt;P&gt;Is it possible to run code from other folders by importing it into a main.py file or a main.ipynb notebook? I ask because my code is object-oriented.&lt;/P&gt;</description>
      <pubDate>Mon, 21 Jul 2025 13:58:14 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4770621#M11106</guid>
      <dc:creator>lchinelli</dc:creator>
      <dc:date>2025-07-21T13:58:14Z</dc:date>
    </item>
    <item>
      <title>Re: How to run a pyspark code directly from a Github Repo?</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4770624#M11107</link>
      <description>&lt;P&gt;For better version control, and to import modules from other folders.&lt;/P&gt;</description>
      <pubDate>Mon, 21 Jul 2025 11:48:51 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4770624#M11107</guid>
      <dc:creator>lchinelli</dc:creator>
      <dc:date>2025-07-21T11:48:51Z</dc:date>
    </item>
    <item>
      <title>Re: How to run a pyspark code directly from a Github Repo?</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4771800#M11141</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1313967"&gt;@lchinelli&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;Yes, it is possible to run modular, object-oriented PySpark code spread across multiple files and folders (as in any OOP project), either within Microsoft Fabric notebooks or from a main.py.&lt;BR /&gt;&lt;BR /&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Jul 2025 07:13:20 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4771800#M11141</guid>
      <dc:creator>v-ssriganesh</dc:creator>
      <dc:date>2025-07-22T07:13:20Z</dc:date>
    </item>
    <item>
      <title>Re: How to run a pyspark code directly from a Github Repo?</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4776271#M11249</link>
      <description>&lt;P&gt;Hello &lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1313967"&gt;@lchinelli&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;We hope you're doing well. Could you please confirm whether your issue has been resolved or if you're still facing challenges? Your update will be valuable to the community and may assist others with similar concerns.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Fri, 25 Jul 2025 10:14:59 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4776271#M11249</guid>
      <dc:creator>v-ssriganesh</dc:creator>
      <dc:date>2025-07-25T10:14:59Z</dc:date>
    </item>
    <item>
      <title>Re: How to run a pyspark code directly from a Github Repo?</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4778873#M11315</link>
      <description>&lt;P&gt;Hello &lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1313967"&gt;@lchinelli&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Hope everything’s going great on your end. Just checking in: has the issue been resolved, or are you still running into problems? Sharing an update can really help others facing the same thing.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thank you.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 28 Jul 2025 15:24:24 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4778873#M11315</guid>
      <dc:creator>v-ssriganesh</dc:creator>
      <dc:date>2025-07-28T15:24:24Z</dc:date>
    </item>
    <item>
      <title>Re: How to run a pyspark code directly from a Github Repo?</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4779013#M11319</link>
      <description>&lt;P&gt;How?&lt;/P&gt;</description>
      <pubDate>Mon, 28 Jul 2025 17:31:22 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4779013#M11319</guid>
      <dc:creator>lchinelli</dc:creator>
      <dc:date>2025-07-28T17:31:22Z</dc:date>
    </item>
    <item>
      <title>Re: How to run a pyspark code directly from a Github Repo?</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4779597#M11336</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1313967"&gt;@lchinelli&lt;/a&gt;,&lt;BR /&gt;Thank you for the follow-up.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;I have reproduced your scenario with modular, object-oriented PySpark code that is structured across multiple folders (such as models/ and utils/) and executed through a main driver script.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;What I Did:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Created a GitHub repository with the following folder structure:&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang="markup"&gt;/main.py
/models/
    ├── cleaner.py
    └── validator.py
/utils/
    └── formatter.py&lt;/LI-CODE&gt;
&lt;UL&gt;
&lt;LI&gt;Each helper module (cleaner.py, validator.py, formatter.py) contains a class or function for a part of the logic (e.g., data cleaning, validation, formatting).&lt;/LI&gt;
&lt;LI&gt;In the Notebook, I used requests.get().text to dynamically fetch each .py file from GitHub raw URLs, then ran each with exec() to load the functions/classes into memory.&lt;/LI&gt;
&lt;LI&gt;Then, I fetched and ran main.py, which uses the imported modules to:
&lt;UL&gt;
&lt;LI&gt;Create a sample Spark DataFrame&lt;/LI&gt;
&lt;LI&gt;Apply cleaning and validation&lt;/LI&gt;
&lt;LI&gt;Format and return the final result&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;The final output was printed using df.show() inside main.py.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;/UL&gt;
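&lt;P&gt;One way to make the fetched helper files usable from main.py is sketched below. Note that this swaps in a different technique from the plain exec() described above: running each file with exec() in the notebook namespace defines its classes there, but an import statement such as "from models.cleaner import Cleaner" inside main.py would not resolve unless the modules are also importable. The sketch therefore registers each fetched source in sys.modules. The register_module helper, the Cleaner class, and the inline cleaner_source string are hypothetical stand-ins for illustration; in the notebook the source text would come from the raw GitHub URL.&lt;/P&gt;

```python
import sys
import types


def register_module(name, source, is_package=False):
    """Create a module from source text and register it under `name` in sys.modules."""
    module = types.ModuleType(name)
    if is_package:
        module.__path__ = []  # an empty __path__ marks the module as a package
    exec(compile(source, name, "exec"), module.__dict__)
    sys.modules[name] = module
    # Attach the module to its parent package so `import models.cleaner` also resolves.
    parent_name, _, child_name = name.rpartition(".")
    if parent_name:
        setattr(sys.modules[parent_name], child_name, module)
    return module


# Stand-in source text; in the notebook this would be the body fetched from GitHub.
cleaner_source = '''
class Cleaner:
    def clean(self, values):
        return [value.strip() for value in values]
'''

register_module("models", "", is_package=True)  # the parent package must be registered first
register_module("models.cleaner", cleaner_source)

from models.cleaner import Cleaner  # resolved from sys.modules, not from disk

cleaned = Cleaner().clean(["  Fabric ", "Power BI  "])
```

&lt;P&gt;With the helper modules registered this way, main.py can keep its ordinary import statements and be executed last, after every module it depends on has been registered.&lt;/P&gt;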
&lt;P&gt;The screenshot below shows the successful execution and the expected output:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="vssriganesh_0-1753778775336.png" style="width: 400px;"&gt;&lt;img src="https://community.fabric.microsoft.com/t5/image/serverpage/image-id/1286861i0803363D40E80D21/image-size/medium?v=v2&amp;amp;px=400" role="button" title="vssriganesh_0-1753778775336.png" alt="vssriganesh_0-1753778775336.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;If you have any further questions, please don't hesitate to contact us through the community. We are happy to assist you.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;Best Regards,&lt;BR /&gt;Ganesh singamshetty.&lt;/P&gt;</description>
      <pubDate>Tue, 29 Jul 2025 08:47:52 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4779597#M11336</guid>
      <dc:creator>v-ssriganesh</dc:creator>
      <dc:date>2025-07-29T08:47:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to run a pyspark code directly from a Github Repo?</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4784250#M11411</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hello &lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1313967"&gt;@lchinelli&lt;/a&gt;,&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Could you please confirm if your query has been resolved by the provided solutions? This would be helpful for other members who may encounter similar issues.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thank you for being part of the Microsoft Fabric Community.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 01 Aug 2025 10:29:41 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/How-to-run-a-pyspark-code-directly-from-a-Github-Repo/m-p/4784250#M11411</guid>
      <dc:creator>v-ssriganesh</dc:creator>
      <dc:date>2025-08-01T10:29:41Z</dc:date>
    </item>
  </channel>
</rss>

