<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: I am looking for a way of writing unit tests for pyspark transformations. in Data Engineering</title>
    <link>https://community.fabric.microsoft.com/t5/Data-Engineering/I-am-looking-for-a-way-of-writing-unit-tests-for-pyspark/m-p/4851675#M12932</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;A href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1362553" target="_blank"&gt;@ex_kjetilh&lt;/A&gt;,&lt;/P&gt;
&lt;P&gt;We would like to confirm whether our community members&amp;rsquo; answers resolved your query or whether you need further help. If you still have any questions or need more support, please feel free to let us know.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you for your patience; we look forward to hearing from you.&lt;BR /&gt;Best Regards,&lt;BR /&gt;Prashanth Are&lt;BR /&gt;MS Fabric community support&lt;/P&gt;</description>
    <pubDate>Thu, 16 Oct 2025 10:16:14 GMT</pubDate>
    <dc:creator>v-prasare</dc:creator>
    <dc:date>2025-10-16T10:16:14Z</dc:date>
    <item>
      <title>I am looking for a way of writing unit tests for pyspark transformations.</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/I-am-looking-for-a-way-of-writing-unit-tests-for-pyspark/m-p/4839013#M12622</link>
      <description>&lt;P&gt;I want to write tests for functions in Fabric notebooks of the type: given this dataframe (read from a file or created in code), apply the transformation and check whether the resulting dataframe matches an expected one.&lt;BR /&gt;&lt;BR /&gt;How do I do this in Fabric?&lt;BR /&gt;&lt;BR /&gt;I want the test runs to be called from CI/CD.&lt;BR /&gt;&lt;BR /&gt;However, I can find very little written about this. Maybe I am just not looking in the right places.&lt;/P&gt;</description>
      <pubDate>Tue, 30 Sep 2025 12:16:26 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/I-am-looking-for-a-way-of-writing-unit-tests-for-pyspark/m-p/4839013#M12622</guid>
      <dc:creator>ex_kjetilh</dc:creator>
      <dc:date>2025-09-30T12:16:26Z</dc:date>
    </item>
    <item>
      <title>Re: I am looking for a way of writing unit tests for pyspark transformations.</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/I-am-looking-for-a-way-of-writing-unit-tests-for-pyspark/m-p/4839242#M12628</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1362553"&gt;@ex_kjetilh&lt;/a&gt;,&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Below is a practical way to unit-test PySpark transformations from Fabric notebooks and run them in CI/CD. It boils down to: put your transform logic in plain Python modules, test them with pytest + a local Spark session (or Spark’s own testing helpers), and optionally add Fabric-side integration tests for end-to-end coverage.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;STRONG&gt;Refactor your notebook code into testable functions.&lt;/STRONG&gt;&lt;BR /&gt;Put all transformation logic in /src/your_pkg/transforms.py (imported by your notebook), so tests don’t depend on a notebook runtime. See Databricks’ pattern (same idea) for testing notebook code by moving logic into modules: &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/notebooks/testing" target="_blank" rel="noopener"&gt;Unit testing for notebooks.&lt;/A&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Write pytest unit tests that spin up a local Spark.&lt;/STRONG&gt;&lt;BR /&gt;Create /tests/conftest.py and /tests/test_transforms.py. Use Spark’s built-in test helpers (Spark 3.5+) like pyspark.testing.assertDataFrameEqual.&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;LI-CODE lang="python"&gt;# tests/conftest.py
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    return (SparkSession.builder
            .master("local[*]")
            .appName("unit-tests")
            .getOrCreate())
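
# Tip (optional, not in the original post): for the tiny dataframes used in
# unit tests, a small shuffle partition count keeps jobs fast, e.g. add
#   .config("spark.sql.shuffle.partitions", "2")
# to the builder chain above (the default of 200 is sized for real workloads).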
&lt;/LI-CODE&gt;&lt;LI-CODE lang="python"&gt;# tests/test_transforms.py
from pyspark.sql import Row
from your_pkg.transforms import clean_and_join
from pyspark.testing import assertDataFrameEqual  # available since Spark 3.5
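
# For illustration only: a hypothetical your_pkg/transforms.py consistent
# with the test below could be as simple as
#
#   from pyspark.sql import functions as F
#
#   def clean_and_join(left, right):
#       # trim stray whitespace from v, then inner-join on id
#       return left.withColumn("v", F.trim("v")).join(right, on="id")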

def test_clean_and_join(spark):
    left = spark.createDataFrame([Row(id=1, v="a "), Row(id=2, v="b")])
    right = spark.createDataFrame([Row(id=1, w=10), Row(id=2, w=20)])
    actual = clean_and_join(left, right)  # your transform
    expected = spark.createDataFrame([Row(id=1, v="a", w=10), Row(id=2, v="b", w=20)])
    assertDataFrameEqual(actual, expected, checkRowOrder=False)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Run tests in CI/CD (GitHub Actions or Azure DevOps).&lt;/STRONG&gt;&lt;BR /&gt;Pin your local pyspark to the Fabric runtime version to avoid surprises (check your Fabric Spark runtime, then set the same pyspark==x.y.* in tests). Example GitHub Actions job:&lt;/LI&gt;&lt;/UL&gt;&lt;LI-CODE lang="yaml"&gt;name: unit-tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.10' }
      - name: Install Java for Spark
        uses: actions/setup-java@v4
        with: { distribution: 'temurin', java-version: '11' }
      - name: Install deps
        run: |
          # pick the version matching your Fabric Spark runtime (e.g. Fabric Runtime 1.3 ships Spark 3.5)
          pip install "pyspark==&amp;lt;match_Fabric_runtime&amp;gt;" pytest chispa
      - name: Run pytest
        run: pytest -q --maxfail=1 --disable-warnings
&lt;/LI-CODE&gt;&lt;P&gt;Fabric CI/CD background: Deployment pipelines overview and the &lt;A href="https://learn.microsoft.com/en-us/fabric/cicd/git-integration/intro-to-git-integration" target="_blank" rel="noopener"&gt;Git integration overview&lt;/A&gt;. Good write-ups with examples: &lt;A href="https://www.kevinrchant.com/2024/08/30/unit-tests-on-microsoft-fabric-items/" target="_blank" rel="noopener"&gt;Unit tests on Microsoft Fabric items (pytest)&lt;/A&gt; and &lt;A href="https://blog.fabric.microsoft.com/en-US/blog/optimizing-for-ci-cd-in-microsoft-fabric/" target="_blank" rel="noopener"&gt;Optimizing for CI/CD in Microsoft Fabric&lt;/A&gt;.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Add Fabric integration tests.&lt;/STRONG&gt;&lt;BR /&gt;Keep unit tests fast and local. For end-to-end checks inside Fabric (e.g., against a Lakehouse table), you can:&lt;UL&gt;&lt;LI&gt;Trigger a notebook job in a test workspace that seeds tiny test data, calls your transform, and asserts the results (either via Spark asserts or by writing a small “result” table and checking row counts/values).&lt;/LI&gt;&lt;LI&gt;Or orchestrate with deployment pipelines/fabric-cicd and run a smoke-test notebook after deploy. Example concept posts: &lt;A href="https://www.kevinrchant.com/2025/05/01/automate-testing-microsoft-fabric-data-pipelines-with-yaml-pipelines/" target="_blank" rel="noopener"&gt;Automate testing Fabric pipelines with YAML&lt;/A&gt; and &lt;A href="https://www.kevinrchant.com/2025/02/27/initial-tests-of-fabric-cicd/" target="_blank" rel="noopener"&gt;fabric-cicd library initial tests&lt;/A&gt;.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;In notebooks, import your package; don’t re-implement.&lt;/STRONG&gt;&lt;BR /&gt;Your Fabric notebook should import your_pkg.transforms so the code under test and the code you run in Fabric are the same. General notebook authoring doc: &lt;A href="https://learn.microsoft.com/en-us/fabric/data-engineering/author-execute-notebook" target="_blank" rel="noopener"&gt;Develop, execute, and manage Fabric notebooks&lt;/A&gt;.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;If you found this helpful, consider giving some Kudos. If I answered your question or solved your problem, mark this post as the solution.&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 30 Sep 2025 16:15:45 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/I-am-looking-for-a-way-of-writing-unit-tests-for-pyspark/m-p/4839242#M12628</guid>
      <dc:creator>tayloramy</dc:creator>
      <dc:date>2025-09-30T16:15:45Z</dc:date>
    </item>
    <item>
      <title>Re: I am looking for a way of writing unit tests for pyspark transformations.</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/I-am-looking-for-a-way-of-writing-unit-tests-for-pyspark/m-p/4843698#M12744</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1362553"&gt;@ex_kjetilh&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;We would like to confirm whether our community members&amp;rsquo; answers resolved your query or whether you need further help. If you still have any questions or need more support, please feel free to let us know. We are happy to help you.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;a href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1340679"&gt;@tayloramy&lt;/a&gt;, thanks for your prompt response.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you for your patience; we look forward to hearing from you.&lt;BR /&gt;Best Regards,&lt;BR /&gt;Prashanth Are&lt;BR /&gt;MS Fabric community support&lt;/P&gt;</description>
      <pubDate>Mon, 06 Oct 2025 16:41:53 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/I-am-looking-for-a-way-of-writing-unit-tests-for-pyspark/m-p/4843698#M12744</guid>
      <dc:creator>v-prasare</dc:creator>
      <dc:date>2025-10-06T16:41:53Z</dc:date>
    </item>
    <item>
      <title>Re: I am looking for a way of writing unit tests for pyspark transformations.</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/I-am-looking-for-a-way-of-writing-unit-tests-for-pyspark/m-p/4843735#M12746</link>
      <description>&lt;P&gt;I wrote a post about this some time ago; I hope it helps:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://www.kevinrchant.com/2024/08/30/unit-tests-on-microsoft-fabric-items/" target="_blank"&gt;https://www.kevinrchant.com/2024/08/30/unit-tests-on-microsoft-fabric-items/&lt;/A&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Oct 2025 17:43:47 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/I-am-looking-for-a-way-of-writing-unit-tests-for-pyspark/m-p/4843735#M12746</guid>
      <dc:creator>KevinChant</dc:creator>
      <dc:date>2025-10-06T17:43:47Z</dc:date>
    </item>
    <item>
      <title>Re: I am looking for a way of writing unit tests for pyspark transformations.</title>
      <link>https://community.fabric.microsoft.com/t5/Data-Engineering/I-am-looking-for-a-way-of-writing-unit-tests-for-pyspark/m-p/4851675#M12932</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;A href="https://community.fabric.microsoft.com/t5/user/viewprofilepage/user-id/1362553" target="_blank"&gt;@ex_kjetilh&lt;/A&gt;,&lt;/P&gt;
&lt;P&gt;We would like to confirm whether our community members&amp;rsquo; answers resolved your query or whether you need further help. If you still have any questions or need more support, please feel free to let us know.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you for your patience; we look forward to hearing from you.&lt;BR /&gt;Best Regards,&lt;BR /&gt;Prashanth Are&lt;BR /&gt;MS Fabric community support&lt;/P&gt;</description>
      <pubDate>Thu, 16 Oct 2025 10:16:14 GMT</pubDate>
      <guid>https://community.fabric.microsoft.com/t5/Data-Engineering/I-am-looking-for-a-way-of-writing-unit-tests-for-pyspark/m-p/4851675#M12932</guid>
      <dc:creator>v-prasare</dc:creator>
      <dc:date>2025-10-16T10:16:14Z</dc:date>
    </item>
  </channel>
</rss>

