russelp
Helper I

PyDeequ - 'JavaPackage' object is not callable

Hi,

I'm trying to use PyDeequ and am following the installation steps here: https://pydeequ.readthedocs.io/en/latest/README.html#installation

1. 

pip install pydeequ

2.

import os

# Set the SPARK_VERSION environment variable
os.environ['SPARK_VERSION'] = '3.3'

3. 

from pyspark.sql import SparkSession, Row
import pydeequ

spark = (SparkSession
    .builder
    .config("spark.jars.packages", pydeequ.deequ_maven_coord)
    .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
    .getOrCreate())

df = spark.sparkContext.parallelize([
            Row(a="foo", b=1, c=5),
            Row(a="bar", b=2, c=6),
            Row(a="baz", b=3, c=None)]).toDF()

4. 

from pydeequ.analyzers import *
analysisResult = AnalysisRunner(spark) \
                    .onData(df) \
                    .addAnalyzer(Size()) \
                    .addAnalyzer(Completeness("b")) \
                    .run()

analysisResult_df = AnalyzerContext.successMetricsAsDataFrame(spark, analysisResult)
analysisResult_df.show()

I am getting the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[53], line 4
      1 from pydeequ.analyzers import *
      3 analysisResult = AnalysisRunner(spark) \
----> 4                     .onData(df) \
      5                     .addAnalyzer(Size()) \
      6                     .addAnalyzer(Completeness("b")) \
      7                     .run()
      9 analysisResult_df = AnalyzerContext.successMetricsAsDataFrame(spark, analysisResult)
     10 analysisResult_df.show()

File ~/cluster-env/trident_env/lib/python3.10/site-packages/pydeequ/analyzers.py:52, in AnalysisRunner.onData(self, df)
     46 """
     47 Starting point to construct an AnalysisRun.
     48 :param dataFrame df: tabular data on which the checks should be verified
     49 :return: new AnalysisRunBuilder object
     50 """
     51 df = ensure_pyspark_df(self._spark_session, df)
---> 52 return AnalysisRunBuilder(self._spark_session, df)

File ~/cluster-env/trident_env/lib/python3.10/site-packages/pydeequ/analyzers.py:124, in AnalysisRunBuilder.__init__(self, spark_session, df)
    122 self._jspark_session = spark_session._jsparkSession
    123 self._df = df
--> 124 self._AnalysisRunBuilder = self._jvm.com.amazon.deequ.analyzers.runners.AnalysisRunBuilder(df._jdf)

TypeError: 'JavaPackage' object is not callable

Did I miss any installation or setup step?

1 ACCEPTED SOLUTION

Hi @Anonymous,

We were already on the latest version of PyDeequ, and we solved it by adding a Spark property to the environment. See the screenshot below.

[Screenshot: Spark property added to the environment]
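
The screenshot is not reproduced here, so the exact property is unknown. One likely candidate, offered only as an assumption, is `spark.jars.packages` set to the Deequ Maven coordinate in the environment's Spark properties. That property is read when the Spark application starts, and in a notebook where `spark` already exists, `SparkSession.builder...getOrCreate()` (step 3 in the question) returns that existing session, so a package passed to the builder never reaches the classpath. The sketch below, using a hypothetical helper name, checks whether a `spark.jars.packages` value lists a Deequ artifact:

```python
def lists_deequ_package(packages_conf, group_artifact="com.amazon.deequ:deequ:"):
    """Return True if a spark.jars.packages value includes a Deequ coordinate.

    spark.jars.packages is a comma-separated list of Maven coordinates
    (groupId:artifactId:version). This hypothetical helper only inspects the
    string; it does not prove the JAR was actually resolved and loaded.
    """
    if not packages_conf:
        return False
    coords = [c.strip() for c in packages_conf.split(",")]
    return any(c.startswith(group_artifact) for c in coords)


# Usage inside a notebook, where `spark` is the session the runtime created:
#   packages = spark.sparkContext.getConf().get("spark.jars.packages")
#   print(lists_deequ_package(packages))
```

A `True` result only shows that the property is set; running the analyzer is still the real test.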

2 REPLIES
Anonymous

Hi @russelp,

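The "'JavaPackage' object is not callable" error usually means that the Java/Scala class being called was not found on the Spark driver's classpath: py4j then returns a JavaPackage placeholder instead of a class, and calling that placeholder fails. Here, that most likely means the Deequ library was not loaded into the Spark session.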
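
One way to confirm this is to resolve the class from the traceback yourself before running an analyzer. A minimal sketch, assuming `spark._jvm` (the py4j JVM view PySpark attaches to the session) and a hypothetical helper name; it compares the returned object's type name instead of importing py4j, so the helper has no dependencies:

```python
def resolve_jvm_class(jvm, dotted_name):
    """Resolve a fully qualified Java class name through a py4j JVM view.

    py4j returns a JavaClass for classes it can load and a JavaPackage
    placeholder otherwise; calling that placeholder is what raises
    "TypeError: 'JavaPackage' object is not callable".
    """
    obj = jvm
    for part in dotted_name.split("."):
        obj = getattr(obj, part)
    # Compare by type name so this helper does not need to import py4j.
    if type(obj).__name__ == "JavaPackage":
        raise RuntimeError(
            f"{dotted_name} was not found on the driver classpath; "
            "the Deequ JAR is probably not loaded in this Spark session"
        )
    return obj


# Usage in a notebook (class name taken from the traceback):
#   resolve_jvm_class(spark._jvm, "com.amazon.deequ.analyzers.runners.AnalysisRunBuilder")
```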

There are a few things you can check:

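Make sure your Spark and Deequ versions are compatible: PyDeequ uses the SPARK_VERSION environment variable to choose the Deequ JAR, so it should match the major.minor Spark version of your runtime.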
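
PyDeequ derives `pydeequ.deequ_maven_coord` from `SPARK_VERSION` at import time, so the variable has to be set before `import pydeequ`. A sketch with hypothetical helper names that derives the value from the running session instead of hard-coding it, and checks a coordinate against it. It assumes Deequ 2.x-style coordinates ending in `-spark-X.Y`; older 1.x artifacts are named differently.

```python
import re


def spark_major_minor(full_version):
    """Reduce a Spark version string such as '3.4.1' to '3.4'."""
    match = re.match(r"(\d+)\.(\d+)", full_version)
    if match is None:
        raise ValueError(f"Unrecognised Spark version: {full_version!r}")
    return f"{match.group(1)}.{match.group(2)}"


def coord_targets_spark(maven_coord, spark_version):
    """Check that a Deequ 2.x coordinate (ending in '-spark-X.Y') was built
    for the running Spark major.minor version."""
    return maven_coord.endswith("-spark-" + spark_major_minor(spark_version))


# Usage in a notebook where `spark` already exists, before `import pydeequ`:
#   import os
#   os.environ["SPARK_VERSION"] = spark_major_minor(spark.version)
#   import pydeequ
#   print(coord_targets_spark(pydeequ.deequ_maven_coord, spark.version))
```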

Make sure PyDeequ is correctly installed and up to date. You can check this with:

pip show pydeequ
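
From inside a notebook, the same check can be run in Python with the standard library's `importlib.metadata` (Python 3.8+), which reports the packages installed in the kernel's own environment. A small sketch with a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


print("pydeequ:", installed_version("pydeequ"))
```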

To upgrade PyDeequ to the latest version, run:

pip install --upgrade pydeequ

Best regards,
Yang
Community Support Team

If any post helps, please consider accepting it as the solution to help other members find it more quickly.
If I have misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!