Hi,
I'm trying to use PyDeequ, following the installation steps here - https://pydeequ.readthedocs.io/en/latest/README.html#installation
1.
pip install pydeequ
2.
import os
# Set the SPARK_VERSION environment variable
os.environ['SPARK_VERSION'] = '3.3'
3.
from pyspark.sql import SparkSession, Row
import pydeequ
spark = (SparkSession
    .builder
    .config("spark.jars.packages", pydeequ.deequ_maven_coord)
    .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
    .getOrCreate())
df = spark.sparkContext.parallelize([
    Row(a="foo", b=1, c=5),
    Row(a="bar", b=2, c=6),
    Row(a="baz", b=3, c=None)]).toDF()
4.
from pydeequ.analyzers import *
analysisResult = AnalysisRunner(spark) \
    .onData(df) \
    .addAnalyzer(Size()) \
    .addAnalyzer(Completeness("b")) \
    .run()
analysisResult_df = AnalyzerContext.successMetricsAsDataFrame(spark, analysisResult)
analysisResult_df.show()
I am getting the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[53], line 4
1 from pydeequ.analyzers import *
3 analysisResult = AnalysisRunner(spark) \
----> 4 .onData(df) \
5 .addAnalyzer(Size()) \
6 .addAnalyzer(Completeness("b")) \
7 .run()
9 analysisResult_df = AnalyzerContext.successMetricsAsDataFrame(spark, analysisResult)
10 analysisResult_df.show()
File ~/cluster-env/trident_env/lib/python3.10/site-packages/pydeequ/analyzers.py:52, in AnalysisRunner.onData(self, df)
46 """
47 Starting point to construct an AnalysisRun.
48 :param dataFrame df: tabular data on which the checks should be verified
49 :return: new AnalysisRunBuilder object
50 """
51 df = ensure_pyspark_df(self._spark_session, df)
---> 52 return AnalysisRunBuilder(self._spark_session, df)
File ~/cluster-env/trident_env/lib/python3.10/site-packages/pydeequ/analyzers.py:124, in AnalysisRunBuilder.__init__(self, spark_session, df)
122 self._jspark_session = spark_session._jsparkSession
123 self._df = df
--> 124 self._AnalysisRunBuilder = self._jvm.com.amazon.deequ.analyzers.runners.AnalysisRunBuilder(df._jdf)
TypeError: 'JavaPackage' object is not callable
Did I miss any installation or setup step?
Hi @Anonymous ,
We had the latest version of PyDeequ, and we managed to solve it by adding a Spark property to the environment. See the screenshot below.
Hi @russelp ,
The "'JavaPackage' object is not callable" error usually means that the referenced Java/Scala class could not be found on the JVM classpath. In other words, the Deequ library was not loaded correctly into the Spark session.
There are several things you can check for the problem:
1. Make sure you are using compatible versions of Spark and Deequ.
2. Make sure PyDeequ is correctly installed and up to date. This can be checked with the following command:
   pip show pydeequ
3. PyDeequ can be reinstalled with the following command:
   pip install --upgrade pydeequ
Best Regards,
Yang
Community Support Team
If any post helps, please consider accepting it as the solution to help other members find it more quickly.
If I misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!