Extract Data from Databricks Unity Catalog

H26027 · ‎03-06-2024

Hi,

I want to connect to Databricks Unity Catalog tables from Fabric notebooks. I therefore tried to use the Databricks JDBC driver to read data with PySpark, like this:

personal_access_token = "XXX"
jdbc_url = "jdbc:databricks://adb-XXX.azuredatabricks.net:443/default"

connection_properties = {
    "user": "token",
    "password": personal_access_token,
    "driver": "com.databricks.client.jdbc.Driver",
    "ssl": "true",
    "httpPath": "sql/protocolv1/o/XXX"
}

df = spark.read.jdbc(url=jdbc_url, table="XXX", properties=connection_properties)

I added the JDBC driver .jar file to my custom environment. But still, I am getting this error:
java.lang.ClassNotFoundException: com.databricks.client.jdbc.Driver

What am I doing wrong? Is there an alternative solution? Shortcuts are currently not an option.

Thanks in advance.

Mahens · ‎10-02-2024

Built-in support for integrating Databricks Unity Catalog tables into Fabric was released recently. It makes the integration much easier:

https://learn.microsoft.com/en-us/fabric/database/mirrored-database/azure-databricks

v-cboorla-msft · ‎03-07-2024

Hi @H26027

Thanks for using Microsoft Fabric Community.

As I understand it, you're trying to connect to Databricks Unity Catalog tables from Fabric notebooks using the Databricks JDBC driver. The error message you're encountering, java.lang.ClassNotFoundException: com.databricks.client.jdbc.Driver, indicates that the JDBC driver class could not be found on the classpath.

Please make sure that you've correctly added the Databricks JDBC driver .jar file to your custom environment. Double-check the path and make sure it's accessible to your Spark session.

Confirm that the driver version matches your Databricks environment.
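
One way to double-check the path is to look at what the Spark session actually has registered under the spark.jars property. The snippet below is only a minimal sketch: the helper name jar_is_listed and the jar file name are hypothetical, and it assumes the driver is expected to show up in that comma-separated property.

```python
def jar_is_listed(spark_jars: str, jar_file_name: str) -> bool:
    """Return True if any entry of a comma-separated spark.jars value
    has the given jar file name as its last path segment."""
    entries = [e.strip() for e in spark_jars.split(",") if e.strip()]
    return any(e.rsplit("/", 1)[-1] == jar_file_name for e in entries)


# In a notebook, the value could be read from the live session, e.g.:
#   spark_jars = spark.sparkContext.getConf().get("spark.jars", "")
# The value below is an illustrative placeholder, not a real location.
spark_jars = "abfss://ws@onelake.dfs.fabric.microsoft.com/lh.Lakehouse/Files/DatabricksJDBC42.jar"
print(jar_is_listed(spark_jars, "DatabricksJDBC42.jar"))
```

If the jar does not appear in spark.jars at all, the session was most likely started without it, which would be consistent with the ClassNotFoundException.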

If the issue persists, please let us know. Glad to help.

Please refer to this documentation, which might help you: Integrating Microsoft Fabric and Databricks Unity Catalog

I hope this information helps.

Thanks.

H26027 · ‎03-07-2024

The exact steps I took were:
I created a custom environment (preview) and uploaded a custom library to it.

After the upload, I published these changes.

Then I referenced this environment as the default environment for my workspace.
The Notebook that I am using is connected to this environment.
Then I just ran the code from my initial post. I did not reference this library explicitly in the notebook, though. Is that necessary? If so, how?

Thanks.

v-cboorla-msft · ‎03-14-2024

Hi @H26027

Apologies for the delay in my response.

I am following up to see if you have a resolution yet. If you do, please share it with the community, as it can be helpful to others.

If the issue still persists, kindly provide the following additional information for further assistance.

To check compatibility and troubleshoot the Databricks JDBC driver error you're encountering, could you please confirm your Databricks runtime version? A version mismatch might be the cause, and knowing the runtime version lets us verify which JDBC driver is appropriate.

For additional information, please refer to: Unity Catalog limitations; Databricks JDBC Driver

I hope this information helps. Please do let us know if you have any further queries.

Thanks.

H26027 · ‎03-14-2024

Hi @v-cboorla-msft,

I am using a Databricks cluster with runtime version 13.3 LTS in shared access mode, with Unity Catalog enabled.
The JDBC driver I downloaded is the latest version (2.6.36), from https://www.databricks.com/spark/jdbc-drivers-archive.

The Databricks side is not the problem though, since the Fabric notebook already says it cannot find the driver. I believe the driver is not loaded into the environment correctly, but I don't see anything I did wrong.

For testing purposes I added a public library from pip to the same environment. This library is available in the notebook without any problem.

v-cboorla-msft · ‎03-14-2024

Hi @H26027

Apologies for the inconvenience.

To gain deeper insights and explore potential solutions, I highly recommend reaching out to our support team. Their expertise will be invaluable in suggesting the most appropriate approach.

Please go ahead and raise a support ticket to reach our support team:

https://support.fabric.microsoft.com/support

After creating a support ticket, please provide the ticket number so that we can track it.

Thank you.

v-cboorla-msft · ‎03-15-2024

Hi @H26027

I'm following up on my previous inquiry to see if you've had a chance to create a support ticket for this issue. If a ticket has been created, I would appreciate it if you could provide the ticket number for our reference. This will allow us to track the progress of the issue.

Thanks.

v-cboorla-msft · ‎03-15-2024

Hi @H26027

In the Microsoft Fabric Community forum, I found a relevant post titled 'Solved: Trying to connect to an Oracle database.' It discusses common connection problems and could be helpful in your current situation.

If the issue persists, please go ahead and raise a support ticket to reach our support team:

https://support.fabric.microsoft.com/support

I hope this information helps.

Thanks.

H26027 · ‎03-15-2024

Hi @v-cboorla-msft,

Thanks again for the feedback. The ticket is open under #2403140050003202.

The link you just provided actually did the trick, but that is definitely not how it should work according to the official documentation.

Anyway, I was able to establish the connection now. However, I am getting errors when reading data that is not of type string.
Reading data into the dataframe is fine. Printing the schema works fine too and shows the correct data types. When I want to display the data, I get errors for every type other than string, e.g. "Error converting value to Timestamp" (the same happens for LONG values).
If I cast the value to string in the SELECT statement for the JDBC connection, it works fine.
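
The cast workaround described above could look roughly like the sketch below. It is only an illustration: the helper name cast_columns_to_string and the table and column names are hypothetical, and a later post in this thread reports that setting UseNativeQuery made this kind of cast unnecessary.

```python
from typing import Sequence


def cast_columns_to_string(table: str, columns: Sequence[str], alias: str = "src") -> str:
    """Build a parenthesised subquery that casts each listed column to STRING,
    in the form expected by the JDBC "dbtable" option."""
    if not columns:
        raise ValueError("at least one column is required")
    select_list = ", ".join(f"CAST({c} AS STRING) AS {c}" for c in columns)
    return f"(SELECT {select_list} FROM {table}) AS {alias}"


# Hypothetical table and column names, for illustration only.
query = cast_columns_to_string("my_catalog.my_schema.events", ["event_time", "event_id"])
print(query)
```

The resulting string would be passed as the table argument of the JDBC read. The columns arrive as strings and would have to be cast back in Spark afterwards.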

The data itself is not corrupted, since the same operations work fine in databricks directly.

Any ideas?
Thanks

H26027 · ‎03-18-2024

I found a solution to my previously mentioned problem. The connection from the Fabric notebook to Databricks Unity Catalog is now working as expected, but not in the way the Microsoft documentation initially describes.

Here is what I did, in case anybody runs into the same issue:

1. Download the Databricks JDBC driver and store the .jar in a Fabric Lakehouse.

2. Create a custom environment in Fabric.

3. Add a Spark property for spark.jars that points to the ABFSS location of the uploaded .jar file.

4. In the notebook itself, connect to this custom environment and use this call to read the data:

personal_access_token = "XXX"  # Databricks personal access token
jdbc_url = "jdbc:databricks://adb-XXX.azuredatabricks.net:443/default"
http_path = "sql/protocolv1/o/XXX/XXX"  # HTTP path of the Databricks cluster
query = "[catalog].[schema].[table]"  # or "(SELECT [selection] FROM [table]) as alias"

# "spark" is the session that the Fabric notebook provides.
df = (spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", query)
    .option("user", "token")  # literal user name "token" when using a personal access token
    .option("password", personal_access_token)
    .option("driver", "com.databricks.client.jdbc.Driver")
    .option("ssl", "1")
    .option("ThriftTransport", "2")
    .option("AuthMech", "3")
    .option("httpPath", http_path)
    .option("UseNativeQuery", "0")
    .load())

The option "UseNativeQuery" is necessary too. Otherwise the JDBC dialect seems to be wrong, which causes the previously mentioned issue with data types (plus issues with quotes).
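
As a variation on the call above, the options can be collected in one dictionary and passed to the reader in a single step, which makes them easier to reuse across tables. This is a sketch only: the function name databricks_jdbc_options is hypothetical, the option values are copied from the post, and the comments on ThriftTransport and AuthMech reflect my reading of the driver documentation rather than anything stated in this thread.

```python
def databricks_jdbc_options(host: str, http_path: str, token: str, dbtable: str) -> dict:
    """Collect the JDBC reader options from the post above into one dict."""
    return {
        "url": f"jdbc:databricks://{host}:443/default",
        "dbtable": dbtable,
        "user": "token",          # literal user name for personal access token auth
        "password": token,
        "driver": "com.databricks.client.jdbc.Driver",
        "ssl": "1",
        "ThriftTransport": "2",   # assumed to select HTTP transport
        "AuthMech": "3",          # assumed to select user name / password auth
        "httpPath": http_path,
        "UseNativeQuery": "0",    # reported above as required
    }


# Placeholder values, as in the post.
opts = databricks_jdbc_options(
    host="adb-XXX.azuredatabricks.net",
    http_path="sql/protocolv1/o/XXX/XXX",
    token="XXX",
    dbtable="[catalog].[schema].[table]",
)

# In a notebook with a live session, the read would then be:
#   df = spark.read.format("jdbc").options(**opts).load()
print(opts["url"])
```

Keeping the token out of the notebook source (for example in a secret store) is a sensible addition, but how to do that is outside what this thread covers.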

Thanks @v-cboorla-msft for your help.
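
For step 3 above, the value of spark.jars is an ABFSS path into OneLake. The sketch below shows how such a path is typically composed. The helper name onelake_abfss_path, the workspace and lakehouse names, and the folder and file name are all hypothetical, and it assumes the .jar was uploaded under the Files area of the Lakehouse.

```python
def onelake_abfss_path(workspace: str, lakehouse: str, relative_path: str) -> str:
    """Compose an ABFSS path to a file under the Files area of a Lakehouse."""
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/Files/{relative_path.lstrip('/')}"
    )


# Hypothetical names, for illustration only.
jar_path = onelake_abfss_path("MyWorkspace", "MyLakehouse", "drivers/DatabricksJDBC42.jar")
print(jar_path)
```

The printed path is what would go into the spark.jars property of the custom environment. The exact path of an uploaded file can also be copied from the Lakehouse explorer, which avoids typos.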

v-cboorla-msft · ‎03-18-2024

Hi @H26027

Glad that your query was resolved, and thank you for sharing the details with the community, as they can be helpful to others. Much appreciated.

Please continue using Fabric Community for any help regarding your queries.

Thanks.
