Hi,
When I select multiple lakehouses in the Spark job definition, one becomes the default and all the others become a configuration setting available to the notebook.
If I'm using path-related functions, for example to list all the tables in a lakehouse, will this "just work" automatically, or do I need to loop through all the lakehouses attached to the job definition, changing the default one each time?
In the latter case, how do I write that loop and change the default lakehouse?
Kind Regards,
Dennes
Welcome to the Fabric Community, and thanks for posting your question here.
As I understand it, you are trying to list all the tables present in the lakehouses. You can achieve this by running the code below in the notebook.
from pyspark.sql import SparkSession

# Get (or create) the Spark session for the job.
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

print("hello spark job definition")

# List of lakehouses to read tables from.
lakehouses = ["lakehouse1", "lakehouse2"]

# Loop through the lakehouses and list the tables in each one.
for lakehouse in lakehouses:
    tables = spark.sql(f"SHOW TABLES IN {lakehouse}")
    tables.show()
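If you also need to read from those tables rather than just list them, one option is to use fully qualified <lakehouse>.<table> names so you never have to switch the default lakehouse. Here is a minimal sketch, assuming the lakehouses in the list above are attached to the Spark job definition and are therefore visible as catalog databases; the row-count print is only illustrative:

# Read every table from each attached lakehouse using fully qualified names,
# without changing the default lakehouse. Assumes the lakehouses are attached
# to the job definition and exposed as catalog databases.
for lakehouse in lakehouses:
    for table in spark.catalog.listTables(lakehouse):
        df = spark.read.table(f"{lakehouse}.{table.name}")
        print(f"{lakehouse}.{table.name}: {df.count()} rows")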
Hope this helps. Please do let us know if you have any queries.
Following up to see if the above suggestion was helpful. And if you have any further queries, please do let us know.