goldenarm253
Frequent Visitor

Issue with Accessing Tables from Non-dbo Schema in Microsoft Fabric Using PySpark

Hello Microsoft Fabric Community,

I am currently working on a data engineering task in Microsoft Fabric that involves retrieving and processing tables from a specific schema in a Lakehouse. While my PySpark script works perfectly when the source schema is dbo, it fails when attempting to retrieve tables from any other schema. Here's a breakdown of the issue (the notebook works correctly for dbo):

 

------------------------------------------------------------------------------Remove Columns with complete NULLs in schema-------

from pyspark.sql.functions import col, sum as _sum
from pyspark.sql.utils import AnalysisException

# Step 1: Retrieve all table names from the SOQL schema in Salesforce_Lakehouse
try:
    table_list = spark.sql("SHOW TABLES IN Salesforce_Lakehouse.SOQL").select("tableName").rdd.flatMap(lambda x: x).collect()
except AnalysisException as e:
    print(f"Error retrieving tables: {str(e)}")
    table_list = []

# Step 2: Loop through each table, process, and write out only the non-null columns
for table_name in table_list:
    try:
        print(f"Processing table: {table_name}")

        # Load table data from the SOQL schema
        df = spark.sql(f"SELECT * FROM Salesforce_Lakehouse.SOQL.{table_name}")

        # Identify columns that contain at least one non-null value
        non_null_columns = [
            c for c in df.columns
            if df.select(_sum(col(c).isNotNull().cast("int"))).collect()[0][0] > 0
        ]

        # Create a DataFrame with only the non-null columns
        df_non_null = df.select(*non_null_columns)

        # Check whether the table already exists in the Salesforce schema
        table_exists = spark.sql("SHOW TABLES IN Salesforce_Lakehouse.Salesforce").filter(f"tableName = '{table_name}'").count() > 0

        if table_exists:
            print(f"Table {table_name} exists in Salesforce schema. Overwriting...")
        else:
            print(f"Table {table_name} does not exist in Salesforce schema. Creating new table...")

        # Write the non-null DataFrame to the Salesforce schema
        df_non_null.write.mode("overwrite").saveAsTable(f"Salesforce_Lakehouse.Salesforce.{table_name}")

        print(f"Table {table_name} processed and saved successfully.")

    except AnalysisException as e:
        print(f"Error processing table {table_name}: {str(e)}")
    except Exception as e:
        print(f"Unexpected error with table {table_name}: {str(e)}")

----------------------
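
As an aside (a sketch of a possible optimization, not something the schema error depends on): the per-column check above triggers one job per column. Assuming the same DataFrame, the counts can be gathered in a single pass:

# Aggregate the non-null counts for every column in one job
counts = df.select([_sum(col(c).isNotNull().cast("int")).alias(c) for c in df.columns]).collect()[0]
non_null_columns = [c for c in df.columns if counts[c] > 0]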

The Problem

The code above works seamlessly when the source schema is dbo (e.g., Salesforce_Lakehouse.dbo). However, when I attempt to retrieve tables from a different schema, such as Salesforce_Lakehouse.SOQL, it throws an error indicating that the schema cannot be found (I have tried other schemas with the same result):

 

[SCHEMA_NOT_FOUND] The schema `Salesforce_Lakehouse.SOQL` cannot be found. Verify the spelling and correctness of the schema and catalog.

 

I have verified that the schema exists by running SHOW SCHEMAS;, and I can see the schema listed. However, the script does not seem to retrieve tables unless the schema is dbo.
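
For reference, the verification looked roughly like this (lakehouse name as in the script above):

# SOQL appears in this list, yet SHOW TABLES on it fails
spark.sql("SHOW SCHEMAS IN Salesforce_Lakehouse").show()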

 

I’d appreciate guidance on:

  1. Why this behavior occurs with non-dbo schemas in Microsoft Fabric.
  2. Any potential workarounds or configuration changes to allow the script to work with any schema (not just dbo).
  3. Best practices for handling schema-specific access in a Microsoft Fabric Lakehouse.

Thank you in advance for your help!

1 ACCEPTED SOLUTION
v-csrikanth
Community Support

Hi @goldenarm253 

Thank you for being part of the Microsoft Fabric Community.
Please find below some workarounds that might help resolve the issue.

1. Use Dynamic Schema Retrieval:
Instead of hardcoding a schema (like dbo), dynamically retrieve all schemas in the Lakehouse and iterate through them:
-----------------------------------------------------------------------------------
Code changes:
 
# Set the catalog to ensure correct context
spark.sql("USE CATALOG Salesforce_Lakehouse")
 
# Retrieve all schemas in the catalog
schemas = spark.sql("SHOW SCHEMAS IN Salesforce_Lakehouse").select("namespace").rdd.flatMap(lambda x: x).collect()
 
# Iterate through schemas to retrieve tables dynamically
for schema in schemas:
    try:
        print(f"Processing schema: {schema}")
        table_list = spark.sql(f"SHOW TABLES IN `Salesforce_Lakehouse`.`{schema}`").select("tableName").rdd.flatMap(lambda x: x).collect()
        
        for table_name in table_list:
            print(f"Processing table: {schema}.{table_name}")
            df = spark.sql(f"SELECT * FROM `Salesforce_Lakehouse`.`{schema}`.`{table_name}`")
            # Further processing here
 
    except Exception as e:
        print(f"Error processing schema {schema}: {str(e)}")
-----------------------------------------------------------------------------------
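One small caveat (an assumption on my part about the Spark runtime): the column returned by SHOW SCHEMAS is named namespace on recent Spark versions but databaseName on older ones, so a defensive version of the schema retrieval could look like this:
-----------------------------------------------------------------------------------
# Detect which column name this runtime returns before collecting the schema list
schemas_df = spark.sql("SHOW SCHEMAS IN Salesforce_Lakehouse")
schema_col = "namespace" if "namespace" in schemas_df.columns else "databaseName"
schemas = [row[schema_col] for row in schemas_df.select(schema_col).collect()]
-----------------------------------------------------------------------------------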
2. Use a Parameterized Schema:
If the schema varies but you know it beforehand, pass the schema as a parameter or an environment variable.
-----------------------------------------------------------------------------------
import os
# Example: Pass schema as an environment variable or a parameter
schema = os.getenv("TARGET_SCHEMA", "dbo")  # Default to 'dbo' if not set
 
# Retrieve tables from the specified schema
table_list = spark.sql(f"SHOW TABLES IN `Salesforce_Lakehouse`.`{schema}`").select("tableName").rdd.flatMap(lambda x: x).collect()
 
for table_name in table_list:
    print(f"Processing table: {schema}.{table_name}")
    df = spark.sql(f"SELECT * FROM `Salesforce_Lakehouse`.`{schema}`.`{table_name}`")
-----------------------------------------------------------------------------------
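A quick usage example (the value here is hypothetical; in practice it could come from a pipeline parameter or be set before running the notebook cell):
-----------------------------------------------------------------------------------
import os
os.environ["TARGET_SCHEMA"] = "SOQL"  # hypothetical value for illustration
# ...then run the retrieval above, which picks it up via os.getenv("TARGET_SCHEMA", "dbo")
-----------------------------------------------------------------------------------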
 
3. Set the Default Schema:
Set the default schema for the session to simplify queries, especially when working with one schema at a time.
-----------------------------------------------------------------------------------
# Use the specific schema
spark.sql("USE `Salesforce_Lakehouse.SOQL`")
 
# Query tables directly without prefixing schema
table_list = spark.sql("SHOW TABLES").select("tableName").rdd.flatMap(lambda x: x).collect()
 
for table_name in table_list:
    print(f"Processing table: {table_name}")
    df = spark.sql(f"SELECT * FROM `{table_name}`")
-----------------------------------------------------------------------------------
4. Grant Permissions to Non-dbo Schemas:
Ensure the user running the script has the necessary permissions for all schemas.
-----------------------------------------------------------------------------------
SQL Commands to Grant Permissions:
GRANT USAGE ON SCHEMA `Salesforce_Lakehouse`.`SOQL` TO <user>;
GRANT SELECT ON ALL TABLES IN SCHEMA `Salesforce_Lakehouse`.`SOQL` TO <user>;
-----------------------------------------------------------------------------------

If the above information helps you, please give us Kudos and mark the reply as the solution.

Thanks,
Cheri Srikanth





4 REPLIES
v-csrikanth
Community Support

Hi @goldenarm253 

We haven't heard from you since the last response and just wanted to check whether the provided solution worked for you. If it did, please Accept it as Solution to help others in the community.
Thank you.

If the above information is helpful, please give us Kudos and mark the response as the accepted solution.
Best Regards,
Community Support Team _ C Srikanth.

v-csrikanth
Community Support

Hi @goldenarm253 

It's been a while since I heard back from you and I wanted to follow up. Have you had a chance to try the solutions that have been offered?
If the issue has been resolved, can you mark the post as resolved? If you're still experiencing challenges, please feel free to let us know and we'll be happy to continue to help!
Looking forward to your reply!

Best Regards,
Community Support Team _ C Srikanth.

v-csrikanth
Community Support

Hi @goldenarm253 
Sorry for the late response.
I wanted to follow up since I haven't heard from you in a while. Have you had a chance to try the suggested solutions?
If your issue is resolved, please consider marking the post as solved. However, if you're still facing challenges, feel free to share the details, and we'll be happy to assist you further.
Looking forward to your response!

As per Microsoft community guidelines, we will now proceed to close this thread to keep discussions focused and manageable. If you still need assistance, you're welcome to start a new thread in the community at any time.

We appreciate your understanding and participation.

Best Regards,
Cheri Srikanth
