
rookie111
Frequent Visitor

Fabric failing to read from lakehouse

I have created a package which builds the endpoint for the lakehouse (correctly, as I have validated the output) and reads the data from the files in the lakehouse using Spark, but unfortunately the same code keeps failing with the following error:


======================
2025-06-25 07:53:33,693 . ERROR - mcp_edp_pipeline - mcp.edp.ingestion_framework.data_io.lakehouse_reader - read:55 - Lakehouse read failed.
Traceback (most recent call last):
  File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/ingestion_framework/data_io/lakehouse_reader.py", line 53, in read
    return self._read_files()
           ^^^^^^^^^^^^^^^^^^
  File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/ingestion_framework/data_io/lakehouse_reader.py", line 90, in _read_files
    reader = reader.schema(config["schema"])
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 307, in load
    return self._df(self._jreader.load(path))
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
                   ^^^^^^^^^^^^^^^^^
  File "/opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 179, in deco
    return f(*a, **kw)
           ^^^^^^^^^^^
  File "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o9381.load.
: Operation failed: "Bad Request", 400, GET, http://onelake.dfs.fabric.microsoft.com/ef4a09c9-61b6-45ca-b99f-1d8ea1cee548?upn=false&resource=file... FriendlyNameSupportDisabled, "Request Failed with WorkspaceId and ArtifactId should be either valid Guids or valid Names"
    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.completeExecute(AbfsRestOperation.java:231)
    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.lambda$execute$0(AbfsRestOperation.java:191)
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfInvocation(IOStatisticsBinding.java:464)
    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:189)
    at org.apache.hadoop.fs.azurebfs.services.AbfsClient.listPath(AbfsClient.java:311)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:1173)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:1143)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:1125)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.listStatus(AzureBlobFileSystem.java:513)
    at org.apache.hadoop.fs.Globber.listStatus(Globber.java:128)
    at org.apache.hadoop.fs.Globber.doGlob(Globber.java:291)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:202)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:2123)
    at org.apache.spark.deploy.SparkHadoopUtil.globPath(SparkHadoopUtil.scala:301)
    at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$3(DataSource.scala:740)
    at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:384)
    at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
    at scala.util.Success.$anonfun$map$1(Try.scala:255)
    at scala.util.Success.map(Try.scala:213)
    at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
    at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
    at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
    at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
File ~/cluster-env/trident_env/lib/python3.11/site-packages/ingestion_framework/data_io/lakehouse_reader.py:53, in LakehouseReader.read(self)
     52     else:
---> 53         return self._read_files()
     54 except Exception as e:

File ~/cluster-env/trident_env/lib/python3.11/site-packages/ingestion_framework/data_io/lakehouse_reader.py:90, in LakehouseReader._read_files(self)
     88     reader = reader.schema(config["schema"])
---> 90 df = reader.load(base_path)
     92 # optional row skipping

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py:307, in DataFrameReader.load(self, path, format, schema, **options)
    306 if isinstance(path, str):
--> 307     return self._df(self._jreader.load(path))
    308 elif path is not None:

File ~/cluster-env/trident_env/lib/python3.11/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:179, in capture_sql_exception.<locals>.deco(*a, **kw)
    178 try:
--> 179     return f(*a, **kw)
    180 except Py4JJavaError as e:

File ~/cluster-env/trident_env/lib/python3.11/site-packages/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:

Py4JJavaError: An error occurred while calling o9381.load.
: Operation failed: "Bad Request", 400, GET, http://onelake.dfs.fabric.microsoft.com/ef4a09c9-61b6-45ca-b99f-1d8ea1cee548?upn=false&resource=file... FriendlyNameSupportDisabled,
===================================
Here is the code that reads it:

def _read_files(self) -> DataFrame:
    config = self.config
    # Build the OneLake path from the workspace and lakehouse IDs in the config.
    base_path = (
        f"abfss://{config['workspace_id']}@onelake.dfs.fabric.microsoft.com/"
        f"{config['lakehouse_id']}/Files/{config['folder_path']}"
    )
    fmt = config["format"].lower()
    _logger.info(f"Reading files from '{base_path}' with format '{fmt}'.")

    reader = self.spark.read.format(fmt)

    if fmt == "json" and config.get("multiline", False):
        reader = reader.option("multiLine", "true")

    # Optional reader settings driven by the config.
    if config.get("wholeFile"):
        reader = reader.option("wholeFile", str(config["wholeFile"]).lower())
    if config.get("infer_schema"):
        reader = reader.option("inferSchema", str(config["infer_schema"]).lower())
    if config.get("header") is not None:
        reader = reader.option("header", str(config["header"]).lower())
    for k, v in config.get("options", {}).items():
        reader = reader.option(k, v)

    # static schema
    if isinstance(config.get("schema"), StructType):
        reader = reader.schema(config["schema"])
    print("Loading data from base_path: ", base_path)
    df = reader.load(base_path)
This works fine locally, and if I extract the base_path and run it manually, it runs fine and loads the data.
I don't know if this is a bug, but why is this not working?
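Since the 400 response says WorkspaceId and ArtifactId must be valid GUIDs (or names), a quick sanity check on the config values before building the path can help rule that out. This is only a rough diagnostic sketch, not part of the ingestion framework, and it assumes the same config keys as the snippet above:

# Rough diagnostic sketch (not framework code): check that the IDs used to build
# the OneLake path are real GUIDs, since the 400 error rejects anything that is
# neither a valid GUID nor a valid name.
import uuid


def check_onelake_ids(config: dict) -> str:
    """Validate workspace_id/lakehouse_id and return the path that would be loaded."""
    for key in ("workspace_id", "lakehouse_id"):
        value = config.get(key)
        try:
            uuid.UUID(str(value))
        except ValueError:
            raise ValueError(f"{key}={value!r} is not a valid GUID") from None

    base_path = (
        f"abfss://{config['workspace_id']}@onelake.dfs.fabric.microsoft.com/"
        f"{config['lakehouse_id']}/Files/{config['folder_path']}"
    )
    print("Would load from:", base_path)  # compare with the URL in the 400 error
    return base_path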


 
I have created a .whl file for this and loaded it to the environment.
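One way to rule out a stale wheel in the Fabric environment is to check, from inside the running Spark session, which version of the package is installed and where it is loaded from. A minimal sketch; the name ingestion_framework is taken from the traceback paths and may differ from the actual distribution/module names in your .whl:

# Sketch: confirm which copy of the custom package the Fabric Spark session is using.
# "ingestion_framework" is inferred from the traceback; substitute your real
# distribution and module names if they differ.
import importlib.metadata

import ingestion_framework

print("Installed version:", importlib.metadata.version("ingestion_framework"))
print("Loaded from:", ingestion_framework.__file__)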
 


 

1 ACCEPTED SOLUTION

I have tested all the logic, ensuring the GUID is fine, as I am using Fabric's API to extract it. I am pretty sure it's some backend issue, which I do not have the time to sit and debug; I have worked around it and fixed it.


5 REPLIES
v-kathullac
Community Support

Thanks @BhaveshPatel for addressing the issue.

 

Hi @rookie111 ,

We wanted to kindly follow up to check whether the solution provided for the issue worked for you. Let us know if you need any further assistance.


Regards,

Chaithanya

BhaveshPatel
Community Champion

Hi there,

 

Locally, the Spark DataFrame (aka the Python DataFrame) does not work. For that to work, you need to move the data to a public cloud source (the Azure environment).

 

For example, you can use the code below:

from pyspark.sql.functions import *

# Convert the local DataFrame to a Spark DataFrame, then persist it
# as a Delta table in the lakehouse so it can be read from Fabric.
sdf = spark.createDataFrame(df)
sdf.write.mode("overwrite").option("overwriteSchema", "true").format("delta").saveAsTable("DimTables")
display(sdf)
 
Thanks & Regards,
Bhavesh

Love the Self Service BI.
Please use the 'Mark as answer' link to mark a post that answers your question. If you find a reply helpful, please remember to give Kudos.
v-kathullac
Community Support

Hi @rookie111 ,

 

Thank you for reaching out to Microsoft Fabric Community Forum.

 

Below are a few observations and debugging points to resolve your issue. Please try them and let us know if you need any further assistance.

 

  • The error you're seeing (400 Bad Request - FriendlyNameSupportDisabled) means that the system does not support friendly names for workspace_id or lakehouse_id and expects valid GUIDs only.
  • Please ensure that both workspace_id and lakehouse_id in your config are actual GUIDs (e.g., ef4a09c9-61b6-45ca-b99f-1d8ea1cee548) and not friendly names like "SalesLakehouse" or "MyWorkspace".
  • You can find the correct GUIDs by checking the settings of the Fabric workspace and lakehouse in the UI.
  • The path should look like this: abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/Files/<folder_path>
  • In some managed environments (like Fabric), the workspace context is automatically inferred, and including it explicitly may cause errors.
  • Even if this code works locally, the Fabric-managed Spark environment may enforce stricter validation, causing the same path to fail there.
  • Log the actual value of base_path just before calling .load() to confirm what's being passed into Spark when it fails.
  • Temporarily hardcode the path value that works locally to see if the same path succeeds when running in Fabric.
  • If schema binding is causing issues, try commenting out the .schema() line and let Spark infer the schema to test path access: reader = self.spark.read.format(fmt).load(base_path)
  • Make sure the file path inside the lakehouse actually exists and is accessible from the environment where the Spark job is running.
  • Check if the Spark cluster running your job has data access permissions on the lakehouse; permission issues can sometimes appear as path resolution errors.
  • If you created a .whl package and deployed it, ensure the package is correctly picked up in the runtime environment and is not using stale or overridden configurations.
  • You can try running the same logic inside a Fabric notebook directly to verify whether the code fails only inside pipelines or universally in the Fabric context (see the sketch after this list).
  • As a temporary workaround, removing .schema(config["schema"]) and letting Spark infer the schema, just to see if .load() works, helps isolate whether the path is invalid or the reader is misconfigured.
  • Lastly, ensure your workspace_id, lakehouse_id, and storage path haven't changed recently due to renaming or redeployment, as that can invalidate the previously used GUIDs.
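For example, a minimal test along these lines could be run directly in a Fabric notebook, with the explicit schema left out so Spark infers it. This is only a sketch: the GUIDs, folder, and format below are placeholders to be replaced with the values from the failing run.

# Minimal notebook sketch: hard-coded path, no explicit schema, to separate
# path/permission problems from reader configuration problems.
# Replace the placeholders with the real workspace and lakehouse GUIDs.
base_path = (
    "abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/"
    "<lakehouse_id>/Files/<folder_path>"
)

df = (
    spark.read.format("json")        # use the same format as in the failing config
    .option("multiLine", "true")
    .load(base_path)
)
print(df.count())
df.printSchema()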

Regards,

Chaithanya.

 

 

 

rookie111
Frequent Visitor

I have tested all the logic, ensuring the GUID is fine, as I am using Fabric's API to extract it. I am pretty sure it's some backend issue, which I do not have the time to sit and debug; I have worked around it and fixed it.

Thanks, all, for your time and for looking into this.
