xefere
New Member

Fabric tutorial failing on files path

Hi

 

I am doing this Fabric tutorial (Lakehouse tutorial - Prepare and transform data in the lakehouse - Microsoft Fabric | Microsoft Lear...)

 

When I use the code as provided,

from pyspark.sql.functions import col, year, month, quarter

table_name = 'fact_sale'

df = spark.read.parquet('Files/wwi-raw-data/full/fact_sale_1y_full')
df = df.withColumn('Year', year(col("InvoiceDateKey")))
df = df.withColumn('Quarter', quarter(col("InvoiceDateKey")))
df = df.withColumn('Month', month(col("InvoiceDateKey")))

df.write.mode("overwrite").format("delta").partitionBy("Year","Quarter").save("Tables/" + table_name)

 

I get the following error:

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
Cell In[104], line 5
      1 from pyspark.sql.functions import col, year, month, quarter
      3 table_name = 'fact_sale'
----> 5 df = spark.read.parquet('Files/wwi-raw-data/full/fact_sale_1y_full')
      6 df = df.withColumn('Year', year(col("InvoiceDateKey")))
      7 df = df.withColumn('Quarter', quarter(col("InvoiceDateKey")))

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py:531, in DataFrameReader.parquet(self, *paths, **options)
    520 int96RebaseMode = options.get("int96RebaseMode", None)
    521 self._set_opts(
    522     mergeSchema=mergeSchema,
    523     pathGlobFilter=pathGlobFilter,
    (...)
    528     int96RebaseMode=int96RebaseMode,
    529 )
--> 531 return self._df(self._jreader.parquet(_to_seq(self._spark._sc, paths)))

File ~/cluster-env/trident_env/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:175, in capture_sql_exception.<locals>.deco(*a, **kw)
    171 converted = convert_exception(e.java_exception)
    172 if not isinstance(converted, UnknownException):
    173     # Hide where the exception came from that shows a non-Pythonic
    174     # JVM exception message.
--> 175     raise converted from None
    176 else:
    177     raise

AnalysisException: [PATH_NOT_FOUND] Path does not exist: abfss://8c1fa0f9-27f5-4bd6-9266-e6dfccd1cf2f@onelake.dfs.fabric.microsoft.com/99c8f3be-4e9e-4f83-83f1-cc325343cf6b/Files/wwi-raw-data/full/fact_sale_1y_full.

 

xefere_0-1710114140024.png

 

I've now noticed that the abfss path is not the same.

 

When I run this code with the abfss path copied from my lakehouse, it works perfectly:

 

from pyspark.sql.functions import col, year, month, quarter

table_name = 'fact_sale'


df = spark.read.parquet('abfss://8c1fa0f9-27f5-4bd6-9266-e6dfccd1cf2f@onelake.dfs.fabric.microsoft.com/cbbd6d1f-0ac3-402a-ab8f-fbc7093b6ccc/Files/wwi-raw-data/full/fact_sale_1y_full')
df = df.withColumn('Year', year(col("InvoiceDateKey")))
df = df.withColumn('Quarter', quarter(col("InvoiceDateKey")))
df = df.withColumn('Month', month(col("InvoiceDateKey")))
 
df.write.mode("overwrite").format("delta").partitionBy("Year","Quarter").save("abfss://8c1fa0f9-27f5-4bd6-9266-e6dfccd1cf2f@onelake.dfs.fabric.microsoft.com/cbbd6d1f-0ac3-402a-ab8f-fbc7093b6ccc/Tables/" + table_name)
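The abfss paths above all follow the same fixed OneLake pattern (workspace GUID as the container, lakehouse GUID as the first path segment), which can be sketched as a small helper. This is a hypothetical convenience function, not part of any Fabric SDK, and the IDs in the example are placeholders:

```python
def onelake_uri(workspace_id: str, lakehouse_id: str, relative_path: str) -> str:
    """Build a full OneLake abfss URI from workspace/lakehouse IDs and a
    lakehouse-relative path such as 'Files/...' or 'Tables/...'."""
    return (
        f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse_id}/{relative_path.lstrip('/')}"
    )

# Example with placeholder IDs (substitute your real GUIDs):
print(onelake_uri("my-workspace-id", "my-lakehouse-id",
                  "Files/wwi-raw-data/full/fact_sale_1y_full"))
```

Using such a helper keeps the GUIDs in one place instead of repeating the long URI in every read and write call.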
 
What am I doing wrong, or what is wrong in my setup?
1 ACCEPTED SOLUTION
v-gchenna-msft
Community Support

Hi @xefere ,

Apologies for the delayed reply from our side.
Based on the screenshot you provided, I can see that the lakehouse is not set as the default lakehouse in your case.
Once you set it as the default lakehouse, you will be able to use the relative file path, i.e. 'Files/wwi-raw-data/full/fact_sale_1y_full'.

vgchennamsft_0-1710313804660.png


Hope this is helpful. Please let me know in case of further queries.
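In other words, a relative path like 'Files/...' is joined onto the root of whichever lakehouse is attached as the default, so it fails (or points at the wrong lakehouse, as the differing GUIDs in the error message show) when the tutorial lakehouse is not the default. The resolution rule can be sketched conceptually in plain Python; this is an illustration of the behavior, not the actual Fabric runtime code:

```python
from typing import Optional

def resolve_path(path: str, default_lakehouse_root: Optional[str]) -> str:
    """Sketch of path resolution in a Fabric notebook: absolute abfss://
    URIs are used as-is; relative paths are joined onto the default
    lakehouse's OneLake root, so they fail without one."""
    if path.startswith("abfss://"):
        return path  # absolute URI: independent of the default lakehouse
    if default_lakehouse_root is None:
        # mirrors the [PATH_NOT_FOUND] AnalysisException seen above
        raise FileNotFoundError(f"[PATH_NOT_FOUND] Path does not exist: {path}")
    return f"{default_lakehouse_root.rstrip('/')}/{path}"

# Placeholder root; a real one embeds the workspace and lakehouse GUIDs.
root = "abfss://<workspace-id>@onelake.dfs.fabric.microsoft.com/<lakehouse-id>"
print(resolve_path("Files/wwi-raw-data/full/fact_sale_1y_full", root))
```

This is why both fixes work: attaching the tutorial lakehouse as the default makes the relative path resolve to the right root, and an absolute abfss URI bypasses the default-lakehouse lookup entirely.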

 


4 REPLIES
xefere
New Member

Thank you, I've added the Lakehouse in the Sources panel of the notebook and the relative path worked perfectly.

Glad to know that your query is resolved. Please continue using the Fabric community for your further queries.

v-gchenna-msft
Community Support
Hi @xefere ,

We haven't heard from you since the last response and were just checking back to see if you have a resolution yet.
If you have found a resolution, please share it with the community, as it can be helpful to others.
Otherwise, we will respond with more details and try to help.
