ToddChitt
Super User

Notebook fails with error 200 when run from Pipeline

I'm fairly new to Spark Notebooks. I have one that dumps a JSON file into a couple of tables in my Lakehouse. It works fine when the Notebook is run on its own. But when I try to run it from the context of a Pipeline, I get this error:

 

Notebook execution failed at Notebook service with http status code - '200', please check the Run logs on Notebook, additional details - 'Error name - AnalysisException, Error value - org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Spark SQL queries are only possible in the context of a lakehouse. Please attach a lakehouse to proceed.)' 

 

What is going on? "Please attach a lakehouse to proceed"? What does that mean? In the setup of the Notebook activity in the Pipeline, I select the Workspace and Notebook from validated lists. The Pipeline should *know* that the Notebook is a part of a particular Lakehouse.
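From the message, it looks like Spark SQL and saveAsTable() resolve unqualified table names against the notebook's default (pinned) lakehouse, and with no lakehouse attached at run time the metastore throws the AnalysisException above. A toy, purely illustrative sketch of that resolution logic (resolve_table and the lakehouse name are hypothetical, not Fabric's actual code):

```python
# Toy model of default-lakehouse name resolution (illustration only).
def resolve_table(name, default_lakehouse):
    """Resolve a table name the way a default lakehouse would be consulted."""
    if "." in name:
        # Already qualified as Lakehouse.table; no default needed.
        return name
    if default_lakehouse is None:
        # No lakehouse attached -> the error quoted above.
        raise RuntimeError(
            "Spark SQL queries are only possible in the context of a lakehouse."
        )
    return f"{default_lakehouse}.{name}"

print(resolve_table("policy_info", "MyLakehouse"))  # MyLakehouse.policy_info
```

This is only a mental model for why attaching (or pinning) a lakehouse makes the error go away; the real resolution happens inside the Spark/Hive metastore layer.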

 

Any help would be appreciated. 




Did I answer your question? If so, mark my post as a solution. Also consider helping someone else in the forums!

Proud to be a Super User!





1 ACCEPTED SOLUTION

Hi @ToddChitt 

Thanks for the details. Can you please send me the code you are running in the notebook? I will try to repro it on my side and let you know.

Also, can you please try creating a new notebook and then attaching the lakehouse? Then try running this notebook from the pipeline. If the issue still persists, please let me know. This way the option won't be grayed out, and you can attach or detach the lakehouse.

[Screenshot: vnikhilanmsft_0-1710246947366.png]


Did you try setting the target lakehouse as the default lakehouse? The default lakehouse is identified by the pin icon.

[Screenshot: vnikhilanmsft_2-1710247776344.png]

 



Thanks.


11 REPLIES
pshepunde
Helper I

I am getting a similar error, but it is about a variable/dataframe not being defined: a NameError. When I run the notebook manually it succeeds, but when triggered from the pipeline it fails with the exception mentioned above.

 

Below is the line of code causing the error, where read_particular_day_files is a user-defined function:

numfiles_pti_ar, df_toprocess_pti_ar, df_policy_transaction_info_ar = read_particular_day_files(accountingDate, '', 'PolicyTransactionInfo')
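One common cause of this symptom: a pipeline-triggered run starts a fresh Spark session, so anything defined only in an earlier interactive session (or in another notebook that was never pulled in with %run) does not exist, which produces exactly this NameError. A hedged sketch, where the function body is a placeholder stand-in for the real helper, showing the definition living in the same notebook run as the call:

```python
# Placeholder stand-in for the real helper; in a pipeline run, the definition
# must execute in the same session *before* the call (e.g. in an earlier cell,
# or in a notebook included with %run).
def read_particular_day_files(accounting_date, prefix, table_name):
    # The real code would read that day's files into Spark DataFrames;
    # here we just return dummy values to show the call resolving.
    return 0, None, None

numfiles, df_toprocess, df_info = read_particular_day_files(
    "2024-03-01", "", "PolicyTransactionInfo"
)
print(numfiles)  # 0
```

If the definition lives in another notebook, running that notebook interactively first is what makes the manual run succeed; the pipeline session never saw it.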
ToddChitt
Super User
Super User

Hello @v-nikhilan-msft and thank you for the prompt reply.

Per your suggestion, I created a new stand-alone notebook (i.e., NOT from inside the lakehouse), then manually ADDED the lakehouse as a new source.

 

This worked! Invoking the Notebook from the context of a Pipeline now runs just fine.

 

Thank you. 

Do you still want the code? All it does is read a JSON file, parse it into four data frames, and save each to a table in the lakehouse.

[Screenshot: ToddChitt_0-1710249339148.png]

 

[Screenshot: ToddChitt_1-1710249378354.png]
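Based on that description (read a JSON file, split it into four DataFrames, write each to a lakehouse table), here is a minimal hedged sketch of the shape such a notebook might take; the payload, key names, and table names are invented for illustration, and the Spark step is shown as a comment since it only runs inside Fabric:

```python
import json

# Invented sample payload; the real JSON structure is not shown in the thread.
raw = """{
  "policies":     [{"id": 1, "holder": "A"}],
  "transactions": [{"id": 10, "policy_id": 1}],
  "claims":       [{"id": 100, "policy_id": 1}],
  "payments":     [{"id": 1000, "claim_id": 100}]
}"""

doc = json.loads(raw)

# One list of records per target table.
tables = {name: rows for name, rows in doc.items()}

# Inside the notebook, each list would then become a Spark DataFrame and be
# saved -- the step that needs an attached (default) lakehouse:
#   spark.createDataFrame(rows).write.mode("overwrite").saveAsTable(name)

print(sorted(tables))  # ['claims', 'payments', 'policies', 'transactions']
```

Note that even though this is PySpark rather than a Spark SQL cell, the commented saveAsTable() call still goes through the metastore, which is consistent with the AnalysisException appearing despite "not using Spark SQL".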

 









Hi @ToddChitt 

Thanks for the code. Glad that your query got resolved. Please continue using Fabric Community for any help regarding your queries.


Hi @pshepunde 

Thanks for the post. But as the initial ask is different from your issue, could you please create a new post and tag me? I will surely help.

Please attach some screenshots of the error too.

Thanks.

I would consider this a bug: a Notebook created from the context of a Lakehouse has trouble accessing that lakehouse when run from inside a Pipeline, but a Notebook created on its own, with the Lakehouse added afterwards, has no issues.

 

Please don't tell me that this was "by design"!

 









This is clearly a bug. I managed to make it work by creating a new empty notebook, but this should be fixed.

I also experienced this issue and agree it is a bug. However, an alternative workaround to creating a new notebook is to completely remove the linked lakehouse and add it back. To do this, select the default pinned lakehouse and choose Remove all Lakehouses. Once it is added back, the notebook seems to trigger from the pipeline with no errors.

[Screenshot: Ben1133111_0-1715024942571.png]

 

ToddChitt
Super User

Hello @v-nikhilan-msft 

The Notebook was created from within the Lakehouse where the tables are. It is already connected. As stated, it runs just fine by itself and populates the tables in the lakehouse. 

I cannot add it again via the method you describe above as it is already connected.

If I follow the steps for "+ Data Sources", I select "Lakehouses", then the radio button for "Existing lakehouse". The lakehouse that the notebook was created under is listed, but grayed out with the message: "This item is preselected and can't be unchecked."

For all intents and purposes, I have to assume that the Notebook is already joined/knows about/is connected to the lakehouse. 

 

The error mentions "Spark SQL queries", but none of my three code blocks uses Spark SQL; they all use the default of PySpark.

 

Regards,










v-nikhilan-msft
Community Support

Hi @ToddChitt 
Thanks for using Fabric Community.
Can you please follow the steps below and retry:

On the left side of the notebook, click on Lakehouses.

[Screenshot: vnikhilanmsft_0-1710217761933.png]

 



Click on Add lakehouse.

[Screenshot: vnikhilanmsft_1-1710217796126.png]

 



Select the lakehouse where the table resides. Attach the lakehouse to the notebook.

[Screenshot: vnikhilanmsft_2-1710217814681.png]

 



Now try to run the pipeline. Please let me know if the issue still persists. Hope this helps.
