I'm new to Data Engineering / Notebooks and am trying to follow a YouTube video to set up a notebook for schema validation after data has landed in Lakehouse files.
I have a parameter cell with a 'fileToTest' and an 'output_table_name' parameter. When I try to use 'fileToTest' outside of the parameter cell, it doesn't work and I get a "NameError: name 'fileToTest' is not defined" error.
--- Updated with some additional findings
I only have 3 code cells.
Cell 1: (Parameters)
fileToTest = "Files/folder/file.csv"
output_table_name = 'raw_users'
Cell 2 (attempting to install Great Expectations):
%pip install --q great_expectations
df = spark.read.format("csv").option("header","true").load(fileToTest)
display(df)
UPDATE:
If I comment out the pip install command, the spark.read operation works. I'm not sure what this means. Of course, in my notebook that means cell 3 then fails when it tries to import great_expectations.
Cell 3 (attempting to create validations within a Great Expectations context):
import great_expectations as gx
gxContext = gx.get_context()
validator = gxContext.sources.pandas_default.read_csv(fileToTest)
Either of the references above to 'fileToTest' fails with the NameError.
If I move this code to Cell 1, it works without issue:
df = spark.read.format("csv").option("header","true").load(fileToTest)
display(df)
For reference the original video is here:
https://youtu.be/wAayC-J9TsU?si=D25oMc7oZfpGFrxc
Found the cause of my issue. Learning newbie here.
spark.read works with the Files/.... path
Great Expectations (via pandas, I think) requires /lakehouse/default/ to be prepended to the path
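To illustrate the fix described above, here is a minimal sketch that keeps the parameter as the relative Files/... path (which spark.read accepts) and derives the prefixed path for the pandas-based reader. The helper name to_local_path and the LAKEHOUSE_MOUNT constant are made up for this example; only the /lakehouse/default/ prefix comes from the thread.

```python
# Prefix taken from the thread; assumes a default lakehouse is attached to the notebook.
LAKEHOUSE_MOUNT = "/lakehouse/default"


def to_local_path(relative_path: str, mount: str = LAKEHOUSE_MOUNT) -> str:
    """Return the mount-prefixed form of a lakehouse-relative path.

    Hypothetical helper: paths that already start with the mount are returned unchanged.
    """
    if relative_path.startswith(mount + "/"):
        return relative_path
    return f"{mount}/{relative_path.lstrip('/')}"


fileToTest = "Files/folder/file.csv"       # relative path, as used by spark.read
local_file = to_local_path(fileToTest)     # prefixed path for pandas-style readers
print(local_file)  # /lakehouse/default/Files/folder/file.csv
```

With this in place, Cell 3 would pass local_file instead of fileToTest to read_csv, while the spark.read call keeps using fileToTest unchanged.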
I don't have experience with parameter cells; however, my initial thought when you got the "NameError: name 'fileToTest' is not defined" was that you had not executed (run) Cell 1 before you tried to use the fileToTest variable in another cell.
If so, the 'fileToTest' variable didn't exist yet at the moment you executed the other cell.
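One way to test that theory is to check, at the top of the failing cell, whether the parameter names exist in the session at all. This is only a sketch: missing_names is a made-up helper, not part of any notebook API. If a name shows up as missing even though Cell 1 was run, the session state was probably reset somewhere in between (an inline package install could do that in some environments, which is an assumption here, not something the thread confirms).

```python
def missing_names(required, namespace):
    """Return the names from `required` that are not defined in `namespace`."""
    return [name for name in required if name not in namespace]


# Values from Cell 1 of the question, repeated so this sketch runs on its own.
fileToTest = "Files/folder/file.csv"
output_table_name = "raw_users"

# In a notebook, put this check at the top of the cell that raises the NameError.
missing = missing_names(["fileToTest", "output_table_name"], globals())
if missing:
    print(f"Not defined in this session: {missing}. Re-run the parameter cell first.")
else:
    print("All parameters are defined.")
```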