I recently started using the VSCode Fabric Data Engineering extension, but haven't been able to make it work well enough to actually develop notebooks in VSCode.
When I try to use the "display" command on a PySpark DataFrame, I get a button that says "Open 'df' in Data Wrangler", and the cell output just prints "Using custom display..., showing 50 records". If I click the button, Data Wrangler opens but fails to load the data, I believe because PySpark DataFrames aren't supported in Data Wrangler, only pandas.
I would like the display command to just show an inline scrollable or truncated formatted table, the way Jupyter shows a pandas DataFrame by default. I have tried uninstalling Data Wrangler, which removes the button, but the "custom display" message still appears and I can't see the DataFrame.
```python
data = [('Alice', 25), ('Bob', 30), ('Charlie', 24)]
schema = ['name', 'age']
df = spark.createDataFrame(data, schema)
display(df)
```
I'm using the "fabric-synapse-runtime-1-2" kernel that was created automatically when I installed the Fabric extension. Notably, there also seems to be a mismatch between the Spark version and the pandas version: when I tried converting to a pandas DataFrame to view it that way, the PySpark DataFrame.toPandas() call failed.
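For context, the usual inline fallback in a working session is to trim the PySpark DataFrame and hand it to pandas, which Jupyter renders as a table by default (e.g. `df.limit(50).toPandas()`). Since `toPandas()` is failing here, this is a minimal sketch of the same idea using the sample data above and plain pandas, with no Spark involved:

```python
import pandas as pd

# Same sample data as in the Spark snippet above
data = [('Alice', 25), ('Bob', 30), ('Charlie', 24)]
schema = ['name', 'age']

# Jupyter renders a pandas DataFrame as an inline HTML table by default,
# which is the behavior I'd expect display() to fall back to.
pdf = pd.DataFrame(data, columns=schema)
print(pdf.to_string(index=False))
```

In a real Spark session you would replace the manual construction with `df.limit(50).toPandas()` once the environment problem is resolved.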
Hi @merganzeruncle,
Thank you for using the Microsoft Community Forum!
You have a good understanding of the issue, and you're right about the root cause. I've added some clarification below.
When you select the fabric-synapse-runtime-1-2 kernel in VSCode, it is expected to activate the corresponding Conda environment. However, because your Conda configuration (envs_dirs) was not set up correctly, VSCode couldn't locate and activate the intended environment, causing it to fall back to the base environment instead.
The Fabric Data Engineering extension defaults to Data Wrangler for displaying DataFrames. Since Data Wrangler supports only pandas DataFrames, not PySpark DataFrames, the display(df) command triggers the Data Wrangler prompt even though it's incompatible.
For reference, see: Explore and transform Spark data with Data Wrangler - Microsoft Fabric | Microsoft Learn
If this solution worked for you, kindly mark it as the Accepted Solution and feel free to give a Kudos; it would be much appreciated!
I might have solved this, though I don't fully understand why it worked. While trying to figure out why PySpark and pandas weren't interacting well, I opened the VSCode terminal and noticed that even though I had the fabric-synapse-runtime-1-2 kernel selected, the terminal opened in the base conda environment. When I ran "conda env list", it printed no environments.

So I checked "conda config --describe envs_dirs" and saw that my conda envs_dirs path was empty. I appended my envs directory (I have conda installed through miniforge for just my user, so the path was %USERPROFILE%\AppData\Local\miniforge3\envs) so that conda could find my environments by name, then activated the fabric synapse environment in the terminal. Now I can use toPandas() and display().

I figured VSCode would activate the environment automatically when I selected that kernel, but because my original conda configuration was botched, it probably couldn't find the environment when it tried to activate it and fell back to the base environment. I don't know how PySpark worked at all in the base environment to begin with, since I've never worked in it, but whatever, problem solved!
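For anyone hitting the same thing, the fix amounts to the following terminal commands. The miniforge path is specific to my per-user Windows install; adjust it to wherever your envs directory actually lives:

```shell
# Show the current (in my case, empty) envs_dirs search path
conda config --describe envs_dirs

# Append the per-user miniforge envs directory so conda can find envs by name
conda config --append envs_dirs "%USERPROFILE%\AppData\Local\miniforge3\envs"

# Verify the environment is now discoverable, then activate it
conda env list
conda activate fabric-synapse-runtime-1-2
```

After this, reselecting the fabric-synapse-runtime-1-2 kernel in VSCode picked up the right environment for me.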