Hi community,
Currently I'm working with an F32 capacity based in Europe.
I have a notebook in which I'm forced to use PySpark because I have to read tables from my lakehouse - otherwise I would use plain Python instead of PySpark, since my notebook only runs a few simple transformations and the tables are small.
I'm not a big fan of PySpark because of the time it takes to create the Spark session - at least 5 minutes - and I need the code to run fast.
My question is: is there any specific configuration I can make in my Spark environment so the session is created faster?
Thanks community!
Hi @lsabetta ,
The issue happens because your Lakehouse table is stored in Delta format, not plain Parquet.
When you overwrite the table, new Parquet files are created, but old ones remain in the folder for versioning.
If you read the folder directly as Parquet, it loads both the old and new files - that’s why you see duplicate or outdated records.
To fix this, make sure you read the table as a Delta table instead of as raw Parquet.
This ensures that only the latest valid version of the data is returned, without mixing older files.
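A minimal sketch of the difference, assuming the table is called your_table and the Lakehouse is attached as the notebook's default lakehouse (both placeholders):

# Reading through the Delta log returns only the current version of the table
df = spark.read.format("delta").load("Tables/your_table")
# or, equivalently, by table name
df = spark.read.table("your_table")

# Reading the folder as raw Parquet would also pick up old files left behind by overwrites
# df_bad = spark.read.parquet("Tables/your_table")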
Thank you.
Hi @lsabetta ,
Thank you @BhaveshPatel @BalajiL @anilgavhane for the prompt response.
I wanted to check if you had the opportunity to review the information provided and resolve the issue. Please let us know if you need any further assistance. We are happy to help.
Thank you.
@lsabetta : If the concern is that Spark takes a long time to spin up, use a starter pool - it will be up within a few seconds and then you can run your queries.
Hi @lsabetta
You should use Power BI Dataflow Gen 2.
This is how it works:
Either use Notebooks (Python (pandas) + Data Lake + Delta Lake) or use Power BI Dataflow Gen 2 (UI + UX).
Yes, you can read the tables in a Lakehouse using the Python pandas library (for example, pd.read_parquet).
Hi @BhaveshPatel ,
Could you please give me an example of how to read the tables in a Lakehouse with Python only?
Hi Isabella,
This is how it works using Pandas:
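A minimal sketch using the deltalake package (an assumption - the table name your_table and the default Lakehouse mount path are placeholders):

from deltalake import DeltaTable

# In a Fabric notebook the default Lakehouse is mounted at /lakehouse/default
dt = DeltaTable("/lakehouse/default/Tables/your_table")
df = dt.to_pandas()  # loads only the current version of the table, no Spark session needed
print(df.head())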
The problem with reading the Parquet files directly is that if I overwrite the table, the Parquet read brings back both the new records and the old ones.
Yes, you can read tables from a Microsoft Fabric Lakehouse without creating a Spark session, by using Lakehouse shortcuts and native connectors in Python. Here are a few options:
🔹 1. Use the fabric Python SDK (Preview)
Microsoft Fabric offers a Python SDK that allows you to interact with Lakehouse tables directly from a notebook using Pandas, without spinning up Spark.
from fabric import LakehouseClient

client = LakehouseClient(workspace_id="your_workspace_id", lakehouse_id="your_lakehouse_id")
df = client.read_table("your_table_name")
This avoids Spark entirely and loads the table as a Pandas DataFrame.
🔹 2. Use REST APIs or ODBC/JDBC Connectors
You can access Lakehouse tables via the Lakehouse's SQL analytics endpoint (ODBC/JDBC connection) or the Fabric REST APIs.
These methods allow you to query structured data using Python libraries like pyodbc, sqlalchemy, or pandas.read_sql().
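A minimal sketch using pyodbc against the Lakehouse SQL analytics endpoint (the server name, database, and table below are placeholders, and the driver and authentication method are assumptions for an interactive sign-in):

import pandas as pd
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=your-endpoint.datawarehouse.fabric.microsoft.com;"
    "Database=your_lakehouse;"
    "Authentication=ActiveDirectoryInteractive;"
    "Encrypt=yes;"
)
# Query the Lakehouse table through the SQL analytics endpoint - read-only, no Spark session
df = pd.read_sql("SELECT TOP 10 * FROM your_table", conn)
conn.close()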
🔹 3. Export Lakehouse Tables to Files
If your Lakehouse tables are stored as Delta or Parquet files:
import pandas as pd

df = pd.read_parquet("https://yourlakehouseurl/path/to/table")
This bypasses Spark and loads the data directly into memory.
Hi @lsabetta ,
Thank you for reaching out to Microsoft Fabric Community.
Thank you @pallavi_r for the prompt response.
The slow startup isn’t from your code or data size - it’s Spark cluster spin-up, which always takes a few minutes.
There’s no Spark config that makes session creation instant.
Below are a few options:
- If tables are small, skip Spark and load them directly into pandas/SQL (faster, no cluster).
- Keep the Spark session alive instead of restarting it often.
- Ask your admin if a smaller/faster Spark pool is available.
- Use Spark only when you need distributed compute; otherwise stick with Python.
In short: you can't make Spark spin up faster, but you can avoid Spark altogether or keep the session warm.
Hi @v-venuppu ,
Thanks for your answer.
My notebooks are written in pandas because my tables are small and the transformations are simple. I could use Python instead of PySpark, but the thing is that I need to read tables from my lakehouse.
Is there any way to read tables from a lakehouse without creating a Spark session?
Hi @lsabetta ,
Here are a couple of tips to reduce Spark session startup time: keep high concurrency mode enabled, and detach the session from a notebook once you are done so it is freed up for other sessions and stays alive.
Thanks,
Pallavi