Hi There
We have a Delta table of about 1 trillion rows on an F8 capacity. The table is partitioned on DATE+HOUR and has only 10 columns. Everything works fine in Spark (both PySpark and Spark SQL). But when we run a simple query like select * from table where DATE=<date> and HOUR=<hour> against the SQL analytics endpoint, it runs forever and nothing is returned, even after a couple of hours.
We expect it to read only the Parquet files for that DATE+HOUR (2-3 files, <500 MB, ~30 million rows). Instead, it seems to read everything rather than using the partition information. MS Support has not been helpful so far. A sketch of the query shape follows.
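For reference, a minimal sketch of the query shape, written as T-SQL for the SQL analytics endpoint (the table name and literal values are hypothetical placeholders):

```sql
-- Hypothetical names/values; the real table has ~1 trillion rows, partitioned on DATE + HOUR.
-- In Spark SQL this filter prunes down to the 2-3 Parquet files of one partition;
-- through the SQL analytics endpoint the same statement runs for hours with no result.
SELECT *
FROM dbo.fact_events          -- placeholder name for our partitioned Delta table
WHERE [DATE] = '2024-01-15'   -- partition column
  AND [HOUR] = 13;            -- partition column
```

One thing we are double-checking on our side (an assumption, not a confirmed cause): that the literal types in the WHERE clause match the partition column types exactly, since an implicit conversion on a filter column can defeat file elimination.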
Is anyone in this forum working with data at this volume, and what exactly has your experience been with partitioned tables and the SQL analytics endpoint?
We see the same behaviour in Power BI, where we are able to connect only via DirectQuery, not Direct Lake, due to size limitations.
Kailash Pareek
Hi @KailashPareek,
Have you tried upgrading the SKU of the capacity? AFAIK, each SKU level has its own resources and limits. Since you are working with a Lakehouse semantic model and the SQL endpoint on a very large number of records, a higher-level SKU should provide more resources for processing them.
What is Power BI Premium? - Power BI | Microsoft Learn
| SKU | Max memory (GB) | Max concurrent DirectQuery connections (per semantic model) | Max DirectQuery parallelism | Live connections (per second) | Max memory per query (GB) | Model refresh parallelism | Direct Lake rows per table (millions) | Max Direct Lake model size on OneLake (GB) |
|-------|-----|-----|----|------|----|-------|--------|-----------|
| F2    | 3   | 5   | 1  | 2    | 1  | 1     | 300    | 10        |
| F4    | 3   | 5   | 1  | 2    | 1  | 2     | 300    | 10        |
| F8    | 3   | 10  | 1  | 3.75 | 1  | 5     | 300    | 10        |
| F16   | 5   | 10  | 1  | 7.5  | 2  | 10    | 300    | 20        |
| F32   | 10  | 10  | 1  | 15   | 5  | 20    | 300    | 40        |
| F64   | 25  | 50  | 8  | 30   | 10 | 40    | 1,500  | Unlimited |
| F128  | 50  | 75  | 12 | 60   | 10 | 80    | 3,000  | Unlimited |
| F256  | 100 | 100 | 16 | 120  | 10 | 160   | 6,000  | Unlimited |
| F512  | 200 | 200 | 20 | 240  | 20 | 320   | 12,000 | Unlimited |
| F1024 | 400 | 200 | 24 | 480  | 40 | 640   | 24,000 | Unlimited |
| F2048 | 400 | 200 |    | 960  | 40 | 1,280 | 24,000 | Unlimited |
Regards,
Xiaoxin Sheng
Hi @KailashPareek, there's a performance guidelines document; hopefully you'll find some insight there:
SQL analytics endpoint performance considerations - Microsoft Fabric | Microsoft Learn
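While digging in, the query insights views on the SQL analytics endpoint can also show where the time is going. A minimal sketch (assuming the queryinsights.exec_requests_history view that Fabric exposes; exact column names may differ in your environment):

```sql
-- List the slowest recent requests against the SQL analytics endpoint.
-- Column names are assumptions based on the documented queryinsights schema.
SELECT TOP 20
    start_time,
    total_elapsed_time_ms,
    command                    -- the statement text that was executed
FROM queryinsights.exec_requests_history
ORDER BY total_elapsed_time_ms DESC;
```

If the partition-filtered query shows up here with hours of elapsed time, that would support the theory that file/partition elimination is not happening at the endpoint.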