Hi There
We have a Delta table of about 1 trillion rows on an F8 capacity. The table is partitioned on DATE+HOUR and has only 10 columns. Everything works fine in Spark (both PySpark and Spark SQL). But when we run a simple query like select * from table where DATE=<date> and HOUR=<hour> against the SQL analytics endpoint, it runs forever and nothing is returned, even after a couple of hours.
We expect it to read only the Parquet files for that DATE+HOUR (2-3 files, <500 MB, ~30 million rows). Instead, it seems to read everything rather than using the partition information. MS Support has not been helpful so far. A sketch of the query shape follows.
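For reference, a minimal sketch of the query shape, written as T-SQL for the SQL analytics endpoint (the table name and literal values are hypothetical placeholders):

```sql
-- Hypothetical names/values; the real table has ~1 trillion rows, partitioned on DATE + HOUR.
-- In Spark SQL this filter prunes down to the 2-3 Parquet files of one partition;
-- through the SQL analytics endpoint the same statement runs for hours with no result.
SELECT *
FROM dbo.fact_events          -- placeholder name for our partitioned Delta table
WHERE [DATE] = '2024-01-15'   -- partition column
  AND [HOUR] = 13;            -- partition column
```

One thing we are double-checking on our side (an assumption, not a confirmed cause): that the literal types in the WHERE clause match the partition column types exactly, since an implicit conversion on a filter column can defeat file elimination.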
Is anyone in this forum working with data at this volume, and what exactly has your experience been with partitioned tables and the SQL analytics endpoint?
We see the same behaviour in Power BI, where we are able to connect only via DirectQuery, not Direct Lake, due to size limitations.
Kailash Pareek
Hi @KailashPareek,
Have you tried upgrading the SKU of the capacity? AFAIK, each SKU level has its own resources and limits. Since you are working with a Lakehouse semantic model and the SQL endpoint on a very large number of records, a higher-level SKU should provide more resources for processing them.
What is Power BI Premium? - Power BI | Microsoft Learn
| SKU | Max memory (GB) | Max concurrent DirectQuery connections (per semantic model) | Max DirectQuery parallelism | Live connections (per second) | Max memory per query (GB) | Model refresh parallelism | Direct Lake rows per table (millions) | Max Direct Lake model size on OneLake (GB) |
|-------|-----|-----|----|------|----|-------|--------|-----------|
| F2    | 3   | 5   | 1  | 2    | 1  | 1     | 300    | 10        |
| F4    | 3   | 5   | 1  | 2    | 1  | 2     | 300    | 10        |
| F8    | 3   | 10  | 1  | 3.75 | 1  | 5     | 300    | 10        |
| F16   | 5   | 10  | 1  | 7.5  | 2  | 10    | 300    | 20        |
| F32   | 10  | 10  | 1  | 15   | 5  | 20    | 300    | 40        |
| F64   | 25  | 50  | 8  | 30   | 10 | 40    | 1,500  | Unlimited |
| F128  | 50  | 75  | 12 | 60   | 10 | 80    | 3,000  | Unlimited |
| F256  | 100 | 100 | 16 | 120  | 10 | 160   | 6,000  | Unlimited |
| F512  | 200 | 200 | 20 | 240  | 20 | 320   | 12,000 | Unlimited |
| F1024 | 400 | 200 | 24 | 480  | 40 | 640   | 24,000 | Unlimited |
| F2048 | 400 | 200 |    | 960  | 40 | 1,280 | 24,000 | Unlimited |
Regards,
Xiaoxin Sheng
Hi @KailashPareek, there's a performance guidelines document; hopefully you'll find some insight there:
SQL analytics endpoint performance considerations - Microsoft Fabric | Microsoft Learn
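While digging in, the query insights views on the SQL analytics endpoint can also show where the time is going. A minimal sketch (assuming the queryinsights.exec_requests_history view that Fabric exposes; exact column names may differ in your environment):

```sql
-- List the slowest recent requests against the SQL analytics endpoint.
-- Column names are assumptions based on the documented queryinsights schema.
SELECT TOP 20
    start_time,
    total_elapsed_time_ms,
    command                    -- the statement text that was executed
FROM queryinsights.exec_requests_history
ORDER BY total_elapsed_time_ms DESC;
```

If the partition-filtered query shows up here with hours of elapsed time, that would support the theory that file/partition elimination is not happening at the endpoint.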