
mkj1213
Advocate I

1 Trillion rows table in Fabric and Power BI.

In the article querying-one-trillion-rows-of-data-with-powerbi-and-azure-databricks, Kyle Hale successfully tested combining Azure Databricks with Power BI.

We followed the same steps for a table with around 1 Trillion rows:

  • The table was created in a lakehouse by the following code:
from delta.tables import DeltaTable
DeltaTable.create(spark)\
    .tableName("FACT_TABLE")\
    .addColumn("ID", "STRING")\
    .addColumn("TIC", "STRING")\
    .addColumn("UPDATE_TIMESTAMP", "TIMESTAMP")\
    .addColumn("SRC", "STRING")\
    .addColumn("PRICE", "DOUBLE")\
    .addColumn("QUANTITY", "DOUBLE")\
    .addColumn("TYPE", "STRING")\
    .addColumn("DATE", "DATE", generatedAlwaysAs="CAST(UPDATE_TIMESTAMP AS DATE)")\
    .addColumn("HOUR", "INT", generatedAlwaysAs="HOUR(UPDATE_TIMESTAMP)")\
    .addColumn("MINUTE", "INT", generatedAlwaysAs="MINUTE(UPDATE_TIMESTAMP)")\
    .addColumn("SECOND", "INT", generatedAlwaysAs="SECOND(UPDATE_TIMESTAMP)")\
    .partitionedBy("DATE", "HOUR")\
    .execute()
  • Then the table was populated by writing a PySpark DataFrame from a notebook.
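
The population step above could look like the sketch below. Only the column names come from the create statement; the row values and batch size are made up for illustration, and the generated columns (DATE, HOUR, MINUTE, SECOND) are derived by Delta on write, so they are not supplied here.

```python
from datetime import datetime

def make_batch(n, ts=datetime(2023, 6, 10, 11, 12, 0)):
    """Build n synthetic rows matching the FACT_TABLE schema.

    Synthetic values only; the generated partition columns
    (DATE, HOUR, MINUTE, SECOND) are computed by Delta on write.
    """
    return [
        {
            "ID": str(i),
            "TIC": "ABC",
            "UPDATE_TIMESTAMP": ts,
            "SRC": "feed-1",
            "PRICE": 100.0 + i,
            "QUANTITY": 10.0,
            "TYPE": "BUY",
        }
        for i in range(n)
    ]

rows = make_batch(3)

# In a Fabric notebook the batch would then be appended roughly like:
# spark.createDataFrame(rows).write \
#     .format("delta").mode("append").saveAsTable("FACT_TABLE")
```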

In Power BI we used DirectQuery with dynamic M parameters (Date, Hour, and Minute) to limit the number of records fetched by a query. For one hour we have around 30 million rows, or around 0.5 million rows per minute, when counting with:

df = spark.sql("SELECT * FROM LH.FACT_TABLE WHERE DATE = '2023-06-10' AND HOUR = 11 AND MINUTE = 12")
df.count()

from a notebook. The result is returned very quickly (less than 4 minutes).
When running the same SQL query against the lakehouse's SQL endpoint, the result is never returned, even after waiting several hours, as if the SQL query is not doing any partition pruning.
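
One way to check whether the notebook path is actually pruning is to build the filter with the partition columns (DATE, HOUR) and inspect the plan with EXPLAIN. The helper below is a hypothetical sketch; only the table and column names come from the post.

```python
def pruned_query(table, date, hour, minute=None):
    """Build a SELECT whose WHERE clause filters on the table's
    partition columns (DATE, HOUR) so the engine can prune partitions
    before scanning files."""
    clauses = [f"DATE = '{date}'", f"HOUR = {hour}"]
    if minute is not None:
        # MINUTE is not a partition column; it only filters rows
        # inside the surviving (DATE, HOUR) partitions.
        clauses.append(f"MINUTE = {minute}")
    return f"SELECT * FROM {table} WHERE " + " AND ".join(clauses)

q = pruned_query("LH.FACT_TABLE", "2023-06-10", 11, 12)

# In a Spark notebook the plan can then be inspected with something like:
# spark.sql("EXPLAIN " + q).show(truncate=False)
# and checked for a PartitionFilters entry mentioning DATE and HOUR.
```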

As we had chosen to connect Power BI to the SQL endpoint, we are unable to report on that table.

 

Is there anyone out there dealing with 1 trillion row tables in Power BI? If so, were you able to report on the table, or are you facing the same problem as me?

Thanks

 

2 Replies
Anonymous
Not applicable

Hi @mkj1213 ,

 

Thanks for reaching out to our community.

First, check partition pruning: ensure the SQL endpoint is correctly configured and supports partition pruning, and explicitly filter on the partition columns in your SQL query so it can leverage the partitions.

Also consider using aggregate queries or precomputed views to reduce the amount of data being queried. For example, create daily or hourly aggregate tables and query these aggregates in Power BI.
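
The aggregate suggestion above could be sketched roughly as follows. The target table name and the chosen measures (trade count, total quantity, average price) are assumptions for illustration; only the source table and its columns come from the original post.

```python
def hourly_aggregate_sql(source="LH.FACT_TABLE", target="LH.FACT_HOURLY"):
    """CREATE TABLE AS SELECT that rolls the fact table up to one row
    per (DATE, HOUR, TIC), so Power BI can query the small aggregate
    table instead of the trillion-row fact table."""
    return (
        f"CREATE TABLE {target} AS "
        f"SELECT DATE, HOUR, TIC, "
        f"COUNT(*) AS TRADES, "
        f"SUM(QUANTITY) AS TOTAL_QUANTITY, "
        f"AVG(PRICE) AS AVG_PRICE "
        f"FROM {source} "
        f"GROUP BY DATE, HOUR, TIC"
    )

sql = hourly_aggregate_sql()

# In a notebook this would be run once and then refreshed on a schedule:
# spark.sql(sql)
```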

Hope it helps.

 

Best Regards,

Stephen Tao

 

If this post helps, then please consider accepting it as the solution to help other members find it more quickly.

 

Thanks for your reply, Stephen.

1- How am I supposed to check whether the SQL endpoint is correctly configured and supports partition pruning? We are using the SQL endpoint that Fabric provides when creating a lakehouse.
2- We had already tried querying using only the partition columns, and we still get no data at all.

3- We had already created daily and hourly aggregate tables.

Thanks in advance
