ashcool1006
New Member

Keep history of usage metrics in Power BI and save daily data into Lakehouse table

Hi There,

 

I have created a report from the semantic model "Usage Report Metrics", from which I need the following columns:

"Report Name", "Section Name", "Unique User", "Latest Date", "Total page views"

ashcool1006_0-1760367895471.png

ashcool1006_1-1760367966309.png

I am currently facing challenges in setting up a Lakehouse table within Microsoft Fabric to support my Usage Metrics Report. The objective is to maintain a repository that retains data for over 90 days, with new data appended automatically on a daily basis.

 

I would like to extract data from specific tables, limited to the columns I have identified, and ensure that this data is ingested directly into the Lakehouse table without relying on Excel or CSV files stored in SharePoint.

 

I believe this can be accomplished using a Notebook in Fabric, but I am unsure of the exact implementation steps. I would greatly appreciate any guidance or support from someone familiar with this process.

Thank you in advance for your assistance.

 

Thanks,

Ashish

@Amit chandak  

1 ACCEPTED SOLUTION
MFelix
Super User

Hi @ashcool1006 ,

 

If I understood your question properly, you need to get the data from your report/semantic model into the lakehouse. This can be done with semantic link, which lets you run a DAX query on top of your semantic model and save the result to a lakehouse table. In the same notebook you can also add a cell with a retention policy that deletes everything past that point.

 

You can see the examples here:

https://learn.microsoft.com/en-us/fabric/data-science/read-write-power-bi-python

 

See a very small example below:

CELL 1 - Connect to sempy (semantic link)
# Semantic link (sempy) ships with the Fabric runtime, so it can be imported directly
import sempy.fabric as fabric


CELL 2 - generate your DAX query
# First argument is the semantic model (dataset) name, second is the DAX query
df_dax = fabric.evaluate_dax("DLake Mulitple Tables",
    """
    EVALUATE SUMMARIZECOLUMNS(
        'Calendar'[Year],
        "Total Rows", CALCULATE([Rows (Small Table)]),
        "InsertDate", TODAY())
    """)

CELL 3 - create the table from the DAX result

# Check what columns the DAX query really returned (sempy usually names them
# like Calendar[Year] / [Total Rows] / [InsertDate]) and rename them to
# Lakehouse-friendly names without spaces or brackets:
print(df_dax.columns)
df_dax.columns = ["Year", "Total_Rows", "Insert_Date"]

# Convert pandas -> Spark and cast in one go (no spaces in names)
df_spark = (
    spark.createDataFrame(df_dax)
         .selectExpr(
             "CAST(Year AS INT)          AS Year",
             "CAST(Total_Rows AS BIGINT) AS Total_Rows",
             "CAST(Insert_Date AS DATE)  AS Insert_Date"
         )
)

# Write to the Lakehouse as a Delta table
table_name = "dax_total_rows_by_year"

# If the table does NOT exist yet, create it first to lock the schema:
spark.sql(f"""
CREATE TABLE IF NOT EXISTS {table_name} (
  Year INT,
  Total_Rows BIGINT,
  Insert_Date DATE
) USING delta
""")

(df_spark.write
    .format("delta")
    .mode("append")     # use "overwrite" if you want to replace all rows
    .saveAsTable(table_name)
)

CELL 4 - delete rows older than 90 days (optional retention policy)

table_name = "dax_total_rows_by_year"

spark.sql(f"""
DELETE FROM {table_name}
WHERE Insert_Date < date_sub(current_date(), 90)
""")

 

Be sure to change the schema and columns in CELL 3 to match your model.
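
For instance, adapted to the columns Ashish listed, CELL 3 could look roughly like the sketch below. The rename keys are only assumptions about how the DAX output columns come back (evaluate_dax typically returns names like Table[Column] or [Measure]); check df_dax.columns and adjust both the mapping and the casts to your model:

# Hypothetical adaptation of CELL 3 for the columns in the question.
# First check what evaluate_dax actually returned, then rename to
# Lakehouse-friendly names (no spaces, no brackets):
print(df_dax.columns)

df_dax = df_dax.rename(columns={
    "Reports[Report Name]": "Report_Name",        # assumed output names, adjust as needed
    "Pages[Section Name]": "Section_Name",
    "[Unique User]": "Unique_User",
    "[Latest Date]": "Latest_Date",
    "[Total page views]": "Total_Page_Views",
})

df_spark = (
    spark.createDataFrame(df_dax)
         .selectExpr(
             "CAST(Report_Name AS STRING)      AS Report_Name",
             "CAST(Section_Name AS STRING)     AS Section_Name",
             "CAST(Unique_User AS BIGINT)      AS Unique_User",
             "CAST(Latest_Date AS DATE)        AS Latest_Date",
             "CAST(Total_Page_Views AS BIGINT) AS Total_Page_Views"
         )
)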


Regards

Miguel Félix


Did I answer your question? Mark my post as a solution!

Proud to be a Super User!

Check out my blog: Power BI em Português




3 REPLIES
GrowthNatives
Continued Contributor

Hi @ashcool1006 , this is a common Fabric use case: keeping Power BI usage metrics data beyond 90 days by automating daily snapshots into a Lakehouse table.
Let's break the process into steps:


PART 1 — Prepare Your Environment

Step 1: Prerequisites

Make sure you have:

  • A workspace on Microsoft Fabric capacity

  • Permission to access the dataset
    👉 “Usage Report Metrics” (or “Power BI Premium Usage Metrics Model”); you can confirm the exact name with the snippet after this list

  • A Lakehouse in that workspace

  • Notebook access in Fabric
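
If you are not sure which semantic model name the notebook can actually see, semantic link can list the models in the workspace. A minimal sketch (sempy ships with the Fabric runtime):

# List the semantic models visible from this notebook's workspace so you can
# copy the exact dataset name into the cells below.
import sempy.fabric as fabric

display(fabric.list_datasets())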

Step 2: Create a Lakehouse

  1. Go to your Fabric workspace → New → Lakehouse

  2. Name it something like UsageMetrics_Lakehouse

  3. Once created, note:

    • It has Tables (for Delta tables)

    • It has SQL endpoint (for Power BI connection later)


PART 2 — Create the Fabric Notebook

Step 3: Create a New Notebook

In your workspace:

  • Click + New → Notebook

  • Attach it to your UsageMetrics_Lakehouse

  • Select Language = PySpark

Step 4: Paste and Run the Full Notebook Script

Below is your complete script (all steps merged, with deduplication and 90-day cleanup).

Paste each block into a Notebook cell sequentially 👇


 Cell 1 — Import & Start Log

from pyspark.sql import functions as F
from datetime import datetime

print(f"Notebook run started at: {datetime.now()}")


Cell 2 — Load the Power BI Dataset

# Read the semantic model with semantic link (sempy) and convert to Spark.
# Replace the dataset and table names with your own; the table name below is
# only a placeholder (run fabric.list_tables("<dataset name>") to see what
# your model exposes).
import sempy.fabric as fabric

df_usage_pd = fabric.read_table("Usage Report Metrics", "Report views")  # placeholder names
df_usage = spark.createDataFrame(df_usage_pd)

print("Successfully loaded data from the Power BI semantic model.")


Cell 3 — Select Required Columns

# Alias the columns to names without spaces so they can be stored in a Delta table
selected_df = df_usage.select(
    F.col("Report Name").alias("Report_Name"),
    F.col("Section Name").alias("Section_Name"),
    F.col("Unique User").alias("Unique_User"),
    F.col("Latest Date").alias("Latest_Date"),
    F.col("Total Page Views").alias("Total_Page_Views")
)

print("Selected required columns.")


Cell 4 — Add Snapshot Date

snapshot_df = selected_df.withColumn("SnapshotDate", F.current_date())

print("Added snapshot date column.")


Cell 5 — Deduplicate (Prevent Duplicates)

# Keys must match the aliased column names from Cell 3
dedup_df = snapshot_df.dropDuplicates(
    ["Report_Name", "Section_Name", "Latest_Date", "SnapshotDate"]
)

print(f"Deduplicated data. Row count: {dedup_df.count()}")


Cell 6 — Append or Merge to Lakehouse Delta Table

This will append data daily and ensure older duplicates are replaced:

from delta.tables import DeltaTable

table_name = "UsageMetricsHistory"

# Check if the table already exists
if spark.catalog.tableExists(table_name):
    deltaTable = DeltaTable.forName(spark, table_name)
    (
        deltaTable.alias("t")
        .merge(
            dedup_df.alias("s"),
            "t.[Report Name] = s.[Report Name] AND \
             t.[Section Name] = s.[Section Name] AND \
             t.[Latest Date] = s.[Latest Date] AND \
             t.[SnapshotDate] = s.[SnapshotDate]"
        )
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )
    print("🔁 Existing table found — data merged successfully (UPSERT).")
else:
    dedup_df.write.format("delta").mode("overwrite").saveAsTable(table_name)
    print("🆕 Table created and data loaded successfully.")


Cell 7 — Delete Data Older Than 90 Days (optional retention step)

spark.sql("""
DELETE FROM UsageMetricsHistory
WHERE SnapshotDate < date_sub(current_date(), 90)
""")

print("🧹 Cleaned data older than 90 days.")


Cell 8 — Verification

display(spark.sql("SELECT COUNT(*) as total_rows, MAX(SnapshotDate) as latest_load FROM UsageMetricsHistory"))

Step 5: Schedule Notebook to Run Daily

  1. Click the clock icon (Schedule) on top of the Notebook.

  2. Set:

    • Recurrence: Daily

    • Time: e.g. 2:00 AM

    • Attached Lakehouse: UsageMetrics_Lakehouse

  3. Save.

Now this notebook will automatically:

  • Pull latest Usage Metrics

  • Append to UsageMetricsHistory Delta table

  • Remove data older than 90 days
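
After a few scheduled runs, you can confirm that a new snapshot lands each day with a quick check from any notebook attached to the Lakehouse (table and column names follow the cells above):

# Row count per snapshot date; expect one new group per scheduled run
display(spark.sql("""
    SELECT SnapshotDate, COUNT(*) AS rows_loaded
    FROM UsageMetricsHistory
    GROUP BY SnapshotDate
    ORDER BY SnapshotDate DESC
"""))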


PART 3 — Connect in Power BI Desktop

Step 6: Connect Power BI Desktop to Fabric Lakehouse

In Power BI Desktop:

  1. Go to Home → Get Data → OneLake Data Hub

  2. Find your workspace → select your Lakehouse

  3. Click SQL Endpoint

  4. Choose the table UsageMetricsHistory

You can also use DirectQuery or Import mode.


Step 7: Create a Simple Model & Report

Now you have daily historical usage data!

Hope this solution helps you make the most of Power BI! If it did, click 'Mark as Solution' to help others find the right answers.
💡Found it helpful? Show some love with kudos 👍 as your support keeps our community thriving!
🚀Let’s keep building smarter, data-driven solutions together!🚀 [Explore More]

Hi @GrowthNatives ,

Thank you so much for your solution, and I appreciate your time.

I am getting stuck on Part 2, where I need to load the data from the dataset into the notebook.

I need the Usage Metrics dataset; however, I am not able to see anything in my model.

 

Below are the report and dataset names; even when I replaced "Usage Metrics Report" with another dataset, I got the error below:

ashcool1006_3-1760431207975.png

ashcool1006_0-1760432567597.png

ashcool1006_2-1760431042593.png

 

Could you please help me solve this error?

Also, I do not want to delete any older data; it should all remain in the lakehouse, with new data appended daily.

Could you please help me create that kind of notebook?

 

Thanks,

Ashish

Helpful resources

Announcements
FabCon Global Hackathon Carousel

FabCon Global Hackathon

Join the Fabric FabCon Global Hackathon—running virtually through Nov 3. Open to all skill levels. $10,000 in prizes!

October Power BI Update Carousel

Power BI Monthly Update - October 2025

Check out the October 2025 Power BI update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.