Hi There,
I have created a report from the semantic model "Usage Report Metrics", out of which I need the below columns:
"Report Name", "Section Name", "Unique User", "Latest Date", "Total page views"
I am currently facing challenges in setting up a Lakehouse table within Microsoft Fabric to support my Usage Metrics Report. The objective is to maintain a repository that retains data for over 90 days, with new data appended automatically on a daily basis.
I would like to extract data from specific tables, limited to the columns I have identified, and ensure that this data is ingested directly into the Lakehouse table without relying on Excel or CSV files stored in SharePoint.
I believe this can be accomplished using a Notebook in Fabric, but I am unsure of the exact implementation steps. I would greatly appreciate any guidance or support from someone familiar with this process.
Thank you in advance for your assistance.
Thanks,
Ashish
@Amit chandak
Hi @ashcool1006 ,
If I understood your question properly, you need to get the data from your report/semantic model into the Lakehouse. This can be done with semantic link (sempy), which lets you run a DAX query on top of your semantic model and save the result to a Lakehouse table. You can also add a cell to that same notebook with the retention policy that deletes everything past that point.
You can see the examples here:
https://learn.microsoft.com/en-us/fabric/data-science/read-write-power-bi-python
See a very small example below:
CELL 1 - Connect to sempy
# sempy (semantic link) is preinstalled in recent Fabric Spark runtimes
import sempy.fabric as fabric
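If the import fails because sempy is not available in your environment, it can usually be installed in the notebook first (a hedged note: recent Fabric Spark runtimes ship it preinstalled, and the PyPI package name is semantic-link):
%pip install semantic-link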
CELL 2 - generate your dax query
df_dax = fabric.evaluate_dax("DLake Mulitple Tables",
"""
EVALUATE SUMMARIZECOLUMNS(
'Calendar'[Year],
"Total Rows",
CALCULATE([Rows (Small Table)]),
"InsertDate", TODAY())
""")
CELL 3 - create the table with the dax
# Check what columns you really have:
# print(df_dax.columns)

# evaluate_dax returns DAX-style column names (typically "Calendar[Year]", "[Total Rows]", "[InsertDate]");
# rename them to Spark-friendly names first - adjust this mapping if the printed names differ
df_dax = df_dax.rename(columns={
    "Calendar[Year]": "Year",
    "[Total Rows]": "Total_Rows",
    "[InsertDate]": "Insert_Date",
})

# Convert pandas -> Spark and cast in one go (no spaces in names)
df_spark = (
    spark.createDataFrame(df_dax)
    .selectExpr(
        "CAST(Year AS INT) AS Year",
        "CAST(Total_Rows AS BIGINT) AS Total_Rows",
        "CAST(Insert_Date AS DATE) AS Insert_Date"
    )
)
# Write to Lakehouse (append requires the table to exist with the same schema)
table_name = "dax_total_rows_by_year"
# If the table does NOT exist yet, create it first to lock the schema:
spark.sql(f"""
    CREATE TABLE IF NOT EXISTS {table_name} (
        Year INT,
        Total_Rows BIGINT,
        Insert_Date DATE
    ) USING delta
""")

(df_spark.write
    .format("delta")
    .mode("append")   # use "overwrite" if you want to replace all rows
    .saveAsTable(table_name)
)
CELL 4 - Delete rows older than 90 days
table_name = "dax_total_rows_by_year"

spark.sql(f"""
    DELETE FROM {table_name}
    WHERE Insert_Date < date_sub(current_date(), 90)
""")
Be sure to change the schema and column names in cell 3 to match your model.
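For instance, a hypothetical adaptation of cell 2 for the columns requested in the original post could look like the sketch below; every dataset, table, column, and measure name in it is a placeholder, so check your Usage Metrics semantic model for the real ones:
# Hypothetical sketch - placeholder dataset, table, column, and measure names
df_dax = fabric.evaluate_dax(
    "Usage Report Metrics",   # the semantic model name as it appears in your workspace
    """
    EVALUATE SUMMARIZECOLUMNS(
        'Reports'[Report Name],
        'Report pages'[Section Name],
        "Unique User", [Unique User],
        "Latest Date", [Latest Date],
        "Total page views", [Total page views],
        "InsertDate", TODAY())
    """)
The rename and cast step in cell 3 would then need to map these columns instead.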
Regards
Miguel Félix
Proud to be a Super User!
Check out my blog: Power BI em Português
Hi @ashcool1006 ,
This is a common Fabric use case: keeping Power BI Usage Metrics data persistently beyond 90 days by automating daily snapshots into a Lakehouse table.
Let's break the process into steps:
Make sure you have:
A Microsoft Fabric-capable workspace
Permission to access the dataset
👉 “Usage Report Metrics” (or “Power BI Premium Usage Metrics Model”)
A Lakehouse in that workspace
Notebook access in Fabric
Go to your Fabric workspace → New → Lakehouse
Name it something like UsageMetrics_Lakehouse
Once created, note:
It has Tables (for Delta tables)
It has SQL endpoint (for Power BI connection later)
In your workspace:
Click + New → Notebook
Attach it to your UsageMetrics_Lakehouse
Select Language = PySpark
Below is your complete script (all steps merged, with deduplication and 90-day cleanup).
Paste each block into a Notebook cell sequentially 👇
from pyspark.sql import functions as F
from datetime import datetime

print(f"Notebook run started at: {datetime.now()}")
# Replace with your actual dataset name
df_usage = spark.read.format("powerbi") \
.option("dataset", "Usage Report Metrics") \
.load()
print("✅ Successfully loaded data from Power BI semantic model.")
selected_df = df_usage.select(
    F.col("Report Name"),
    F.col("Section Name"),
    F.col("Unique User"),
    F.col("Latest Date"),
    F.col("Total Page Views")
)
print("✅ Selected required columns.")
snapshot_df = selected_df.withColumn("SnapshotDate", F.current_date())
print("✅ Added snapshot date column.")
dedup_df = snapshot_df.dropDuplicates(
    ["Report Name", "Section Name", "Latest Date", "SnapshotDate"]
)
print(f"✅ Deduplicated data. Row count: {dedup_df.count()}")
This will append data daily and ensure older duplicates are replaced:
from delta.tables import DeltaTable

table_name = "UsageMetricsHistory"

# Check if the table already exists
if spark.catalog.tableExists(table_name):
    deltaTable = DeltaTable.forName(spark, table_name)
    (
        deltaTable.alias("t")
        .merge(
            dedup_df.alias("s"),
            """t.`Report Name` = s.`Report Name`
               AND t.`Section Name` = s.`Section Name`
               AND t.`Latest Date` = s.`Latest Date`
               AND t.SnapshotDate = s.SnapshotDate"""
        )
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )
    print("🔁 Existing table found - data merged successfully (UPSERT).")
else:
    dedup_df.write.format("delta").mode("overwrite").saveAsTable(table_name)
    print("🆕 Table created and data loaded successfully.")
spark.sql("""
    DELETE FROM UsageMetricsHistory
    WHERE SnapshotDate < date_sub(current_date(), 90)
""")
print("🧹 Cleaned data older than 90 days.")
display(spark.sql("SELECT COUNT(*) as total_rows, MAX(SnapshotDate) as latest_load FROM UsageMetricsHistory"))
Click the clock icon (Schedule) on top of the Notebook.
Set:
Recurrence: Daily
Time: e.g. 2:00 AM
Attached Lakehouse: UsageMetrics_Lakehouse
Save.
Now this notebook will automatically:
Pull latest Usage Metrics
Append to UsageMetricsHistory Delta table
Remove data older than 90 days
In Power BI Desktop:
Go to Home → Get Data → OneLake Data Hub
Find your workspace → select your Lakehouse
Click SQL Endpoint
Choose the table UsageMetricsHistory
You can also use DirectQuery or Import mode.
Now you have daily historical usage data!
Hi @GrowthNatives ,
Thank you so much for your brief solution, and I appreciate your time.
I am getting stuck on Part 2, where I need to load the data from the dataset into the notebook.
I need the Usage Metrics dataset; however, I am not able to see anything in my model.
Below are the report and dataset names. When I replaced "Usage Metrics Report" with another dataset as well, I got the below error:
Could you please help me in solving this error?
Also, I do not want to delete any older data; it should remain in the Lakehouse, and new data should be appended daily.
Could you please help me in creating that kind of notebook?
Thanks,
Ashish