
zenisekd

Snapshot data from Power BI Report automatically and feed them back with Fabric

Foreword

For some time I was dreaming of automatically saving calculated data from my report on a regular basis and using it to show the historical trend. My use case was that every quarter, commissions are paid to employees based on their performance. The problem is that the numbers may change over time as some confirmed orders get canceled: at the time of the compensation calculation the sale is included, while a week later, when the sale is canceled, it is excluded. This can change the compensation amount and cause discussions about why the number the commission was based on has changed. That is why we wanted to snapshot the amounts, so we could reliably say what amount was paid out to the employees over time.

Originally such snapshots could be done either with Power Automate or a third-party app. Nowadays it can also be achieved with notebooks in Microsoft Fabric. While searching for a tutorial, I could not find one that worked for me right away, so I decided to write this blog post and summarize what I did so you can set it up yourself quickly.

 

Requirements

  • Power BI Pro license or higher
  • Power BI report published to service
  • Fabric capacity (because we will be using notebooks)

Steps in short

  • Create a Notebook
  • Add the semantic-link library to the environment
  • Add a Lakehouse
  • Use semantic link
  • Select your dataset
  • With DAX, specify the data you want to snapshot
  • Save the data to a Lakehouse table
  • Schedule a refresh
  • Connect to the Lakehouse

More detailed steps (that worked for me)

1. Create a Notebook. Fairly simple, just click on "new item" and select "Notebook".

 

2. Set up the environment. In order to work with semantic link, you need to install the semantic-link library in the environment, either at the workspace level or at the notebook level.

[Screenshot: adding the semantic-link library to the environment]

When you start typing the name "semantic-link", it doesn't show up in the autocomplete helper. Don't worry about it; just type the full library name and it will stick.
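If you prefer the notebook-level option, the library can also be installed inline; a minimal sketch (this installs semantic-link only for the current session):

# Install the semantic-link library for the current notebook session only.
# Not needed if the library is already added to the workspace/notebook environment.
%pip install semantic-link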

 

3. Add a lakehouse to the notebook. Either use an existing one or create a new one.

[Screenshot: adding a lakehouse to the notebook]

 

4. Use semantic link. Next, we import sempy (the semantic link library) and pandas, and initialize a Spark session.

 

 

# Uncomment the next line if semantic-link is not installed in the environment:
# %pip install semantic-link
import sempy.fabric as fabric
import pandas as pd
from pyspark.sql import SparkSession

# Initialize a Spark session
spark = SparkSession.builder.getOrCreate()

 

 

5. Access the dataset you want to use for the snapshot through code.

 

 

mydataset = "Management Report"  # name of the semantic model (dataset) published to the workspace
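If you are not sure about the exact name, you can first list the datasets semantic link can see; a small sketch (assumes the notebook runs in the same workspace as the report):

# List the semantic models (datasets) visible from the current workspace,
# so you can copy the exact name into mydataset.
fabric.list_datasets()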

 

 

6. Write DAX to retrieve the data you want to snapshot.
In my case, I wanted to do a quarterly snapshot that would save the employee, their compensation (commission), quarter, year, and a timestamp of when the snapshot was taken.
Next, I saved the DAX result to a DataFrame and renamed the columns to a more suitable format.

 

 

myDax = """
EVALUATE
	ADDCOLUMNS(
		SUMMARIZECOLUMNS(
			'Compensation User-Role'[User],
			'Compensation Roles'[Category],
			FILTER(
				'Date table',
				'Date table'[Qarter numbr] = QUARTER(TODAY() - 30) && 'Date table'[Year] = YEAR(TODAY() - 30)
			),
			"Compensation", [Total Compensation (All Curr)]
		),
		"Year", YEAR(TODAY() - 30),
		"Quarter", "Q" & QUARTER(TODAY() - 30),
		"Time Stamp", NOW() + TIME(1,0,0)
	)
"""    
# Run the DAX query against the semantic model
result = fabric.evaluate_dax(mydataset, myDax)

# Convert the DAX query result to a DataFrame
df = pd.DataFrame(result)

# Rename columns if needed
df.columns = ['User', 'Category', 'Compensation', 'Year', 'Quarter', 'TimeStamp']
df

 

For some reason, I had trouble using the %% cell magic, so I used a triple-quoted string instead. I also recommend writing and testing the DAX query in Power BI first, to see if it works and how it behaves, and only then copy-pasting it into the notebook.
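Depending on how the result comes back, you may also want to give the columns explicit dtypes before the Spark conversion in the next step, because object-typed columns can make spark.createDataFrame complain. A minimal sketch, assuming the column names from the snippet above:

# Give numeric and datetime columns explicit dtypes before handing the frame to Spark.
df["Compensation"] = pd.to_numeric(df["Compensation"], errors="coerce")
df["Year"] = df["Year"].astype(int)
df["TimeStamp"] = pd.to_datetime(df["TimeStamp"])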

 

7. Save the DataFrame to the lakehouse. I used append mode, so the snapshots stack on top of each other.

 

 

# Append the snapshot to a Delta table in the attached lakehouse
spark.createDataFrame(df).write.mode("append").format("delta").saveAsTable("Appache_spark_test.Compensations_Snapshot_test")

 

 

Appache_spark_test is the name of the lakehouse.

Compensations_Snapshot_test is the name of the table under which you want the data to be saved (there is no need to create the table beforehand).
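To check that the append worked, you can read the table back in the same notebook; a quick sketch, assuming the lakehouse and table names used above:

# Read the snapshot table back and show how many rows each quarterly snapshot added.
snapshot = spark.table("Appache_spark_test.Compensations_Snapshot_test")
snapshot.groupBy("Year", "Quarter").count().orderBy("Year", "Quarter").show()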
 

8. Set a schedule in the notebook.

You can use the native schedule in a notebook.

[Screenshot: the notebook's built-in schedule settings]

Or use a pipeline for a more complex refresh process, e.g. if you want to take the snapshot monthly or quarterly and only under certain conditions (a simple notebook-level alternative is sketched below).
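If you stay with the native schedule, one alternative to a pipeline is a guard cell at the top of the notebook that skips the run outside the intended period; a sketch, assuming a monthly schedule and a quarterly snapshot (not part of my original setup):

from datetime import date

# Guard: only take the snapshot in the first month after a quarter ends
# (assumes the notebook is scheduled to run monthly; adjust the months to your cadence).
if date.today().month not in (1, 4, 7, 10):
    # mssparkutils is pre-loaded in Fabric notebooks; exit() stops the run early.
    mssparkutils.notebook.exit("Not a snapshot month - skipping.")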

 

9. Use the Lakehouse as a source for your report. In my case, I connected to the SQL endpoint. 

[Screenshot: connecting to the lakehouse SQL endpoint from Power BI]

Final Word

And that is it. This worked for me. Let me know if you have any improvements or comments. It took me a while to get it working and it certainly is not perfect 🙂

Sources:

https://learn.microsoft.com/en-us/fabric/data-science/semantic-link-power-bi?tabs=python
https://medium.com/@sdaelemans/store-your-data-in-power-bi-to-onelake-with-semantic-link-in-microsof...
https://www.youtube.com/watch?v=zMiRGZsfQgs&t=2s&ab_channel=GuyinaCube
https://www.youtube.com/watch?v=VAHOl9KqSsk&ab_channel=HoosierBI 

https://community.fabric.microsoft.com/t5/Service/Snapshot-data-from-Power-BI-Report-to-Lakehouse/m-...