

pmscorca
Post Patron

Supporting a snapshot dimension for a big data scenario

Hi,

I'd like to know how Fabric supports a snapshot dimension for a big data scenario.

Thanks

4 REPLIES
v-cboorla-msft
Community Support

Hi @pmscorca 

 

Thanks for using Microsoft Fabric Community.

Microsoft Fabric doesn't directly handle snapshot dimensions for big data scenarios in the same way a traditional data warehouse might. However, Fabric offers functionalities that can handle large datasets with techniques for slowly changing dimensions (SCDs).

For additional information, please refer to the following links:

Link1 : Implementing SCD Type 2 with Microsoft Fabric — PySpark / DataflowGen2

Link2 : How to create Slowly Changing Dimension (SCD)| Create SCD in Microsoft Fabric (youtube.com)

Link3 : How to Implement SCD Type 2 Using Delta Table (iterationinsights.com)

 

I hope this information helps. Please do let us know if you have any further queries.

 

Thank you.

Hi,

To implement an SCD2 in ADF/Synapse Analytics, I used T-SQL or a mapping data flow. In Fabric, I need to use a Spark notebook or a Dataflow Gen2.

I hope a template exists for creating an SCD2 with a Spark notebook; moreover, with Dataflow Gen2 I cannot iterate over multiple dimensions.

Now, by partitioning by date, by month and year, or by year, is it possible to implement a snapshot dimension?

Hi @pmscorca 

 

Apologies for the delay in response.

Yes, Fabric offers two main approaches for implementing Slowly Changing Dimension Type 2 (SCD2), and each has its considerations as follows:

1. Spark Notebook:

Pros:

  • Provides more flexibility and control for complex SCD2 logic, especially with large datasets.
  • Allows for easier iteration and development compared to Dataflow Gen2.
  • You can leverage libraries like PySpark for efficient data manipulation.

Cons:

  • Requires knowledge of PySpark or Scala for development.
  • Might have a steeper learning curve for those unfamiliar with Spark notebooks.
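To make the SCD2 logic a notebook would have to implement more concrete, here is a minimal sketch in plain Python (standard library only, not the Spark or Delta API). The names DimRow and apply_scd2, the column names, and the sample data are all hypothetical and chosen only for illustration; in a Fabric notebook the same steps would typically be expressed against a Delta table instead of an in-memory list.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension member (hypothetical layout)."""
    business_key: str
    attributes: Tuple          # tracked attribute values, e.g. (city,)
    valid_from: date
    valid_to: Optional[date]   # None means the version is still open
    is_current: bool


def apply_scd2(dimension: List[DimRow],
               incoming: Dict[str, Tuple],
               load_date: date) -> List[DimRow]:
    """Return a new dimension after applying SCD Type 2 logic.

    `incoming` maps business_key -> attribute tuple from the source extract.
    """
    result = []
    seen = set()
    for row in dimension:
        new_attrs = incoming.get(row.business_key)
        if row.is_current and new_attrs is not None:
            seen.add(row.business_key)
            if new_attrs != row.attributes:
                # Close the old version and open a new current one.
                result.append(replace(row, valid_to=load_date, is_current=False))
                result.append(DimRow(row.business_key, new_attrs,
                                     load_date, None, True))
                continue
        # Unchanged or historical rows pass through untouched.
        result.append(row)
    # Keys with no current row in the dimension are inserted as new members.
    for key, attrs in incoming.items():
        if key not in seen:
            result.append(DimRow(key, attrs, load_date, None, True))
    return result


# Illustrative data only.
dim = [DimRow("C1", ("Rome",), date(2024, 1, 1), None, True)]
updated = apply_scd2(dim, {"C1": ("Milan",), "C2": ("Turin",)}, date(2024, 6, 1))
for r in updated:
    print(r)
```

Running the same extract a second time leaves the dimension unchanged, which is the behaviour you would normally want from a repeatable load. Deletions in the source are not handled in this sketch.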

Templates:

2. Dataflow Gen2:

Pros:

  • Offers a user-friendly, visual interface for building data pipelines.
  • Ideal for those comfortable with a GUI-based approach.
  • Integrates well with Power BI for data visualization.

Cons:

  • Currently lacks the ability to iterate over source files with parametrized file names. This can be a limitation for processing multiple dimension tables. You can find a feature request for this functionality here: (mention the specific link for the feature request).
  • Might be less flexible for complex SCD2 logic compared to Spark notebooks.

Implementing Snapshot Dimensions with Partitioning:

Yes, partitioning by date (or a combination of year and month) is a viable approach to achieve a similar effect to snapshot dimensions in Fabric.

  • Create a partitioned table for your dimension data.
  • Each partition represents a specific time period (e.g., daily, monthly, yearly).
  • When a dimension record changes, instead of updating the existing record, insert a new record with the updated information into the corresponding partition for the current date.
  • This way, you maintain a historical record of dimension states along with the actual data.
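The steps above can be sketched in plain Python (standard library only). This is only an illustration of the append-per-partition idea, under the assumption of monthly partitions keyed by (year, month); the function names, the row layout, and the sample data are hypothetical. In a Lakehouse the partitions would presumably be a Delta table partitioned by year and month columns rather than a dictionary.

```python
from collections import defaultdict
from datetime import date


def partition_key(snapshot_date):
    """Monthly partitioning: one partition per (year, month)."""
    return (snapshot_date.year, snapshot_date.month)


def append_changes(partitions, changed_rows, snapshot_date):
    """Append changed dimension rows to the partition for snapshot_date.

    Existing rows are never updated; each change becomes a new row.
    """
    key = partition_key(snapshot_date)
    for row in changed_rows:
        partitions[key].append({**row, "snapshot_date": snapshot_date})


def state_as_of(partitions, business_key, as_of):
    """Return the latest version of business_key recorded on or before as_of."""
    latest = None
    for key in sorted(partitions):
        if key > partition_key(as_of):
            break  # later partitions cannot hold rows dated on or before as_of
        for row in partitions[key]:
            if row["business_key"] == business_key and row["snapshot_date"] <= as_of:
                if latest is None or row["snapshot_date"] >= latest["snapshot_date"]:
                    latest = row
    return latest


# Illustrative data only.
partitions = defaultdict(list)
append_changes(partitions, [{"business_key": "C1", "city": "Rome"}], date(2024, 1, 15))
append_changes(partitions, [{"business_key": "C1", "city": "Milan"}], date(2024, 3, 2))
print(state_as_of(partitions, "C1", date(2024, 2, 28)))
```

The early exit in state_as_of mirrors the benefit of partitioning: a query for a given date only needs to read partitions up to that period.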

For additional information, please refer to: Efficient Data Partitioning with Microsoft Fabric: Best Practices and Implementation Guide - Microso...

 

I hope this information helps.

 

Thank you.

Hi @pmscorca 

 

We haven't heard from you since the last response and are just checking back to see whether you have a resolution yet. If you do, please share it with the community, as it can be helpful to others.
If you have any questions relating to the current thread, please let us know and we will try our best to help you.
If you have a question on a different issue, we request that you open a new thread.

 

Thank you.
