Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more. Get started

Reply
relitownsend
Regular Visitor

Creating a slowly changing dimension table off of noncontiguous data with Power Query

Hello all,

 

Was hoping I could get someone to point me in the right direction - I'm trying to develop some M that, given a couple dozen excel files (each containing a snapshot of the list of employees for a given month), will be able to build a slowly changing dimension employees table.

 

I have two problems I'm trying to wrap my head around with generating correct Effective Dates for these Type 2 records.

  • Noncontigous data
    • Essentially I have a pretty good history of recent versions, but older versions are sparse, as you can see by the months of my snapshots below. My current expectation is that I'll duplicate snapshots to fill in the gaps, but I'm hoping there's a smarter way to do it.
    • relitownsend_0-1600785912842.png

       

  • Overlapping Effective Dates for Versions
    • This is the tougher one - I can't use 'Group By' with a MIN/MAX on the snapshot date to determine the record version's Effective Date, because I can't guarantee that the effective date periods of record versions don't overlap, as shown below.
    • relitownsend_1-1600786520788.png

I've done googling, but everything I've found so far is about implementing SCDs in a Power BI data model, not about how we create the SCD table when there are problems like this. Anyone else see this?

 

Edit: This would be the goal

relitownsend_0-1600787291311.png

 

1 ACCEPTED SOLUTION
relitownsend
Regular Visitor

Following up - I was able to find a partial solution - the GroupKind.Local parameter for the Group By function, along with a couple sorts, solved for the Overlapping Effective Dates.

 

https://blog.crossjoin.co.uk/2014/01/03/aggregating-by-local-groups-in-power-query/

 

Still working on how to extend the max/min dates to cover across any gaps in the data, but I can ways just copy data for missing months to fill in gaps if that solution proves illusive.

View solution in original post

2 REPLIES 2
relitownsend
Regular Visitor

Following up - I was able to find a partial solution - the GroupKind.Local parameter for the Group By function, along with a couple sorts, solved for the Overlapping Effective Dates.

 

https://blog.crossjoin.co.uk/2014/01/03/aggregating-by-local-groups-in-power-query/

 

Still working on how to extend the max/min dates to cover across any gaps in the data, but I can ways just copy data for missing months to fill in gaps if that solution proves illusive.

@relitownsend 

 

As regards the gaps in you dates, I suggest you create date table (with continuous dates ranging from the min to max date in the dataset) as a dimension instead. That way you don't need to fill in the gaps; just use the Date Table fields in measures, visuals, filters, slicers...It is a best practice anyway

 

To create a Date Table, use something along the lines of this M-code (just change the expression 'Table1[Date]' in the first two lines of the code to your 'fact table [date field]'):

 

 

 

let
    MinDataDate = List.Min(Table1[Date]),
    MaxDataDate = List.Max(Table1[Date]),
    #"MaxDataDate1" = Date.AddDays(MaxDataDate, 1),
    DayCount = Duration.Days(Duration.From(MaxDataDate1 - MinDataDate)), 
    Source = List.Dates(MinDataDate,DayCount,#duration(1,0,0,0)),
    TableFromList = Table.FromList(Source, Splitter.SplitByNothing()),
    ChangedType = Table.TransformColumnTypes(TableFromList,{{"Column1", type date}}),
    RenamedColumns = Table.RenameColumns(ChangedType,{{"Column1", "Date"}}),
    #"Inserted Year" = Table.AddColumn(RenamedColumns, "Year", each Date.Year([Date]), 
    Int64.Type),
    #"Inserted Month" = Table.AddColumn(#"Inserted Year", "Month", each Date.Month([Date]), 
    Int64.Type),
    #"Inserted Month Name" = Table.AddColumn(#"Inserted Month", "Month Name", each 
    Date.MonthName([Date], "en-US"), type text), 
    #"Inserted Quarter" = Table.AddColumn(#"Inserted Month Name", "Quarter", each 
    Date.QuarterOfYear([Date]), Int64.Type),
    #"Inserted Day of Week" = Table.AddColumn(#"Inserted Quarter", "Day of Week", each 
    Date.DayOfWeek([Date],1), Int64.Type),
    #"Inserted Day Name" = Table.AddColumn(#"Inserted Day of Week", "Day Name", each 
    Date.DayOfWeekName([Date], "en-US"), type text), 
    #"Inserted Week of Year" = Table.AddColumn(#"Inserted Day Name", "Week of Year", each 
    Date.WeekOfYear([Date]), Int64.Type),
    #"Sorted Rows" = Table.Sort(#"Inserted Week of Year",{{"Date", Order.Descending}}),
    #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1),
    #"Renamed Columns" = Table.RenameColumns(#"Added Index",{{"Index", "Days from Today"}}),
    #"Added Custom" = Table.AddColumn(#"Renamed Columns", "Custom", each [Year]*100 + 
    [Month]),
    #"Renamed Columns1" = Table.RenameColumns(#"Added Custom",{{"Custom", "YearMonth"}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns1",{{"YearMonth", 
    Int64.Type}})
    
in
    #"Changed Type"

 

 

 

 





Did I answer your question? Mark my post as a solution!
In doing so, you are also helping me. Thank you!

Proud to be a Super User!
Paul on Linkedin.






Helpful resources

Announcements
Europe Fabric Conference

Europe’s largest Microsoft Fabric Community Conference

Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.

Power BI Carousel June 2024

Power BI Monthly Update - June 2024

Check out the June 2024 Power BI update to learn about new features.

RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.