cancel
Showing results for 
Search instead for 
Did you mean: 

Fabric is Generally Available. Browse Fabric Presentations. Work towards your Fabric certification with the Cloud Skills Challenge.

Reply
Gryph87
New Member

Max Date Values

I am trying to access the records for the most current date on a very large table.

Below is a example of the data I am working with.  The "System Number" and "Threshold Type" can have multiple "Date From"  I need the most current.

New to Power BI , any help would be greatly appreciated

 

Date FromGGGZThreshold TypeTarget Avg FactorTarget Min FactorSystem DescriptionSystem Number
2020-10-01ABCUNITYLEVEL 10.2580.241UNITY MAXABC-1234
2020-10-01ABCUNITYLEVEL 20.2580.241UNITY MAXABC-1234
2020-10-01ABCUNITYLEVEL 30.2990.288UNITY MAXABC-1234
2022-10-01ABCUNITYLEVEL 10.28620.2673UNITY MAXABC-1234
2022-10-01ABCUNITYLEVEL 30.33160.3194UNITY MAXABC-1234
2022-10-01ABCUNITYLEVEL 40.33160.3194UNITY MAXABC-1234
1 ACCEPTED SOLUTION
ronrsnfld
Resident Rockstar
Resident Rockstar

OK, all you need to do is Group By System Number and Threshold Type, then filter each sub table in the Table.Group aggregation:

 

 

let

//change next line to reflect your actual data source
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],

    #"Changed Type" = Table.TransformColumnTypes(Source,{
        {"Date From", type date}, {"GG", type text}, {"GZ", type text}, {"Threshold Type", type text}, 
        {"Target Avg Factor", type number}, {"Target Min Factor", type number}, 
        {"System Description", type text}, {"System Number", type text}}),
        
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Threshold Type", "System Number"}, {
    
        {"Latest", (t)=>Table.SelectRows(t, each [Date From] = List.Max(t[Date From])), 
        
        type table [Date From=nullable date, GG=nullable text, GZ=nullable text, Threshold Type=nullable text, Target Avg Factor=nullable number, Target Min Factor=nullable number, System Description=nullable text, System Number=nullable text]}}),
    
    #"Expanded Latest" = Table.ExpandTableColumn(#"Grouped Rows", "Latest", {"Date From", "GG", "GZ", "Target Avg Factor", "Target Min Factor", "System Description"})
in
    #"Expanded Latest"

 

Results from your data

ronrsnfld_0-1673393656064.png

 

 

View solution in original post

11 REPLIES 11
ronrsnfld
Resident Rockstar
Resident Rockstar

OK, all you need to do is Group By System Number and Threshold Type, then filter each sub table in the Table.Group aggregation:

 

 

let

//change next line to reflect your actual data source
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],

    #"Changed Type" = Table.TransformColumnTypes(Source,{
        {"Date From", type date}, {"GG", type text}, {"GZ", type text}, {"Threshold Type", type text}, 
        {"Target Avg Factor", type number}, {"Target Min Factor", type number}, 
        {"System Description", type text}, {"System Number", type text}}),
        
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Threshold Type", "System Number"}, {
    
        {"Latest", (t)=>Table.SelectRows(t, each [Date From] = List.Max(t[Date From])), 
        
        type table [Date From=nullable date, GG=nullable text, GZ=nullable text, Threshold Type=nullable text, Target Avg Factor=nullable number, Target Min Factor=nullable number, System Description=nullable text, System Number=nullable text]}}),
    
    #"Expanded Latest" = Table.ExpandTableColumn(#"Grouped Rows", "Latest", {"Date From", "GG", "GZ", "Target Avg Factor", "Target Min Factor", "System Description"})
in
    #"Expanded Latest"

 

Results from your data

ronrsnfld_0-1673393656064.png

 

 

Hi @ronrsnfld 

Elegant solution, thanks. 

Open question, and something that I did not test yet, but in terms of query performance, would your approach be faster on large tables? Any insights would be appreciated. 

It might be worthwhile testing various methods of solving his problem. I think that, on a large database, the efficiency of the sort would depend more on how large each subgroup is. A sort has to load the entire table and is recommended to be done as a "last step". I have assumed that a sort within a Table.Group aggregation will only load the "sub-table". But I don't know for sure. Worth testing.

I tested both approaches on a 100'000 record fact table, running each query twice.

 

My query, using the self-join approach ran in 0.033 ms and in 0.27 ms respectively, and your query, using a "group and filter sub-table" approach, ran in 27.49 ms and 10.92 ms respectively. 

 

My big fact table of 100'000 records, consisted of columns "System ID", "Threshold Level", "Date", and "Value", where "System ID" and "Threshold Level" of each record was a random pick from the integer interval [1 .. 5], Date was a date between 2023-01-01 and 2023-01-01, and Value was a random number between 0 and 10. 

 

code to generate the random table is given below just in case you want to test also: 

 

/*
    This query will generate a big table with columns 
        "System ID" = a dimension key where possible values are defined in ListSystems
        "Threshold Level" = a dimension key where possible values are defined in ListThreshold, 
        "Date" = a date on or between pStartDate and pEndDate
        "Value" = a random number between 0 and 10

    The only input required is
        * pStartDate = the earliest date that should appear in the Date column
        * pEndDate = the latest date that should appear in the Date column
        * pNbrOfTransactions = the desired number of transaction records in the final table

    The way the big fact table is constructed: 
    Each dimension key pair (System ID, Threshold Level) will appear for all dates in the date range. 
    Hence, if we need X transactions in the final table, and we have Y dates, 
    then we need X/Y = Z dimension key pairs (System ID, Threshold Level)
    So the steps to construct the final table are:
        1. Calculate pNumberOfDays = the number of days between pStartDate and pEndDate
        2. Calculate the required number of Dimension Keys: pNumberOfDimensionKeys = pNumberOfTransactions / pNumberOfDays
        3. Create a table with "System ID" and "Threshold Level" as columns, having pNumberOfDimensionKeys records
        4. For each record in that table, add the list of all dates {pStartDate .. pEndDate}
        5. Expand the list as new rows
        6. Add a column "Value" and fill with a random number between 0 and 10
*/



let
    //define the starting parameters
    pStartDate = #date(2023,1,1),
    pEndDate = #date(2023,2,1),
    pNbrOfTransactions = 100000,  //the desired number of rows in the final table

    //define the list of valid keys for System ID and for Threshold Level
    ListSystem = {1 .. 5},
    ListThreshold = {1 .. 5},

    //all required input data are known, so start the generation process

    //first calculate some counters
    pNumberOfDays = Duration.Days(pEndDate-pStartDate) + 1,
    pSystemCount = List.Count(ListSystem),
    pThresholdCount = List.Count(ListThreshold),

    pNumberOfDimensionKeys = Number.RoundUp(pNbrOfTransactions / pNumberOfDays),

    //generate a list of System keys randomly picked from ListSystem
    ShuffledListSystems = List.Generate(
        () => 1, each _ <= pNumberOfDimensionKeys, each _ + 1, 
        each ListSystem{Number.RoundDown(Number.RandomBetween(0, pSystemCount))}
        ),
        
    //generate a list of Threshold keys randomly picked from ListThreshold
    ShuffledListThresholds = List.Generate(
        () => 1, each _ <= pNumberOfDimensionKeys, each _ + 1, 
        each ListThreshold{Number.RoundDown(Number.RandomBetween(0, pThresholdCount))}
        ),

    //create a table by adding ShuffledListThresholds and ShuffledListSystems together as columns
    Table1 = Table.FromColumns(
        {ShuffledListSystems, ShuffledListThresholds},
        type table[System ID = text, Threshold Level = text]
        ),

    //for each record in Table1, add all dates as a list    
    TableWithDates = Table.AddColumn(Table1, "Date", each List.Dates(pStartDate, pNumberOfDays, #duration(1,0,0,0))),
    #"Expanded Custom" = Table.ExpandListColumn(TableWithDates, "Date"),

    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Custom",{{"Date", type date}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Value", each Number.RandomBetween(0,10)),
    #"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"Value", type number}})
in
    #"Changed Type1"

 

Nice work. I will try to duplicate it using your generated table if I have time this weekend.

Obviously your method seems to be an order of magnitude faster than mine. Would probably make a huge difference on a large database, if that difference scales (which I suspect it would).

nickvanmaele
Advocate II
Advocate II

Hi, 

to rephrase the question asked by @ronrsnfld :

What is the business key of each record?

In other words, which columns together uniquely define the record?

Can you be more specific about which record you expect to keep in the sample data that you have provided?

 

The fields to uniquely identify a record are: (System Number, Threshold Type, Date From).

I need to capture the ones with the most current Date From.

 

In your sample data, there are still two different records for (System Number = ABC-1234) and (Threshold Type = Level 1) and (Date = 2022-10-01). 

 

Please explicitly define which records from your sample data you wish to see returned by the query. Do you want to keep both those records, or only one of them? If only one, then which one?

 

So far, I feel that we lack some information to understand what you are looking for. Please define your expected solution well - e.g., "in this sample table with X rows, I expect rows A, C, and F to be returned." (where X, A, C, and F are numbers)

Date FromGGGZThreshold TypeTarget Avg FactorTarget Min FactorSystem DescriptionSystem Number
10/1/2020ABCUNITYLEVEL 10.2580.241UNITY MAXABC-1234
10/1/2020ABCUNITYLEVEL 20.2580.241UNITY MAXABC-1234
10/1/2020ABCUNITYLEVEL 30.2990.288UNITY MAXABC-1234
        
10/1/2022ABCUNITYLEVEL 10.28620.2673UNITY MAXABC-1234
10/1/2022ABCUNITYLEVEL 30.33160.3194UNITY MAXABC-1234
10/1/2022ABCUNITYLEVEL 40.33160.3194UNITY MAXABC-1234
        
The 3 fields in yellow create a unique record
Based on these unique records I need to keep the System Number and Threshold Type with the most current date.
In the example attached I would need to keep the ones in green.

Notice not all Threshold Types have a record with 10/1/2022 Date From.  In these cases the most current date is 10/1/2020 (see above in blue) I need to see these records

 

Does this help?

hi @Gryph87 

Thanks for providing more details. Based on that, here is my proposed solution.

 

First, this is an overview of the input data and the result I have obtained:

 

overview of input data and solutionoverview of input data and solutionSolution approach

I based myself on the principle of an inner join. First I created a table with system number, threshold level, and the max(Date From) for that combination. This table, which I call 'MostRecentKeys', can then be joined with an inner join on the original table. The result of this join will be that only the records from both tables are kept where the business keys are identical. This is exactly what you want. 

 

In more detail: 

1. I created the table and used "from table/range" in Excel to obtain a Power Query

2. I used the "Group by" function with aggregations on "System number" and "Threshold Type", with a max on column "Date From" to obtain a table with 3 columns. Using the Advanced Editor or the rename functionality, I call this step 'MostRecentKeys'

3. Then I used the self-join functionality to join the query on itself. In other words, I pressed the 'Merge Queries' button and joined the query with itself with an inner join type, using the three business columns at the same time in both sides of the join

4. In the code obtained in step 3, I used the Advanced Editor to manually change the joined queries to 'MostRecentKeys' and '#"Changed Type" respectively, and I ensured that the three listed keys of each had correct column names. 

5. Then I expanded the merged query column to reveal all necessary data

6. To finish off, I reordered the columns

 

Power Query Code

The Power Query code is as follows:

 

let
Source = Excel.CurrentWorkbook(){[Name="tblInputData"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Date From", type date}, {"GG", type text}, {"GZ", type text}, {"Threshold Type", type text}, {"Target Avg Factor", type number}, {"Target Min Factor", type number}, {"System Description", type text}, {"System Number", type text}}),
MostRecentKeys = Table.Group(#"Changed Type", {"System Number", "Threshold Type"}, {{"MaxDate", each List.Max([Date From]), type nullable date}}),
#"Merged Queries" = Table.NestedJoin(MostRecentKeys, {"System Number", "Threshold Type", "MaxDate"}, #"Changed Type", {"System Number", "Threshold Type", "Date From"}, "AdditionalData", JoinKind.Inner),
#"Expanded AdditionalData" = Table.ExpandTableColumn(#"Merged Queries", "AdditionalData", {"GG", "GZ", "Target Avg Factor", "Target Min Factor", "System Description"}, {"GG", "GZ", "Target Avg Factor", "Target Min Factor", "System Description"}),
#"Reordered Columns" = Table.ReorderColumns(#"Expanded AdditionalData",{"System Number", "System Description", "Threshold Type", "MaxDate", "GG", "GZ", "Target Avg Factor", "Target Min Factor"})
in
#"Reordered Columns"

 

If you like this approach, please mark this post as solution.  

ronrsnfld
Resident Rockstar
Resident Rockstar

Since all the dates are the same, when you have multiple entries on the same date, how do you determine which is most recent?

Helpful resources

Announcements
PBI November 2023 Update Carousel

Power BI Monthly Update - November 2023

Check out the November 2023 Power BI update to learn about new features.

Power BI Fabric Summit Carousel

The largest Power BI and Fabric virtual conference

130+ sessions, 130+ speakers, Product managers, MVPs, and experts. All about Power BI and Fabric. Attend online or watch the recordings.

Top Solution Authors