rpiboy_1
Helper V

Which approach is likely to be more performant?

I have a fact table coming from a data source in an attribute/value configuration. That is, there are effectively two columns of interest (Attribute and Value). However, to make the data useful in the data model, the table needs to be pivoted so that we end up with a unique column for each unique attribute.

Source data looks like this:

dim.id   Attribute   Value
1        Property1   Val1
1        Property2   Val2
1.1      Property1   Val1.1
1.1      Property2   Val2.1
1.1      Property3   Val3

Transformed to look like this:

dim.id   Property1   Property2   Property3
1        Val1        Val2        null
1.1      Val1.1      Val2.1      Val3

 

My actual data has somewhere on the order of a dozen attributes, with rows at a granularity of individual dates, going back upwards of a decade. After the pivot, we group and aggregate the data on a monthly basis, as we only need monthly statistics.
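The post-pivot monthly aggregation described above might be sketched like this in M. This is a hedged illustration, not the poster's actual code: the step name `Pivoted`, the `[Date]` column, and the assumption that `Property1` holds numeric values after the pivot are all mine.

```
let
    // Assumes 'Pivoted' is the pivoted table from earlier steps (hypothetical name)
    // and that it has a [Date] column at daily granularity
    AddMonth = Table.AddColumn(Pivoted, "Month", each Date.StartOfMonth([Date]), type date),
    // Collapse daily rows to one row per key per month; Property1 assumed numeric here
    Monthly = Table.Group(AddMonth, {"dim.id", "Month"},
        {{"Property1Avg", each List.Average([Property1]), type number}})
in
    Monthly
```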

 

Initially, when I built my first model in Power Query, I used the built-in Pivot function to pivot all of the attributes, but this is a very 'expensive' operation in terms of processing and time.

 

Subsequently, in building a smaller model from a different (but effectively identical) data source, I realized I could build a function that would:

  1. Get the unique attributes
  2. Filter the table on a given attribute
  3. Return a table with only the Value column, renamed to the attribute name

I can then expand the table returned by each run of the function, one per attribute, and basically 'build' out the same table that results from the pivot operation.
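The three steps above might look something like the following sketch. The function and step names (`fnAttributeTable`, `Keys`, `Result`) are my own assumptions, not the poster's actual code:

```
let
    // Step 1: get the unique attributes
    Attributes = List.Distinct(Source[Attribute]),
    // Steps 2-3 as a function: filter on one attribute, keep key + Value, rename Value
    fnAttributeTable = (tbl as table, attr as text) as table =>
        Table.RenameColumns(
            Table.SelectColumns(
                Table.SelectRows(tbl, each [Attribute] = attr),
                {"dim.id", "Value"}),
            {{"Value", attr}}),
    // Start from the distinct keys, then left-join and expand one attribute column per pass
    Keys = Table.Distinct(Table.SelectColumns(Source, {"dim.id"})),
    Result = List.Accumulate(Attributes, Keys, (state, attr) =>
        Table.ExpandTableColumn(
            Table.NestedJoin(state, {"dim.id"}, fnAttributeTable(Source, attr), {"dim.id"}, "tmp", JoinKind.LeftOuter),
            "tmp", {attr}))
in
    Result
```

This produces the same wide shape as Table.Pivot, one merge-and-expand per attribute.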

 

The function-based approach 'seems' faster, but the dataset is smaller in this case. Before re-creating the function approach with my larger dataset, I was hoping the community could weigh in on which approach is likely to be more performant, or whether they're likely to be very similar. Possibly the Pivot function is really doing nearly the exact same thing as my custom function, and I just happened to manually build my own 'pivot process'?

3 REPLIES
dufoq3
Super User

You can try these 2 versions:

  1. v1_FinalPivotColumn (which you already tried)
  2. v2_Final_ListAccumulateExpanded

 

let
    // Sample data (compressed JSON); replace this step with your own source
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUQooyi9ILSqpBLHDEnMMlWJ1UCWMIBJGEAk9bHr0DLFIwvRhlTSGSBorxcYCAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [dim.id = _t, Attribute = _t, Value = _t]),
    // v1: the built-in pivot you already tried
    v1_Final_PivotColumn = Table.Pivot(Source, List.Distinct(Source[Attribute]), "Attribute", "Value"),
    // v2: group by key, then build one column per attribute with List.Accumulate
    v2_BufferedAttributes = List.Buffer(List.Distinct(Source[Attribute])),
    v2_GroupedRows = Table.Group(Source, {"dim.id"}, {{"All", each _, type table}}),
    v2_Ad_CustomColumn = Table.AddColumn(v2_GroupedRows, "CustomColumn", each
        [v_AllBuffered = Table.Buffer([All]),
         v_adAttributeColumns = 
            List.Accumulate(
                List.Buffer(v_AllBuffered[Attribute]), 
                Table.FirstN(v_AllBuffered, 1),
                (s,c)=> Table.AddColumn(s, c, (x)=> Table.SelectRows(v_AllBuffered, (r)=> r[Attribute] = c){0}[Value], type text))
        ][v_adAttributeColumns],
        type table),
    #"v2_Removed Other Columns" = Table.SelectColumns(v2_Ad_CustomColumn,{"dim.id", "CustomColumn"}),
    v2_Final_ListAccumulateExpanded = Table.ExpandTableColumn(#"v2_Removed Other Columns", "CustomColumn", v2_BufferedAttributes, v2_BufferedAttributes)
in
    v2_Final_ListAccumulateExpanded

 

Let me know which one is faster, please.

 



Anonymous
Not applicable

Yes, do as much of your grouping and aggregating as possible first. The fewer values there are to pivot, the better the performance.

 

--Nate
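The aggregate-first advice might be sketched like this: collapse the daily rows to monthly granularity per attribute before pivoting, so Table.Pivot sees roughly a thirtieth of the rows. Step names and the `[Date]` column are assumptions; `List.Average` stands in for whatever monthly statistic is actually needed.

```
let
    // Snap each date to the start of its month (assumes a [Date] column)
    ToMonth = Table.TransformColumns(Source, {{"Date", Date.StartOfMonth, type date}}),
    // Aggregate per key/month/attribute BEFORE pivoting, shrinking the row count
    Monthly = Table.Group(ToMonth, {"dim.id", "Date", "Attribute"},
        {{"Value", each List.Average([Value]), type number}}),
    // Pivot the much smaller monthly table
    Pivoted = Table.Pivot(Monthly, List.Distinct(Monthly[Attribute]), "Attribute", "Value", List.Average)
in
    Pivoted
```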

Fair enough. However, is there likely to be any performance difference between the built-in Pivot function and a manually written custom function that filters the source table to achieve the same result, or are the two 'paths' equal? I know the Pivot cannot be folded back to SQL (the data source); do the filtering and merging steps get folded back to SQL as part of building the query?
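On the folding question, the general behaviour can be illustrated with a sketch (server, database, and table names are placeholders): row filters against a SQL source typically fold into a WHERE clause, while Table.Pivot breaks folding and runs in the mashup engine.

```
let
    Source = Sql.Database("myServer", "myDb"),    // placeholder names
    Fact = Source{[Schema = "dbo", Item = "FactAttributes"]}[Data],
    // Table.SelectRows against a SQL source typically folds into a WHERE clause
    Filtered = Table.SelectRows(Fact, each [Attribute] <> null),
    // Table.Pivot does not fold; from this step on, work happens locally
    Pivoted = Table.Pivot(Filtered, List.Distinct(Filtered[Attribute]), "Attribute", "Value")
in
    Pivoted
```

Right-clicking a step and checking whether "View Native Query" is enabled is the usual way to confirm where folding stops.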
