
AlexisOlson
Super User

SHA256 in Power Query (and DAX) -- Performance Analysis Tools?

Just to see if I could, I decided to try implementing SHA256 in DAX. I was partly successful but limited by the lack of recursion, which meant I had to unroll loops manually with a bunch of variables (like @Greg_Deckler emulating recursion). I did that but ultimately ran into the problem that the hash for each chunk uses the output of the prior chunk, so the DAX version only works on strings no longer than 55 characters. (If you want to take a peek at the gory details, check the attached .pbix.)
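As an aside (a quick check in Python, purely illustrative and not part of the DAX attempt): the 55-character ceiling falls directly out of the SHA-256 padding rules.

```python
# SHA-256 works on 512-bit (64-byte) blocks. Padding always appends one
# 0x80 marker byte plus an 8-byte big-endian bit-length field, so a message
# fits in a single block only when len(msg) + 1 + 8 <= 64.
BLOCK_BYTES = 64
LENGTH_FIELD = 8
PAD_MARKER = 1
single_block_limit = BLOCK_BYTES - LENGTH_FIELD - PAD_MARKER
print(single_block_limit)  # 55
```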

 

After brooding on that for a while, I decided to try writing an M function that wouldn't have the single-chunk limitation. This is my first version that appears to work (also included in the attached file).

(text as text) as text => 
let
    input = text,
    len = Text.Length(input),
    nChunks = Number.RoundUp( ( len * 8 + 65 ) / 512 ),
    bytes = nChunks * 64,
    output = Text.ToBinary(input, TextEncoding.Ascii),

    /*Define Constants*/
    t = List.Transform({0..32}, each Number.Power(2, _)),
    h0 = { 1779033703, 3144134277, 1013904242, 2773480762, 1359893119, 2600822924, 528734635, 1541459225 },
    k = {
        1116352408, 1899447441, 3049323471, 3921009573,  961987163, 1508970993, 2453635748, 2870763221,
        3624381080,  310598401,  607225278, 1426881987, 1925078388, 2162078206, 2614888103, 3248222580,
        3835390401, 4022224774,  264347078,  604807628,  770255983, 1249150122, 1555081692, 1996064986,
        2554220882, 2821834349, 2952996808, 3210313671, 3336571891, 3584528711,  113926993,  338241895,
         666307205,  773529912, 1294757372, 1396182291, 1695183700, 1986661051, 2177026350, 2456956037,
        2730485921, 2820302411, 3259730800, 3345764771, 3516065817, 3600352804, 4094571909,  275423344,
         430227734,  506948616,  659060556,  883997877,  958139571, 1322822218, 1537002063, 1747873779,
        1955562222, 2024104815, 2227730452, 2361852424, 2428436474, 2756734187, 3204031479, 3329325298 },
    
    /*Define Functions*/
    XOR = Number.BitwiseXor,
    AND = Number.BitwiseAnd,
    MOD32 = (n) => Number.Mod(n, t{32}),
    RROT32 = (n, r) => Number.BitwiseShiftRight(n, r) + t{32-r} * Number.Mod(n, t{r}),
    s0 = (w) => XOR(XOR(RROT32(w,  7), RROT32(w, 18)), Number.BitwiseShiftRight(w,  3)),
    s1 = (w) => XOR(XOR(RROT32(w, 17), RROT32(w, 19)), Number.BitwiseShiftRight(w, 10)),
    S1 = (w) => XOR(XOR(RROT32(w,  6), RROT32(w, 11)), RROT32(w, 25)),
    S0 = (w) => XOR(XOR(RROT32(w,  2), RROT32(w, 13)), RROT32(w, 22)),

    /*Main compression function*/
    CHUNK = (hash, w) =>
        let
            G = List.Generate(
                () => [
                        i = -1,
                        a = hash{0}, b = hash{1}, c = hash{2}, d = hash{3},
                        e = hash{4}, f = hash{5}, g = hash{6}, h = hash{7},
                        temp1 = 0, temp2 = 0, ch = 0, maj = 0
                    ],
                each [i] < 64,
                each [
                        ch    = XOR(AND([e], [f]), AND(XOR([e], t{32} - 1), [g])),
                        temp1 = MOD32([h] + S1([e]) + ch + k{i} + w{i}),
                        maj   = XOR(XOR(AND([a], [b]), AND([a], [c])), AND([b], [c])),
                        temp2 = MOD32(S0([a]) + maj),
                        h = [g],
                        g = [f],
                        f = [e],
                        e = MOD32([d] + temp1),
                        d = [c],
                        c = [b],
                        b = [a],
                        a = MOD32(temp1 + temp2),
                        i = [i] + 1
                    ]
            ),
            list_abc = Record.ToList(Record.SelectFields(List.Last(G), {"a", "b", "c", "d", "e", "f", "g", "h"}))
        in
            list_abc,

    /*Create Message Schedule (w)*/
    list_bytes  = {1..bytes},
    list_chunks = List.Transform(list_bytes, each Number.IntegerDivide((_ - 1), 64)),
    list_words  = List.Transform(list_bytes, each Number.Mod(Number.IntegerDivide((_ - 1), 4), 16)),
    list_chars  = List.Transform(list_bytes, each
        if _ < len + 1 then Character.ToNumber(Text.At(input, _ - 1))
        else if _ = len + 1 then 128
        else if _ > bytes - 8 then Number.Mod(Number.BitwiseShiftRight(len * 8, (bytes - _) * 8), 256)
        else 0
        ),
    lists_table = Table.FromColumns( {list_bytes, list_chunks, list_words, list_chars }, {"index", "chunk", "word", "u"}),
    group_words = Table.Group(lists_table, {"chunk", "word"}, {{"w", 
        each List.Accumulate([u], 0, (state, current) => 256 * state + current), Int32.Type}}),

    /*Loop through chunks*/
    update_hash = List.Accumulate(
            {0..nChunks - 1},
            h0, /*Initial hash*/
            (state, current) =>
                let
                    w16 = List.Range(group_words[w], 16 * current, 16),
                    w48 = List.Generate(
                        () => [i = 15, w = w16, s = 0],
                        each [i] < 64,
                        each [
                                s = MOD32([w]{i - 16} + s0([w]{i - 15}) + [w]{i - 7} + s1([w]{i - 2})),
                                w = [w] & { s },
                                i = [i] + 1
                            ],
                        each [s]
                    ),
                    w64 = w16 & List.Skip(w48),
                    hash = CHUNK(state, w64),
                    endstate = List.Transform({0..7}, each MOD32(state{_} + hash{_}))
                in
                    endstate
        ),

    /*Convert hash to hex code*/
    hash_bytes  = List.Transform(update_hash,
        (hash) => List.Transform({0..3},
            (byte) => Number.BitwiseShiftRight(Number.Mod(hash, t{32 - 8 * byte}), 24 - 8 * byte))),
    hash_hex    = List.Transform(hash_bytes, each Binary.ToText(Binary.FromList(_), BinaryEncoding.Hex)),
    concatenate = Text.Combine(hash_hex)
in
    concatenate
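Two of the building blocks above translate directly into Python for checking (illustrative only, not part of the M function): RROT32 emulates a 32-bit right-rotate with a shift plus a modular wrap-around term, and the List.Accumulate in group_words packs four bytes into one big-endian 32-bit word.

```python
from functools import reduce

def rrot32(n, r):
    # Same identity as the M RROT32: high bits shifted down, the low r bits
    # wrapped up to the top (t{32-r} in the M code is 2**(32-r)).
    return (n >> r) + (n % (1 << r)) * (1 << (32 - r))

def rrot32_ref(n, r):
    # Conventional mask-based rotate for comparison.
    return ((n >> r) | (n << (32 - r))) & 0xFFFFFFFF

assert all(rrot32(x, r) == rrot32_ref(x, r)
           for x in (0, 1, 0xDEADBEEF, 0xFFFFFFFF)
           for r in range(1, 32))

# group_words folds bytes with state*256 + byte, i.e. big-endian packing:
word = reduce(lambda state, b: state * 256 + b, [0x61, 0x62, 0x63, 0x80], 0)
assert word == 0x61626380  # "abc" followed by the 0x80 padding byte
```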

This is clearly not a production-ready function and isn't much faster than the DAX version (on the order of seconds per call).
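For verification (a suggestion, not something from the original post): the M function's output can be compared against any known implementation and the standard test vectors, e.g. Python's hashlib.

```python
import hashlib

def sha256_hex(text: str) -> str:
    # Reference digest to compare the M function's output against.
    return hashlib.sha256(text.encode("ascii")).hexdigest()

# Standard test vectors:
print(sha256_hex(""))
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
print(sha256_hex("abc"))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```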

 

If I want to improve the performance of this function, how do I even approach this? I've used DAX Studio to examine DAX performance but I don't know if there is anything comparable for M that people actually use. Any advice or references are welcome, either related to the M code itself or related to tools/resources.

 

I haven't yet learned who the best members are to ask about this, so I hope you'll indulge me in a bit of mention spamming.

@edhans @ImkeF @MarcelBeug @mahoneypat @BA_Pete @AlB @KNP @lbendlin @HotChilli @Jimmy801 @Anonymous @Vera_33 @CNENFRNL @artemus

Feel free to ignore if you're not interested.

2 ACCEPTED SOLUTIONS
lbendlin
Super User

I would flatten it out (make it back into a query) and then run Power Query Diagnostics on the steps that you are interested in.  It will take a long time but eventually it will come back with a couple of diagnostics queries where you can go into detail on the timing of each step.

 

If you run the diagnostics on the function itself, you may not get the granularity you'd like. Frankly, I haven't tried that.

 

The key here is patience, and a freshly started Power BI Desktop. Those diagnostics results take ages to appear, and you may think they'll never come back. Better to step away, do something else, and not look at the screen for a while...

 

Once done you can safely delete the diagnostics query group.


ImkeF
Community Champion

Hi @AlexisOlson ,
hats off to this amazing function - and for providing an example of how useful a buffer can be.
Never in my life have I been able to speed up a query with a single buffer like this.
(But maybe this shouldn't be a surprise when triple-nested iterators are involved 😉 )

Just add a buffer here and the results will be returned almost immediately:

 

 

    group_words = Table.Buffer( Table.Group(lists_table, {"chunk", "word"}, {{"w", 
        each List.Accumulate([u], 0, (state, current) => 256 * state + current), Int32.Type}}) ),
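The effect of Table.Buffer can be mimicked in Python to see why it matters (an analogy, not the exact M semantics): without buffering, a lazy engine may re-evaluate the grouped table on every indexed lookup; buffering materializes it once and keeps it in memory.

```python
calls = 0

def expensive_table():
    # Stand-in for the Table.Group step that the M code indexes repeatedly.
    global calls
    calls += 1
    return [i * i for i in range(16)]

# Unbuffered: re-evaluate per lookup (how a lazy engine may behave).
calls = 0
unbuffered = [expensive_table()[i] for i in range(16)]
assert calls == 16

# "Buffered": evaluate once, like Table.Buffer, then index the cached result.
calls = 0
buffered = expensive_table()
result = [buffered[i] for i in range(16)]
assert calls == 1
```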

 

BTW: This "trick" is included in my page with Power Query performance tricks here: Speed/Performance aspects – The BIccountant

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries


6 REPLIES

@ImkeF Amazing! It makes perfect sense why this would help.

 

Thanks for the link too. I will definitely give it a careful read.

mahoneypat
Microsoft Employee

I didn't check if the digest package is available in the Service, but I would do this with R. I am no R expert and adapted code I found online.

 


 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("NZDBcoIxCIRfZcez8z9Fe+uxPTkeaILKNAkxAfXxS7S9hQDL7nc47HbH/WF34VIUdx0lP+vPC+Pqkn7wPfTecNLH8/9DB1dIn16RtejAFANVtj2StsnJ2HyAsnSZSdoZXCSak3MsgMVn1Qzj2mNZWpIs2ZvBDYW+Qx5sL2lGpXMjUJGr04YvAzepoY0q63GLkup+GZ1oOm14Bj94JDEy0QYvhWrSl/Iakinr0lNSegyDKYzX8KSvAHHKNrwtSXJjyPBw8soqDYP74Au3zCOCx8dNi/c4x2EnkoLnZCQp5Z9QBHKc/CxkaMsQOo0ofGx4fyTuxr4wBgNNiTjFXPIumWxtRIo+VDK3RXGRiqPJS6eVG3o6SRJC5sljdauWZYMWIAkc84+r1213PP4C", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Text = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Text", type text}}),
    #"Run R script" = R.Execute("# 'dataset' holds the input data for this script#(lf)library(digest)#(lf)dataset$sha256 <- sapply(dataset$Text, digest, algo=""sha256"")#(lf)output <- dataset",[dataset=#"Changed Type"]),
    #"""output""" = #"Run R script"{[Name="output"]}[Value]
in
    #"""output"""
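The same idea should also work with a Python script step instead of R (untested in the Service; hashlib is in the Python standard library). The core of such a script, with a stand-in list in place of the input table's Text column:

```python
import hashlib

# 'texts' stands in for the Text column; in Power BI a Python script step
# would receive the input table as a pandas DataFrame instead.
texts = ["abc", "hello world"]
digests = [hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts]
print(digests[0])
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```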

 

Pat





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


The documentation here does list the digest package:
https://docs.microsoft.com/en-us/power-bi/connect-data/service-r-packages-support

 

This does seem like a more practical solution if I were actually using SHA256 for anything except tinkering.


Thanks for the advice, @lbendlin. I entirely forgot about that tab.

 

Documentation link for my future self:
https://docs.microsoft.com/en-us/power-query/querydiagnostics
