gamvazdelima
Regular Visitor

Customer purchase frequency from a transactions table (autonomous-store fraud detection)

Hi All.

I have a table of customer transactions with ~550k rows, containing 20 columns such as Customer ID, purchase date, type, $ Total, $ Subtotal, $ Discount, payment method, and number of items.

 

I need to calculate customer frequency over time, meaning I want every row to contain that customer's previous transaction, so I can check the average days between visits and the average purchase value. I use this to detect suspicious behaviour, because too many visits with low and/or identical purchase values can indicate fraud in an autonomous store: the customer pays for 1 item but takes more.

For that I use the Power Query script below.
My problem is that this query takes too long, mainly because of the highlighted blocks in the code.

I took the idea for the grouping solution from this video: https://www.youtube.com/watch?v=Hc3d8rMSXcQ&list=LL&index=1


I am a basic Power BI and Power Query user; I do a lot by trial and error. The Power BI report flagging suspicious customers is working and showing some results. To improve accuracy I need to speed up these refreshes, or even scale up with more analyses like this to cross-check other variables.

I have tried Table.Buffer based on some articles, but with no success.

Any help or instructions would be appreciated.

 

let
Fonte = Folder.Files("C:\Users\gamva\OneDrive\300_M4U\01_GESTAO\0000_Automacao_NL\Automate\01_Rotina_Transacao\arquivos_transacoes"),
#"Arquivos Ocultos Filtrados1" = Table.SelectRows(Fonte, each [Attributes]?[Hidden]? <> true),
#"Invocar Função Personalizada1" = Table.AddColumn(#"Arquivos Ocultos Filtrados1", "Transformar Arquivo", each #"Transformar Arquivo"([Content])),
#"Colunas Renomeadas1" = Table.RenameColumns(#"Invocar Função Personalizada1", {"Name", "Nome da Origem"}),
#"Outras Colunas Removidas1" = Table.SelectColumns(#"Colunas Renomeadas1", {"Nome da Origem", "Transformar Arquivo"}),
#"Transformar Arquivo Expandido" = Table.ExpandTableColumn(#"Outras Colunas Removidas1", "Transformar Arquivo", {"ID", "Id Franqueado", "Franqueado", "Franqueado e-mail", "ID PDX", "PDX", "Total gasto", "Cliente ID", "Nome Cliente", "Status Cliente", "Tipo", "Distância", "CPF na nota", "NFC-e", "Qtde produtos", "Qtde produtos removidos", "Qtde compras", "Qtde não compras", "Forma pagamento", "Total", "Voucher", "Total Estorno", "Desconto", "Cashback", "Subtotal", "Emissão NFCe", "Data e hora", "Leitura QRCODE", "Via", "Column16"}, {"ID", "Id Franqueado", "Franqueado", "Franqueado e-mail", "ID PDX", "PDX", "Total gasto", "Cliente ID", "Nome Cliente", "Status Cliente", "Tipo", "Distância", "CPF na nota", "NFC-e", "Qtde produtos", "Qtde produtos removidos", "Qtde compras", "Qtde não compras", "Forma pagamento", "Total", "Voucher", "Total Estorno", "Desconto", "Cashback", "Subtotal", "Emissão NFCe", "Data e hora", "Leitura QRCODE", "Via", "Column16"}),

#"Tipo Alterado" = Table.TransformColumnTypes(#"Transformar Arquivo Expandido",{{"Nome da Origem", type text}, {"ID", Int64.Type}, {"Id Franqueado", Int64.Type}, {"Franqueado", type text}, {"Franqueado e-mail", type text}, {"ID PDX", type any}, {"PDX", type text}, {"Total gasto", type number}, {"Cliente ID", Int64.Type}, {"Nome Cliente", type text}, {"Status Cliente", type text}, {"Tipo", type text}, {"Distância", type text}, {"CPF na nota", type text}, {"NFC-e", type text}, {"Qtde produtos", Int64.Type}, {"Qtde produtos removidos", Int64.Type}, {"Qtde compras", Int64.Type}, {"Qtde não compras", Int64.Type}, {"Forma pagamento", type text}, {"Total", type any}, {"Voucher", type number}, {"Total Estorno", Int64.Type}, {"Desconto", type number}, {"Cashback", type number}, {"Subtotal", type number}, {"Emissão NFCe", type datetime}, {"Data e hora", type datetime}}),

#"Data Inserida" = Table.AddColumn(#"Tipo Alterado", "Data", each DateTime.Date([Data e hora]), type date),
#"Hora Inserida" = Table.AddColumn(#"Data Inserida", "Hora", each DateTime.Time([Data e hora]), type time),
#"Tipo Alterado2" = Table.TransformColumnTypes(#"Hora Inserida",{{"Hora", type number}}),
#"Módulo Inserido" = Table.AddColumn(#"Tipo Alterado2", "Módulo", each Number.Mod([Hora], 1/1440), type number),
#"Subtração Inserida" = Table.AddColumn(#"Módulo Inserido", "Subtração", each [Hora] - [Módulo], type number),
#"Tipo Alterado3" = Table.TransformColumnTypes(#"Subtração Inserida",{{"Subtração", type time}}),

#"Dividir Coluna por Delimitador" = Table.SplitColumn(#"Tipo Alterado3", "Tipo", Splitter.SplitTextByDelimiter("(", QuoteStyle.Csv), {"Tipo.1", "Tipo.2"}),
#"Tipo Alterado5" = Table.TransformColumnTypes(#"Dividir Coluna por Delimitador",{{"Tipo.1", type text}, {"Tipo.2", type text}}),
#"Texto Aparado" = Table.TransformColumns(#"Tipo Alterado5",{{"Tipo.1", Text.Trim, type text}}),

#"Colunas Renomeadas2" = Table.RenameColumns(#"Texto Aparado",{{"Subtração", "HORA-MINUTO"} , {"ID PDX", "ID PDV"}, {"ID", "ID_transacao"}, {"PDX", "PDV"} , {"Tipo.1", "Tipo Resumido"} }),


#"Tipo Alterado4" = Table.TransformColumnTypes(#"Colunas Renomeadas2",{{"Hora", type time}, {"Total", type number}, {"ID PDV", Int64.Type}}),

#"Colunas Removidas1" = Table.RemoveColumns(#"Tipo Alterado4",{"Franqueado e-mail", "Módulo" , "CPF na nota"}),

#"Texto Inserido Entre os Delimitadores" = Table.AddColumn(#"Colunas Removidas1", "Texto Entre os Delimitadores", each Text.BetweenDelimiters([Nome da Origem], "-", ".xlsx"), type text),
#"Dividir Coluna por Delimitador1" = Table.SplitColumn(#"Texto Inserido Entre os Delimitadores", "Texto Entre os Delimitadores", Splitter.SplitTextByEachDelimiter({"-"}, QuoteStyle.Csv, false), {"Texto Entre os Delimitadores.1", "Texto Entre os Delimitadores.2"}),
#"Dividir Coluna pela Posição" = Table.SplitColumn(#"Dividir Coluna por Delimitador1", "Texto Entre os Delimitadores.1", Splitter.SplitTextByPositions({0, 6}, false), {"Texto Entre os Delimitadores.1.1", "Texto Entre os Delimitadores.1.2"}),
#"Tipo Alterado9" = Table.TransformColumnTypes(#"Dividir Coluna pela Posição",{{"Texto Entre os Delimitadores.1.1", Int64.Type}, {"Texto Entre os Delimitadores.1.2", Int64.Type}, {"Texto Entre os Delimitadores.2", type text}}),
#"Dividir Coluna pela Posição1" = Table.SplitColumn(Table.TransformColumnTypes(#"Tipo Alterado9", {{"Texto Entre os Delimitadores.1.1", type text}}, "pt-BR"), "Texto Entre os Delimitadores.1.1", Splitter.SplitTextByPositions({0, 4}, false), {"Texto Entre os Delimitadores.1.1.1", "Texto Entre os Delimitadores.1.1.2"}),
#"Tipo Alterado7" = Table.TransformColumnTypes(#"Dividir Coluna pela Posição1",{{"Texto Entre os Delimitadores.1.2", type text}}),
#"Valor Substituído" = Table.ReplaceValue(#"Tipo Alterado7","","00",Replacer.ReplaceValue,{"Texto Entre os Delimitadores.1.2"}),
#"Valor Substituído2" = Table.ReplaceValue(#"Valor Substituído",null,"00",Replacer.ReplaceValue,{"Texto Entre os Delimitadores.1.2"}),
#"Tipo Alterado8" = Table.TransformColumnTypes(#"Valor Substituído2",{{"Texto Entre os Delimitadores.1.2", Int64.Type}}),
#"Valor Substituído1" = Table.ReplaceValue(#"Tipo Alterado8","","00",Replacer.ReplaceValue,{"Texto Entre os Delimitadores.1.1.2"}),
#"Tipo Alterado10" = Table.TransformColumnTypes(#"Valor Substituído1",{{"Texto Entre os Delimitadores.1.1.2", Int64.Type}}),
#"Personalização Adicionada" = Table.AddColumn(#"Tipo Alterado10", "data_coleta", each [Texto Entre os Delimitadores.1.1.1] & Number.ToText( [Texto Entre os Delimitadores.1.1.2] , "00" ) & Number.ToText( [Texto Entre os Delimitadores.1.2], "00" )),

#"Linhas Classificadas1" = Table.Sort(#"Personalização Adicionada",{{"data_coleta", Order.Descending}}),
#"Duplicatas Removidas" = Table.Distinct(#"Linhas Classificadas1", {"ID_transacao"}),

// Bottleneck: group by customer, re-sort, add an index, and later self-merge just to fetch each customer's previous transaction
#"Linhas Agrupadas" = Table.Group(#"Duplicatas Removidas", {"Cliente ID"}, {{"GrupoClienteId", each _, Value.Type( Table.Sort(#"Duplicatas Removidas",{{"Data e hora", Order.Descending}}) ) }}),
#"Personalização Adicionada3" = Table.AddColumn(#"Linhas Agrupadas", "GrupoClienteIdIndex", each Table.AddIndexColumn( [GrupoClienteId] , "ClienteIdIndex", 1, 1, Int64.Type)),
#"Outras Colunas Removidas" = Table.SelectColumns(#"Personalização Adicionada3",{"GrupoClienteIdIndex"}),
#"GrupoClienteIdIndex Expandido1" = Table.ExpandTableColumn(#"Outras Colunas Removidas", "GrupoClienteIdIndex", {"Nome da Origem", "ID_transacao", "Id Franqueado", "Franqueado", "ID PDV", "PDV", "Total gasto", "Cliente ID", "Nome Cliente", "Status Cliente", "Tipo Resumido", "Tipo.2", "Distância", "NFC-e", "Qtde produtos", "Qtde produtos removidos", "Qtde compras", "Qtde não compras", "Forma pagamento", "Total", "Voucher", "Total Estorno", "Desconto", "Cashback", "Subtotal", "Emissão NFCe", "Data e hora", "Leitura QRCODE", "Via", "Column16", "Data", "Hora", "HORA-MINUTO", "Texto Entre os Delimitadores.1.1.1", "Texto Entre os Delimitadores.1.1.2", "Texto Entre os Delimitadores.1.2", "Texto Entre os Delimitadores.2", "data_coleta", "ClienteIdIndex"}, {"Nome da Origem", "ID_transacao", "Id Franqueado", "Franqueado", "ID PDV", "PDV", "Total gasto", "Cliente ID", "Nome Cliente", "Status Cliente", "Tipo Resumido", "Tipo.2", "Distância", "NFC-e", "Qtde produtos", "Qtde produtos removidos", "Qtde compras", "Qtde não compras", "Forma pagamento", "Total", "Voucher", "Total Estorno", "Desconto", "Cashback", "Subtotal", "Emissão NFCe", "Data e hora", "Leitura QRCODE", "Via", "Column16", "Data", "Hora", "HORA-MINUTO", "Texto Entre os Delimitadores.1.1.1", "Texto Entre os Delimitadores.1.1.2", "Texto Entre os Delimitadores.1.2", "Texto Entre os Delimitadores.2", "data_coleta", "ClienteIdIndex"}),
#"Subtração Inserida1" = Table.AddColumn(#"GrupoClienteIdIndex Expandido1", "Subtração", each [ClienteIdIndex] - 1, type number),
#"Consultas Mescladas" = Table.NestedJoin(#"Subtração Inserida1", {"ClienteIdIndex", "Cliente ID"}, #"Subtração Inserida1", {"Subtração", "Cliente ID"}, "Subtração Inserida1", JoinKind.LeftOuter),
#"Subtração Inserida1 Expandido" = Table.ExpandTableColumn(#"Consultas Mescladas", "Subtração Inserida1", {"Total", "Data e hora" , "ID_transacao"}, {"Total anterior", "Data e hora anterior", "ID_transacao anterior" }),
#"Colunas Removidas3" = Table.RemoveColumns(#"Subtração Inserida1 Expandido",{"ClienteIdIndex", "Subtração"}),

#"Coluna Duplicada" = Table.DuplicateColumn(#"Colunas Removidas3", "ID_transacao", "ID_transacao - Copiar"),

#"Personalização Adicionada1" = Table.AddColumn(#"Coluna Duplicada" , "ANOMESDIA", each Number.ToText ( Date.Year ( [Data e hora] ) , "0000" ) & Number.ToText ( Date.Month ( [Data e hora] ) , "00" ) & Number.ToText ( Date.Day ( [Data e hora] ) , "00" ) ),
#"Personalização Adicionada2" = Table.AddColumn(#"Personalização Adicionada1", "transação tipo", each "Transação"),
#"Coluna Duplicada1" = Table.DuplicateColumn(#"Personalização Adicionada2", "ID PDV", "ID PDV - Copiar"),
#"Colunas Renomeadas6" = Table.RenameColumns(#"Coluna Duplicada1",{{"ID PDV - Copiar", "ID PDX"} , {"ID_transacao - Copiar", "ID transação"}}),


// Note: this step must start from #"Colunas Renomeadas6" (the previous step), otherwise the renamed columns "ID PDX" and "ID transação" do not exist yet
#"Tipo Alterado12" = Table.TransformColumnTypes(#"Colunas Renomeadas6",{{"Total anterior", type number}, {"ID_transacao anterior", Int64.Type}, {"Data e hora anterior", type datetime}, {"ID PDX", Int64.Type}, {"Id Franqueado", Int64.Type}, {"Cliente ID", Int64.Type}, {"Data e hora", type datetime}, {"Total", type number}, {"ID_transacao", Int64.Type}, {"Desconto", type number}, {"Cashback", type number}, {"Subtotal", type number}, {"Qtde compras", Int64.Type}, {"Qtde não compras", Int64.Type}, {"Qtde produtos", Int64.Type}, {"Qtde produtos removidos", Int64.Type}, {"Total gasto", type number}, {"Data", type date}, {"Hora", type time}, {"HORA-MINUTO", type time}, {"ID transação", Int64.Type}})
in
#"Tipo Alterado12"
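The slow section is the group → index → self-merge pattern (`Linhas Agrupadas` through `Subtração Inserida1 Expandido`), which rebuilds the whole table just to fetch each customer's previous transaction. For comparison, that "previous row per customer" logic outside Power Query is a sorted `groupby` plus `shift`; a minimal pandas sketch with toy data (column names borrowed from the query above):

```python
import pandas as pd

# Toy stand-in for the ~550k-row transactions table
df = pd.DataFrame({
    "Cliente ID": [1, 1, 2, 1, 2],
    "Data e hora": pd.to_datetime([
        "2024-01-01 10:00", "2024-01-03 09:30", "2024-01-02 12:00",
        "2024-01-05 18:15", "2024-01-04 11:45"]),
    "Total": [10.0, 9.9, 30.0, 10.0, 28.5],
    "ID_transacao": [101, 102, 103, 104, 105],
})

# Sort once, then shift within each customer: every row gets the
# previous transaction's values without any self-join.
df = df.sort_values(["Cliente ID", "Data e hora"])
grp = df.groupby("Cliente ID")
df["Total anterior"] = grp["Total"].shift(1)
df["Data e hora anterior"] = grp["Data e hora"].shift(1)
df["ID_transacao anterior"] = grp["ID_transacao"].shift(1)
df["Dias desde última visita"] = (
    df["Data e hora"] - df["Data e hora anterior"]).dt.days
```

This is the same result the merge produces (`Total anterior`, `Data e hora anterior`, `ID_transacao anterior`), computed in a single linear pass.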

1 ACCEPTED SOLUTION

As stated before: Check out Fabric. You can use a Power BI Gateway to connect to on-premises sources like your folder. Then use Power Query to extract data using the script(s) you have now. (You may even be able to copy and paste the M code with minimal manual editing!) And finally, you can push the Power Query Dataflow output to a table in a Lakehouse or Warehouse.

Honestly, though, I would also incorporate a Pipeline or two to process one file at a time, then move that file out of the way. (you really don't want to be processing hundreds of files via Power Query every night if all you are doing is adding a few more to the folder each day.)

Best of luck.
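The "one file at a time, then move it out of the way" pattern can be prototyped without any Fabric components. A hedged Python sketch (folder names are illustrative, and CSV stands in for the real .xlsx workbooks):

```python
import shutil
from pathlib import Path

import pandas as pd

def process_new_files(inbox: Path, archive: Path, store: Path) -> int:
    """Append every file waiting in `inbox` to the consolidated `store`,
    then move it to `archive` so the nightly run never re-reads it."""
    archive.mkdir(parents=True, exist_ok=True)
    processed = 0
    # The real flow would glob "*.xlsx" and use pd.read_excel instead.
    for f in sorted(inbox.glob("*.csv")):
        chunk = pd.read_csv(f)
        # Write the header only when creating the store for the first time.
        chunk.to_csv(store, mode="a", index=False, header=not store.exists())
        shutil.move(str(f), str(archive / f.name))
        processed += 1
    return processed
```

Power BI then reads one consolidated store instead of re-parsing ~200 workbooks on every refresh.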




Did I answer your question? If so, mark my post as a solution. Also consider helping someone else in the forums!

Proud to be a Super User!






6 REPLIES
ToddChitt
Super User

>>I am doing this already, PBI updates runs off-hours already. <<

No, you are not. It sounds like you are asking Power BI to run a refresh of 200 flat files through a Power Query that looks to be about 60 lines of Applied Steps, oh, and access those files via a Gateway.

 

What I am suggesting is that you figure out a way to process those files with some other tools so that the result is a fully processed dataset that is then consumed by Power BI.

 

The Folder source in Power BI is great, but it is horribly slow, probably made even more so by the 60 applied steps. 

How many files do you expect to have in 6 months' time? An additional 200?

 

You are eating soup with a fork and wondering why it is taking so long to finish a bowl.









Thanks for the comments. You are right, I need something more sophisticated to handle the files. I'm not sure which tools to use to pre-process them, especially for the data calculations my 60 steps are doing. I will try to learn more about Fabric.
Anyway, I was hoping to find ways to optimize the steps here, especially the one calculating the customer's last purchase date.
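For the fraud metrics themselves (average days between visits, average ticket), the per-customer aggregation can also be pre-computed outside Power Query. A hedged pandas sketch with toy data and made-up thresholds for flagging:

```python
import pandas as pd

tx = pd.DataFrame({
    "Cliente ID": [1, 1, 1, 2, 2],
    "Data e hora": pd.to_datetime([
        "2024-01-01", "2024-01-02", "2024-01-03",
        "2024-01-01", "2024-03-01"]),
    "Total": [1.99, 1.99, 1.99, 50.0, 45.0],
})

# Days since the previous visit, computed within each customer
tx = tx.sort_values(["Cliente ID", "Data e hora"])
days = tx.groupby("Cliente ID")["Data e hora"].diff().dt.days

stats = tx.assign(dias=days).groupby("Cliente ID").agg(
    visitas=("Total", "size"),
    ticket_medio=("Total", "mean"),
    dias_medio=("dias", "mean"),
)

# Hypothetical thresholds: frequent visits with a very low average ticket
stats["suspeito"] = (stats["dias_medio"] <= 2) & (stats["ticket_medio"] < 5)
```

The thresholds are illustrative only; the real cutoffs would come from the distribution of your own data.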


ToddChitt
Super User

If you are pulling from a local flat file in OneDrive, yes, performance is going to be very slow, especially with half a million rows.

I suggest you look into Fabric. See if you can pre-process your flat file into something like a Data Warehouse via a Pipeline, Notebook, or Dataflow. Make this processing happen off-hours. 









Appreciate the comments, by the way 🙂

I am doing this already; the PBI refreshes already run off-hours. But I am not using these Fabric layers. The files (around 200 now, growing every day, with a couple of new files each day) are located on one PC, and PBI uses the gateway to get the updates.
Is there any way to optimize this gateway refresh routine and its run time as well?
