The ultimate Fabric, Power BI, SQL, and AI community-led learning event. Save €200 with code FABCOMM.
Get registeredCompete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.
Hello!
I'm facing a problem that at first doesn't make sense to me.
I uploaded 13 files of about 200Kb each to Power Query. After processing the data, when loading to PowerBI, it takes a long time and by the time I waited, it reached over 600 MB.
Has anyone experienced this and could help me resolve this situation?
Solved! Go to Solution.
Cases where the worker may not have registered or there may have been a failure in the entry or exit registration will be validated, this logic and the calculation I already have. Remember that the problem I was facing was the slowness and the large number of bytes that were being loaded when loading the data because of the use of readings in subsequent rows in the same query.
I've already managed to resolve the situation. What I did was duplicate the query, then I added a column with a 1-based index to the first query and a 0-based index to the second query and I merged the queries based on the indexes, taking only the "Movement Time" column and that's it.
That is not necessarily a large size. But I see you have chosen the "Combine Binaries" feature. That is not optimal, and you can improve the ingestion by re-implementing the process and by using CSV files instead of XLSX (if possible).
I was testing loading the data at each step and the problem starts after this second line.
What is done is to generate an index column and create a new column that receives the data from the next row in the current row based on the index.
#"Índice Adicionado" = Table.AddIndexColumn(#"Linhas Filtradas", "Índice", 1, 1, Int64.Type),
#"Tempo do Próximo Movimento" = Table.AddColumn(#"Índice Adicionado", "Next Movement Time", each try #"Índice Adicionado"[Movement Time]{[Índice]} otherwise null),
The full code is this:
let
Fonte = Folder.Files("C:\FOLDER"),
#"Arquivos Ocultos Filtrados1" = Table.SelectRows(Fonte, each [Attributes]?[Hidden]? <> true),
#"Invocar Função Personalizada1" = Table.AddColumn(#"Arquivos Ocultos Filtrados1", "Transformar Arquivo (2)", each #"Transformar Arquivo (2)"([Content])),
#"Colunas Renomeadas1" = Table.RenameColumns(#"Invocar Função Personalizada1", {"Name", "Nome da Origem"}),
#"Outras Colunas Removidas1" = Table.SelectColumns(#"Colunas Renomeadas1", {"Nome da Origem", "Transformar Arquivo (2)"}),
#"Coluna de Tabela Expandida1" = Table.ExpandTableColumn(#"Outras Colunas Removidas1", "Transformar Arquivo (2)", Table.ColumnNames(#"Transformar Arquivo (2)"(#"Arquivo de Amostra (2)"))),
#"Linhas Filtradas" = Table.SelectRows(#"Coluna de Tabela Expandida1", each ([Column1] <> "" and [Column1] <> "Total Movement: 1676")),
#"Cabeçalhos Promovidos" = Table.PromoteHeaders(#"Linhas Filtradas", [PromoteAllScalars=true]),
#"Outras Colunas Removidas" = Table.SelectColumns(#"Cabeçalhos Promovidos",{"Last Name", "First Name", "Employer", "Occupation", "Shift", "Movement Time", "From Location", "To Location", "Type"}),
#"Colunas Mescladas" = Table.CombineColumns(#"Outras Colunas Removidas",{"First Name", "Last Name"},Combiner.CombineTextByDelimiter(" ", QuoteStyle.None),"Full Name"),
#"Tipo Alterado" = Table.TransformColumnTypes(#"Colunas Mescladas",{{"Movement Time", type datetime}}),
#"Linhas Classificadas" = Table.Sort(#"Tipo Alterado",{{"Full Name", Order.Ascending}, {"Movement Time", Order.Ascending}}),
#"Linhas Filtradas1" = Table.SelectRows(#"Linhas Classificadas", each ([Type] = "A")),
#"Índice Adicionado" = Table.AddIndexColumn(#"Linhas Filtradas1", "Índice", 1, 1, Int64.Type),
#"Tempo do Próximo Movimento" = Table.AddColumn(#"Índice Adicionado", "Next Movement Time", each try #"Índice Adicionado"[Movement Time]{[Índice]} otherwise null),
#"Permanência Adicionada" = Table.AddColumn(#"Tempo do Próximo Movimento", "Tempo Permanência", each
if [To Location] = "PMXL1" and [From Location] = "AQUARIUS BRASIL" then Duration.TotalMinutes([Next Movement Time] - [Movement Time]) else null),
#"Linhas Filtradas1" = Table.SelectRows(#"Permanência Adicionada", each ([Tempo Permanência] <> null)),
Personalizar1 = Table.Group(#"Linhas Filtradas1", {"Full Name", "Movement Time"}, {{"Tempo Total (Minutos)", each List.Sum([Tempo Permanência]), type number}})
in
Personalizar1
try
#"Índice Adicionado" = Table.Buffer(Table.AddIndexColumn(#"Linhas Filtradas", "Índice", 1, 1, Int64.Type)),
RC = Table.RowCount(#"Índice Adicionado"),
#"Tempo do Próximo Movimento" = Table.AddColumn(#"Índice Adicionado", "Next Movement Time", each if [Índice]<RC then #"Índice Adicionado"[Movement Time]{[Índice]} else null),
Or - use the OFFSET function in DAX.
I changed it to use the conditional but I got the same result. I think the try handles the error when it reaches the last line. I don't know in terms of performance which would be the best way.
I really think I need to try with DAX.
If you have any other solution proposals I would be very grateful.
Thanks!!!
Please provide sample data that covers your issue or question completely, in a usable format (not as a screenshot).
Do not include sensitive information. Do not include anything that is unrelated to the issue or question.
Please show the expected outcome based on the sample data you provided.
Need help uploading data? https://community.fabric.microsoft.com/t5/Community-Blog/How-to-provide-sample-data-in-the-Power-BI-...
Want faster answers? https://community.fabric.microsoft.com/t5/Desktop/How-to-Get-Your-Question-Answered-Quickly/m-p/1447...
I basically need to calculate the time an employee spends on the platform each time he/she goes to the platform. The data source will be a folder where files with daily access control records will be stored daily. It is important to consider that the employee may go to the platform one day and return only the next day.
Here are examples of daily records:
Gangway 19.03.25.csv
Full Name | Employer | Occupation | Shift | Movement Time | From Location | To Location | Type |
NAME_1 | COMPANY_1 | COORDENADOR | DAY | 19/03/2025 16:47 | FLOTEL | PLATFORM | A |
NAME_1 | COMPANY_1 | COORDENADOR | DAY | 19/03/2025 18:14 | PLATFORM | FLOTEL | A |
NAME_2 | COMPANY_1 | CALDEIREIRO | DAY | 19/03/2025 01:26 | FLOTEL | PLATFORM | A |
NAME_2 | COMPANY_1 | CALDEIREIRO | DAY | 19/03/2025 06:48 | PLATFORM | FLOTEL | A |
NAME_2 | COMPANY_1 | CALDEIREIRO | DAY | 19/03/2025 18:56 | FLOTEL | PLATFORM | A |
NAME_2 | COMPANY_1 | CALDEIREIRO | DAY | 19/03/2025 23:48 | PLATFORM | FLOTEL | A |
NAME_3 | COMPANY_1 | PINTOR INDUSTRIAL | DAY | 19/03/2025 05:55 | FLOTEL | PLATFORM | A |
NAME_3 | COMPANY_1 | PINTOR INDUSTRIAL | DAY | 19/03/2025 11:25 | PLATFORM | FLOTEL | A |
NAME_3 | COMPANY_1 | PINTOR INDUSTRIAL | DAY | 19/03/2025 13:09 | FLOTEL | PLATFORM | A |
NAME_3 | COMPANY_1 | PINTOR INDUSTRIAL | DAY | 19/03/2025 17:49 | PLATFORM | FLOTEL | A |
NAME_4 | COMPANY_1 | PINTOR ESCALADOR | DAY | 19/03/2025 05:50 | FLOTEL | PLATFORM | A |
NAME_4 | COMPANY_1 | PINTOR ESCALADOR | DAY | 19/03/2025 11:26 | PLATFORM | FLOTEL | A |
NAME_4 | COMPANY_1 | PINTOR ESCALADOR | DAY | 19/03/2025 13:05 | FLOTEL | PLATFORM | A |
NAME_4 | COMPANY_1 | PINTOR ESCALADOR | DAY | 19/03/2025 17:51 | PLATFORM | FLOTEL | A |
Gangway 20.03.25.csv
Full Name | Employer | Occupation | Shift | Movement Time | From Location | To Location | Type |
NAME_1 | COMPANY_1 | COORDENADOR | DAY | 20/03/2025 06:55 | FLOTEL | PLATFORM | A |
NAME_1 | COMPANY_1 | COORDENADOR | DAY | 20/03/2025 07:24 | PLATFORM | FLOTEL | A |
NAME_2 | COMPANY_1 | CALDEIREIRO | DAY | 20/03/2025 01:26 | FLOTEL | PLATFORM | A |
NAME_2 | COMPANY_1 | CALDEIREIRO | DAY | 20/03/2025 06:50 | PLATFORM | FLOTEL | A |
NAME_2 | COMPANY_1 | CALDEIREIRO | DAY | 20/03/2025 19:01 | FLOTEL | PLATFORM | A |
NAME_2 | COMPANY_1 | CALDEIREIRO | DAY | 20/03/2025 23:43 | PLATFORM | FLOTEL | A |
NAME_346 | COMPANY_1 | MONTADOR DE ANDAIME | DAY | 20/03/2025 15:52 | FLOTEL | PLATFORM | A |
NAME_346 | COMPANY_1 | MONTADOR DE ANDAIME | DAY | 20/03/2025 17:48 | PLATFORM | FLOTEL | A |
NAME_3 | COMPANY_1 | PINTOR INDUSTRIAL | DAY | 20/03/2025 05:54 | FLOTEL | PLATFORM | A |
NAME_3 | COMPANY_1 | PINTOR INDUSTRIAL | DAY | 20/03/2025 11:27 | PLATFORM | FLOTEL | A |
NAME_3 | COMPANY_1 | PINTOR INDUSTRIAL | DAY | 20/03/2025 13:07 | FLOTEL | PLATFORM | A |
NAME_3 | COMPANY_1 | PINTOR INDUSTRIAL | DAY | 20/03/2025 17:49 | PLATFORM | FLOTEL | A |
NAME_4 | COMPANY_1 | PINTOR ESCALADOR | DAY | 20/03/2025 05:54 | FLOTEL | PLATFORM | A |
NAME_4 | COMPANY_1 | PINTOR ESCALADOR | DAY | 20/03/2025 11:25 | PLATFORM | FLOTEL | A |
NAME_4 | COMPANY_1 | PINTOR ESCALADOR | DAY | 20/03/2025 13:03 | FLOTEL | PLATFORM | A |
NAME_4 | COMPANY_1 | PINTOR ESCALADOR | DAY | 20/03/2025 17:48 | PLATFORM | FLOTEL | A |
Gangway 21.03.25.csv
Full Name | Employer | Occupation | Shift | Movement Time | From Location | To Location | Type |
NAME_247 | COMPANY_2 | SEGURANCA PATRIMONIAL | DAY | 21/03/2025 00:04 | PLATFORM | FLOTEL | A |
NAME_131 | COMPANY_8 | OPERADOR DE GUINDASTE | DAY | 21/03/2025 00:52 | FLOTEL | PLATFORM | A |
NAME_114 | COMPANY_8 | SUPERVISOR DE MOV. DE CARGA | DAY | 21/03/2025 01:05 | FLOTEL | PLATFORM | A |
NAME_224 | COMPANY_1 | SOLDADOR ESCALADOR | DAY | 21/03/2025 01:05 | FLOTEL | PLATFORM | A |
NAME_266 | COMPANY_8 | AUX. DE MOV. DE CARGA | DAY | 21/03/2025 01:07 | FLOTEL | PLATFORM | A |
NAME_141 | COMPANY_8 | MESTRE DE CABOTAGEM | DAY | 21/03/2025 01:07 | FLOTEL | PLATFORM | A |
NAME_235 | COMPANY_8 | AUX. DE MOV. DE CARGA | DAY | 21/03/2025 01:07 | FLOTEL | PLATFORM | A |
NAME_327 | COMPANY_1 | PINTOR ESCALADOR | DAY | 21/03/2025 01:11 | FLOTEL | PLATFORM | A |
NAME_329 | COMPANY_1 | ESCALADOR | DAY | 21/03/2025 01:12 | FLOTEL | PLATFORM | A |
NAME_368 | COMPANY_1 | PINTOR INDUSTRIAL | DAY | 21/03/2025 01:13 | FLOTEL | PLATFORM | A |
NAME_369 | COMPANY_1 | AJUDANTE | DAY | 21/03/2025 01:13 | FLOTEL | PLATFORM | A |
NAME_90 | COMPANY_1 | CALDEIREIRO | DAY | 21/03/2025 01:14 | FLOTEL | PLATFORM | A |
NAME_28 | COMPANY_1 | SUPERVISOR ESCALADOR | DAY | 21/03/2025 01:15 | FLOTEL | PLATFORM | A |
NAME_97 | COMPANY_1 | PINTOR | DAY | 21/03/2025 01:15 | FLOTEL | PLATFORM | A |
NAME_219 | COMPANY_1 | PINTOR | DAY | 21/03/2025 01:16 | FLOTEL | PLATFORM | A |
If you need more information I will add it in the next posts.
Thank you very much for your availability.
let
Source = Folder.Files("C:\FOLDER"),
#"Filtered Rows" = Table.SelectRows(Source, each Text.StartsWith([Name], "Gangway") and Text.EndsWith([Name], ".csv")),
Data = Table.AddColumn(#"Filtered Rows","Data", each Table.PromoteHeaders(Csv.Document([Content],[Delimiter="#(tab)", Columns=8, Encoding=65001, QuoteStyle=QuoteStyle.Csv]), [PromoteAllScalars=true])),
#"Removed Other Columns" = Table.SelectColumns(Data,{"Name", "Data"}),
#"Expanded Data" = Table.ExpandTableColumn(#"Removed Other Columns", "Data", Table.ColumnNames(#"Removed Other Columns"{0}[Data]))
in
#"Expanded Data"
I had already done these data loading steps, but what I need help with is to process them so that I can calculate the time each worker spent on the PLATFORM. Note that there are two columns, From Location and To Location, which indicate whether the worker is going to or returning from the platform, in addition to the Movement Time column, which indicates when the movement occurred. I need to subtract the data in the Movement Time column from the next row by the current row. To do this, I was duplicating the column and making all the values move to the previous row, and it was at this point that the data loading time started to slow down. I believe this is due to the large amount of calculations that Power Query needed to do when working with indexes.
The image above shows what I was doing.
To do this, I was duplicating the column and making all the values move to the previous row,
Your sample data doesn't support this. What is the expected outcome for NAME_247 ?
Remember that Power Query does not guarantee any sort order.
Your sample data doesn't support this. What is the expected outcome for NAME_247 ?
This is a case in which the worker will only return the next day, that is, his/her return will only be recorded in the next report. I only gave you examples covering three days, but this case may happen in the last report received, after all, we have night shifts that start at 7:00 pm and end at 7:00 am.
Please provide sample data that fully covers your issue.
Please show the expected outcome based on the sample data you provided.
The examples above cover exactly the scenario I have, it is unlikely that all workers will return to FLOTEL because the report starts at 0:00 a.m. and ends at 11:59 p.m. and work shifts are from 7:00 a.m. to 7 p.m. and 7 p.m. to 7 a.m. In cases where the worker goes from FLOTEL to PLATFORM and there is no return record, the output may be null or 0.
But please don't get hung up on the calculation itself or logic, I just need some help to the point of having the two columns with the departure and return times of the PLATFORM, from then on I know how to proceed with all the particularities. In the calculation I will check if the next line has the same worker record as the current line, if so it does the duration calculation, otherwise it returns null or 0.
Your problem is much more fundamental. You are trying to use Power BI for something it is not designed for - divining the relationship between events based on potentially incomplete sampling data.
In the real world they may not swipe in or swipe out reliably, and you are left with scenarios where they left the building that they never entered (mysterious) or they entered and never left (troublesome) and all kinds of other scenarios. That's just not something Power BI can help you with.
Cases where the worker may not have registered or there may have been a failure in the entry or exit registration will be validated, this logic and the calculation I already have. Remember that the problem I was facing was the slowness and the large number of bytes that were being loaded when loading the data because of the use of readings in subsequent rows in the same query.
I've already managed to resolve the situation. What I did was duplicate the query, then I added a column with a 1-based index to the first query and a 0-based index to the second query and I merged the queries based on the indexes, taking only the "Movement Time" column and that's it.