Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Compete to become Power BI Data Viz World Champion! First round ends August 18th. Get started.

Reply
DelanoLima
Frequent Visitor

Loading data too large

Hello!

I'm facing a problem that at first doesn't make sense to me.

I uploaded 13 files of about 200Kb each to Power Query. After processing the data, when loading to PowerBI, it takes a long time and by the time I waited, it reached over 600 MB.

Has anyone experienced this and could help me resolve this situation?

 

DelanoLima_0-1743528411890.jpeg

 

DelanoLima_1-1743528420694.jpeg

 

1 ACCEPTED SOLUTION

Cases where the worker may not have registered or there may have been a failure in the entry or exit registration will be validated, this logic and the calculation I already have. Remember that the problem I was facing was the slowness and the large number of bytes that were being loaded when loading the data because of the use of readings in subsequent rows in the same query.

I've already managed to resolve the situation. What I did was duplicate the query, then I added a column with a 1-based index to the first query and a 0-based index to the second query and I merged the queries based on the indexes, taking only the "Movement Time" column and that's it.

View solution in original post

14 REPLIES 14
lbendlin
Super User
Super User

That is not necessarily a large size. But I see you have chosen the "Combine Binaries"  feature. That is not optimal, and you can improve the ingestion by re-implementing the process and by using CSV files instead of XLSX (if possible).

I was testing loading the data at each step and the problem starts after this second line.
What is done is to generate an index column and create a new column that receives the data from the next row in the current row based on the index.

 

#"Índice Adicionado" = Table.AddIndexColumn(#"Linhas Filtradas", "Índice", 1, 1, Int64.Type),
#"Tempo do Próximo Movimento" = Table.AddColumn(#"Índice Adicionado", "Next Movement Time", each try #"Índice Adicionado"[Movement Time]{[Índice]} otherwise null),

 
The full code is this:

let
    Fonte = Folder.Files("C:\FOLDER"),
    #"Arquivos Ocultos Filtrados1" = Table.SelectRows(Fonte, each [Attributes]?[Hidden]? <> true),
    #"Invocar Função Personalizada1" = Table.AddColumn(#"Arquivos Ocultos Filtrados1", "Transformar Arquivo (2)", each #"Transformar Arquivo (2)"([Content])),
    #"Colunas Renomeadas1" = Table.RenameColumns(#"Invocar Função Personalizada1", {"Name", "Nome da Origem"}),
    #"Outras Colunas Removidas1" = Table.SelectColumns(#"Colunas Renomeadas1", {"Nome da Origem", "Transformar Arquivo (2)"}),
    #"Coluna de Tabela Expandida1" = Table.ExpandTableColumn(#"Outras Colunas Removidas1", "Transformar Arquivo (2)", Table.ColumnNames(#"Transformar Arquivo (2)"(#"Arquivo de Amostra (2)"))),
    #"Linhas Filtradas" = Table.SelectRows(#"Coluna de Tabela Expandida1", each ([Column1] <> "" and [Column1] <> "Total Movement: 1676")),
    #"Cabeçalhos Promovidos" = Table.PromoteHeaders(#"Linhas Filtradas", [PromoteAllScalars=true]),
    #"Outras Colunas Removidas" = Table.SelectColumns(#"Cabeçalhos Promovidos",{"Last Name", "First Name", "Employer", "Occupation", "Shift", "Movement Time", "From Location", "To Location", "Type"}),
    #"Colunas Mescladas" = Table.CombineColumns(#"Outras Colunas Removidas",{"First Name", "Last Name"},Combiner.CombineTextByDelimiter(" ", QuoteStyle.None),"Full Name"),
    #"Tipo Alterado" = Table.TransformColumnTypes(#"Colunas Mescladas",{{"Movement Time", type datetime}}),
    #"Linhas Classificadas" = Table.Sort(#"Tipo Alterado",{{"Full Name", Order.Ascending}, {"Movement Time", Order.Ascending}}),
    #"Linhas Filtradas1" = Table.SelectRows(#"Linhas Classificadas", each ([Type] = "A")),
    #"Índice Adicionado" = Table.AddIndexColumn(#"Linhas Filtradas1", "Índice", 1, 1, Int64.Type),
 #"Tempo do Próximo Movimento" = Table.AddColumn(#"Índice Adicionado", "Next Movement Time", each try #"Índice Adicionado"[Movement Time]{[Índice]} otherwise null),
    #"Permanência Adicionada" = Table.AddColumn(#"Tempo do Próximo Movimento", "Tempo Permanência", each 
        if [To Location] = "PMXL1" and [From Location] = "AQUARIUS BRASIL" then Duration.TotalMinutes([Next Movement Time] - [Movement Time]) else null),
    #"Linhas Filtradas1" = Table.SelectRows(#"Permanência Adicionada", each ([Tempo Permanência] <> null)),
    Personalizar1 = Table.Group(#"Linhas Filtradas1", {"Full Name", "Movement Time"}, {{"Tempo Total (Minutos)", each List.Sum([Tempo Permanência]), type number}})
in
    Personalizar1

 

DelanoLima_0-1743540357295.png

 

try 

#"Índice Adicionado" = Table.Buffer(Table.AddIndexColumn(#"Linhas Filtradas", "Índice", 1, 1, Int64.Type)),
RC = Table.RowCount(#"Índice Adicionado"),
#"Tempo do Próximo Movimento" = Table.AddColumn(#"Índice Adicionado", "Next Movement Time", each if [Índice]<RC then #"Índice Adicionado"[Movement Time]{[Índice]} else null),

Or - use the OFFSET function in DAX.

I changed it to use the conditional but I got the same result. I think the try handles the error when it reaches the last line. I don't know in terms of performance which would be the best way.

I really think I need to try with DAX.

If you have any other solution proposals I would be very grateful.

Thanks!!!

Please provide sample data that covers your issue or question completely, in a usable format (not as a screenshot).
Do not include sensitive information. Do not include anything that is unrelated to the issue or question.
Please show the expected outcome based on the sample data you provided.

Need help uploading data? https://community.fabric.microsoft.com/t5/Community-Blog/How-to-provide-sample-data-in-the-Power-BI-...
Want faster answers? https://community.fabric.microsoft.com/t5/Desktop/How-to-Get-Your-Question-Answered-Quickly/m-p/1447...

I basically need to calculate the time an employee spends on the platform each time he/she goes to the platform. The data source will be a folder where files with daily access control records will be stored daily. It is important to consider that the employee may go to the platform one day and return only the next day.

 

Here are examples of daily records:

Gangway 19.03.25.csv

Full NameEmployerOccupationShiftMovement TimeFrom LocationTo LocationType
NAME_1COMPANY_1COORDENADORDAY19/03/2025 16:47FLOTELPLATFORMA
NAME_1COMPANY_1COORDENADORDAY19/03/2025 18:14PLATFORMFLOTELA
NAME_2COMPANY_1CALDEIREIRODAY19/03/2025 01:26FLOTELPLATFORMA
NAME_2COMPANY_1CALDEIREIRODAY19/03/2025 06:48PLATFORMFLOTELA
NAME_2COMPANY_1CALDEIREIRODAY19/03/2025 18:56FLOTELPLATFORMA
NAME_2COMPANY_1CALDEIREIRODAY19/03/2025 23:48PLATFORMFLOTELA
NAME_3COMPANY_1PINTOR INDUSTRIALDAY19/03/2025 05:55FLOTELPLATFORMA
NAME_3COMPANY_1PINTOR INDUSTRIALDAY19/03/2025 11:25PLATFORMFLOTELA
NAME_3COMPANY_1PINTOR INDUSTRIALDAY19/03/2025 13:09FLOTELPLATFORMA
NAME_3COMPANY_1PINTOR INDUSTRIALDAY19/03/2025 17:49PLATFORMFLOTELA
NAME_4COMPANY_1PINTOR ESCALADORDAY19/03/2025 05:50FLOTELPLATFORMA
NAME_4COMPANY_1PINTOR ESCALADORDAY19/03/2025 11:26PLATFORMFLOTELA
NAME_4COMPANY_1PINTOR ESCALADORDAY19/03/2025 13:05FLOTELPLATFORMA
NAME_4COMPANY_1PINTOR ESCALADORDAY19/03/2025 17:51PLATFORMFLOTELA

 

Gangway 20.03.25.csv

Full NameEmployerOccupationShiftMovement TimeFrom LocationTo LocationType
NAME_1COMPANY_1COORDENADORDAY20/03/2025 06:55FLOTELPLATFORMA
NAME_1COMPANY_1COORDENADORDAY20/03/2025 07:24PLATFORMFLOTELA
NAME_2COMPANY_1CALDEIREIRODAY20/03/2025 01:26FLOTELPLATFORMA
NAME_2COMPANY_1CALDEIREIRODAY20/03/2025 06:50PLATFORMFLOTELA
NAME_2COMPANY_1CALDEIREIRODAY20/03/2025 19:01FLOTELPLATFORMA
NAME_2COMPANY_1CALDEIREIRODAY20/03/2025 23:43PLATFORMFLOTELA
NAME_346COMPANY_1MONTADOR DE ANDAIMEDAY20/03/2025 15:52FLOTELPLATFORMA
NAME_346COMPANY_1MONTADOR DE ANDAIMEDAY20/03/2025 17:48PLATFORMFLOTELA
NAME_3COMPANY_1PINTOR INDUSTRIALDAY20/03/2025 05:54FLOTELPLATFORMA
NAME_3COMPANY_1PINTOR INDUSTRIALDAY20/03/2025 11:27PLATFORMFLOTELA
NAME_3COMPANY_1PINTOR INDUSTRIALDAY20/03/2025 13:07FLOTELPLATFORMA
NAME_3COMPANY_1PINTOR INDUSTRIALDAY20/03/2025 17:49PLATFORMFLOTELA
NAME_4COMPANY_1PINTOR ESCALADORDAY20/03/2025 05:54FLOTELPLATFORMA
NAME_4COMPANY_1PINTOR ESCALADORDAY20/03/2025 11:25PLATFORMFLOTELA
NAME_4COMPANY_1PINTOR ESCALADORDAY20/03/2025 13:03FLOTELPLATFORMA
NAME_4COMPANY_1PINTOR ESCALADORDAY20/03/2025 17:48PLATFORMFLOTELA

 

Gangway 21.03.25.csv

Full NameEmployerOccupationShiftMovement TimeFrom LocationTo LocationType
NAME_247COMPANY_2SEGURANCA PATRIMONIALDAY21/03/2025 00:04PLATFORMFLOTELA
NAME_131COMPANY_8OPERADOR DE GUINDASTEDAY21/03/2025 00:52FLOTELPLATFORMA
NAME_114COMPANY_8SUPERVISOR DE MOV. DE CARGADAY21/03/2025 01:05FLOTELPLATFORMA
NAME_224COMPANY_1SOLDADOR ESCALADORDAY21/03/2025 01:05FLOTELPLATFORMA
NAME_266COMPANY_8AUX. DE MOV. DE CARGADAY21/03/2025 01:07FLOTELPLATFORMA
NAME_141COMPANY_8MESTRE DE CABOTAGEMDAY21/03/2025 01:07FLOTELPLATFORMA
NAME_235COMPANY_8AUX. DE MOV. DE CARGADAY21/03/2025 01:07FLOTELPLATFORMA
NAME_327COMPANY_1PINTOR ESCALADORDAY21/03/2025 01:11FLOTELPLATFORMA
NAME_329COMPANY_1ESCALADORDAY21/03/2025 01:12FLOTELPLATFORMA
NAME_368COMPANY_1PINTOR INDUSTRIALDAY21/03/2025 01:13FLOTELPLATFORMA
NAME_369COMPANY_1AJUDANTEDAY21/03/2025 01:13FLOTELPLATFORMA
NAME_90COMPANY_1CALDEIREIRODAY21/03/2025 01:14FLOTELPLATFORMA
NAME_28COMPANY_1SUPERVISOR ESCALADORDAY21/03/2025 01:15FLOTELPLATFORMA
NAME_97COMPANY_1PINTORDAY21/03/2025 01:15FLOTELPLATFORMA
NAME_219COMPANY_1PINTORDAY21/03/2025 01:16FLOTELPLATFORMA

 

If you need more information I will add it in the next posts.

Thank you very much for your availability.

let
    Source = Folder.Files("C:\FOLDER"),
    #"Filtered Rows" = Table.SelectRows(Source, each Text.StartsWith([Name], "Gangway") and Text.EndsWith([Name], ".csv")),
    Data = Table.AddColumn(#"Filtered Rows","Data", each Table.PromoteHeaders(Csv.Document([Content],[Delimiter="#(tab)", Columns=8, Encoding=65001, QuoteStyle=QuoteStyle.Csv]), [PromoteAllScalars=true])),
    #"Removed Other Columns" = Table.SelectColumns(Data,{"Name", "Data"}),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Other Columns", "Data", Table.ColumnNames(#"Removed Other Columns"{0}[Data]))
in
    #"Expanded Data"

I had already done these data loading steps, but what I need help with is to process them so that I can calculate the time each worker spent on the PLATFORM. Note that there are two columns, From Location and To Location, which indicate whether the worker is going to or returning from the platform, in addition to the Movement Time column, which indicates when the movement occurred. I need to subtract the data in the Movement Time column from the next row by the current row. To do this, I was duplicating the column and making all the values ​​move to the previous row, and it was at this point that the data loading time started to slow down. I believe this is due to the large amount of calculations that Power Query needed to do when working with indexes.

DelanoLima_0-1743627654009.png

The image above shows what I was doing.

To do this, I was duplicating the column and making all the values ​​move to the previous row,

Your sample data doesn't support this. What is the expected outcome for NAME_247 ?

lbendlin_0-1743629247773.png

Remember that Power Query does not guarantee any sort order.

Your sample data doesn't support this. What is the expected outcome for NAME_247 ?

This is a case in which the worker will only return the next day, that is, his/her return will only be recorded in the next report. I only gave you examples covering three days, but this case may happen in the last report received, after all, we have night shifts that start at 7:00 pm and end at 7:00 am.

Please provide sample data that fully covers your issue.
Please show the expected outcome based on the sample data you provided.

The examples above cover exactly the scenario I have, it is unlikely that all workers will return to FLOTEL because the report starts at 0:00 a.m. and ends at 11:59 p.m. and work shifts are from 7:00 a.m. to 7 p.m. and 7 p.m. to 7 a.m. In cases where the worker goes from FLOTEL to PLATFORM and there is no return record, the output may be null or 0.

 

But please don't get hung up on the calculation itself or logic, I just need some help to the point of having the two columns with the departure and return times of the PLATFORM, from then on I know how to proceed with all the particularities. In the calculation I will check if the next line has the same worker record as the current line, if so it does the duration calculation, otherwise it returns null or 0.

Your problem is much more fundamental. You are trying to use Power BI for something it is not designed for - divining the relationship between events based on potentially incomplete sampling data.

 

In the real world they may not swipe in or swipe out reliably, and you are left with scenarios where they left the building that they never entered (mysterious) or they entered and never left (troublesome) and all kinds of other scenarios.  That's just not something Power BI can help you with.

Cases where the worker may not have registered or there may have been a failure in the entry or exit registration will be validated, this logic and the calculation I already have. Remember that the problem I was facing was the slowness and the large number of bytes that were being loaded when loading the data because of the use of readings in subsequent rows in the same query.

I've already managed to resolve the situation. What I did was duplicate the query, then I added a column with a 1-based index to the first query and a 0-based index to the second query and I merged the queries based on the indexes, taking only the "Movement Time" column and that's it.

Helpful resources

Announcements
August Power BI Update Carousel

Power BI Monthly Update - August 2025

Check out the August 2025 Power BI update to learn about new features.

August 2025 community update carousel

Fabric Community Update - August 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors