
ctaylor
Helper III

Combining files and updating matching rows

Hello,

 

I have an Excel file that contains a data dump and a weekly emailed file that contains new/updated information. The data is an export of activity from a help desk, so each week there will be new incidents as well as updates to incidents created previously. That email is going to go into a SharePoint folder and will be loaded each time a file is received.

 

I set my data source in Power BI to Folder and connected to said folder, set my sample file as the bulk data file, transformed it, and set up the helper query to load and combine the files. It's working, and I see a few instances of count > 1 for incident number.

What do I need to do to be able to overwrite/update incident numbers with data from the most recent file received, so it's 1 incident, 1 record (being the most recent information)?

 

Let me know if there are any screenshots or more data I can provide.  


20 REPLIES
edhans
Super User

How to get good help fast. Help us help you.
How to Get Your Question Answered Quickly
How to provide sample data in the Power BI Forum



camargos88
Community Champion

@ctaylor ,

 

Can you give examples of it?






@camargos88 

I'm not exactly sure what to provide in this case, because I'm really asking a conceptual question: once you have your files all set up and the combine-from-folder operation is happening, how do you match the primary keys so that the more recent version of each record is kept and the old version discarded?

 

What can I provide for you to help answer that question?  

Screenshots of the current files that get loaded?

(screenshot)

 

Sample of some headers/data in the file?
(screenshot)

 

The loading code?

let
    Source = Folder.Files("C:\Users\--\ServiceNow Files"),
    #"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",{{"Source.Name", type text}, {"Number", type text}, {"Created", type datetime}, {"Caller", type text}, {"Short description", type text}, {"Category", type text}, {"Subcategory", type text}, {"Priority", type text}, {"State", type text}, {"Assignment Group", type text}, {"Assigned to", type text}, {"Updated by", type text}, {"Reassignment count", Int64.Type}, {"Location", type text}, {"Actual resolution", type datetime}})
in
    #"Changed Type"

 

Thanks for your interest in this.

@ctaylor ,

 

Do you want to get only the last row per file/number by created date?






@camargos88 

 

No

 

I am starting with a folder and a bulk data file of 3 years' worth of data. Again, the information is from a help desk, so each incident number should be 1 record. I will be getting a weekly file that will be dropped into a folder, transformed, and loaded into a single table in Power BI. Because it needs to be 1 incident number and 1 record, there will be cases where a previously loaded file contains "open" incidents and the next sequential file contains updates to those records along with brand-new incident information.

 

So, if the incident number is a new number then a record is inserted.  If the incident number is a duplicate, the most recent information should be kept and the older record discarded. This process should iterate through all files in a folder sequentially.
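
To illustrate the rule with some made-up rows (not the real export, just the keep-the-latest-version-per-incident idea, sketched in Power Query M):

let
    // Made-up sample: two versions of INC001 plus one brand-new incident
    Source = #table(
        {"Number", "State", "File date"},
        {
            {"INC001", "Open",     #date(2020, 5, 1)},
            {"INC001", "Resolved", #date(2020, 5, 8)},
            {"INC002", "Open",     #date(2020, 5, 8)}
        }),
    // One group per incident number; Table.Max returns the row with the newest file date
    Grouped = Table.Group(Source, {"Number"}, {{"Latest", each Table.Max(_, {"File date"}), type record}}),
    // Expand the winning row: INC001 keeps only its "Resolved" version
    Result = Table.ExpandRecordColumn(Grouped, "Latest", {"State", "File date"})
in
    Result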

 

Thanks

@ctaylor ,

 

Check this file: Download PBIX 

 

(screenshots)






@camargos88 

 

I tried to apply that logic to my model, and I think it works on a small scale, but against a much larger set of data it does some interesting things. First, it takes a really long time to generate even a preview, like 5+ minutes. I imagine this is only going to get worse as more files are added each week. It also inflates the file size by over 200x. The file is only 3 MB, but as it tries to extrapolate the data, here is what I see.

(I've taken screenshots 3 times now; the file size just keeps growing.)

(screenshot)

 

 

I'm not really sure what's happening there.

 

@ctaylor ,

 

Try it without the preview: just click Close & Apply and check how long it takes.

Are you getting those files locally or from a network/the internet?






@camargos88 

 

The files are currently local but will be moved to SharePoint once I get this figured out.

By the way, the loaded file size is now 1.33 GB and still growing.

 

If I try to Close & Apply, it ends up crashing Power BI.

 

This does actually work though.

The IF statement works as it should, but whatever is happening to the data size as it runs this comparison is not going to be feasible.

Hi @ctaylor ,

 

We can use the following steps to meet your requirement, provided your Excel files all have the same structure:

 

1. Keep the Content and "Data Modified" columns and remove the others.

 

2. Expand and combine the Content column as usual.

 

3. Group by the Number column and aggregate the other columns as All Rows.

(screenshot)

 

4. Create a custom column:

 

Table.Max([Data],{"Data Modified"})

 

(screenshot)

 

5. Remove the Data column and expand the Newest Rows column.

 

(screenshot)

 

6. The result is as follows:

 

(screenshot)

 

 

The full query is below (just a sample; it does not include the combine-files step):

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTKFYiMDIwNdU11DBUMDK2MDKwMDpVidaCUjoIwZFGNXYQyUAWETvCrSoKqgKoxQVZhAZfG7A2RGBnYzYgE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Number = _t, Location = _t, #"Updated by" = _t, #"Data Modified" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Number", Int64.Type}, {"Location", type text}, {"Updated by", type text}, {"Data Modified", type datetime}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Number"}, {{"Data", each _, type table [Number=number, Location=text, Updated by=text, Data Modified=datetime]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Newest Rows", each Table.Max([Data],{"Data Modified"})),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Data"}),
    #"Expanded Newest Rows" = Table.ExpandRecordColumn(#"Removed Columns", "Newest Rows", {"Location", "Updated by", "Data Modified"})
in
    #"Expanded Newest Rows"

 

 


By the way, the PBIX file is attached.


Best regards,

 

Community Support Team _ Dong Li

@v-lid-msft 

I think this is exactly what I was looking for!

 

@camargos88 

Thanks for your attempts on this issue!

@ctaylor ,

 

How does your machine handle just the import (remove the custom column added to check the max item)?

Is it OK without it?






@camargos88 

Without the IF statement, both files load within seconds. 

Like I said, the big one is only a 3MB file. 

 

@ctaylor ,

 

Did you input the last step name here:

(screenshot)

 

Also, if possible, send me the M code that was generated.






@ctaylor ,

 

Is connecting directly to ServiceNow not an option?






@camargos88 

No, not an option.

 

I guess I could just merge everything together in the load, index it, and create a new table that contains distinct incident numbers indexed by max created date.

 

So, is this just not a thing that Power Query can handle properly?
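
Something like this is what I had in mind, as a rough sketch (untested; "Combined Files" stands for whatever the combine-from-folder query ends up being called, and "Date modified" stands for whatever column marks the newest version of a record):

let
    Source = #"Combined Files",
    // Newest rows first; Table.Buffer pins the sort order in memory so that
    // Table.Distinct keeps the first (i.e. newest) row it sees for each Number
    Sorted = Table.Buffer(Table.Sort(Source, {{"Date modified", Order.Descending}})),
    Deduped = Table.Distinct(Sorted, {"Number"})
in
    Deduped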

@ctaylor ,

 

I've updated the file.

I ran it with 11,400 Excel files and it took around 2 minutes to process them. Of course, your files may have more rows; however, I think even if you have half (or less) of this number of files, you should think about a staging process to handle it.






@ctaylor ,

 

I don't think this approach would change much; since you need to index it per number, it is still a grouping operation, but it's worth a try.

 

I'm trying another approach here.

BTW, how many files are you testing it with?






@ctaylor ,

 

So basically, you need to keep the most recent "created date" by number, right?

 

Knowing that each file has a created date, we can keep the number with the information of the last file.





