Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Don't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.

Reply
jjfr
Frequent Visitor

Best way to setup incremental refresh for large queries that need an index added

Hi All,

 

Initially I set up a staging query that refreshes but doesn't load in Desktop. This query brings the data in, cleans it a little and adds a unique Booking ID. That table is then referenced for numerous loaded queries - e.g. qryService, qryUser and I've applied incremental refresh on these. In Desktop incremental refresh has been working with this setup, but once published to the service I'm unable to refresh due to an error with the ID's duplicating. 

 

Am I right in thinking that because the ID's are added in a staging query, and the staging query isn't loaded to the model, the ID's could be attributed differently at each refresh - resulting in duplicate ID's being generated during refresh? And if so, what would you suggest is the best way to handle this, e.g.:

 

1. Loading the complete flat query (30 columns) including the step to add a Booking ID, and perform dimension ID transformations on that single huge query.

 

2. Duplicate the flat query import with added Booking ID step, to end up with 1 user query and 1 service query, and hope there are no anomalies with the Booking ID generation.

 

3. Any other suggestions?

 

I was trying to normalise the queries as much as possible given their huge size, but maybe I'm approaching this the wrong way.

 

Any help would be greatly appreciated.

4 REPLIES 4
v-fenling-msft
Community Support
Community Support

Thanks for Kedar_Pande's concern about this issue.

 

Hi, @jjfr 

May I ask what type of data source are you using please?


If your data source supports it, it might be possible to create unique booking IDs directly in the data source and then import that data into Power BI.This ensures that the IDs are consistent and unique across all refreshes and queries.


Or you can try writing custom M code in Power Query Editor to generate and maintain unique BookingIDs.

vfenlingmsft_0-1730184768415.png

 

vfenlingmsft_1-1730184791877.png

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WclTSUTI00Dcy1DcyMDJRMDSwAiIjY6VYHWxyRnjkjOFyThhyJnjkTPHJmcDknNHkDJHciSlnjEcOal8sAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Product = _t, Date = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Product", type text}, {"Date", type datetime}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
    #"Renamed Columns" = Table.RenameColumns(#"Added Index",{{"Index", "BookingID"}})
in
    #"Renamed Columns"

 

 

vfenlingmsft_2-1730184846318.png

 

 

I hope my suggestions give you good ideas, if you have any more questions, please clarify in a follow-up reply.
Best Regards,
Fen Ling,
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hi Fen,

 

Thanks very much for your suggestions. Unfortunately the way we are loading data prevents us from creating IDs in the original source so we're hoping to handle it within Power Query.

 

In my original setup I created the ID's in a staging query using the Add Index Column method you have suggested. But as I wasn't loading the query (see original post) the ID's weren't generating correctly.

 

Following Kedar's advice I've since loaded the full query into my model, adding the ID's through power query. However I also need to pull 3 sub-tables from the full query where single columns hold multiple values. To do this, I duplicated the full query code up to the ID generation step and then performed any additional transformations for each sub-query. But there seems to be a problem with this approach, as the ID's aren't generating correctly, despite using the same code.

 

My next attempt will be to load the full query (including the columns needed for my sub-queries), then reference that query as the basis of the sub-queries. I'd prefer not to be storing duplicate data even if it is hidden, but it may be the quickest way for me to solve this at present. If you have any other suggestions though, they'd be greatly appreciated.

 

Thanks again

Kedar_Pande
Super User
Super User

@jjfr 

  1. Load the complete staging query (including the Booking ID) into the model. This will ensure that the IDs are consistently generated and available for reference in your other queries.
  2. Consider implementing a unique key generation method that combines multiple columns (e.g., date, user ID) to create a more stable Booking ID. This can help avoid duplicates even if the staging query isn’t loaded

💌 If this helped, a Kudos 👍 or Solution mark would be great! 🎉
Cheers,
Kedar
Connect on LinkedIn

Thanks so much Kedar. If I do load the staging table, would I then need to run incremental refresh on the staging table as well as the final tables? / Is it with normalising if I load it as I'll then have duplicate data running?

 

And would you expect the loading of this huge query to slow down my refresh as I'm already seeing issues with time outs (we're running this on Pro)?

 

Really appreciate your help here. 

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

Jan25PBI_Carousel

Power BI Monthly Update - January 2025

Check out the January 2025 Power BI update to learn about new features in Reporting, Modeling, and Data Connectivity.

December 2024

A Year in Review - December 2024

Find out what content was popular in the Fabric community during 2024.