Greetings, community. I have a scenario where I need several stored procedures to be able to insert data into the same table. Since Fabric doesn't have an Identity/Auto-increment solution yet, I am using the ROW_NUMBER trick (example here). This works fine in cases where I can run sequential loads, but I now need to run numerous inserts into the same table at the same time.
Since the ROW_NUMBER solution depends on getting the max key from the destination table, I believe concurrent inserts will conflict: two jobs may start at the same time and use the same max value to start the numbering. ☹️
Does anyone know of a possible way to ensure that duplicate numbers aren't generated? Right now, I'm not aware of any real way to do so.
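For context, the ROW_NUMBER workaround referenced above typically looks something like the following (a minimal sketch; dbo.TargetTable, dbo.Staging, and the column names are hypothetical):

-- Sketch of the MAX-key + ROW_NUMBER() workaround (hypothetical table/column names)
DECLARE @MaxKey BIGINT;

SELECT @MaxKey = COALESCE(MAX(Id), 0)
FROM dbo.TargetTable;

INSERT INTO dbo.TargetTable (Id, CustomerName, Amount)
SELECT
    @MaxKey + ROW_NUMBER() OVER (ORDER BY s.CustomerName) AS Id,
    s.CustomerName,
    s.Amount
FROM dbo.Staging AS s;
-- Two concurrent runs can read the same @MaxKey and therefore generate duplicate Ids.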
Hi @arpost
Thanks for using Fabric Community.
Just wanted to share some information, if that could help you.
In Microsoft Fabric, you can generate unique identifiers in a warehouse table using a workaround technique:
Generate unique identifiers in a warehouse table in Microsoft Fabric - Microsoft Fabric | Microsoft ...
This method ensures that each row inserted into the table has a unique identifier, even when multiple stored procedures are inserting data into the same table concurrently.
Hope this helps. Please let me know if you have any further questions.
@v-nikhilan-msft, thank you for that link. The issue is that this doesn't ensure unique identifiers: if two jobs run concurrently, they insert duplicate keys, because the workaround gets the max key and uses it as the starting point for the row numbering.
In the example below, the yellow items came from script 1 and the red items from script 2, executed almost simultaneously:
Hi @arpost
1) Have you considered a control table for storing and managing that key value?
2) I agree that for a concurrent load requirement you will need additional logic to maintain and check the high-water-mark value before the insert. If this is a batch process, you can load your data in parallel threads into a staging table and then have a single transform step that uses ROW_NUMBER() to build the identity column as it loads the final table (sketched below). This is a much simpler process with better throughput.
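A minimal sketch of that staging-then-transform pattern, assuming hypothetical tables dbo.StageOrders (loaded by the parallel jobs) and dbo.FinalOrders:

-- Parallel jobs load dbo.StageOrders without assigning keys;
-- a single transform step then assigns surrogate keys in one statement
DECLARE @MaxKey BIGINT;

SELECT @MaxKey = COALESCE(MAX(OrderKey), 0)
FROM dbo.FinalOrders;

INSERT INTO dbo.FinalOrders (OrderKey, OrderDate, Amount)
SELECT
    @MaxKey + ROW_NUMBER() OVER (ORDER BY s.OrderDate) AS OrderKey,
    s.OrderDate,
    s.Amount
FROM dbo.StageOrders AS s;
-- Because only this one transform assigns keys, the concurrent producers no longer conflict.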
Please try this and let me know if the issue still persists.
Thanks.
Can you expound a bit more on both #1 and #2? Trying to think through how that'd work or solve the issue.
Hi @arpost
#1
I presume you are working on a Warehouse. Given that delta tables support transactions (Transactions in Warehouse tables - Microsoft Fabric | Microsoft Learn), you can use a control table just to persist the max key, using whatever strategy you prefer.
Just a simple example: let's suppose you have a table ControlTable with a column id. The following stored procedure updates the id and returns the new value atomically, so every process must invoke this procedure to obtain its unique id.
CREATE PROC [dbo].[GetMaxId]
    @currentID INT OUTPUT
AS
BEGIN
    -- Use SERIALIZABLE so the read and the update behave as one atomic step
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    -- Start a new transaction
    BEGIN TRANSACTION;

    -- Get the current ID
    SELECT @currentID = id FROM dbo.ControlTable;

    -- Increment the ID
    SET @currentID = @currentID + 1;

    -- Write the new value back to the control table
    UPDATE dbo.ControlTable SET id = @currentID;

    -- Commit the transaction
    COMMIT TRANSACTION;
END
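A hypothetical call pattern from one of the inserting procedures might look like this (dbo.Orders and its columns are illustrative assumptions, not from the thread):

-- Each concurrent process reserves its own key before inserting
DECLARE @newId INT;

EXEC dbo.GetMaxId @currentID = @newId OUTPUT;

INSERT INTO dbo.Orders (Id, OrderDate, Amount)
VALUES (@newId, '2024-05-01', 100.00);

Note that this hands out one id per call; a set-based insert would need the procedure to reserve a range of ids instead.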
Another option would be using a GUID instead of a number, if that specific data type is not strictly required. You can get a unique identifier by running SELECT NEWID();
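For instance, a rough sketch of a set-based insert keyed by NEWID() (the tables and columns are hypothetical):

-- NEWID() produces a globally unique value per row, so concurrent inserts don't collide
INSERT INTO dbo.Orders (Id, OrderDate, Amount)
SELECT NEWID(), s.OrderDate, s.Amount
FROM dbo.StageOrders AS s;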
Hope this helps. Please let me know if you have any further questions.
Is the use of the Serializable isolation level above supported in Fabric, though? Per this doc, only snapshot isolation level is supported; all others are ignored at time of query execution:
https://learn.microsoft.com/en-us/fabric/data-warehouse/transactions#transactional-capabilities
I've thought about GUIDs, but my concerns are two-fold:
Hi @arpost
The internal team has replied as follows:
Good catch... it appears that the customer is employing an OLTP-like process to populate the warehouse. While warehouses prioritize performance over integrity, it might be beneficial for them to reassess their hydration process and make it sequential to avoid these conflicts. Here is another example, but I am afraid it may show the same issue under concurrent calls:
https://learn.microsoft.com/en-us/fabric/data-warehouse/generate-unique-identifiers
I would recommend the approach I mentioned earlier, in which the stored procedures concurrently load into a staging table and then CTAS (historical load) / INSERT (incremental) into the final table with ROW_NUMBER().
Hope this helps.
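As a rough illustration of the CTAS variant for a historical load (dbo.StageOrders, dbo.FinalOrders, and the columns are assumed names):

-- One-off historical load: build the final table with surrogate keys in a single CTAS
CREATE TABLE dbo.FinalOrders
AS
SELECT
    ROW_NUMBER() OVER (ORDER BY s.OrderDate) AS OrderKey,
    s.OrderDate,
    s.Amount
FROM dbo.StageOrders AS s;

Incremental loads would then use the INSERT ... SELECT pattern shown earlier, offsetting ROW_NUMBER() by the current max key.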
I appreciate the reply, but that approach still introduces a lot of complications around job sequencing. Do we know if/when Microsoft is planning to add some kind of identity feature?
There are often cases where you need to have your primary keys (PK) generated in order to use those PKs as foreign keys (FK) in dependent tables. The lack of an auto-generated identity means all of your pipelines have to be built in such a way that they insert into the PK table, wait until all other loads complete, and THEN resume with inserts into the dependent tables that need to use the PK as an FK. To make matters worse, that means unrelated jobs have to wait on each other (File Type A's load is held up by File Type B's load).
Typically, any given pipeline would run through the steps of a process for a particular ELT job, so the logic would be something like: (1) stage the incoming data, (2) insert into the parent table that generates the PKs, and (3) insert into the dependent tables that reference those PKs as FKs.
Having to assign PKs at the end for the final table load introduces a huge amount of overhead if you have, say, 15 different data feeds that all insert into the same tables for #2 and #3.
@v-nikhilan-msft, any update on this? Also, I saw mentioned somewhere that some kind of Identity/Primary Key functionality was planned for 2024. Can you confirm this is in Microsoft's scope? I didn't see this listed in any release plan docs.
Hi @arpost
We haven't heard back from you on the last response and wanted to check whether your query has been resolved. If not, please reply and we will follow up with more details and try to help.
Thanks