Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Grow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.

Reply
Element115
Power Participant
Power Participant

QUESTION::COPY DATA::INCREMENTAL APPEND

Is there an ETA for when incremental append will become available for the Copy data activity?  For both on-prem and cloud sources?

 

I am wondering if I should start implementing this by hand only to have to change all my pipelines in a couple of weeks.

1 ACCEPTED SOLUTION
Element115
Power Participant
Power Participant

EDIT:  re the discussion of the capitalization of the JSON key refering to the script return value, I just realized the system is using the alias name the user provides in the SELECT statement; I thought  the generated JSON would simply contain a generic key named and spelled 'result' and not the actual SQL alias.  Triple face palm moment. 

 

@v-nikhilan-msft 

 

Here is the solution.  But first, let me lay it all out from the first step.

0__create a user defined stored procedure in the lakehouse SQL endpoint like so:

 

 

 

CREATE OR ALTER PROCEDURE [dbo].[usp_is_exist_table]
    @table_name VARCHAR(32) = '',
    @result     INT OUTPUT
AS
BEGIN
    IF OBJECT_ID(@table_name, 'U') IS NOT NULL
        SET @result = 1
    ELSE
        SET @result = 0
END

 

 

 

This procedure checks whether or not a table exists.  Why?  In order to implement incremental refresh within a pipeline, we need to know 1 of 2 things:  Does the table exist, OR if it exists, is it not empty and thus there is a sequential index number to get and from which only we shall start loading new data from the source. We will use a pipeline variable later to store the name of the table for whose existence we want to check. 

 

1__create the following pipeline flow:

 

Screenshot 2024-04-11 231708.jpg

 

Because I have a simple scenario, I can simply hard code 1 or 2 table names in their respective pipeline variable. But if this needs to be done for more tables than this, then of course, you would use a control flow loop activity looping over an array variable containing all the table names in the lakehouse (but given the pain of doing even something as simple as this incremental load, I don't think it's worth trying only to find out that suddenly I'm stuck doing even more beta testing on behalf of Microsoft). 

 

2__configuring the pipeline variables: click on the pipeline design surface to make the pipeline config pane appear:

 

Screenshot 2024-04-11 233339.jpg

Since I only have 2 tables, intead of setting up a loop and so on, I opted to just hard code the table names in 2 distinct variables (since I already spent way too much time on this than should be necessary. It is left as an exercise for the time when I feel intrepid). 

 

The variable latest_index is going to be used inside the If Condition activity. 

 

3__configuring the Script activity:

 

Screenshot 2024-04-11 233928.jpg

For some reason, I can't get a scalar singleton, and so we are stuck with the script output being a result set. (If you figure it out, please let me know.) Note the last Select statement needs an alias or it won't work.

 

Also note the pipeline variable interpolation in the T-SQL script using @{...}, namely:  @{variables('usp_input_1')}:

 

 

 

DECLARE @usp_output INT;
EXEC [dbo].[usp_is_exist_table] @table_name = @{variables('usp_input_1')}, @result = @usp_output OUTPUT;
SELECT @usp_output AS Result;

 

 

 

No script parameters defined in the GUI, and no Advanced settings either.

 

Also important to know is the JSON output by the Script activity on success because you will need to reference some of its keys to retrieve the output value of the script. It looks like this:

 

 

 

{
	"resultSetCount": 1,
	"recordsAffected": 0,
	"resultSets": [
		{
			"rowCount": 1,
			"rows": [
				{
					"Result": 1
				}
			]
		}
	],
	"outputParameters": {},
	"outputLogs": "",
	"outputLogsLocation": "",
	"outputTruncated": false,
	"executionDuration": 1
}

 

 

 

Note how the word 'result' is spelled:  Result, and not result.  The reason I mention this is that the JSON output from the other Script activity inside the If Condition activity returns a JSON similar to this one except that the word 'result' starts with a small cap instead of a capitalized 'R.'  Yeah, go figure. 

 

4__configuring the If Condition activity:

 

Screenshot 2024-04-11 234851.jpg

Click on the expression and the editor opens up showing this code:

 

 

 

@equals(activity('Check MUP_Status exists').output.resultSets[0].rows[0].Result, 1)

 

 

 

Screenshot 2024-04-11 235640.jpg

 

Note how after selecting the first option in the list of Activity outputs, we get 

 

 

 

activity('Check MUP_Status exists').output

 

 

 

and then we use the '.' operator to reference the JSON structure 

 

 

 

.resultSets[0].rows[0].Result

 

 

 

in order to get the output from the Script activity outside the If Condition activity.

 

5__configuring the Script activity titled 'Get latest status ID' inside the If Condition activity: 

 

Screenshot 2024-04-12 000310.jpg

Again, the Select statement needs to alias the column returned.

As mentioned above, the JSON output from this Script activity is slightly different in terms of the capitalization of the result key, which is spelled 'result' and not 'Result' as shown below:

 

 

 

{
	"resultSetCount": 1,
	"recordsAffected": 0,
	"resultSets": [
		{
			"rowCount": 1,
			"rows": [
				{
					"result": 670826
				}
			]
		}
	],
	"outputParameters": {},
	"outputLogs": "",
	"outputLogsLocation": "",
	"outputTruncated": false,
	"executionDuration": 1
}

 

 

 

5__configuring the Set variable activity:

 

Screenshot 2024-04-12 000929.jpg

 

The expression (in green above) looks like this and is why I was saying pay attention to the fact that, even though the JSON structure is exactly the same as the first script output, the key 'result' is spelled differently and this is the spelling you must use as shown below (still banging my head against the wall about this one) :

 

 

 

@activity('Get latest status ID').output.resultSets[0].rows[0].result

 

 

 

Whatever the latest ID that has been ingested into the lakehouse for a particular table happens to be, it is stored in the latest_index variable which is used in the next activity, which is a Copy data activity using a T-SQL Select statement with a WHERE clause to filter for the most recently added data to the source DB.  I'll leave that last step as a small exercise for the newbie to figure out.  

 

In conclusion:  What a mess!!!  But there you have it.  I hope the result of struggling for the better part of a day+ will prove helpful to the naive and hapless soul who still belabors under the illusion that using Fabric is an easy matter of click-click and we are done.  If only!

View solution in original post

7 REPLIES 7
Element115
Power Participant
Power Participant

EDIT:  re the discussion of the capitalization of the JSON key refering to the script return value, I just realized the system is using the alias name the user provides in the SELECT statement; I thought  the generated JSON would simply contain a generic key named and spelled 'result' and not the actual SQL alias.  Triple face palm moment. 

 

@v-nikhilan-msft 

 

Here is the solution.  But first, let me lay it all out from the first step.

0__create a user defined stored procedure in the lakehouse SQL endpoint like so:

 

 

 

CREATE OR ALTER PROCEDURE [dbo].[usp_is_exist_table]
    @table_name VARCHAR(32) = '',
    @result     INT OUTPUT
AS
BEGIN
    IF OBJECT_ID(@table_name, 'U') IS NOT NULL
        SET @result = 1
    ELSE
        SET @result = 0
END

 

 

 

This procedure checks whether or not a table exists.  Why?  In order to implement incremental refresh within a pipeline, we need to know 1 of 2 things:  Does the table exist, OR if it exists, is it not empty and thus there is a sequential index number to get and from which only we shall start loading new data from the source. We will use a pipeline variable later to store the name of the table for whose existence we want to check. 

 

1__create the following pipeline flow:

 

Screenshot 2024-04-11 231708.jpg

 

Because I have a simple scenario, I can simply hard code 1 or 2 table names in their respective pipeline variable. But if this needs to be done for more tables than this, then of course, you would use a control flow loop activity looping over an array variable containing all the table names in the lakehouse (but given the pain of doing even something as simple as this incremental load, I don't think it's worth trying only to find out that suddenly I'm stuck doing even more beta testing on behalf of Microsoft). 

 

2__configuring the pipeline variables: click on the pipeline design surface to make the pipeline config pane appear:

 

Screenshot 2024-04-11 233339.jpg

Since I only have 2 tables, intead of setting up a loop and so on, I opted to just hard code the table names in 2 distinct variables (since I already spent way too much time on this than should be necessary. It is left as an exercise for the time when I feel intrepid). 

 

The variable latest_index is going to be used inside the If Condition activity. 

 

3__configuring the Script activity:

 

Screenshot 2024-04-11 233928.jpg

For some reason, I can't get a scalar singleton, and so we are stuck with the script output being a result set. (If you figure it out, please let me know.) Note the last Select statement needs an alias or it won't work.

 

Also note the pipeline variable interpolation in the T-SQL script using @{...}, namely:  @{variables('usp_input_1')}:

 

 

 

DECLARE @usp_output INT;
EXEC [dbo].[usp_is_exist_table] @table_name = @{variables('usp_input_1')}, @result = @usp_output OUTPUT;
SELECT @usp_output AS Result;

 

 

 

No script parameters defined in the GUI, and no Advanced settings either.

 

Also important to know is the JSON output by the Script activity on success because you will need to reference some of its keys to retrieve the output value of the script. It looks like this:

 

 

 

{
	"resultSetCount": 1,
	"recordsAffected": 0,
	"resultSets": [
		{
			"rowCount": 1,
			"rows": [
				{
					"Result": 1
				}
			]
		}
	],
	"outputParameters": {},
	"outputLogs": "",
	"outputLogsLocation": "",
	"outputTruncated": false,
	"executionDuration": 1
}

 

 

 

Note how the word 'result' is spelled:  Result, and not result.  The reason I mention this is that the JSON output from the other Script activity inside the If Condition activity returns a JSON similar to this one except that the word 'result' starts with a small cap instead of a capitalized 'R.'  Yeah, go figure. 

 

4__configuring the If Condition activity:

 

Screenshot 2024-04-11 234851.jpg

Click on the expression and the editor opens up showing this code:

 

 

 

@equals(activity('Check MUP_Status exists').output.resultSets[0].rows[0].Result, 1)

 

 

 

Screenshot 2024-04-11 235640.jpg

 

Note how after selecting the first option in the list of Activity outputs, we get 

 

 

 

activity('Check MUP_Status exists').output

 

 

 

and then we use the '.' operator to reference the JSON structure 

 

 

 

.resultSets[0].rows[0].Result

 

 

 

in order to get the output from the Script activity outside the If Condition activity.

 

5__configuring the Script activity titled 'Get latest status ID' inside the If Condition activity: 

 

Screenshot 2024-04-12 000310.jpg

Again, the Select statement needs to alias the column returned.

As mentioned above, the JSON output from this Script activity is slightly different in terms of the capitalization of the result key, which is spelled 'result' and not 'Result' as shown below:

 

 

 

{
	"resultSetCount": 1,
	"recordsAffected": 0,
	"resultSets": [
		{
			"rowCount": 1,
			"rows": [
				{
					"result": 670826
				}
			]
		}
	],
	"outputParameters": {},
	"outputLogs": "",
	"outputLogsLocation": "",
	"outputTruncated": false,
	"executionDuration": 1
}

 

 

 

5__configuring the Set variable activity:

 

Screenshot 2024-04-12 000929.jpg

 

The expression (in green above) looks like this and is why I was saying pay attention to the fact that, even though the JSON structure is exactly the same as the first script output, the key 'result' is spelled differently and this is the spelling you must use as shown below (still banging my head against the wall about this one) :

 

 

 

@activity('Get latest status ID').output.resultSets[0].rows[0].result

 

 

 

Whatever the latest ID that has been ingested into the lakehouse for a particular table happens to be, it is stored in the latest_index variable which is used in the next activity, which is a Copy data activity using a T-SQL Select statement with a WHERE clause to filter for the most recently added data to the source DB.  I'll leave that last step as a small exercise for the newbie to figure out.  

 

In conclusion:  What a mess!!!  But there you have it.  I hope the result of struggling for the better part of a day+ will prove helpful to the naive and hapless soul who still belabors under the illusion that using Fabric is an easy matter of click-click and we are done.  If only!

v-nikhilan-msft
Community Support
Community Support

Hi @Element115 
Thanks for using Fabric Community.
As for now we do not have any official information regarding this feature. Any details regarding the new features will be mentioned here:  Microsoft Fabric release plan documentation - Microsoft Fabric | Microsoft Learn

Hope this helps. Please let me know if you have any further questions.

Hi @Element115 
We haven’t heard from you on the last response and was just checking back to see if your got resolved.
Otherwise, will respond back with the more details and we will try to help.
Thanks

@v-nikhilan-msft Well, I implemented the incremental append by hand, however not quite there yet as it is not clear how to access the output of a stored procedure inside a pipeline execution flow.  Not even Copilot is useful in this regard as it keeps giving all sorts of answers that are non-answers.  

In the meantime, if you know which expression one should use to set the value of a variable to the output from the Stored procedure activity, please let me know because the following don't seem to work, unless they return some JSON and you gotta know which key in that JSON to reference, which I don't since I couldn't find any documentation on this:

Screenshot 2024-04-11 141813.jpg

 

The stored procedure looks like this:

CREATE OR ALTER PROCEDURE [dbo].[usp_is_exist_table]

    @table_name VARCHAR(32) = '',
    @result     INT OUTPUT
AS
BEGIN
    IF OBJECT_ID(@table_name, 'U') IS NOT NULL
        SET @result = 1
    ELSE
        SET @result = 0
END
GO

Furthermore, even though I follow the instructions from the 2nd link you provided, the Script activity fails because of cast error from NVARCHAR to INT.  Yet, this is how I specified the Script activity setting:

Screenshot 2024-04-11 182936.jpg

 

Of course, I also tried setting the result parameter to type Int64, didn't work. 

 

So how is one supposed to do this?  What are the exact and accurate steps to follow to capture the singleton ouptut of a stored procedure as a scalar value of T-SQL type INT?  

 

Also, code used to call the stored procedure is as follows (not standard btw, but the system doesn't seem to mind)--please advise if a different syntax should be used:

EXEC [dbo].[usp_is_exist_table] table_name, result

@v-nikhilan-msft 

First. re your first link:  if you look at my sp code, you will see it does not return a result set, but a scalar value, ie an integer equal to either 0 or 1.  So, if my sp were to return a result set, I understand from the first link that I should use a Lookup activity instead of a 'Stored procedure' activity.

 

However, the second link states: 'When the stored procedure has Output parameters, instead of using stored procedure activity, use lookup acitivty and Script activity. Stored procedure activity does not support calling SPs with Output parameter yet.'

But that's not all. Further down, the writer shows the Azure Script activity UI and it does not match the UI we have in Fabric!!  In Fabric, there is no 'Stored Procedure' option.

 

Screenshot 2024-04-11 174628.jpg

 

So how exactly are we supposed to do this, ie retrieve the stored procedure OUTPUT parameter value?

 

Now I am pissed off.  What on earth is the point of having a 'Stored procedure' activity if OUTPUT params can't even be handled?  I mean, seriously, this is so mindboggling that it is beyond unfathomable.  

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayFabricCarousel

Fabric Monthly Update - May 2024

Check out the May 2024 Fabric update to learn about new features.