
younghoon_kim
Regular Visitor

Using Web Activity's output as a source of Copy Data Activity.

The pipeline I'm trying to build pulls data by calling an external API and saves the response into a Lakehouse table.

This has to be done repeatedly because the API paginates its responses: each response includes the next URL to call to collect the rest of the data. A response looks like this:

{
	"subscriptions": [
		{
			(SOME ELEMENTS)
		}
	],
	"next": "(NEXT URL TO CALL)",
	"size": 50
}

 

So I set up an Until activity that loops until "next" is empty. Inside it, a Copy activity calls the API using a variable "nextUrl"; then a Web activity calls the same "nextUrl" to read "next"; finally a Set Variable activity sets "nextUrl" to that "next" value.

 

[Screenshot: pipeline setup]
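A minimal Python model of that loop shows where it breaks, assuming (as the API turned out to behave) that any URL returned in "next" can be requested only once. FakeApi, its page URLs, and SingleUseUrlError are hypothetical stand-ins, not part of Fabric or the real API; the loop body mirrors the Copy → Web → Set Variable order.

```python
class SingleUseUrlError(Exception):
    """Raised when a single-use "next" URL is requested a second time."""


class FakeApi:
    """Hypothetical stand-in for the real API: URLs returned in "next" work only once."""

    def __init__(self, pages):
        self.pages = pages  # url -> response body
        self.single_use = {body["next"] for body in pages.values() if body["next"]}
        self.used = set()

    def get(self, url):
        if url in self.single_use and url in self.used:
            raise SingleUseUrlError(url)
        self.used.add(url)
        return self.pages[url]


api = FakeApi({
    "/page1": {"subscriptions": [1, 2], "next": "/page2"},
    "/page2": {"subscriptions": [3], "next": ""},
})

next_url = "/page1"
copied, error = [], None
try:
    while next_url:  # Until activity: loop until "next" is empty
        copied.extend(api.get(next_url)["subscriptions"])  # Copy activity calls nextUrl
        next_url = api.get(next_url)["next"]  # Web activity calls the same nextUrl again
except SingleUseUrlError as exc:
    error = exc

print(copied, error)  # page 2 is copied, then the Web call to it is rejected
```

Whatever the tooling, the fix has to request each "next" URL exactly once.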

This seemed like a good solution, except that the API server doesn't allow the same "next" URL to be called twice; each URL returned in "next" appears to be single-use. So my questions are:

1. Can I use the Web activity's response as the source of a Copy Data activity (so the API is only called once, in the Web activity)?

2. Is there any other way to fetch data from these paginated URLs?

1 ACCEPTED SOLUTION
Aala_Ali
Kudo Kingpin

Hi @younghoon_kim  👋

Thanks for the extra details in your follow-up thread. I can see the API returns:

"next": "/details?q=..."


…and when you use AbsoluteUrl = $.next, the next call is sent to:

https://serviceportal.telenorconnexion.com/details?... (base path dropped)


instead of:

https://serviceportal.telenorconnexion.com/iot/api/subscriptions/details?... (expected)


That explains the failure you saw. When the API returns a root-relative path (one starting with “/”), it is resolved against the domain root, so the base path segment (/iot/api/subscriptions) is dropped. This matches standard URL resolution rules rather than being a Copy activity quirk.
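The same rule is easy to reproduce locally with Python's urllib.parse.urljoin, which follows standard relative-reference resolution. This only illustrates the resolution rule, not how Copy activity is implemented internally; the query value q=1 is made up.

```python
from urllib.parse import urljoin

base = "https://serviceportal.telenorconnexion.com/iot/api/subscriptions/"

# Root-relative "next" (leading "/"): replaces the whole path of the base URL
dropped = urljoin(base, "/details?q=1")

# Path-relative "next" (no leading "/"): resolved underneath the base path
kept = urljoin(base, "details?q=1")

print(dropped)  # https://serviceportal.telenorconnexion.com/details?q=1
print(kept)     # https://serviceportal.telenorconnexion.com/iot/api/subscriptions/details?q=1
```

Note the trailing slash on base: without it, urljoin would also replace the last segment (subscriptions).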


Below are three ways to get unblocked (A, B1, B2); start with Option A if you can make the API return a full link.

A) Easiest (if the API can return a full link)

Ask the API (or toggle an option, if it has one) to return an absolute next link:

"next": "https://serviceportal.telenorconnexion.com/iot/api/subscriptions/details?q=..."


Then set Source ▸ Advanced ▸ Pagination rules:

Key: AbsoluteUrl

Value: $.next

That’s the officially supported pattern: AbsoluteUrl can point to the next absolute or relative URL in the response body; JSONPath is used to read it. Also add an EndCondition or MaxRequestNumber to avoid endless loops if the API echoes back the last URL.
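Conceptually, an AbsoluteUrl rule with $.next behaves like the loop below: request a page, read "next" from the body, and stop when it is missing or empty. This is a simplified Python model, not Copy activity's implementation; follow_next_links, the fetch callable, and the api.example.com pages are hypothetical.

```python
def follow_next_links(fetch, first_url):
    """Simplified model of the AbsoluteUrl = $.next rule (hypothetical helper).

    fetch(url) returns the parsed JSON body of one page; paging stops
    when "next" is missing, null, or empty.
    """
    url, pages = first_url, []
    while url:
        body = fetch(url)
        pages.append(body)
        url = body.get("next")  # plays the role of JSONPath $.next
    return pages


canned = {
    "https://api.example.com/p1": {"subscriptions": ["a"], "next": "https://api.example.com/p2"},
    "https://api.example.com/p2": {"subscriptions": ["b"], "next": ""},
}
pages = follow_next_links(canned.__getitem__, "https://api.example.com/p1")
rows = [row for page in pages for row in page["subscriptions"]]
print(rows)
```

If the API echoed its last URL instead of returning an empty "next", this loop would never end, which is exactly what EndCondition or MaxRequestNumber guards against.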


B) When the API only returns a root-relative /details?...

Copy activity can’t “prepend” /iot/api/subscriptions/ to a root-relative value in the pagination rules (the value must be a header reference or a JSONPath result, with no string concatenation). That’s why the dynamic @concat(...) you tried fails with “not an ancestor”: pagination rules can’t reference other activities.


Two reliable workarounds:

B1) Dataflow Gen2 (Power Query) paging

Power Query lets you stitch the base path with the relative ‘next’ easily.

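In Dataflow Gen2, create a Blank query, open the Advanced Editor, and paste a template like:

let
    Base = "https://serviceportal.telenorconnexion.com/iot/api/subscriptions",
    First = "/details?{your params}",
    // Fetch one page; rel is a root-relative path such as "/details?q=..."
    GetPage = (rel as text) as record =>
        let
            json = Json.Document(Web.Contents(Base & rel)),
            // "subscriptions" is the array key shown in the question; adjust if /details differs
            rows = json[subscriptions],
            next = try json[next] otherwise null
        in
            [Rows = rows, Next = next],
    // Request pages until "next" is missing or empty (List.Generate is built in)
    Pages = List.Generate(
        () => GetPage(First),
        (page) => page <> null,
        (page) => if page[Next] = null or page[Next] = "" then null else GetPage(page[Next]),
        (page) => page[Rows]
    ),
    AllRows = List.Combine(Pages),
    Result = Table.FromRecords(AllRows, null, MissingField.UseNull)
in
    Result


Then set the query's output destination to your Lakehouse table.
Docs: "Handling paging" in the Power Query connector SDK documentation. Note that Table.GenerateByPage, used in that article, is a helper function defined there (you paste its definition into your query), not a built-in M function.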

B2) Notebook (Python) — full control

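If you prefer code, resolve the relative next with urljoin, stripping its leading “/” first (otherwise urljoin drops the base path, just like Copy does):

import requests
import pandas as pd
from urllib.parse import urljoin

token = "<your API token>"  # placeholder: use whatever auth your API expects
# Keep the trailing slash so urljoin resolves relative paths under /subscriptions/
base = "https://serviceportal.telenorconnexion.com/iot/api/subscriptions/"
url = urljoin(base, "details?{your params}")
rows = []

while url:
    r = requests.get(url, headers={"Authorization": f"Bearer {token}"}, timeout=60)
    r.raise_for_status()
    js = r.json()
    # "subscriptions" is the array key shown in the question; adjust if your payload differs
    rows.extend(js.get("subscriptions", []))
    next_rel = js.get("next")
    # "next" is root-relative ("/details?..."): strip the leading "/" so the base path is kept
    url = urljoin(base, next_rel.lstrip("/")) if next_rel else None

df = pd.DataFrame(rows)
# write to Lakehouse table/files as you prefer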


This fixes the root-relative next issue by always combining it with the correct base path, and each "next" URL is requested exactly once. (For general REST and pagination guidance, see the Copy activity REST connector documentation.)
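The resolution step is the part worth testing offline, since a root-relative next must lose its leading "/" before urljoin or the base path is dropped. Below is a sketch that runs the same loop against canned pages instead of the real API; next_page_url and the page=1/page=2 query strings are hypothetical.

```python
from urllib.parse import urljoin

BASE = "https://serviceportal.telenorconnexion.com/iot/api/subscriptions/"


def next_page_url(base, next_rel):
    """Resolve a root-relative "next" against base without dropping the base path."""
    if not next_rel:
        return None
    return urljoin(base, next_rel.lstrip("/"))


# Canned responses keyed by full URL (hypothetical query strings)
pages = {
    BASE + "details?page=1": {"subscriptions": [{"id": 1}], "next": "/details?page=2"},
    BASE + "details?page=2": {"subscriptions": [{"id": 2}], "next": None},
}

requested, rows = [], []
url = next_page_url(BASE, "/details?page=1")
while url:
    requested.append(url)
    body = pages[url]  # stands in for requests.get(url).json()
    rows.extend(body.get("subscriptions", []))
    url = next_page_url(BASE, body.get("next"))

print(requested)
```

An absolute "next" (starting with https://) passes through unchanged, because urljoin returns an absolute reference as-is.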


C) Small safety knobs (whichever route you take)

EndCondition / MaxRequestNumber in pagination to prevent endless loops.

Request interval (ms) (e.g., 300–500) if the API rate-limits.
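Outside Copy activity (for example in the notebook route), the same safeguards are easy to reproduce. This is a hypothetical helper, not a Fabric API: it stops after max_requests pages, stops if the API repeats a URL it has already returned (a simple end condition), and sleeps interval_ms between requests.

```python
import time


def paginate_safely(fetch, first_url, max_requests=1000, interval_ms=300):
    """Follow "next" links with a request cap, a repeat-URL stop, and a delay.

    fetch(url) -> dict with an optional "next" key (hypothetical callable).
    """
    seen, pages, url = set(), [], first_url
    while url and url not in seen and len(pages) < max_requests:
        if pages:  # wait between requests, not before the first one
            time.sleep(interval_ms / 1000)
        seen.add(url)
        body = fetch(url)
        pages.append(body)
        url = body.get("next")
    return pages


# An API that keeps echoing its last URL would loop forever without the guards
echo = {"u1": {"next": "u2"}, "u2": {"next": "u2"}}
pages = paginate_safely(echo.__getitem__, "u1", interval_ms=0)
print(len(pages))
```

Tracking seen URLs also guarantees no URL is requested twice, which matters for an API with single-use links.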



If this solved it, please mark as Solution and give Kudos so others can find it faster 🙏

 

v-sdhruv
Community Support

Hi @younghoon_kim ,

Web Activity in ADF is a control flow component and cannot directly feed data into a Copy Data activity. It’s designed to call HTTP endpoints and return metadata or control values (e.g. URLs), not to stream or pass data into a sink. You would need to first write the Web activity output to storage such as a blob (e.g. via an Azure Function), and then use that blob as the source for the Copy activity.
How to use output of Azure Data Factory Web Activity in next copy activity? - Stack Overflow

Or use the method provided by Aala Ali.
Copy and transform data from and to a REST endpoint - Azure Data Factory & Azure Synapse | Microsoft...

Let me know if this helps!


Hi @Aala_Ali, thanks for the detailed answer. I knew a Notebook was always an option, but I was curious whether what I wanted was feasible in a Copy activity. It's also great to know that Dataflow Gen2 can do this. I'm not used to it, but I'll refer to your answer when I decide to use it. FYI, I'm now focusing my ELT jobs on Data Pipelines, so I configured it using a Copy activity, Variables, and an Until activity, and it works.

Hi @younghoon_kim 
Awesome, I’m really happy to hear you got it working with Copy activity + Variables + Until.

Aala_Ali
Kudo Kingpin

Hi @younghoon_kim  👋

You don't need a Web activity here. Let the Copy activity (REST source) call the API and follow the next link automatically. Here's how:

1) Create the REST connection

In Data Factory → top bar Settings → Manage connections & gateways → New.

Choose Web / REST, give it a name, set the Base URL, and pick your Auth (API key header, OAuth2 Client Credentials, Managed Identity, etc.). Test & Save.


2) Add a Copy activity

Open (or create) your Data pipeline.

Add pipeline activity → Copy activity (or use Copy assistant).


3) Configure the Source (REST)

Data store type: External

Connection type: REST (pick the connection you just made)

Relative URL: first page endpoint (e.g. /v1/items?limit=100)

Advanced → Request method: GET (or POST if your API needs it)

Advanced → Pagination rules:

If your response body has a next link at paging.next:

Key: AbsoluteUrl

Value: $.paging.next

If it's an OData API with @odata.nextLink:

Key: AbsoluteUrl

Value: $['@odata.nextLink']

(Optional) Request interval (ms): add a small delay (e.g. 300–500) if the API rate-limits.
These are native pagination options in Copy (REST).


Note: Copy stops when the JSONPath returns null/empty. If your API sometimes repeats the last URL, add an EndCondition or a MaxRequestNumber in Pagination rules to prevent endless loops.


4) Configure the Destination (Lakehouse)

Connection: your Lakehouse.

Root folder: Tables → pick an existing table or type a new name.

Choose Table action (Append/Overwrite) under Advanced as needed.


5) (Optional) Mapping

If you want tabular columns instead of raw JSON, open Mapping, select Import schemas, and map source → destination columns. If you want to land the JSON as-is, skip mapping.


6) Run & verify

Save → Run the pipeline → Check Monitor for page count, written rows, and any pagination or throttle messages.


Why not “Web output → Copy source”?

Web activity is control flow only; it doesn't feed data directly into Copy as a source. Copy's REST connector is built to issue the requests and handle pagination itself, so each page is requested only once.


Handy JSONPath examples for the pagination box

Body next link: $.paging.next

OData next link: $['@odata.nextLink'] (the bracket syntax handles the @)
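To sanity-check which value those two expressions pick out of a response, here is a tiny Python resolver that handles only the dot form ($.a.b) and the quoted-bracket form ($['key']). resolve_simple_jsonpath is a hypothetical helper covering just these two cases, not a full JSONPath implementation, and the sample bodies are made up.

```python
import re


def resolve_simple_jsonpath(body, path):
    """Resolve "$.a.b" or "$['key']"-style paths (a tiny subset of JSONPath)."""
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    # Each match is either a .name segment or a ['quoted key'] segment
    tokens = re.findall(r"\.([A-Za-z_]\w*)|\['([^']*)'\]", path[1:])
    value = body
    for dotted, bracketed in tokens:
        key = dotted or bracketed
        if not isinstance(value, dict) or key not in value:
            return None  # missing key: no next link
        value = value[key]
    return value


body_a = {"paging": {"next": "https://api.example.com/items?page=2"}}
body_b = {"@odata.nextLink": "https://api.example.com/items?$skip=100", "value": []}
print(resolve_simple_jsonpath(body_a, "$.paging.next"))
print(resolve_simple_jsonpath(body_b, "$['@odata.nextLink']"))
```

The bracket form is needed for the OData key because the dot form cannot express a name that starts with @ or contains a dot.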



Thanks for your reply. My first attempt was the approach you described, but it failed, so I posted another question: https://community.fabric.microsoft.com/t5/Data-Pipeline/REST-API-connection-pagination-inside-Copy-A...

 

I was wondering if there's any other way.
