Hello,
I have built a custom connector to connect to the Vimeo API via OAuth2. Everything is working well, but it appears I need to come up with a solution to deal with pagination, as I am only getting back 25 items on each page.
I see the documentation on how to use Table.GenerateByPage and getNextPage here:
https://docs.microsoft.com/en-us/power-query/samples/trippin/5-paging/readme#tablegeneratebypage
As well as the implementation within the example GitHub custom connector
https://github.com/microsoft/DataConnectors/blob/master/samples/Github/github.pq
A sample of functions from that example:
Github.Contents = (url as text) =>
    let
        content = Web.Contents(url),
        link = GetNextLink(content),
        json = Json.Document(content),
        table = Table.FromList(json, Splitter.SplitByNothing())
    in
        table meta [Next=link];

Github.PagedTable = (url as text) => Table.GenerateByPage((previous) =>
    let
        // If we have a previous page, get its Next link from metadata on the page.
        next = if (previous <> null) then Value.Metadata(previous)[Next] else null,
        // If we have a next link, use it, otherwise use the original URL that was passed in.
        urlToUse = if (next <> null) then next else url,
        // If we have a previous page, but don't have a next link, then we're done paging.
        // Otherwise retrieve the next page.
        current = if (previous <> null and next = null) then null else Github.Contents(urlToUse),
        // If we got data back from the current page, get the link for the next page
        link = if (current <> null) then Value.Metadata(current)[Next] else null
    in
        current meta [Next=link]);

GetNextLink = (response, optional request) =>
    let
        // extract the "Link" header if it exists
        link = Value.Metadata(response)[Headers][#"Link"]?,
        links = Text.Split(link, ","),
        splitLinks = List.Transform(links, each Text.Split(Text.Trim(_), ";")),
        next = List.Select(splitLinks, each Text.Trim(_{1}) = "rel=""next"""),
        first = List.First(next),
        removedBrackets = Text.Range(first{0}, 1, Text.Length(first{0}) - 2)
    in
        try removedBrackets otherwise null;
However, my issue is that the pagination metadata returned by the Vimeo API comes through in the JSON response body rather than in the headers, as the documentation and examples assume. Is there an easy way or helper function within Power Query/M that would let me look into the body of the JSON response, grab the pagination JSON objects (as below), and build out my code from there?
Here is what comes back regarding pagination from Vimeo's API within the JSON body:
"total": 1012,
"page": 1,
"per_page": 25,
"paging": {
    "next": "/users/{our-user-id}/videos?page=2",
    "previous": null,
    "first": "/users/{our-user-id}/videos?page=1",
    "last": "/users/{our-user-id}/videos?page=41"
},
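In other words, once the body is parsed I'd like to do something along these lines (a rough sketch; "paging" and "next" are the field names from the response above):

body = Json.Document(Web.Contents(url)),
// Record.FieldOrDefault returns null instead of raising an error
// if the "paging" record or its "next" field is absent
paging = Record.FieldOrDefault(body, "paging", null),
nextPath = if paging = null then null else Record.FieldOrDefault(paging, "next", null)

and then feed nextPath back into the paging loop.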
Many thanks for any help - it is very much appreciated!
Best,
-Josh
This is one of the best threads I've ever seen on this forum. I wish this was around 2 years ago when I had to solve this problem!
Hello lbendlin,
Thank you! I managed to grab the next page and place it in the metadata, as shown in the GetPage and GetNextLink functions below. I confirmed visually in Power BI that the metadata is there. However, when I use the connector in Power BI, it does loop, but it is not grabbing the "next" metadata. It's stuck repeating on page 1.
Do you have any idea what I might be doing wrong? Any help is greatly appreciated!!
This is my .pq file for the custom connector.
// This file contains your Data Connector logic
section Vimeo_Connector;
// Vimeo OAuth2 values
client_id = Text.FromBinary(Extension.Contents("client_id.txt"));
client_secret = Text.FromBinary(Extension.Contents("client_secret.txt"));
redirect_uri = "https://oauth.powerbi.com/views/oauthredirect.html";
token_uri = "https://api.vimeo.com/oauth/access_token";
authorize_uri = "https://api.vimeo.com/oauth/authorize";
logout_uri = "https://login.microsoftonline.com/logout.srf";
// Login modal window dimensions
windowWidth = 720;
windowHeight = 1024;
[DataSource.Kind="Vimeo_Connector", Publish="Vimeo_Connector.UI"]
shared Vimeo_Connector.Contents = Value.ReplaceType(_Vimeo.Contents, type function (url as Uri.Type) as any);
/*
[DataSource.Kind="Vimeo_Connector"]
shared Vimeo_Connector.PagedTable = Value.ReplaceType(_Vimeo.PagedTable, type function (url as Uri.Type) as nullable table);
*/
// Data Source Kind description
Vimeo_Connector = [
    TestConnection = (dataSourcePath) => { "Vimeo_Connector.Contents", dataSourcePath },
    Authentication = [
        OAuth = [
            StartLogin = StartLogin,
            FinishLogin = FinishLogin,
            Refresh = Refresh,
            Logout = Logout
        ]
    ],
    Label = Extension.LoadString("DataSourceLabel")
];
_Vimeo.Contents = (url as text) as table =>
    let
        a = GetAllPages(url)
    in
        a;

GetPage = (url as text) as table =>
    let
        content = Json.Document(Web.Contents(url)),
        link = GetNextLink(content),
        table = Record.ToTable(content)
    in
        table meta [next=link];

GetNextLink = (response, optional request) =>
    let
        // extract the "next" link from the "paging" record in the response body
        link = (response)[paging][next]
    in
        try link otherwise null;
// Data Source UI publishing description
Vimeo_Connector.UI = [
    Beta = true,
    Category = "Other",
    ButtonText = { Extension.LoadString("ButtonTitle"), Extension.LoadString("ButtonHelp") },
    LearnMoreUrl = "https://powerbi.microsoft.com/",
    SourceImage = Vimeo_Connector.Icons,
    SourceTypeImage = Vimeo_Connector.Icons
];
// Helper functions for OAuth2: StartLogin, FinishLogin, Refresh, Logout
StartLogin = (resourceUrl, state, display) =>
    let
        authorizeUrl = authorize_uri & "?" & Uri.BuildQueryString([
            response_type = "code",
            client_id = client_id,
            redirect_uri = redirect_uri,
            state = state
            // scope = GetScopeString(scopes, scope_prefix)
        ])
    in
        [
            LoginUri = authorizeUrl,
            CallbackUri = redirect_uri,
            WindowHeight = windowHeight,
            WindowWidth = windowWidth,
            Context = null
        ];
FinishLogin = (context, callbackUri, state) =>
    let
        // parse the full callbackUri and extract the query string
        parts = Uri.Parts(callbackUri)[Query],
        // if the query string contains an "error" field, raise an error;
        // otherwise call TokenMethod to exchange our code for an access_token
        result = if (Record.HasFields(parts, {"error", "error_description"})) then
                     error Error.Record(parts[error], parts[error_description], parts)
                 else
                     TokenMethod("authorization_code", "code", parts[code])
    in
        result;
Refresh = (resourceUrl, refresh_token) => TokenMethod("refresh_token", "refresh_token", refresh_token);
Logout = (token) => logout_uri;
// see step 4 access token: https://developer.vimeo.com/api/authentication
TokenMethod = (grantType, tokenField, code) =>
    let
        queryString = [
            grant_type = grantType,
            redirect_uri = redirect_uri,
            client_id = client_id,
            client_secret = client_secret
        ],
        queryWithCode = Record.AddField(queryString, tokenField, code),
        authKey = "Basic " & Binary.ToText(Text.ToBinary(client_id & ":" & client_secret), BinaryEncoding.Base64),
        tokenResponse = Web.Contents(token_uri, [
            Content = Text.ToBinary(Uri.BuildQueryString(queryWithCode)),
            Headers = [
                #"Authorization" = authKey,
                #"Content-Type" = "application/x-www-form-urlencoded",
                #"Accept" = "application/vnd.vimeo.*+json;version=3.4"
            ],
            ManualStatusHandling = {400}
        ]),
        body = Json.Document(tokenResponse),
        result = if (Record.HasFields(body, {"error", "error_description"})) then
                     error Error.Record(body[error], body[error_description], body)
                 else
                     body
    in
        result;
Value.IfNull = (a, b) => if a <> null then a else b;
GetScopeString = (scopes as list, optional scopePrefix as text) as text =>
    let
        prefix = Value.IfNull(scopePrefix, ""),
        addPrefix = List.Transform(scopes, each prefix & _),
        asText = Text.Combine(addPrefix, " ")
    in
        asText;

Vimeo_Connector.Icons = [
    Icon16 = { Extension.Contents("Vimeo_Connector16.png"), Extension.Contents("Vimeo_Connector24.png"), Extension.Contents("Vimeo_Connector32.png") },
    Icon32 = { Extension.Contents("Vimeo_Connector32.png"), Extension.Contents("Vimeo_Connector40.png"), Extension.Contents("Vimeo_Connector48.png"), Extension.Contents("Vimeo_Connector64.png") }
];
GetAllPages = (url as text) as table =>
    Table.GenerateByPage((previous) =>
        let
            // if previous is null, then this is our first page of data
            nextPageToken = if (previous = null) then null else Value.Metadata(previous)[next]?,
            // if NextLink was set to null by the previous call, we know we have no more data
            page = if (nextPageToken <> null) then GetPage(url) else if (previous = null) then GetPage(url) else null
        in
            page
    );

// The getNextPage function takes a single argument and is expected to return a nullable table
Table.GenerateByPage = (getNextPage as function) as table =>
    let
        listOfPages = List.Generate(
            () => getNextPage(null),            // get the first page of data
            (lastPage) => lastPage <> null,     // stop when the function returns null
            (lastPage) => getNextPage(lastPage) // pass the previous page to the next function call
        ),
        // concatenate the pages together
        tableOfPages = Table.FromList(listOfPages, Splitter.SplitByNothing(), {"Column1"}),
        firstRow = tableOfPages{0}?
    in
        // if we didn't get back any pages of data, return an empty table
        // otherwise set the table type based on the columns of the first page
        if (firstRow = null) then
            Table.FromRows({})
        else
            Value.ReplaceType(
                Table.ExpandTableColumn(tableOfPages, "Column1", Table.ColumnNames(firstRow[Column1])),
                Value.Type(firstRow[Column1])
            );
This piece of code looks iffy:
link = (response)[paging][next]
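A more defensive sketch, using Record.FieldOrDefault so that a response without a paging record (or without a next field) yields null instead of an error:

GetNextLink = (response, optional request) =>
    let
        // tolerate responses that have no "paging" record at all
        paging = Record.FieldOrDefault(response, "paging", null),
        next = if paging = null then null else Record.FieldOrDefault(paging, "next", null)
    in
        next;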
Thank you!
The issue was that I was only passing the url through the GetPage function in GetAllPages. I also needed to pass another argument to GetPage containing the next-page metadata/URL.
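For anyone finding this later, the corrected shape is roughly as follows (a sketch of the fix; Vimeo returns relative paths in paging.next, so base_url here is a hypothetical constant such as "https://api.vimeo.com"):

base_url = "https://api.vimeo.com";

GetPage = (url as text, optional nextPath as nullable text) as table =>
    let
        // follow the next-page path if one was handed in, otherwise start at the original url
        target = if nextPath <> null then base_url & nextPath else url,
        content = Json.Document(Web.Contents(target)),
        link = GetNextLink(content),
        table = Record.ToTable(content)
    in
        table meta [next = link];

GetAllPages = (url as text) as table =>
    Table.GenerateByPage((previous) =>
        let
            nextPageToken = if previous = null then null else Value.Metadata(previous)[next]?,
            // first call fetches page 1; afterwards, stop once no next link is present
            page = if previous = null then GetPage(url)
                   else if nextPageToken <> null then GetPage(url, nextPageToken)
                   else null
        in
            page
    );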
There are two options to approach this:
- iterate through all pages, harvesting/accumulating the data, and repeat until no next page is found
- iterate through all pages, ignoring the data and harvesting only the URLs; then add a custom column that fetches the content for each URL and combines them in one step
The second approach seems wasteful, since you seemingly fetch each URL twice. In reality this is most likely covered by the browser engine cache, so there is no performance penalty. The benefit is that you don't have to lug the data around during the iteration, so performance is ultimately even better.
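A sketch of the second approach, assuming body-style paging like the one shown earlier in this thread (GetNextLink is whatever your connector uses to pull the next link out of a response, assumed here to return a full URL or null):

GetAllPageUrls = (firstUrl as text) as list =>
    // walk the pages, keeping only the URLs and discarding the data
    List.Generate(
        () => firstUrl,
        (url) => url <> null,
        (url) => GetNextLink(Json.Document(Web.Contents(url)))
    );

CombinedPages = (firstUrl as text) as table =>
    let
        urls = Table.FromList(GetAllPageUrls(firstUrl), Splitter.SplitByNothing(), {"PageUrl"}),
        // custom column that fetches the content for each URL in one step
        withData = Table.AddColumn(urls, "Data", each Json.Document(Web.Contents([PageUrl])))
    in
        withData;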
Thank you lbendlin,
Do you think that using the second way might help me get around API limits? The API I am working with has a limit of 250 calls per minute, and I think the way I did it (the first option in your list) is making me hit that limit.
Thanks!
And do you happen to have any examples or documentation for the second approach? It sounds very intriguing, and I am noticing performance issues with the first approach.
Read about Function.InvokeAfter
Search for API pagination, like here Solved: PowerQuery Rest API Pagination - Microsoft Power BI Community
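For example, a sketch that spaces out requests (the 250 ms delay is arbitrary, chosen to stay under roughly 250 calls per minute):

GetPageThrottled = (url as text) =>
    // Function.InvokeAfter waits for the given duration before invoking the function
    Function.InvokeAfter(
        () => Json.Document(Web.Contents(url)),
        #duration(0, 0, 0, 0.25)
    );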
Thank you, lbendlin.
I've tried pagination a few different ways now:
1) MS documented Table.GenerateByPage function
2) Custom columns to generate API calls
3) Using List.Generate to create a list of API calls and stitching them together
Regarding query speed, I'm having the most luck with #3. But it still takes a long time to fetch the data, and I am often left hanging on "Waiting....api.vimeo.com".
It takes about 5-6 minutes to fetch 1,000 videos (coming in as records containing a plethora of data). Is that a normal length of time to expect? I'm querying at 100 per_page. Postman takes about 10 seconds to get a 100-item response.
I have turned off my Type Detection and Background Data loading in an attempt to speed things up, with no luck.
I think I should open up another thread, since this has gone off topic. But I was wondering if you had any other suggestions.
Thank you for your expertise.
Vimeo may be throttling your requests. Have you gotten a 429 yet?
Table.GenerateByPage is syntactic sugar for List.Generate.
No 429, but I think the payload from this API was particularly large. I was able to filter the fields I want in the query itself, and speed has improved greatly.
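For anyone curious, the filtering I mean is Vimeo's fields query parameter, which trims the response payload considerably (a sketch; the field list and the {our-user-id} placeholder are illustrative):

videosUrl = "https://api.vimeo.com/users/{our-user-id}/videos?" & Uri.BuildQueryString([
    per_page = "100",
    // request only the fields you actually need instead of the full video records
    fields = "uri,name,duration,created_time"
]),
videos = Json.Document(Web.Contents(videosUrl))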
Thank you for all of your help with this. I ended up implementing the custom column strategy you mentioned and it works very well. You've been very helpful!
That's actually simpler than if pagination is done in the header. Parse your received body as JSON (which you likely are already doing) and grab the pagination fields (either an offset, or a next URL, or whatever, depending on your API). Then stitch all results together as needed.