Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get certified in Microsoft Fabric—for free! For a limited time, get a free DP-600 exam voucher to use by the end of 2024. Register now

Reply
powerbitotheppl
Advocate I
Advocate I

API Pagination in JSON Body - How to Access with Power Query in Custom Connector?

Hello,

 

I have built a custom connector to connect to the Vimeo API via OAuth2. Everything is working well, but it appears I need to come up with a solution to deal with pagination, as I am only getting back 25 items on each page.

 

I see the documentation on how to use Table.GenerateByPage  and getNextPage here:

https://docs.microsoft.com/en-us/power-query/samples/trippin/5-paging/readme#tablegeneratebypage

 

As well as the implementation within the example GitHub custom connector

https://github.com/microsoft/DataConnectors/blob/master/samples/Github/github.pq

A sample of functions from that example:

 

 

Github.Contents = (url as text) =>
    let
        content = Web.Contents(url),
        link = GetNextLink(content),
        json = Json.Document(content),
        table = Table.FromList(json, Splitter.SplitByNothing())
    in
        table meta [Next=link];

Github.PagedTable = (url as text) => Table.GenerateByPage((previous) =>
    let
        // If we have a previous page, get its Next link from metadata on the page.
        next = if (previous <> null) then Value.Metadata(previous)[Next] else null,
        // If we have a next link, use it, otherwise use the original URL that was passed in.
        urlToUse = if (next <> null) then next else url,
        // If we have a previous page, but don't have a next link, then we're done paging.
        // Otherwise retrieve the next page.
        current = if (previous <> null and next = null) then null else Github.Contents(urlToUse),
        // If we got data back from the current page, get the link for the next page
        link = if (current <> null) then Value.Metadata(current)[Next] else null
    in
        current meta [Next=link]);


GetNextLink = (response, optional request) =>
    let
        // extract the "Link" header if it exists
        link = Value.Metadata(response)[Headers][#"Link"]?,
        links = Text.Split(link, ","),
        splitLinks = List.Transform(links, each Text.Split(Text.Trim(_), ";")),
        next = List.Select(splitLinks, each Text.Trim(_{1}) = "rel=""next"""),
        first = List.First(next),
        removedBrackets = Text.Range(first{0}, 1, Text.Length(first{0}) - 2)
    in
        try removedBrackets otherwise null;

 

 

 

However, my issue is that the metadata on pagination that returns from the Vimeo API is coming through the JSON body response instead of within the headers, as is assumed in the documentation and examples. Is there an easy way or helper function within Power Query/M that would allow me to look into the body of the JSON response, grab the pagination JSON objects (as below), and built out my code from there?

 

Here is what comes back regarding pagination from Vimeo's API within the JSON body:

 

 

 

    "total": 1012,
    "page": 1,
    "per_page": 25,
    "paging": {
        "next": "/users/{our-user-id}/videos?page=2",
        "previous": null,
        "first": "/users/{our-user-id}/videos?page=1",
        "last": "/users/{our-user-id}/videos?page=41"
    },

 

 

 

Many thanks for any help - it is very much appreciated!

Best,

-Josh

12 REPLIES 12
lance_6
Helper II
Helper II

This is one of the best threads I've ever seen on this forum. I wish this was around 2 years ago when I had to solve this problem! 

powerbitotheppl
Advocate I
Advocate I

Hello lbendlin,

 

Thank you! I managed to grab the next page and place it in the metadata as evidenced by the GetPage and GetNextLink functions below. I confirmed the metadata is there in PowerBI visually. However, when I use the connector in PowerBI, it does loop, but it is not grabbing the "next" metadata. It's stuck repeating on page 1. 

Do you have any idea what I might be doing wrong?  Any help is greatly appreciated!!

 

 

This is my .pq file for the custom connector.

 

// This file contains your Data Connector logic
section Vimeo_Connector;


// Vimeo OAuth2 values
client_id = Text.FromBinary(Extension.Contents("client_id.txt"));
client_secret = Text.FromBinary(Extension.Contents("client_secret.txt"));
redirect_uri = "https://oauth.powerbi.com/views/oauthredirect.html";
token_uri = "https://api.vimeo.com/oauth/access_token";
authorize_uri = "https://api.vimeo.com/oauth/authorize";
logout_uri = "https://login.microsoftonline.com/logout.srf";



// Login modal window dimensions
windowWidth = 720;
windowHeight = 1024;


[DataSource.Kind="Vimeo_Connector", Publish="Vimeo_Connector.UI"]
shared Vimeo_Connector.Contents = Value.ReplaceType(_Vimeo.Contents, type function (url as Uri.Type) as any);

/*
[DataSource.Kind="Vimeo_Connector"]
shared Vimeo_Connector.PagedTable = Value.ReplaceType(_Vimeo.PagedTable, type function (url as Uri.Type) as nullable table);
*/

// Data Source Kind description
Vimeo_Connector= [
    TestConnection = (dataSourcePath) => { "Vimeo_Connector.Contents", dataSourcePath },
    Authentication = [
        OAuth = [
            StartLogin=StartLogin,
            FinishLogin=FinishLogin,
            Refresh=Refresh,
            Logout=Logout
        ]
    ],
    Label = Extension.LoadString("DataSourceLabel")
];

_Vimeo.Contents = (url as text) as table =>
    let
        a =  GetAllPages(url)
     in
        a;

GetPage = (url as text) as table =>
    let
        content = Json.Document(Web.Contents(url)),
        link = GetNextLink(content),
        table = Record.ToTable(content)
        
     in
        table meta [next=link];


GetNextLink = (response, optional request) =>
    let
        // extract the "Link" header if it exists
        link = (response)[paging][next]
    in
        try link otherwise null;


// Data Source UI publishing description
Vimeo_Connector.UI = [
    Beta = true,
    Category = "Other",
    ButtonText = { Extension.LoadString("ButtonTitle"), Extension.LoadString("ButtonHelp") },
    LearnMoreUrl = "https://powerbi.microsoft.com/",
    SourceImage = Vimeo_Connector.Icons,
    SourceTypeImage = Vimeo_Connector.Icons
];

// Helper functions for OAuth2: StartLogin, FinishLogin, Refresh, Logout
StartLogin = (resourceUrl, state, display) =>
    let
        authorizeUrl = authorize_uri & "?" & Uri.BuildQueryString([
            response_type = "code",
            client_id = client_id,  
            redirect_uri = redirect_uri,
            state = state
           // scope = GetScopeString(scopes, scope_prefix)
        ])
    in
        [
            LoginUri = authorizeUrl,
            CallbackUri = redirect_uri,
            WindowHeight = 720,
            WindowWidth = 1024,
            Context = null
        ];

FinishLogin = (context, callbackUri, state) =>
    let
        // parse the full callbackUri, and extract the Query string
        parts = Uri.Parts(callbackUri)[Query],
        // if the query string contains an "error" field, raise an error
        // otherwise call TokenMethod to exchange our code for an access_token
        result = if (Record.HasFields(parts, {"error", "error_description"})) then 
                    error Error.Record(parts[error], parts[error_description], parts)
                 else
                    TokenMethod("authorization_code", "code", parts[code])
    in
        result;

Refresh = (resourceUrl, refresh_token) => TokenMethod("refresh_token", "refresh_token", refresh_token);

Logout = (token) => logout_uri;



// see step 4 access token: https://developer.vimeo.com/api/authentication
TokenMethod = (grantType, tokenField, code) =>
    let
        queryString = [
            grant_type = grantType,
            redirect_uri = redirect_uri,
            client_id = client_id,
            client_secret = client_secret
        ],
        queryWithCode = Record.AddField(queryString, tokenField, code),

        authKey = "Basic " & Binary.ToText(Text.ToBinary(client_id & ":" & client_secret),BinaryEncoding.Base64),

        tokenResponse = Web.Contents(token_uri, [
            Content = Text.ToBinary(Uri.BuildQueryString(queryWithCode)),
            Headers = [
                #"Authorization" = authKey,
                #"Content-Type" = "application/x-www-form-urlencoded",
                #"Accept" = "application/vnd.vimeo.*+json;version=3.4"
            ],
            ManualStatusHandling = {400} 
        ]),
        body = Json.Document(tokenResponse),
        result = if (Record.HasFields(body, {"error", "error_description"})) then 
                    error Error.Record(body[error], body[error_description], body)
                 else
                    body
    in
        result;

Value.IfNull = (a, b) => if a <> null then a else b;

GetScopeString = (scopes as list, optional scopePrefix as text) as text =>
    let
        prefix = Value.IfNull(scopePrefix, ""),
        addPrefix = List.Transform(scopes, each prefix & _),
        asText = Text.Combine(addPrefix, " ")
    in
        asText;
        

Vimeo_Connector.Icons = [
    Icon16 = { Extension.Contents("Vimeo_Connector16.png"), Extension.Contents("Vimeo_Connector24.png"), Extension.Contents("Vimeo_Connector32.png") },
    Icon32 = { Extension.Contents("Vimeo_Connector32.png"), Extension.Contents("Vimeo_Connector40.png"), Extension.Contents("Vimeo_Connector48.png"), Extension.Contents("Vimeo_Connector64.png") }
];

GetAllPages = (url as text) as table =>
    Table.GenerateByPage((previous) => 
        let
            // if previous is null, then this is our first page of data
            nextPageToken = if (previous = null) then null else Value.Metadata(previous)[next]?,
            // if NextLink was set to null by the previous call, we know we have no more data
            page = if (nextPageToken <> null) then GetPage(url) else if (previous = null) then GetPage(url) else null
        in
            page
    );


// The getNextPage function takes a single argument and is expected to return a nullable table
Table.GenerateByPage = (getNextPage as function) as table =>
    let        
        listOfPages = List.Generate(
            () => getNextPage(null),            // get the first page of data
            (lastPage) => lastPage <> null,     // stop when the function returns null
            (lastPage) => getNextPage(lastPage) // pass the previous page to the next function call
        ),
        // concatenate the pages together
        tableOfPages = Table.FromList(listOfPages, Splitter.SplitByNothing(), {"Column1"}),
        firstRow = tableOfPages{0}?
    in
        // if we didn't get back any pages of data, return an empty table
        // otherwise set the table type based on the columns of the first page
        if (firstRow = null) then
            Table.FromRows({})
        else        
            Value.ReplaceType(
                Table.ExpandTableColumn(tableOfPages, "Column1", Table.ColumnNames(firstRow[Column1])),
                Value.Type(firstRow[Column1])
            );

 

This piece of code looks iffy:

 

        link = (response)[paging][next]

Thank you!

This issue was is that I was only passing the url through the GetPage function in GetAllPages. I also needed to pass through another argument in GetPage containing the next page metadata/url

There are two option to approach this

 

- iterating through all pages, harvesting/accumulating the data, and repeating until no nextpage is found

- iterating through all pages, ignoring the data, only harvesting the URLs. Then add a cstom column that fetches the content for each URL and combines them in one step

 

The second approach seems to be wasteful as you seemingly fetch each URL twice. In reality this is most likely covered by the browser engine cache so there is no performance penalty. The benefit is that you don't have to lug the data around during the iteration, so performance is ultimately even better.

Thank you lbendlin,

 

Do you think that using the second way might help me get around API limits? The API I am working with case a limit of 250 calls a minutes and I think the way I did it (the first option in your list) is making me hit that limit.

 

Thanks!

And do you happen to have any examples or documentation for the second approach? It sounds very intriguing, and I am noticing performance issues doing it the first approach. 

Read about Function.InvokeAfter

 

Search for API pagination, like here Solved: PowerQuery Rest API Pagination - Microsoft Power BI Community

Thank you, lbendlin.

 

I've tried pagination a few different ways now:

   1) MS documented Table.GenerateByPage function

   2) Custom columns to generate API calls

   3) Using List.Generate to create a list of API calls and stitching them together

 

Regarding query speed, I'm having the most luck with #3. But still, it takes a long time to fetch the data and I am often left hanging on "Waiting....api.vimeo.com".

 

It takes about 5-6 minutes to fetch 1000 videos (coming in as Records containing a plethora of data). Is this a normal length of time to expect? I'm querying at 100 per_page. Postman takes about 10 seconds to get a 100 item response. 

 

I have turned off my Type Detection and Background Data loading in an attempt to speed things up, with no luck.

 

I think I should open up another thread, since this has gone off topic. But I was wondering if you had any other suggestions.

 

Thank you for your expertise.

Vimeo may be throttling your requests. Have you gotten a 429 yet?

 

Table.GenerateByPage is syntax sugar for List.Generate.

No 429, but I think the payload from this API was particularly large. I was able to filter the fields I want in the query itself, and speed has improved greatly.

 

Thank you for all of your help with this. I ended up implementing the custom column strategy you mentioned and it works very well. You've been very helpful! 

lbendlin
Super User
Super User

That's actually simpler than if pagination is done in the header.  Parse your received body as JSON (which you likely are already doing) and grab the pagination fields (either offset, or nexturl, or whatever - depending on your API. Then stitch all results together as needed.

Helpful resources

Announcements
November Carousel

Fabric Community Update - November 2024

Find out what's new and trending in the Fabric Community.

Live Sessions with Fabric DB

Be one of the first to start using Fabric Databases

Starting December 3, join live sessions with database experts and the Fabric product team to learn just how easy it is to get started.

Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early Bird pricing ends December 9th.

Nov PBI Update Carousel

Power BI Monthly Update - November 2024

Check out the November 2024 Power BI update to learn about new features.