mrraut
Frequent Visitor

Admin - Datasets GetDatasourcesAsAdmin request limit

I have been using the 'Admin - Datasets GetDatasourcesAsAdmin' REST API (https://learn.microsoft.com/en-us/rest/api/power-bi/admin/datasets-get-datasources-as-admin) in our scripts for a while now, and it has been working as expected. However, I have recently been getting an error with status code 429: "You have exceeded the amount of requests allowed in the current time frame and further requests will fail. Retry in xxxx seconds."

The error occurs after 30-odd requests.

 

Has anything changed recently on this api?
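
For reference, the inventory pattern here is one GET per dataset against this endpoint, so a tenant-wide sweep adds up quickly. A minimal sketch of the call (the token and dataset ID are placeholders):

import requests

# Placeholder credentials; in practice the token comes from Azure AD
headers = {'Authorization': 'Bearer <access-token>'}

r = requests.get(
    'https://api.powerbi.com/v1.0/myorg/admin/datasets/<dataset-id>/datasources',
    headers=headers)
print(r.status_code)  # 429 once throttled; a Retry-After header, if present, says how long to wait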


28 REPLIES
karo
Post Patron

Is the old limit back? Today I was able to get all data sources, as I could before the end of January (via Datasets GetDatasourcesAsAdmin)...

Has anyone had a similar experience?

mrraut
Frequent Visitor

Yes, I think the old limit is back. I was able to run it across multiple calls and get the data without any issues.

Yes, same here; I'm now able to run the previous code and it returns all data sources.

I can confirm that the old, less restrictive limit is back! Thanks, everyone, for reporting the issue, and thanks to Microsoft for reverting to the previous (and documented) limit.

Frederick

samsonfrb
Frequent Visitor

Hi all,
I got a reply from Microsoft about the issue:

 

We have confirmation of a new throttling limit on this specific API: 50 calls per hour.

 

The Product Group informs us that they are aware this new limit is affecting users and their existing implementations. For this reason, they are sharing this workaround until a fix for the new throttling limit is applied:

 

“While we work on refining the throttling limits for the GetDatasourcesAsAdmin API, we understand the importance of providing viable alternatives for our users. As a temporary workaround, we recommend leveraging our scanner APIs to access all data source information for your tenant. The scanner APIs offer a robust solution for retrieving datasets and related data source information. For more information, please refer to:

https://learn.microsoft.com/en-us/rest/api/power-bi/admin/workspace-info-get-scan-result”

 

I hope they align the final throttling limit with the GetXXXUsersAsAdmin APIs (200 calls per hour) so that we have consistency in our scripts, or that they implement an API that returns all data sources of all semantic models in the tenant in a single call.

 

Using the scanner API is more involved (asynchronous calls, a complex JSON result), requiring a longer implementation.

 

Best regards,

Frederick

@samsonfrb Thank you for sharing the response from MS Support.

 

I have tried Admin - WorkspaceInfo GetScanResult; however, it seems that the part I am most interested in, i.e. the data source information, is missing from my response.

 

The missing part, in comparison to the sample response:

 

], "datasourceInstances": [ { "datasourceType": "XXX", "connectionDetails": { "server": "XXX", "database": "XXX" }, "datasourceId": "XXX", "gatewayId": "XXX" } ],

 

Have you tried it yourself? Are you getting info regarding data sources (similar to the above)?

 

Regards,

Karo

 

Hi,

Did you set &datasourceDetails=true when you called PostWorkspaceInfo?

I haven't had a chance to try these APIs yet, but I saw that you must explicitly request this information in order to get it.

Frederick

@samsonfrb Good hint, thanks! It seems that I need to set both lineage=True and datasourceDetails=True to get the info I am looking for.
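
Putting the pieces of this thread together, a minimal sketch of the asynchronous scanner flow (endpoints per the linked documentation; the token, workspace IDs, and polling interval are placeholders) might look like this:

import time
import requests

headers = {'Authorization': 'Bearer <access-token>'}
base = 'https://api.powerbi.com/v1.0/myorg/admin/workspaces'

# 1. Trigger a scan, explicitly asking for data source and lineage details
scan = requests.post(
    base + '/getInfo?lineage=True&datasourceDetails=True',
    headers=headers,
    json={'workspaces': ['<workspace-id-1>', '<workspace-id-2>']})
scan_id = scan.json()['id']

# 2. Poll the scan status until the scan has finished
while requests.get(base + '/scanStatus/' + scan_id,
                   headers=headers).json()['status'] != 'Succeeded':
    time.sleep(10)

# 3. Fetch the result; data source details appear under 'datasourceInstances'
result = requests.get(base + '/scanResult/' + scan_id, headers=headers).json()
print(result.get('datasourceInstances', []))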

aj1973
Community Champion

Thanks @samsonfrb 

Regards
Amine Jerbi


Data__Monkey
Regular Visitor

Same here: this was working until the end of January 2024, and it now gives response code 429 after processing about 25 datasets.

I don't understand what the limit is based on. I looped over the API in Python with a 60-minute wait timer after the 429 error occurs, and the number of successful calls is not consistent: sometimes it's 28, sometimes 41; it ranges between 25 and 45.
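
(One possible explanation, though unconfirmed: if the throttle uses a sliding one-hour window, calls made before the wait would still count against the limit when the loop resumes, which would produce exactly this kind of variation.)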

samsonfrb
Frequent Visitor

Hi mrraut,

We have the same problem. It started on January 28, and the API now seems to allow only 50 calls per hour. It was working fine for a few months before the HTTP 429 errors started to appear. We opened a ticket with Microsoft; I hope they resolve it soon, or that they add a new API call returning all data sources from all semantic models of the tenant in a single call.
I'll let you know if they come back to me with a resolution.

Frederick

Let me know what MS says. I ended up adding a 60-minute sleep timer, and it took 2 days for the API/Python code to fetch the results for all the datasets.
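
That timing lines up with the arithmetic: at roughly 50 allowed calls per hour, about 1,250 datasets works out to ~25 hours of calls. A sketch of a retry helper that, instead of a fixed 60-minute sleep, waits only as long as the service asks via the Retry-After header (falling back to an hour if the header is absent):

import time
import requests

def get_with_retry(url, headers, max_retries=5):
    """GET that backs off when throttled (HTTP 429)."""
    for _ in range(max_retries):
        response = requests.get(url, headers=headers)
        if response.status_code != 429:
            return response
        # Honor Retry-After if the service provides it; otherwise wait an hour
        wait_seconds = int(response.headers.get('Retry-After', 3600))
        time.sleep(wait_seconds)
    return response

The per-dataset datasources GET in the script further down could then be routed through this helper.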


aj1973
Community Champion

Never mind, I was mistaken; that was for the workspace API. Sorry.

 

However, did you know that the Get Dataset API can't go over 200 requests?

[Screenshot: aj1973_1-1706810284048.png]

 

 

Regards
Amine Jerbi


mrraut
Frequent Visitor

I am running the Get Dataset API only once. The API I am running multiple times is this one (below), and no limitations have been documented for it yet.
https://learn.microsoft.com/en-us/rest/api/power-bi/admin/datasets-get-datasources-as-admin

mrraut
Frequent Visitor

That call is getting the workspace information. The filtered result is fewer than 200 workspaces. There is no issue with that API; it doesn't get throttled, and neither does any other API in the same session. The issue is with the

datasources = requests.get('https://api.powerbi.com/v1.0/myorg/admin/datasets/' + datasets_info['id'] + '/datasources', headers=headers)

part of the code. This hits the 429 error after 30-odd requests. We have about 1,250 datasets, and until last month the API was successfully able to fetch the data source information for all of them.

aj1973
Community Champion

Hi @mrraut 

Nothing has changed to my knowledge, and this shouldn't happen. Maybe the process is taking too long when you run the script, and that is causing the problem.

[Screenshot: aj1973_0-1706808736423.png]

 

Can you show me the script, please?

 

Regards
Amine Jerbi


mrraut
Frequent Visitor

Sure, here is the Python script I have been running to get the information needed. It was working correctly until a few weeks ago, but I have been seeing the 429 error since yesterday. I run it every 2-3 months as needed.

######## Loading the required libraries
import requests
import pandas as pd
from azure.identity import ClientSecretCredential

tenant = 'xxx'
client = 'xxx'
client_secret = 'xxx'
api = 'https://analysis.windows.net/powerbi/api/.default'

# Generate an access token for the service principal
auth = ClientSecretCredential(authority='https://login.microsoftonline.com/',
                              tenant_id=tenant,
                              client_id=client,
                              client_secret=client_secret)
access_token = auth.get_token(api).token

headers = {'Authorization': f'Bearer {access_token}'}

# Fetch all active workspaces (Admin - Groups GetGroupsAsAdmin)
workspacetable_id = []
workspacetable_name = []

workspace = requests.get(
    'https://api.powerbi.com/v1.0/myorg/admin/groups'
    '?%24top=5000&%24filter=state%20eq%20%27Active%27%20and%20type%20eq%20%27Workspace%27&%24expand=users',
    headers=headers)

if workspace.status_code == 200:
    for workspace_info in workspace.json()['value']:
        workspacetable_id.append(workspace_info['id'])
        workspacetable_name.append(workspace_info['name'])

Workspace_Info = pd.DataFrame({
    "Workspace ID": workspacetable_id,
    "Workspace Name": workspacetable_name
})

# Fetch every dataset, then each dataset's data sources
# (Admin - Datasets GetDatasourcesAsAdmin)
dataset_id = []
dataset_name = []
dataset_url = []
dataset_owner = []
dataset_workspaceid = []
dataset_connection_details = []
dataset_datasource_id = []
dataset_gateway_id = []

datasets = requests.get('https://api.powerbi.com/v1.0/myorg/admin/datasets', headers=headers)
for datasets_info in datasets.json()['value']:
    datasources = requests.get(
        'https://api.powerbi.com/v1.0/myorg/admin/datasets/' + datasets_info['id'] + '/datasources',
        headers=headers)

    # Deleted dataset IDs are sometimes retained by the Power BI service; requesting
    # data source information for them errors out, so skip any non-200 response.
    if datasources.status_code != 200:
        continue

    for datasources_info in datasources.json()['value']:
        # Missing keys default to an empty string
        dataset_id.append(datasets_info.get('id', ""))
        dataset_name.append(datasets_info.get('name', ""))
        dataset_url.append(datasets_info.get('webUrl', ""))
        dataset_owner.append(datasets_info.get('configuredBy', ""))
        dataset_workspaceid.append(datasets_info.get('workspaceId', ""))
        dataset_connection_details.append(datasources_info.get('connectionDetails', ""))
        dataset_datasource_id.append(datasources_info.get('datasourceId', ""))
        dataset_gateway_id.append(datasources_info.get('gatewayId', ""))

Dataset_Info = pd.DataFrame({
    "Dataset ID": dataset_id,
    "Dataset Name": dataset_name,
    "Dataset URL": dataset_url,
    "Dataset Owner": dataset_owner,
    "Workspace ID": dataset_workspaceid,
    "Connection Details": dataset_connection_details,
    "Datasource ID": dataset_datasource_id,
    "Gateway ID": dataset_gateway_id
})

Dataset_Info_Complete = pd.merge(Dataset_Info, Workspace_Info, on='Workspace ID', how='left')

Dataset = Dataset_Info_Complete[['Workspace Name', 'Dataset Name', 'Dataset URL', 'Dataset Owner',
                                 'Connection Details', 'Dataset ID', 'Workspace ID',
                                 'Datasource ID', 'Gateway ID']]

Dataset.to_excel('E:/Dataset Info.xlsx', sheet_name='PBI Reports', index=False)

 

 

 

aj1973
Community Champion

OK, I have run into this problem before, but with PowerShell scripts. Deleted datasets in the service can take a while to completely disappear, and that could cause the problem. I saw the condition in your script for when a dataset is deleted, but I am not sure whether the script fully accounts for that.

Anyway, here is my script in case it helps. I didn't use your condition, and it works just fine.

 

 

# Outer loop over workspaces supplies $wsId and $wsName (otherwise undefined)
Get-PowerBIWorkspace -Scope Organization -All |
Foreach {
    $wsId = $_.Id;
    $wsName = $_.Name;
    Get-PowerBIDataset -WorkspaceId $wsId -Scope Organization |
    Foreach {
        $dsID = $_.Id;
        $dsName = $_.Name;
        Get-PowerBIDatasource -DatasetId $dsID -Scope Organization |
        Foreach {
            [PSCustomObject]@{
                'WSID' = $wsId;
                'WSName' = $wsName;
                'DSID' = $dsID;
                'DSName' = $dsName;
                'SrcId' = $_.DatasourceId;
                'SrcName' = $_.Name;
                'DatasourceType' = $_.DatasourceType;
                'ConnectionString' = $_.ConnectionString;
                'ConnectionDetails.server' = $_.ConnectionDetails.server;
                'ConnectionDetails.database' = $_.ConnectionDetails.database;
                'ConnectionDetails.Url' = $_.ConnectionDetails.Url;
                'GatewayId' = $_.GatewayId
            }
        }
    }
} | ConvertTo-Json | Out-File -FilePath "D:\Logs\Inventaire\Datasources\PowerBI_Datasources_$(Get-Date -Format yyyyMMdd).json" -Encoding utf8 -Force

 

By the way, I use an App Registration to connect, so I don't run into the access token limitations.
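
(For anyone reusing this: the Get-PowerBI* cmdlets come from the MicrosoftPowerBIMgmt module, and the script assumes you have already signed in with Connect-PowerBIServiceAccount, which also supports service principals.)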

 

Regards
Amine Jerbi

