Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Calling all Data Engineers! Fabric Data Engineer (Exam DP-700) live sessions are back! Starting October 16th. Sign up.

Reply
bidgeir
Most Valuable Professional
Most Valuable Professional

Spark - query Power BI scanner api fails

Hi all

 

I´m a newbie to spark. Hopefully this is not too silly question.

 

I´m trying to call the Power BI admin API endpoint PostWorkspaceInfo.

 

The endpoint needs a list of modified workspaces as body. I have created all the logic to get a list of the modified workspace guids and I have put them into batches so I can call the PostWorkspaceInfo endpoint to ask for the metadata. My problem is that the list over modified workspaces is coming from a temporary view and when I try to pass it to the json part of the request.post it fails with BadRequest. I have examined the content of the column and it´s exactly as it should be. If I copy it and paste it straight into the request.post the call works. For some reason the text I´m inputting from the view is not read correctly. I´m guessing that I need to somehow make the request.post parse it as JSON but I have no idea how.

 

For this question I have limited the number of workspaces to 3 and sanitized the guids. Normally I would have hundreds/thousands of workspaces in batches of 100 hence the for loop below.

 

The column BodyExpression from below returns this: {"workspaces": ["034c87da-bbbb-4df8-becc-555e5916d6fa","043654b6-7868-aaaa-bdad-b1cf9ad95f4c","073c7513-dddd-40c3-bb6e-48d0f0b00b42"]}

 

My code below runs but the API returns error (see below code)

 

 

#Import necessary libraries
import msal
import requests
import json
import pandas as pd
from datetime import date, timedelta, time

#Set Client ID and Secret for Service Principal
client_id = ""
client_secret = ""
authority_url = "https://login.microsoftonline.com/northinsights.com"
scope = ["https://analysis.windows.net/powerbi/api/.default"]

activityDate = date.today()

#Use MSAL to grab token
app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
result = app.acquire_token_for_client(scopes=scope)


#Set URL for changed workspaces
url = "https://api.powerbi.com/v1.0/myorg/admin/workspaces/modified"

#Get changed workspaces
if 'access_token' in result:
    access_token = result['access_token']
    header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}
    api_call = requests.get(url=url, headers=header)
    print(json.loads(api_call.text))

df1 = spark.createDataFrame(json.loads(api_call.text))
df1.createOrReplaceTempView("modifiedworkspaces")
display(df1)

%%sql
--Batch workspace ids into 100

create or replace temporary view WorkspaceBatches as
WITH WorkspaceRanked AS 
(
    SELECT ROW_NUMBER() OVER ( ORDER BY id) AS Rank
    , id
    FROM modifiedworkspaces
),
WorkspaceBatched AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY(Rank % 100) ORDER BY Rank ASC) AS Batch
    , id
    FROM WorkspaceRanked
)

SELECT replace(concat('{"workspaces": ["' , array_join(array_sort(collect_set(id)),","),'"]}'),',','","')  AS BodyExpression, Batch
--INTO WorkspaceBatches
FROM WorkspaceBatched
GROUP BY Batch;



SELECT * FROM WorkspaceBatches


#Try to post workspaces to get inventory
post_WorkspaceInfoUrl = "https://api.powerbi.com/v1.0/myorg/admin/workspaces/getInfo?lineage=True&datasourceDetails=True&getArtifactUsers=True&datasetSchema=True&datasetExpressions=True"
dfbatches = spark.sql("SELECT Batch, BodyExpression FROM WorkspaceBatches")
#display(dfbatches2)
rows_looped = dfbatches.select("Batch", "BodyExpression").collect()
# looping through each batch
for rows in rows_looped:
    #print(rows.value)
    post_WorkspaceInfo = requests.post(url=post_WorkspaceInfoUrl, headers=header, json = rows.BodyExpression )
    print(post_WorkspaceInfo.text)

 

 

 The code above runs but the response is a bad request: {"error":{"code":"BadRequest","message":"Bad Request","details":[{"message":"Error converting value "{"workspaces": ["034c87da-bbbb-4df8-becc-555e5916d6fa","043654b6-7868-aaaa-bdad-b1cf9ad95f4c","073c7513-dddd-40c3-bb6e-48d0f0b00b42"]}" to type 'Microsoft.PowerBI.ServiceContracts.Api.RequiredWorkspaces'. Path '', line 1, position 3096.","target":"requiredWorkspaces"}]}}

 

If I paste the value of the BodyExrpression into the code instead (like below). It works correctly.

 

Codesnippet:

 

post_WorkspaceInfoUrl = "https://api.powerbi.com/v1.0/myorg/admin/workspaces/getInfo?lineage=True&datasourceDetails=True&getArtifactUsers=True&datasetSchema=True&datasetExpressions=True"

dfbatches = spark.sql("SELECT Batch, BodyExpression FROM WorkspaceBatches")

rows_looped = dfbatches.select("Batch", "BodyExpression").collect()

for rows in rows_looped:
    post_WorkspaceInfo = requests.post(url=post_WorkspaceInfoUrl, headers=header, json = {"workspaces": ["034c87da-bbbb-4df8-becc-555e5916d6fa","043654b6-7868-aaaa-bdad-b1cf9ad95f4c","073c7513-dddd-40c3-bb6e-48d0f0b00b42"]} )
    print(post_WorkspaceInfo.text)

 

 

I tried converting the BodyExpression to JSON and then passing it into the request.post but it didn´t work. I´ve tried all kinds of conversions but nothing works for me.

I´m expecting that the string I pass into would work the same way as when I paste the same string directly into the function.

1 ACCEPTED SOLUTION
bidgeir
Most Valuable Professional
Most Valuable Professional

The problem is solved. I needed to use data = rows.bodyexpression

 

 

instead json = rows.bodyexpression because I´m passing string and not json in.

 

View solution in original post

1 REPLY 1
bidgeir
Most Valuable Professional
Most Valuable Professional

The problem is solved. I needed to use data = rows.bodyexpression

 

 

instead json = rows.bodyexpression because I´m passing string and not json in.

 

Helpful resources

Announcements
FabCon Global Hackathon Carousel

FabCon Global Hackathon

Join the Fabric FabCon Global Hackathon—running virtually through Nov 3. Open to all skill levels. $10,000 in prizes!

September Fabric Update Carousel

Fabric Monthly Update - September 2025

Check out the September 2025 Fabric update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.