How to access Defender for Cloud Apps data and sto...

gulce · ‎08-06-2025

🔹 1. First — How do you access Defender data at all?

I'm trying to pull discovery data (e.g. which users accessed which apps, like ChatGPT) using the API. I generate a valid access token using client credentials (Azure AD App Registration), and I tried both:

Classic MCAS endpoint:
https://<tenant>.portal.cloudappsecurity.com/api/discovery_users/
Microsoft Graph endpoint:
https://graph.microsoft.com/beta/security/cloudAppSecurity/discoveryUsers

So my first question is:

Which endpoint and method actually returns usable Defender for Cloud Apps data?

🔹 2. Then — How do you store this data for history tracking?

Once I get access working, I’d like to:

Pull data regularly using an ETL tool (e.g. Talend)
Save it to a database (e.g. SQL Server)
Connect Power BI to that database for long-term trend analysis

My second set of questions:

Has anyone built an ETL flow for Defender data?
What challenges did you face with throttling, authentication, paging?
What permissions or scopes are required in Azure?

Any working examples, architectures, or documentation would be really appreciated. Thanks so much!

Best,

Gülce

johnbasha33 · ‎08-06-2025

Hi @gulce

Python Example: Defender for Cloud Apps (MCAS) API

import requests
import csv

# === Azure AD App Registration ===
tenant_id = "<your-tenant-id>"
client_id = "<your-client-id>"
client_secret = "<your-client-secret>"
mcas_domain = "<yourtenant>"  # without .onmicrosoft.com or any suffix

# === Token Endpoint for OAuth2 ===
token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"

# === Get OAuth Token for MCAS API ===
token_data = {
    'grant_type': 'client_credentials',
    'client_id': client_id,
    'client_secret': client_secret,
    'resource': f'https://{mcas_domain}.portal.cloudappsecurity.com'
}

response = requests.post(token_url, data=token_data)
access_token = response.json().get('access_token')

if not access_token:
    # Exit with a non-zero status and print the error body to stderr
    raise SystemExit(f"Failed to get token: {response.text}")

headers = {
    'Authorization': f'Bearer {access_token}',
    'Content-Type': 'application/json'
}

# === Discovery API Endpoint ===
api_url = f"https://{mcas_domain}.portal.cloudappsecurity.com/api/discovery_users/"

all_data = []

while api_url:
    print(f"Fetching page: {api_url}")
    resp = requests.get(api_url, headers=headers)
    if resp.status_code != 200:
        print("Failed to fetch data:", resp.text)
        break

    data = resp.json()
    all_data.extend(data.get('data', []))

    api_url = data.get('nextLink')  # next page URL if the response provides one; None ends the loop

# === Save to CSV ===
csv_file = 'discovery_users.csv'
if all_data:
    keys = all_data[0].keys()
    with open(csv_file, 'w', newline='', encoding='utf-8') as f:
        dict_writer = csv.DictWriter(f, fieldnames=keys)
        dict_writer.writeheader()
        dict_writer.writerows(all_data)
    print(f"Saved {len(all_data)} records to {csv_file}")
else:
    print("No data received.")
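
For the history-tracking part of the question, one option is to append each pull to a table stamped with the pull date instead of overwriting a CSV. The sketch below uses Python's built-in sqlite3 as a stand-in for SQL Server. The table name `discovery_snapshot`, its columns, and the `save_snapshot` helper are hypothetical names chosen for illustration, and storing each record as a JSON string is an assumption made to avoid guessing the API's field list.

```python
import json
import sqlite3
from datetime import date


def save_snapshot(conn, records, snapshot_date=None):
    """Store one pull of API records under a snapshot date (hypothetical schema)."""
    snapshot_date = snapshot_date or date.today().isoformat()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS discovery_snapshot ("
        "snapshot_date TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    # Remove rows already stored for this date so a re-run does not duplicate them
    conn.execute(
        "DELETE FROM discovery_snapshot WHERE snapshot_date = ?", (snapshot_date,)
    )
    conn.executemany(
        "INSERT INTO discovery_snapshot (snapshot_date, payload) VALUES (?, ?)",
        [(snapshot_date, json.dumps(r, sort_keys=True)) for r in records],
    )
    conn.commit()
    return len(records)


if __name__ == "__main__":
    conn = sqlite3.connect("defender_history.db")  # hypothetical file name
    sample = [{"user": "example", "app": "example-app"}]  # stand-in for all_data
    print("Rows stored:", save_snapshot(conn, sample))
```

Keeping one set of rows per snapshot date means a reporting tool such as Power BI can compare dates directly, and deleting before inserting keeps a repeated run on the same day from inflating the counts.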

Permissions Required in Azure AD App Registration:

Directory.Read.All (under Microsoft Graph)
Grant MCAS API access permissions in the Defender for Cloud Apps portal
Admin consent is required

Did I answer your question? Mark my post as a solution! Appreciate your Kudos!

gulce · ‎08-07-2025

Thank you for your reply.

I’m able to pull data via the API, but I noticed that not all records are returned.

For example, in the Generative AI category, I see 127 records on the portal, but the API only returns 8–9 records.
Also, not all users are returned either.

Do I need to implement pagination?
If so, how exactly should it be done?

Here are the URLs I used:

And here is the script I used:

import requests
import pandas as pd

# 🔐 Token and domain info
access_token = "....."
mcas_domain = "<......>"

headers = {
    'Authorization': f'Token {access_token}',
    'Content-Type': 'application/json'
}

# 🔄 API URL: discovered_apps
api_url = f"https://{mcas_domain}.portal.cloudappsecurity.com/api/v1/discovery/discovered_apps/"
all_data = []

while api_url:
    resp = requests.get(api_url, headers=headers)
    print("Status code:", resp.status_code)

    if resp.status_code != 200:
        print("Error:", resp.text)
        break

    data = resp.json()
    print("Number of records on page:", len(data.get("data", [])))
    all_data.extend(data.get("data", []))
    api_url = data.get("nextLink")

df = pd.json_normalize(all_data)

# 💾 Save to CSV
df.to_csv("discovered_apps.csv", index=False)
print("✅ discovered_apps.csv file created.")

v-achippa

Hi @gulce,

Thank you for reaching out to Microsoft Fabric Community.

Thank you @johnbasha33 for the prompt response.

The API is not returning all the apps and users you see in the Defender portal for two reasons:

A call without a streamId returns data for only one stream, which is why you get so few records. In addition, pagination is not fully implemented in your script: the API returns results in pages, so you may be seeing only the first page of records.

Please follow the steps below:

Get all stream IDs from /api/discovery/streams/
For each streamId, call /api/v1/discovery/discovered_apps with that streamId in the request body.
Implement pagination by checking hasNext and use the nextQueryFilters until no more pages remain.
Combine results from all the streams to match the portal’s total count.

This will return all records from all reports, matching what you see in the portal.
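
The steps above can be sketched roughly as follows. This is a minimal illustration, not a verified client. The endpoint paths, `streamId`, `hasNext`, and `nextQueryFilters` come from the steps above. The `data` and `_id` field names, sending `nextQueryFilters` back under a `filters` key, and the helper names (`iter_pages`, `collect_discovered_apps`, `make_fetch`) are assumptions, so check them against a real response before relying on them.

```python
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical transport signature: (method, path, json_body) -> parsed JSON dict
Fetch = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def iter_pages(fetch: Fetch, path: str, body: Dict[str, Any],
               max_pages: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield records page by page until hasNext is false."""
    body = dict(body)  # copy so the caller's dict is not modified
    for _ in range(max_pages):  # hard cap guards against an endless loop
        page = fetch("POST", path, body)
        yield from page.get("data", [])
        next_filters = page.get("nextQueryFilters")
        if not page.get("hasNext") or not next_filters:
            return
        # Assumption: the next page is requested by sending these filters back
        body["filters"] = next_filters


def collect_discovered_apps(fetch: Fetch) -> List[Dict[str, Any]]:
    """Combine discovered apps from every stream."""
    streams = fetch("GET", "/api/discovery/streams/", {}).get("data", [])
    records = []
    for stream in streams:
        stream_id = stream.get("_id")  # assumed name of the stream ID field
        for app in iter_pages(fetch, "/api/v1/discovery/discovered_apps/",
                              {"streamId": stream_id}):
            records.append({**app, "streamId": stream_id})
    return records


def make_fetch(base_url: str, token: str) -> Fetch:
    """Build a Fetch backed by the requests library (imported lazily)."""
    import requests

    def fetch(method: str, path: str, body: Dict[str, Any]) -> Dict[str, Any]:
        resp = requests.request(
            method, base_url + path,
            headers={"Authorization": f"Token {token}"},
            json=body or None, timeout=30,
        )
        resp.raise_for_status()
        return resp.json()

    return fetch
```

Usage would look like `collect_discovered_apps(make_fetch("https://<tenant>.portal.cloudappsecurity.com", token))`. Passing the transport in as a function keeps the paging logic testable without network access.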

If this post helps resolve the issue, please consider accepting it as the solution so other members can find it more quickly, and don't forget to give a "Kudos". I'd truly appreciate it!

Thanks and regards,

Anjan Kumar Chippa