Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
luko3
New Member

Fully Parse and Extract Nested JSON from AuditData Column in CSV Using Notebook in Fabric

Hello everyone,

I'm working with a CSV file in Microsoft Fabric within a Notebook environment, and I'm running into issues parsing a complex JSON string stored in the AuditData column. This csv was downloaded from Microsoft Purview by the way, I am trying to programmatically use it to look at data.

Here’s an example of one row from the CSV:

RecordId,CreationDate,RecordType,Operation,UserId,AuditData,...
(RecordID),"2025-04-14T03:35:10.0000000Z",20,GenerateDataflowSasToken,(Random Email),"{""Id"":""(RandomID)"",""RecordType"":20,...,""WorkspaceId"":""(Random ID)"",""DataflowName"":""T_FINDING_DETAIL""}",...

When I load this file using PySpark in Fabric, the AuditData column only shows a truncated version of the JSON — usually just the Id field. However, I need to extract other fields like WorkspaceId, DataflowName, etc.

I have already tried reading the CSV with inferSchema=False, I tried cleaning the JSON string regexp_replace to fix escaped quotes, and using from_json() with a defined schema to parse the JSON.

 

from pyspark.sql.functions import col, from_json, regexp_replace
from pyspark.sql.types import StructType, StructField, StringType

df = spark.read.option("inferSchema", "false").option("header", "true").csv("path_to_file.csv")
df_cleaned = df.withColumn("AuditData", regexp_replace(col("AuditData"), '\\"', '"'))

audit_schema = StructType([
    StructField("Id", StringType()),
    StructField("WorkspaceId", StringType()),
    StructField("DataflowName", StringType())
])

df_parsed = df_cleaned.withColumn("AuditDataParsed", from_json(col("AuditData"), audit_schema))
df_parsed.select("AuditDataParsed.WorkspaceId").show()

Despite this, the parsed column still only shows partial data or nulls. I suspect the JSON might be getting truncated or malformed during the read step, or maybe the schema is incomplete.

Questions:
1. What is the best way to ensure the full JSON string in AuditData is preservedand parsed correctly?

2. Is there a recommended way to dynamically infer the schema from a sample of the JSON in Fabric?
3. Could this be a limitation of the Fabric Notebook environment or Spark's CSV reader?
Any help or guidance would be greatly appreciated!
Thanks!

1 ACCEPTED SOLUTION
v-veshwara-msft
Community Support
Community Support

Hi @luko3 ,
Thanks for posting in Microsoft Fabric Community sharing the detailed context.

To extract fields or handle cases where certain parts of the JSON may be missing, consider using coalesce while working with the parsed JSON column. This can help avoid issues when some fields are not always present in the data.

 

A similar thread involved handling nested JSON in Fabric notebooks where using coalesce along with a properly defined schema allowed to extract fields safely, even when certain parts of the JSON were missing or null.

Solved: PySpark Notebook to process complex JSON - Microsoft Fabric Community

You may refer to the discussion and apply the same pattern in your scenario.

Hope this helps. Please reach out for further assistance.

Thank you.

 

View solution in original post

3 REPLIES 3
v-veshwara-msft
Community Support
Community Support

Hi @luko3 ,
Thanks for posting in Microsoft Fabric Community sharing the detailed context.

To extract fields or handle cases where certain parts of the JSON may be missing, consider using coalesce while working with the parsed JSON column. This can help avoid issues when some fields are not always present in the data.

 

A similar thread involved handling nested JSON in Fabric notebooks where using coalesce along with a properly defined schema allowed to extract fields safely, even when certain parts of the JSON were missing or null.

Solved: PySpark Notebook to process complex JSON - Microsoft Fabric Community

You may refer to the discussion and apply the same pattern in your scenario.

Hope this helps. Please reach out for further assistance.

Thank you.

 

Yes that thread was perfect. I appreciate your help!

Hi @luko3 ,
Just wanted to check if the response provided was helpful. If further assistance is needed, please reach out.
Thank you.

Helpful resources

Announcements
Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

June FBC25 Carousel

Fabric Monthly Update - June 2025

Check out the June 2025 Fabric update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.