pafnuty
Frequent Visitor

Handling JSON Data in CSV Files During Pipeline Execution in Fabric

I need to upload data to S3 in CSV format, ingested from the Fabric DWH. The produced CSV file contains JSON data within one of its fields.

Occasionally, the pipeline Copy activity breaks the data in this scenario: the JSON fields and keys get parsed into the wrong columns.

Changing the delimiter in the settings doesn't solve the issue.

Any ideas how this can be solved?

1 ACCEPTED SOLUTION
v-nuoc-msft
Community Support

Hi @pafnuty 

 

The data corruption during the pipeline copy activity is most likely related to the data format: the delimiters and quotes inside the embedded JSON confuse the CSV parsing. You can consider pre-processing the JSON data before the copy. For example:

 

Base64 encoding. Encode the JSON data in Base64 so it cannot interfere with the CSV structure (no delimiters, quotes, or newlines remain in the field).

 

import base64
import csv

# Sample JSON data
json_data = '{"key1": "value1", "key2": "value2"}'

# Encode JSON data in base64
encoded_json = base64.b64encode(json_data.encode()).decode()

# Write to CSV
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['id', 'json_data'])
    writer.writerow([1, encoded_json])
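On the consumer side (for example, after downloading the file from S3), the field decodes back to the original JSON losslessly. A minimal round-trip sketch:

```python
import base64
import json

# Encode the JSON payload for safe transport in a CSV cell...
json_data = '{"key1": "value1", "key2": "value2"}'
encoded = base64.b64encode(json_data.encode()).decode()

# ...and decode it back on the consumer side
decoded = base64.b64decode(encoded).decode()
assert json.loads(decoded) == {"key1": "value1", "key2": "value2"}
```

The trade-off is that the column is no longer human-readable in the CSV and every consumer must know to decode it.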

 

Escape special characters. Normalize the JSON and let the CSV writer quote and escape special characters, so delimiters and quote marks inside the JSON cannot be misread as column boundaries.

 

import csv
import json

# Sample JSON data
json_data = '{"key1": "value1", "key2": "value2"}'

# Re-serialize to normalize escaping; csv.writer will quote the field as needed
escaped_json = json.dumps(json.loads(json_data))

# Write to CSV
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['id', 'json_data'])
    writer.writerow([1, escaped_json])
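The key point is that a standards-compliant CSV writer/reader pair quotes fields containing delimiters and doubles embedded quote characters, so the JSON string survives a round trip intact. A small sketch using in-memory buffers to demonstrate:

```python
import csv
import io
import json

json_data = json.dumps({"key1": "value1", "key2": "value2"})

# The field contains commas and double quotes, so csv.writer
# wraps it in quotes and doubles the embedded quote characters
buf = io.StringIO()
csv.writer(buf).writerow([1, json_data])

# Reading it back recovers the JSON string exactly
row = next(csv.reader(io.StringIO(buf.getvalue())))
assert row[1] == json_data
```

If the copy activity sink is still misparsing such files, it is worth checking that its quote and escape character settings match what the writer produced.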

 

Flatten the JSON data. Convert the JSON into separate columns before writing it to CSV, so no single field contains structured data at all.

 

import csv
import json

# Sample JSON data
json_data = '{"key1": "value1", "key2": "value2"}'
json_dict = json.loads(json_data)

# Write to CSV
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['id'] + list(json_dict.keys()))
    writer.writerow([1] + list(json_dict.values()))
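If your JSON can be nested, the one-level approach above is not enough. A small recursive helper (a hypothetical `flatten` function, not part of the standard `csv`/`json` modules) can produce dot-separated column names first:

```python
import json

def flatten(obj, prefix=''):
    """Recursively flatten nested dicts into dot-separated keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + '.'))
        else:
            flat[name] = value
    return flat

json_data = '{"key1": "value1", "nested": {"a": 1, "b": 2}}'
flat = flatten(json.loads(json_data))
# flat == {'key1': 'value1', 'nested.a': 1, 'nested.b': 2}
```

Note that flattening assumes every row has the same JSON shape; rows with differing keys would need a unioned header.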

 

Hopefully this gives you some ideas. 

 

Regards,

Nono Chen

If this post helps, then please consider accepting it as the solution to help other members find it more quickly.


