Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Get Fabric certified for FREE! Don't miss your chance! Learn more

Reply
Andy_S
Frequent Visitor

UUID JSON export from Mongo DB

We are exporting JSON Files from a Mongo DB with the Data Factory. The UUID in the MongoDB is represented like this "5aa399c0-2701-401e-8e83-26b5ce82abc7". In the exported JSON File whe geht the following values:

 

    "_id": {
        "$binary": "wJmjWgEnHkCOgya1zoKrxw==",
        "$type": "03"
    },

 

Any ideas how I can get the representation as a GUID in the export file instead the "ugly" binary value?

 

I already tried modifying the connection string to the MongoDB but nothing worked. And there is no setting to configure the driver used by the data factory to read the values.

1 ACCEPTED SOLUTION
Andy_S
Frequent Visitor

The 'ugly' string is base64 encoded.
We get the MongoDB file with a copy task from the and write it to a blob storage. Afterwards we import it with databricks and decode the base64.

 

The function we use:

def base64ToGUID(b64str):
    try:
      s = bitstring.BitArray(bytes=base64.b64decode(b64str)).hex
      def rev2(s_):
        def chunks(n):
          for i in range(0, len(s_), n):
              yield s_[i:i+n]
        return "".join(list(chunks(2))[::-1])
      return "-".join([rev2(s[:8]),rev2(s[8:][:4]),rev2(s[12:][:4]),s[16:][:4],s[20:]])
    except:
      return None

We read the JSON to a spark dataframe, convert it to a pandas dataframe and apply the decoding:

# Create pandas dataframe
df_pandas = df_spark.toPandas()

#iterate over rows to decode the columns
for index, row in df_pandas.iterrows():
   row["row_name_1"] = base64ToGUID(row["row_name_1"])
   row["row_name_2"] = base64ToGUID(row["row_name_2"])
   row["row_name_n"] = base64ToGUID(row["row_name_n"])

Hope this helps.

View solution in original post

5 REPLIES 5
Marusyk
Advocate II
Advocate II

Alternative solutions can be found here: Convert UUID from MongoDb to UNIQUEIDENTIFIER

Andy_S
Frequent Visitor

The 'ugly' string is base64 encoded.
We get the MongoDB file with a copy task from the and write it to a blob storage. Afterwards we import it with databricks and decode the base64.

 

The function we use:

def base64ToGUID(b64str):
    try:
      s = bitstring.BitArray(bytes=base64.b64decode(b64str)).hex
      def rev2(s_):
        def chunks(n):
          for i in range(0, len(s_), n):
              yield s_[i:i+n]
        return "".join(list(chunks(2))[::-1])
      return "-".join([rev2(s[:8]),rev2(s[8:][:4]),rev2(s[12:][:4]),s[16:][:4],s[20:]])
    except:
      return None

We read the JSON to a spark dataframe, convert it to a pandas dataframe and apply the decoding:

# Create pandas dataframe
df_pandas = df_spark.toPandas()

#iterate over rows to decode the columns
for index, row in df_pandas.iterrows():
   row["row_name_1"] = base64ToGUID(row["row_name_1"])
   row["row_name_2"] = base64ToGUID(row["row_name_2"])
   row["row_name_n"] = base64ToGUID(row["row_name_n"])

Hope this helps.

Thanks for your reply. I'm looking for another way because we don't use spark.

@GraceGu could you suggest anything? Can I use Power Query (Dataflow Gen2 in Fabric) or transform it in Data Pipeline before/after Copy?

GraceGu
Microsoft Employee
Microsoft Employee

Would you mind filling a support case for this? We'll need engineering to track this one down. Thank you!

I faced the same problem. any solutions?

Helpful resources

Announcements
Sticker Challenge 2026 Carousel

Join our Community Sticker Challenge 2026

If you love stickers, then you will definitely want to check out our Community Sticker Challenge!

Free Fabric Certifications

Free Fabric Certifications

Get Fabric certified for free! Don't miss your chance.

January Fabric Update Carousel

Fabric Monthly Update - January 2026

Check out the January 2026 Fabric update to learn about new features.

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.