
prabhatnath
Advocate III

Need help to parse JSON string that has escape chars

Hi Friends,

I am working on a PySpark notebook in Fabric that has the code fragment below. The initialized value for log_data is a good working example where the code gives the output: JSON is valid!

 

# Import necessary libraries
import json, re
from pyspark.sql import SparkSession
from datetime import datetime

# Setting up Spark Session
spark = SparkSession.builder.appName("ManualTriggerLogging").getOrCreate()

# Setting up Environment and Timestamp
current_env = "DEV"                                            # Change this value
current_datetime = datetime.now().strftime("%Y%m%d - %H%M%S")

# Parameter values
source_workspace = "WORKSPNAME"
source_lakehouse = "MyHub"
log_data = "[{\"ApplicationName\": \"Ap001\", \"WorkspaceId\": \"af00f007-7e26-4654-803c-d82f11108b79\", \"Environment\": \"DEV\", \"Level\": \"ERROR\", \"Severity\": \"SEV 4\", \"Component\": \"PL_Demo_Pipeline\", \"Operation\": \"ETL\", \"Run_Id\": \"f3d4ea03-ecdc-4003-8229-6ed7a88721a9\", \"SessionId\": \"\", \"Message\": \"[DEV] - Ap001 - Manual trigger message (Ignore) - 20250320 - 121528.\", \"Status\": \"Success\", \"Details\": \"Test message logged while from manually triggering. Ignore this log entry.\", \"CorrelationId\": \"\", \"User\": \"data_engineer_1\"}]"

try:
    data = json.loads(log_data)
    # Access the first element of the list, which is the dictionary.
    log_entry = data[0]
    print("JSON is valid!")
    # Now you can work with log_entry as a dictionary
    print(log_entry["ApplicationName"])  # Example: Accessing a value
except json.JSONDecodeError as e:
    error_pos = e.pos
    print("Invalid JSON:", e)
    print(f"Character at position {error_pos}: {repr(log_data[error_pos-1])}")
    start = max(0, error_pos - 20)
    end = min(len(log_data), error_pos + 20)
    print("Surrounding text:", repr(log_data[start:end]))

But if I pass the 2 values below, the code does not work, and I need help on how to write a generic method or statements that can parse these 2 types of JSON so that I can process them further.

Example-1

log_data = "[{\"ApplicationName\": \"Ap001\",\"WorkspaceId\": \"af00f007-7e26-4654-803c-d82f11108b79\",\"Environment\": \"DEV\",\"Level\": \"ERROR\",\"Severity\": \"SEV 4\",\"Component\": \"PL_People\",\"Operation\": \"ETL\",\"Run_Id\": \"f5f780ba-b5b7-452c-80e7-77b05d898799\",\"SessionId\": \"\",\"Message\": \"[DEV] - Ap001 - Pipeline Execution Failed - 20250313\",\"Status\": \"Success\",\"Details\": \"Pipeline Name: PL_People\\nRun Id: f5f780ba-b5b7-452c-80e7-77b05d898799\\nError Message: Notebook Error Message: An error occurred while calling o4570.load.\n: Invalid URI The ABFS endpoint for host: mydomain.dfs.fabric.microsoft.com1 is not supported. It should match one of the valid configured endpoints [fabric.microsoft.com, data.microsoft.com, pbidedicated.windows.net, core.windows.net]\n	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.validateHostnameEndpointsIfRequired(AzureBlobFileSystemStore.java:503)\n	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getURIBuilder(AzureBlobFileSystemStore.java:455)\n	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:1649)\n	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:258)\n	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:192)\n	at com.microsoft.vegas.vfs.VegasFileSystem.initialize(VegasFileSystem.java:133)\n	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)\n	at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)\n	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)\n	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)\n	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)\n	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)\n	at org.apache.spark.sql.delta.DeltaTableUtils$.findDeltaTableRoot(DeltaTable.scala:178)\n	at org.apache.spark.sql.delta.sources.DeltaDataSource$.parsePathIdentifier(DeltaDataSource.scala:357)\n	at org.apache.spark.sql.delta.catalog.DeltaTableV2.x$1$lzycompute(DeltaTableV2.scala:73)\n	at org.apache.spark.sql.delta.catalog.DeltaTableV2.x$1(DeltaTableV2.scala:68)\n	at org.apache.spark.sql.delta.catalog.DeltaTableV2.timeTravelByPath$lzycompute(DeltaTableV2.scala:68)\n	at org.apache.spark.sql.delta.catalog.DeltaTableV2.timeTravelByPath(DeltaTableV2.scala:68)\n	at org.apache.spark.sql.delta.catalog.DeltaTableV2.$anonfun$timeTravelSpec$1(DeltaTableV2.scala:122)\n	at scala.Option.orElse(Option.scala:447)\n	at org.apache.spark.sql.delta.catalog.DeltaTableV2.timeTravelSpec$lzycompute(DeltaTableV2.scala:122)\n	at org.apache.spark.sql.delta.catalog.DeltaTableV2.timeTravelSpec(DeltaTableV2.scala:118)\n	at org.apache.spark.sql.delta.catalog.DeltaTableV2.snapshot$lzycompute(DeltaTableV2.scala:126)\n	at org.apache.spark.sql.delta.catalog.DeltaTableV2.snapshot(DeltaTableV2.scala:125)\n	at org.apache.spark.sql.delta.catalog.DeltaTableV2.toBaseRelation(DeltaTableV2.scala:200)\n	at org.apache.spark.sql.delta.sources.DeltaDataSource.$anonfun$createRelation$5(DeltaDataSource.scala:230)\n	at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:141)\n	at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:139)\n	at org.apache.spark.sql.delta.sources.DeltaDataSource.recordFrameProfile(DeltaDataSource.scala:49)\n	at org.apache.spark.sql.delta.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:188)\n	at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:346)\n	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:236)\n	at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:219)\n	at scala.Option.getOrElse(Option.scala:189)\n	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:219)\n	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:188)\n	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n	at java.base/java.lang.reflect.Method.invoke(Method.java:566)\n	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\n	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)\n	at py4j.Gateway.invoke(Gateway.java:282)\n	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\n	at py4j.commands.CallCommand.execute(CallCommand.java:79)\n	at py4j.GatewayConnection.run(GatewayConnection.java:238)\n	at java.base/java.lang.Thread.run(Thread.java:829)\n\\nExecution URL: https://msit.powerbi.com/workloads/data-pipeline/artifacts/workspaces/af00f007-7e26-4654-803c-d82f11108b79/pipelines/PL_People/f5f780ba-b5b7-452c-80e7-77b05d898799?experience=power-bi\\nApp Name: Ap001\",\"CorrelationId\": \"\",\"User\": \"System\"}]"

 

And this example too:

log_data = "[{\"ApplicationName\": \"App002\",\"WorkspaceId\": \"af00f007-7e26-4654-803c-d82f11108b79\",\"Environment\": \"DEV\",\"Level\": \"ERROR\",\"Severity\": \"SEV 4\",\"Component\": \"PL_App01_Ingest\",\"Operation\": \"ETL\",\"Run_Id\": \"e0dc524a-b36a-49da-8af7-dac7ead251af\",\"SessionId\": \"\",\"Message\": \"[PPE] - DCRM - Pipeline Execution Failed - 20250319\",\"Status\": \"Success\",\"Details\": \"Pipeline Name: PL_App01_Ingest\\nRun Id: e0dc524a-b36a-49da-8af7-dac7ead251af\\nError Message: Notebook execution failed at Notebook service with http status code - '200', please check the Run logs on Notebook, additional details - 'Error name - Py4JJavaError, Error value - An error occurred while calling o5298.save.\n: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 299.0 failed 4 times, most recent failure: Lost task 0.3 in stage 299.0 (TID 3552) (vm-89b62646 executor 1): org.apache.spark.SparkUpgradeException: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.READ_ANCIENT_DATETIME] You may get a different result due to the upgrading to Spark >= 3.0:\nreading dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z\nfrom Parquet files can be ambiguous, as the files may be written by\nSpark 2.x or legacy versions of Hive, which uses a legacy hybrid calendar\nthat is different from Spark 3.0+'s Proleptic Gregorian calendar.\nSee more details in SPARK-31404. You can set the SQL config \"spark.sql.parquet.datetimeRebaseModeInRead\" or\nthe datasource option \"datetimeRebaseMode\" to \"LEGACY\" to rebase the datetime values\nw.r.t. the calendar difference during reading. To read the datetime values\nas it is, set the SQL config or the datasource option to \"CORRECTED\".\n	at org.apache.spark.sql.errors.QueryExecutionErrors$.sparkUpgradeInReadingDatesError(QueryExecutionErrors.scala:763)\n	at org.apache.spark.sql.execution.datasources.DataSourceUtils$.newRebaseExceptionInRead(DataSourceUtils.scala:178)\n	at org.apache.spark.sql.execution.datasources.DataSourceUtils.newRebaseExceptionInRead(DataSourceUtils.scala)\n	at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory.rebaseTimestamp(ParquetVectorUpdaterFactory.java:1100)\n	at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory.rebaseMicros(ParquetVectorUpdaterFactory.java:1113)\n	at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory$LongWithRebaseUpdater.decodeSingleDictionaryId(ParquetVectorUpdaterFactory.java:577)\n	at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdater.decodeDictionaryIds(ParquetVectorUpdater.java:75)\n	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:240)\n	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:328)\n	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:219)\n	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)\n	at org.apache.spark.sql.execution.datasources.RecordReaderIterator$$anon$1.hasNext(RecordReaderIterator.scala:61)\n	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:158)\n	at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:615)\n	at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)\n	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)\n	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)\n	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:764)\n	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:424)\n	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)\n	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)\n	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:57)\n	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:368)\n	at org.apache.spark.rdd.RDD.iterator(RDD.scala:332)\n	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)\n	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)\n	at org.apache.spark.scheduler.Task.run(Task.scala:139)\n	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:574)\n	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)\n	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:577)\n	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n	at java.base/java.lang.Thread.run(Thread.java:829)\n\nDriver stacktrace:\n	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2871)\n	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2807)\n	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2806)\n	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)\n	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)\n	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)\n	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2806)\n	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1229)\n	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1229)\n	at scala.Option.foreach(Option.scala:407)\n	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1229)\n	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3070)\n	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3009)\n	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2998)\n	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)\n	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:988)\n	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2418)\n	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2439)\n	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2458)\n	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2483)\n	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1029)\n	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)\n	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)\n	at 
org.apache.spark.rdd.RDD.withScope(RDD.scala:409)\n	at org.apache.spark.rdd.RDD.collect(RDD.scala:1028)\n	at org.apache.spark.sql.execution.SparkPlan.executeCollectIterator(SparkPlan.scala:491)\n	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:137)\n	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:236)\n	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n	at java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: org.apache.spark.SparkUpgradeException: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.READ_ANCIENT_DATETIME] You may get a different result due to the upgrading to Spark >= 3.0:\nreading dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z\nfrom Parquet files can be ambiguous, as the files may be written by\nSpark 2.x or legacy versions of Hive, which uses a legacy hybrid calendar\nthat is different from Spark 3.0+'s Proleptic Gregorian calendar.\nSee more details in SPARK-31404. You can set the SQL config \"spark.sql.parquet.datetimeRebaseModeInRead\" or\nthe datasource option \"datetimeRebaseMode\" to \"LEGACY\" to rebase the datetime values\nw.r.t. the calendar difference during reading. To read the datetime values\nas it is, set the SQL config or the datasource option to \"CORRECTED\".\n	at org.apache.spark.sql.errors.QueryExecutionErrors$.sparkUpgradeInReadingDatesError(QueryExecutionErrors.scala:763)\n	at org.apache.spark.sql.execution.datasources.DataSourceUtils$.newRebaseExceptionInRead(DataSourceUtils.scala:178)\n	at org.apache.spark.sql.execution.datasources.DataSourceUtils.newRebaseExceptionInRead(DataSourceUtils.scala)\n	at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory.rebaseTimestamp(ParquetVectorUpdaterFactory.java:1100)\n	at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory.rebaseMicros(ParquetVectorUpdaterFactory.java:1113)\n	at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory$LongWithRebaseUpdater.decodeSingleDictionaryId(ParquetVectorUpdaterFactory.java:577)\n	at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdater.decodeDictionaryIds(ParquetVectorUpdater.java:75)\n	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:240)\n	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:328)\n	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:219)\n	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)\n	at org.apache.spark.sql.execution.datasources.RecordReaderIterator$$anon$1.hasNext(RecordReaderIterator.scala:61)\n	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:158)\n	at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:615)\n	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)\n	at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)\n	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)\n	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:764)\n	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:424)\n	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)\n	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)\n	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:57)\n	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:368)\n	at org.apache.spark.rdd.RDD.iterator(RDD.scala:332)\n	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)\n	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)\n	at org.apache.spark.scheduler.Task.run(Task.scala:139)\n	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:574)\n	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)\n	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:577)\n	... 3 more\n' : \\nExecution URL: https://msit.powerbi.com/workloads/data-pipeline/artifacts/workspaces/af00f007-7e26-4654-803c-d82f11108b79/pipelines/PL_App01_Ingest/e0dc524a-b36a-49da-8af7-dac7ead251af?experience=power-bi\\nApp Name: App002\",\"CorrelationId\": \"\",\"User\": \"System\"}]"

 

Please help with parsing these 2 examples so that I can process the values further.
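A minimal diagnostic sketch (listing the raw control characters in the string, since JSON forbids unescaped characters below U+0020 inside string values) shows why json.loads rejects both examples:

import json

try:
    json.loads(log_data)
except json.JSONDecodeError as exc:
    # Collect every raw control character present in the string.
    bad_chars = sorted({ch for ch in log_data if ord(ch) < 0x20})
    print("Parse failed:", exc)
    print("Raw control characters present:", [repr(c) for c in bad_chars])
    # For the two examples above this reports the raw newlines and tabs
    # embedded in the Details value.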

Thanks,

Prabhat

1 ACCEPTED SOLUTION
v-pnaroju-msft
Community Support

Hi @prabhatnath,

Thank you for reaching out to the Microsoft Fabric Community Forum.

Please find attached the screenshot and the relevant log_data, which may assist in resolving the issue:

vpnarojumsft_0-1742547661618.png


log_data = "[{\"ApplicationName\": \"Ap001\",\"WorkspaceId\": \"af00f007-7e26-4654-803c-d82f11108b79\",\"Environment\": \"DEV\",\"Level\": \"ERROR\",\"Severity\": \"SEV 4\",\"Component\": \"PL_People\",\"Operation\": \"ETL\",\"Run_Id\": \"f5f780ba-b5b7-452c-80e7-77b05d898799\",\"SessionId\": \"\",\"Message\": \"[DEV] - Ap001 - Pipeline Execution Failed - 20250313\",\"Status\": \"Success\",\"Details\": \"Pipeline Name: PL_People\\nRun Id: f5f780ba-b5b7-452c-80e7-77b05d898799\\nError Message: Notebook Error Message: An error occurred while calling o4570.load.\n: Invalid URI The ABFS endpoint for host: mydomain.dfs.fabric.microsoft.com1 is not supported. It should match one of the valid configured endpoints [fabric.microsoft.com, data.microsoft.com, pbidedicated.windows.net, core.windows.net]\n\tat org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.validateHostnameEndpointsIfRequired(AzureBlobFileSystemStore.java:503)\n\tat org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.getURIBuilder(AzureBlobFileSystemStore.java:455)\n\tat org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:1649)\n\tat org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:258)\n\tat org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:192)\n\tat com.microsoft.vegas.vfs.VegasFileSystem.initialize(VegasFileSystem.java:133)\n\tat org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469)\n\tat org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)\n\tat org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)\n\tat org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)\n\tat org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)\n\tat org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)\n\tat org.apache.spark.sql.delta.DeltaTableUtils$.findDeltaTableRoot(DeltaTable.scala:178)\n\tat org.apache.spark.sql.delta.sources.DeltaDataSource$.parsePathIdentifier(DeltaDataSource.scala:357)\n\tat org.apache.spark.sql.delta.catalog.DeltaTableV2.x$1$lzycompute(DeltaTableV2.scala:73)\n\tat org.apache.spark.sql.delta.catalog.DeltaTableV2.x$1(DeltaTableV2.scala:68)\n\tat org.apache.spark.sql.delta.catalog.DeltaTableV2.timeTravelByPath$lzycompute(DeltaTableV2.scala:68)\n\tat org.apache.spark.sql.delta.catalog.DeltaTableV2.timeTravelByPath(DeltaTableV2.scala:68)\n\tat org.apache.spark.sql.delta.catalog.DeltaTableV2.$anonfun$timeTravelSpec$1(DeltaTableV2.scala:122)\n\tat scala.Option.orElse(Option.scala:447)\n\tat org.apache.spark.sql.delta.catalog.DeltaTableV2.timeTravelSpec$lzycompute(DeltaTableV2.scala:122)\n\tat org.apache.spark.sql.delta.catalog.DeltaTableV2.timeTravelSpec(DeltaTableV2.scala:118)\n\tat org.apache.spark.sql.delta.catalog.DeltaTableV2.snapshot$lzycompute(DeltaTableV2.scala:126)\n\tat org.apache.spark.sql.delta.catalog.DeltaTableV2.snapshot(DeltaTableV2.scala:125)\n\tat org.apache.spark.sql.delta.catalog.DeltaTableV2.toBaseRelation(DeltaTableV2.scala:200)\n\tat org.apache.spark.sql.delta.sources.DeltaDataSource.$anonfun$createRelation$5(DeltaDataSource.scala:230)\n\tat org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:141)\n\tat org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:139)\n\tat org.apache.spark.sql.delta.sources.DeltaDataSource.recordFrameProfile(DeltaDataSource.scala:49)\n\tat 
org.apache.spark.sql.delta.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:188)\n\tat org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:346)\n\tat org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:236)\n\tat org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:219)\n\tat scala.Option.getOrElse(Option.scala:189)\n\tat org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:219)\n\tat org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:188)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\n\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)\n\tat py4j.Gateway.invoke(Gateway.java:282)\n\tat py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\n\tat py4j.commands.CallCommand.execute(CallCommand.java:79)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n\\nExecution URL: https://msit.powerbi.com/workloads/data-pipeline/artifacts/workspaces/af00f007-7e26-4654-803c-d82f11...\nApp Name: Ap001\",\"CorrelationId\": \"\",\"User\": \"System\"}]"  
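The screenshot itself isn't reproduced in the thread. Judging by the fix the poster confirms further down (escaping the raw newline and tab characters before parsing), a minimal sketch of the approach could look like the following; the helper name parse_log_data is illustrative only:

import json

def parse_log_data(raw: str):
    # JSON string values cannot contain raw control characters, so escape the
    # newlines, carriage returns and tabs embedded in the Details value before
    # handing the string to json.loads.
    cleaned = raw.replace("\r", "\\r").replace("\n", "\\n").replace("\t", "\\t")
    return json.loads(cleaned)

data = parse_log_data(log_data)
print(data[0]["ApplicationName"])  # prints Ap001 for the log_data above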

If you find our response helpful, kindly mark it as the accepted solution and provide kudos. This will help other community members encountering similar queries.

Thank you.


10 REPLIES
v-pnaroju-msft
Community Support

Hi prabhatnath,

We are following up to see if your query has been resolved. Should you have identified a solution, we kindly request you to share it with the community to assist others facing similar issues.

If our response was helpful, please mark it as the accepted solution and provide kudos, as this helps the broader community.

Thank you.


v-pnaroju-msft
Community Support

Hi prabhatnath,

We are pleased to learn that the information provided has resolved your issue. Kindly mark the response that addressed your query as the accepted solution, as this will assist other community members facing similar challenges in finding solutions more effectively.

Thank you.

Hello, thanks for checking on this thread.

Actually the suggested change did work to handle \n with the 2 examples provided, but it looks like it is failing to handle \" inside the string. Can you help me with a change that can handle this situation as well?

 

I have provided an example log in my last message.

Thanks,

Prabhat 

Hi Friends,
Here is an example JSON string that has " inside the "Details" section, which needs to be handled to ensure the JSON can be used. Please review and suggest an approach, as I am not sure how to change the \" into \\" inside the Details section.

 

For example, you can see these quoted words: "spark.sql.parquet.datetimeRebaseModeInRead", "datetimeRebaseMode", "LEGACY", "CORRECTED".

 

var_str_log_data = "[{\"ApplicationName\": \"APP\",\"WorkspaceId\": \"0addb382-fa0d-4ce1-9c3c-95e25b957955\",\"Environment\": \"DEV\",\"Level\": \"ERROR\",\"Severity\": \"SEV 4\",\"Component\": \"PL_APP_Ingest\",\"Operation\": \"ETL\",\"Run_Id\": \"e0dc524a-b36a-49da-8af7-dac7ead251af\",\"SessionId\": \"\",\"Message\": \"[DEV] - APP - Pipeline Execution Failed - 20250319\",\"Status\": \"Success\",\"Details\": \"Pipeline Name: PL_APP_Ingest\\nRun Id: e0dc524a-b36a-49da-8af7-dac7ead251af\\nError Message: Notebook execution failed at Notebook service with http status code - '200', please check the Run logs on Notebook, additional details - 'Error name - Py4JJavaError, Error value - An error occurred while calling o5298.save.\n: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 299.0 failed 4 times, most recent failure: Lost task 0.3 in stage 299.0 (TID 3552) (vm-89b62646 executor 1): org.apache.spark.SparkUpgradeException: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.READ_ANCIENT_DATETIME] You may get a different result due to the upgrading to Spark >= 3.0:\nreading dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z\nfrom Parquet files can be ambiguous, as the files may be written by\nSpark 2.x or legacy versions of Hive, which uses a legacy hybrid calendar\nthat is different from Spark 3.0+'s Proleptic Gregorian calendar.\nSee more details in SPARK-31404. You can set the SQL config \"spark.sql.parquet.datetimeRebaseModeInRead\" or\nthe datasource option \"datetimeRebaseMode\" to \"LEGACY\" to rebase the datetime values\nw.r.t. the calendar difference during reading. To read the datetime values\nas it is, set the SQL config or the datasource option to \"CORRECTED\".\n	at org.apache.spark.sql.errors.QueryExecutionErrors$.sparkUpgradeInReadingDatesError(QueryExecutionErrors.scala:763)\n	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)\n	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:577)\n	... 3 more\n' : \\nExecution URL: https://msit.powerbi.com/workloads/data-pipeline/artifacts/workspaces/0addb382-fa0d-4ce1-9c3c-95e25b957955/pipelines/PL_APP_Ingest/e0dc524a-b36a-49da-8af7-dac7ead251af?experience=power-bi\\nApp Name: APP\",\"CorrelationId\": \"\",\"User\": \"System\"}]"

 

Thanks,

Prabhat


Hi Friend,

Below is another example that is failing because the Details section has text enclosed in " ("Internal Server Error"), and I am not sure how to parse these unescaped " chars inside the message. I can't directly do a replace across the entire string, as the element names need to be enclosed in ".

Please suggest how I can parse these types of messages that have text enclosed in ".

 

var_str_log_data = "[{\"ApplicationName\": \"APP\",\"WorkspaceId\": \"0addb382-fa0d-4ce1-9c3c-95e32b957955\",\"Environment\": \"DEV\",\"Level\": \"ERROR\",\"Severity\": \"SEV 4\",\"Component\": \"PL_APP_Ingest\",\"Operation\": \"ETL\",\"Run_Id\": \"88276e4f-4e60-47ec-a6d6-dd9e475d2c3a\",\"SessionId\": \"\",\"Message\": \"[DEV] - APP - Pipeline Execution Failed - 20250329\",\"Status\": \"Success\",\"Details\": \"Pipeline Name: PL_APP_Ingest\\nRun Id: 88257e4f-4e60-34ec-a6d6-dd9e475d2c3a\\nError Message: Notebook execution failed at Notebook service with http status code - '200', please check the Run logs on Notebook, additional details - 'Error name - Py4JJavaError, Error value - An error occurred while calling o4628.load.\n: java.util.concurrent.ExecutionException: Operation failed: \"Internal Server Error\", 500, HEAD, http://msit-onelake.dfs.fabric.microsoft.com/MYWORKSPACE/dataverse_name_cds2_workspace_unq60feaf930cbf236eb2519b27f2d6b.Lakehouse/Tables/app_riskparcel/_delta_log?upn=false&action=getStatus&timeout=90\n	at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)\n	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)\n	at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)\n	at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)\n	... 37 more\n' : \\nExecution URL: https://msit.powerbi.com/workloads/data-pipeline/artifacts/workspaces/0addb122-fa0d-4ce1-9c3c-95e25b957955/pipelines/PL_APP_Ingest/88223e4f-4e60-47ec-a6d6-dd9e475d2c3a?experience=power-bi\\nApp Name: APP\",\"CorrelationId\": \"\",\"User\": \"System\"}]"

Thanks,

Prabhat

Thank you for your help on this. The statement did work for Example-1 and Example-2.

Here is what I added:  

# Escape control characters in the JSON string
log_data = log_data.replace('\n', '\\n').replace('\t', '\\t')


I tried the example below and it failed; it looks like I need help to handle \". It will be a great help if you can assist with this as well.

 

log_data = "[{\"ApplicationName\": \"App003\",\"WorkspaceId\": \"af00f007-7e26-4654-803c-d82f11108b79\",\"Environment\": \"DEV\",\"Level\": \"ERROR\",\"Severity\": \"SEV 4\",\"Component\": \"PL_App003_Ingest\",\"Operation\": \"ETL\",\"Run_Id\": \"e0dc524a-b36a-49da-8af7-dac7ead251af\",\"SessionId\": \"\",\"Message\": \"[DEV] - App003 - Pipeline Execution Failed - 20250319\",\"Status\": \"Success\",\"Details\": \"Pipeline Name: PL_App003_Ingest\\nRun Id: e0dc524a-b36a-49da-8af7-dac7ead251af\\nError Message: Notebook execution failed at Notebook service with http status code - '200', please check the Run logs on Notebook, additional details - 'Error name - Py4JJavaError, Error value - An error occurred while calling o5298.save.\n: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 299.0 failed 4 times, most recent failure: Lost task 0.3 in stage 299.0 (TID 3552) (vm-89b62646 executor 1): org.apache.spark.SparkUpgradeException: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.READ_ANCIENT_DATETIME] You may get a different result due to the upgrading to Spark >= 3.0:\nreading dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z\nfrom Parquet files can be ambiguous, as the files may be written by\nSpark 2.x or legacy versions of Hive, which uses a legacy hybrid calendar\nthat is different from Spark 3.0+'s Proleptic Gregorian calendar.\nSee more details in SPARK-31404. You can set the SQL config \"spark.sql.parquet.datetimeRebaseModeInRead\" or\nthe datasource option \"datetimeRebaseMode\" to \"LEGACY\" to rebase the datetime values\nw.r.t. the calendar difference during reading. To read the datetime values\nas it is, set the SQL config or the datasource option to \"CORRECTED\".\n	at org.apache.spark.sql.errors.QueryExecutionErrors$.sparkUpgradeInReadingDatesError(QueryExecutionErrors.scala:763)\n	at org.apache.spark.sql.execution.datasources.DataSourceUtils$.newRebaseExceptionInRead(DataSourceUtils.scala:178)\n	at org.apache.spark.sql.execution.datasources.DataSourceUtils.newRebaseExceptionInRead(DataSourceUtils.scala)\n	at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory.rebaseTimestamp(ParquetVectorUpdaterFactory.java:1100)\n	at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory.rebaseMicros(ParquetVectorUpdaterFactory.java:1113)\n	at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory$LongWithRebaseUpdater.decodeSingleDictionaryId(ParquetVectorUpdaterFactory.java:577)\n	at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdater.decodeDictionaryIds(ParquetVectorUpdater.java:75)\n	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:240)\n	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:328)\n	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:219)\n	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)\n	at org.apache.spark.sql.execution.datasources.RecordReaderIterator$$anon$1.hasNext(RecordReaderIterator.scala:61)\n	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:158)\n	at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:615)\n	at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)\n	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)\n	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)\n	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:764)\n	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:424)\n	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)\n	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)\n	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:57)\n	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:368)\n	at org.apache.spark.rdd.RDD.iterator(RDD.scala:332)\n	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)\n	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)\n	at org.apache.spark.scheduler.Task.run(Task.scala:139)\n	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:574)\n	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)\n	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:577)\n	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n	at java.base/java.lang.Thread.run(Thread.java:829)\n\nDriver stacktrace:\n	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2871)\n	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2807)\n	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2806)\n	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)\n	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)\n	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)\n	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2806)\n	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1229)\n	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1229)\n	at scala.Option.foreach(Option.scala:407)\n	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1229)\n	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3070)\n	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3009)\n	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2998)\n	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)\n	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:988)\n	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2418)\n	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2439)\n	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2458)\n	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2483)\n	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1029)\n	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)\n	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)\n	at 
org.apache.spark.rdd.RDD.withScope(RDD.scala:409)\n	at org.apache.spark.rdd.RDD.collect(RDD.scala:1028)\n	at org.apache.spark.sql.execution.SparkPlan.executeCollectIterator(SparkPlan.scala:491)\n	at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:137)\n	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:236)\n	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n	at java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: org.apache.spark.SparkUpgradeException: [INCONSISTENT_BEHAVIOR_CROSS_VERSION.READ_ANCIENT_DATETIME] You may get a different result due to the upgrading to Spark >= 3.0:\nreading dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z\nfrom Parquet files can be ambiguous, as the files may be written by\nSpark 2.x or legacy versions of Hive, which uses a legacy hybrid calendar\nthat is different from Spark 3.0+'s Proleptic Gregorian calendar.\nSee more details in SPARK-31404. You can set the SQL config \"spark.sql.parquet.datetimeRebaseModeInRead\" or\nthe datasource option \"datetimeRebaseMode\" to \"LEGACY\" to rebase the datetime values\nw.r.t. the calendar difference during reading. To read the datetime values\nas it is, set the SQL config or the datasource option to \"CORRECTED\".\n	at org.apache.spark.sql.errors.QueryExecutionErrors$.sparkUpgradeInReadingDatesError(QueryExecutionErrors.scala:763)\n	at org.apache.spark.sql.execution.datasources.DataSourceUtils$.newRebaseExceptionInRead(DataSourceUtils.scala:178)\n	at org.apache.spark.sql.execution.datasources.DataSourceUtils.newRebaseExceptionInRead(DataSourceUtils.scala)\n	at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory.rebaseTimestamp(ParquetVectorUpdaterFactory.java:1100)\n	at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory.rebaseMicros(ParquetVectorUpdaterFactory.java:1113)\n	at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory$LongWithRebaseUpdater.decodeSingleDictionaryId(ParquetVectorUpdaterFactory.java:577)\n	at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdater.decodeDictionaryIds(ParquetVectorUpdater.java:75)\n	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:240)\n	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:328)\n	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:219)\n	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)\n	at org.apache.spark.sql.execution.datasources.RecordReaderIterator$$anon$1.hasNext(RecordReaderIterator.scala:61)\n	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:158)\n	at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:615)\n	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)\n	at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)\n	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)\n	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:764)\n	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:424)\n	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)\n	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)\n	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:57)\n	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:368)\n	at org.apache.spark.rdd.RDD.iterator(RDD.scala:332)\n	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)\n	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)\n	at org.apache.spark.scheduler.Task.run(Task.scala:139)\n	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:574)\n	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)\n	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:577)\n	... 3 more\n' : \\nExecution URL: https://msit.powerbi.com/workloads/data-pipeline/artifacts/workspaces/af00f007-7e26-4654-803c-d82f11108b79/pipelines/PL_App003_Ingest/e0dc524a-b36a-49da-8af7-dac7ead251af?experience=power-bi\\nApp Name: DCRM\",\"CorrelationId\": \"\",\"User\": \"System\"}]"

Thanks,

Prabhat

Hi prabhatnath,

Kindly find the attached screenshot, which may assist in resolving the issue:

vpnarojumsft_0-1743277458939.png
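The screenshot contents aren't reproduced in the thread. A minimal sketch of one way to handle the embedded quotes, assuming they occur only inside the Details value and that the CorrelationId field always follows it (both hold for the examples above), is shown below; the helper name parse_log_entry is illustrative only:

import json
import re

def parse_log_entry(raw: str):
    # Step 1: escape raw control characters (newlines, carriage returns, tabs),
    # which JSON does not allow inside string values.
    fixed = raw.replace("\r", "\\r").replace("\n", "\\n").replace("\t", "\\t")

    # Step 2: escape stray quotes inside the Details value. This relies on the
    # known layout of these log entries: the value starts right after
    # '"Details": "' and ends right before '","CorrelationId"'.
    def escape_details(match):
        body = match.group(1)
        body = body.replace('\\"', '"')  # normalise any already-escaped quotes
        body = body.replace('"', '\\"')  # then escape every quote in the value
        return '"Details": "' + body + '","CorrelationId"'

    fixed = re.sub(r'"Details": "(.*?)","CorrelationId"', escape_details, fixed,
                   flags=re.DOTALL)
    return json.loads(fixed)

data = parse_log_entry(var_str_log_data)
print(data[0]["Details"][:120])  # first part of the recovered Details text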


If you find our response helpful, kindly mark it as the accepted solution and provide kudos. This will help other community members encountering similar queries.

Thank you.


