RenatoDM
New Member

Spark Data Lineage

I see that the OpenLineage libraries are included by default as built-in libraries in Spark. When a notebook reads from and writes to OneLake, does it emit lineage events automatically? According to Copilot it does, and lineage visualization in Purview is optional. Where are those events stored? I see a SparkLineage folder in OneLake, but it is always empty. I have not been able to find clear documentation on this topic. I would appreciate any comments. Thank you.

1 ACCEPTED SOLUTION
nilendraFabric
Super User

Hi @RenatoDM 

The `SparkLineage` folder in OneLake is not populated by default. Its presence suggests compatibility with OpenLineage standards, but explicit configuration is required.

To emit granular OpenLineage events (e.g., column-level lineage), you must:
• Implement a SparkListener to intercept Spark execution plans.
• Configure diagnostic emitters to route logs to Azure Storage or Log Analytics.
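
As a rough illustration of the second bullet, here is a minimal Python sketch that assembles the Spark properties for one Azure Storage diagnostic emitter. The property names follow the pattern used in the Synapse diagnostic emitter documentation linked in this reply; whether a Fabric environment accepts exactly the same keys is an assumption you should verify. The emitter name `LineageLogs`, the helper functions, and the placeholder URI and key are hypothetical. Note that this only routes Spark logs and event logs to storage; it does not by itself produce OpenLineage events.

```python
from typing import Dict, List


def build_emitter_conf(
    name: str,
    container_uri: str,
    secret: str,
    categories: List[str],
) -> Dict[str, str]:
    """Build Spark properties for one Azure Storage diagnostic emitter.

    Key names are assumed from the Synapse diagnostic emitter docs;
    confirm them against your own Fabric or Synapse environment.
    """
    prefix = f"spark.synapse.diagnostic.emitter.{name}"
    return {
        # Comma-separated list of emitter names; a single emitter here.
        "spark.synapse.diagnostic.emitters": name,
        f"{prefix}.type": "AzureStorage",
        # Which log categories to ship, e.g. EventLog for Spark event logs.
        f"{prefix}.categories": ",".join(categories),
        f"{prefix}.uri": container_uri,
        f"{prefix}.auth": "AccessKey",
        f"{prefix}.secret": secret,
    }


def to_properties(conf: Dict[str, str]) -> str:
    """Render the properties as 'key value' lines, one per property."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


# Placeholders only: substitute your own storage account details.
conf = build_emitter_conf(
    name="LineageLogs",  # hypothetical emitter name
    container_uri="https://<account>.blob.core.windows.net/<container>/<folder>",
    secret="<storage-access-key>",
    categories=["EventLog"],
)
print(to_properties(conf))
```

The rendered lines could then be pasted into the Spark properties of an environment or pool configuration. In practice, storing the access key in a secret store rather than in plain text is the safer choice.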

Native Purview integration captures basic item-level lineage (e.g., notebook → Lakehouse table), but it does not populate `SparkLineage`.

https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/azure-synapse-diagnostic-emitters-az...
