I see that the OpenLineage libraries are included by default as a built-in library in Spark. When a notebook reads from and writes to OneLake, does it emit lineage events automatically? According to Copilot it does, and lineage visualization in Purview is optional. Where are those events stored? I see a `SparkLineage` folder in OneLake, but it is always empty. I have not been able to find clear documentation on this topic. I would appreciate any comments. Thank you.
Solved! Go to Solution.
Hi @RenatoDM
The `SparkLineage` folder in OneLake is not populated by default. Its presence suggests compatibility with OpenLineage standards, but explicit configuration is required.
To emit granular OpenLineage events (e.g., column-level lineage), you must:
• Implement a SparkListener to intercept Spark execution plans.
• Configure diagnostic emitters to route logs to Azure Storage or Log Analytics.
Native Purview integration captures basic item-level lineage (e.g., notebook → Lakehouse table) but doesn't populate `SparkLineage`.
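As a sketch of the configuration step above: the OpenLineage Spark integration is typically enabled through Spark properties (for example, in an environment's Spark settings). The listener class and the `spark.openlineage.*` keys below come from the OpenLineage Spark integration; the transport URL and namespace are placeholders, not a documented Fabric endpoint.

```
# Hypothetical Spark properties enabling the OpenLineage listener.
# The URL and namespace values are placeholders you would replace
# with your own lineage collection endpoint and workspace name.
spark.extraListeners              io.openlineage.spark.agent.OpenLineageSparkListener
spark.openlineage.transport.type  http
spark.openlineage.transport.url   https://your-lineage-endpoint.example.com
spark.openlineage.namespace       my-fabric-workspace
```

With a configuration like this in place, each Spark job start/complete emits OpenLineage run events to the configured transport, rather than silently populating the `SparkLineage` folder.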