OpenLineage support for Column Level Lineage in Lakehouse

OpenLineage lets you trace a large number of ETL jobs that mutate tables, which makes changes to those tables significantly easier to track.


Databricks Unity Catalog supports column-level lineage, and it's highly useful:
View data lineage using Unity Catalog - Azure Databricks | Microsoft Learn
Databricks Demo

AWS also supports OpenLineage: 
Amazon DataZone introduces OpenLineage-compatible data lineage visualization in preview | AWS Big Da...

It's actually quite easy to get OpenLineage working in Fabric Spark since you already have the JAR pre-installed:

Column level lineage in Fabric Spark with OpenLineage and stashing the lineage in Delta Lake | Raki ...

All we need is a way to store and query historical state (e.g. in Delta Lake), and a UI component that hooks into the Lakehouse catalog and offers an API on top of the OpenLineage schema.
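As a rough sketch of the "store" half, the snippet below flattens one OpenLineage run event into one row per column-to-column edge, using the `columnLineage` facet on each output dataset. The function name `column_lineage_rows`, the row layout, and the sample event are my own illustrative assumptions, not anything Fabric ships.

```python
from typing import Any, Dict, List


def column_lineage_rows(event: Dict[str, Any]) -> List[Dict[str, str]]:
    """Flatten one OpenLineage run event into rows of
    (source dataset/column -> target dataset/column).

    Hypothetical helper: the row layout is an assumption for illustration.
    """
    rows: List[Dict[str, str]] = []
    for output in event.get("outputs", []):
        # Column lineage lives in the "columnLineage" facet of each output dataset.
        fields = (
            output.get("facets", {})
            .get("columnLineage", {})
            .get("fields", {})
        )
        for target_column, detail in fields.items():
            for source in detail.get("inputFields", []):
                rows.append(
                    {
                        "event_time": event.get("eventTime", ""),
                        "run_id": event.get("run", {}).get("runId", ""),
                        "job_name": event.get("job", {}).get("name", ""),
                        "source_namespace": source["namespace"],
                        "source_dataset": source["name"],
                        "source_column": source["field"],
                        "target_namespace": output["namespace"],
                        "target_dataset": output["name"],
                        "target_column": target_column,
                    }
                )
    return rows


# Made-up sample event, trimmed to the fields the helper reads.
sample_event = {
    "eventType": "COMPLETE",
    "eventTime": "2026-01-01T00:00:00Z",
    "run": {"runId": "00000000-0000-0000-0000-000000000001"},
    "job": {"namespace": "demo", "name": "build_sales_summary"},
    "outputs": [
        {
            "namespace": "demo_lakehouse",
            "name": "sales_summary",
            "facets": {
                "columnLineage": {
                    "fields": {
                        "total": {
                            "inputFields": [
                                {"namespace": "demo_lakehouse", "name": "orders", "field": "amount"},
                                {"namespace": "demo_lakehouse", "name": "orders", "field": "tax"},
                            ]
                        },
                        "customer": {
                            "inputFields": [
                                {"namespace": "demo_lakehouse", "name": "orders", "field": "customer_id"},
                            ]
                        },
                    }
                }
            },
        }
    ],
}

rows = column_lineage_rows(sample_event)
```

In a Spark notebook, rows like these could then be appended to a Delta table with something along the lines of `spark.createDataFrame(rows).write.format("delta").mode("append").saveAsTable("openlineage_column_edges")`, where the table name is hypothetical. Keeping one row per edge makes the history easy to query by run, job, or column.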

The OneLake Catalog UI could look like this:

mdrakiburrahman_1-1769384452251.jpeg

To show that this is fairly easy to achieve, since OpenLineage events are robust and have a stable schema, I vibe-coded this little static site that works with the default Fabric Spark OpenLineage events: Marquito — OpenLineage Visualizer

mdrakiburrahman_0-1771702104744.png
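To give a feel for the kind of query a lineage UI or API would run, here is a minimal sketch that walks column-level edges upstream with a breadth-first search. It is not how Marquito is implemented; the `upstream_columns` function, the `(dataset, column)` tuple shape, and the sample edges are assumptions made up for illustration.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Column = Tuple[str, str]  # (dataset, column)


def upstream_columns(edges: List[Tuple[Column, Column]], target: Column) -> Set[Column]:
    """Return every column that feeds `target`, directly or transitively.

    `edges` is a list of (source, destination) column pairs.
    """
    # Index edges by destination so each lookup returns the direct parents.
    parents: Dict[Column, List[Column]] = {}
    for source, destination in edges:
        parents.setdefault(destination, []).append(source)

    seen: Set[Column] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:  # guards against revisiting shared ancestors
                seen.add(parent)
                queue.append(parent)
    return seen


# Made-up edges: raw -> staging -> mart.
sample_edges = [
    (("raw.orders", "amount"), ("staging.orders", "amount")),
    (("raw.orders", "tax"), ("staging.orders", "tax")),
    (("staging.orders", "amount"), ("mart.sales", "total")),
    (("staging.orders", "tax"), ("mart.sales", "total")),
    (("raw.orders", "customer_id"), ("mart.sales", "customer")),
]

total_sources = upstream_columns(sample_edges, ("mart.sales", "total"))
```

The same traversal with the edge direction flipped would answer the impact-analysis question: which downstream columns are affected if a source column changes.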

Status: New
Comments
ThornKevin
Advocate I
This would bring massive benefit to our company! In addition, it would be great to be able to check which columns should have data quality rules assigned, based on this visualized utilization.