WishAskedSooner
Responsive Resident

Writing Pipeline Output to JSON file in Lakehouse

Hi Experts!

 

Please correct me if I am mistaken, but my understanding is that the output of a pipeline activity is essentially JSON. With that in mind, I have created a very simple pipeline that includes a Get Metadata activity and a Copy activity. My goal is to save the output of the Get Metadata activity as a JSON file in a Lakehouse in the workspace. However, I keep getting errors in the Copy step.

 

Pipeline

(screenshot)

 

Get Metadata Output

(screenshot)

 

Copy Activity Source Settings: Connection using dynamic content set to @activity('Get Metadata1').output

(screenshot)

 

Copy Activity Destination Settings: Connection points to my Lakehouse root directory, with the file name output.json

(screenshot)

I have tried various tweaks to the settings in both Source and Destination, but I always get an error that is difficult to interpret. I am guessing that the Source connection is the problem: it probably cannot be set to the output of an activity unless that output is a connection to a valid Lakehouse or Warehouse, rather than a JSON payload.

 

Therefore, is it possible to save the output of an activity as a JSON file without having to invoke a Notebook? And if a Notebook is the only way, how does one go about that as simply as possible?

 

Thanks in advance for the help!

1 ACCEPTED SOLUTION
jwinchell40
Super User

@WishAskedSooner - 

 

We created a process that loops through a set of database tables and clones them to Fabric Lakehouse tables. We wanted to log each job run to the "Files" section because we were going to set up unmanaged tables that read those log files; we didn't need to land the log data in a managed table. So we had to figure out a workaround that uses a Copy activity to write the output of a previous Copy activity into the Files area.

 

Pipeline (Basic Premise Below):

(screenshot)

 

 

Source Setup:

We created a dummy source by connecting to a Fabric Warehouse and writing the query "SELECT TOP 1 * FROM Test". The query itself is basically schemaless; we then used the "Additional Columns" section of the source to define our output schema. Those columns use variables and dynamic content to read values from prior activities, for example pulling the number of rows inserted from a previous Copy step: @if(equals(activity('Copy Table').output.rowsRead, 0), 0, activity('Copy Table').output.rowsCopied)

 

(screenshot)

 

Destination Setup:

We then set up our destination, which in our case was the "Files" section of the Lakehouse, because we wanted to write an execution log there so we could create an "unmanaged" table over it in the Lakehouse. We then specified the mapping manually, and all of the custom columns we added in the source showed up for us in the mapping.

(screenshot)

 

Result:

Now we are able to write a file for every table that is copied, logging the ActivityId, table name, rows read/written, etc.

 


5 REPLIES
frithjof_v
Super User

Consider voting for this idea (link below) to highlight this need.

 

I think this idea, if implemented, would make it possible to natively write the output of any data pipeline activity to a Lakehouse file or table.

 

Write Data Factory Web activity output to Lakehouse File or Table

https://ideas.fabric.microsoft.com/ideas/idea/?ideaid=77919252-7f67-ee11-a81c-000d3ae3b222

 

Here is also a new idea:

 

Write data pipeline content to file or table

https://ideas.fabric.microsoft.com/ideas/idea/?ideaid=ffe2af32-3463-ef11-a4e6-6045bdb21058

@frithjof_v -  I will vote on those if I haven't already.

 

I also know that, in the future, the built-in output logging will support Fabric destinations and not just ADLS Gen2.

 

To be fair, this issue exists in ADF as well.


Anonymous
Not applicable

Hi @WishAskedSooner ,

The output of the Get Metadata activity is indeed in JSON format. However, you cannot use that output directly as the source of a Copy activity: the source must point to a connection (a dataset or file path), not to an in-memory JSON object.

 

If you want to avoid using a Notebook, one common approach is to use a Web activity to call an Azure Function or Logic App that writes the JSON to a file in your Lakehouse. You can look at this resource: www.techbrothersit.com
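If it helps, here is a rough sketch of what such a function's write step could look like, assuming a Python Azure Function and OneLake's ADLS Gen2-compatible endpoint (the workspace and lakehouse names below are placeholders, and the function needs the azure-identity and azure-storage-file-datalake packages):

import json
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# OneLake exposes an ADLS Gen2-compatible endpoint: the workspace acts as the
# file system (container) and the lakehouse is a folder inside it.
# "MyWorkspace" and "MyLakehouse" are placeholders for your own item names.
service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)
file_system = service.get_file_system_client("MyWorkspace")
file_client = file_system.get_file_client("MyLakehouse.Lakehouse/Files/output.json")

# 'payload' stands in for the JSON body that the pipeline's Web activity would post to the function
payload = {"name": "example", "type": "file", "size": 12345}
file_client.upload_data(json.dumps(payload), overwrite=True)

The identity the function runs under (or whatever credential you configure) also needs write access to the workspace, for example a Contributor role, for the upload to succeed.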

 

If using a Notebook is acceptable, you can use the Microsoft Spark Utilities to write the JSON output to a file. Here’s a simple example:

import json
from notebookutils import mssparkutils

# Assuming 'metadata_output' holds the JSON output from the Get Metadata activity
metadata_output = {
    "name": "example",
    "type": "file",
    "size": 12345
}

# Serialize the dictionary to a JSON string
metadata_str = json.dumps(metadata_output)

# Write the string to the Files section of the notebook's default Lakehouse
# (the final True overwrites the file if it already exists)
mssparkutils.fs.put("Files/output.json", metadata_str, True)

 

If you still want to use the Copy activity, you might need to first write the JSON output to a temporary file using a Web Activity or a custom activity, and then use the Copy activity to move that file to your desired location.

 

 

 

Best Regards

Yilong Zhou

If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.

Thank you so much for the reply and the clarification regarding the output of the Get Metadata activity. It is reassuring to know exactly what one is dealing with when passing data between activities, exporting to a file, and so on.

 

I will look into the solutions you have provided. Thank you.
