Hi Experts!
Please correct me if I am mistaken, but my understanding is that the output of a pipeline activity essentially uses JSON format. Therefore, I have created a very simple pipeline that includes Get Metadata and Copy. My goal is to save the output of the Get Metadata activity as a JSON file in a Lakehouse in the Workspace. However, I keep getting errors in the Copy step.
Pipeline
Get Metadata Output
Copy Activity Source Settings: Connection using dynamic content set to @activity('Get Metadata1').output
Copy Activity Destination Settings: Connection points to my Lakehouse root directory, with output.json as the file name
I have tried various tweaks of the settings in both Source and Destination, but I always get an error that is difficult to interpret. My guess is that the Source connection is the problem: it apparently cannot be set to the output of an activity unless that output is a connection to a valid Lakehouse or Warehouse, rather than a JSON object.
Therefore, is it possible to save the output of an activity as a JSON file without having to invoke a Notebook? And if a Notebook is the only way, how does one go about it as simply as possible?
Thanks in advance for the help!
We created a process that loops through a set of database tables and clones them to Fabric Lakehouse tables. We wanted to log each job run to the "Files" section because we were going to set up unmanaged tables reading those source files; we didn't need to put the log data in a managed table. So we had to figure out a workaround with the Copy activity to write the output of a previous Copy activity into the Files area.
Pipeline (Basic Premise Below):
Source Setup:
We created a dummy source by connecting to a Fabric Warehouse and writing the query "SELECT TOP 1 * FROM Test". This was basically a schemaless query. We then used the "Additional Columns" section of the source to define our output schema, using dynamic content to read data from prior activities (like a previous Copy step, to get the number of rows inserted). Ex: @if(equals(activity('Copy Table').output.rowsRead, 0), 0, activity('Copy Table').output.rowsCopied). A sketch of such column definitions follows.
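For illustration, the Additional Columns entries might look like the following (the column names, and the @item() property, which assumes a ForEach loop over table names, are hypothetical; rowsRead and rowsCopied are standard Copy activity output properties):
PipelineRunId    @pipeline().RunId
TableName        @item().TableName
RowsRead         @string(activity('Copy Table').output.rowsRead)
RowsCopied       @string(if(equals(activity('Copy Table').output.rowsRead, 0), 0, activity('Copy Table').output.rowsCopied))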
Destination Setup:
We then set up our destination, which in our case was the "Files" section of the Lakehouse, because we wanted to write an execution log there so we could create an "Unmanaged" table in the Lakehouse. We then specified our mapping manually, and all of the custom properties we added then showed up for us in the mapping.
Result:
Now we were able to write a file for every table that was copied, logging ActivityId, table names, rows read/written, etc.
Consider voting for this idea (link below) to highlight this need.
I think this idea will make it possible to natively write the outputs of any data pipeline activity to a Lakehouse file or table.
Write Data Factory Web activity output to Lakehouse File or Table
https://ideas.fabric.microsoft.com/ideas/idea/?ideaid=77919252-7f67-ee11-a81c-000d3ae3b222
Here is also a new idea:
Write data pipeline content to file or table
https://ideas.fabric.microsoft.com/ideas/idea/?ideaid=ffe2af32-3463-ef11-a4e6-6045bdb21058
@frithjof_v - I will vote on those if I haven't already.
I know that the default output logging will also support Fabric destinations, not just ADLS Gen2, in the future.
To be fair, this issue exists in ADF as well.
Hi @WishAskedSooner ,
The output of the Get Metadata activity is indeed in JSON format. However, directly using this output as a source in the Copy activity can be tricky because the Copy activity expects a dataset or a file path, not a JSON object.
If you want to avoid using a Notebook, one common method is to use a Web Activity to call an Azure Function or Logic App that writes the JSON to a file in your Lakehouse. You can look at this document: www.techbrothersit.com
If using a Notebook is acceptable, you can use the Microsoft Spark Utilities to write the JSON output to a file. Here’s a simple example:
import json
from notebookutils import mssparkutils  # built into Fabric notebook runtimes

# Assuming 'metadata_output' is the JSON output from the Get Metadata activity
metadata_output = {
    "name": "example",
    "type": "file",
    "size": 12345
}

# Convert the dictionary to a JSON string
metadata_str = json.dumps(metadata_output)

# Write the JSON string to the Files section of the attached (default) Lakehouse
mssparkutils.fs.put("Files/output.json", metadata_str, overwrite=True)
If you still want to use the Copy activity, you might need to first write the JSON output to a temporary file using a Web Activity or a custom activity, and then use the Copy activity to move that file to your desired location.
Best Regards
Yilong Zhou
If this post helps, then please consider Accepting it as the solution to help the other members find it more quickly.
Thank you so much for the reply and the clarification regarding the output of the Get Metadata activity. It is reassuring to know exactly what one is dealing with when passing data between activities, exporting to a file, etc.
I will look into the solutions you have provided. Thank you.