Hi,
I'm trying to set up a Fabric pipeline to consume the Azure Open Datasets NYC Taxi & Limousine yellow taxi data (NYC Taxi and Limousine yellow dataset - Azure Open Datasets | Microsoft Learn). For that I created a connection to the blob storage account named azureopendatastorage, as used in the samples. The NYC Yellow Cab data is partitioned, so I created a source in my copy activity like this:
When I click on "Preview Data" I see a correct sample, and the partitions are detected correctly as well; I see additional fields in the dataset reflecting them.
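For reference, the partition layout can also be confirmed outside the pipeline. This is a minimal sketch using the azure-storage-blob package, assuming anonymous read access to the public container (account and container names as in my source settings):

from azure.storage.blob import ContainerClient

# Public Azure Open Datasets storage account and container used above
container = ContainerClient(
    account_url="https://azureopendatastorage.blob.core.windows.net",
    container_name="nyctlc",
    credential=None,  # the container allows anonymous read access
)

# List a few blobs under the yellow taxi prefix to confirm the
# puYear=.../puMonth=... folder hierarchy that partition discovery should find
for i, blob in enumerate(container.list_blobs(name_starts_with="yellow/")):
    print(blob.name)
    if i >= 9:
        break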
However, when I run the pipeline I receive the following error:
Error details are:
{
"errorCode": "2200",
"message": "Failure happened on 'Source' side. ErrorCode=PartitionDiscoveryWithInvalidFolderPath,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Source file path 'nyctlc/yellow/puYear=2010/puMonth=8/part-00000-tid-8898858832658823408-a1de80bd-eed3-4d11-b9d4-fa74bfbd47bc-426339-25.c000.snappy.parquet' invalid when processing partition discovery. Please check the folder pathes.,Source=Microsoft.DataTransfer.ClientLibrary,'",
"failureType": "UserError",
"target": "Copy data1",
"details": []
}
I checked the data using a PySpark notebook in Synapse and it works fine...
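A minimal sketch of that kind of check, assuming a Synapse or Fabric notebook where spark is predefined and following the documented Open Datasets access pattern (the puYear/puMonth filter values are taken from the failing partition in the error above):

# Read the public NYC yellow taxi parquet data directly from blob storage
blob_account_name = "azureopendatastorage"
blob_container_name = "nyctlc"
blob_relative_path = "yellow"

wasbs_path = f"wasbs://{blob_container_name}@{blob_account_name}.blob.core.windows.net/{blob_relative_path}"
# Empty SAS token: the container allows anonymous read access
spark.conf.set(f"fs.azure.sas.{blob_container_name}.{blob_account_name}.blob.core.windows.net", "")

df = spark.read.parquet(wasbs_path)
# puYear/puMonth show up as partition columns, matching what Preview Data detects
df.filter("puYear = 2010 AND puMonth = 8").show(5)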
Any idea what I should check?
Thanks,
Thomas
Hi,
sorry for the late reply (I was off for a few days) and thanks for your feedback...
I saw the option for getting the NYC Taxi data as a sample, but that was too easy for me 😉 I wanted to use the "big" dataset containing >50 GB of data...
I tried what you suggested. This removes puYear and puMonth from the dataset, so I can't use them anymore for partitioning the result in OneLake. For now I skipped partitioning...
Indeed, this was now successful.
Interesting to see that data compression seems to be higher at the destination...
The question remains why my approach didn't work... Might be a bug, I think...
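If the puYear/puMonth columns are still needed for partitioning the result in OneLake, one possible workaround is to re-derive them after the copy in a notebook. A sketch, assuming the copied data lands in a Lakehouse table with the hypothetical name nyc_yellow_raw and that the pickup timestamp column is tpepPickupDateTime:

from pyspark.sql import functions as F

# Hypothetical table names; adjust to the actual Lakehouse tables
raw = spark.read.table("nyc_yellow_raw")

# Re-derive the partition columns from the pickup timestamp,
# since they were lost when partition discovery was turned off
partitioned = (raw
    .withColumn("puYear", F.year("tpepPickupDateTime"))
    .withColumn("puMonth", F.month("tpepPickupDateTime")))

(partitioned.write
    .mode("overwrite")
    .partitionBy("puYear", "puMonth")
    .saveAsTable("nyc_yellow"))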
Thanks,
Thomas
A couple of suggestions:
1. There is a much easier option that lets you ingest the NYC taxi data from Open Datasets without creating your own connection: in the copy activity, go to Source, choose the third option, "Sample datasets", and then pick the NYC Taxi option.
2. However, the way you are doing it should have worked as well. Can you disable the partition discovery option and retry? Another way to load the entire dataset is via the configuration below: