Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Don't miss out! 2025 Microsoft Fabric Community Conference, March 31 - April 2, Las Vegas, Nevada. Use code MSCUST for a $150 discount. Prices go up February 11th. Register now.

Reply
ahmadammari
Advocate I
Advocate I

Issue with Loading Retail Data Model from Wide World Importers into Data Warehouse on Fabric

Hello Fabric Community,

I am encountering an issue when trying to load the Retail Data Model from Wide World Importers into my Data Warehouse using a Data Pipeline Copy data assistant on Fabric. The error message I receive is as follows:

 

 

Failed to load
Blob operation failed for: Blob Storage on container 'sampledata' and path 'sampledata/WideWorldImportersDW/parquet/full/dimension_city/' get failed with 'The remote server returned an error: (403) Forbidden.'. Possible root causes: (1). Grant service principal or managed identity appropriate permissions to do copy. For source, at least the “Storage Blob Data Reader” role. For destination, at least the “Storage Blob Data Contributor” role. For more information, see https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-blob-storage?tabs=data-factory#service-principal-authentication. (2). It's possible because some IP address ranges of Data Factory are not allowed by your Azure Storage firewall settings. Data Factory IP ranges please refer https://docs.microsoft.com/en-us/azure/data-factory/azure-integration-runtime-ip-addresses. If you allow trusted Microsoft services to access this storage account option in firewall, you must use https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-blob-storage?tabs=data-factory#managed-identity. For more information on Azure Storage firewalls settings, see https://docs.microsoft.com/en-us/azure/storage/common/storage-network-security?tabs=azure-portal.. The remote server returned an error: (403) Forbidden.StorageExtendedMessage=This request is not authorized to perform this operation. RequestId:d9a153c9-901e-0050-7157-c85dbd000000 Time:2024-06-27T06:01:34.7150867Z, The remote server returned an error: (403) Forbidden. Activity ID: 8ef3f463-ef55-49d4-abbd-5736efc81fda

 

Interestingly, I was able to load a different sample using the Copy data assistant in Fabric without creating a storage account on Azure. Therefore, I believe this issue is not related to my Azure account.

I would appreciate any guidance or suggestions to resolve this issue. Thank you in advance for your help.

Best, 

Ahmadsample_data_error.png

19 REPLIES 19
ahmadammari
Advocate I
Advocate I

Hello Community,

I wanted to share an update on the issue I was facing with loading the Retail Data Model from Wide World Importers into the Data Warehouse on Fabric. After reaching out to Microsoft, they have confirmed that the issue is related to a company-level security policy. They are actively working on resolving this, but unfortunately, there is no estimated time of completion at this moment.

As a result, the Wide World Importers dataset has been temporarily hidden on the data pipeline UI. In the meantime, Microsoft has suggested using other available sample datasets to continue exploring and working with the platform effectively.

I appreciate your support and will keep you updated on any further developments.

Best,

Ahmad

ryomaru0825
Frequent Visitor

Hi @ahmadammari ,

Unfortunately, it seems that the wwi sample data storage container is no longer available.

 

At the beginning of "Step 5. Prepare data", please try running the following code to load the wwi sample data provided by Microsoft (Tutorial: Load data using Azure portal & SSMS - Azure Synapse Analytics | Microsoft Learn).

 

 

 

 

from pyspark.sql.types import *  

fact_sale_schema = StructType([  
    StructField("SaleKey", LongType(), True),  
    StructField("CityKey", IntegerType(), True),  
    StructField("CustomerKey", IntegerType(), True),  
    StructField("BillToCustomerKey", IntegerType(), True),  
    StructField("StockItemKey", IntegerType(), True),  
    StructField("InvoiceDateKey", TimestampType(), True),  
    StructField("DeliveryDateKey", TimestampType(), True),  
    StructField("SalespersonKey", IntegerType(), True),  
    StructField("WWIInvoiceID", IntegerType(), True),  
    StructField("Description", StringType(), True),  
    StructField("Package", StringType(), True),  
    StructField("Quantity", IntegerType(), True),  
    StructField("UnitPrice", DecimalType(18, 2), True),  
    StructField("TaxRate", DecimalType(18, 3), True),  
    StructField("TotalExcludingTax", DecimalType(18, 2), True),  
    StructField("TaxAmount", DecimalType(18, 2), True),  
    StructField("Profit", DecimalType(18, 2), True),  
    StructField("TotalIncludingTax", DecimalType(18, 2), True),  
    StructField("TotalDryItems", IntegerType(), True),  
    StructField("TotalChillerItems", IntegerType(), True),  
    StructField("LineageKey", IntegerType(), True)
])  

dimension_city_schema = StructType([  
    StructField("CityKey", IntegerType(), True),  
    StructField("WWICityID", IntegerType(), True),  
    StructField("City", StringType(), True),  
    StructField("StateProvince", StringType(), True),  
    StructField("Country", StringType(), True),  
    StructField("Continent", StringType(), True),  
    StructField("SalesTerritory", StringType(), True),  
    StructField("Region", StringType(), True),  
    StructField("Subregion", StringType(), True),  
    StructField("Location", StringType(), True),  
    StructField("LatestRecordedPopulation", LongType(), True),  
    StructField("ValidFrom", TimestampType(), True),  
    StructField("ValidTo", TimestampType(), True),  
    StructField("LineageKey", IntegerType(), True)  
])  

dimension_employee_schema = StructType([  
    StructField("EmployeeKey", IntegerType(), True),  
    StructField("WWIEmployeeID", IntegerType(), True),  
    StructField("Employee", StringType(), True),  
    StructField("PreferredName", StringType(), True),  
    StructField("IsSalesperson", IntegerType(), True),  
    StructField("Photo", StringType(), True),  
    StructField("ValidFrom", TimestampType(), True),  
    StructField("ValidTo", TimestampType(), True),  
    StructField("LineageKey", IntegerType(), True)  
])  

dimension_stock_item_schema = StructType([  
    StructField("StockItemKey", IntegerType(), True),  
    StructField("WWIStockItemID", IntegerType(), True),  
    StructField("StockItem", StringType(), True),  
    StructField("Color", StringType(), True),  
    StructField("SellingPackage", StringType(), True),  
    StructField("BuyingPackage", StringType(), True),  
    StructField("Brand", StringType(), True),  
    StructField("Size", StringType(), True),  
    StructField("LeadTimeDays", IntegerType(), True),  
    StructField("QuantityPerOuter", IntegerType(), True),  
    StructField("IsChillerStock", BooleanType(), True),  
    StructField("Barcode", StringType(), True),  
    StructField("TaxRate", DecimalType(18, 3), True),  
    StructField("UnitPrice", DecimalType(18, 2), True),  
    StructField("RecommendedRetailPrice", DecimalType(18, 2), True),  
    StructField("TypicalWeightPerUnit", DecimalType(18, 3), True),  
    StructField("Photo", StringType(), True),  
    StructField("ValidFrom", TimestampType(), True),  
    StructField("ValidTo", TimestampType(), True),  
    StructField("LineageKey", IntegerType(), True)  
])
dimension_customer_schema = StructType([  
    StructField("CustomerKey", LongType(), True),  
    StructField("WWICustomerID", LongType(), True),  
    StructField("Customer", StringType(), True),  
    StructField("BillToCustomer", StringType(), True),  
    StructField("Category", StringType(), True),  
    StructField("BuyingGroup", StringType(), True),  
    StructField("PrimaryContact", StringType(), True),  
    StructField("PostalCode", StringType(), True),  
    StructField("ValidFrom", TimestampType(), True),  
    StructField("ValidTo", TimestampType(), True),  
    StructField("LineageKey", LongType(), True)  
])  

tables = [
    {
        "dist_folder": "fact_sale_1y_full",
        "sorce_folder": "fact_Sale",
        "schema": fact_sale_schema
    },
    {
        "dist_folder": "dimension_city",
        "sorce_folder": "dimension_City",
        "schema": dimension_city_schema
    },
    {
        "dist_folder": "dimension_employee",
        "sorce_folder": "dimension_Employee",
        "schema": dimension_employee_schema
    },
    {
        "dist_folder": "dimension_stock_item",
        "sorce_folder": "dimension_StockItem",
        "schema": dimension_stock_item_schema
    },
    {
        "dist_folder": "dimension_customer",
        "sorce_folder": "dimension_Customer",
        "schema": dimension_customer_schema
    }
]

dist_lakehoues_files = "Files/wwi-raw-data/full/"
source_wasb = "wasbs://wideworldimporters@sqldwholdata.blob.core.windows.net/v1"
# folders = mssparkutils.fs.ls(source_wasb)

for table in tables:
    schema = table["schema"]
    source_folder=f"{source_wasb}/"+ table["sorce_folder"]
    dist_folder=f"{dist_lakehoues_files}/"+ table["dist_folder"]
    df = spark.read.format("csv").option("delimiter","|").schema(schema).load(source_folder)
    if table["sorce_folder"] == "fact_Sale":
        df = df.where("InvoiceDateKey <= '2000-11-30 00:00:00' AND InvoiceDateKey >= '2000-01-01 00:00:00'")

    df.write.format("parquet").mode("overwrite").save(dist_folder)


## dim date

from pyspark.sql.functions import sequence, explode

start_date = "2000-01-01"  
end_date = "2016-12-31"  
date_df = spark.sql(f"SELECT explode(sequence(to_date('{start_date}'), to_date('{end_date}'), interval 1 day)) as Date")
date_format_df = spark.sql("""
    SELECT 
        `Date`,
        dayofmonth(`Date`) as `DayNumber`,
        cast(dayofmonth(`Date`) as string ) as `Day`,
        date_format(`Date`, "MMMM") as `Month`,
        date_format(`Date`, "MMM") as `ShortMonth`,
        month(`Date`) as `CalendarMonthNumber`,
        concat('CY',year(`Date`),'-',date_format(`Date`, "MMM")) as `CalendarMonthLabel`,
        year(`Date`) as `CalendarYear`,
        concat('CY',year(`Date`)) as `CalendarYearLabel`
    FROM 
        {df}
""",df = date_df)

date_dist_folder = f"{dist_lakehoues_files}/dimension_date"
date_format_df.write.format("parquet").mode("overwrite").save(date_dist_folder)

 

 

 

 

dfltusr
New Member

I'm having the same issue when running through the Lakehouse tutorial.  Has anyone received an further information on this?

reded1
Frequent Visitor

I am also getting same error

erens
New Member

I am also facing this issue. Would love to have an update if anyone finds a solution!

AzizQuazi
Helper I
Helper I

@Anonymous ,

 

I have also faced a similar issue for the https://learn.microsoft.com/en-gb/fabric/data-engineering/tutorial-lakehouse-data-ingestion
Shows Invalid credentials:
8. On the Connection settings screen, enter the following details and select Create to create the connection to the data source.
- the error message is attached

https://docs.google.com/presentation/d/19KpTuwdwMwmMtu_WYlBp8aQ8vkc_n0Tk/edit?usp=sharing&ouid=109581682156551055280&rtpof=true&sd=true

Please help to solve this. Thanks.

Dima131186
Regular Visitor

Have the same issue

ghyros
New Member

Same issue encountered. This data source is also encountered in the Learning Module Lab Exercise for Explore real-time analytics in Microsoft Fabric, and doesn't mention anything about any permissions requirements that need to be set.

It sounds to me like the sample data sets at https://azuresynapsestorage.blob.core.windows.net/sampledata are not correctly configured to allow anonymous access -- or, these tutorials do not adqueately specify how to configure permissions. 

How can we get this corrected? Currently, the only way to proceed with the above exercise is to create an Azure SQL database, load WideWorldImportersDW into it, and then ingest the needed information that way. This definitely complicates and adds significant time (and resource use) to the completion of these tutorials.

I'm also facing the same issue. Even though in the Microsoft Learn tutorial below it specifies Anonomous Authenticaion Kind, i am getting the Invalid credentials error. (also tested from Power BI)

https://learn.microsoft.com/en-gb/fabric/data-engineering/tutorial-lakehouse-data-ingestion

 

Has anyone had any luck connecting to https://azuresynapsestorage.blob.core.windows.net/sampledata yet?

 

 

MikeStoner
New Member

I also get a version of this. I'm following the tutorial in Lakehouse tutorial - Ingest data into the lakehouse - Microsoft Fabric | Microsoft Learn. Step 8 states that I should be able to create an anonymous connection to 

"https://azuresynapsestorage.blob.core.windows.net/sampledata"

 

However this fails with "Invalid credentials"

I have the same issue. Thanks.

Same issue here

Anonymous
Not applicable

Hi @ahmadammari ,

Thanks for using Fabric Community. 
Based on the error message, it looks like a permission issue.
Can you please check and confirm whether you provided proper permission?

Anonymous
Not applicable

Hi @ahmadammari ,

Apologize for the issue you are facing. The best course of action is to open a support ticket and have our support team take a closer look at it.

 

Please reach out to our support team so they can do a more thorough investigation on why this it is happening: Link 

 

After creating a Support ticket please provide the ticket number as it would help us to track for more information.

 

Hope this helps. Please let us know if you have any other queries.

Anonymous
Not applicable

Hi @ahmadammari 

We haven’t heard from you on the last response and was just checking back to see if you got a chance to open a support ticket.
After creating a Support ticket please provide the ticket number as it would help us to track for more information.

Thanks

https://github.com/MicrosoftDocs/fabric-docs/blob/main/docs/data-engineering/tutorial-lakehouse-data...

 

I had the same problem and in a search I found this tutorial from github and it helped me to proceed.
I hope it helps you.

Dear Community Support,

 

Thank you for your follow-up. I have opened a support ticket.

The ticket number is 2407030030009040.

Please let me know if there’s any additional information you need.

 

Best, 

Ahmad

Hi, have you received a response yet? Thanks.

Yes. I have received the following response from the Microsoft Support Team:

 

```
Dear Ahmad,

Thank you for reaching out to us regarding the issue with loading the Retail Data Model from Wide World Importers. I understand that you have been waiting for updates on this matter, and I apologize for any inconvenience caused.

After discussing with the product team and the Wide World Importers dataset owner, we have confirmed that the issue is related to a company-level security policy. Our team is actively working on the recovery process, but unfortunately, we do not have an estimated time of completion at this moment. As a result, we have decided to hide the Wide World Importers dataset on the data pipeline UI. 

I would like to suggest that you consider using other sample datasets available to you. We have a variety of datasets that you can choose from, which will still allow you to explore and work with our platform effectively.

If you have any further questions or need additional assistance, please don't hesitate to reach out to me. I will be more than happy to help you.

Thank you for your understanding and cooperation.

```

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!

ArunFabCon

Microsoft Fabric Community Conference 2025

Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.

December 2024

A Year in Review - December 2024

Find out what content was popular in the Fabric community during 2024.