Hi all,
I'm relatively new to Fabric and I'm trying to build out a POC to demonstrate a requirement for taking data from our Dynamics 365 environments and other systems, populating a lakehouse and moving the data through a medallion architecture.
I have set myself up with a new workspace and provisioned bronze and silver lakehouses. In the bronze lakehouse I have created a shortcut to some CRM data in an ADLS Gen2 storage container that was populated using Azure Synapse Link, all of which has been pretty straightforward!
What I am struggling with is then moving that raw data into the Silver lakehouse as Delta tables. The csv files don't have any column headers and I can't seem to figure out a way to get them.
In Azure Synapse I can create a Dataflow that connects to the data lake database and then reads the files in table format and allows me to process them as I wish.
Is there a way of doing this in Fabric?
Many Thanks!
In Fabric, the Pipeline Copy Data activity has both a Dataverse and a Dynamics CRM connector.
You'll probably need to set up a gateway (but if you've already got Synapse talking to D365, this shouldn't be a hardship).
Hi @eyeballkid, the problem here is that, as part of Synapse Link, the process sets up a lake database in Synapse and synchronises the Dynamics table and column metadata, which applies structure over the exported CSV files. If in Fabric you shortcut to and query these CSV files directly, there is no metadata in the files themselves.
Is there a possibility of using Fabric Link instead? This will create a lakehouse in a workspace and synchronise the metadata: https://learn.microsoft.com/en-us/power-apps/maker/data-platform/azure-synapse-link-view-in-fabric
Be warned that the data itself is exported to Delta Parquet and stored in Dataverse database storage.
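If you stay on the existing Synapse Link export, one workaround is to recover the column names from the CDM manifest (model.json) that Synapse Link writes alongside the CSV files. The sketch below is a rough illustration rather than a tested recipe: it assumes your bronze shortcut exposes the container root including model.json, and the folder paths and entity name ("account") are placeholders.

import json

# Read the CDM manifest written by Azure Synapse Link (hypothetical path).
manifest = json.loads(
    spark.read.text("Files/dataverse/model.json", wholetext=True).first().value
)

# Pull the attribute (column) names for one entity, e.g. "account".
entity = next(e for e in manifest["entities"] if e["name"] == "account")
columns = [a["name"] for a in entity["attributes"]]

# Read the header-less CSVs and apply the recovered column names.
# toDF assumes the column count and order match the exported files.
df = (spark.read.format("csv")
      .option("header", "false")
      .load("Files/dataverse/account/*.csv")   # hypothetical folder layout
      .toDF(*columns))
display(df)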
Hi @eyeballkid ,
Thanks for the reply from spencer_sa / AndyDDC .
I have a suggestion for your “csv file doesn't have any column headers” problem.
First, open a notebook:
I can reproduce your scenario with a CSV file that has no column headers.
Note that when the file is read with the header option enabled, the first row of data is treated as the column headers. To correct this, modify the first line of the code as follows:
df = spark.read.format("csv").option("header", "false").load("Files/orders/2019.csv")
display(df)
Column names have now been changed to _c0, _c1, etc.
Descriptive column names help you make sense of your data. To apply meaningful column names, you need to define a schema with the data types, which means importing the standard Spark SQL types. Replace the existing code with the following:
from pyspark.sql.types import *

orderSchema = StructType([
    StructField("SalesOrderNumber", StringType()),
    StructField("SalesOrderLineNumber", IntegerType()),
    StructField("OrderDate", DateType()),
    StructField("CustomerName", StringType()),
    StructField("Email", StringType()),
    StructField("Item", StringType()),
    StructField("Quantity", IntegerType()),
    StructField("UnitPrice", FloatType()),
    StructField("Tax", FloatType())
])

df = spark.read.format("csv").schema(orderSchema).load("Files/orders/2019.csv")
display(df)
Save it as a delta table using the following code:
df.write.format("delta").saveAsTable("test")
The final output shows the data with the defined column names, and now you can use the table.
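As a quick sanity check (assuming the notebook's default lakehouse is the one the table was saved to), you can query the new Delta table back with Spark SQL:

# Read the managed Delta table back to confirm the column names and types.
result = spark.sql("SELECT SalesOrderNumber, OrderDate, Quantity, UnitPrice, Tax FROM test LIMIT 5")
display(result)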
In Fabric you can also create a dataflow that reads CSV files and processes them as needed.
You can specify column names and data types for CSV files, and use dataflows to transform raw data as needed.
After the data transformation is complete, you can use the dataflow's data destination settings to load the table into the silver Lakehouse.
After publishing, you can go to the target Lakehouse and see that the table has been loaded successfully.
You can also schedule the dataflow to refresh at specified intervals, which automates the transfer of data from the bronze Lakehouse to the silver Lakehouse by updating it periodically.
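If you prefer to keep everything in Spark, a rough alternative to a scheduled dataflow is a notebook run from a pipeline that writes the prepared DataFrame straight into the silver lakehouse over its OneLake path. This is only a sketch: the workspace name, lakehouse name and table name below are placeholders, and it assumes you want the table registered under the silver lakehouse's Tables area.

# Hedged sketch: write the DataFrame prepared above directly to the silver
# lakehouse. <WorkspaceName> and SilverLakehouse are placeholders.
silver_table_path = (
    "abfss://<WorkspaceName>@onelake.dfs.fabric.microsoft.com/"
    "SilverLakehouse.Lakehouse/Tables/orders"
)

df.write.format("delta").mode("overwrite").save(silver_table_path)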
If you have any other questions please feel free to contact me.
Best Regards,
Yang
Community Support Team
If any post helps, then please consider accepting it as the solution to help other members find it more quickly.
If I've misunderstood your needs or you still have problems, please feel free to let us know. Thanks a lot!
Many thanks to everyone for all the really helpful replies. For the purposes of my testing I went straight to the Dataverse, which provided me with column headers, but there will be instances where I need to create my own schemas, so that's very useful as well.