How do 'Copy Data' activities actually convert dat...

IntegrateGuru · ‎10-22-2024

For clarity - in this thread I am referring to the 'copy data' pipeline activity from within a data pipeline, within Fabric.

In my quest to figure out how copy data type conversions are actually done, I stumbled upon this: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-schema-and-type-mapping#data-type...

That page only shows a matrix of which interim data types are compatible with other interim data types. That seems useless, because you can't actually tell the copy data activity which interim data type to use; it selects one automatically based on the source's logical and physical data types.

My questions are:
1.) Are there SQL or Parquet or <other> type crosswalks that show which data types will be automatically mapped to which 'interim' data type?

2.) Are these interim data types actually defined anywhere?

3.) Is there any way to override an interim type mapping on either the source or sink side?

4.) Is there a way to modify the conversion process directly through the sink/source configurations? (ex. I want to convert a SQL Time (Timespan interim type) to an Integer representing milliseconds since 00:00:00.0.)

5.) Is there any documentation on the "translator" object, or the properties for the source/sink mapping objects that are viewable from View -> Edit JSON Code?
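
To make question 4 concrete, here is a minimal Python sketch of the conversion being asked for: a time of day expressed as integer milliseconds since 00:00:00.0. The function name `time_to_millis` is hypothetical, and the sketch only illustrates the arithmetic; it is not something the copy data activity is known to expose.

```python
from datetime import time


def time_to_millis(t: time) -> int:
    """Return whole milliseconds elapsed since 00:00:00.0 for a time of day."""
    return (
        t.hour * 3_600_000
        + t.minute * 60_000
        + t.second * 1_000
        + t.microsecond // 1_000  # truncates anything finer than a millisecond
    )


print(time_to_millis(time(13, 45, 30, 250_000)))  # 49530250
```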

SuryaTejaK · ‎11-19-2024

Hi @IntegrateGuru

The following may help:

Definition of Interim Data Types:

Interim data types are not explicitly defined in the documentation. They are used internally by the system to facilitate data type conversion during the copy process.

Overriding Interim Type Mapping:

You can override type mappings by configuring the type conversion settings in the Mappings tab of the Copy activity.

Documentation on the “Translator” Object:

Detailed documentation on the “translator” object or properties for source/sink mapping objects is not readily available. The JSON code view allows you to see and edit the configurations, but specific properties related to the translator object are not well-documented.

Modifying the Conversion Process:

To modify the conversion process, you can use the type conversion settings in the Mappings tab. For example, you can specify custom formats for date and time conversions.
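
For reference, the Azure Data Factory documentation on schema and type mapping describes a `translator` object of type `TabularTranslator` with `mappings`, `typeConversion`, and `typeConversionSettings` properties. The sketch below builds one in Python and prints the JSON. The column name `ShiftStart` is hypothetical, and whether Fabric's copy data activity honors every property the same way as Azure Data Factory is an assumption, not something confirmed in this thread.

```python
import json

# Shape follows the Azure Data Factory "TabularTranslator" documentation.
# Assumption: Fabric's copy data activity accepts the same properties.
translator = {
    "type": "TabularTranslator",
    "mappings": [
        {
            # "ShiftStart" is a hypothetical column name.
            "source": {"name": "ShiftStart", "type": "TimeSpan"},
            "sink": {"name": "ShiftStart", "type": "String"},
        }
    ],
    "typeConversion": True,
    "typeConversionSettings": {
        "allowDataTruncation": False,
        "treatBooleanAsNumber": False,
        # .NET custom TimeSpan format string; literal separators are escaped.
        "timeSpanFormat": "dd\\.hh\\:mm",
    },
}

translator_json = json.dumps(translator, indent=2)
print(translator_json)
```

Note that these settings only control how a TimeSpan is formatted as a string; they do not produce an integer such as milliseconds since midnight.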

IntegrateGuru · ‎11-21-2024

Definition of Interim Data Types:
Interim data types are not explicitly defined in the documentation. They are used internally by the system to facilitate data type conversion during the copy process.

Those interim data types should be documented - especially because some interim types are incompatible with certain source/destination types.

Overriding Interim Type Mapping:
You can override type mappings by configuring the type conversion settings in the Mappings tab of the Copy activity.

Those are not the interim types. Those are the source/sink types. I understand you can configure the source/sink data types.

Documentation on the “Translator” Object:
Detailed documentation on the “translator” object or properties for source/sink mapping objects is not readily available. The JSON code view allows you to see and edit the configurations, but specific properties related to the translator object are not well-documented.

There should be documentation on this. These are class representations, so I don't see why we wouldn't be able to view the class methods and properties, at least the ones that are accessible to the pipeline. Otherwise, there is no point in viewing the JSON. Might as well just remove it.

Modifying the Conversion Process:
To modify the conversion process, you can use the type conversion settings in the Mappings tab. For example, you can specify custom formats for date and time conversions.

This is half-true: there are *some* configurations you can set, but those only work for certain source types. If I have an unusual type, right now I have no choice but to manually use a SQL CONVERT on the source, putting extra load on my SQL server. I shouldn't have to; Fabric should be more than capable of handling any SQL data type.
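
A minimal sketch of that source-side workaround, for illustration: the Python helper below builds a T-SQL query that converts a time column to integer milliseconds since midnight on the SQL server, which could then be used as the source query. The helper `build_source_query` and the schema, table, and column names are hypothetical; `DATEDIFF` and `CAST` are standard SQL Server functions.

```python
def quote_ident(name: str) -> str:
    """Bracket-quote a SQL Server identifier, escaping closing brackets."""
    return "[" + name.replace("]", "]]") + "]"


def build_source_query(schema: str, table: str, time_column: str,
                       other_columns: list[str]) -> str:
    """Build a SELECT that returns time_column as integer ms since midnight."""
    millis = (
        f"DATEDIFF(MILLISECOND, CAST('00:00:00' AS time), "
        f"{quote_ident(time_column)}) AS {quote_ident(time_column + '_ms')}"
    )
    select_list = ", ".join([quote_ident(c) for c in other_columns] + [millis])
    return f"SELECT {select_list} FROM {quote_ident(schema)}.{quote_ident(table)}"


# Hypothetical table dbo.Shifts with columns ShiftId and StartTime.
query = build_source_query("dbo", "Shifts", "StartTime", ["ShiftId"])
print(query)
```

The conversion still runs on the SQL server, so this does not remove the extra load described above; it only shows what the workaround looks like.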

Anonymous · ‎10-23-2024

Hi @IntegrateGuru,

Since Fabric data pipelines are based on Azure Synapse pipeline features, I think you can take a look at the Azure Synapse pipeline documentation and the data type mapping for the copy data activity.

Copy activity - Azure Data Factory & Azure Synapse | Microsoft Learn

Map Data in Azure Synapse Analytics - Azure Synapse Analytics | Microsoft Learn

To override data types, you can add a step to the pipeline that modifies the results of the previous step.

Regards,

Xiaoxin Sheng

IntegrateGuru · ‎10-23-2024

Thanks for your reply, I had already been looking at this documentation, but can't find answers to any of the above questions.

As to the suggestion of having multiple steps in the pipeline, that could work but I'd like to try to understand how the copy data activity is doing its transformations. It is not ideal or efficient to read/write the same data multiple times. If I could, I would do all my data ingestion and transformation via notebook where my toolset is not as limited.

Anonymous · ‎11-19-2024

Hi @IntegrateGuru,

Any update on this? Did these suggestions help?

Regards,

Xiaoxin Sheng

IntegrateGuru · ‎11-19-2024

No, the interim data types remain a completely undocumented enigma, which makes it overly complicated to perform valid type mappings. I did find some hidden properties through trial and error, but nothing that has resolved any of these questions.

Anonymous · ‎10-24-2024

Hi @IntegrateGuru,

#1, In fact, I also could not find any documentation that covers these.

You can check the Azure Synapse documentation, or contact Azure support to ask for details if you can't find this definition on the Fabric side. (Fabric's pipeline features are based on the Azure Synapse features.)

#2, Customizing the field type in this way is not currently supported. You can submit an idea requesting a new feature to improve the activity.

Microsoft Fabric Ideas

Regards,

Xiaoxin Sheng