Hello people,
I have a requirement to load data from a CSV file in the Lakehouse into a table in the Warehouse. To do that, I created a structure similar to the one below in the Warehouse.
Hi pavannarani,
When data is loaded into Lakehouse tables, it is stored in the Delta Lake format, which is based on Parquet for optimised storage and querying. Utilising a standardised format such as Parquet ensures that all engines within the Fabric ecosystem can access and process the same dataset without the need for data duplication.
Parquet is a columnar storage format that is highly efficient for analytical queries; however, it may interpret data types differently from traditional CSV files. This discrepancy might be causing the DeviceID to be interpreted as a string instead of an integer, thereby leading to the compatibility error during the loading process.
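For example, you can confirm how the Lakehouse has typed the column from a notebook. The table name below is a placeholder; in a Fabric notebook the `spark` session is already available:

```python
# Hypothetical table name -- substitute the Lakehouse table you created from the CSV.
spark.table("devices_raw").printSchema()
# If DeviceID is reported as 'string', the underlying Parquet files store it as
# BYTE_ARRAY / UTF8, which cannot be loaded into an INT column in the Warehouse.
```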
Instead of employing staging tables, it is recommended to utilise Dataflows (Gen2) to transform the data during the loading process. A Dataflow can be designed to read data from the Lakehouse, perform the necessary transformations, and load it directly into the Warehouse without the need for intermediate staging.
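If you prefer a notebook over a Dataflow, the same type correction can be sketched as follows; the table names are hypothetical, and the cleaned table still needs to be copied into the Warehouse (for example through a pipeline Copy activity):

```python
from pyspark.sql.functions import col

# Hypothetical names: 'devices_raw' is the table created from the CSV,
# 'devices_typed' holds the same data with types matching the Warehouse DDL.
typed = spark.table("devices_raw").withColumn("DeviceID", col("DeviceID").cast("int"))

typed.write.mode("overwrite").format("delta").saveAsTable("devices_typed")
# DeviceID is now stored as a Parquet INT32, so a subsequent load into the
# Warehouse INT column should no longer raise the type-mismatch error.
```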
Microsoft frequently updates its services, and it is possible that future releases will offer more seamless integration and improved options for loading data between the Lakehouse and Warehouse.
We also recommend providing feedback to Microsoft regarding your specific requirements. Microsoft actively considers user feedback while designing its product roadmap. You may submit your feature requests or feedback through the following link: Home
If you find the response helpful, we kindly request you to mark it as the accepted solution and provide kudos. This will greatly assist other community members encountering similar challenges.
Thank you.
@v-pnaroju-msft Thank you so much for your response.
I am glad there is a way to load data into the Warehouse. But let's say I have raw data in the Lakehouse as CSV files. To load them into the Warehouse, I need to perform an extra step to load them into staging tables. I am sure this approach will work, but I want to know whether the Lakehouse behaves like this by default: whatever file we put in the Lakehouse, does it treat it as a Parquet file? If so, is there a reason why it works this way?
Is there any other way to load data directly into the Warehouse without the staging step? I thought of using notebooks, but even with notebooks we can't load data into the Warehouse; and even if we load into Lakehouse tables, the structure will be different for every CSV file, so it isn't possible to automate the solution.
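(As an aside, differing structures per file do not necessarily block automation in a notebook: Spark can infer a schema for each CSV and create one Lakehouse table per file. A rough sketch only, assuming a hypothetical Files/raw folder:)

```python
import os
from notebookutils import mssparkutils  # preinstalled in Fabric notebooks

# Folder path is hypothetical -- point it at wherever the raw CSV files land.
csv_files = [f.name for f in mssparkutils.fs.ls("Files/raw") if f.name.endswith(".csv")]

for name in csv_files:
    table_name = os.path.splitext(name)[0]          # one table per CSV file
    (spark.read
          .option("header", "true")
          .option("inferSchema", "true")            # numeric columns become int/double
          .csv(f"Files/raw/{name}")
          .write.mode("overwrite")
          .format("delta")
          .saveAsTable(table_name))
```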
Is this planned in the future roadmap?
Hi @pavannarani,
We appreciate your query through the Microsoft Fabric Community Forum.
Based on the details shared, the error message indicates that the DeviceID column of type INT is incompatible with the external data type Parquet physical type: BYTE_ARRAY, logical type: UTF8. This issue arises because the Parquet format utilised in Lakehouse is interpreting the DeviceID as a string (UTF8), which is not directly compatible with the INT type in the Warehouse table.
Please find below the steps to troubleshoot the issue:
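As one illustrative approach (a sketch only, with hypothetical file, schema, and table names), the CSV can be re-loaded into the Lakehouse with an explicit schema so that DeviceID is stored as a Parquet INT32 rather than BYTE_ARRAY/UTF8, after which loading into the Warehouse INT column should succeed:

```python
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Hypothetical schema and paths -- adjust to match your Warehouse table definition.
schema = StructType([
    StructField("DeviceID", IntegerType(), True),
    StructField("DeviceName", StringType(), True),
])

(spark.read
      .option("header", "true")
      .schema(schema)                    # force DeviceID to int instead of string
      .csv("Files/raw/devices.csv")
      .write.mode("overwrite")
      .format("delta")
      .saveAsTable("devices_staging"))   # the Warehouse load should now succeed
```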
If you find the response helpful, we kindly request you to mark it as the accepted solution and provide kudos. This will greatly assist other community members encountering similar challenges.
Thank you.