Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends August 31st. Request your voucher.

Reply
DebbieE
Community Champion
Community Champion

Data Warehouse. Copy data from DL with pipeline or create an external table to the files in the DL.

Hi,

 

We have a data lake containing our raw data.

Previously, using a database we would use Data Factory to copy the files into the SQL DB so we could then work with them. So we would create a copy and then transform.

 

With Synapse, you can create an External table so you can use the data in the Data Lake as if it were a table in the DW.

 

So I think Pros of still copying the data into the Warehouse is that you get that data and if anything happens to the Data lake files, it doesnt appect what you are doing.

 

Pros for the External Table link up. You dont need the copy of the data in you SQL DW.

 

can any one think of more pros and cons for each of these possibilities. i love the idea of not having to copy the data in using a pipeline, but there is something about that copy securing you from Data lake issues 

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @DebbieE 

 

Here are some possible pros and cons for each approach:

Copying Data into the Data Warehouse

Pros:

  1. Data Security and Integrity: Having a copy of the data in the DW ensures that your operations are not affected if there are issues with the Data Lake.
  2. Performance: Queries might run faster since the data is stored locally in the DW, reducing latency.
  3. Data Transformation: Easier to perform complex transformations and indexing within the DW.

Cons:

  1. Storage Costs: Additional storage is required in the DW to hold the copied data when the source files are stored out of OneLake. 
  2. Data Freshness: There might be a lag between the data in the Data Lake and the DW, depending on how frequently the data is copied.
  3. Maintenance Overhead: Requires additional pipelines and processes to keep the data in sync between the Data Lake and the DW.

Using External Tables

Pros:

  1. No Data Duplication: Eliminates the need to store duplicate data, saving on storage costs.
  2. Real-Time Access: Provides real-time access to the latest data in the Data Lake.
  3. Simplified Architecture: Reduces the complexity of data pipelines and maintenance.
  4. Cost Efficiency: Potentially lower costs as you avoid the need for additional storage and data movement.

Cons:

  1. Performance: Query performance might be slower due to the data being accessed from the Data Lake rather than locally.
  2. Data Availability: If the Data Lake is unavailable or has issues, it directly impacts your ability to query the data.
  3. Complex Transformations: Performing complex transformations might be more challenging compared to having the data in the DW.
  4. Security Concerns: Depending on your setup, there might be additional security considerations when accessing data directly from the Data Lake.

Additional Considerations

  • Hybrid Approach: Some organizations use a hybrid approach, where critical data is copied to the DW for performance and reliability, while less critical data is accessed via External Tables.
  • Scalability: Evaluate how each approach scales with your data growth and query demands.

 

Hope this would be helpful!

 

Best Regards,
Jing
If this post helps, please Accept it as Solution to help other members find it. Appreciate your Kudos!

View solution in original post

1 REPLY 1
Anonymous
Not applicable

Hi @DebbieE 

 

Here are some possible pros and cons for each approach:

Copying Data into the Data Warehouse

Pros:

  1. Data Security and Integrity: Having a copy of the data in the DW ensures that your operations are not affected if there are issues with the Data Lake.
  2. Performance: Queries might run faster since the data is stored locally in the DW, reducing latency.
  3. Data Transformation: Easier to perform complex transformations and indexing within the DW.

Cons:

  1. Storage Costs: Additional storage is required in the DW to hold the copied data when the source files are stored out of OneLake. 
  2. Data Freshness: There might be a lag between the data in the Data Lake and the DW, depending on how frequently the data is copied.
  3. Maintenance Overhead: Requires additional pipelines and processes to keep the data in sync between the Data Lake and the DW.

Using External Tables

Pros:

  1. No Data Duplication: Eliminates the need to store duplicate data, saving on storage costs.
  2. Real-Time Access: Provides real-time access to the latest data in the Data Lake.
  3. Simplified Architecture: Reduces the complexity of data pipelines and maintenance.
  4. Cost Efficiency: Potentially lower costs as you avoid the need for additional storage and data movement.

Cons:

  1. Performance: Query performance might be slower due to the data being accessed from the Data Lake rather than locally.
  2. Data Availability: If the Data Lake is unavailable or has issues, it directly impacts your ability to query the data.
  3. Complex Transformations: Performing complex transformations might be more challenging compared to having the data in the DW.
  4. Security Concerns: Depending on your setup, there might be additional security considerations when accessing data directly from the Data Lake.

Additional Considerations

  • Hybrid Approach: Some organizations use a hybrid approach, where critical data is copied to the DW for performance and reliability, while less critical data is accessed via External Tables.
  • Scalability: Evaluate how each approach scales with your data growth and query demands.

 

Hope this would be helpful!

 

Best Regards,
Jing
If this post helps, please Accept it as Solution to help other members find it. Appreciate your Kudos!

Helpful resources

Announcements
Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

June FBC25 Carousel

Fabric Monthly Update - June 2025

Check out the June 2025 Fabric update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.