sreddy47652
Helper III

Tableau to Power BI migration with large dataset

Hi Team, my user wants to migrate Tableau reports to Power BI. Here are the requirements. How can we handle a huge volume of data in Power BI? If we use dataflows in Power BI, do dataflows have the capacity to handle 150 GB of data, and what are the prerequisites, pros, and cons? Please share.

1. The user receives CSV files every week, and the data grows by about 5 GB each week.

2. They need to maintain at least one year of data, which is about 150 GB.

How can we handle this scenario in Power BI? Please share your thoughts.

2 ACCEPTED SOLUTIONS
Ritaf1983
Super User

Hi @sreddy47652 

Migrating large-scale reports from Tableau to Power BI, especially with datasets as large as 150GB and growing, requires careful planning. Here are the key considerations for handling data at this scale in Power BI, along with the challenges you’ll face in terms of report migration, visualizations, and tool capabilities.

1. Requirement for Premium or Fabric Capacity

  • Premium or Fabric Capacity: Due to the size and frequency of your data updates (5GB per week), Power BI Pro or Premium Per User (PPU) is likely not sufficient. Power BI Premium or Fabric Capacity provides increased storage, larger model sizes, and better processing capabilities. This will allow you to manage large datasets, use features like Incremental Refresh, and connect to Dataflows efficiently.
  • Why Premium: Power BI Premium allows larger data models, increased refresh frequency, and processing power that are necessary for handling high data volumes.
  • Incremental Refresh: With Premium Capacity, you can configure Incremental Refresh to only load new or updated data, which is critical for managing a growing dataset of this size.

2. Managing Large Data Volumes in Power BI

  • Storage Strategy: Storing the full dataset (150GB) within Power BI’s memory may not be ideal. Consider using Azure Data Lake or Azure Synapse Analytics to store the raw data, then connect Power BI to this storage (see the sketch after this list). This setup allows you to manage storage costs and optimize performance.
  • DirectQuery vs Import Mode: If the dataset is too large for Import mode, DirectQuery may be an option. DirectQuery fetches data directly from the source as needed, which can reduce memory requirements but may impact performance if queries are not optimized.
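
To illustrate the storage strategy above, here is a minimal Power Query (M) sketch for a dataflow query that combines the weekly CSV files from an Azure Data Lake Gen2 container. The endpoint, container, and column layout are hypothetical placeholders, and the sketch assumes every weekly file shares the same schema:

let
    // Hypothetical ADLS Gen2 endpoint and container; replace with your own
    Source   = AzureStorage.DataLake("https://yourlakeaccount.dfs.core.windows.net/weekly-csv"),
    // Keep only the CSV files delivered each week
    CsvFiles = Table.SelectRows(Source, each Text.EndsWith([Name], ".csv")),
    // Parse each file and promote its first row to column headers
    Parsed   = Table.AddColumn(CsvFiles, "Data",
                   each Table.PromoteHeaders(Csv.Document([Content]))),
    // Stack all weekly files into a single table
    Combined = Table.Combine(Parsed[Data])
in
    Combined

Keeping the raw files in the lake and shaping them in a dataflow means the model loads only the shaped result, and only the columns you keep, rather than the raw CSVs themselves.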

3. Differences in Visualization Capabilities

  • Power BI vs Tableau Visualizations: Power BI and Tableau have different visualization capabilities. While Power BI is highly capable, certain visualizations may need to be recreated or adapted, as there are some chart types and interactive elements in Tableau that may not have direct counterparts in Power BI.
  • Rebuilding Reports: This migration will not be a simple "lift-and-shift." Reports from Tableau will likely need to be rebuilt in Power BI. Consider this a redevelopment process where you may need to rethink certain visualizations to fit within Power BI’s strengths and adjust for any limitations.
  • Custom Visuals: If needed, Power BI offers a marketplace of custom visuals that might replicate specific Tableau visualizations. However, using custom visuals can sometimes affect report performance and add maintenance overhead.

4. Dataflows and ETL Considerations

  • Dataflows: Power BI Dataflows can help preprocess and transform data before it reaches the report layer, especially useful for ETL tasks. However, Dataflows themselves have size limits, so it’s essential to pair Dataflows with Premium or Fabric Capacity for managing large datasets.
  • External ETL Tools: You might also consider external ETL tools or services (such as Azure Synapse Pipelines or Azure Data Factory) for data transformation tasks, especially if you have complex transformations or need to aggregate data from multiple sources.

5. Pros and Cons of Moving to Power BI

  • Pros:
    • Power BI integrates well with other Microsoft tools and allows for extensive customization with DAX and Power Query.
    • Dataflows and Incremental Refresh are valuable features for managing data at scale.
    • Premium Capacity provides robust processing and storage options.
  • Cons:
    • Migration is a redevelopment process; some features in Tableau may not be replicated exactly in Power BI.
    • The cost of Premium Capacity can be significant, especially with high data storage needs.
    • Power BI has some visualization and interaction limitations compared to Tableau.

Suggested Approach for Migration

  1. Set up Azure Data Lake or Synapse to store your weekly CSV files as they come in. This will help you manage data growth without overwhelming Power BI’s memory.
  2. Create Dataflows in Power BI connected to your Azure storage, where you can apply transformations and use Incremental Refresh to handle new data.
  3. Rebuild Visualizations in Power BI, using Power BI’s built-in visuals and exploring custom visuals if necessary.
  4. Test Performance and Adjust: Given the differences between Power BI and Tableau, you may need to test and adjust performance, especially if using DirectQuery.

Migrating from Tableau to Power BI for large datasets like this involves some trade-offs and considerations. While Power BI is highly capable, this process will require a structured approach to data storage, capacity planning, and visualization adaptation.

Let us know if you have specific questions on any of these points or need further assistance with setup and migration planning.

If this post helps, then please consider Accepting it as the solution to help the other members find it more quickly

Regards,
Rita Fainshtein | Microsoft MVP
https://www.linkedin.com/in/rita-fainshtein/
Blog : https://www.madeiradata.com/profile/ritaf/profile

grazitti_sapna
Solution Supplier

Hi @sreddy47652, handling large datasets like 150 GB in Power BI requires strategic planning. Here are some suggestions and considerations:

1. Dataflows and Storage Capacity

  • Power BI Dataflows allow you to perform ETL processes and store data in Azure Data Lake Gen2. However, managing such a large volume (150GB) with weekly increments needs careful planning:
    • Storage Limitations: Power BI Pro accounts have limitations on storage and refresh sizes. Power BI Premium allows higher storage limits, up to 100 TB in the Premium Capacity. A P1 or higher SKU would be suitable to handle your large dataset.
    • Dataflow Performance: You can use incremental refresh in Dataflows to optimize performance and only refresh new or updated data weekly. This will help manage the increasing dataset size.

2. Incremental Refresh

  • Purpose: Incremental refresh allows loading and processing only new or changed data, minimizing processing time and resource consumption. This is crucial for weekly increasing data files.
  • Setup: Define partitions based on your update frequency (weekly). This way, Power BI only processes the new week's data rather than the entire dataset each refresh cycle.
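
For reference, the filter that incremental refresh relies on is defined in Power Query using the two reserved DateTime parameters RangeStart and RangeEnd. A minimal sketch, assuming the weekly files have already been landed in a staging database and using hypothetical server, table, and column names:

let
    // Hypothetical staging database holding the landed weekly data
    Source   = Sql.Database("yourserver.database.windows.net", "StagingDb"),
    Sales    = Source{[Schema = "dbo", Item = "WeeklySales"]}[Data],
    // Power BI supplies RangeStart/RangeEnd per partition at refresh time,
    // so only the current partition's rows are loaded
    Filtered = Table.SelectRows(Sales, each [LoadDate] >= RangeStart and [LoadDate] < RangeEnd)
in
    Filtered

Note that over a folder of CSV files this filter cannot fold back to the source, so every file is still opened during refresh; incremental refresh pays off most once the data sits in a database or lakehouse table that supports query folding.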

3. DirectQuery Mode

  • When to Use: If your dataset is too large for Import mode, consider using DirectQuery with a performant database like Azure Synapse, SQL Database, or another scalable cloud database. This allows Power BI to query the database without loading the entire dataset into memory.
  • Pros: No size limitations; real-time data access.
  • Cons: Slower visualizations compared to Import mode due to database query time.

4. Aggregations

  • For reports and visuals requiring summaries or rollups of data, create aggregation tables in Power BI (see the sketch after this list). This reduces the size of the data being processed and speeds up visual rendering.
  • Aggregations work well in scenarios where users don’t need to see the full dataset in each report.
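
One way to build such a summary is a small aggregation query on top of the detailed query. A sketch, where CombinedSales stands in for the detailed table and the grouping and value columns are hypothetical:

let
    // Reference to the detailed query (name is illustrative)
    Detail     = CombinedSales,
    // Pre-summarize by day and category so visuals read a small table
    // instead of the full level of detail
    Aggregated = Table.Group(
        Detail,
        {"OrderDate", "ProductCategory"},
        {
            {"TotalAmount", each List.Sum([Amount]), type nullable number},
            {"RowCount",    each Table.RowCount(_), Int64.Type}
        })
in
    Aggregated

In a composite model this summary table can also be registered as an aggregation over a DirectQuery detail table, so Power BI answers most visuals from the small imported table and only queries the source for drill-down detail.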

5. Data Lake and External Storage

  • Store the raw CSV files in Azure Data Lake Gen2 or Blob Storage. Use Power BI Dataflows or Azure Data Factory to process and load only the necessary columns and rows. This keeps your Power BI model efficient.
  • Consider splitting your data storage into hot and cold tiers, with older, less-accessed data kept in cheaper storage options.

6. Prerequisites & Licensing

  • Licenses: Ensure you have Power BI Premium capacity (P1 or higher) for the dataset size. For Pro licenses, the data model cannot exceed 1GB, so it’s not suitable for 150GB.
  • Storage Considerations: You’ll need Azure Data Lake for external storage and potentially Azure SQL Database for optimized query performance.
  • Data Model Design: Design your data model efficiently to avoid unnecessary columns or tables.

Pros and Cons of Using Power BI for Large Datasets:

Pros:

  • Scalability with Premium: Supports large datasets with efficient data refresh using incremental load.
  • Advanced AI & Analytics: Leverage built-in AI capabilities for predictive analytics and trend identification.
  • Better Integration with Microsoft Ecosystem: Seamless integration with Azure Data Lake, Azure Synapse, etc.
  • Improved Cost Management: Power BI can provide a cost-effective solution when strategically using Dataflows, Data Lake, and incremental refresh.

Cons:

  • DirectQuery Performance: May face slower query times if the underlying database or query design isn’t optimized.
  • Learning Curve: If users are moving from Tableau to Power BI, there might be a learning curve in handling large datasets effectively.

Recommended Actions

  • Implement incremental refresh with Dataflows.
  • Leverage aggregation tables to reduce query time.
  • Consider using DirectQuery for the database if 150GB exceeds Power BI memory capacity.
  • Store raw data externally in Azure Data Lake and use efficient ETL techniques.

This approach should provide a scalable, efficient way to manage large datasets within Power BI.

If I have resolved your question, please consider marking my post as a solution🎉. Thank you!

3 REPLIES

shafiz_p
Resident Rockstar

Hi @sreddy47652 

Here are some steps and considerations:

  • Power BI Dataflows can handle large datasets, but the capacity depends on the Power BI service plan. For a dataset around 150 GB, you will likely need Power BI Premium capacity, which offers higher storage limits and better performance.
  • Use incremental refresh to manage large datasets efficiently. This feature refreshes only the new or changed data, reducing the load and time required for data processing.
  • Consider using Azure Data Lake Storage for storing large volumes of data. Power BI can connect to Azure Data Lake, providing a scalable and cost-effective storage solution.
  • Clean and preprocess your data before importing it into Power BI (see the sketch after this list). This can involve removing duplicates, handling missing values, and ensuring data consistency.
  • Connect your data sources (databases, files, etc.) to Power BI, and use Power BI Dataflows to manage and transform your data.
  • Rebuild your dashboards and reports in Power BI using its features and capabilities.
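
As a rough illustration of the cleanup step mentioned above, here is a minimal Power Query sketch that removes exact duplicates, drops rows missing a key, and enforces column types before the data reaches the model. The query name Combined and the column names are placeholders:

let
    // "Combined" stands in for the query that stacks the weekly CSV files
    Source  = Combined,
    // Remove exact duplicate rows
    Deduped = Table.Distinct(Source),
    // Drop rows that are unusable because the key is missing
    Kept    = Table.SelectRows(Deduped, each [CustomerId] <> null),
    // Set explicit types so the model stores and compresses columns efficiently
    Typed   = Table.TransformColumnTypes(Kept,
                  {{"OrderDate", type date}, {"Amount", type number}})
in
    Typed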

Hope this helps!!
