Hi All,
I am new to PBI and still learning it. I am required to create a PBI dashboard from the Excel files I get from a vendor. I did it in the following way:
1. I usually get 3 Excel files (3 channels, so 3 files) per month. Each file has 200,000-250,000 rows and 33 columns.
2. I removed the unwanted columns and appended the data for 5 months. Now the data has 3 million rows and 27 columns.
3. To create a dimension table, I referenced the appended table created in step 2, selected the columns (let's say Manufacturer, Brand, Product) and called it the Dim 1 table. I removed the other columns, then removed duplicate rows and created an index column based on all three columns.
4. I merged the table created in step 2 with the Dim 1 table, joining on the Manufacturer, Brand and Product columns.
5. I expanded the merged table, kept only the column I wanted (the index value) and deleted the unwanted columns. I called this the Fact Table.
I did this for the other columns as well. Now the problem is that the data is so big that even a simple query step takes more than 10-15 minutes to execute, and the merge, expand and column-selection steps each take 45 minutes to an hour. There were some errors; I corrected them in the Advanced Editor, and it again took about an hour to execute.
This is just 5 months of data, and I still need to add another 7 months. I am now starting to believe that loading Excel files like this is not a good process when the files are this big.
After closing and applying in the Power Query Editor, it takes another 1-2 hours to load the data. I have disabled the load for the unwanted tables. During the buffering process, I can see all the tables loading and buffering, and the file size for each table keeps increasing beyond 3 GB.
Is there a better way to do this instead of loading 3 Excel files each month? My vendor has not provided an API, and this data does not reside in our data lake or data warehouse.
I am using a company-issued HP laptop with an 8th-gen Core i5 and 8 GB of RAM.
Thank you.
Hey @Mkrishna
The issue you're facing with large Excel files and long loading times in Power BI is not uncommon. Working with such massive datasets can be challenging, especially on a laptop with limited resources. You might consider using Python for ETL (Extract, Transform, Load) before connecting the transformed data to Power BI. This is a viable and often excellent solution, especially when dealing with complex data transformations, large datasets, or data that needs to be integrated from multiple sources. Here's how you can approach it:
Extract: In Python, you can use libraries like Pandas or PySpark to extract data from various sources, including Excel files, databases, APIs, or other data formats.
Transform: Python provides powerful libraries and tools for data transformation, cleaning, and manipulation. You can perform complex data transformations, aggregations, calculations, and data cleansing using Pandas, NumPy, or custom Python scripts.
Load: After transforming the data, you can save it into a format that is easily accessible by Power BI. Common formats include CSV, Excel, or a database. You can also consider using Parquet or other columnar storage formats for better query performance.
Connect to Power BI: In Power BI, you can connect to the transformed data source (e.g., the generated CSV or Excel files). Power BI can work efficiently with these files, and you can create your reports, dashboards, and data models based on this transformed data.
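To make this concrete, here is a minimal Pandas sketch of that pipeline, mirroring the dimension/fact split you described. The folder path, file pattern, and column names are placeholders based on your post, and writing Parquet requires the pyarrow (or fastparquet) package:

```python
import glob

import pandas as pd  # pip install pandas pyarrow

# Placeholder names based on the post; adjust to your real folder and columns.
KEEP_COLS = ["Manufacturer", "Brand", "Product"]  # extend with the other columns you need
DIM_COLS = ["Manufacturer", "Brand", "Product"]

# Extract: read every monthly channel file (e.g. 15 files for 5 months x 3 channels),
# keeping only the needed columns at read time.
frames = [pd.read_excel(path, usecols=KEEP_COLS)
          for path in glob.glob("vendor_files/*.xlsx")]
data = pd.concat(frames, ignore_index=True)

# Transform: build the dimension table by de-duplicating the three columns
# and adding a surrogate key (the equivalent of your index column).
dim1 = data[DIM_COLS].drop_duplicates().reset_index(drop=True)
dim1["Dim1Index"] = dim1.index

# Merge the key back in and keep it instead of the three text columns.
fact = data.merge(dim1, on=DIM_COLS, how="left").drop(columns=DIM_COLS)

# Load: Parquet is compact and much faster for Power BI to read than Excel.
dim1.to_parquet("Dim1.parquet", index=False)
fact.to_parquet("Fact.parquet", index=False)
```

In Power BI Desktop you would then point the Parquet connector at Dim1.parquet and Fact.parquet instead of the raw Excel files.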
Benefits of this approach:
Data Quality: Python allows you to perform rigorous data cleaning and quality checks, ensuring that the data is accurate and reliable before it reaches Power BI.
Complex Transformations: Python provides extensive libraries for complex data transformations and calculations, which can be challenging to achieve within Power BI's Power Query Editor.
Performance: By pre-aggregating and optimizing data in Python, you can reduce the processing load on Power BI, leading to faster report rendering and better overall performance.
Flexibility: Python gives you greater flexibility in handling various data sources, formats, and complex business logic, which might not be achievable directly within Power BI.
Automation: You can automate the ETL process in Python, scheduling it to run at specific intervals or in response to new data arrivals.
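For instance, here is a small sketch (same hypothetical folders as above) that converts only the files it has not processed yet, so it can safely be re-run whenever a new monthly file arrives, e.g. from Windows Task Scheduler:

```python
import pathlib

import pandas as pd

SRC = pathlib.Path("vendor_files")  # hypothetical input folder
OUT = pathlib.Path("parquet")       # hypothetical output folder
OUT.mkdir(exist_ok=True)

# Convert each monthly Excel file that has no Parquet counterpart yet;
# already-converted files are skipped, so re-runs are cheap and safe.
for xlsx in SRC.glob("*.xlsx"):
    target = OUT / (xlsx.stem + ".parquet")
    if not target.exists():
        pd.read_excel(xlsx).to_parquet(target, index=False)
```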
However, it's important to keep in mind that this approach adds complexity to your data pipeline. You'll need to manage the Python ETL process, ensure data consistency, and monitor for any issues. Additionally, it requires Python programming skills and knowledge of data manipulation libraries.
In summary, using Python for ETL before connecting the transformed data to Power BI can be a robust and efficient solution, particularly for handling large or complex datasets. It allows you to leverage the strengths of both tools for optimal data processing and reporting.
Regards,
Marcel Magalhães
Microsoft Power BI Official Partner
MCT | Certified PL-300 Power BI
@marcelsmaglhaes
Thank you for the suggestion.
The problem is that I work in the Sales department and our IT does not allow us to use Python.
I created a dataflow in Power BI Online, but due to the large file size and the 10-minute limit on query evaluation, I get an "evaluation cancelled" message.
Is there any other way?