
Reply
Dave1mo1
Helper II

Most Efficient Way to Create Buckets?

Hi,

 

I'd like to bucket order size by units (1-5, 6-10, 11-20, 21-30, 31+) and use it as a slicer or field in report visuals. My fact table contains more than 17 million rows, so I want to do this as efficiently as possible. Is it best to create a conditional column in Power Query during the ETL stage, or to add a column using DAX?

 

Or is there another alternative that I'm missing? Thanks!

15 REPLIES
Ashish_Mathur
Super User

Hi,

You may try using measures to solve this problem.  Please see the attached files.
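The attached files aren't reproduced here, but a measure-based approach typically pairs a disconnected bucket table with a measure that assigns each order's total units to the selected bucket. A hedged sketch (table and column names such as Sales, OrderID, and Units are assumptions):

```dax
-- Disconnected bucket table (no relationship to the fact table)
Buckets =
DATATABLE (
    "Bucket", STRING, "Min", INTEGER, "Max", INTEGER,
    {
        { "1-5", 1, 5 }, { "6-10", 6, 10 }, { "11-20", 11, 20 },
        { "21-30", 21, 30 }, { "31+", 31, 999999 }
    }
)

-- Counts the orders whose total units fall in the bucket(s) in filter context
Orders in Bucket =
SUMX (
    VALUES ( Sales[OrderID] ),
    VAR TotalUnits = CALCULATE ( SUM ( Sales[Units] ) )
    RETURN
        IF (
            COUNTROWS (
                FILTER (
                    Buckets,
                    TotalUnits >= Buckets[Min] && TotalUnits <= Buckets[Max]
                )
            ) > 0,
            1
        )
)
```

Using Buckets[Bucket] as a slicer then filters the measure dynamically, at the cost of iterating the distinct orders at query time.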


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/
Thejeswar
Community Champion

@Dave1mo1 ,

You can use a two-table approach:

 

1. The first is your original table, where you want the bucket information.

2. Create a new table that groups Units at the Order level. In my view, this table needs only two columns (Order, Units).

3. In the table created in step 2, add a column that defines the buckets per your requirement, using the conditional column option.

4. Using the Merge option in the Power Query editor, merge the two tables and bring only the Bucket column into the first table.

5. This gives you a table at the order-line grain that also carries the bucket information for the Order.
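The steps above can be sketched in Power Query M as follows (query, table, and column names such as Sales, Order, and Units are assumptions):

```m
let
    Source = Sales,  // your original fact table query

    // Step 2: aggregate units at the order level
    OrderTotals = Table.Group(
        Source, {"Order"},
        {{"Units", each List.Sum([Units]), Int64.Type}}
    ),

    // Step 3: add the bucket column
    WithBucket = Table.AddColumn(OrderTotals, "Bucket", each
        if [Units] <= 5 then "1-5"
        else if [Units] <= 10 then "6-10"
        else if [Units] <= 20 then "11-20"
        else if [Units] <= 30 then "21-30"
        else "31+", type text),

    // Step 4: merge the bucket back onto the fact table
    Merged = Table.NestedJoin(Source, {"Order"}, WithBucket, {"Order"},
                              "BucketTable", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "BucketTable", {"Bucket"})
in
    Expanded
```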

 

With your volume of records, I can't be sure how long it will take, but it is worth a try.

 

That said, I would prefer doing this at a higher level, such as writing it as a SQL statement and pulling in only the output, rather than doing it inside Power BI.
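If the source is a relational database, pushing the same logic upstream might look like this (table and column names such as Sales, OrderID, and Units are assumptions):

```sql
SELECT s.*,
       CASE
           WHEN t.TotalUnits <= 5  THEN '1-5'
           WHEN t.TotalUnits <= 10 THEN '6-10'
           WHEN t.TotalUnits <= 20 THEN '11-20'
           WHEN t.TotalUnits <= 30 THEN '21-30'
           ELSE '31+'
       END AS Bucket
FROM Sales AS s
JOIN (
    SELECT OrderID, SUM(Units) AS TotalUnits
    FROM Sales
    GROUP BY OrderID
) AS t
    ON t.OrderID = s.OrderID;
```

The database engine then does the aggregation and join, and Power BI only loads the finished column.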

 

Regards,

Dave1mo1
Helper II

Bump - Any other ideas? Thanks!

Thank you Tahreem24 

Hi, @Dave1mo1

When a conditional column depends only on other columns in the same table, it is usually better to create it in the ETL phase.
If a calculated column depends on fields from other tables, you can create it as a DAX calculated column.
SQLBI has an article dedicated to this comparison, and I believe it can help you make the right decision:


 

Comparing DAX calculated columns with Power Query computed columns - SQLBI 

 

 

Best Regards

Jianpeng Li

If this post helps, then please consider accepting it as the solution to help other members find it more quickly.

Hi Jian,

 

That is helpful - thanks. I am trying to create buckets based on an aggregation. My data arrives at an "order line" granularity, but I'd like to aggregate at the order level to show total units for each order. Then I'd like to group the total units for each order into buckets (1-5, 6-10, etc.).

 

I can easily write a DAX measure that gives me total units for an order, but that result doesn't "live" anywhere for me to bucket. When I tried to aggregate in the ETL stage by duplicating the query and using "Group By" in the second query, it took a very long time (this table has ~17M records, so that's somewhat understandable).

 

I'm not sure how I can both aggregate at a higher granularity than my fact table and bucket that aggregation in an efficient way. Do you have any other ideas?
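One pattern that makes the order-level total "live" somewhere, at the cost of a 17M-row calculated column, is a pair of DAX calculated columns on the fact table (table and column names such as Sales, OrderID, and Units are assumptions):

```dax
-- Order-level total, repeated on every order line
Order Units =
CALCULATE (
    SUM ( Sales[Units] ),
    ALLEXCEPT ( Sales, Sales[OrderID] )
)

-- Bucket label derived from the order-level total
Order Bucket =
SWITCH (
    TRUE (),
    Sales[Order Units] <= 5,  "1-5",
    Sales[Order Units] <= 10, "6-10",
    Sales[Order Units] <= 20, "11-20",
    Sales[Order Units] <= 30, "21-30",
    "31+"
)
```

The low-cardinality Order Bucket column compresses well and can be used directly as a slicer, though the Order Units column itself adds memory overhead.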

Hi, @Dave1mo1 

I wish you all the best. Previously we provided a solution to help you solve the problem. Since we haven't heard back from you yet, I'd like to confirm whether you've successfully resolved this issue or still need further help.
You are welcome to share your workaround and mark it as a solution so that other users can benefit as well. If you find a reply particularly helpful to you, you can also mark it as a solution.
If you still have any questions or need more support, please feel free to let us know. We are more than happy to continue to help you.
Thank you for your patience and look forward to hearing from you.

 

Best Regards

Jianpeng Li

Hi Jianpeng,

 

I'm planning on spending some time today trying the "two table" method to see if that works for me. Thanks!

 

-David

Hi, @Dave1mo1 

I wish you all the best. Previously, Thejeswar provided a solution to help you solve the problem. Since we haven't heard back from you yet, I'd like to confirm whether you've successfully resolved this issue or still need further help.
If yes, you are welcome to share your workaround and mark it as a solution so that other users can benefit as well. If you find a reply particularly helpful to you, you can also mark it as a solution.
If you still have any questions or need more support, please feel free to let us know. We are more than happy to continue to help you.
Thank you for your patience and look forward to hearing from you.

 

 

Best Regards

Jianpeng Li

 

I've not been able to get the preview to load the second table in a reasonable time to test. I think this will add too much time to my refreshes to be worth the outcome, to be honest 😞 I think the solution will work but can't verify it, in other words.

Hi, @Dave1mo1 

Merry Christmas to you. Previously we have provided a solution to help you solve the problem. Since we haven't heard back from you yet, I'd like to confirm if you've successfully resolved this issue or if you need further help?
If yes, you are welcome to share your workaround and mark it as a solution so that other users can benefit as well. If you find a reply particularly helpful to you, you can also mark it as a solution.
If you still have any questions or need more support, please feel free to let us know. We are more than happy to continue to help you.
Thank you for your patience and look forward to hearing from you.

 

 

Best Regards

Jianpeng Li

Hi, @Dave1mo1 

I understand your concerns. This is a challenge when dealing with large datasets.
I think you could try doing it in Power Query and use the Table.Buffer function to cache the table in memory.

Table.Buffer - PowerQuery M | Microsoft Learn
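A minimal sketch of where Table.Buffer might sit in the grouping step (source, table, and column names are assumptions; buffering a 17M-row table is memory-intensive and prevents query folding, so it mainly helps when the same table is re-read several times downstream):

```m
let
    Source = Sql.Database("server", "db"),  // assumed relational source
    // Cache the fact table in memory before it is scanned repeatedly
    Sales = Table.Buffer(Source{[Schema = "dbo", Item = "Sales"]}[Data]),
    OrderTotals = Table.Group(
        Sales, {"OrderID"},
        {{"TotalUnits", each List.Sum([Units]), Int64.Type}}
    )
in
    OrderTotals
```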
Or, if your source is a relational database, you can do this by writing SQL at load time. Of course, the best approach is to do it upstream (in the database), which guarantees the loading speed in Power BI.
I also have a small idea: if you have permission to use Fabric, you can feed your data through a Fabric pipeline into a lakehouse and let the lakehouse perform the transformation. In Power BI, you then only need to read the lakehouse data to create the report.

If you still have any questions or need more support, please feel free to let us know. We are more than happy to continue to help you.
Thank you for your patience and look forward to hearing from you.

 

 

Best Regards

Jianpeng Li


Hi, @Dave1mo1 

Yeah. That's right, and we look forward to your follow-up feedback. It would be great if it was possible to solve your current problem smoothly!

 

Best Regards

Jianpeng Li

Hi @Dave1mo1 

Are you able to create a calculated column? You can use two tools to determine which is faster on your data: the DAX calculated column or the Power Query conditional column:

Power Query: (screenshot of the tool attached)

Calculated Column: (screenshot of the tool attached)

If you still have any questions or need more support, please feel free to let us know. We are more than happy to continue to help you.
Thank you for your patience and look forward to hearing from you.

 

 

Best Regards

Jianpeng Li


 

 

 

 

 

Dave1mo1
Helper II

Hi Tahreem,

 

Thanks so much for the link. I did try using the "bins" feature, but I can't figure out how to change the name of the bin. If I set a bin size of "5" for units, it gives me bin names of "0", "5", "10", etc. I want the bin name to reflect the range: 1-5, 6-10, etc.
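The built-in bins can't be renamed, but a calculated column can generate range labels directly. A hedged sketch assuming a fixed bin size of 5 (table and column names are assumptions):

```dax
-- Maps Units 1-5 to "1-5", 6-10 to "6-10", and so on
Bin Label =
VAR BinStart = FLOOR ( Sales[Units] - 1, 5 ) + 1
RETURN
    BinStart & "-" & ( BinStart + 4 )
```

Note this produces uniform 5-unit bins, matching the built-in bin-size behavior; buckets of mixed widths (11-20, 21-30, 31+) would instead need a SWITCH over explicit thresholds.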

Tahreem24
Super User

@Dave1mo1 The best way to define buckets is with the "Grouping and binning" feature of Power BI. Here is the official documentation for reference:

https://learn.microsoft.com/en-us/power-bi/create-reports/desktop-grouping-and-binning

 

Don't forget to give thumbs up and accept this as a solution if it helped you!!!

