March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
Hi,
I'd like to bucket order size by units (1-5, 6-10, 11-20, 21-30, 30+) and use it as a slicer or field in report visuals. My fact table contains more than 17 million rows, so I want to do this as efficiently as possible. Is it best to create a conditional column in Power Query during the ETL stage, or add a column using DAX?
Or is there another alternative that I'm missing? Thanks!
You can use 2 table approach
1. 1st is your original table where you want to have the bucket information
2. Create a new table that has the Units grouped by Order level. This table as of me will have only 2 columns (Order, Units).
3. To the table created in step 2, add a calculated column to define the buckets as per your requirement using conditional column option
4. Using Merge option in Power Query editor, Merge these two tables and get the Bucket info alone to the 1st table.
5. This should give you a table with details at Order line number while also having the bucket information for the Order.
With your volume of records, can't be sure about the time it could take. But it is worth a try.
Although I would prefer doing this same approach in a much higher level like writing this in a SQL statement and pulling in only the output rather than doing this inside Power BI
Regards,
Bump - Any other ideas? Thanks!
Thank you Tahreem24
Hi, @Dave1mo1
When your conditional columns only depend on other columns in the current table, it is usually better to perform in the ETL phase.
If your calculated columns depend on fields from other tables, you can create DAX calculated columns.
SQL BI has an article dedicated to this, and I believe it can help you make the right decision:
Comparing DAX calculated columns with Power Query computed columns - SQLBI
Best Regards
Jianpeng Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi Jian,
That is helpful - thanks. I am trying to create buckets based on an aggregation. My data comes in at an "order line" granuality, but I'd like to aggregate at the order level to show total units for each unit. Then I'd like to group the total units for each order into buckets (1-5, 6-10, etc.).
I can write the DAX measure to give me total units for an order easily, but then that result doesn't "live" anywhere for me to bucket. When I tried to aggregate in the ETL stage by duplicating the query and using "Group By" in the second query, it took a very long time (I have ~17m records in this table, so it's somewhat understandable).
I'm not sure how I can both aggregate at a higher granularity than my fact table and bucket that aggregation in an efficient way. Do you have any other ideas?
Hi, @Dave1mo1
I wish you all the best. Previously we have provided a solution to help you solve the problem. Since we haven't heard back from you yet, I'd like to confirm if you've successfully resolved this issue or if you need further help?
You are welcome to share your workaround and mark it as a solution so that other users can benefit as well. If you find a reply particularly helpful to you, you can also mark it as a solution.
If you still have any questions or need more support, please feel free to let us know. We are more than happy to continue to help you.
Thank you for your patience and look forward to hearing from you.
Best Regards
Jianpeng Li
Hi Jianpeng,
I'm planning on spending some time today trying the "two table" method to see if that works for me. Thanks!
-David
Hi, @Dave1mo1
I wish you all the best. Previously Thejeswar have provided a solution to help you solve the problem. Since we haven't heard back from you yet, I'd like to confirm if you've successfully resolved this issue or if you need further help?
If yes, you are welcome to share your workaround and mark it as a solution so that other users can benefit as well. If you find a reply particularly helpful to you, you can also mark it as a solution.
If you still have any questions or need more support, please feel free to let us know. We are more than happy to continue to help you.
Thank you for your patience and look forward to hearing from you.
Best Regards
Jianpeng Li
I've not been able to get the preview to load the second table in a reasonable time to test. I think this will add too much time to my refreshes to be worth the outcome, to be honest 😞 I think the solution will work but can't verify it, in other words.
Hi, @Dave1mo1
Merry Christmas to you. Previously we have provided a solution to help you solve the problem. Since we haven't heard back from you yet, I'd like to confirm if you've successfully resolved this issue or if you need further help?
If yes, you are welcome to share your workaround and mark it as a solution so that other users can benefit as well. If you find a reply particularly helpful to you, you can also mark it as a solution.
If you still have any questions or need more support, please feel free to let us know. We are more than happy to continue to help you.
Thank you for your patience and look forward to hearing from you.
Best Regards
Jianpeng Li
Hi, @Dave1mo1
I understand your concerns. This is a challenge when dealing with large datasets.
I think you could try to do it in Power Query and use the Table.buffer function to cache some data in memory.
Table.Buffer - PowerQuery M | Microsoft Learn
Or if you're using some relational databases, you can do that by writing SQL on load. Of course, the best way to do this is to do it upstream (in the database), which will guarantee the loading speed in Power BI.
Of course, I have a small idea that if you have permission to use Fabric, you can feed your data through Fabric's pipeline into the lakehouse, and then the lakehouse completes the transformation. In Power BI, you only need to read the data of lakehouse to create a report.
If you still have any questions or need more support, please feel free to let us know. We are more than happy to continue to help you.
Thank you for your patience and look forward to hearing from you.
Best Regards
Jianpeng Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi, @Dave1mo1
Yeah. That's right, and we look forward to your follow-up feedback. It would be great if it was possible to solve your current problem smoothly!
Best Regards
Jianpeng Li
Hi @Dave1mo1
Are you able to create a calculated column? You can use two tools to determine which is faster in your data, either the calculated column or the conditional column for PQ:
PQ:
Calculated Column:
If you still have any questions or need more support, please feel free to let us know. We are more than happy to continue to help you.
Thank you for your patience and look forward to hearing from you.
Best Regards
Jianpeng Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi Tahreem,
Thanks so much for the link. I did try using the "bins" feature, but I can't figure out how to change the name of the bin. If I say I want a bin size of "5" for units, it gives me bin names of "0," 5," "10," etc. I want the bin name to reflect the range of 1-5, 6-10, etc.
@Dave1mo1 Best way to define the bucket is by using "Bins and Grouping" feature of PBI. Attached is the official blog for your reference:
https://learn.microsoft.com/en-us/power-bi/create-reports/desktop-grouping-and-binning
User | Count |
---|---|
117 | |
77 | |
58 | |
52 | |
46 |
User | Count |
---|---|
171 | |
117 | |
63 | |
57 | |
51 |