March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
Hi all,
I have a dataset with products with several components. I want to calculate the number of SKU for each product. Unfortunately, there are double values. The product is a car. So level 0 is the car. Level 1 can be the engine, a tyre, a light, etc. And this goes on till level 6 (in the sample set, there are only 4).
But for every unique row in a level, you've a row in the next level. For example, you've one extra row in Level 1 with AA|. That's the car. But this shouldn't be count as component, because the car is all rows in level 0 (excluding the extra row added in Level 1). This could still easily solved by just filter it. But for the next levels, it is more complicated.
In Level 1, there are AA|53414662 and AA|57312563. These are the only components, and won't continue in Level 2. So these two should be counted. But there are also sub-components AA|62885492 and AA|BDS18000. They have a extra row in Level 2 with 'AA |' ( it can be a different code, but it does have 3 characters (or it is empty)). And the same principe applies for Levels 3 and 4 (and 5 and 6, but not in this example).
I hope this explanation is clear. If not, please let me know. I added a file with one car. I coloured the first 40 rows where necessary. The blue ones should be excluded in the count in the measure (or in the table if it is possible in the Query Editor). The green ones are correct and should stay.
https://docs.google.com/spreadsheets/d/11ZZpXEezO95mWFAgRr4OYSQxXyXCUG_v/edit?usp=sharing&ouid=11807...
Solved! Go to Solution.
Yes, I see
That's a rather interesting hierarchy - aka a multi parent hierarchy. Not overly surprising since you mentioned a BOM scenario, but still requires special attention. Usually it is done by "cloning" the child.
In your case we need to change the logic to only include items that do NOT appear anywhere earlier in the path.
CleanPath =
var l = PATHLENGTH([BillOfMaterials_bk])
var a = ADDCOLUMNS(GENERATESERIES(1,l),"pi",pathitem([BillOfMaterials_bk],[Value]))
return CONCATENATEX(filter(a,[pi]<>""),[pi],"|")
LastChild = PATHITEMREVERSE([CleanPath],1)
LastOccurrences =
var c = [LastChild]
var a = filter(ALL('Table'),PATHCONTAINS([CleanPath],c) && PATHITEMREVERSE([CleanPath],1)<>c)
return if(countrows(a)>0,0,1)
Hi @lbendlin
It works great! Thank you!
Is there a way to do these steps in Power Query? I noticed that adding the calculated columns, in particular, CleanPath and LastOccurences, take a long time (since there a millions of rows in the data set).
@J94 Here is the Power Query version. I am sure it will be much slower. You could try to use a Table.Buffer but that will likely exhaust the memory.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("lZbJbuMwDIbfpec5kNTqox01c3CNBrHhAFPkTfLwlRPLSyvSmiBAgvyfFW4i+fX1VtePtkH4AO0ez9fb/c/+V7QeEZTKq0ZpAFT6QD0DUHybIsoVUb6EUsBRqK0lxmqnkIxlPLbkvdEV8+yihnZQzalGmerC9QbW5OOyUpf3AAh5X1ZqvLQKGKoJPfpZFNTucv0ENLaIymdgTxlTFVAW8/8YSDUYv+WtXtRwXrECqjvHT63Vf9KvL6Skp7qR1G1gchDqW8S4HCyqIY0ELl8TK6WNRvD5+looF4uDLB6cNVNMDl4UU++LWmI1FVlNRVbTgdXNSYp1UudbyuTrF6UqoNn8AlpXU4k/W1AxrUmgUfQGH/E8jJdX9AYf4XRZDiugZp99GW0RtXPW5ukgVVJSu/bjGh3h8rqlFBxR/VArMPmJslKxc8Zxke9UoWknipkGi2rJoELP9KBEvfq94zKUqKknxoZ/QPXNp4+zjqGeCfGc1bNadsZBFBMlR3FDEXEdctPIBdWi9hUA80/7xk2Mb5lhUEJtDv1NT1eZr5SkxsodYrEwEfhBMfW9p+joLDHm136iuJgndbvoidSyWAnUZsE6pJgI9IPUS5L6mhHA2PODYup7T2kmiomaYm2NY6weemkqJfXZS4zO71cLJe6OKyXtjisl7Y5hHCP1l6uQpJo4fBR4JquJkjekUdyQkmrQxLnKbJZbyiJUTHxWysZ7DUysN5R1mpisjuKGlNRoD1bKiVZTkdVUZDUdWP0vSPc+qcbFHQu4e7+jkOu1ier6OsQim6n7Nw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [BillOfMaterials_bk = _t]),
#"Added Custom" = Table.AddColumn(Source, "Cleaned", each List.RemoveMatchingItems(Text.Split([BillOfMaterials_bk],"|"),{""})),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "LChild", each List.Last([Cleaned])),
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "Occ", (k)=> Table.RowCount(Table.SelectRows(#"Added Custom1",each List.Contains([Cleaned],k[LChild]) and k[LChild]<>List.Last([Cleaned]))))
in
#"Added Custom2"
Hi @lbendlin , you're absolutely right: it is so much slower. I think eventually it will be loaded, but I am already waiting for 2 yours for that.
Anyway, thank you so much for your help. I need to find a different approach to filter the data, so that I can use your suggestions.
Have you tried with Table.Buffer?
Power Query does not have PATH functions. You would have to implement these manually.
Please provide sample data that fully covers your issue.
Hi @lbendlin
Thanks for your reply!
It is not yet what I am looking for. Because the codes AA|53414662 and AA|57312563 should be counted, but when I apply the filter, they are deleted (because their value is 2). Can you've a look at that?
Yes, I see
That's a rather interesting hierarchy - aka a multi parent hierarchy. Not overly surprising since you mentioned a BOM scenario, but still requires special attention. Usually it is done by "cloning" the child.
In your case we need to change the logic to only include items that do NOT appear anywhere earlier in the path.
CleanPath =
var l = PATHLENGTH([BillOfMaterials_bk])
var a = ADDCOLUMNS(GENERATESERIES(1,l),"pi",pathitem([BillOfMaterials_bk],[Value]))
return CONCATENATEX(filter(a,[pi]<>""),[pi],"|")
LastChild = PATHITEMREVERSE([CleanPath],1)
LastOccurrences =
var c = [LastChild]
var a = filter(ALL('Table'),PATHCONTAINS([CleanPath],c) && PATHITEMREVERSE([CleanPath],1)<>c)
return if(countrows(a)>0,0,1)
User | Count |
---|---|
96 | |
87 | |
84 | |
71 | |
46 |
User | Count |
---|---|
180 | |
147 | |
89 | |
74 | |
60 |