Power BI is turning 10, and we’re marking the occasion with a special community challenge. Use your creativity to tell a story, uncover trends, or highlight something unexpected.
Get startedJoin us for an expert-led overview of the tools and concepts you'll need to become a Certified Power BI Data Analyst and pass exam PL-300. Register now.
Hello all
I am working with data with duplicates; however, I can't remove them as I need to use other columns in other charts.
I would ideally like to create the below - where the first three columns are checked and, based on how many times they are or aren't duplicated, create a column with their counts. This will allow me to filter duplicates on the relevant charts.
Note that I don't need to know if the duplicate is 2,3, etc. It could just be "first" and "duplicate"—whichever is easier.
Many thanks
Nicole
Solved! Go to Solution.
Hi, @collis
You can try the following methods. Start by adding the index column to the Power query.
Count =
CALCULATE ( COUNT ( 'Table'[Site] ),
FILTER (
ALLEXCEPT ('Table','Table'[Scheduled Start],'Table'[Scheduled End],'Table'[Site]),
[Index] <= EARLIER ( 'Table'[Index] )
)
)
Is this the result you were expecting?
Best Regards,
Community Support Team _Charlotte
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Create a new column
Duplicate Status =
VAR CurrentRowKey =
[Scheduled Start] & " " & [Scheduled End] & " " & [Site]
VAR RowCount =
CALCULATE(
COUNTROWS('YourTable'),
FILTER(
'YourTable',
'YourTable'[Scheduled Start] = EARLIER('YourTable'[Scheduled Start]) &&
'YourTable'[Scheduled End] = EARLIER('YourTable'[Scheduled End]) &&
'YourTable'[Site] = EARLIER('YourTable'[Site])
)
)
VAR RowNumber =
RANKX(
FILTER(
'YourTable',
'YourTable'[Scheduled Start] = EARLIER('YourTable'[Scheduled Start]) &&
'YourTable'[Scheduled End] = EARLIER('YourTable'[Scheduled End]) &&
'YourTable'[Site] = EARLIER('YourTable'[Site])
),
'YourTable'[Scheduled Start],
,
ASC,
Dense
)
RETURN
IF(RowNumber = 1, "First", "Duplicate")
Give this a try and let me know how it works for you!
Hi,
Share data in a format that can be pasted in an MS Excel file.
Scheduled Start | Scheduled End | Site |
3/11/2023 9:00 | 3/11/2023 17:00 | NSW |
8/11/2023 9:00 | 8/11/2023 17:00 | NSW |
10/11/2023 9:30 | 10/11/2023 10:30 | NSW |
10/11/2023 10:30 | 10/11/2023 11:30 | NSW |
10/11/2023 10:30 | 10/11/2023 11:30 | NSW |
13/11/2023 13:00 | 13/11/2023 16:00 | NSW |
13/11/2023 13:30 | 13/11/2023 21:30 | NSW |
14/11/2023 9:30 | 14/11/2023 17:30 | NSW |
16/11/2023 9:00 | 16/11/2023 17:00 | NSW |
16/11/2023 12:00 | 16/11/2023 20:00 | NSW |
17/11/2023 11:30 | 17/11/2023 19:30 | NSW |
22/11/2023 11:30 | 22/11/2023 11:30 | NSW |
22/11/2023 11:30 | 22/11/2023 19:30 | NSW |
23/11/2023 9:30 | 23/11/2023 17:30 | NSW |
24/11/2023 9:00 | 24/11/2023 17:00 | NSW |
24/11/2023 9:00 | 24/11/2023 17:00 | NSW |
24/11/2023 9:00 | 24/11/2023 17:00 | NSW |
24/11/2023 9:00 | 24/11/2023 17:00 | NSW |
27/11/2023 11:00 | 27/11/2023 14:00 | NSW |
29/11/2023 9:30 | 29/11/2023 17:30 | NSW |
29/11/2023 9:30 | 29/11/2023 17:30 | NSW |
29/11/2023 9:30 | 29/11/2023 17:30 | NSW |
29/11/2023 10:00 | 29/11/2023 18:00 | NSW |
Many thanks
Hi, @collis
You can try the following methods. Start by adding the index column to the Power query.
Count =
CALCULATE ( COUNT ( 'Table'[Site] ),
FILTER (
ALLEXCEPT ('Table','Table'[Scheduled Start],'Table'[Scheduled End],'Table'[Site]),
[Index] <= EARLIER ( 'Table'[Index] )
)
)
Is this the result you were expecting?
Best Regards,
Community Support Team _Charlotte
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi,
This M code works
let
Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
#"Grouped Rows" = Table.Group(Source, {"Scheduled Start", "Scheduled End", "Site"}, {{"Count", each Table.AddIndexColumn(_,"Index",1)}}),
#"Expanded Count" = Table.ExpandTableColumn(#"Grouped Rows", "Count", {"Index"}, {"Index"}),
#"Changed Type" = Table.TransformColumnTypes(#"Expanded Count",{{"Scheduled Start", type datetime}, {"Scheduled End", type datetime}, {"Site", type text}, {"Index", Int64.Type}})
in
#"Changed Type"
Hope this helps.
Thank you. So fast. This works well, but it deletes other columns. How would I keep other columns to be used for other charts?
Thank you again for the reply. Unfortunately, my knowledge isn't advanced enough. Trying though... If I list all the columns, the count won't work as the data in the other columns isn't duplicated. I only need to check duplicates across three columns, and then the other columns I need to keep to be able to do other charts. I have seen people add
{"all", each _, type table [membercode=number, capacity=number]}
but I can't get it to work. Sorry, this is really stretching my brain!
You are welcome. In the Grouped Rows step, mention all the columns.
Sorry - I just quickly put together a demonstration to help communicate (and keep the data private). Imagine they are duplicates.
Hi Nicole,
The last 3 rows in your image aren't duplicates because the time in the Scheduled End column are different?
Regards
Phil
Proud to be a Super User!
User | Count |
---|---|
59 | |
59 | |
56 | |
38 | |
29 |
User | Count |
---|---|
77 | |
62 | |
45 | |
40 | |
39 |