"R" Don't remove duplicates

Provide the option of not removing duplicates automatically when creating R visualizations, or provide the ability to create R datasets using the same syntax shown in the comments when creating an R visualization.
Status: Under Review
Comments
gdeckler4
New Member
The comments of an R visualization show: #dataset <- data.frame(Column) However, I cannot use the same syntax to create my own data frame.
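For context, the injected comment is not runnable R on its own: `data.frame(Column)` only works if an object named `Column` already exists in scope. A minimal sketch of the equivalent, runnable syntax (the values here are made up for illustration):

```r
# Power BI injects '#dataset <- data.frame(Column)' as a comment, but
# data.frame() needs an existing object, not just a bare column name.
# Defining the vector first makes the same syntax valid standalone R:
Column <- c("a", "b", "b", "c")
dataset <- data.frame(Column)

str(dataset)  # one column named 'Column', four rows
```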
boefraty
Microsoft Employee
The workaround here is to add an "ID column" to the data (you don't have to use it in the R script).
efglynn
Advocate IV
I shouldn't have to add an ID or key field to get all the data. If I want to remove duplicates in R, it's trivial with the "unique" statement.
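To illustrate the point above: deduplication is a one-liner in R when the analyst actually wants it. A minimal sketch, using made-up values and `dataset` as a stand-in for the data frame Power BI passes to an R visual:

```r
# Sample data with deliberate duplicates.
dataset <- data.frame(Column = c(1, 2, 2, 3, 3, 3))

# unique() keeps one row per distinct combination of values,
# so removing duplicates is trivial when it is wanted.
deduped <- unique(dataset)

nrow(dataset)  # 6 rows before deduplication
nrow(deduped)  # 3 rows after
```

Because opting in is this easy, removing duplicates by default takes away a choice that costs the user nothing to make explicitly.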
efglynn
Advocate IV
When connecting to a SQL Server Analysis Services cube, I don't have direct access to the keys that make records unique, so I cannot block duplicates from being removed. Therefore, when using cubes it may not be possible to get accurate data in R in some cases. Many statistics/visualizations are worthless once duplicates have been removed. Can someone explain why removing duplicates was ever considered a good idea?
jo_varney
New Member
I want to use R to create a histogram. I add one column, and it removes all the duplicates, which produces a completely inaccurate histogram. This is pretty silly, and potentially problematic if someone uses it without noticing. Yes, I can add extra columns or create an ID column, but I shouldn't have to. I don't want the program to remove duplicates just because it sees fit; there are times when that isn't appropriate, and as the analyst I want that choice. I also want to write the simplest code, and for a histogram that should involve only one column.
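The histogram distortion described above can be sketched in a few lines. The values are hypothetical; `dataset` stands in for the single-column data an R visual would receive:

```r
# Hypothetical single-column data, as it would arrive in an R visual.
dataset <- data.frame(Column = c(1, 1, 1, 2, 2, 5))

# With duplicates intact, the bin counts reflect the true frequencies.
h <- hist(dataset$Column, plot = FALSE)
sum(h$counts)  # 6 observations counted

# After deduplication, every repeated value collapses to one count,
# which is exactly the distortion the automatic removal introduces.
h_dedup <- hist(unique(dataset$Column), plot = FALSE)
sum(h_dedup$counts)  # only 3 observations remain
```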
george_fullegar
New Member
I have no idea why this isn't the standard behaviour - it's trivial in R to remove duplicates from a dataset if that behaviour is desired.
MAwbre
Advocate I
Why does it remove duplicates by default? When performing univariate qualitative analysis, I want to be able to drop in a single qualitative field. This means I WANT duplicates, and having keys complicates the analysis meaninglessly. That automatic removal should be made an option. I believe it was added because of the 150k-row limit on R scripts in Power BI. The removal of duplicates by Power BI (in what I assume to be some sort of pre-processor-like call, judging by the invalid code syntax shown in the editor) probably helps mitigate that limitation for certain types of data sets. Unfortunately, without the option to turn off that "pre-processor" call, an entire segment of potential analysis is complicated or even impossible (if the original data set has no key).
aaron_s_creight
New Member
The workaround for this is to use an Index column. However, this does not work if the data is coming from different tables! I would have to create a new table, containing the variables of interest, with an index column of its own. This would defeat the purpose of using a relational data structure, among other things.
sleptwithyomama
New Member
Microsoft keeps adding features like this that get in the way. At the very least, it should be possible to disable it.
renvaldas1
New Member
Please do the same for Python!