I have been playing around with the R Visuals Preview feature, creating word clouds, and have run into some issues. I was wondering if anyone else is seeing what I am seeing.
When you create an R visualization, you currently have to drag one field into the visual in order to get the scripting interface. When you do that, you get some grayed out text that you can't edit like:
# Create dataframe
# dataset <- data.frame(word)
# Remove duplicated rows
# dataset <- unique(dataset)
Even though these are comments, apparently there is something going on behind the scenes that creates this dataset because I can use the data frame object (dataset) in my R script. Interestingly enough, however, I cannot seem to create my own data frame using the same syntax. For example, if I try:
myDataset <- data.frame(word)
That doesn't seem to work and throws an error. In addition, I can't find any way to turn off the automatic removal of duplicates. Hence, creating a word cloud from data stored in my Power BI Desktop model becomes very problematic.
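For anyone who wants to reproduce the two symptoms outside Power BI, here is a minimal sketch in plain R. It assumes (this is a guess, not something confirmed in this thread) that Power BI only hands the script a data frame called `dataset`, so a bare `word` object does not exist on its own. The sample words and the `raw` data frame are made up for illustration.

```r
# Stand-in for what Power BI appears to build behind the scenes (assumption):
# one column per field dragged into the visual, then duplicate rows removed.
raw <- data.frame(word = c("power", "bi", "power", "r", "power"),
                  stringsAsFactors = FALSE)
dataset <- unique(raw)

# Row counts before and after de-duplication: the repeats of "power" are gone,
# so a word cloud can no longer size words by frequency.
n_before <- nrow(raw)
n_after <- nrow(dataset)

# `word` is not an object in this script, so data.frame(word) raises an error.
failed <- inherits(try(data.frame(word), silent = TRUE), "try-error")

# Going through `dataset` works, because that object does exist.
myDataset <- data.frame(word = dataset$word, stringsAsFactors = FALSE)
```

If the assumption holds, referring to columns as `dataset$word` rather than `word` should get around the error, though it does not bring back the removed duplicates.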
Again, I realize it is a preview, but I was curious whether anyone else is seeing the same thing and has perhaps found some settings or workarounds that have thus far eluded me.
I solved this by adding an index column to my data, which automatically makes every row unique.
To do this
1) Open the query editor
2) Select the relevant query
3) Select the "Add Column" tab in the ribbon
4) Select "Index Column"
5) Be sure to include your index column in your R visual as a data field
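Once the index column is in the visual, the script can count repeats itself. The sketch below simulates the `dataset` data frame in plain R; the column names `Index` and `word` and the sample values are hypothetical, so adjust them to your own fields.

```r
# Stand-in for `dataset` once an index column is included in the R visual
# (hypothetical column names `Index` and `word`).
dataset <- data.frame(Index = 1:5,
                      word = c("power", "bi", "power", "r", "power"),
                      stringsAsFactors = FALSE)

# Removing duplicate rows now changes nothing, because Index differs per row.
deduped <- unique(dataset)

# Count how often each word occurs; these counts can drive word size.
freq <- sort(table(deduped$word), decreasing = TRUE)

# With the wordcloud package installed, the counts could be plotted like this:
# wordcloud::wordcloud(names(freq), as.integer(freq), min.freq = 1)
```

The index column itself is never used in the plot; it is only there to keep every row distinct.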
As an FYI - I also noticed the same thing with respect to creating a new dataframe, which I thought was odd, so I submitted a frown. Duplicate row deletion also seems quite limiting... I couldn't find a workaround using toaster or wordcloud if you want to adjust the frequency, size, color etc. of the words based on the number of repeats. Maybe shiny... Thanks for pointing this out.
Well, as dumb as it sounds, what I ended up doing was loading up tm and creating a Corpus pointing to the csv file that my table originally came from in my model. This loaded up all of the words and I could create my word cloud. Still, it felt pretty pointless that I was forced to add a field from my data model that I didn't use AT ALL, just to get to the R script, where I could then circumvent the data model entirely and load up my original data. In the end, my R script didn't refer to my data model in Power BI at all, which raises the question: why even use it?
And yes, I realize it is Preview...
This was the R script to create my word cloud if anyone is interested (pretty standard):
library(tm)
library(wordcloud)
words <- Corpus(DirSource('c:\\temp\\powerbi\\r'))
words <- tm_map(words, stripWhitespace)
words <- tm_map(words, content_transformer(tolower))
words <- tm_map(words, removeWords, stopwords("english"))
words <- tm_map(words, stemDocument)
wordcloud(words, scale=c(5,0.5), max.words=100, random.order=FALSE, rot.per=0.35, use.r.layout=FALSE, colors=brewer.pal(8, "Dark2"))
I agree, it is pretty pointless! But, I like your solution of creating a corpus with tm. I guess PBI didn't recognize it for what it was, a dataframe. I'm going to frown that we're not able to choose whether to keep duplicate rows when using R visuals. Even still, the functionality is really promising. I just hope that in the end it's less restrictive or rather, decisive on our behalf!
Just came across this post during my own research using R to play with word clouds. I posted to the Power BI idea forum on this very issue:
The documentation for tm states that there is a constructor option for accessing external data, but the example is very thin. It appears to show the package reading a data file rather than an RDBMS, and I am not sure how it would work against a database (unless there is a driver involved somewhere).
Anyway, my preferred option would be for the tm package to be able to access any dataset already defined within a Power BI report.
Disregard my problem. I read the tm docs a little more and found that I could access my datasets using the DataframeSource function. This works pretty well. My only downside now is that a word cloud generated with R cannot filter other visuals. If I click on another visual in my report, my word cloud gets regenerated, but I cannot use the word cloud itself as a filter.
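In case it helps someone, here is a rough sketch of the DataframeSource approach. The `dataset` data frame is simulated here with hypothetical `Index` and `word` columns and made-up text. Recent tm releases expect the data frame to have `doc_id` and `text` as its first two columns; the version that was current when this thread started may have behaved differently, so treat the renaming step as an assumption about your installed version.

```r
library(tm)

# Stand-in for the `dataset` data frame that Power BI passes to the script
# (hypothetical column names `Index` and `word`).
dataset <- data.frame(Index = 1:4,
                      word = c("Power BI visuals", "R visuals in Power BI",
                               "word clouds", "power users"),
                      stringsAsFactors = FALSE)

# Recent tm releases expect columns named doc_id and text, in that order.
docs <- data.frame(doc_id = as.character(dataset$Index),
                   text = dataset$word,
                   stringsAsFactors = FALSE)

# Build the corpus straight from the data frame, one document per row.
corpus <- VCorpus(DataframeSource(docs))
corpus <- tm_map(corpus, content_transformer(tolower))

# Term frequencies summed across all rows.
tdm <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
```

The resulting `freq` vector can be handed to `wordcloud(names(freq), freq, ...)`, which keeps the whole script inside the Power BI data model instead of pointing at an external csv file.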