Text Analytics Sentiment API in R - Twitter

reinaldogani · ‎06-04-2017

I am using R to stream tweets from Twitter
After doing some cleaning on the tweets, ie. eliminating link, duplicated links, etc, I convert it to seriallsed JSON format

I am trying to call the text analytics (sentiment) API but always return 400 (bad service)
But when I limit the number of tweets to very small amount (10 tweets), somehow it works

Any suggestion on how to solve this problem?

GalaxyS8 <- searchTwitter("Galaxy S8", n=10000, lang='en')

GalaxyS8_tweets_df = do.call("rbind", lapply(GalaxyS8, as.data.frame))
GalaxyS8_tweets_df = subset(GalaxyS8_tweets, select = c(text))

textScrubber <- function(dataframe)
{dataframe$text <- gsub("—", " ", dataframe$text)
dataframe$text <- gsub("&", " ", dataframe$text)
dataframe$text = gsub("[[:punct:]]", "", dataframe$text)
dataframe$text = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", " ", dataframe$text)
dataframe$text = gsub("@\\w+", "", dataframe$text)
dataframe$text = gsub("http\\w+", "", dataframe$text)
dataframe$text = gsub("[ \t]{2,}", "", dataframe$text)
dataframe$text = gsub("^\\s+|\\s+$", "", dataframe$text)
dataframe["DuplicateFlag"] = duplicated(dataframe$text)
dataframe = subset(dataframe, dataframe$DuplicateFlag=="FALSE")
dataframe = subset(dataframe, select = -c(DuplicateFlag))

return(dataframe)
}

GalaxyS8_tweets_df <- textScrubber(GalaxyS8_tweets_df)

GalaxyS8_tweets_df["language"] = "en"
GalaxyS8_tweets_df["id"] = seq.int(nrow(GalaxyS8_tweets_df))
request_body_GalaxyS8 = GalaxyS8_tweets_df[c(2,3,1)]

request_body_json_GalaxyS8 = toJSON(list(documents = request_body_GalaxyS8))

result_GalaxyS8 <- POST("https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment",
body = request_body_json_GalaxyS8,
add_headers(.headers = c('Content-Type'='application/json','Ocp-Apim-Subscription-Key'='your-api-key')))

BeardyGeorge · ‎07-14-2017

not specifically related to your problem, but you can use | to seperate multiple objects in gsub, like so:

dataframe$text <- gsub("—|&", " ", dataframe$text)

which might make your code a little more concise - there's some great documentation on the grep function and how R handles regular expressions available online :).

v-sihou-msft · ‎06-06-2017

@reinaldogani

As mentioned in document, it has size limitation for input JSON when calling Text Analytics Sentiment API:

The maximum size of a single document that can be submitted is 10 KB, and the total maximum size of submitted input is 1 MB. No more than 1,000 documents may be submitted in one call. Rate limiting exists at a rate of 100 calls per minute - we therefore recommend that you submit large quantities of documents in a single call.

Regards,