I am using R to collect tweets from Twitter.
After some cleaning of the tweets (i.e. removing links, duplicate tweets, etc.), I convert them to serialized JSON.
When I call the Text Analytics (sentiment) API, it always returns 400 (Bad Request).
But when I limit the number of tweets to a very small amount (10 tweets), it works.
Any suggestions on how to solve this problem? Here is my code:
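library(twitteR)   # searchTwitter(); needs an authenticated session via setup_twitter_oauth()
library(httr)      # POST(), add_headers()
library(jsonlite)  # toJSON(); serializes a data frame row by row
GalaxyS8 <- searchTwitter("Galaxy S8", n = 10000, lang = "en")
GalaxyS8_tweets_df <- do.call("rbind", lapply(GalaxyS8, as.data.frame))
# keep only the text column of the data frame built on the previous line
GalaxyS8_tweets_df <- subset(GalaxyS8_tweets_df, select = c(text))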
textScrubber <- function(dataframe) {
  dataframe$text <- gsub("—", " ", dataframe$text)
  dataframe$text <- gsub("&", " ", dataframe$text)
  # remove retweet/via headers, @mentions and links first: stripping punctuation
  # beforehand would delete the "@" and "://" these patterns depend on
  dataframe$text <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", " ", dataframe$text, perl = TRUE)
  dataframe$text <- gsub("@\\w+", "", dataframe$text)
  dataframe$text <- gsub("http\\S+", "", dataframe$text)
  dataframe$text <- gsub("[[:punct:]]", "", dataframe$text)
  # collapse runs of spaces/tabs to a single space (replacing them with "" glues words together), then trim
  dataframe$text <- gsub("[ \t]{2,}", " ", dataframe$text)
  dataframe$text <- gsub("^\\s+|\\s+$", "", dataframe$text)
  # drop tweets whose cleaned text duplicates an earlier one
  dataframe <- dataframe[!duplicated(dataframe$text), , drop = FALSE]
  return(dataframe)
}
GalaxyS8_tweets_df <- textScrubber(GalaxyS8_tweets_df)
GalaxyS8_tweets_df["language"] = "en"
GalaxyS8_tweets_df["id"] = seq.int(nrow(GalaxyS8_tweets_df))
# reorder the columns to language, id, text
request_body_GalaxyS8 <- GalaxyS8_tweets_df[c("language", "id", "text")]
request_body_json_GalaxyS8 <- toJSON(list(documents = request_body_GalaxyS8))
result_GalaxyS8 <- POST("https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment",
                        body = request_body_json_GalaxyS8,
                        add_headers(.headers = c("Content-Type" = "application/json",
                                                 "Ocp-Apim-Subscription-Key" = "your-api-key")))
Not directly related to your problem, but you can use | to separate multiple patterns in a single gsub call, like so:
dataframe$text <- gsub("—|&", " ", dataframe$text)
which makes your code a little more concise. There is good documentation online on the grep family of functions and on how R handles regular expressions :).
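A quick base-R check that the combined pattern behaves like the two separate calls, run on made-up sample strings ("\u2014" is the em dash written as an escape so the script does not depend on source-file encoding):

```r
# Made-up sample tweets for illustration only
tweets <- c("Galaxy S8 \u2014 hands-on & review", "Price & release date")

# Two separate calls, in the same order as in the question
separate <- gsub("&", " ", gsub("\u2014", " ", tweets))

# One call with alternation: "\u2014|&" matches either character
combined <- gsub("\u2014|&", " ", tweets)

identical(separate, combined)  # TRUE
```

Because both patterns get the same replacement, one alternation gives the same result as chaining the calls; it only stops being equivalent when each pattern needs a different replacement.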
As mentioned in the documentation, the Text Analytics Sentiment API limits the size of the input JSON:
The maximum size of a single document that can be submitted is 10 KB, and the total maximum size of submitted input is 1 MB. No more than 1,000 documents may be submitted in one call. Rate limiting exists at a rate of 100 calls per minute - we therefore recommend that you submit large quantities of documents in a single call.
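With roughly 10,000 tweets in a single request, the call in the question is likely well above the 1,000-document limit quoted above, which would explain why it fails with 400 while 10 tweets succeed. Below is a minimal base-R sketch of splitting the documents into batches. `make_batches()` and its parameters are hypothetical names, and the 100-byte per-document overhead and 900 KB ceiling are assumptions standing in for an exact JSON size calculation.

```r
# Hypothetical helper: split a data frame of documents (columns language, id,
# text) into batches of at most max_docs documents and about max_bytes of input.
# The JSON size is ESTIMATED: UTF-8 bytes of each text plus an assumed
# per-document allowance (overhead_per_doc) for keys, id, quotes and escaping.
# max_bytes defaults to 900 KB to leave a safety margin below the 1 MB limit.
# A single document over 10 KB would still be rejected; tweets are far smaller.
make_batches <- function(docs, max_docs = 1000, max_bytes = 900 * 1024,
                         overhead_per_doc = 100) {
  doc_bytes <- nchar(as.character(docs$text), type = "bytes") + overhead_per_doc
  batch_id <- integer(nrow(docs))
  current <- 1L        # batch currently being filled
  n_in_batch <- 0L     # documents already in that batch
  bytes_in_batch <- 0  # estimated bytes already in that batch

  for (i in seq_len(nrow(docs))) {
    # start a new batch if adding document i would break either limit
    if (n_in_batch > 0 &&
        (n_in_batch >= max_docs || bytes_in_batch + doc_bytes[i] > max_bytes)) {
      current <- current + 1L
      n_in_batch <- 0L
      bytes_in_batch <- 0
    }
    batch_id[i] <- current
    n_in_batch <- n_in_batch + 1L
    bytes_in_batch <- bytes_in_batch + doc_bytes[i]
  }

  # list of data frames, one per API call, keeping the original row order
  split(docs, batch_id)
}
```

Each batch can then be serialized with `toJSON(list(documents = batch))` and sent with the same `POST()` call as in the question, for example inside `lapply()`. About 10,000 tweets become roughly ten calls, which stays well within the 100-calls-per-minute rate limit.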
Regards,