I am using R to stream tweets from Twitter. After some cleaning of the tweets (e.g. removing links, duplicates, etc.), I convert them to serialised JSON.
I am trying to call the Text Analytics (Sentiment) API, but it always returns 400 (Bad Request).
However, when I limit the number of tweets to a very small amount (10 tweets), it works.
Any suggestions on how to solve this problem?
library(twitteR)   # searchTwitter
library(httr)      # POST, add_headers
library(jsonlite)  # toJSON (assumed; rjson would serialise differently)

GalaxyS8 <- searchTwitter("Galaxy S8", n=10000, lang='en')
GalaxyS8_tweets_df <- do.call("rbind", lapply(GalaxyS8, as.data.frame))
GalaxyS8_tweets_df <- subset(GalaxyS8_tweets_df, select = c(text))
textScrubber <- function(dataframe) {
  dataframe$text <- gsub("—", " ", dataframe$text)
  dataframe$text <- gsub("&", " ", dataframe$text)
  dataframe$text <- gsub("[[:punct:]]", "", dataframe$text)
  dataframe$text <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", " ", dataframe$text)  # retweet markers
  dataframe$text <- gsub("@\\w+", "", dataframe$text)                         # mentions
  dataframe$text <- gsub("http\\w+", "", dataframe$text)                      # links
  dataframe$text <- gsub("[ \t]{2,}", " ", dataframe$text)                    # collapse whitespace runs to one space
  dataframe$text <- gsub("^\\s+|\\s+$", "", dataframe$text)                   # trim leading/trailing whitespace
  dataframe <- dataframe[!duplicated(dataframe$text), , drop = FALSE]         # drop duplicate tweets
  return(dataframe)
}
GalaxyS8_tweets_df <- textScrubber(GalaxyS8_tweets_df)
GalaxyS8_tweets_df["language"] <- "en"
GalaxyS8_tweets_df["id"] <- seq.int(nrow(GalaxyS8_tweets_df))
request_body_GalaxyS8 <- GalaxyS8_tweets_df[c(2, 3, 1)]   # reorder columns: language, id, text
request_body_json_GalaxyS8 <- toJSON(list(documents = request_body_GalaxyS8))
result_GalaxyS8 <- POST("https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment",
body = request_body_json_GalaxyS8,
add_headers(.headers = c('Content-Type'='application/json','Ocp-Apim-Subscription-Key'='your-api-key')))
Not specifically related to your problem, but you can use | to separate multiple patterns in gsub, like so:
dataframe$text <- gsub("—|&", " ", dataframe$text)
which might make your code a little more concise - there's some great documentation on the grep function and how R handles regular expressions available online :).
As mentioned in the documentation, the Text Analytics Sentiment API has size limits on the input JSON:
The maximum size of a single document that can be submitted is 10 KB, and the total maximum size of submitted input is 1 MB. No more than 1,000 documents may be submitted in one call. Rate limiting exists at a rate of 100 calls per minute - we therefore recommend that you submit large quantities of documents in a single call.
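Given those limits, 10,000 tweets in one request exceeds the 1,000-document cap, which would explain the 400. A minimal sketch of splitting the data frame into batches before POSTing each one (the `chunk_documents` helper and the toy data are illustrative, not part of the API):

```r
# Split a documents data frame into batches of at most `size` rows,
# since the Sentiment API accepts no more than 1,000 documents per call.
chunk_documents <- function(df, size = 1000) {
  split(df, ceiling(seq_len(nrow(df)) / size))
}

# Toy data frame standing in for request_body_GalaxyS8:
docs <- data.frame(language = "en", id = seq_len(2500), text = "placeholder")
batches <- chunk_documents(docs)
length(batches)        # 3 batches
sapply(batches, nrow)  # 1000, 1000, 500

# Each batch would then be serialised and POSTed separately, e.g.:
# for (b in batches) {
#   body <- toJSON(list(documents = b))
#   result <- POST(endpoint, body = body, add_headers(...))
# }
```

Also keep an eye on the 10 KB per-document and 1 MB total limits, and the 100-calls-per-minute rate limit if you end up making many batched calls.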
Regards,