::install_github("ropensci/rtweet") remotes
Twitter is a complicated place. I’ve met some of my closest friends through twitter, it’s the place where I keep in touch with my professional community, and it’s an important part of my work as a developer advocate at Voltron Data. But it’s also a general purpose social media site, and there is a lot of content out there I prefer not to see. In particular, because I’m transgender and have absolutely no desire to participate in or even view the public debate that surrounds trans lives, I want to keep that kind of content off my twitter feed. This is particularly salient to me today as a member of the Australian LGBT community. Most people who follow me on twitter probably wouldn’t be aware of it, but it’s been a rough week for LGBT folks in Australia courtesy of a rather intense political fight over LGBT rights and the ostensible (and in my personal opinion, largely fictitious) conflict with religious freedom. The details of Australian politics don’t matter for this post, however. What matters is that these kinds of political disputes necessarily spill over into my twitter feed, and it is often distressing. Events like this one are quite commonplace in my online life, and as a consequence I’ve found it helpful to partially automate my use of twitter safety features such as muting and blocking. Because my experience is not unique, I thought it might be useful to write a short post talking about the the scripts I use to manage twitter safety features from R using the rtweet package.
Warning: off-label usage
Let’s make one thing very clear at the outset. A lot of what I’m going to be doing in this blog post is “off label” use of the rtweet package. You’ll see me use it in ways that the writers of the package didn’t really intend it to be used (I think), and you’ll see me dig into the internals of the package and rely on unexported functions.
This is almost always a bad idea.
If you haven’t seen it, Hadley Wickham gave a very good talk about maintaining R packages as part the rstudio::global 2021 conference. At about the 19 minute mark he talks about the “off label” metaphor. In the context of medication, “off label” refers to any use of a medication that it’s not officially approved for. It might work, but there could be unknown consequences because maybe it hasn’t been fully explored in this context. When applied to software, “off label” use means you’re doing something with a function or package that the designer doesn’t really intend. Your code might work now, but if you’re relying on “incidental” properties of the function to achieve your desired ends, you’re taking a risk. Package maintainers will usually go to considerable lengths to make sure that updates to their packages don’t break your code when you’re using it for its intended purpose… but if you’re doing something “off label” there’s a good chance that the maintainers won’t have thought about your particular use case and they might unintentionally break your code.
In short: you go off-label at your own risk. In this particular instance it is a risk I’ve chosen to take and I’m perfectly happy to fix my scripts if (or, let’s be realistic, when) future updates to rtweet cause them to break. Or possibly abandon my scripts. I knew the risks when I went off-label.
But if you follow me along this path you need to be aware of the risk too… don’t go off-label lightly! In my case I didn’t do this on a whim: I chose this path about a year ago out of personal desperation, and I’ve had to rewrite the scripts a lot in that time. So, please be careful.
Setting up
The first step is to make sure you have the developer version of rtweet: the scripts I’ve been using rely on the dev version of the package and won’t work with the current CRAN version. To be precise, I’m currently using rtweet version 0.7.0.9011. If you don’t have it, this is the command to install:
The second step is to authenticate so that rtweet can access private information about your twitter account. The good news here is that the authentication mechanism in the dev version of rtweet is a little more streamlined than it used to be. You only need to authenticate once on your machine, and the command is as simple as this:
auth_setup_default()
With that, you should be ready to start!
Write a block/mute function
Our first task will be to write a function that can be used either to block or to mute a twitter account. A little whimsically I decided to call it cancel_user()
. Quite obviously the name is a personal joke, since it does not “cancel” anyone: the only thing blocking or muting accomplishes is to give you a little distance from the account you’re muting or blocking.
The reason for wanting one function that can switch between muting and blocking is that I typically run every process twice, once on my primary account (where, with one exception, I don’t block anyone but mute extensively) and once on my private account (where I block quite aggressively). I’d like to be able to reuse my code in both contexts, so I’ll design the core function to handle both blocks and mutes. Here’s the code:
<- function(user, type) {
cancel_user <- c(
api "block" = "/1.1/blocks/create",
"mute" = "/1.1/mutes/users/create"
):::TWIT_post(
rtweettoken = NULL,
api = api[type],
params = list(user_id = user)
) }
There’s quite a bit to unpack here.
First notice that I have called the internal function rtweet:::TWIT_post()
. This is the clearest indication that I’m working off-label. If I were interested only in muting users and never blocking, I’d be able to do this without going off-label because rtweet has an exported function called post_mute()
that you can use to mute an account. However, there is no corresponding post_block()
function (possibly for good reasons) so I’ve written cancel_user()
as my personal workaround.
Second, let’s look at the interface to the function. Unlike the more sophisticated functions provided by rtweet this is a bare-bones interface. The user
argument must be the numerical identifier corresponding to the account you want to block/mute, and type
should be either be "mute"
or "block"
depending on which action you wish to take.
Finding the numeric user id code for any given user is straightforward with rtweet. It provides a handy lookup_users()
function that you can employ for this. The actual output of the function is a data frame containing a lot of public information about the user, but the relevant information is the user_id
variable. So, if you hate me enough to want to mute or block me on twitter, I’ll make it easy on you. Here’s my numeric user id:
lookup_users("djnavarro")$user_id
[1] "108899342"
As it turns out, for the particular scripts I use, I rarely need to rely on lookup_users()
but it is a very handy thing to know about.
Preparing to scale
As written there’s nothing particularly wrong with the cancel_user()
function, but it’s also not very useful. I can use it to block or mute an individual account, sure, but if that were the problem I wanted to solve it would be a lot easier to do that using the block/mute buttons on twitter. I don’t need to write an R function to do that.
The only real reason to implement this as an R function is if you intend to automate it in some fashion and repeat the operation on a scale that would be impossible to do manually. To give a sense of the scale at which I’ve had to implement this I currently have about 220000 accounts blocked from my private account, and a similar number muted from my main account. There’s no way I could possibly do that manually, so I’m going to need to be a little more thoughtful about my cancel_user()
function.
Practice safe cancellation
The first step in making sure the function works well “at scale”1 is error handling. If I have a list of 50000 account I want to block but for one reason or another cancel_user()
throws an error on the 5th account, I don’t want to prevent R from attempting to block the remaining 49995 accounts. Better to catch the error and move on. My preferred way to do this is to use purrr::safely()
:
<- purrr::safely(cancel_user) cancel_safely
The cancel_safely()
function operates the same way as cancel_user()
with one exception. It never throws an error: it always returns a list with two elements, result
and error
. One of these is always NULL
. If cancel_user()
throws an error then result
is NULL
and error
contains the error object; if it doesn’t then error
is null and result
contains the output from cancel_user()
.
Not surprisingly, the cancel_safely()
function is much more useful when we’re trying to block or mute large numbers of accounts on twitter.
Check your quotas my loves
One thing that has always puzzled me about the twitter API is that it places rate limits on how many mutes you can post in any 15 minute period, but doesn’t seem to impose any limits on the number of blocks you can post. I’m sure they have their reasons for doing it, but it’s inconvenient. One consequence of this is that there are lots of tools that exist already for blocking large numbers of accounts. You don’t actually need to write a custom R script for that! But if you want to mute large numbers of accounts, it’s a lot harder: you have to write a script that keeps posting mutes until the rate limits are exceeded, then pauses until they reset, and then starts posting mutes again. Speaking from experience, this takes a very long time. As a purely practical matter, you don’t want to be in the business of muting large numbers of accounts unless you are patient and have a very good reason. In my case, I did.
In any case, one thing we’ll need to write a rate_exceeded()
function that returns TRUE
if we’ve hit the rate limit and FALSE
if we haven’t. That’s actually pretty easy to do, as it turns out, because any time our attempt to mute (or block) fails, the cancel_safely()
function will catch the error and capture the error message. So all we have to do to write a rate_exceeded()
function is to check to see if there’s an error message, and if there is a message, see if that message informs us that the rate limite has been exceeded. This function accomplishes that goal:
<- function(out) {
rate_exceeded if(is.null(out$error)) return(FALSE)
if(grepl("limit exceeded", out$error$message)) return(TRUE)
return(FALSE)
}
Because blocks are not rate limited, in practice this function only applies when you’re trying to mute accounts.
Be chatty babes
The last step in preparing the cancellation function to work well at scale is to make it chatty. In practice, a mass block/mute operation is something you leave running in its own R session, so you want it to leave behind an audit trail that describes its actions. A moderately verbose function is good here. You could make this as sophisticated as you like, but I find this works nicely for me:
<- function(user, type) {
cancel_verbosely
# notify user attempt has started
<- c(
msg "block" = "blocking user id",
"mute" = "muting user id"
)::local_options(scipen = 14)
withr::cli_process_start(paste(msg[type], user))
cli
# make the attempt; wait 5 mins if rate limits
# exceeded and try again
repeat {
<- cancel_safely(user, type)
out if(rate_exceeded(out)) {
Sys.sleep(300)
else {
} break
}
}
# notify user of the outcome
if(is.null(out$result)) {
::cli_process_failed()
clielse{
} ::cli_process_done()
cli
} }
Here’s what the output looks like when it successfully blocks a user. Not fancy, but it shows one line per account, specifies whether the action was a block or a mute, and makes clear whether the attempt succeeded or failed:
✓ blocking user id 15xxxx66 ... done
(where, in the real output the user id for the blocked account is of course not censored). In this function I’ve used the lovely cli package to create pretty messages at the R command line, but there’s nothing stopping you from using simpler tools if you’d prefer.
Scaling up
Now that we have a version of our block/mute function that is suitable for use on a larger scale, it’s time to put it into practice. Let’s say I have a list of 50000 users
represented as numeric ids and I want to block (or mute) all these accounts. To do this, I’ll need a vectorised version of cancellation function. Thanks to the functional programming tools in the purrr package, this is not too difficult. Here’s the cancel_users()
function that I use:
<- function(users, type) {
cancel_users <- c("block" = "blocking ", "mute" = "muting ")
msg cat(msg[type], length(users), " users\n...")
::walk(users, cancel_verbosely, type = type)
purrr }
When given a vector of user ids, the cancel_users()
function will attempt to block/mute them all one at a time. When rate limits are exceeded it will pause and wait for them to reset, and then continue with the process. For mass muting in particular it can take a long time, so it’s the kind of thing you run in its own session while you go do something else with your life. If you want to be clever about it you can make it a background job and sink the output to a log file but honestly I’m usually too lazy to bother with that: all I’m trying to do is sanitise my twitter experience, I’m not deploying production code here.
The trickier question is “where do I get this block list from?”
Here, I’m not going to be too specific, for a couple of reasons. Firstly, I don’t want to be in the business of teaching people how to track down hidden networks of users embedded in social media. That’s not something I’m comfortable doing. Secondly, if you’re doing this defensively (i.e., you’re protecting yourself from attack) then you probably already know something about where the attacks are coming from. You already have your own list of key names, because they’re the people who keep harassing you. Really, your only goal is to block them and their followers, because the thing that’s happening is they’re targeting you and they’re using their follower base as a weapon. Right? I mean if that’s not the situation you’re in, and what you’re actually trying to do is seek out a hidden population to potentially target them… yeah I’m not sure I want to be telling you the other tricks I know. So let’s keep it very simple.
The easiest trick in the book (and, honestly, one of the most powerful when you’re trying to block out harassment from a centralised “hub-and-spokes” network), is simply to find every account that follows more than \(k\) of the \(n\) of the key actors, and block/mute them. Actually, in the case of “astroturf” organisations that don’t have real grassroots support, you can probably just pick a few of the big names and block (or mute) all their followers. That will eliminate the vast majority of the horrible traffic that you’re trying to avoid. (Yes, I am speaking from experience here!)
The rtweet package makes this fairly painless courtesy of the get_followers()
function. Twitter makes follower lists public whenever the account itself is public, so you can use get_followers()
to return a tibble that contains the user ids for all followers of a particular account.2 Here’s an example showing how you an write a wrapper around get_followers()
and use it to block/mute everyone who follows a particular account:
<- function(user, type = "block", n_max = 50000, precancelled = numeric(0)) {
cancel_followers
<- get_followers(user, n = n_max, retryonratelimit = TRUE)
followers <- followers$from_id
followers
<- setdiff(followers, precancelled)
uncancelled <- sort(as.numeric(uncancelled))
uncancelled
cancel_users(uncancelled, type = type)
}
Note the precancelled
argument to this function. If you have a vector of numeric ids containing users that you’ve already blocked/muted, there’s no point wasting time and bandwidth trying to block them again, so the function will ignore anything on that list. You could use the same idea to build a whitelist of accounts that would never get blocked or muted regardless of who they follow.
We’re almost at the end of the post. There’s only one other thing I want to show here, and that’s how to extract a list of all the accounts you currently have muted or blocked. Again this particular bit of functionality isn’t exposed by rtweet directly, so you’ll notice that I’ve had to go off-label again and call an unexported function!
<- function(type, n_max, ...) {
list_cancelled <- c(
api "block" = "/1.1/blocks/ids",
"mute" = "/1.1/mutes/users/ids"
)<- list(
params include_entities = "false",
skip_status = "true"
)<- rtweet:::TWIT_paginate_cursor(NULL, api[type], params, n = n_max, ...)
resp <- unlist(lapply(resp, function(x) x$ids))
users return(users)
}
I’m not going to expand on this one, other than to mention that when you get to the point where you have hundreds of thousands of blocked or muted accounts, it’s handy to use a function like this from time to time, and to save the results locally so that you can be a little more efficient whenever you next need to refresh your block/mute lists.
Epilogue
I wrote this post in two minds. On the one hand, the rtweet developers made a decision not to support blocklists for a reason, and presumably the twitter developers have some reason for making it difficult to mute large numbers of accounts. It’s very rarely a good idea to write code that works against the clear intent of the tools you’re relying on. It is almost certain to break later on.
On the other hand, this is functionality that I personally need. On my primary account I’ve made the deliberate decision not to block anyone3 but to keep my twitter feed free of a particular type of content I have had to mute an extremely large number of accounts. Twitter makes that difficult to do, but with the help of these scripts I managed to automate the process. After a month or two, with a little manual intervention, the problematic content was gone from my feed, and I was able to get back to doing my job. So, if anyone else finds themselves in a similar situation, hopefully this blog post will help.
Footnotes
I mean, what exactly do we mean by “at scale” here? In the context of data wrangling, a hundred thousand anything is rarely considered “at scale”. But mute/blocks on twitter are usually measured in the tens or hundreds at most. Doing things at the hundreds of thousands scale is a big step up from the typical use case, and as noted later, because of how the twitter API handles mutes it’s something that can take months to complete.↩︎
This is not unrelated to the reason why I keep my follows hidden behind private lists. In public it looks like I don’t follow anyone but using private lists I actually follow several hundred people! Some time ago I had some unpleasant experiences with people using that information to target me, so now I use private lists exclusively. Being a trans woman on the internet is fuuuuuuuun.↩︎
There is one exception to this rule, but that’s a personal matter.↩︎
Reuse
Citation
@online{navarro2022,
author = {Navarro, Danielle},
title = {R Scripts for Twitter Mutes and Blocks},
date = {2022-02-11},
url = {https://blog.djnavarro.net/r-scripts-for-twitter-blocks},
langid = {en}
}