Automating Online Hate Speech Detection
A survey of deep learning approaches to the task of identifying hateful content
September 2, 2019 | 📍 Edinburgh, UK
In March 2019, a white supremacist posted a racist manifesto, which spread
through Twitter, and live-streamed the shooting of 51 people on Facebook,
with copies of the video quickly spreading to YouTube. While both
companies leverage machine learning
algorithms to automatically detect and remove such content, they failed
to take down the content quickly enough. Facebook's head of public
policy defended the platform's slow response time:
"The video was a new type that our machine learning system hadn't seen
before. It was a first person shooter with a GoPro on his head… This is
unfortunately an adversarial space. Those sharing the video were
deliberately splicing and cutting it and using filters to subvert
automation. There is still progress to be made with machine learning" [1].
Progress needs to be made sooner rather than later. Hateful content on
social media contributes to real-world violence [2] and to recruitment
and propaganda for terrorist individuals and groups [3]; it makes other
users feel less safe and secure on social platforms [4] and triggers
increased levels of toxicity in the network [5].
Governments are pressing platforms to find workable solutions, fast.
The Sri Lankan government temporarily banned social media networks three
separate times in the wake of the Easter 2019 suicide bombings that
killed hundreds of people, in order to prevent "social unrest via hate
messages" [6]. French leader Emmanuel Macron has made fighting online
hate speech a priority, taking meetings with Mark Zuckerberg and other
high-level Facebook representatives; as a result, Facebook has agreed to
hand over data on French users suspected of spewing hate speech online.
This is an international first and, according to a counsel at law firm
Linklaters, "a strong signal in terms of regulation" [8]. Germany and
the UK have strict legislation against hate speech [7], and Germany has
threatened to fine social networks up to 50 million euros if they
continue to fail to act fast enough [9].
Social media has contributed to a more open and connected world [10]. It
has promoted Western liberal values through its effects on protest
mobilization [11], community building [12], and accountability in
governments and institutions [13]. It is critical that we do not lose
the benefits of these platforms as they operate under increased
regulation and scrutiny. This research surveys the capabilities and
limitations of state-of-the-art (SOTA) deep learning classifiers. We aim
to inform policy and decision makers, who must reconcile the benefits of
social media platforms with the harms that arise when hate speech is
allowed to propagate.
• • •
The experiments consist of three phases that demonstrate the effect of
user behavior metrics on a combination of models and embedding types.
Each of our four model choices (CNN, LSTM, MLP, and DenseNet) is run
with each of our three embedding types, across four rounds of
experiments, with three different seeds, for a total of 144 experiments
before tuning.
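As a rough sketch, this grid of untuned runs can be expressed as a Cartesian product; the seed values and the runner function below are illustrative assumptions, not details from the paper:

```python
from itertools import product

MODELS = ["cnn", "lstm", "mlp", "densenet"]
EMBEDDINGS = ["tfidf", "word2vec", "bert"]
ROUNDS = [1, 2, 3, 4]   # four rounds of experiments
SEEDS = [0, 1, 2]       # assumed seed values, for illustration

def run_experiment(model, embedding, round_num, seed):
    """Placeholder: build, train, and evaluate one configuration."""
    ...

configs = list(product(MODELS, EMBEDDINGS, ROUNDS, SEEDS))
assert len(configs) == 144  # 4 models x 3 embeddings x 4 rounds x 3 seeds

for model, embedding, round_num, seed in configs:
    run_experiment(model, embedding, round_num, seed)
```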
The dataset for this research comes from Founta et al.'s work, which
describes a process of large-scale crowdsourcing for annotating hateful,
normal, abusive, and spam tweets [14].
We clean the tweets by tokenizing, lowercasing, and removing
punctuation.
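A minimal version of this preprocessing, assuming whitespace tokenization and a regex-based punctuation filter (the exact rules here are our own illustrative choices):

```python
import re

def clean_tweet(text: str) -> list:
    """Lowercase, strip punctuation, and split into tokens."""
    text = text.lower()
    text = re.sub(r"[^\w\s@#]", " ", text)  # drop punctuation, keep @/# markers
    return text.split()

clean_tweet("This is UNACCEPTABLE!!! @user")
# -> ['this', 'is', 'unacceptable', '@user']
```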
We experiment with three types of text embeddings:
TF-IDF embeddings: TF-IDF fits the training set into weighted vectors by
normalized frequency of the 10,000 most common words in our vocabulary.
Our validation and test sets are transformed using the learned weights.
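With scikit-learn (one library that implements this; the paper does not name its tooling), the fit/transform split looks like:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-ins for our cleaned tweet splits.
train_texts = ["you are awful", "lovely day in edinburgh"]
val_texts = ["awful weather today"]
test_texts = ["what a lovely person"]

vectorizer = TfidfVectorizer(max_features=10_000)
X_train = vectorizer.fit_transform(train_texts)  # learn vocabulary and IDF weights
X_val = vectorizer.transform(val_texts)          # reuse the learned weights
X_test = vectorizer.transform(test_texts)
```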
Pretrained Twitter embeddings: These embeddings are from a Word2Vec
model trained on 400 million raw English tweets, with an embedding
dimension of 400 [15].
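One common way to turn these word vectors into a fixed-length tweet representation is to average them; the file path below is hypothetical, and mean pooling is our assumption rather than a confirmed detail of the pipeline:

```python
import numpy as np
from gensim.models import KeyedVectors

# Hypothetical path to the pretrained 400-dimensional Twitter Word2Vec model [15].
w2v = KeyedVectors.load_word2vec_format("word2vec_twitter_model.bin", binary=True)

def embed_tweet(tokens, dim=400):
    """Average the vectors of in-vocabulary tokens; zeros if none are known."""
    vecs = [w2v[t] for t in tokens if t in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)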
Pretrained BERT embeddings: Google's BERT, or Bidirectional Encoder
Representations from Transformers, is a method of pre-training language
representations that obtains SOTA results on a range of NLP tasks [16].
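A sketch of extracting fixed BERT features with the HuggingFace transformers library; both the tooling and the mean pooling over the last hidden layer are our own choices, as the text does not specify the extraction pipeline:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def bert_embed(text: str) -> torch.Tensor:
    """Mean-pool the last hidden layer into a single 768-dim tweet vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # (768,)
```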
• • •
Phase 1: We compare our deep learning models to a baseline logistic
regression model to get an initial sense of how much improvement they
can offer. Here the feature embeddings for our deep learning models are
simply the tweet embeddings.
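A minimal sketch of the phase 1 baseline, assuming the embedded splits and labels from the pipeline above; the class_weight setting is our own guard against the scarce hateful class, not a confirmed detail:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# y_train / y_test: hateful, normal, abusive, or spam labels from the dataset.
baseline = LogisticRegression(max_iter=1000, class_weight="balanced")
baseline.fit(X_train, y_train)
print(classification_report(y_test, baseline.predict(X_test)))
```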
Phase 2: The goal of this round of experiments is to add context to our
embeddings. Here and for the remaining experiments, we shift to building
a neural network with multiple inputs so that the network can learn from
the annotated tweet in addition to other types of embeddings. The
annotated tweet embedding and the context tweet embedding are separate
inputs, processed by different parts of the model architecture; the
learned features for each are concatenated and fed into a final fully
connected layer. First, because of the statistics collected around the
behavior of hateful users and retweeting, we define pairs of tweets: the
original tweet and a context tweet. The context tweet covers the case
where the annotated tweet was a reply to something else. Next, because
tweet in-degree has been shown to be significant [17], we focus on
in-degree as the number of times a given tweet has been retweeted and
the number of times it has been favorited. We concatenate the logged
retweet and favorite counts to our tweet and reply embeddings, and we
are interested in whether the network can better learn from these counts
as a measure of context.
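A minimal Keras sketch of this multi-input design; the inputs are assumed to be sentence-level BERT vectors, and the layer sizes are illustrative (the single-layer, 47-filter CNN echoes the tuned configuration discussed in the results):

```python
from tensorflow.keras import Input, Model, layers

EMB_DIM = 768  # BERT sentence-vector size; other sizes here are assumptions

def cnn_branch(x):
    """One convolutional branch per text input."""
    x = layers.Reshape((EMB_DIM, 1))(x)
    x = layers.Conv1D(47, kernel_size=3, activation="relu")(x)
    x = layers.GlobalMaxPooling1D()(x)
    return layers.Dropout(0.5)(x)

tweet_in = Input(shape=(EMB_DIM,), name="tweet")
context_in = Input(shape=(EMB_DIM,), name="context_tweet")
counts_in = Input(shape=(2,), name="log_retweet_favorite_counts")

merged = layers.concatenate(
    [cnn_branch(tweet_in), cnn_branch(context_in), counts_in])
output = layers.Dense(4, activation="softmax")(merged)  # hateful/abusive/spam/normal

model = Model([tweet_in, context_in, counts_in], output)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```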
Figure: Multiple Input CNN-BERT Model Architecture
Phase 3: This phase aims to add context in a more sophisticated way.
For each tweet in our dataset, we crawl the author's user timeline and
collect 200 tweets. We then conduct topic modeling using latent
Dirichlet allocation (LDA). We use LDA, as opposed to other topic
modeling techniques,
because LDA represents documents (or tweets) as random mixtures over
topics in the corpus, which reflects what we expect from tweets on a
user's timeline [18]. We also concatenate the coherence and perplexity
scores of the user's timeline to each embedded topic word to add a
global measure of topic model quality.
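A sketch of the per-user topic modeling step with gensim; the number of topics is an assumed illustrative value, and clean_tweet is the preprocessing helper sketched earlier:

```python
from gensim import corpora
from gensim.models import CoherenceModel, LdaModel

# user_timeline: assumed list of up to 200 raw tweets from one author's timeline.
docs = [clean_tweet(t) for t in user_timeline]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

lda = LdaModel(corpus, num_topics=10, id2word=dictionary, random_state=0)
perplexity = lda.log_perplexity(corpus)  # global fit of the timeline topics
coherence = CoherenceModel(model=lda, texts=docs, dictionary=dictionary,
                           coherence="c_v").get_coherence()
```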
Hyperparameter tuning: We tune our best performing models (by model type
and embedding type) by experimenting with learning rate, regularization,
number of layers, and the model-specific parameter.
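As an illustration, the tuning sweep can be framed as a small grid over those dimensions; the candidate values below are assumptions, not the exact grid used:

```python
from itertools import product

grid = {
    "learning_rate": [1e-3, 1e-4, 1e-5],
    "dropout": [0.25, 0.5, 0.75],   # regularization strength
    "num_layers": [1, 2, 3],
    "num_filters": [32, 47, 64],    # the CNN-specific parameter
}

def train_and_score(lr, dropout, n_layers, n_filters):
    """Placeholder: train one tuned configuration and return validation F-score."""
    ...

for lr, dropout, n_layers, n_filters in product(*grid.values()):
    train_and_score(lr, dropout, n_layers, n_filters)
```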
Figure: Hyperparameter Tuning, Facilitated by Comet.ML
• • •
Our final model, the phase 2 CNN-BERT, successfully picked up on
negative tweet sentiment and identified the abusive class at the highest
rate, with 82% accuracy and an F-score of 0.78. The model offers a
significant improvement on detecting hate speech: we improve on our
logistic regression baseline's performance on the hateful class by 0.13
F-score, on a dataset with scarce hateful labels. If we were to randomly
annotate tweets as hateful with 4% probability, roughly the share of
hateful labels in the dataset, we would achieve around 4% precision and
recall on the hateful class. Thus, we interpret the final F-score of
0.33 on the hateful class as relatively high.
The models that used Google's pretrained BERT embeddings performed
better than TF-IDF and Twitter pretrained embeddings across most models
in our three phases of experiments. CNN-BERT outperformed the logistic
regression, MLP, LSTM, and DenseNet models for all three phases of
experiments. Before tuning, our best performing model is the CNN
multiple input model architecture with tweet and user topic BERT
embeddings. After tuning, our best performing model is the CNN multiple
input model architecture with tweet and reply BERT embeddings. We
hypothesize that this is because a single-layer, 47-filter CNN with high
dropout overfits when combined with BERT embeddings and a large measure
of user context. The pretrained BERT embeddings add enough semantic
information to give us our most competitive models; adding further
metrics of context drawn from the social network does not improve
performance.
The task of automating the detection of hate speech on social media
platforms remains a challenge, in part due to the difficulty in
obtaining high-quality, large-scale annotated datasets and the scarce
hateful samples available for machine learning models to learn from. Our
experiments reflect this and suggest that improving the quality and
consistency of annotations in our dataset is likely to result in more
accurate automated systems.
Sources
- J. Wakefield, "Hate speech: Facebook, Twitter and YouTube told off by
  MPs," Apr 2019.
- K. Müller and C. Schwarz, "Fanning the flames of hate: Social media and
  hate crime," Available at SSRN 3082972, 2018.
- I. Awan, "Cyber-extremism: ISIS and the power of social media,"
  Society, vol. 54, no. 2, pp. 138–149, 2017.
- M. ElSherief, S. Nilizadeh, D. Nguyen, G. Vigna, and E. Belding, "Peer
  to peer hate: Hate speech instigators and their targets," in Twelfth
  International AAAI Conference on Web and Social Media, 2018.
- J. Cheng, M. Bernstein, C. Danescu-Niculescu-Mizil, and J. Leskovec,
  "Anyone can become a troll: Causes of trolling behavior in online
  discussions," in Proceedings of the 2017 ACM Conference on Computer
  Supported Cooperative Work and Social Computing. ACM, 2017, pp.
  1217–1230.
- T. Marcin, "Facebook, YouTube, WhatsApp banned again in Sri Lanka
  after violence against Muslims," May 2019.
- E. Stein, "History against free speech: The new German law against the
  'Auschwitz' and other 'lies'," Michigan Law Review, vol. 85, no. 2,
  pp. 277–324, 1986.
- M. Rosemain, "Exclusive: In a world first, Facebook to give data on
  hate speech…," Jun 2019.
- E. Thomasson, "German cabinet agrees to fine social media over hate
  speech," Apr 2017.
- H. Rainie, J. Q. Anderson, and J. Albright, The future of free speech,
  trolls, anonymity and fake news online. Pew Research Center,
  Washington, DC, 2017.
- A. Breuer, T. Landman, and D. Farquhar, "Social media and protest
  mobilization: Evidence from the Tunisian revolution," Democratization,
  vol. 22, no. 4, pp. 764–792, 2015.
- S. J. Jackson, M. Bailey, and B. Foucault Welles, "#GirlsLikeUs:
  Trans advocacy and community building online," New Media & Society,
  vol. 20, no. 5, pp. 1868–1888, 2018.
- R. Enikolopov, M. Petrova, and K. Sonin, "Social media and
  corruption," American Economic Journal: Applied Economics, vol. 10,
  no. 1, pp. 150–174, 2018.
- A. M. Founta, C. Djouvas, D. Chatzakou, I. Leontiadis, J. Blackburn,
  G. Stringhini, A. Vakali, M. Sirivianos, and N. Kourtellis, "Large
  scale crowdsourcing and characterization of Twitter abusive behavior,"
  in Twelfth International AAAI Conference on Web and Social Media,
  2018.
- F. Godin, B. Vandersmissen, W. De Neve, and R. Van de Walle,
  "Multimedia Lab @ ACL W-NUT NER shared task: Named entity recognition
  for Twitter microposts using distributed word representations," in
  Proceedings of the Workshop on Noisy User-generated Text, 2015, pp.
  146–153.
- J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training
  of deep bidirectional transformers for language understanding," arXiv
  preprint arXiv:1810.04805, 2018.
- R. Nishi, T. Takaguchi, K. Oka, T. Maehara, M. Toyoda, K.-i.
  Kawarabayashi, and N. Masuda, "Reply trees in Twitter: data analysis
  and branching process models," Social Network Analysis and Mining,
  vol. 6, no. 1, p. 26, 2016.
- D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation,"
  Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.