Keywords
Automatic hate speech identification; Hate speech; alt-right; YouTube; interdisciplinary research
Identifying hate speech (HS) is a central concern within online contexts. Current methods are insufficient for efficient preemptive HS identification. In this study, we present the results of an analysis of automatic HS identification applied to popular alt-right YouTube videos.
This article describes methodological challenges of automatic HS detection. The case study concerns data on a formative segment of contemporary radical right discourse. Our purpose is twofold: (1) to outline an interdisciplinary mixed-methods approach to automated identification of HS, bridging the gap between technical research (such as machine learning, deep learning, and natural language processing, NLP) on the one hand and traditional empirical research on the other; and, regarding alt-right discourse and HS, (2) to ask: what are the challenges in identifying HS in popular alt-right YouTube videos?
The results indicate that effective and consistent identification of HS communication necessitates qualitative interventions to avoid arbitrary or misleading applications. Binary hate/non-hate speech approaches tend to force the rationale for designating content as HS. A context-sensitive qualitative approach can remedy this by bringing into focus the indirect character of these communications. The results should interest researchers within the social sciences and humanities who adopt automatic sentiment analysis, as well as those analysing HS and radical right discourse.
Automatic identification or moderation of HS cannot account for an evolving context of indirect signification. This study exemplifies a process whereby automatic hate speech identification could be utilised effectively. Several methodological steps are needed for a useful outcome, with both technical quantitative processing and qualitative analysis being vital to achieve meaningful results. With regard to the alt-right YouTube material, the main challenge is indirect framing. Identification demands orientation in the broader discursive context and the adaptation towards indirect expressions renders moderation and suppression ethically and legally precarious.
There is no ready-made method for our purposes. By necessity, our approach is experimental in design and should be seen as exploratory when it comes to how a verbal and contextual phenomenon such as hate speech can be identified automatically. Our twofold purpose has led us to implement two complementary methodological approaches. In the following, we first describe an approach to automatic identification in four steps: gathering an annotated corpus (for this we have used three corpora); text pre-processing; training the classifier; and applying it to the YouTube transcripts. Finally, we engage with the material through a qualitative close reading of select parts to better understand the previous step and the character of the material in general.
The challenge of automatically classifying the sentiment of a text has a long history (Biagioni, 2016), and the sub-problem of identifying what is offensive in texts has also been given attention (Schmidt & Wiegand, 2019). The methodologically simplest approach is the lexical one: rely on an annotated lexicon of terms related to the sentiment (positive and negative) and count the frequencies of those terms in a text. If there are more positive (or negative) terms or phrases in the text, it can be classified as positive (or negative). This has some obvious drawbacks: terms are used variously and can relate to a context in contradictory ways. An example of this is sarcasm. A statement or phrasing that uses a term in its non-literal sense is a challenge for language modelling. This is especially true when such statements are also used in their literal meaning in the same corpus. Hate speech identification is a specific variation of sentiment analysis; instead of classifying statements on a scale between two opposites (positive and negative), it classifies statements as hate speech or not hate speech. Gitari et al. (2015) perform a series of tests using this lexical approach to the issue of hate speech, with limited success.
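To make the lexical approach concrete, the following minimal sketch counts lexicon matches and labels a text by the majority polarity. It is ours, not taken from the cited studies; the word lists and the tie-breaking rule are illustrative assumptions.

```python
# A minimal sketch of the lexical approach described above. The word lists and
# the tie-breaking rule are illustrative assumptions, not taken from the study.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "terrible"}

def lexical_sentiment(text: str) -> str:
    tokens = text.lower().split()
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # ties and texts with no lexicon hits fall through here

print(lexical_sentiment("I love this great channel"))            # positive
print(lexical_sentiment("what an awful thing to say I hate it"))  # negative
```

A sarcastic statement such as "oh great, another delay" would be counted as positive by this scheme, which illustrates the non-literal-use problem noted above.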
With the advancement of machine learning (ML) and deep learning (DL) methods, the sophistication and range of available methods have increased. These methods tend to share one typical drawback: they are data hungry, that is, they need a lot of data to be trained to perform their task (Ploeg et al., 2014; Adadi, 2021). Typically, the more data, and the more varied the data, a model is trained on, the better it will generalise what it learns to new data. When data is not readily available, collecting it at this scale is time-consuming, especially if it needs to be annotated. When a large volume of data alone is not enough, or the required classification is not available, it is standard practice to rely on manual annotation. It has become common to crowdsource this task, enlisting volunteers or commissioned individuals on platforms such as Zooniverse and Figure 8 (Wang et al., 2013).
For this study we combined three openly available hate-speech-annotated corpora of tweets written in English: Hate-speech-and-offensive-language, HSOL (Davidson et al., 2017); HatEval – a subset of the HatEval corpus (Basile et al., 2019), which in turn is based on a corpus for misogyny identification by Fersini et al. (2019); and the Offensive Language Identification Dataset, OLID (Zampieri et al., 2019). These corpora were created with different purposes in mind, and their definitions of hate speech are similar but not identical. This is not a problem, because together they provide a holistic view of what people consider hate speech.
The annotation of each corpus was achieved through crowdsourcing – enlisting crowds of people to manually annotate tweets – using a majority rule to determine each tweet’s de facto category. While this is an effective way to have each tweet in a large corpus annotated by multiple people and to reduce the bias of any individual annotator, it also adds some opacity to the process. The annotators are anonymous, and only one of the three corpora has published the instructions provided to the annotators (Basile, n.d.). While human annotation is considered the gold standard, it is not perfect. Davidson et al. (2017) noted that annotators disagreed more over what should be regarded as hate speech versus merely offensive than over what is neutral, an observation that was reflected in the performance of their model. As our purpose is to make a binary classification between hate speech and non-hate speech, we discarded tweets belonging to the other categories in each corpus and kept a total of 42,430 tweets from the three corpora that were annotated as either hate speech or neutral.
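A merge of this kind might look as follows in pandas. This is only a sketch: the file names, column names, and label values are hypothetical placeholders, not the datasets' actual schemas (the real corpora use different column names and category codes).

```python
import pandas as pd

# Hypothetical file names and label schemas; the real corpora differ
# (see Davidson et al., 2017; Basile et al., 2019; Zampieri et al., 2019).
sources = {
    "hsol.csv": {"hate": "hate_speech", "neutral": "neither"},  # drops "offensive"
    "hateval.csv": {"hate": 1, "neutral": 0},
    "olid.csv": {"hate": "HATE", "neutral": "NOT"},             # drops other categories
}

frames = []
for path, labels in sources.items():
    df = pd.read_csv(path)
    keep = df["label"].isin(labels.values())                 # discard intermediate categories
    df = df.loc[keep, ["text", "label"]].copy()
    df["hate"] = (df["label"] == labels["hate"]).astype(int)  # binary outcome
    frames.append(df[["text", "hate"]])

corpus = pd.concat(frames, ignore_index=True)  # ~42,430 tweets in the study
print(corpus["hate"].value_counts())
```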
Using this corpus of 42,430 tweets we proceeded to train a classifier, the first part of which is a simple preprocessing pipeline that transforms the raw text into numeric input for the actual model as follows: (1) cleaning, (2) tokenizing, (3) stemming, (4) removing stop words, and (5) vectorizing. Out of these five steps only the cleaning needs to be customised, which is why it will be explained in more detail in the next subsection.
The first step, explained in more detail below, homogenises the raw text into a lowercase string of characters. These strings are then split into their constituent terms, tokens. The tokens are then stemmed, i.e., reduced to their word stem by removing endings. This step may reduce tokens with different meanings to the same token: the verb ‘booking’ and the noun ‘books’ both become ‘book.’ Stemming reduces the number of tokens dramatically and reduces the chance that the model learns too much from a rarely used variant of a word.
From these tokens we remove words that are too common to add specific meaning to the texts, words like ‘a,’ ‘of,’ and ‘and.’ The tokens that remain after this step form the basis for the model’s vocabulary, the ‘words’ or tokens the model will recognise.
Special attention was paid to cleaning the tweets, since their format differs from that of our empirical material, transcriptions from YouTube. Some conventions of Twitter and other social media do not translate well to other written media, and not at all to the spoken word: for example, the use of @USER to direct one’s statement to a particular user, the inclusion of URLs, and the use of emojis and hashtags. Hashtags sometimes consist of a single word, a phrase, or an acronym, which can sometimes easily be separated automatically into its parts. None of these conventions exist in transcriptions, and these elements were therefore removed from the tweets. The output of this step is a lowercase string of characters without @ or # tags and without URLs.
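The five preprocessing steps can be sketched as follows. The regular expressions, the Porter stemmer and stop-word list (via NLTK), and the bag-of-words CountVectorizer are our illustrative choices; the article does not specify which tokeniser, stemmer, or vectoriser was used.

```python
import re

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer

STEMMER = PorterStemmer()
STOPWORDS = set(stopwords.words("english"))  # requires nltk.download("stopwords")

def clean(text: str) -> str:
    """Step 1: lowercase and strip @-mentions, hashtags, and URLs (illustrative regexes)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"[@#]\w+", " ", text)       # @USER mentions and #hashtags
    return re.sub(r"[^a-z\s]", " ", text)      # remaining punctuation, emoji, digits

def preprocess(text: str) -> list[str]:
    tokens = clean(text).split()                       # step 2: tokenise
    tokens = [STEMMER.stem(t) for t in tokens]         # step 3: stem ('booking', 'books' -> 'book')
    return [t for t in tokens if t not in STOPWORDS]   # step 4: remove stop words

# Step 5: vectorise the corpus into a numeric matrix for the classifier.
vectorizer = CountVectorizer(analyzer=preprocess)
X = vectorizer.fit_transform([
    "Check this out @user1 #MAGA https://t.co/xyz",
    "Books on booking",
])
print(vectorizer.get_feature_names_out())
```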
For the training and selection of models we used the SciKitLearn and TensorFlow (Keras) libraries, as these provide stable and efficient implementations of ML algorithms and model selection. Before vectorizing and standardising the corpus, we randomly split the data 80:20, stratified by the outcome variable (whether the tweet was annotated as hate speech or not), to ensure a similar distribution of hate speech and non-hate speech in both segments. The 20% segment was withheld from the model during training and used at the end to evaluate the model. This reduces overfitting, i.e., the model learning too much from the specific dataset it was trained on and therefore failing to generalise to new data. Some ML and DL algorithms likewise withhold parts of the training data during training for a similar effect.
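The stratified 80:20 split can be expressed with scikit-learn along these lines. This is a sketch reusing the hypothetical `corpus` dataframe from the earlier example; the random seed is our own placeholder.

```python
from sklearn.model_selection import train_test_split

# `corpus` is assumed to be the merged dataframe sketched above, with a
# 'text' column and a binary 'hate' column.
texts, labels = corpus["text"], corpus["hate"]

X_train, X_test, y_train, y_test = train_test_split(
    texts,
    labels,
    test_size=0.20,    # 80:20 split
    stratify=labels,   # keep the hate/non-hate ratio similar in both segments
    random_state=42,   # illustrative seed; the study does not report one
)
```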
After experimenting with simpler models in SciKitLearn – random forests (Breiman, 2001), logistic regression (Menard, 2010), and Naïve Bayes (Manning et al., 2008) – which did not yield satisfactory results, we moved on to working with TensorFlow to train Artificial Neural Network (ANN) models. Specifically, we used Convolutional Neural Networks, CNN (LeCun et al., 1990), and Recurrent Neural Networks, RNN (Hochreiter & Schmidhuber, 1997; both explained in Kotu & Deshpande, 2018, ch. 10), which are more complex in nature and have the potential to solve more complex problems. We used hyperband tuning in TensorFlow to select the final RNN architecture, which achieved an F1 score of .854 on the test data.
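Since the tuned layers are not specified in the article, the following Keras sketch shows only a generic RNN (LSTM) binary classifier of the kind such a search space might contain; the vocabulary size and layer widths are our assumptions.

```python
import tensorflow as tf

VOCAB_SIZE = 20_000  # illustrative; in practice the size of the training vocabulary

# A generic RNN text classifier; the study's tuned architecture is not published,
# so the layer sizes here are placeholders.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=64, mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output in [0, 1]: hate speech probability
])
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
)
# model.fit(X_train_ids, y_train, validation_split=0.1, epochs=5)
# where X_train_ids are padded sequences of token ids rather than the bag-of-words matrix above.
```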
The end product of this process is a trained classifier that takes a text as input and returns a value in the range [0, 1], which can be interpreted as an intensity or probability of hate speech within that text. Table 1 contains the confusion matrix of the model’s predictions on the testing data, the 20% of the corpus withheld during training. We can see that the model correctly classifies 89% of the hate speech tweets as hate speech but only identifies 74% of the non-hate speech tweets as non-hate speech. We can therefore expect it to reliably predict when a text does not contain hate speech, but we should be more sceptical when a text is predicted as hate speech.
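A confusion matrix and F1 score of this kind can be computed from held-out predictions along the following lines. This is a sketch: `model`, `X_test_ids`, and `y_test` are assumed from the previous steps, and the 0.5 decision threshold is our assumption.

```python
from sklearn.metrics import confusion_matrix, f1_score

# Probabilities from the trained classifier on the withheld 20%
# (X_test_ids: padded token-id sequences for the test tweets).
probs = model.predict(X_test_ids).ravel()
preds = (probs >= 0.5).astype(int)  # assumed decision threshold

# Row-normalised matrix: the diagonal gives per-class recall (cf. the 89% / 74% above).
print(confusion_matrix(y_test, preds, normalize="true"))
print(f"F1: {f1_score(y_test, preds):.3f}")  # the study reports .854 on the test data
```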
The limit imposed on tweets is at the character level, namely 280 characters (since 2017; previously 140 characters; Rosen & Ihara, 2017). Roughly speaking, this restricts tweets to at most three sentences. At the same time, there is no technical restriction on how long a sentence can be, and our transcripts lack punctuation.
Working from the concept of sentences, we segmented the transcripts into synthetic sentences by dividing the texts into atoms of seven words and joining every three consecutive atoms into a sentence. In this way we obtain 21-word sentences, and each atom (except at the edges) is included in three sentences (see Table 2).
Sentence0: [ Atom1 | Atom2 | Atom3 ]
Sentence1: [ Atom2 | Atom3 | Atom4 ]
Sentence2: [ Atom3 | Atom4 | Atom5 ]
We overlapped the sentences to reduce the chance of splitting a spoken sentence in a way that removes words from their context. The three-atom schema is the smallest combination that ensures that each atom appears at the centre of a selected context. We experimented with different atom lengths on a few pieces of the material and inspected the results manually before deciding on the 7-word atom. With shorter atoms, many phrases that had been manually marked as hate speech were not identified as such by the algorithm, while longer atoms did not improve results.
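The segmentation into atoms and overlapping synthetic sentences can be sketched as follows. The helper names, as well as the `transcript_text` and `classify` placeholders, are ours; `classify` stands in for the trained classifier's [0, 1] scoring function from the previous section.

```python
def make_atoms(transcript: str, atom_len: int = 7) -> list[str]:
    """Split an unpunctuated transcript into consecutive atoms of `atom_len` words."""
    words = transcript.split()
    return [" ".join(words[i:i + atom_len]) for i in range(0, len(words), atom_len)]

def make_synthetic_sentences(atoms: list[str], window: int = 3) -> list[str]:
    """Join every `window` consecutive atoms, sliding one atom at a time, so that
    each non-edge atom ends up in three overlapping 21-word synthetic sentences."""
    return [" ".join(atoms[i:i + window]) for i in range(len(atoms) - window + 1)]

# `transcript_text`: one video's transcript as a plain string (assumed name).
atoms = make_atoms(transcript_text)
sentences = make_synthetic_sentences(atoms)
scores = [classify(sentence) for sentence in sentences]
```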
When looking at the transcripts at the paragraph level, we used a similar approach, aggregating the synthetic sentences into synthetic paragraphs of seven atoms around each paragraph’s central atom and reusing the classifier’s earlier output for the full synthetic sentences. As illustrated by Table 3, the first and last atoms of each video transcript are used in only one synthetic paragraph, while the central atoms of each paragraph are used up to three times due to how the paragraphs overlap.
Par0: [ Sen0 … Sen7 … Sen14 ]
Par1: [ Sen7 … Sen14 … Sen21 ]
Par2: [ Sen14 … Sen21 … Sen28 ]
For the overall score of each paragraph, we treated the mean, median, and maximum scores of all full synthetic sentences in the paragraph as independent probabilities of the paragraph containing hate speech and calculated their complement as a measure of the likelihood that the paragraph contains hate speech.
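Because the exact combination rule is not spelled out, the sketch below shows one possible reading: treating the three summary scores as independent probabilities and taking the complement of their joint "no hate" probability. This interpretation is explicitly our assumption, not the authors' published formula.

```python
import statistics

def paragraph_score(sentence_scores: list[float]) -> float:
    """One possible reading of the aggregation described above (our assumption):
    treat the mean, median, and maximum sentence scores as independent hate
    speech probabilities and return the complement of the probability that
    none of them indicates hate speech."""
    mean_ = statistics.mean(sentence_scores)
    median_ = statistics.median(sentence_scores)
    max_ = max(sentence_scores)
    return 1 - (1 - mean_) * (1 - median_) * (1 - max_)

print(paragraph_score([0.05, 0.10, 0.40, 0.20]))  # scores of one paragraph's full synthetic sentences
```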
The outcome of the automated identification is evaluated (below) through a close reading approach, discerning the explicit and implicit argumentative structure and significations (Brummett, 2010). The material flagged as likely to be hate speech, as well as the low-probability material, are furthermore scrutinised in relation to a holistic definition of hate speech as well as the textual context as such.
We base our understanding of hate speech on the discussion in Hietanen & Eddebo (2023) which refers to speech acts or acts of communication which express intended harm, disparagement, or vilification, or inherently imply the same, and target a group or set of groups defined in relation to protected characteristics, such as race, gender, or religion (for lists of protected characteristics, see Table 2 in Hietanen & Eddebo, 2023).
Our selection of channels and videos for analysis is based on Data & Society’s report Alternative Influence (Lewis, 2018), which maps networks of influencers on YouTube characterised by ‘reactionary’ positions, ‘a general opposition to feminism, social justice, or left-wing politics’ (p. 8). The report presents an overview of channels with right-wing politics as well as channels taking a positive stance towards the radical right or alt-right movement. The YouTube transcripts, automated or uploaded by the channel, were downloaded as text files. A manual check indicated that the quality of the transcripts was high and that they reliably conveyed the oral narrative.
We further selected a number of channels formally connected to the radical right, specifically, and therefore more likely associated with the sort of politically controversial communications subject to hate speech suppression. For this, we employed Rydgren’s (2018, pp. 23–24) basic definition of the radical right which emphasises ethnonationalism anchored in narratives about the past, directed towards strengthening the nation, returning to traditional values, and establishing a localised, organic, and ethnically homogenous polity. For clarification, ‘radical right’ is an ideological classifier or political preference for which we employ Rydgren’s definition. The alt-right is a contemporary, US-based white nationalist ideological movement characterised by radical right ideology. ‘Far right’ is a broader ideological classifier than ‘radical right,’ yet which overlaps with the latter to a great extent. ‘White supremacist’ is a particular ideological position which is generally a component part of these frameworks.
From this basis, we made a sub-selection of the nine most popular channels (based on the number of subscribers) from Data & Society’s report, all of which are in English. The 15 most popular video clips (by number of views) were selected from each channel. Except for the control, this selection excluded the rare material that had neither political content nor any connection to radical right narratives. The selection was made during spring and summer of 2019, apart from the control channel, where the final selection was made in June 2021.
We selected a control channel, also in English, assumed to contain very little hate speech, namely ‘History Time’ (Kelly, n.d.). This channel exhibits a certain thematic and conceptual overlap with the other material due to its focus on ethnicities in conflict and narratives about the past.
For contrast, we included sentences from the English-speaking white supremacist forum Stormfront (Stormfront, n.d.; dataset García-Pablos & Perez, n.d.). Stormfront is a far-right discussion forum which can be assumed to contain a high degree of explicit hate speech (Costello & Hawdon, 2019).
Table 4 presents descriptive statistics of the model’s predictions for the sentences across all three corpora. On average the model identifies over 50% more hate speech in the alt-right transcripts (.111) than in the history-channel (.069), and even more in the Stormfront material (.159). Though the difference is less pronounced across the quartiles, the level of identified hate speech in the alt-right corpus is consistently nested between the other two corpora.
| | N | M | SD | q1 a | q2 a | q3 a | >.5 (%) b |
|---|---|---|---|---|---|---|---|
| Synthetic sentences | | | | | | | |
| Alt-right | 54,853 | .111 | .192 | .008 | .027 | .109 | 7 |
| History | 47,357 | .069 | .140 | .005 | .015 | .057 | 3 |
| Stormfront | 10,795 | .159 | .200 | .022 | .078 | .211 | 8 |
| Difference (%) c | 16 | 60 | 37 | 69 | 80 | 90 | 133 |
| Synthetic paragraphs | | | | | | | |
| Alt-right | 7,887 | .463 | .296 | .195 | .414 | .738 | 43 |
| History | 6,812 | .342 | .262 | .120 | .268 | .521 | 27 |
| Difference (%) c | 16 | 35 | 13 | 62 | 54 | 42 | 59 |
The second part of Table 4 contains the descriptive statistics of the predicted hate speech at the paragraph level (described above) for the alt-right and history channels; the Stormfront data consists largely of short posts across different threads, which prevents aggregation into paragraphs. With this approach, a much higher level of hate speech is identified across both corpora, and the relative difference between the two decreases. Still, the overall level of identified hate speech remains notably higher in the alt-right corpus (M = .463, SD = .296; 43% of paragraphs were predicted to contain hate speech).
On average, the model identified 4.8 percentage points more hate speech in the Stormfront data than in the alt-right YouTube data (.159–.111); 4.2 percentage points more hate speech in the alt-right YouTube data than in the History Time data; and, finally, 9 percentage points more hate speech in the Stormfront data than in the History Time data. This is consistent with the expectation that the control, History Time, contains the least hate speech and that the Stormfront data contains the most.
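Descriptive statistics of this kind (Table 4) can be reproduced with pandas, assuming the per-sentence scores have been collected into a dataframe; the dataframe name and column layout below are hypothetical.

```python
import pandas as pd

# `scores` is assumed to hold one row per synthetic sentence, with the columns
# 'corpus' (alt-right / history / stormfront) and 'score' (the model's [0, 1] output).
summary = scores.groupby("corpus")["score"].agg(
    N="count",
    M="mean",
    SD="std",
    q1=lambda s: s.quantile(0.25),
    q2="median",
    q3=lambda s: s.quantile(0.75),
    above_half=lambda s: 100 * (s > 0.5).mean(),  # the '>.5 (%)' column
)
print(summary.round(3))
```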
Since the Stormfront corpus is annotated for hate speech, we can confirm that our model missed 80% of the instances annotated as hate speech. At the same time, the model correctly identified 90% of the instances of non-hate speech.
We evaluated the outcome of the automated identification through a close reading approach. The material flagged as likely to be hate speech, as well as the low-probability material, were furthermore scrutinised in relation to our view on hate speech as well as the textual context.
A subsidiary category of proximate hate speech was used to identify videos, synthetic sentences, or paragraphs, which were not themselves possible to categorise as hate speech, but which nonetheless were clearly parts of broader acts of hate speech or narratives identifiable as such in connection to the discursive context. The synthetic sentences can for instance be considered proximate hate speech in relation to the meso level of paragraphs, and the paragraphs (or sentences) in relation to the macro level of the video clip as such (a distinction elaborated below).
An example of proximate hate speech defined in relation to overarching levels would be a statement about the lower IQ scores of sub-Saharan native tribes within the context of a video that, considered as a whole, purveys scientific racism. The statement as such, while precarious, could be neutral in another context, but is here auxiliary to a claim of racial inferiority on a higher level, and should therefore be categorised as hate speech.
It is not our intention to support any particular definition of hate speech, nor its implementation in a context of moderation or censorship, but rather to explore the various outcomes of such an implementation of a normative understanding of hate speech.
In our qualitative assessment, we examined three categories of material in detail. We ran synthetic sentences derived from the transcripts through the classifier, the outcome of which served to flag and identify synthetic paragraphs constituted by said sentences, as well as entire videos.
Three levels of contextual proximity were brought to bear both on the methodological framing of the material analysed, as well as in the qualitative analysis as such: sentences, paragraphs, and videos.
The sentences represent the micro context, and it is on this level that the classifier operates directly. As an example, in the micro context of a synthetic sentence certain concepts can amount to propositions expressing speech acts that match the definition of hate speech. The sentences can also express complex ideas that likewise match the definition, or be sufficiently close thereto, to be flagged as hate speech. The micro context in our analysis consists of the synthetic sentence and the immediate text fragments before and after the synthetic sentence.
The synthetic paragraphs on the level above function as the meso context. Here, sets of related statements are interpreted in a manner similar to how we approach the sentences.
Finally, the videos are the macro context. It should be noted that several additional levels above the videos will implicitly follow from this sort of classification, such as the channels themselves or the entire U.S. alt-right discourse of the late 2010s. In the qualitative evaluation, we did not consider these additional levels of a macro or meta context other than in the sense of identifying overarching, well-known ideological anchor points, such as the great replacement theory or scientific racism, when interpreting statements at the lower levels.
We analysed the top 1,000 and the bottom 10,000 synthetic sentences through close reading. This tenfold difference between the groups stems from the model predicting that hate speech sentences are greatly outnumbered, which means that this group needs to be scrutinised more closely. Table 5 summarises the results.
The outcomes here are strongly divergent. In the top category, we consider 27.2% of the sentences to be hate speech, either explicitly or indirectly. Of this set we considered only about a third as proximate hate speech through the qualitative analysis.
The bottom category is almost devoid of material categorised as hate speech. The result of the close reading is that only 13 sentences out of 10,000 carry that designation, and 50 (0.5%) in total are either explicitly or indirectly considered hate speech. Of these, 37 (0.4%) were considered proximate hate speech.
The evaluation of paragraphs also gave us a marked division between top and bottom (Table 5, above). We scrutinised 124 paragraphs at each end of the hate speech probability hierarchy. In the top set, 17 were considered to incorporate hate speech and were designated as such. Nineteen paragraphs contained material proximate to hate speech, or could as a whole be considered proximate hate speech, whereas the remaining 88 paragraphs were found to be neutral.
In contrast, the bottom set contained almost nothing that could be considered even proximate to hate speech. Two were placed in this category, while the remaining 122 paragraphs were designated neutral. In addition, the two paragraphs in the proximate category were not very strong examples of this intermediate category, with one (Supplementary Material, channel 7_RIT_66560, line 70) containing a quite tacit allusion to an antisemitic sentiment, and the other (channel 6_BP_26276, line 115) a veiled reference to race realism with regard to the video as broader context.
The 14 videos whose transcripts contained the highest ratio of synthetic sentences likely to contain hate speech were qualitatively assessed in detail. Four of these were considered to explicitly incorporate hate speech. In these, hate speech was not connected to individual synthetic sentences but to larger sections of information across the video. Four additional videos were categorised as proximate hate speech. By analysing content and presentation, we designated the remaining six videos as neutral. Nine of these videos have since been removed from the platform, after an update of YouTube’s guidelines in the summer of 2019 (YouTube, 2019; Hern, 2020).
The 14 videos whose transcripts contained the lowest ratio of synthetic sentences classified as probably containing hate speech were likewise qualitatively assessed in detail. Of these, none could be considered to incorporate hate speech per se. Five could be classified as containing material proximate to hate speech, or as a whole be considered proximate hate speech, in accordance with the above. The remaining nine were designated as neutral. Eight of these videos have since been removed.
A key implication of this study is the fact that hate speech narratives in our material seem to be tacitly constructed at more complex levels of discourse. Explicit acts of immediate hate speech are almost non-existent, something that is partly due to the nature of the material, which is intended to disseminate a point of view before a neutral or amicable audience. But the lack of explicit hate speech is likely also due to the creators’ awareness of moderation and suppression of hate speech.
Initial observations to this effect compelled us to approach the material through three levels of analysis, the micro, meso, and macro levels, which confirmed the tendency towards positions and narratives being indirectly structured at more complex levels of discourse. This is brought into focus through the non-binary distinction between hate speech and proximate hate speech.
Similar studies generally use a binary approach (e.g., hate/non-hate) when annotating, which inevitably obscures a more nuanced characterization (Vrysis et al., 2021) and entails certain issues of definition.
When we move from sentences towards paragraphs, we can see how the proximate hate speech category grows. Something designated as proximate hate speech essentially ‘points upwards’ towards more complex levels of discourse involving a broader sphere of meaning. Paragraphs with the highest number of proximate-flagged synthetic sentences are indicated in this way, and the same holds for entire videos. The false positive rate (i.e., the share of sentences, paragraphs, and videos flagged by the model as hate speech but found to be neutral in the qualitative assessment) is similar for sentences (72.8%) and paragraphs (70.1%), but lower for videos (42.8%; 6 out of 14).
The false negative rate does not reflect the same pattern. While false negatives fall when we move from sentences to paragraphs, as many as 5 out of 14 (35.7%) of the videos with the lowest ratio of flagged synthetic sentences were considered proximate hate speech. This is likely an artefact of the general character of the material, which overwhelmingly, if tacitly, affirms or reproduces narratives that can be categorised as hate speech.
This upwards indication also implies that any actual reception of hate speech narratives is mainly indirect. In other words, we are here dealing with something much more akin to Ellul’s notion of integration propaganda (or sociological propaganda) than agitprop (Ellul, 1973, ch. 1.3, sect. ‘Propaganda of Agitation and Propaganda of Integration’). This would imply that our corpus exemplifies long-term consensus building approaches rather than explicit calls to action or concrete efforts of organisation.
We illustrate the qualitative designation of proximate hate speech by giving two examples from the top and bottom categories of sentences, respectively, in relation to their associated meso and macro levels.
Synthetic sentence 301 in the bottom 5% category reads: ‘group’s average IQ and sub-Saharan Africans would also have some of the most conservative and racist societies based on their group’s’ (Supplementary material, tab ‘bottom 5%’). This sentence immediately signifies an attribution of a causal relationship between IQ scores and political tendencies. In the context of the radical right discourse on race, this sort of causal inference is common and immediately invokes the discursive attribution of essential racial characteristics we for instance find in the framework of scientific racism (Farber, 2011, ch. 2). The sentence does not explicitly signify hate speech by any stretch of the imagination, but it immediately appeals to an essential association between race and intelligence and a factor of comparison between distinct racial groups.
When we look for more information on the meso level, i.e., the synthetic paragraph in which the sentence in question is embedded, we find an explicit juxtaposition between the IQ scores of Aborigines and sub-Saharan Africans and that of East Asians and Japanese, with an added emphasis on the association between a lack of understanding of ‘the intricacies of the world’ and the lower IQ scores. We also see more of the implied connection between IQ scores and political affiliation which evidently is a polemic against a posited correlation between racism and lower intelligence such as in this transcribed synthetic paragraph:
a person to not understand the intricacies of the world and this in turns makes him a racist well then there’s a lot of explaining to do that would then make Australian Aborigines on average the most politically conservative as well as racist leaning groups of people on earth according to the same survey and their groups average IQ and sub-Saharan Africans would also have some of the most conservative and racist societies based on their group’s average IQ so by nations East Asians and the Japanese in particular would be the most welcoming to foreigners as their median IQ places them as among the (Supplementary Material, tab ‘Paragraphs,’ line 2257).
In other words, in maintaining that Aborigines on average would be the most politically conservative as well as racist leaning group, the author is actually saying that Aborigines are the least intelligent people in the world, with the worst capacity to ‘understand the intricacies of the world.’ Thus, the synthetic sentence, at the micro level, is categorised as proximate hate speech since it immediately implies the propositional content at the meso level. This notwithstanding, neither the sentence nor the paragraph was flagged by the filter as likely to contain hate speech.
When we look at an example from the top-level sentences categorised as proximate hate speech, the situation is similar. Sentence 209 reads: ‘thousands of people from the Middle East pouring into Europe at least let people talk about their fears right he’ll talk.’ The immediate signification here has little to do with hate speech. It connects high levels of immigration to ‘fears,’ and vaguely implies that there are efforts to suppress discussion of these fears.
Looking to the meso level, i.e., the transcribed synthetic paragraphs, the latter point is emphasised, explicitly mentioning forceful discursive repression of such discussion, yet there is still nothing akin to hate speech here:
the problem is that I really am always suspicious when there are significant social problems that nobody can talk about when when facts become a problem to the discussion the discussion itself has turned cancerous quote unquote and speech yami no we look I mean if you want to [assuage] people’s fears about hundreds of thousands of people from the Middle East pouring into Europe at least let people talk about their fears right he’ll talk about the facts that give them concern but everybody gets screamed down and that is not a good sign here’s another example so forty years ago the Swedish (Supplementary Material, tab ‘Paragraph,’ line 376; word within brackets added due to an error in the automatic transcription).
However, on the macro level, i.e., in the context of the entire video, a fuller picture emerges. The ‘fears’ of the lower levels are here connected to claims of mass rapes specifically targeting European women and of the intentional displacement of white populations through the mass immigration of high-fertility non-white population groups, all of which are also associated with issues of racial characteristics as well as with Marxist and feminist collusion to ‘destroy the West.’
As additional comparison, we give two examples from the macro level, one video each from the top and bottom categories. We begin with the video ‘Milo obliterates student’ from September 2016, randomly selected from our bottom 10% of videos, i.e., the 10% with the lowest number of sentences assessed as likely to be hate speech. Incidentally, the video has since been removed. The video presents a rather short monologue by Milo Yiannopoulos, who laments what he perceives as a suppression of humour in the media. He connects this process to authoritarian policies and adds that he embraces politically incorrect humour since he simply finds such things as ‘AIDS,’ ‘Islam,’ and ‘trannies’ funny. The speaker is then interrupted by a person from the audience and responds with a few sentences which include the clause ‘fuck your feelings.’ We designated this video clip as not containing hate speech, either immediately or proximately, in relation to our definitions above. Criticising content moderation policies is obviously not hate speech, nor is incivility, and the admittedly demeaning approach to the groups in question does not fulfil the requisites of hate speech, notwithstanding any protected characteristics.
The video ‘What the Founders Really Thought about Race’ was randomly selected from the top 10% of videos. The video clip in question is from May 2017 and features a documentary-style presentation of the ‘actual’ views on race held by the U.S. founding fathers. This video has also since been removed. The presenter begins by disputing the idea that the founding fathers affirmed racial equality, and adds that black slaves were held by around 40% of the land-owning colonists and that a segregationist perspective was dominant. It is further argued that the founding fathers and influential early politicians, including Lincoln, desired to expel blacks from the United States, not least due to the widespread disgust over ‘miscegenation.’ The presentation sums up by arguing that different races build different types of societies and implies that the evident contemporary social ills can be connected to a divergence from these ostensibly traditional views of the founding fathers. During the close reading, we designated this video clip as containing explicit hate speech in accordance with the above definitions, in part due to the express usage of ‘miscegenation’ and the implicit racial supremacist message at the end.
In the following, we give examples of sentences automatically classified as hate speech and non-hate speech. We compare twelve synthetic sentences from the top and bottom categories, respectively. Three from each category designated as hate speech or proximate hate speech (including immediate context fragments before and after), and three from each found to not contain hate speech. The selection is made in order of appearance in the set, i.e., the six sentences from the top category are the first hate speech and non-hate speech sentences encountered when the set is sorted by highest to lowest likelihood of containing hate speech. The opposite holds for the bottom category.
Hate Speech, Top 5% (Supplemental Material, tab ‘Sentence top 5%’)
1. for [the patriarchy] is what took us to space you just want build roads build roads it is what build the (line 2)
2. because Trump said pussy and they were fine with that grab [them] by the pussy but never mind about what rap (line 3)
3. conditions in black ghettos but what caused the ghettos here’s their answer white society is deeply implicated in the ghetto white (line 4)
Not Hate Speech, Top 5% (Supplemental Material, tab ‘Sentence top 5%’)
4. is a mosque don’t have any idea you want to guess an animal your basic bitch who is the vice president (line 8 (semi-duplicate on line 15))
5. ginger here come up here come up here come on stand here come on stand here anymore Ginger’s no wages limit (line 22)
6. 1978 a country that starts with a you utopia you went full retard man never go full retard what do you (line 35)
Hate Speech, Bottom 5% (Supplemental Material, tab ‘Sentence bottom 5%’)
7. there are deliberate policies to make us a minority anywhere we live policies intended to destroy us as a whole as (line 92)
8. can replace them (Jewish journalist) wrote America is tearing itself apart as an embittered quite conservative minority clings to power terrified at (line 95)
9. rights) activists who support everything that weakens the nation-state. This Western mindset and this activist network is perhaps best represented by [George Soros] (line 98)
Not Hate Speech, Bottom 5% (Supplemental Material, tab ‘Sentence bottom 5%’)
10. a trump presidency might signal a sea change with Brexit happening all the anti-globalist anti-globalization movements the sort of populist conservative (line 2)
11. conservatives going progressives and progressive go and conservative talk shows I think it’d be really interesting I was a bit annoyed (line 3)
12. Qatari government is so morally upright and ethical now you can bring on all sorts of conservative writers conservative people in (line 4)
It is interesting to note that all of the sentences or context fragments designated as hate speech in the bottom set invoke the great replacement, with the latter two connecting with explicit antisemitism in their broader context. The great replacement hypothesis refers to the idea that native Western populations are being intentionally replaced by racial others, chiefly Muslims, a process often assumed to be the effort of a Jewish conspiracy (Betz, 2018).
Given that the great replacement hypothesis in alt-right discourse is clearly framed as support for the purported victims of an ongoing genocide, and is only designated as hate speech in relation to its generally accepted and quite implicit connections to antisemitic organisation, it is perhaps not surprising that allusions to replacement will generally not be flagged as forms of hate speech.
Sentence three provides a clear example of the contextual analysis behind the ‘proximate’ designation. This fragment connects to the themes of white supremacy in American Renaissance’s framework (the actual implication being that it is absurd to claim that ‘white society’ has anything to do with the emergence of ‘ghettos,’ and that these are instead a fruit of inherent racial inferiority).
Of the top-scoring videos, only two out of 14 (14.2%) were removed from YouTube after the material was collected. Both (100%) were designated as proximate hate speech. Of the remaining videos, four (33.3%) were designated as hate speech, two (16.6%) as proximate hate speech, and four (33.3%) as not containing hate speech.
Of the lowest-scoring videos, eight out of 14 (57.1%) were removed. Three of these (37.5%) were found to be proximate hate speech. Of the remaining six, only one (16.6%) was designated as proximate hate speech, and five (83.3%) were found not to contain hate speech.
By a quick assessment, if most of the videos removed after our data gathering were targeted due to perceived problematic content, our model would seem to be validated by the moderation policies of the platform. However, the significant gap between the number of videos removed in the respective categories is counterintuitive. Due to the small numbers involved, and the wide variety of potential factors behind channel or video removal, little can be concluded from this gap.
Surprisingly, the number of views of the videos in question stands in a clearly negative rather than positive relation to a video being removed. Almost all the removed videos in the top category had a significantly lower view count at the time of our data collection than the videos not removed, whereas one immediate assumption would be that higher view counts would increase negative attention and thus the likelihood of removal.
Thematically speaking there seems to be little difference between the two categories. Both the removed and remaining videos of the 10% top category engage with issues of mass immigration to Europe, feminism, gender issues, Islam, and race – all in approximately equal measure. Furthermore, several titles of the unaffected videos (‘The rape of Europe,’ ‘The Islamic state of Sweden’) are more obviously inflammatory than many of the removed clips (e.g., ‘Did Trump just save Western civilization?’ and ‘Response to Contrapoints on Degeneracy’).
Our purposes with this paper were (1) to transparently account for the steps needed when utilising automatic hate speech identification, including key challenges in the process, and (2) to identify the challenges of identifying hate speech in popular alt-right YouTube videos specifically.
Our study addresses the issue of the automatic identification of contentious speech acts with a particular focus on the high-context character of the meanings conveyed. In this approach, which combines a quantitative and a qualitative analysis, our study is novel, since previous research has combined qualitative analyses with distant rather than close reading. The results obtained from this approach, not least the high false negative rates which became obvious through close reading, clearly indicate that effective and consistent identification of hate speech communication necessitates qualitative interventions by human reviewers to avoid arbitrary or misleading applications. Indeed, the comparatively low accuracy of the filters, as determined through our qualitative review of the automatic flagging, implies possible limitations in studies which claim a consistently high accuracy of automatic identification, e.g., reliance on data chiefly characterised by explicit discourse, or the omission of high-context communications in favour of a binary reading based on keywords.
This commonplace binary approach tends to force the rationale for designating content as hate speech to be self-contained within the literal meaning conveyed. Otherwise, it will entail an arbitrary and misleading designation of high-context content as explicit hate speech in and of itself. Even studies which explicitly engage with this issue risk falling into this trap. Paasch-Colberg et al. (2021) make a point of going beyond the ‘hate/no-hate’ dichotomy and provide a useful analysis of various possible characteristics and rhetorical strategies of hate speech communications. Nevertheless, they employ a binary classification which requires the problematic signification to be self-contained within a narrowly delimited act of communication for the requisite of hate speech to be fulfilled.
A simple context-sensitive qualitative approach like ours can remedy this by bringing into focus the indirect character of many of these communications, which will also tend to characterise a discursive landscape where moderation and censorship are intensifying. This, incidentally, also precludes much of the value of purely automatic approaches.
There are structural impediments which render effective unsupported automatic identification of high-context communications difficult in principle. Automatic identification or moderation cannot account for an evolving, complex context of often indirect signification, which is hardly feasible in practice even with much more advanced algorithmic systems.
In general, this study exemplifies in detail a process whereby automatic hate speech identification could be utilised effectively. We see that several methodological steps need to operate in concert for the outcome to be useful, with both highly technical quantitative processing and traditional qualitative analysis being vital to achieving meaningful results.
With particular regard to the alt-right YouTube material, the main challenge in terms of detection and precise identification of hate speech relates to the often tacit and indirect framing. Identification demands orientation in the broader discursive context and the adaptation towards indirect expressions renders moderation and suppression both ethically and legally precarious. In concert with this finding, we saw significantly more material being automatically classified as hate speech in the comparatively private and unmoderated far-right Stormfront forum in comparison with the YouTube material, yet here also, the false negative rate was high (see Stormfront, n.d.).
A challenge for the study was that the YouTube material consists of texts by proficient speakers for whom the avoidance of explicit hate speech is a priority. To avoid suppression or banning from the platform, they use indirect and contextually dependent forms of hate speech, which are considerably more difficult to detect than the more overt forms found on some discussion fora with less stringent rules of expression. Consequently, it is unsurprising that the accuracy of the model was not very high. At the same time, it is precisely these types of narratives, which use concealment techniques, rhetoric, and indirect hate speech, that we need to be able to better identify in contemporary online communication.
Natural Language Processing is a field that is developing quickly, and new methods and datasets continue to appear. In this study we have limited ourselves to a relatively small corpus and simple methods, in part to demonstrate what can be done without a complicated or state-of-the-art setup. In the future we hope to see more studies that combine the quantitative methodologies of Data Science with traditional hermeneutic analyses.
Finally, the relationship of our findings to Overton window issues is important. A comparison of our dated material (mainly 2015–2018) to both earlier and later sets, with an eye to the political background discourse, would provide data towards ascertaining the correlations between efforts toward discursive subtlety and both the creators’ assumptions of popular reception and the character of acceptable discourses, and the level and character of actual suppression.
Top and bottom sentences and paragraphs with unique identifiers calculated from YouTube transcriptions. Supplemental data for this article can be requested from mika.hietanen@kom.lu.se.