Lost in moderation: How commercial content moderation APIs over- and under-moderate group-targeted hate speech and linguistic variations

dc.contributor.author: Hartmann, David
dc.contributor.author: Oueslati, Amin
dc.contributor.author: Staufer, Dimitri
dc.contributor.author: Pohlmann, Lena
dc.contributor.author: Munzert, Simon
dc.contributor.author: Heuer, Hendrik
dc.date.accessioned: 2025-06-02T08:44:17Z
dc.date.available: 2025-06-02T08:44:17Z
dc.date.issued: 2025
dc.description.abstract: Commercial content moderation APIs are marketed as scalable solutions to combat online hate speech. However, the reliance on these APIs risks both silencing legitimate speech, called over-moderation, and failing to protect online platforms from harmful speech, known as under-moderation. To assess such risks, this paper introduces a framework for auditing black-box NLP systems. Using the framework, we systematically evaluate five widely used commercial content moderation APIs. Analyzing five million queries based on four datasets, we find that APIs frequently rely on group identity terms, such as “black”, to predict hate speech. While OpenAI’s and Amazon’s services perform slightly better, all providers under-moderate implicit hate speech, which uses codified messages, especially against LGBTQIA+ individuals. Simultaneously, they over-moderate counter-speech, reclaimed slurs, and content related to Black, LGBTQIA+, Jewish, and Muslim people. We recommend that API providers offer better guidance on API implementation and threshold setting and more transparency on their APIs’ limitations. Warning: This paper contains offensive and hateful terms and concepts. We have chosen to reproduce these terms for reasons of transparency.
dc.identifier.citation: Hartmann, D., Oueslati, A., Staufer, D., Pohlmann, L., Munzert, S., & Heuer, H. (2025). Lost in moderation: How commercial content moderation APIs over- and under-moderate group-targeted hate speech and linguistic variations. Proceedings of the 2025 CHI conference on human factors in computing systems. https://doi.org/10.1145/3706598.3713998
dc.identifier.doi: 10.1145/3706598.3713998
dc.identifier.isbn: 979-8-4007-1394-1
dc.identifier.uri: https://www.weizenbaum-library.de/handle/id/901
dc.language.iso: eng
dc.publisher: Association for Computing Machinery
dc.rights: open access
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: content moderation APIs
dc.subject: audit
dc.subject: AI transparency and accountability
dc.subject: human-AI interaction in content moderation
dc.subject: algorithmic bias in hate speech detection
dc.title: Lost in moderation: How commercial content moderation APIs over- and under-moderate group-targeted hate speech and linguistic variations
dc.type: ConferencePaper
dc.type.status: publishedVersion
dcmi.type: Text
dcterms.bibliographicCitation.url: https://doi.org/10.1145/3706598.3713998
local.researchgroup: Daten, algorithmische Systeme und Ethik
local.researchtopic: Digitale Technologien in der Gesellschaft
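As context for the abstract's recommendation on threshold setting, the following is a minimal illustrative sketch of how a single text could be scored against one of the audited services, OpenAI's moderation endpoint, via the openai Python SDK. The model name, the choice of the hate category, and the 0.5 decision threshold are assumptions made for illustration only and are not taken from the paper's audit framework.

# Illustrative sketch only: score one text with a commercial moderation API
# (here OpenAI's moderation endpoint) and apply a caller-chosen threshold.
# The model name and the 0.5 threshold are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def flag_as_hate(text: str, threshold: float = 0.5) -> bool:
    """Return True if the hate-category score meets or exceeds the threshold."""
    response = client.moderations.create(
        model="omni-moderation-latest",  # assumed model name
        input=text,
    )
    result = response.results[0]
    hate_score = result.category_scores.hate  # score between 0 and 1
    return hate_score >= threshold


if __name__ == "__main__":
    print(flag_as_hate("example text to check"))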
Files
Original bundle
Name: Hartmann_ea_Lost-in-Moderation.pdf
Size: 1.1 MB
Format: Adobe Portable Document Format