Silencing the Risk, Not the Whistle: A Semi-automated Text Sanitization Tool for Mitigating the Risk of Whistleblower Re-Identification

dc.contributor.author: Staufer, Dimitri
dc.contributor.author: Pallas, Frank
dc.contributor.author: Berendt, Bettina
dc.date.accessioned: 2025-03-31T13:51:13Z
dc.date.available: 2025-03-31T13:51:13Z
dc.date.issued: 2024
dc.description.abstract: Whistleblowing is essential for ensuring transparency and accountability in both public and private sectors. However, (potential) whistleblowers often fear or face retaliation, even when reporting anonymously. The specific content of their disclosures and their distinct writing style may re-identify them as the source. Legal measures, such as the EU Whistleblower Directive, are limited in their scope and effectiveness. Therefore, computational methods to prevent re-identification are important complementary tools for encouraging whistleblowers to come forward. However, current text sanitization tools follow a one-size-fits-all approach and take an overly limited view of anonymity. They aim to mitigate identification risk by replacing typical high-risk words (such as person names and other labels of named entities) and combinations thereof with placeholders. Such an approach, however, is inadequate for the whistleblowing scenario since it neglects further re-identification potential in textual features, including the whistleblower’s writing style. Therefore, we propose, implement, and evaluate a novel classification and mitigation strategy for rewriting texts that involves the whistleblower in the assessment of the risk and utility. Our prototypical tool semi-automatically evaluates risk at the word/term level and applies risk-adapted anonymization techniques to produce a grammatically disjointed yet appropriately sanitized text. We then use a Large Language Model (LLM) that we fine-tuned for paraphrasing to render this text coherent and style-neutral. We evaluate our tool’s effectiveness using court cases from the European Court of Human Rights (ECHR) and excerpts from a real-world whistleblower testimony and measure the protection against authorship attribution attacks and utility loss statistically using the popular IMDb62 movie reviews dataset, which consists of 62 individuals. Our method can significantly reduce authorship attribution accuracy from 98.81% to 31.22%, while preserving up to 73.1% of the original content’s semantics, as measured by the established cosine similarity of sentence embeddings.
dc.identifier.citation: Staufer, D., Pallas, F., & Berendt, B. (2024). Silencing the Risk, Not the Whistle: A Semi-automated Text Sanitization Tool for Mitigating the Risk of Whistleblower Re-Identification. The 2024 ACM Conference on Fairness, Accountability, and Transparency, 733–745. https://doi.org/10.1145/3630106.3658936
dc.identifier.doi: 10.1145/3630106.3658936
dc.identifier.isbn: 979-8-4007-0450-5
dc.identifier.uri: https://www.weizenbaum-library.de/handle/id/860
dc.language.iso: en
dc.publisher: ACM
dc.rights: open access
dc.rights.uri: https://creativecommons.org/licenses/by-nd/4.0/
dc.subject: Text Sanitization
dc.subject: Whistleblower Anonymity
dc.subject: Authorship Obfuscation
dc.subject: Fine-tuning Language Models
dc.subject: LLM-based Rephrasing
dc.title: Silencing the Risk, Not the Whistle: A Semi-automated Text Sanitization Tool for Mitigating the Risk of Whistleblower Re-Identification
dc.type: ConferencePaper
dc.type.status: publishedVersion
dcmi.type: Text
dcterms.bibliographicCitation.url: https://dl.acm.org/doi/10.1145/3630106.3658936
local.researchgroup: Data, Algorithmic Systems and Ethics
local.researchtopic: Digital Technologies in Society
Files
Name: Berendt_ea_Silencing-the-Risk.pdf
Size: 3.69 MB
Format: Adobe Portable Document Format