Open Access Publications
Open Access publications of the research group "Daten, algorithmische Systeme und Ethik"
- Articulation Work and Tinkering for Fairness in Machine Learning (2024)
  Fahimi, Miriam; Russo, Mayra; Scott, Kristen M.; Vidal, Maria-Esther; Berendt, Bettina; Kinder-Kurlanda, Katharina
  The field of fair AI aims to counter biased algorithms through computational modelling. However, it faces increasing criticism for perpetuating the use of overly technical and reductionist methods. As a result, novel approaches appear in the field to address more socially-oriented and interdisciplinary (SOI) perspectives on fair AI. In this paper, we take this dynamic as the starting point to study the tension between computer science (CS) and SOI research. By drawing on STS and CSCW theory, we position fair AI research as a matter of 'organizational alignment': what makes research 'doable' is the successful alignment of three levels of work organization (the social world, the laboratory, and the experiment). Based on qualitative interviews with CS researchers, we analyze the tasks, resources, and actors required for doable research in the case of fair AI. We find that CS researchers engage with SOI research to some extent, but organizational conditions, articulation work, and ambiguities of the social world constrain the doability of SOI research for them. Based on our findings, we identify and discuss problems for aligning CS and SOI as fair AI continues to evolve.
- “Guilds” as Worker Empowerment and Control in a Chinese Data Work Platform (Association for Computing Machinery, 2024)
  Yang, Tianling; Miceli, Milagros
  Data work plays a fundamental role in the development of algorithmic systems and the AI industry. It is often performed in business process outsourcing (BPO) companies and crowdsourcing platforms, involving a global and distributed workforce as well as networks of collaborative actors. Previous work on community building among data workers centers on organization and mutual support, or focuses on the structuring and instrumentalization of crowdworker groups for complicated projects. We add to these lines of research by focusing on a specific form of community building encouraged and facilitated by platforms in China: guilds. Based on ethnographic work on a Chinese crowdsourcing platform and 14 semi-structured interviews with data workers, our findings show that guilds are a form of both worker empowerment and control. With this work, we add a nuanced empirical case to the interconnection of BPOs, online communities, and crowdsourcing platforms in the current data production sector in China, thus expanding previous investigations of global perspectives on data production. We discuss guilds in relation to individual workers and highlight their effects on data work, including efficient coordination, enhanced standardization, and flattened power structures.
- Lost in Moderation: How Commercial Content Moderation APIs Over- and Under-Moderate Group-Targeted Hate Speech and Linguistic Variations (Association for Computing Machinery, 2025)
  Hartmann, David; Oueslati, Amin; Staufer, Dimitri; Pohlmann, Lena; Munzert, Simon; Heuer, Hendrik
  Commercial content moderation APIs are marketed as scalable solutions to combat online hate speech. However, the reliance on these APIs risks both silencing legitimate speech, called over-moderation, and failing to protect online platforms from harmful speech, known as under-moderation. To assess such risks, this paper introduces a framework for auditing black-box NLP systems. Using the framework, we systematically evaluate five widely used commercial content moderation APIs. Analyzing five million queries based on four datasets, we find that APIs frequently rely on group identity terms, such as “black”, to predict hate speech. While OpenAI’s and Amazon’s services perform slightly better, all providers under-moderate implicit hate speech, which uses codified messages, especially against LGBTQIA+ individuals. Simultaneously, they over-moderate counter-speech, reclaimed slurs, and content related to Black, LGBTQIA+, Jewish, and Muslim people. We recommend that API providers offer better guidance on API implementation and threshold setting, and more transparency on their APIs’ limitations. (A minimal sketch of such a black-box audit loop appears after this list.)
  Warning: This paper contains offensive and hateful terms and concepts. We have chosen to reproduce these terms for reasons of transparency.
- Silencing the Risk, Not the Whistle: A Semi-automated Text Sanitization Tool for Mitigating the Risk of Whistleblower Re-Identification (ACM, 2024)
  Staufer, Dimitri; Pallas, Frank; Berendt, Bettina
  Whistleblowing is essential for ensuring transparency and accountability in both public and private sectors. However, (potential) whistleblowers often fear or face retaliation, even when reporting anonymously. The specific content of their disclosures and their distinct writing style may re-identify them as the source. Legal measures, such as the EU Whistleblower Directive, are limited in their scope and effectiveness. Therefore, computational methods to prevent re-identification are important complementary tools for encouraging whistleblowers to come forward. However, current text sanitization tools follow a one-size-fits-all approach and take an overly limited view of anonymity. They aim to mitigate identification risk by replacing typical high-risk words (such as person names and other labels of named entities) and combinations thereof with placeholders. Such an approach, however, is inadequate for the whistleblowing scenario since it neglects further re-identification potential in textual features, including the whistleblower’s writing style. Therefore, we propose, implement, and evaluate a novel classification and mitigation strategy for rewriting texts that involves the whistleblower in the assessment of the risk and utility. Our prototypical tool semi-automatically evaluates risk at the word/term level and applies risk-adapted anonymization techniques to produce a grammatically disjointed yet appropriately sanitized text. We then use a Large Language Model (LLM) that we fine-tuned for paraphrasing to render this text coherent and style-neutral. We evaluate our tool’s effectiveness using court cases from the European Court of Human Rights (ECHR) and excerpts from a real-world whistleblower testimony, and we measure the protection against authorship attribution attacks and utility loss statistically using the popular IMDb62 movie reviews dataset, which contains texts by 62 authors. Our method can significantly reduce authorship attribution accuracy from 98.81% to 31.22%, while preserving up to 73.1% of the original content’s semantics, as measured by the established cosine similarity of sentence embeddings. (A minimal sketch of such an embedding-similarity check appears after this list.)
- Who trains the Data for European Artificial Intelligence? (2024)
  Miceli, Milagros; Tubaro, Paola; Casilli, Antonio; Le Bonniec, Thomas; Salim Wagner, Camilla; Sachenbacher, Laurenz
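The audit described in "Lost in Moderation" treats each moderation API as a black box: send text, read back a score, compare it to a decision threshold, and look for benign probes that get flagged (over-moderation) and harmful probes that pass (under-moderation). The sketch below illustrates only that generic loop; the endpoint URL, response shape, `score_text` helper, threshold, and probe texts are our own assumptions, not the authors' framework or any specific provider's API.

```python
import requests

# Hypothetical moderation endpoint; real providers (OpenAI, Amazon, etc.)
# each have their own request formats and authentication schemes.
MODERATION_URL = "https://api.example-moderation.invalid/v1/score"  # assumption
API_KEY = "YOUR_API_KEY"

def score_text(text: str) -> float:
    """Return a hate-speech score in [0, 1] for `text` (illustrative helper)."""
    response = requests.post(
        MODERATION_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"input": text},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["score"]  # assumed response shape

# Paired probes: a benign identity mention vs. an implicit, codified message.
# Over-moderation: benign text scored above the threshold.
# Under-moderation: harmful text scored below it.
probes = {
    "benign_identity_mention": "Black lives matter.",
    "implicit_hate": "<codified message targeting a group>",  # placeholder
}
THRESHOLD = 0.5  # providers rarely document how this should be set

for label, text in probes.items():
    score = score_text(text)
    print(f"{label}: score={score:.2f} flagged={score >= THRESHOLD}")
```

Scaling this loop over millions of queries and several providers, as the paper does, is then mostly a matter of batching requests and logging scores per dataset and target group.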
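The whistleblower-sanitization paper reports utility loss as cosine similarity between sentence embeddings of the original and the sanitized, paraphrased text. Below is a minimal sketch of that kind of check, assuming the sentence-transformers library with an off-the-shelf encoder; the model name and example sentences are illustrative, not taken from the paper's tool.

```python
from sentence_transformers import SentenceTransformer, util

# Any sentence encoder works here; this model name is an assumption.
model = SentenceTransformer("all-MiniLM-L6-v2")

original = "The internal audit report was forwarded to the board in March."
sanitized = "A document about irregularities was shared with management."

# Encode both texts and compare them in embedding space.
embeddings = model.encode([original, sanitized], convert_to_tensor=True)
similarity = util.cos_sim(embeddings[0], embeddings[1]).item()

# Higher similarity means more of the original semantics survived sanitization.
print(f"cosine similarity: {similarity:.3f}")
```

A score near 1.0 would indicate the rewrite preserved most of the meaning; the paper's reported figure of up to 73.1% corresponds to this kind of measurement aggregated over its evaluation texts.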