Research Data
Permanent URI for this collection: https://www.weizenbaum-library.de/handle/id/977
Item A categorized multimodal TikTok dataset (2023) Wedel, Lion
This dataset encompasses 11242 entries of 5137 unique videos listed between the 31st of July and the 4th of August on the TikTok explore page (https://www.tiktok.com/explore). The page was accessed via a German IP address without being logged in. The data was collected via the 4CAT Toolkit and the Zeeschuimer browser extension. The dataset contains the category and multimodal embeddings for each video.
**Intended Purpose** The dataset is primarily intended for proof-of-concept studies, as a toy dataset for teaching, or for use in student seminar papers. Given the lack of a clear definition for each category by TikTok, such work might explore those definitions or focus on methods. The multimodal embeddings allow for directly applying unsupervised and supervised machine learning techniques.
**Contents** The dataset consists of four zipped .csv files:
* metadata.zip
* text_embeddings.zip
* audio_embeddings.zip
* video_embedding.zip
**For further details, please consult the Data Report** (datenbericht_v2.pdf).
Item Appendix to “Your social ties, your personal public sphere, your responsibility” (2022) Gagrčin, Emilija
### Appendix
Vignette 1: Private Profile
Vignette 2: Public Page
Item Attachment to the article “From Insult to Hate Speech: Mapping Offensive Language in German User Comments on Immigration” (2021) Paasch-Colberg, Sünje; Strippel, Christian; Trebbe, Joachim; Emmer, Martin
### English Translation of German User Comments
Attachment to the article “From Insult to Hate Speech: Mapping Offensive Language in German User Comments on Immigration” as part of the issue “Dark Participation in Online Communication: The World of the Wicked Web”, edited by Thorsten Quandt (University of Münster, Germany).
#### Disclaimer
This document contains both the German original and the English translation of all user comments quoted in the article.
In order to ensure the anonymity of the authors, the original German quotations were modified. However, the parts that are relevant to the argument or example remain unchanged. Due to the nature of this document, the user comments presented here contain potentially offensive and upsetting terms, particularly racist and islamophobic ones. They are solely used as examples to illustrate the results of this research and do not reflect the views of the authors in any way.
Item Boulianne, Shelley (2021). Pathways to Environmental Activism in Four Countries. figshare. Dataset. (2022) Boulianne, Shelley
YouthEnviroActivism Sept2021.sav
YouthEnviroActivism Sept2021.sps
YouthEnviroActivism Sept2021.spv
YouthEnviroActivism Sept2021.xls
Item Comparison of the Draft and Adopted Delegated Act on Data Access. (DSA 40 Data Access Collaboratory) (2025-07-10) Seiling, LK
First release of the materials for the comparison between the draft delegated act on data access (DDA) and the adopted delegated act (DA). Includes:
- source documents (draft delegated act on data access + appendix [DDA] and the adopted delegated act on data access + appendix [DA])
- .csv table with aligned text passages from the source documents and their positions (excluding appendix) (changes_DDA_DA.csv)
- Python script to generate the HTML for visual comparison (compare_DDA_DA.html)
- the resulting HTML to be rendered by a browser (compare_DDA_DA.html)
- the Python requirements (requirements.txt)
- the README (README.md)
Comparison is available at: https://dsa40collaboratory.eu/dda-da-comparison/
Item Data of the paper: Clickbait or conspiracy? How Twitter users address the epistemic uncertainty of a controversial preprint (Center for Open Science, 2022-06-22) Franzreb, Carlos; Schimmler, Sonja; Bauer, Mareike Fenja
This project contains sources related to the paper “Clickbait or conspiracy? How Twitter users address the epistemic uncertainty of a controversial preprint”:
Scripts
Scripts used to retrieve Tweets and to analyze/visualize them.
Quantitative Analysis: Data
+ tweets.json: All Tweets of the relevant users as nodes and their relationships (retweet, quote or reply) as edges.
+ users_clustered.json: Users as nodes and their follow-relationships as edges, clustered with the Leiden algorithm.
+ follower_network.json: JSON file corresponding to Figure 1.
+ interaction_network.json: JSON file corresponding to Figure 2.
Qualitative Analysis: Data
Replies and quotes of the Tweets that are used in the qualitative analysis.
Item Digital Turn Without Digital Methods? Mapping the Journey of Journalism Studies [Dataset] (2025) Fan, Yangliu; Ohme, Jakob; Neuberger, Christoph
Recent years have seen a growing diversity in journalism studies, primarily ascribed to digital transformation in the contemporary context. Analyzing 6,770 publications from the five major journalism journals (Journalism, Journalism & Mass Communication Quarterly, Journalism Practice, Journalism Studies, and Digital Journalism) between 1995 and 2022, we find new evidence that the digital turn is highly visible in journalism studies. Using document co-citation analysis, we first identify distinct and coherent, yet loosely integrated, research clusters that focus on different journalistic topics, i.e., specialties. Second, we find that digital journalism has not only been integrated into the research agendas within the field but has also formed stand-alone and distinct research clusters. We further show that the field structure has developed over the years in response to digital transformation. Yet, digital and computational methods remain a stark minority compared with the more traditional methods. Our results suggest that journalism studies could benefit from novel inter-cluster communications and methodological innovations.
Item Digitalization and organizational change during the Covid-19 crisis (Data and Codebook) (Weizenbaum Institute, 2024) Krzywdzinski, Martin; Butollo, Florian; Bovenschulte, Marc; Nerger, Michael
The data set is based on a survey of companies conducted to investigate the extent to which the pandemic led to a strategic reorientation of digitalization measures in companies.
The aim was to investigate whether digitalization measures were newly established and intensified, in which areas digitalization took place, and how the digitalization measures differed depending on the company’s level of digitalization, sector, and size.
#### Methods
The survey was conducted as computer-assisted telephone interviews (CATI) by the market research company Hopp Marktforschung with the support of VDI/VDE Innovation+Technik GmbH (Marc Bovenschulte and Michael Nerger). The survey was conducted in Germany in two waves: in July and August 2021 and in September and October 2022. The survey included 540 companies in the first wave and 605 companies in the second wave. 120 companies participated in both waves. The respondents belonged to the top and middle management of the companies (e.g. head of division). The survey was conducted in two waves in order to examine differences between companies’ approaches at the beginning and later in the course of the pandemic and to check whether companies’ strategies solidified over time or whether they were just short-term reactions.
Item FAIREST Metrics and Assessment Data (Zenodo, 2021-11-08) d’Aquin, Mathieu; Kirstein, Fabian; Schimmler, Sonja; Oliveira, Daniela; Urbanek, Sebastian
This data supplements the article “FAIREST: A Framework for Assessing Research Repositories”. In the article, we introduce the FAIREST principles, an extension of the well-known FAIR principles. Along these principles, we provide comprehensive metrics for assessing and selecting solutions for building digital repositories for research artefacts. The metrics are based on two pillars:
- an analysis of established features and functionalities, drawn from existing solutions,
- a literature review on general requirements for digital repositories for research artefacts and related systems.
We further describe an assessment of 11 widespread solutions, with the goal of providing an overview of the current landscape of research data repository solutions and identifying gaps and research challenges to be addressed. The solutions are:
- ResearchGate
- Academia.edu
- Zenodo
- arXiv
- Bibsonomy
- Figshare
- CKAN
- DSpace
- Invenio
- Dataverse
- EPrints
Overview of the data
01 FAIREST Assessment Metrics and Solutions (All-in-one).xlsx
This Excel file includes both the assessment metrics and the results for the 11 solutions
02 FAIREST Assessment Metrics.csv
The assessment metrics as CSV
XX FAIREST Assessment XXX.csv
Assessment result for the respective solution
14 FAIREST Assessment Template.xlsx
A template to apply the metrics to an individual solution
Note: Fill in your assessment in column F and get the result at the bottom of the sheet
Item How Influencers and Multipliers Drive Polarization and Issue Alignment on Twitter/X - Data (Version v1) [Data set] (2025) Pournaki, Armin; Gaisbauer, Felix; Olbrich, Eckehard
We provide anonymized retweet networks extracted from trending topics in Germany collected between 2021 and 2023. More specifically, we collected tweets from 2021-03-29 to 2023-07-12 according to the following scheme: at the beginning of each day, we launched a script that collects the current "trending topics" (from now on referred to as "trends") in Germany using the Twitter Trend API (v1). By default, trends are personalized based on the account's Twitter/X usage. One can, however, disable the personalization by setting a specific location from which to draw the trending topics, which then yields "popular topics among people in a specific geographic location" (X/Twitter 2025). We re-ran the script every 15 minutes. At the end of each day, we counted the number of times each trending topic appeared during the day and kept the top 5 most frequent ones. This gave us a proxy of the five most important trending topics for that day. We then used the Twitter Search API (v1) to collect German-language tweets using the exact trend keyword as a query on the day it trended and the day after (48 hours). All the tweets were collected using a single Twitter API key, collecting tweets for at most 24 hours every day. For each trend, we extract a retweet network, in which nodes are Twitter users and a directed link is drawn from one user to another if a retweet connects them. We provide one retweet network for each trend as a csv after anonymizing the user_ids. There is one csv for each trend containing the columns source,target,weight. The filename contains the date and the keyword that was searched: T__.csv
All the individual files are contained in rtn.zip.
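The daily selection step and the edge-list format described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' collection script: the function names `top_trends` and `read_retweet_network`, the sample trend names and user ids, and the reading of `weight` as a number are hypothetical. Only the top-5 rule and the column names source,target,weight come from the description, and the sketch does not interpret the direction of the links.

```python
import csv
import io
from collections import Counter


def top_trends(snapshots, k=5):
    """Count how often each trend appears across one day's snapshots
    and return the k most frequent trend names."""
    counts = Counter(trend for snapshot in snapshots for trend in snapshot)
    return [trend for trend, _ in counts.most_common(k)]


def read_retweet_network(handle):
    """Read an edge list with the columns source,target,weight
    into a list of (source, target, weight) tuples.
    Treating weight as a float is an assumption."""
    reader = csv.DictReader(handle)
    return [(row["source"], row["target"], float(row["weight"])) for row in reader]


if __name__ == "__main__":
    # Hypothetical snapshots: each inner list is one 15-minute poll of the trend list.
    day = [["trendA", "trendB", "trendC"], ["trendA", "trendB"], ["trendA"]]
    print(top_trends(day, k=2))

    # Hypothetical file content in the documented column layout.
    sample = io.StringIO("source,target,weight\nu1,u2,3\nu2,u3,1\n")
    print(read_retweet_network(sample))
```

To read a real network, the in-memory sample would be replaced by a file opened from the extracted rtn.zip archive, e.g. `open(path, newline="")`.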
Additionally, we computed a topic model on the full text of tweets, which allowed us to classify each trend into one larger metatopic (such as Covid, Climate Change, Sports, ...). This topic assignment is contained in trend2topic.csv. For more information on the topic model, please refer to the paper https://doi.org/10.1609/icwsm.v19i1.35890.
Item Instagram homophily datasets and codes (2023) Pignolet, Yvonne-Anne; Schmid, Stefan; Seelisch, Arne
This file contains datasets and Python code related to the paper "Gender-Specific Homophily on Instagram and Implications on Information Spread".
Item Interviews zu Forschungsdateninfrastrukturen und digitalen Praktiken offener Wissenschaft am Weizenbaum-Institut (Zenodo, 2022-02-04) Bauer, Mareike; Wünsche, Hannes
The research group “Digitalisierung der Wissenschaft” (Digitalization of Science) at the Weizenbaum Institute accompanies the development of a repository for publications and research data. As part of the requirements analysis, guided interviews were conducted with research staff of the Weizenbaum Institute. The aim was to identify their experience with and requirements for research data infrastructures. This dataset contains:
+ study report
+ anonymized interview transcripts
+ e-mail call for participation
+ interview guide
+ consent form
Item Multi-Platform Social Media Data Donation Behavior Dataset (2025) Wedel, Lion; Mayer, Anna-Theresa; Fan, Yangliu; Gaisbauer, Felix; Ohme, Jakob
This repository contains the dataset used for all analyses presented in the papers:
- [Study 1] Wedel, L., Ohme, J., Mayer, A. T., Gaisbauer, F., & Fan, Y. (2025). The platform matters: cross-platform differences in data donation willingness, behavior, and bias. Communication Methods and Measures, 1–25. https://doi.org/10.1080/19312458.2025.2605946
- [Study 2] Wedel, L., Ohme, J. (2026). Longitudinal Data Donation Behavior and Data Omission across Four Social Media Platforms.
Computational Communications Research [accepted for publication]
The repository is structured into two directories, one for the data and code of each study. The directory “Study 1” contains the study data as a .csv file and a PDF with a variable overview. The directory “Study 2” contains four .csv files, the Jupyter notebook code for the analysis in the paper, and a PDF file giving a quick overview of each file. Here, the Jupyter notebook also serves as an explanation file.
Item Online Supplementary Material for “How Right-Wing Populist Comments Affect Online Deliberation on News Media Facebook Pages” (2022) Thiele, Daniel; Turnšek, Tjaša
Appendix A: Literature Review
Appendix B: Dictionaries for the Topic of Migration
Appendix C: Codebook
Appendix D: Automated Text Analysis
Appendix E: Summary Statistics for Step 1 and Step 2
Appendix F: Right-wing Populism in Comments by Media Type
Appendix G: Regression Tables for Step 1
References in Appendices A-G
Item Opportunities for extremism: a comparative study of German far-right social movement networks on Twitter/X, Telegram, and Gettr [Supplementary Material] (2025) Gong, Baoning
Supplementary Material for: Opportunities for extremism: a comparative study of German far-right social movement networks on Twitter/X, Telegram, and Gettr
Item Replication Data for: “Who reports witnessing and performing corrections on social media in the US, UK, Canada, and France?” (2024) Tang, Rongwei; Vraga, Emily; Bode, Leticia; Boulianne, Shelley
These are the replication materials for the article "Who reports witnessing and performing corrections on social media in the US, UK, Canada, and France?"
### Files
+ HKSMR data.tab
+ HKSMR syntax.sps
Item Replication Data for: Message deletion on Telegram: Affected data types and implications for computational analysis (Center for Open Science, 2022-11-01) Bühling, Kilian
Online supplement for: Buehling, K. (2023).
Message deletion on Telegram: Affected data types and implications for computational analysis. Communication Methods and Measures. https://www.doi.org/10.1080/19312458.2023.2183188. Please see the full paper for a description of data and methods.
Item Reproduction Material for: Whose ideas are worth spreading? The representation of women and ethnic groups in TED talks (Harvard Dataverse, 2019-07-12) Schwemmer, Carsten
This repository contains replication data for "Whose ideas are worth spreading? The representation of women and ethnic groups in TED talks". (2009-07-12)
#### Readme
The R code for all analyses is included in the file "ted_talks_analysis_replication.Rmd". An HTML version of the analysis including output is available in the file "ted_talks_analysis_replication.html". The remaining files are either figures, UTF-8 encoded and tab-delimited datasets (original format .tsv, dataverse format .tab), or R objects (ending with .RData). Please consider that data for TED talks is licensed under a Creative Commons License by TED. The Python code for collecting the data from TED and the image recognition service is not included in this repository. Unfortunately, the TED website has changed since the time of data collection, and the code therefore no longer works. In case you have any questions about the replication material, feel free to contact me at c.schwem2er@gmail.com
#### File list
- readme.txt
- ted_main_dataset.tsv
- ted_ready_for_analysis.RData
- ted_talks_analysis_replication.html
- ted_talks_analysis_replication.Rmd
- ted_talks_validation.tsv
- ted_yt-comments_sentiment.tsv
- ted_yt-comments.tsv
- figures/figure2.pdf
- figures/figure2.png
- figures/figure3.pdf
- figures/figure3.png
- figures/figure4.pdf
- figures/figure4.png
- figures/supporting_information_figure1.png
- figures/supporting_information_figure2.png
- figures/supporting_information_figure3.png
- figures/supporting_information_figure4.png
- figures/supporting_information_figure5.png
Item Subject Access Request response data - 105 iOS and 120 Android apps (2020-06-28) Kröger, Jacob Leon; Lindemann, Jens; Herrmann, Dominik
This data shows how 225 app vendors responded to subject access requests in a longitudinal privacy study between the years 2015 and 2019. Details can be found in the corresponding publication: Jacob Leon Kröger, Jens Lindemann, and Dominik Herrmann. 2020. How do App Vendors Respond to Subject Access Requests? A Longitudinal Privacy Study on iOS and Android Apps. In The 15th International Conference on Availability, Reliability and Security (ARES 2020), August 25–28, 2020, Virtual Event, Ireland. ACM, New York, NY, USA.
Item Supplemental material for “Beware: Processing of Personal Data—Informed Consent Through Risk Communication” (2024) Seiling, Lukas; Gsenger, Rita; Mulugeta, Filmona; Henningsen, Marte; Mischau, Lena; Schirmbeck, Marie
Appendix A: GDPR content analysis
Appendix B: Expert interview questions
Appendix C: Results of the systematic qualitative content analysis of expert interviews