Research Data

Permanent URI for this collection: https://www.weizenbaum-library.de/handle/id/977

Search Results

Now showing 1 - 10 of 51
  • Multi-Platform Social Media Data Donation Behavior Dataset
    (2025) Wedel, Lion; Mayer, Anna-Theresa; Fan, Yangliu; Gaisbauer, Felix; Ohme, Jakob
    This repository contains the dataset used for all analyses presented in the papers:
    - [Study 1] Wedel, L., Ohme, J., Mayer, A. T., Gaisbauer, F., & Fan, Y. (2025). The platform matters: cross-platform differences in data donation willingness, behavior, and bias. Communication Methods and Measures, 1–25. https://doi.org/10.1080/19312458.2025.2605946
    - [Study 2] Wedel, L., & Ohme, J. (2026). Longitudinal Data Donation Behavior and Data Omission across Four Social Media Platforms. Computational Communication Research [accepted for publication]
    The repository is structured into two directories, one per study, each holding that study's data and code. The directory “Study 1” contains the study data as a .csv file and a PDF with a variable overview. The directory “Study 2” contains four .csv files, the Jupyter notebook with the analysis code for the paper, and a PDF file giving a quick overview of each file. The Jupyter notebook also serves as an explanation file.
  • Instagram homophily datasets and codes
    (2023) Pignolet, Yvonne-Anne; Schmid, Stefan; Seelisch, Arne
    This file contains the datasets and Python code related to the paper "Gender-Specific Homophily on Instagram and Implications on Information Spread".
  • How Influencers and Multipliers Drive Polarization and Issue Alignment on Twitter/X - Data (Version v1) [Data set]
    (2025) Pournaki, Armin; Gaisbauer, Felix; Olbrich, Eckehard
    We provide anonymized retweet networks extracted from trending topics in Germany collected between 2021 and 2023. More specifically, we collected tweets from 2021-03-29 to 2023-07-12 according to the following scheme: at the beginning of each day, we launched a script that collects the current "trending topics" (from now on referred to as "trends") in Germany using the Twitter Trend API (v1). By default, trends are personalized based on the account's Twitter/X usage. One can, however, disable the personalization by setting a specific location from which to draw the trending topics, which then yields "popular topics among people in a specific geographic location" (X/Twitter 2025). We re-ran the script every 15 minutes. At the end of each day, we counted the number of times each trending topic appeared during the day and kept the 5 most frequent ones. This gave us a proxy for the five most important trending topics of that day. We then used the Twitter Search API (v1) to collect German-language tweets, using the exact trend keyword as a query on the day it trended and the day after (48 hours). All tweets were collected using a single Twitter API key, collecting tweets for at most 24 hours every day. For each trend, we extracted a retweet network in which nodes are Twitter users and a directed link is drawn between two users if one retweets the other. We provide one retweet network per trend as a csv file with anonymized user_ids, containing the columns source,target,weight. The filename contains the date and the keyword that was searched: T__.csv. All the individual files are contained in rtn.zip. Additionally, we computed a topic model on the full text of the tweets, which allowed us to classify each trend into one larger metatopic (such as Covid, Climate Change, Sports, ...). This topic assignment is contained in trend2topic.csv. For more information on the topic model, please refer to the paper https://doi.org/10.1609/icwsm.v19i1.35890.
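The daily trend selection and the edge-list format described above could be handled roughly as in the following sketch. It is not the authors' script: the function names, the sample trends, the sample user IDs, and the assumption that weights are integers are all made up for illustration. Only the column names source,target,weight come from the description.

```python
import csv
import io
from collections import Counter, defaultdict


def top_trends(snapshots, k=5):
    """Count how often each trend appears across one day's polls; keep the k most frequent."""
    counts = Counter(trend for snapshot in snapshots for trend in snapshot)
    return [trend for trend, _ in counts.most_common(k)]


def load_retweet_network(handle):
    """Read an edge list with the columns source,target,weight into a nested dict."""
    network = defaultdict(dict)
    for row in csv.DictReader(handle):
        # Assumption: weights are integer counts.
        network[row["source"]][row["target"]] = int(row["weight"])
    return dict(network)


# Made-up snapshots standing in for the 15-minute trend polls of one day.
snapshots = [["A", "B", "C"], ["A", "B"], ["A", "D"]]
daily_top = top_trends(snapshots, k=2)

# Made-up edge list with anonymized user IDs, read from memory instead of a file.
sample_csv = io.StringIO("source,target,weight\nu1,u2,3\nu1,u3,1\nu2,u3,2\n")
network = load_retweet_network(sample_csv)

# Sum of outgoing edge weights per source user.
weighted_out_degree = {user: sum(targets.values()) for user, targets in network.items()}
```

To read one of the published files instead, an opened file from rtn.zip can be passed to `load_retweet_network` in place of the in-memory sample.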
  • Digital Turn Without Digital Methods? Mapping the Journey of Journalism Studies [Dataset]
    (2025) Fan, Yangliu; Ohme, Jakob; Neuberger, Christoph
    Recent years have seen a growing diversity in journalism studies, primarily ascribed to digital transformation in the contemporary context. Analyzing 6,770 publications from the five major journalism journals (Journalism, Journalism & Mass Communication Quarterly, Journalism Practice, Journalism Studies, and Digital Journalism) between 1995 and 2022, we find new evidence that the digital turn is highly visible in journalism studies. Using document co-citation analysis, we first identify distinct and coherent, yet loosely integrated, research clusters that focus on different journalistic topics, i.e., specialties. Second, we find that digital journalism has not only been integrated into the research agendas within the field but has also formed stand-alone and distinct research clusters. We further show that the field structure has developed over the years in response to digital transformation. Yet digital and computational methods remain a stark minority compared with more traditional methods. Our results suggest that journalism studies could benefit from novel inter-cluster communications and methodological innovations.
  • Subject Access Request response data - 105 iOS and 120 Android apps
    (2020-06-28) Kröger, Jacob Leon; Lindemann, Jens; Herrmann, Dominik
    This data shows how 225 app vendors responded to subject access requests in a longitudinal privacy study between the years 2015 and 2019. Details can be found in the corresponding publication: Jacob Leon Kröger, Jens Lindemann, and Dominik Herrmann. 2020. How do App Vendors Respond to Subject Access Requests? A Longitudinal Privacy Study on iOS and Android Apps. In The 15th International Conference on Availability, Reliability and Security (ARES 2020), August 25–28, 2020, Virtual Event, Ireland. ACM, New York, NY, USA
  • Opportunities for extremism: a comparative study of German far-right social movement networks on Twitter/X, Telegram, and Gettr [Supplementary Material]
    (2025) Gong, Baoning
    Supplementary Material for: Opportunities for extremism: a comparative study of German far-right social movement networks on Twitter/X, Telegram, and Gettr
  • TikTok Content Scraper
    (GitHub, 2025-02-12) Bukold, Quentin
    This scraper lets you download both TikTok videos and slides without an official API key. In addition, about 100 metadata fields covering the video, author, music, video file, and hashtags can be scraped. The scraper is built as a Python class that a custom class can inherit from, which makes it easy to connect the scraper to a database, for example. Version 2.0 is a complete architectural redesign of the first version, aimed at reducing code complexity, improving usability, and introducing new features, including SQLite database integration for progress tracking. This project was originally developed at: https://github.com/Q-Bukold/TikTok-Content-Scraper
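The inheritance pattern mentioned above might look like the following sketch. It does not use the project's actual API: `BaseScraper`, `scrape`, `on_result`, and the table layout are hypothetical stand-ins, and the scraping step is replaced by placeholder data so the example runs offline.

```python
import sqlite3


class BaseScraper:
    """Hypothetical stand-in for a scraper class; not the project's real API."""

    def scrape(self, video_id):
        # A real scraper would fetch content here; this returns placeholder metadata.
        metadata = {"video_id": video_id, "author": "placeholder"}
        self.on_result(metadata)
        return metadata

    def on_result(self, metadata):
        """Hook that subclasses override to handle each scraped record."""


class SQLiteScraper(BaseScraper):
    """Subclass that stores every scraped record in an SQLite table."""

    def __init__(self, connection):
        self.connection = connection
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS videos (video_id TEXT PRIMARY KEY, author TEXT)"
        )

    def on_result(self, metadata):
        # Named placeholders are filled from the metadata dict.
        self.connection.execute(
            "INSERT OR REPLACE INTO videos VALUES (:video_id, :author)", metadata
        )
        self.connection.commit()


# In-memory database so the sketch leaves no file behind.
connection = sqlite3.connect(":memory:")
scraper = SQLiteScraper(connection)
scraper.scrape("12345")
rows = connection.execute("SELECT video_id, author FROM videos").fetchall()
```

Keeping the storage logic in an overridden hook leaves the scraping code untouched, which is the usual reason for exposing a scraper as a class that can be inherited from.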
  • Comparison of the Draft and Adopted Delegated Act on Data Access. (DSA 40 Data Access Collaboratory)
    (2025-07-10) Seiling, LK
    First release of the materials for the comparison between the draft delegated act on data access (DDA) and the adopted delegated act (DA). Includes:
    - source documents (draft delegated act on data access + appendix [DDA] and the adopted delegated act on data access + appendix [DA])
    - .csv table with aligned text passages from the source documents and their positions (excluding appendix) (changes_DDA_DA.csv)
    - Python script to generate the HTML for visual comparison (compare_DDA_DA.html)
    - the resulting HTML to be rendered by a browser (compare_DDA_DA.html)
    - the Python requirements (requirements.txt)
    - the README.md (README.md)
    Comparison is available at: https://dsa40collaboratory.eu/dda-da-comparison/
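A table of aligned passages can be turned into an HTML comparison along the lines of the sketch below. It is not the repository's script: the column names `draft_text` and `adopted_text`, the sample passage, and the word-level diff via the standard library's `difflib` are assumptions chosen for illustration.

```python
import csv
import difflib
import html
import io


def render_comparison(handle):
    """Build an HTML table that marks the words changed between aligned passages."""
    rows = []
    for record in csv.DictReader(handle):
        # Hypothetical column names; the real table may use different ones.
        old_words = record["draft_text"].split()
        new_words = record["adopted_text"].split()
        parts = []
        matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                parts.append(html.escape(" ".join(old_words[i1:i2])))
            if tag in ("delete", "replace"):
                parts.append("<del>" + html.escape(" ".join(old_words[i1:i2])) + "</del>")
            if tag in ("insert", "replace"):
                parts.append("<ins>" + html.escape(" ".join(new_words[j1:j2])) + "</ins>")
        rows.append("<tr><td>" + " ".join(parts) + "</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


# Made-up aligned passage, read from memory instead of a file.
sample = io.StringIO(
    "draft_text,adopted_text\n"
    "access within 30 days,access within 15 days\n"
)
page = render_comparison(sample)
```

Deleted words end up in `<del>` and added words in `<ins>`, so a browser shows them struck through and underlined without extra styling.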
  • Reproduction Material for: Whose ideas are worth spreading? The representation of women and ethnic groups in TED talks
    (Harvard Dataverse, 2019-07-12) Schwemmer, Carsten
    This repository contains replication data for "Whose ideas are worth spreading? The representation of women and ethnic groups in TED talks". (2009-07-12)
    #### Readme
    The R code for all analyses is included in the file "ted_talks_analysis_replication.Rmd". An HTML version of the analysis, including output, is available in the file "ted_talks_analysis_replication.html". The remaining files are either figures, UTF-8 encoded and tab-delimited datasets (original format .tsv, dataverse format .tab), or R objects (ending with .RData). Please note that the data for TED talks is licensed under a Creative Commons License by TED. The Python code for collecting the data from TED and the image recognition service is not included in this repository. Unfortunately, the TED website has changed since the time of data collection, so the code no longer works. If you have any questions about the replication material, feel free to contact me at c.schwem2er@gmail.com
    #### File list
    - readme.txt
    - ted_main_dataset.tsv
    - ted_ready_for_analysis.RData
    - ted_talks_analysis_replication.html
    - ted_talks_analysis_replication.Rmd
    - ted_talks_validation.tsv
    - ted_yt-comments_sentiment.tsv
    - ted_yt-comments.tsv
    - figures/figure2.pdf
    - figures/figure2.png
    - figures/figure3.pdf
    - figures/figure3.png
    - figures/figure4.pdf
    - figures/figure4.png
    - figures/supporting_information_figure1.png
    - figures/supporting_information_figure2.png
    - figures/supporting_information_figure3.png
    - figures/supporting_information_figure4.png
    - figures/supporting_information_figure5.png
  • Supplemental Material for “Notable enough? The questioning of women’s biographies on Wikipedia”
    (2024) Martini, Franziska
    - Table I. Sample size
    - Table II. Reliability coefficients
    - Table III. Total and relative numbers of biographies nominated for deletion by specific criteria for notability
    - Figure I. Percentage of users and their level of justification by their position and the biography’s gender
    - Figure II. Numbers of German-language Wikipedia biographies by tagged gender category and year of creation
    - Figure III. Number of deletion nominations (N = 396) per logged-in users (N = 158)
    - Figure V. Number of decisions (N = 362) per administrator (N = 35)
    - Figure IV. Number of discussions (N = 461) logged-in users (N = 608) participated in