📊 Project

Computational Literary Studies Infrastructure (CLS INFRA)

Institute of Polish Language at the Polish Academy of Sciences

Institution: Institute of Polish Language at the Polish Academy of Sciences
Category: Project
Website: https://clsinfra.io/

Short Description

The CLS INFRA project develops a shared infrastructure for computational literary studies that standardizes and interlinks heterogeneous data, tools, and methods. It targets researchers in literary studies, Digital Humanities, and related disciplines, particularly at universities. Its main benefits are improved access to multilingual literary corpora, better reproducibility, and easier collaboration across national and disciplinary boundaries.

General Description

-


Thematic Classification

Subject Areas

  • Humanities
  • Computer Science
  • Digital Humanities
  • Literary Studies
  • Computational Literary Studies
  • Linguistics
  • Cultural Studies
  • Data Management
  • Open Science
  • Machine Learning
  • Natural Sciences (indirectly via NLP and data analysis)

Research Fields

  • Computational Literary Studies
  • Natural Language Processing (NLP)
  • Stylometry / Computational Stylistics
  • Multilingual Text Analysis
  • Digital Humanities
  • Text Mining
  • Named Entity Recognition (NER)
  • Relational Extraction (REX)
  • Sentiment Analysis (SA)
  • Aspect-Based Sentiment Analysis (ABSA)
  • Programmable Corpora
  • Linked Open Data
  • Data Curation and Sharing
  • Metadata Standards for Literary Corpora
  • Literary Network Analysis
  • Genre Analysis
  • Authorship Attribution
  • Literary History
  • Gender Analysis
  • Canonicity Studies
  • Digital Editions
  • Text Encoding Initiative (TEI)
  • Corpus Enrichment
  • Machine Learning in Humanities
  • Computational Semantics
  • Cross-Lingual Transfer Learning
  • Historical Social Network Analysis
  • Digital Cultural Heritage
  • Open Science
  • Research Data Management
  • Transnational Access to Research Infrastructures
  • Training and Skills Development in Digital Humanities

Specializations

  • Computational Literary Studies (CLS)
  • Development of shared data, tool, and knowledge resources
  • Standardization and interoperability of literary data
  • Multilingual and transnational literary research
  • Programmable Corpora
  • Natural Language Processing (NLP) for literary texts
  • Stylistics and computer-assisted style analysis
  • Data annotation and enrichment
  • Open Science and data-sharing practices
  • Transnational Access (TNA)
  • Training and skill development for researchers
  • Research infrastructure for digital humanities
  • Linking research, libraries, and GLAM sectors (Galleries, Libraries, Archives, Museums)
  • Application of CLS beyond academic research (journalism, politics, medicine, culture)
  • Development of APIs and toolkits for research
  • Research on canon formation and literary quality
  • Historical network analysis and social networks in literature
  • Digital editions and text processing
  • Use of AI and generative AI (e.g. Retrieval-Augmented Generation) in literary studies

Keywords

  • CLS INFRA
  • Computational Literary Studies
  • Programmable Corpora
  • DraCor
  • NLP Toolchains
  • Multilingual Literary Data
  • TEI Standard
  • Transnational Access
  • Training Schools
  • Open Science

Funding

Funding Provider: -
Funding Program: Horizon 2020
Funding Reference: 101004984
Funding Period: 2022-2025
Project Volume: EUR 1.5 million


Team & Partners

Project Leadership

  • Prof. Maciej Eder (Institute of Polish Language, Polish Academy of Sciences)

Involved Persons

  • Dr. Julie M. Birkholz (Assistant Professor Digital Humanities, Lead of KBR’s Digital Research Lab)
  • Ingo Börner (Research Associate, University of Potsdam)
  • Ruth Bruchertseifer (Researcher)
  • Floor Buschenhenke (Researcher)
  • Joanna Byszuk (Research Associate, Computational Stylistics Group)
  • Sally Chambers (Digital Humanities Research Coordinator, Ghent Centre for Digital Humanities)
  • Mag. Phil. Vera Maria Charvat (Researcher)
  • Mgr. Silvie Cinková Ph.D. (Researcher, Charles University)
  • Tess Dejaeghere (Researcher)
  • Anna Dijkstra (Work Package 4 Coordinator)
  • Julia Dudar (Researcher)
  • DI Matej Ďurčo (Researcher)
  • Evgeniia Fileva (Researcher, University of Trier)
  • Vicky Garnett (Training and Education Officer, DARIAH-EU)
  • Françoise Gouzi (Open Science Officer, DARIAH-EU)
  • Dr. Sarah Hoover (Postdoctoral Researcher, NUI Galway)
  • Bartłomiej Kunda (Coordinating Manager, Institute of Polish Language)
  • Prof. Dr. Els Lefever (Associate Professor, Ghent University)
  • PD Dr. Michał Mrugalski (Researcher)
  • Dr. Ciara L. Murphy (Postdoctoral Researcher, NUI Galway)
  • Dr. Carolin Odebrecht (Researcher)
  • Eliza Papaki (Researcher)
  • Marco Raciti (Researcher)
  • Dr. Emily Ridge (Lecturer, National University of Ireland Galway)
  • Ass. Prof. Salvador Ros

Affiliated Institutions

-

External Partners

  • Austrian Academy of Sciences
  • Charles University
  • Digital Research Infrastructure for the Arts and Humanities
  • Ghent Centre for Digital Humanities, Ghent University
  • Belgrade Centre for Digital Humanities
  • Huygens Institute for the History of the Netherlands (Royal Netherlands Academy of Arts and Sciences)
  • Trier Center for Digital Humanities, Trier University
  • Moore Institute, National University of Ireland Galway
  • The Trinity Centre for Digital Humanities, Trinity College Dublin
  • National University of Distance Education
  • École Normale Supérieure de Lyon
  • Humboldt University of Berlin
  • Institute of Polish Language at the Polish Academy of Sciences
  • University of Potsdam

Project Contents

Goals

  • Establishment of a shared, sustainable infrastructure for computational literary studies in Europe
  • Standardization and unification of data, tools, and methods in literary studies
  • Improvement of access to and reusability of multilingual literary data
  • Promotion of collaboration between well-equipped and under-equipped research institutions
  • Expansion of the application of computational methods beyond academic research (e.g. in journalism, politics, medicine)

Work Packages

  • WP1: Project Management and Coordination
  • WP2: Communication, Dissemination, and Exploitation
  • WP3: Methodological Considerations and Community Building
  • WP4: Training and Skills Development
  • WP5: Data Landscape Review and Institutional Perspectives
  • WP6: Data Inventory and Toolkit Development
  • WP7: Building the Ecosystem of Programmable Corpora
  • WP8: NLP Toolchains and Corpus Enrichment
  • WP9: Transnational Access (TNA) Programme

Methods

  • Stylometry (Multilingual Stylometry Showcase)
  • Network analysis (Detecting Small Worlds in a Corpus of Thousands of Theater Plays)
  • Aspect-based Sentiment Analysis (ABSA) (D8.5 Report on Applied NLP Sentiment Analysis)
  • Named Entity Recognition (NER) (D8.3 Report on Applied NLP Named Entity Recognition)
  • Relational Extraction (REX) (D8.4 Report on NLP Relational Extraction)
  • Text mining
  • Natural Language Processing (NLP)
  • Programmable Corpora (D7.1: On programmable Corpora and DraCor)
  • TEI (Text Encoding Initiative) and TEI standardization
  • Transformation Toolbox (VELD: Versioned Executable Logic and Data)
  • Data integration and interoperability
  • Metadata analysis and standardization
  • Corpus enrichment
  • Data planning and design
  • Data provision and processing
  • Data archiving and publication
  • Data reuse
  • Open Science and Open Access
  • Research data life cycle
  • Qualitative and quantitative data analysis
  • Quantitative approaches to stylistic variation
  • Distant reading
  • Computational stylistics
  • Historical social network analysis
  • Multimodal stylometry
  • Machine learning

Expected Outcomes

  • Establishment of a shared, sustainable infrastructure for computational literary studies (CLS) in Europe
  • Standardization and unification of heterogeneous literary data, methods, and tools
  • Improvement of accessibility and reusability of literary data through uniform metadata and formats
  • Creation of a central catalog (CLSCor) for discovery and integration of literary corpora and tools
  • Development of Programmable Corpora with open APIs for machine-readable texts
  • Extension of the multilingual NLP toolchain for literary research, particularly for low-resource languages
  • Provision of training materials and training schools to strengthen competencies in CLS
  • Promotion of Transnational Access (TNA) for researchers from various countries and institutions
  • Creation of a comprehensive toolkit for data-driven research and data exchange in the research process
  • Increased reproducibility and traceability of research results through versioning of corpora and APIs
  • Strengthening of collaboration between research institutions, libraries, GLAM sectors, and other stakeholders
  • Expansion of CLS applications beyond academia

Contact

Contact Persons:

  • Dr. Julie M. Birkholz
  • Ingo Börner
  • Ruth Bruchertseifer
  • Floor Buschenhenke
  • Joanna Byszuk
  • Sally Chambers
  • Mag. Phil. Vera Maria Charvat
  • Mgr. Silvie Cinková Ph.D.
  • Tess Dejaeghere
  • Anna Dijkstra
  • Julia Dudar
  • DI Matej Ďurčo
  • Prof. Maciej Eder
  • Dr. Jennifer Edmond
  • Evgeniia Fileva
  • Vicky Garnett
  • Françoise Gouzi
  • Dr. Sarah Hoover
  • Bartłomiej Kunda
  • Prof. Dr. Els Lefever
  • PD Dr. Michał Mrugalski
  • Dr. Ciara L. Murphy
  • Dr. Carolin Odebrecht
  • Eliza Papaki
  • Marco Raciti
  • Dr. Emily Ridge
  • Ass. Prof. Salvador Ros
  • Prof. Dr. Christof Schöch
  • Dr. Artjoms Šeļa
  • Dr. Justin Tonra
  • Dr. Erzsébet Tóth-Czifra
  • Prof. Dr. Peer Trilcke
  • Prof. Dr. Karina van Dalen-Oskam
  • Lisanne M. van Rossum rMA
  • Vera Yakupova
  • Dr. Joris van Zundert

Email: info@clsinfra.io
Project Website: https://clsinfra.io/


Recorded: 2026-01-14
Source: https://clsinfra.io/
