📊 Project

Computational Literary Studies Infrastructure (CLS INFRA)

Institute of Polish Language at the Polish Academy of Sciences

Institution: Institute of Polish Language at the Polish Academy of Sciences
Category: Project
Website: https://clsinfra.io/

Short Description

The project aims to standardize and integrate literary data collections for computational literary studies. It targets humanities researchers and universities, particularly those working with multilingual and heterogeneous text corpora. Its main benefit is improved accessibility, reusability, and reproducibility of research data through a shared infrastructure and data model. Universities also benefit from simpler data management and stronger networking between research groups.

General Description

-


Thematic Classification

Subject Areas

  • Humanities
  • Computer Science
  • Digital Humanities
  • Literary Studies
  • Computational Literary Studies
  • Linguistics
  • Cognitive Science
  • Philosophy
  • Art History
  • Medical Humanities

Research Fields

  • Computational Literary Studies
  • Natural Language Processing (NLP)
  • Stylometry / Computational Stylistics
  • Multilingual Text Analysis
  • Digital Humanities
  • Text Mining
  • Named Entity Recognition (NER)
  • Relational Extraction (REX)
  • Sentiment Analysis (SA)
  • Aspect-Based Sentiment Analysis (ABSA)
  • Programmable Corpora
  • Linked Open Data
  • Data Curation and Sharing
  • Metadata Standards for Literary Corpora
  • Literary Network Analysis
  • Genre Analysis
  • Authorship Attribution
  • Literary History
  • Gender Analysis
  • Canonicity Studies
  • Digital Editions
  • Text Encoding Initiative (TEI)
  • Corpus Enrichment
  • Machine Learning in Humanities
  • Computational Semantics
  • Cross-Lingual Transfer Learning
  • Readability Studies
  • Historical Social Network Analysis
  • Literary Data Infrastructure
  • Reproducible Research in Humanities
  • Open Science in Humanities
  • Transnational Access to Research Infrastructures
  • Training and Skills Development in Digital Humanities

Specializations

  • Computational Literary Studies (CLS)
  • Digital Humanities (DH)
  • Multilingual literary data infrastructure
  • Data curation and standardization
  • Natural Language Processing (NLP) for literary texts
  • Programmable corpora development
  • Linked Open Data (LOD) integration
  • Transnational Access (TNA) for research infrastructure
  • Training and skills development in CLS
  • Methodological best practices documentation
  • Open Science and data sharing policies
  • Interoperability of literary corpora and tools
  • Corpus enrichment and annotation
  • Reproducibility in digital literary research
  • Cross-lingual computational stylistics
  • Literary network analysis
  • Sentiment and aspect-based sentiment analysis (ABSA)
  • Named Entity Recognition (NER) and Relational Extraction (REX)
  • Text mining and distant reading
  • Digital editions and scholarly editing
  • Research data life cycle management
  • Metadata standards for literary corpora
  • Integration of AI and generative models in humanities research
  • Application of CLS methods beyond academia (journalism, policy, GLAM, medical humanities)

Keywords

  • CLS INFRA
  • Computational Literary Studies
  • Programmable Corpora
  • DraCor
  • NLP toolchains
  • Multilingual literary data
  • TEI standard
  • Transnational Access
  • Training Schools
  • Open Science

Funding

Funding Provider: -
Funding Program: Horizon 2020
Funding Reference: 101004984
Funding Period: 2022-2025
Project Volume: EUR 1.5 million


Team & Partners

Project Leadership

  • Prof. Maciej Eder (Institute of Polish Language, Polish Academy of Sciences)

Involved Persons

  • Dr. Julie M. Birkholz (Assistant Professor Digital Humanities, Lead of KBR’s Digital Research Lab)
  • Ingo Börner (Research Associate, University of Potsdam)
  • Ruth Bruchertseifer (Researcher)
  • Floor Buschenhenke (Researcher)
  • Joanna Byszuk (Research Associate, Computational Stylistics Group)
  • Sally Chambers (Digital Humanities Research Coordinator, Ghent Centre for Digital Humanities)
  • Mag. Phil. Vera Maria Charvat (Researcher)
  • Mgr. Silvie Cinková Ph.D. (Researcher, Charles University)
  • Tess Dejaeghere (Researcher)
  • Anna Dijkstra (Work Package 4 Coordinator)
  • Julia Dudar (Researcher)
  • DI Matej Ďurčo (Researcher)
  • Evgeniia Fileva (Researcher, University of Trier)
  • Vicky Garnett (Training and Education Officer, DARIAH-EU)
  • Françoise Gouzi (Open Science Officer, DARIAH-EU)
  • Dr. Sarah Hoover (Postdoctoral Researcher, NUI Galway)
  • Bartłomiej Kunda (Coordinating Manager, Institute of Polish Language)
  • Prof. Dr. Els Lefever (Associate Professor, Ghent University)
  • PD Dr. Michał Mrugalski (Researcher)
  • Dr. Ciara L. Murphy (Postdoctoral Researcher, NUI Galway)
  • Dr. Carolin Odebrecht (Researcher)
  • Eliza Papaki (Researcher)
  • Marco Raciti (Researcher)
  • Dr.

Affiliated Institutions

-

External Partners

  • Austrian Academy of Sciences
  • Charles University
  • Digital Research Infrastructure for the Arts and Humanities
  • Ghent Centre for Digital Humanities, Ghent University
  • Belgrade Centre for Digital Humanities
  • Huygens Institute for the History of the Netherlands (Royal Netherlands Academy of Arts and Sciences)
  • Trier Center for Digital Humanities, Trier University
  • Moore Institute, National University of Ireland Galway
  • The Trinity Centre for Digital Humanities, Trinity College Dublin
  • Institute of Polish Language at the Polish Academy of Sciences
  • University of Potsdam
  • National University of Distance Education
  • École Normale Supérieure de Lyon
  • Humboldt University of Berlin

Project Contents

Goals

  • Establishment of a shared, sustainable infrastructure for computational literary studies in Europe
  • Standardization and unification of data, tools, and methods in literary studies
  • Improvement of access to and reusability of multilingual literary data
  • Promotion of collaboration between well-resourced and less well-resourced research institutions
  • Extension of computational literary analysis to applications beyond academia (e.g., in journalism, policy, and medicine)

Work Packages

  • WP1: Project Management and Coordination
  • WP2: Communication, Dissemination, and Exploitation
  • WP3: Methodological Considerations and Community Building
  • WP4: Training and Skills Development
  • WP5: Data Landscape Review and Institutional Perspectives
  • WP6: Data Inventory and Toolkit Development
  • WP7: Building the Ecosystem of Programmable Corpora
  • WP8: NLP Toolchains and Corpus Enrichment
  • WP9: Transnational Access (TNA) Programme

Methods

  • Stylometry (Multilingual Stylometry Showcase)
  • Network analysis (Detecting Small Worlds in a Corpus of Thousands of Theater Plays)
  • Aspect-based Sentiment Analysis (ABSA)
  • Named Entity Recognition (NER)
  • Relational Extraction (REX)
  • Text mining
  • Natural Language Processing (NLP)
  • Programmable Corpora
  • TEI Standard (Text Encoding Initiative)
  • Versioning of living and programmable corpora
  • Data annotation
  • Corpus management
  • Corpus enrichment
  • Multilingual workflow development
  • Open Science practices
  • Research data management
  • FAIR data (Findable, Accessible, Interoperable, Reusable)
  • Linked Open Data (LOD)
  • Retrieval-Augmented Generation (RAG)
  • Generative AI approaches
  • Corpus formation and consolidation
  • Data standardization
  • Transformation toolbox (VELD mechanism)
  • Open-Source toolchains
  • Jupyter Notebooks for documentation and reuse of workflows
  • Corpus enrichment and NLP toolchains
  • Scansion analysis and visualization (Poetrylab + rantanplan)
  • Metric development for computer-aided drama analysis

Expected Outcomes

  • Establishment of a shared, sustainable infrastructure for computational literary studies in Europe
  • Standardization and unification of literary data, methods, and tools
  • Improved access and reusability of literary data through uniform standards and interoperability
  • Creation of a central catalog (CLSCor) for discoverability of literary corpora and tools
  • Development of a transformation tool (VELD) for harmonizing and converting data formats
  • Creation of Programmable Corpora with open APIs for machine-readable texts
  • Promotion of research reproducibility through versioning of corpora and APIs
  • Expansion of research competencies through training schools and educational offerings for researchers with diverse backgrounds
  • Development of tools and workflows for annotation, NLP processing, and data analysis in multilingual contexts
  • Creation of a comprehensive toolkit for data sharing and management throughout the research data lifecycle
  • Strengthening collaboration between research institutions and enabling transnational access to key resources (TNA Fellowships)
  • Increased

Contact

Contact Persons:

  • Dr. Julie M. Birkholz
  • Ingo Börner
  • Ruth Bruchertseifer
  • Floor Buschenhenke
  • Joanna Byszuk
  • Sally Chambers
  • Mag. Phil. Vera Maria Charvat
  • Mgr. Silvie Cinková Ph.D.
  • Tess Dejaeghere
  • Anna Dijkstra
  • Julia Dudar
  • DI Matej Ďurčo
  • Prof. Maciej Eder
  • Dr Jennifer Edmond
  • Evgeniia Fileva
  • Vicky Garnett
  • Françoise Gouzi
  • Dr Sarah Hoover
  • Dr Michal Křen
  • Bartłomiej Kunda
  • Prof. Dr. Els Lefever
  • PD Dr. Michał Mrugalski
  • Dr Ciara L. Murphy
  • Dr. Carolin Odebrecht
  • Eliza Papaki
  • Marco Raciti
  • Dr Emily Ridge
  • Ass. Prof. Salvador Ros
  • Prof. Dr. Christof Schöch
  • Dr Artjoms Šeļa
  • Dr Justin Tonra
  • Dr. Erzsébet Tóth-Czifra
  • Prof Dr Peer Trilcke
  • Prof. Dr Karina van Dalen-Oskam
  • Lisanne M. van Rossum rMA
  • Vera Yakupova
  • Dr Joris van Zundert

Email: info@clsinfra.io
Project Website: https://clsinfra.io/


Recorded: 2026-01-14
Source: https://clsinfra.io/
