Projects funded by the NCN



Large-Scale Text Analysis and Methodological Foundations of Computational Stylistics

2017/26/E/HS2/01019

Keywords:

computational stylistics, stylometry, quantitative linguistics, large corpora, Big Data

Descriptors:

  • HS2_6:
  • HS2_1:
  • HS2_2:

Panel:

HS2 - Culture and cultural production: literary theory and comparative literature, history of literature, linguistics, library science, cultural studies, arts, architecture

Host institution:

Instytut Języka Polskiego PAN

Małopolskie Voivodeship

Principal investigator (from the host institution):

dr hab. Maciej Jakub Eder 

Number of co-investigators in the project: 6

Call: SONATA BIS 7 - announced on 2017-06-14

Amount awarded: 1 258 200 PLN

Project start date (Y-m-d): 2018-05-08

Project end date (Y-m-d): 2024-11-07

Project duration: 78 months (the same as in the proposal)

Project status: Project settled

Project description

Note - project descriptions were prepared by the authors of the applications themselves and placed in the system in an unchanged form.

Information in the final report

  • Publications in academic press/journals (12)
  • Articles in post-conference publications (11)
  • Book publications / chapters in book publications (4)
Publications in academic press/journals:
  1. Stylistic fingerprints, POS tags and inflected languages: A case study in Polish
    Authors:
    Maciej Eder, Rafał Górski
    Academic press:
    Journal of Quantitative Linguistics (year: 2023, vol.: 30, pages: 86-103), Publisher: Taylor & Francis
    Status:
    Published
    DOI:
    10.1080/09296174.2022.2122751
  2. Stylistic change in early modern Spanish poetry through network analysis (with an especial focus on Fernando de Herrera's role)
    Authors:
    Laura Hernandez Lorenzo
    Academic press:
    Neophilologus (year: 2022, vol.: 106, pages: 397–417), Publisher: Springer
    Status:
    Published
    DOI:
    10.1007/s11061-021-09717-2
  3. Słowozbiory Tekstów Drugich
    Authors:
    Maciej Maryl, Maciej Eder
    Academic press:
    Teksty Drugie (year: 2023, vol.: 33, pages: 346-364), Publisher: IBL PAN
    Status:
    Published
    DOI:
    10.18318/td.2023.1.21
  4. The Voices of Doctor Who – How Stylometry Can be Useful in Revealing New Information About TV Series
    Authors:
    Joanna Byszuk
    Academic press:
    Digital Humanities Quarterly (year: 2020, vol.: 14, pages: 25934), Publisher: Association for Computers and the Humanities
    Status:
    Published
  5. Topic Modeling, Long Texts and the Best Number of Topics: Some Problems and Solutions
    Authors:
    Stefano Sbalchiero, Maciej Eder
    Academic press:
    Quality & Quantity (year: 2020, vol.: 54, pages: 1095–1108), Publisher: Springer
    Status:
    Published
    DOI:
    10.1007/s11135-020-00976-w
  6. Erinevused, kaugused ja sõrmejäljed: Stilomeetria ja mitmemõõtmelise tekstianalüüsi alused [Differences, distances and fingerprints: the fundamentals of stylometry and multivariate text analysis]
    Authors:
    Artjoms Šeļa
    Academic press:
    Keel ja Kirjandus (year: 2021, vol.: 8-9, pages: 696-718), Publisher: Estonian Academy of Sciences
    Status:
    Published
    DOI:
    10.54013/kk764a3
  7. Computational thematics: Comparing algorithms for clustering the genres of literary fiction
    Authors:
    Oleg Sobchuk, Artjoms Šeļa
    Academic press:
    Humanities and Social Sciences Communications (year: 2024, vol.: 11, pages: 438), Publisher: Nature
    Status:
    Published
    DOI:
    10.1057/s41599-024-02933-6
  8. Detecting Ottokar II's 1248–1249 uprising and its instigators in co-witnessing networks
    Authors:
    Jeremi Ochab, Jan Škvrňák, Michael Škvrňák
    Academic press:
    Historical Methods: A Journal of Quantitative and Interdisciplinary History (year: 2022, vol.: 55, pages: 189-208), Publisher: Taylor & Francis
    Status:
    Published
    DOI:
    10.1080/01615440.2022.2065397
  9. The fall of genres that did not happen: formalising history of the universal semantics of Russian iambic tetrameter
    Authors:
    Antonina Martynenko, Artjoms Šeļa
    Academic press:
    Studia Metrica et Poetica (year: 2023, vol.: 10, pages: 89-111), Publisher: University of Tartu Press
    Status:
    Published
  10. Challenging Stylometry: the authorship of the baroque play La Segunda Celestina
    Authors:
    Laura Hernandez Lorenzo, Joanna Byszuk
    Academic press:
    Digital Scholarship in the Humanities (year: 2023, vol.: 38, pages: 544–558), Publisher: Oxford University Press
    Status:
    Published
    DOI:
    10.1093/llc/fqac063
  11. La prosa de Gustavo Adolfo Bécquer en los límites de la poesía: Análisis estilométrico
    Authors:
    Laura Hernandez Lorenzo
    Academic press:
    apropos [Perspektiven auf die Romania] (year: 2022, vol.: 9, pages: 37-56), Publisher: Universität Hamburg
    Status:
    Published
    DOI:
    10.15460/apropos.9.1875
  12. Metronome: tracing variation in poetic meters via local sequence alignment
    Authors:
    Ben Nagy, Artjoms Šeļa, Mirella De Sisto, Petr Plecháč
    Status:
    Accepted for publication
Articles in post-conference publications:
  1. Feature Selection in Authorship Attribution: Ordering the Wordlist
    Authors:
    Joanna Byszuk, Maciej Eder
    Conference:
    Digital Humanities 2019: Book of Abstracts (year: 2019), Publisher: University of Utrecht
    Date:
    conference 12.07.2019
    Status:
    Published
  2. Improving the performance of word frequencies in authorship attribution
    Authors:
    Maciej Eder
    Conference:
    Proceedings of the 16th International Conference on Statistical Analysis of Textual Data (year: 2022), Publisher: VadiStat
    Date:
    conference 6-8.07.2022
    Status:
    Published
  3. Manhattan, Euclidean, and their siblings: exploring exotic similarity measures in text classification
    Authors:
    Maciej Eder, Jeremi Ochab
    Conference:
    Digital Humanities 2024: Book of Abstracts (year: 2024), Publisher: George Mason University Press
    Date:
    conference 6-9.08.2024
    Status:
    Published
  4. Weak Genres: Modeling Association Between Poetic Meter and Meaning in Russian Poetry
    Authors:
    Artjoms Šeļa, Boris Orekhov, Roman Leibov
    Conference:
    CHR 2020: Workshop on Computational Humanities Research (year: 2020), Publisher: CEUR-WS.org
    Date:
    conference 18–20.11.2020
    Status:
    Published
  5. Using Word Embeddings for Validation and Enhancement of Spatial Entity Lists
    Authors:
    Berenike Herrmann, Joanna Byszuk, Giulia Grisot
    Conference:
    Digital Humanities 2022 (year: 2022), Publisher: University of Tokyo
    Date:
    conference 25-29.07.2022
    Status:
    Published
  6. Stylometric investigations into translationese: The Baby-Sitters Club across languages
    Authors:
    Joanna Byszuk, Quinn Dombrowski
    Conference:
    Proceedings of the 16th International Conference on Statistical Analysis of Textual Data (year: 2022), Publisher: VadiStat
    Date:
    conference 6-8.07.2022
    Status:
    Published
  7. Identifying Similarities in Text Analysis: Hierarchical Clustering (Linkage) versus Network Clustering (Community Detection)
    Authors:
    Jeremi K. Ochab, Joanna Byszuk, Steffen Pielström, Maciej Eder
    Conference:
    Digital Humanities 2019: Book of Abstracts (year: 2019), Publisher: University of Utrecht
    Date:
    conference 11.07.2019
    Status:
    Published
  8. Boosting word frequencies in authorship attribution
    Authors:
    Maciej Eder
    Conference:
    CHR 2022: Computational Humanities Research Conference (year: 2022), Publisher: CEUR-WS.org
    Date:
    conference 12-14.12.2022
    Status:
    Published
  9. Detecting direct speech in multilingual collection of 19th-century novels
    Authors:
    Joanna Byszuk, Michał Woźniak, Mike Kestemont, Albert Leśniak, Wojciech Łukasik, Artjoms Šeļa, Maciej Eder
    Conference:
    Language Resources and Evaluation (LREC) (year: 2020), Publisher: European Language Resources Association (ELRA)
    Date:
    conference 11-16.05.2020
    Status:
    Published
  10. One word to rule them all: Understanding word embeddings for authorship attribution
    Authors:
    Maciej Eder, Artjoms Šeļa
    Conference:
    Digital Humanities 2022 (year: 2022), Publisher: University of Tokyo Press
    Date:
    conference 25–29.07.2022
    Status:
    Published
  11. Measuring Rhythm Regularity in Verse: Entropy of Inter-Stress Intervals
    Authors:
    Artjoms Šeļa, Mikhail Gronas
    Conference:
    CHR 2022: Computational Humanities Research Conference (year: 2022), Publisher: CEUR-WS.org
    Date:
    conference 12-14.12.2022
    Status:
    Published
Book publications / chapters in book publications:
  1. Tekst w humanistyce cyfrowej. Modelowanie tematyczne
    Authors:
    Maciej Eder
    Book:
    Od Gutenberga do Zuckerberga. Wstęp do humanistyki cyfrowej (year: 2023, vol.: n/a, pages: 129-141), Publisher: Universitas
    Status:
    Published
  2. On computers in text analysis
    Authors:
    Joanna Byszuk
    Book:
    The Bloomsbury Handbook to the Digital Humanities, ed. James O'Sullivan (year: 2023, vol.: n/a, pages: 159–168), Publisher: Bloomsbury Academic
    Status:
    Published
  3. Text v digitálních humanitních vědách: tematické modelování
    Authors:
    Maciej Eder
    Book:
    Od Gutenberga k Zuckerbergovi. Úvod do digitálních humanitních věd (year: 2024, vol.: 1, pages: 133-147), Publisher: Universitas
    Status:
    Published
  4. From stage to page: language independent bootstrap measures of distinctiveness in fictional speech
    Authors:
    Artjoms Šeļa, Ben Nagy, Joanna Byszuk, Laura Hernández-Lorenzo, Botond Szemes, Maciej Eder
    Book:
    Workshop on Computational Drama Analysis: Achievements and Opportunities (year: 2024, vol.: 1, pages: 149-166), Publisher: de Gruyter
    Status:
    Published