IRN “Cyclades” project

Cyclades in a nutshell

  • Cyclades = Corpora and Computational Linguistics for Digital Humanities
  • Countries involved: France (Lattice; the SciencesPo médialab; the French National Library, BnF); UK (Univ. of Cambridge; the Turing Institute; the British Library); Germany (Göttingen Centre for Digital Humanities); USA (Stanford Literary Lab)
  • The network involves 8 research labs and institutions from 4 different countries
  • Interdisciplinary Project


Research in the social sciences and humanities is often based on large textual corpora that would be unfeasible to read in full. Natural Language Processing (NLP) can identify the important concepts and actors mentioned in a corpus, as well as the relations between them. This information can give domain experts a useful overview of the corpus and help them locate the parts of it that are relevant to a given research question. However, existing tools are often not robust enough and must be adapted to the specific needs of the digital humanities community, whether in the social sciences or in literature.

Project goals

  • Collaboration between the project participants will make it possible to:
    • Develop new areas of research
    • Develop new techniques for text analysis
    • Adapt tools to new problems
    • Evaluate the proposed solutions in original ways, both qualitatively and quantitatively.


Partners

  • France
    • Lattice
    • SciencesPo médialab
    • The French National Library (BnF)
  • UK
    • Univ. of Cambridge (Cambridge DH Network, LTL)
    • The Turing Institute
    • The British Library
  • Germany
    • Göttingen Centre for Digital Humanities
  • USA
    • Stanford Literary Lab


Work packages

  • Technological work packages
    • WP1. Named entity recognition and entity linking
    • WP2. Extraction of specific patterns from text
    • WP3. Advanced textual content representation and visualization
  • Application work packages
    • WP4. Application to literature
    • WP5. Application to the social sciences