Laboratoire Lattice - UMR 8094
ENS-CNRS
1 rue Maurice Arnoux, 92120 Montrouge

Pablo Ruiz-Fabo
(former member)

PhD student
Former member, ENS

  • Thesis Title: Concept-based and Relation-based Corpus Navigation: Applications of Natural Language Processing in Digital Humanities
  • Supervisor: Thierry Poibeau
  • Prototypes: apps code
  • Funding: Doctoral grant (Allocation de thèse), Région Île-de-France, 2014-2017
  • Defended on: June 23, 2017

    Publications (Peer-reviewed)


    Conference Proceedings


  • Ruiz Fabo, Pablo, Clara Martínez Cantón, Thierry Poibeau and Elena González-Blanco. (2017). Enjambment detection in a large diachronic corpus of Spanish sonnets. In LaTeCH-CLFL 2017, Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature. Vancouver, Canada.
  • Ruiz, Pablo, Clément Plancq, and Thierry Poibeau. (2016). More than Word Cooccurrence: Exploring Support and Opposition in International Climate Negotiations with Semantic Parsing. In Proceedings of LREC, Tenth International Conference on Language Resources and Evaluation, pp. 1902-1907. Portorož, Slovenia. [UI Link]
  • Ruiz, Pablo, and Thierry Poibeau (2015). Combining Open Source Annotators for Entity Linking through Weighted Voting. In Proceedings of *SEM 2015. Fourth Joint Conference on Lexical and Computational Semantics. Denver, U.S. [Data]
  • Ruiz, Pablo, Thierry Poibeau, and Frédérique Mélanie (2015). ELCO3: Entity Linking with Corpus Coherence Combining Open Source Annotators. In Proceedings of the Demonstrations at NAACL 2015. Denver, U.S. [UI Link]
  • Ruiz, Pablo, and Thierry Poibeau (2015). EL92: Entity Linking Combining Open Source Annotators via Weighted Voting. In Proceedings of SemEval 2015, 355-359. Denver, U.S.
  • Ruiz, Pablo, Aitor Álvarez, and Haritz Arzelus. (2014). Phoneme Similarity Matrices to Improve Long Audio Alignment for Automatic Subtitling. In LREC, Ninth International Conference on Language Resources and Evaluation, pp. 437-442. [Resources]
  • Álvarez, Aitor, Haritz Arzelus, and Pablo Ruiz. (2014). Long audio alignment for automatic subtitling using different phone-relatedness measures. In ICASSP 2014, IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6280-6284. [Resources]
  • Álvarez, Aitor, Pablo Ruiz and Haritz Arzelus. (2014). Improving a long audio aligner through phone-relatedness matrices for English, Spanish and Basque. In TSD, 17th International Conference on Text, Speech and Dialogue, pp. 476-483. [Resources]
  • Ruiz, Pablo, Montse Cuadros, and Thierry Etchegoyhen. (2013). Lexical Normalization of Spanish Tweets with Preprocessing Rules, Domain-specific Edit Distances, and Language Models. In Tweet-Norm@SEPLN, pp. 59-63. IV Congreso Español de Informática.

    Conference Abstracts


  • Ruiz Fabo, Pablo, Helena Bermúdez Sabel, Clara Martínez Cantón, Elena González-Blanco and Borja Navarro-Colorado. (2018). The Diachronic Spanish Sonnet Corpus (DISCO): TEI and Linked Open Data Encoding, Data Distribution and Metrical Findings. In Digital Humanities Conference (DH 2018), Mexico City.
  • Martínez Cantón, Clara, Pablo Ruiz Fabo and Elena González-Blanco. (2018). ANJA, ¿dónde están los encabalgamientos? In Digital Humanities Conference (DH 2018), Mexico City.
  • Ruiz Fabo, Pablo, Clara Martínez Cantón and José Calvo Tello. (2018). DISCO: Diachronic Spanish Sonnet Corpus. In Digital Humanities im deutschsprachigen Raum (DHd 2018).
  • Ruiz Fabo, Pablo, Clara Martínez Cantón and Thierry Poibeau. (2017). Distant Rhythm: Automatic enjambment detection on four centuries of Spanish sonnets. In Digital Humanities Conference (DH 2017). Montréal, Canada.
  • Martínez Cantón, Clara, Pablo Ruiz Fabo, Elena González-Blanco and Thierry Poibeau. (2017). Automatic enjambment detection as a new source of evidence in Spanish versification. In Plotting Poetry: On Mechanically-Enhanced Reading / Machiner la poésie: Sur les lectures appareillées. Basel, Switzerland.
  • Martínez Cantón, Clara, Pablo Ruiz Fabo and Borja Navarro-Colorado. (2017). A caballo entre el verso y las humanidades digitales. La evaluación de herramientas como modo de aprendizaje en el aula. In III Congreso Internacional de Humanidades Digitales Hispánicas. Málaga, Spain.
  • Nanni, Federico and Pablo Ruiz. (2016). Entities as topic labels: Improving topic interpretability and evaluability combining Entity Linking and Labeled LDA. In Digital Humanities Conference (DH 2016). Kraków, Poland.
  • Mélanie, Frédérique, Estelle Tieberghien, Pablo Ruiz, Thierry Poibeau, Tim Causer and Melissa Terras (2016). Mapping the Bentham Corpus. In Digital Humanities Conference (DH 2016). Kraków, Poland. [UI Link]
  • Ruiz, Pablo, Clément Plancq, and Thierry Poibeau. (2016). Climate Negotiation Analysis. In Digital Humanities Conference (DH 2016). Kraków, Poland. [Slides] [UI Link]
  • Poibeau, Thierry and Pablo Ruiz. (2015). Generating Navigable Semantic Maps from Social Sciences Corpora. In Digital Humanities Conference (DH 2015). Sydney, Australia.
  • Poibeau, Thierry, Melissa Terras, Pablo Ruiz Fabo, Steven Gray, and Glenn Roe. (2015). Workshop Visualizing Data for Digital Humanities: Producing Semantic Maps with Information extracted from Corpora and other Media. In Digital Humanities Conference (DH 2015).

    Journal Articles


  • Martínez Cantón, Clara, and Pablo Ruiz Fabo. (Accepted). La evaluación de herramientas como modo de aprendizaje e introducción de las Humanidades Digitales en el aula universitaria. La experiencia docente « Poesía distante ». To appear in Didáctica, Lengua y Literatura. Madrid: Universidad Complutense.
  • Ruiz Fabo, Pablo, and Thierry Poibeau. (2019). Mapping the Bentham Corpus: Concept-based Navigation. Journal of Data Mining and Digital Humanities, Special Issue Atelier Digit_Hum. Episciences.org. hal-01915730
  • Lauscher, Anne, Federico Nanni, Pablo Ruiz Fabo, and Simone Ponzetto. (2016). Entities as Topic Labels: Combining Entity Linking and Labeled LDA to Improve Topic Interpretability and Evaluability. IJCoL, Italian Journal of Computational Linguistics, 2(2): 67-87. Special Issue on Digital Humanities and Computational Linguistics.
  • Ruiz, Pablo, Montse Cuadros, and Thierry Etchegoyhen. (2014). Lexical Normalization of Spanish Tweets with Rule-Based Components and Language Models. Procesamiento del Lenguaje Natural 52: 45-52. SEPLN, Spanish NLP Society.

    Corpora


    Ruiz Fabo, Pablo, Helena Bermúdez Sabel, Clara Martínez Cantón, and José Calvo Tello. (2017). Diachronic Spanish Sonnet Corpus (DISCO). Madrid: UNED. https://github.com/pruizf/disco https://doi.org/10.5281/zenodo.1012567


    Supporting Materials


    GitHub profile
    ANJA (Automatic eNJambment Analyzer): Online interface for enjambment detection in Spanish.
    Proposition Extraction Demo: Navigating actors and their statements in the Earth Negotiations Bulletin, a corpus on international climate negotiations
    Entity Linking Demo: Case study on the PoliInformatics corpus, about the American financial crisis of 2008
    Mapping the Bentham corpus: Wikification and visualization of over 25,000 pages of manuscripts by the philosopher and social reformer Jeremy Bentham (collaboration with UCL).
    Digital Humanities 2015 Workshop: Visualizing corpora based on concepts and entities
    Phoneme Similarity: Similarity metrics for sequence alignment applied to automatic subtitling

    Publications (Invited)


  • Martínez Cantón, Clara, Petr Plecháč, Pablo Ruiz Fabo, and Levente Seláf. (2017). Plotting Poetry: On Mechanically Enhanced Reading, 5th-7th October, Basel, Switzerland [Chronicle]. Studia Metrica et Poetica, 4(2), 126-137. https://doi.org/10.12697/smp.2017.4.2.05

    Talks (Invited)


  • (with Helena Bermúdez Sabel) Linked Open Data: Unchain your Corpora. CLiGS Group, Chair for Computational Philology, Universität Würzburg. April 2018. [Poster] [Blog]
  • Detección de rasgos métricos y encabalgamiento: aplicaciones. Institut d’études hispaniques, Université Paris IV-Sorbonne. April 2018. [Slides]
  • Contribuciones del Procesamiento del lenguaje natural a la navegación de corpus digitales. BNE, National Library of Spain. December 2017. [Slides] [Video]
  • Visualizing the Transcribe Bentham Corpus. UCLDH Seminar, London, UK. December 2016 [Slides]
  • Identifier dans un corpus des acteurs, des concepts et les relations entre eux. Action Nationale de la Formation du réseau MATE-SHS. CNRS. (Fréjus, France, November 2016) [Slides]
  • Análisis estilístico y métrico: Automatización y evaluación. Online talks for UNED (Madrid, Spain). October 2016 and January 2017. [Slides 1] [Slides 2]
  • Gephi et Cytoscape, pour une visualisation des données Sciences humaines et sociales / Sciences de la vie et de la terre. EPHE, Paris, France (February 2016) [Slides in English]
  • Natural Language Processing in Digital Humanities: application examples. At IXA NLP Group’s Seminar, University of the Basque Country (San Sebastian, Spain, January 2016) [Slides]
  • Méthodes de traitement automatique des langues (TAL) en Humanités numériques : Entity Linking et Extraction de propositions. Médialab, SciencesPo. Paris, France (December 2015). [Slides]
  • Application de la résolution référentielle d’entités (entity linking) au domaine des Humanités numériques. Journée data science — social science. Institut des Systèmes Complexes, Paris, France (November 2015) [Slides]