Laboratoire Lattice - UMR 8094
ENS-CNRS
1 rue Maurice Arnoux, 92120 Montrouge
Pablo Ruiz-Fabo
PhD student
Former member, ENS
Thesis Title: Concept-based and Relation-based Corpus Navigation: Applications of Natural Language Processing in Digital Humanities
Supervisor: Thierry Poibeau
Prototypes: apps code
Funding: Île-de-France Region doctoral grant (Allocation de thèse Région Île-de-France), 2014-2017
Defended on: June 23, 2017
Publications (Peer-reviewed)
Conference Proceedings
Ruiz Fabo, Pablo, Clara Martínez Cantón, Thierry Poibeau and Elena González-Blanco. (2017). Enjambment detection in a large diachronic corpus of Spanish sonnets. In LaTeCH-CLFL 2017, Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature. Vancouver, Canada.
Ruiz, Pablo, Clément Plancq, and Thierry Poibeau. (2016). More than Word Cooccurrence: Exploring Support and Opposition in International Climate Negotiations with Semantic Parsing. In Proceedings of LREC, Tenth International Conference on Language Resources and Evaluation, pp. 1902-1907. Portorož, Slovenia. [UI Link]
Ruiz, Pablo, and Thierry Poibeau. (2015). Combining Open Source Annotators for Entity Linking through Weighted Voting. In Proceedings of *SEM 2015, Fourth Joint Conference on Lexical and Computational Semantics. Denver, U.S. [Data]
Ruiz, Pablo, Thierry Poibeau, and Frédérique Mélanie. (2015). ELCO3: Entity Linking with Corpus Coherence Combining Open Source Annotators. In Proceedings of the Demonstrations at NAACL 2015. Denver, U.S. [UI Link]
Ruiz, Pablo, and Thierry Poibeau. (2015). EL92: Entity Linking Combining Open Source Annotators via Weighted Voting. In Proceedings of SemEval 2015, pp. 355-359. Denver, U.S.
Ruiz, Pablo, Aitor Álvarez, and Haritz Arzelus. (2014). Phoneme Similarity Matrices to Improve Long Audio Alignment for Automatic Subtitling. In LREC, Ninth International Conference on Language Resources and Evaluation, pp. 437-442. [Resources]
Álvarez, Aitor, Haritz Arzelus, and Pablo Ruiz. (2014). Long audio alignment for automatic subtitling using different phone-relatedness measures. In ICASSP 2014, IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6280-6284. [Resources]
Álvarez, Aitor, Pablo Ruiz and Haritz Arzelus. (2014). Improving a long audio aligner through phone-relatedness matrices for English, Spanish and Basque. In TSD, 17th International Conference on Text, Speech and Dialogue, pp. 476-483. [Resources]
Ruiz, Pablo, Montse Cuadros, and Thierry Etchegoyhen. (2013). Lexical Normalization of Spanish Tweets with Preprocessing Rules, Domain-specific Edit Distances, and Language Models. In Tweet-Norm@SEPLN, pp. 59-63. IV Congreso Español de Informática.
Conference Abstracts
Ruiz Fabo, Pablo, Helena Bermúdez Sabel, Clara Martínez Cantón, Elena González-Blanco and Borja Navarro-Colorado. (2018). The Diachronic Spanish Sonnet Corpus (DISCO): TEI and Linked Open Data Encoding, Data Distribution and Metrical Findings. In Digital Humanities Conference (DH 2018), Mexico City.
Martínez Cantón, Clara, Pablo Ruiz Fabo and Elena González-Blanco. (2018). ANJA, ¿dónde están los encabalgamientos? In Digital Humanities Conference (DH 2018), Mexico City.
Ruiz Fabo, Pablo, Clara Martínez Cantón and José Calvo Tello. (2018). DISCO: Diachronic Spanish Sonnet Corpus. In Digital Humanities im deutschsprachigen Raum (DHd 2018).
Ruiz Fabo, Pablo, Clara Martínez Cantón and Thierry Poibeau. (2017). Distant Rhythm: Automatic enjambment detection on four centuries of Spanish sonnets. In Digital Humanities Conference (DH 2017). Montréal, Canada.
Martínez Cantón, Clara, Pablo Ruiz Fabo, Elena González-Blanco and Thierry Poibeau. (2017). Automatic enjambment detection as a new source of evidence in Spanish versification. In Plotting Poetry: On Mechanically-Enhanced Reading / Machiner la poésie: Sur les lectures appareillées. Basel, Switzerland.
Martínez Cantón, Clara, Pablo Ruiz Fabo and Borja Navarro-Colorado. (2017). A caballo entre el verso y las humanidades digitales. La evaluación de herramientas como modo de aprendizaje en el aula. In III Congreso Internacional de Humanidades Digitales Hispánicas. Málaga, Spain.
Nanni, Federico and Pablo Ruiz. (2016). Entities as topic labels: Improving topic interpretability and evaluability combining Entity Linking and Labeled LDA. In Digital Humanities Conference (DH 2016). Kraków, Poland.
Mélanie, Frédérique, Estelle Tieberghien, Pablo Ruiz, Thierry Poibeau, Tim Causer and Melissa Terras. (2016). Mapping the Bentham Corpus. In Digital Humanities Conference (DH 2016). Kraków, Poland. [UI Link]
Ruiz, Pablo, Clément Plancq, and Thierry Poibeau. (2016). Climate Negotiation Analysis. In Digital Humanities Conference (DH 2016). Kraków, Poland. [Slides] [UI Link]
Poibeau, Thierry and Pablo Ruiz. (2015). Generating Navigable Semantic Maps from Social Sciences Corpora. In Digital Humanities Conference (DH 2015). Sydney, Australia.
Poibeau, Thierry, Melissa Terras, Pablo Ruiz Fabo, Steven Gray and Glenn Roe. (2015). Workshop Visualizing Data for Digital humanities: Producing Semantic Maps with Information extracted from Corpora and other Media. In Digital Humanities Conference (DH 2015).
Journal Articles
Martínez Cantón, Clara, and Pablo Ruiz Fabo. (Accepted). La evaluación de herramientas como modo de aprendizaje e introducción de las Humanidades Digitales en el aula universitaria. La experiencia docente « Poesía distante ». To appear in Didáctica, Lengua y Literatura. Madrid: Universidad Complutense.
Ruiz Fabo, Pablo, and Thierry Poibeau. (2019). Mapping the Bentham Corpus: Concept-based Navigation. Journal of Data Mining and Digital Humanities, Special Issue Atelier Digit_Hum. Episciences.org. hal-01915730
Lauscher, Anne, Federico Nanni, Pablo Ruiz Fabo, and Simone Ponzetto. (2016). Entities as Topic Labels: Combining Entity Linking and Labeled LDA to Improve Topic Interpretability and Evaluability. IJCoL, Italian Journal of Computational Linguistics, 2(2): 67-87. Special Issue on Digital Humanities and Computational Linguistics.
Ruiz, Pablo, Montse Cuadros, and Thierry Etchegoyhen. (2014). Lexical Normalization of Spanish Tweets with Rule-Based Components and Language Models. Procesamiento del Lenguaje Natural 52: 45-52. SEPLN, Spanish NLP Society.
Corpora
– Ruiz Fabo, Pablo, Helena Bermúdez Sabel, Clara Martínez Cantón, and José Calvo Tello. (2017). Diachronic Spanish Sonnet Corpus (DISCO). Madrid: UNED. https://github.com/pruizf/disco https://doi.org/10.5281/zenodo.1012567
Supporting Materials
– GitHub profile
– ANJA (Automatic eNJambment Analyzer): Online interface for enjambment detection in Spanish.
– Proposition Extraction Demo: Navigating actors and their statements in the Earth Negotiations Bulletin, a corpus on international climate negotiations
– Entity Linking Demo: Case study on the PoliInformatics corpus, about the American financial crisis of 2008
– Mapping the Bentham corpus: Wikification and visualization of over 25,000 pages of manuscripts by the philosopher and social reformer Jeremy Bentham (collaboration with UCL).
– Digital Humanities 2015 Workshop: Visualizing corpora based on concepts and entities
– Phoneme Similarity: Similarity metrics for sequence alignment applied to automatic subtitling
Publications (Invited)
Martínez Cantón, Clara, Petr Plecháč, Pablo Ruiz Fabo, and Levente Seláf. (2017). Plotting Poetry: On Mechanically Enhanced Reading, 5th-7th October, Basel, Switzerland [Chronicle]. Studia Metrica et Poetica, 4(2), 126-137. https://doi.org/10.12697/smp.2017.4.2.05
Talks (Invited)
(with Helena Bermúdez Sabel) Linked Open Data: Unchain your Corpora. CLiGS Group, Chair for Computational Philology, Universität Würzburg. April 2018. [Poster] [Blog]
Detección de rasgos métricos y encabalgamiento: aplicaciones. Institut d’études hispaniques, Université Paris IV-Sorbonne. April 2018. [Slides]
Contribuciones del Procesamiento del lenguaje natural a la navegación de corpus digitales. BNE, National Library of Spain. December 2017. [Slides] [Video]
Visualizing the Transcribe Bentham Corpus. UCLDH Seminar, London, UK. December 2016 [Slides]
Identifier dans un corpus des acteurs, des concepts et les relations entre eux. Action Nationale de la Formation du réseau MATE-SHS. CNRS. (Fréjus, France, November 2016) [Slides]
Análisis estilístico y métrico: Automatización y evaluación. Online talks for UNED (Madrid, Spain). October 2016 and January 2017. [Slides 1] [Slides 2]
Gephi et Cytoscape, pour une visualisation des données Sciences humaines et sociales / Sciences de la vie et de la terre. EPHE, Paris, France (February 2016) [Slides in English]
Natural Language Processing in Digital Humanities: application examples. At IXA NLP Group’s Seminar, University of the Basque Country (San Sebastian, Spain, January 2016) [Slides]
Méthodes de traitement automatique des langues (TAL) en Humanités numériques : Entity Linking et Extraction de propositions. Médialab, SciencesPo. Paris, France (December 2015). [Slides]
Application de la résolution référentielle d’entités (entity linking) au domaine des Humanités numériques. Journée data science — social science. Institut des Systèmes Complexes, Paris, France (November 2015) [Slides]