CNRS Director of Research; Head of Laboratory
I have been a CNRS Director of Research and head of the LATTICE laboratory (Langues, Textes, Traitements informatiques et Cognition) since 2012. I am also an Affiliated Lecturer at the Department of Theoretical and Applied Linguistics (DTAL) of the University of Cambridge. During the academic year 2008-2009, I was a Visiting Research Fellow at Corpus Christi College.
From 2003 to 2009, I worked as a CNRS Research Fellow at Laboratoire d’Informatique de Paris-Nord. In 2002-2003, I was an associate professor at the Centre de Recherche en Ingénierie Multilingue (CRIM) within the Institut National des Langues et Civilisations Orientales (INaLCO) and before that a research engineer at Thales Recherche et Technologie (1998−2002).
I mainly work on Natural Language Processing (NLP), especially on the following topics: Information Extraction, Question Answering, Semantic Zoning, Knowledge Acquisition from text and Named Entity tagging. Apart from NLP, my main interests include Language Acquisition, Cognitive Science, Epistemology and the History of Linguistics.
I have recently initiated a small research group focusing on the application of NLP techniques in the context of Digital Humanities projects. I am also especially interested in Finnic (i.e. Finnish and closely related languages) and more generally Uralic languages.
Most of my publications are referenced in the open repository HAL, with a PDF version for open access documents.
Teaching and lecturing
I regularly lecture at various institutions (Ecole normale supérieure, Univ. Paris 3 - Sorbonne nouvelle, INaLCO, University of Cambridge), mainly on computational linguistics (esp. information extraction and text mining), corpus linguistics and Digital Humanities.
Since 2015, I have been part of a new graduate-level programme in Digital Humanities held at Paris Sciences et Lettres. A PSL Master in Digital Humanities should be created in 2017.
I am part of three research projects:
I am the PI of a project called LAKME (Linguistically Annotated Corpora Using Machine Learning Techniques) funded by PSL for 2015-2017. LAKME will explore new techniques for the annotation of textual corpora of morphologically rich languages (Medieval French, Rabbinic Hebrew and diverse Finno-Ugric languages).
I am the PI for LATTICE of a European project called ATLANTIS (Artificial Language understanding In Robots), 2016-2018. ATLANTIS attempts to understand and model the very first stages of grounded language learning, and will propose models and implementations for a robotic environment.
I participate in a project called DEMOCRAT funded by ANR (Agence Nationale de la Recherche), 2016-2019. The project will explore techniques for describing and modelling reference chains, including diachronic and comparative language studies, thanks to automatic annotation techniques. The PI of this project is Frédéric Landragin.
I am also part of the ERC LEXICAL project in collaboration with the University of Cambridge (PI: Anna Korhonen).
In 2014-2015, I was the PI of a small-scale research collaboration with the UCL Centre for Digital Humanities at University College London (see our publications on the topic at DH2015 and DH2016). I am also collaborating with the Language Technology Lab of the University of Cambridge on these topics.
current PhD students
Pablo Ruiz Fabo (2014-, Ecole normale supérieure; regional PhD grant): investigating natural language processing techniques for social sciences
Jean-Christophe Pautre (2014-, Université Paris 3, Bibliothèque Nationale de France): implementing semantic information retrieval techniques for the BNF search engine (Gallica)
Miquel Cornudella Gaya (2014-, Ecole normale supérieure, Cifre grant with Sony CSL Paris): modeling language evolution
past PhD students
Pierre Marchal (2010−2015, INALCO; national PhD grant): large-scale acquisition of verbal subcategorization frames for Japanese (Pierre is now working as a research scientist for the SAP Research Centre in Paris and Boston)
Elisa Omodei (2011−2014, Ecole normale supérieure; regional PhD grant): modeling the socio-semantic dynamics of scientific communities (Elisa is now a post-doctoral student at the Department of Mathematics and Computer Engineering at the Rovira i Virgili University, in Tarragona, Spain)
Zorana Ratkovic (2010−2014, Université Paris 3; project funded): parsing for information extraction from texts (Zorana is now working as a research engineer for an IT company in the Paris area)
Mani Ezzat (2009−2013, INALCO; Cifre grant with Arisem): automatic acquisition of relations between entities (now working as a research engineer at Exalead)
Yufan Guo (2009−2013, University of Cambridge, co-supervision with Anna Korhonen; funded by Cambridge): text zoning of scientific texts (now working as a research engineer at IBM USA)
Cédric Messiant (2006−2010, Université Paris 13; national DGA grant): automatic lexical acquisition from large corpora (now working as a research engineer at Ecreall, an IT company in Lille)
Aurélien Bossard (2006−2010, Université Paris 13; national PhD grant): automatic summarization (now an associate professor at Université Paris 8)
Amanda Bouffier (2004−2008, Université Paris 13; national PhD grant): discursive analysis of medical texts (now an outstanding drummer and occasionally an independent consultant in text mining)