Director of research
CNRS, Laboratory adjunct head
I am a CNRS Director of Research and adjunct head of the LATTICE laboratory (Langues, Textes, Traitements informatiques et Cognition). I currently hold a PrAIRIe fellowship (PrAIRIe = Paris Artificial Intelligence Research Institute) in Natural Language Processing and Digital Humanities. I am also an Affiliated Lecturer at the Department of Theoretical and Applied Linguistics (DTAL) of the University of Cambridge, and part of Cambridge Digital Humanities.
In 2018-2019, I was a Rutherford Visiting Fellow at the Turing Institute (London / Cambridge). From 2003 to 2009, I worked as a CNRS Research Fellow at Laboratoire d’Informatique de Paris-Nord. In 2002-2003, I was an associate professor at the Centre de Recherche en Ingénierie Multilingue (CRIM) within the Institut National des Langues et Civilisations Orientales (INaLCO), and before that a research engineer at Thales Recherche et Technologie (1998-2002).
I mainly work on Natural Language Processing (NLP), especially on the following topics: Information Extraction, Question Answering, Semantic Zoning, Knowledge Acquisition from text and Named Entity tagging. Apart from NLP, my main interests include Language Acquisition, Cognitive Science, Epistemology and the History of Linguistics.
More recently I have been active in two other domains of research.
Digital Humanities is a growing field at the intersection of computational methods and the Humanities. I have recently developed a wide range of activities around this theme at LATTICE and we now have a number of running projects (esp. Oupoco and the French part of BookNLP), see here and here for recent publications. I am also involved in the PSL Master in Digital Humanities.
Last but not least, I am especially interested in language diversity and language typology. I work on Finnic (i.e. Finnish and closely related languages) and more generally Uralic languages. We have recently developed multilingual parsing models that have been applied successfully to under-resourced languages like Finnish, Saami and Komi (joint work with KyungTae Lim and Niko Partanen). See here for more information.
- Presentation of the Oupoco project, which won an award at IJCAI 2023 (best entertaining video involving AI)
- A podcast (France Culture) on the links between computers and poetry since the 1950s (Oct. 2023, in French)
- A podcast (Radio France) on machine translation: Comment la traduction automatique s’est-elle mise à (mieux) marcher ? (Le code a changé, Sept. 2020, in French)
- Contributions (in French) on The Conversation
- Artificial Intelligence and Society: What would a better AI mean? (Prairie Tutorial, Oct. 2022)
Recent Publications (books)
In 2017, I published a survey on machine translation (Machine Translation, MIT Press, Essential Knowledge series).
The dream of a universal translation device goes back many decades, long before Douglas Adams’s fictional Babel fish provided this service in The Hitchhiker’s Guide to the Galaxy. Since the advent of computers, research has focused on the design of digital machine translation tools—computer programs capable of automatically translating a text from a source language to a target language. This has become one of the most fundamental tasks of artificial intelligence. This volume in the MIT Press Essential Knowledge series offers a concise, nontechnical overview of the development of machine translation, including the different approaches, evaluation issues, and market potential. The main approaches are presented from a largely historical perspective and in an intuitive manner, allowing the reader to understand the main principles without knowing the mathematical details.
This book has been translated into several languages, notably into French (Odile Jacob, Paris, May 2019). The French version has been completely updated to take into account the latest developments in neural machine translation.
Other versions of the book exist in languages including Chinese and Korean.
All my publications are freely available online on the HAL server.
Teaching and lecturing
I lecture regularly at various institutions:
- Natural Language Processing for Digital Humanities, Master in Digital Humanities, at Paris Sciences et Lettres (PSL)
- Computational and corpus linguistics at the University of Cambridge
I am especially involved in the Master in Digital Humanities of Paris Sciences et Lettres (with the Ecole des chartes and the Ecole pratique des hautes études). I am the ENS representative in this programme.
I am also involved in two ‘intensive PSL weeks’ held each year: ‘Digital Humanities and AI’ (DHAI) in Spring, and ‘Ethics and AI’ in Autumn.
- I hold a PRAIRIE chair (PRAIRIE is the 3IA Institute of Paris). Within PRAIRIE, I mainly work on natural language processing and digital humanities, as well as on the practical impact of AI on society.
- I am also involved in the MEDIALEX project (with Médialab/SciencesPo Paris, CREST and INA). In this project, we explore the relationships between the media and the political sphere. The project is led by Sylvain Parasie, from SciencesPo.
Current PhD students
- Noé Durandard (2023-, Ecole normale supérieure; in collaboration with TUM, Munich): Subjectivity in Large Language Models (LLMs) (opinions, tastes, cultural preferences): management, encoding, personalization
- Jean Barré (2022-, Ecole normale supérieure/EUR TRanslitterae): Characterizing literary genres through computational methods
- Armin Pournaki (2020-, Ecole normale supérieure/EUR TRanslitterae; joint supervision (cotutelle) with the University of Leipzig, J. Jost)
- Salomé Do (2019-, Ecole normale supérieure/PRAIRIE)
Past PhD students
- Karim Lasri (2019-2023, Ecole normale supérieure; PhD funded by PRAIRIE): Generalization within language models (the case of BERT)
- Mylène Maignant (2018-2022, Ecole normale supérieure; PhD funded by the EUR Translitterae, ENS-PSL): Contemporary drama analysis using digital methods
- KyungTae Lim (2017-2020, Ecole normale supérieure): Multilingual Universal Dependency parsing
- Yuanfeng Lu (2017-2021, Ecole normale supérieure, PhD funded by the Chinese government): Natural language processing for a semi-automatic stylistic analysis of literary texts
- Tian Tian (2015-2019, Université Paris 3 Sorbonne nouvelle; PhD initially supervised by Isabelle Tellier, who passed away tragically in June 2018; PhD initially funded by a Cifre grant with Synthesio): Named entity recognition in noisy texts
- Miquel Cornudella Gaya (2014-2017, Ecole normale supérieure, Cifre grant with Sony CSL Paris): modeling language evolution (Miquel is now a post-doc at University Pompeu Fabra, Spain)
- Pablo Ruiz Fabo (2014-2017, Ecole normale supérieure; regional PhD grant): investigating natural language processing techniques for social sciences (Pablo is now a Maître de conférences at the University of Strasbourg)
- Pierre Marchal (2010-2015, INALCO; national PhD grant): large-scale acquisition of verbal subcategorization frames for Japanese (Pierre is now a research engineer at SAP Paris and Boston)
- Elisa Omodei (2011-2014, Ecole normale supérieure; regional PhD grant): Modeling the socio-semantic dynamics of scientific communities (Elisa is now a post-doctoral student at the Department of Mathematics and Computer Engineering at the Rovira i Virgili University, in Tarragona, Spain)
- Zorana Ratkovic (2010-2014, Université Paris 3; project funded): parsing for information extraction from texts (Zorana is now working as a research engineer for an IT company in the Paris area)
- Mani Ezzat (2009-2013, INALCO; Cifre grant with Arisem): automatic acquisition of relations between entities (now working as a research engineer at Exalead)
- Yufan Guo (2009-2013, University of Cambridge, co-supervision with Anna Korhonen; funded by Cambridge): text zoning of scientific texts (now working as a research engineer at IBM USA)
- Cédric Messiant (2006-2010, Université Paris 13; national DGA grant): automatic lexical acquisition from large corpora (now working as a research engineer at Ecreall, an IT company in Lille)
- Aurélien Bossard (2006-2010, Université Paris 13; national PhD grant): automatic summarization (now an associate professor at Université Paris 8)
- Amanda Bouffier (2004-2008, Université Paris 13; national PhD grant): discourse analysis of medical texts (now an outstanding drummer and occasionally an independent consultant in text mining)