LaTTiCe - UMR 8094
1 rue Maurice Arnoux
Tel.: +33(0) 1 58 07 66 20
Fax: +33(0) 1 58 07 66 29

Thierry Poibeau

CNRS Director of Research; Head of the laboratory

I am a CNRS Director of Research and have headed the LATTICE laboratory (Langues, Textes, Traitements informatiques et Cognition) since 2012. I am also an Affiliated Lecturer at the Department of Theoretical and Applied Linguistics (DTAL) of the University of Cambridge. During the academic year 2008-2009, I was a Visiting Research Fellow at Corpus Christi College.

From 2003 to 2009, I worked as a CNRS Research Fellow at the Laboratoire d’Informatique de Paris-Nord. In 2002-2003, I was an associate professor at the Centre de Recherche en Ingénierie Multilingue (CRIM) within the Institut National des Langues et Civilisations Orientales (INaLCO), and before that a research engineer at Thales Recherche et Technologie (1998-2002).

I mainly work on Natural Language Processing (NLP), especially on the following topics: Information Extraction, Question Answering, Semantic Zoning, Knowledge Acquisition from text and Named Entity tagging. Apart from NLP, my main interests include Language Acquisition, Cognitive Science, Epistemology and the History of Linguistics.

I have recently initiated a small research group focusing on the application of NLP techniques in the context of Digital Humanities projects. I am also especially interested in Finnic languages (i.e. Finnish and closely related languages) and, more generally, Uralic languages.


Most of my publications are referenced in the open repository HAL, with a PDF version for open-access documents.

Teaching and lecturing

I regularly lecture at various institutions (Ecole normale supérieure, Univ. Paris 3 - Sorbonne nouvelle, INaLCO, University of Cambridge), mainly on Computational linguistics (esp. Information extraction and Text mining), Corpus linguistics and Digital Humanities.

Since 2015, I have been part of a new graduate-level programme in Digital Humanities held at Paris Sciences et Lettres (PSL). A PSL Master in Digital Humanities should be created in 2017.

Research projects

I am part of three research projects:

- I am the PI of a project called LAKME (Linguistically Annotated Corpora Using Machine Learning Techniques), funded by PSL for 2015-2017. LAKME will explore new techniques for the annotation of textual corpora of morphologically rich languages (Medieval French, Rabbinic Hebrew and diverse Finno-Ugric languages).
- I am the PI for LATTICE of a European project called ATLANTIS (Artificial Language Understanding in Robots), 2016-2018. ATLANTIS attempts to understand and model the very first stages in grounded language learning, and will propose models and implementations for a robotic environment.
- I participate in a project called DEMOCRAT, funded by ANR (Agence Nationale de la Recherche), 2016-2019. The project will explore techniques for describing and modelling reference chains, including diachronic and comparative language studies, thanks to automatic annotation techniques. The PI of this project is Frédéric Landragin.

I am also part of the ERC LEXICAL project, in collaboration with the University of Cambridge (PI: Anna Korhonen).

In 2014-2015, I was the PI of a small-scale research collaboration with the UCL Centre for Digital Humanities at University College London (see our publications on the topic at DH2015 and DH2016). I am also collaborating with the Language Technology Lab of the University of Cambridge on these topics.

PhD students


- Pablo Ruiz Fabo (2014-, Ecole normale supérieure; regional PhD grant): investigating natural language processing techniques for social sciences
- Jean-Christophe Pautre (2014-, Université Paris 3, Bibliothèque Nationale de France): implementing semantic information retrieval techniques for the BNF search engine (Gallica)
- Miquel Cornudella Gaya (2014-, Ecole normale supérieure, Cifre grant with Sony CSL Paris): modeling language evolution

Past PhD students

- Pierre Marchal (2010-2015, INALCO; national PhD grant): large-scale acquisition of verbal subcategorization frames for Japanese (Pierre is now working as a research scientist for the SAP Research Centre in Paris and Boston)
- Elisa Omodei (2011-2014, Ecole normale supérieure; regional PhD grant): modeling the socio-semantic dynamics of scientific communities (Elisa is now a post-doctoral researcher at the Department of Mathematics and Computer Engineering at the Rovira i Virgili University, in Tarragona, Spain)
- Zorana Ratkovic (2010-2014, Université Paris 3; project funded): parsing for information extraction from texts (Zorana is now working as a research engineer for an IT company in the Paris area)
- Mani Ezzat (2009-2013, INALCO; Cifre grant with Arisem): automatic acquisition of relations between entities (now working as a research engineer at Exalead)
- Yufan Guo (2009-2013, University of Cambridge, co-supervision with Anna Korhonen; funded by Cambridge): text zoning of scientific texts (now working as a research engineer at IBM USA)
- Cédric Messiant (2006-2010, Université Paris 13; national DGA grant): automatic lexical acquisition from large corpora (now working as a research engineer at Ecreall, an IT company in Lille)
- Aurélien Bossard (2006-2010, Université Paris 13; national PhD grant): automatic summarization (now an associate professor at Université Paris 8)
- Amanda Bouffier (2004-2008, Université Paris 13; national PhD grant): discursive analysis of medical texts (now an outstanding drummer and occasionally an independent consultant in text mining)