
A Syntactically Annotated Corpus of Medieval French

by Thierry POIBEAU - published on

This syntactically annotated corpus of Medieval French (see also the "projects" section) is the result of the French-German ANR-DFG research project SRCMF, directed by Sophie Prévost (Lattice) and Achim Stein (Stuttgart, ILR), which ran from December 2008 to February 2012.

SRCMF was born from the observation that no syntactically annotated corpus was available for Medieval French, unlike for other languages such as English. The project's goals were twofold: on the one hand, to produce a large annotated corpus for research, and on the other hand, to promote the creation of tools for the automatic annotation of Old French.

The annotation drew on texts from two existing databases of Medieval French, each containing around 3 million words: the Base du Français Médiéval (ENS Lyon: ICAR, UMR 5191) and the New Amsterdam Corpus (ILR, University of Stuttgart). Both were annotated with a single, original syntactic model. This dependency-based model was developed within the project, which also included part-of-speech tagging of the texts.

The resulting annotated corpus contains 260,000 words with both part-of-speech and syntactic annotations. The project also developed new reusable tools.

The full set of resources is available online from the project website, http://srcmf.org. Additionally, the resource will soon be available through the TXM platform developed at ENS Lyon (demo version: http://txm.risc.cnrs.fr/demo).

See online: SRCMF Project Website