Creation of National Corpus of Crimean Tatar Language underway in Ukraine

The Ministry for Reintegration of Temporarily Occupied Territories has initiated the creation of the National Corpus of the Crimean Tatar Language (NCCTL) as part of the implementation of the Strategy for the Development of the Crimean Tatar Language for 2022-2032. The NCCTL is an online platform for language research that will be based on data from textual materials in Crimean Tatar.

The collection of printed and electronic texts in the Crimean Tatar language for the Corpus began in October 2022.

In almost eight months, more than 800 materials by more than 200 authors have been added to the catalogue. Recognition and formatting have been completed for 54% of the materials, and 25% of the planned volume has been prepared for uploading to the Corpus platform.

Testing of the platform’s software components is being finalized, and a manual for its future users is being prepared.

The National Corpus of the Crimean Tatar Language project was presented at the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL), which took place this month in Croatia.

The project is being implemented with the support of the Ministry for Reintegration of Temporarily Occupied Territories, the Swiss-Ukrainian EGAP Programme implemented by the East Europe Foundation, and Taras Shevchenko National University of Kyiv.