This is the website of the project "A parallel treebank of speeches by Narendra Modi, Prime Minister of India".
On this website, you can find the latest updates on the project and everything you need to know to become part of it.
The project aims to construct a parallel treebank featuring speeches delivered by Narendra Modi, the Prime Minister of India, in several languages of India. The initial phase covers speeches in Hindi, Marathi, Telugu, and English; additional languages will be introduced in subsequent phases. The motivation behind this initiative is to create a linguistic resource that spans multiple languages spoken in India. By providing syntactic and morphological annotations, the treebank facilitates the analysis of cross-lingual similarities and differences in morphology, syntax, and lexicon. It thus aims to support linguistic research by offering insights into language variation and usage in the context of Prime Minister Modi's speeches.
The uniqueness of Modi's speeches lies in their availability on the Indian government website, where hundreds of speeches are transcribed and translated into multiple official languages of India. This project addresses the scarcity of freely accessible parallel treebanks, especially for Indian languages. While other parallel sources exist, Modi's speeches offer distinct advantages:
- Contemporary Nature: Unlike traditional parallel sources like the Bible, Modi's speeches are contemporary.
- Natural Language: In contrast to legal texts, the speeches exhibit a more natural language, enabling the study of trends in contemporary standard Hindi across various linguistic levels.
The project involves downloading transcriptions of Prime Minister Modi's speeches, publicly accessible on the Indian government website (PM India). The transcriptions are available in all official languages of India. The collected data undergoes syntactic and morphological annotation by a team of annotators, with subsequent alignment to ensure comparability across languages.
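As a rough illustration of the alignment step, the sketch below groups annotated sentences from different languages under a shared identifier, so that translations of the same sentence can be compared side by side. It is only a sketch under stated assumptions: the `Token` and `Sentence` classes, the `align_id` field, the language codes, and both helper functions are hypothetical and do not describe the project's actual data format or tooling.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str    # surface word form
    upos: str    # part-of-speech tag (tagset is an assumption)
    head: int    # 1-based index of the syntactic head, 0 for the root
    deprel: str  # dependency relation label (label set is an assumption)


@dataclass
class Sentence:
    align_id: str  # hypothetical ID shared by all translations of one sentence
    lang: str      # hypothetical language code, e.g. "hi", "mr", "te", "en"
    tokens: list = field(default_factory=list)  # list of Token objects


def group_parallel(sentences):
    """Group annotated sentences by alignment ID, then by language."""
    groups = defaultdict(dict)
    for sent in sentences:
        groups[sent.align_id][sent.lang] = sent
    return dict(groups)


def fully_aligned(groups, langs):
    """Return, in sorted order, the alignment IDs covered in every requested language."""
    return sorted(
        align_id
        for align_id, by_lang in groups.items()
        if all(lang in by_lang for lang in langs)
    )


# Illustrative placeholders only, not real project data.
sentences = [
    Sentence("speech1-s1", "hi"),
    Sentence("speech1-s1", "en"),
    Sentence("speech1-s2", "hi"),
]
groups = group_parallel(sentences)
complete = fully_aligned(groups, ["hi", "en"])
```

Here `complete` contains only the identifiers for which both a Hindi and an English sentence are present, which is the kind of check that alignment across languages relies on.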
If you are an expert in or a native speaker of any of the official languages of India and are interested in contributing to the project, please let us know by filling out this form.
Stay tuned for updates as we progress through the various phases of this exciting multilingual treebank initiative.
For any inquiries, please contact:
- Andrea Drocco (Ca' Foscari University of Venice): andrea.drocco@unive.it
- Erica Biagetti (University of Pavia): erica.biagetti@unipv.it
- Luca Brigada Villa (University of Pavia): luca.brigadavilla@unibg.it
- Lucrezia Carnesale (University of Pavia): lucrezia.carnesale@unipv.it