Published: 2nd April 2021
IIT KGP researchers develop AI system to translate complex Sanskrit text. Here's how it works
Sanskrit has its own challenges when it comes to natural language processing. IIT KGP researchers have developed a system to make it easier to process these texts
As the new National Education Policy (NEP) focuses on returning to our roots, there has been renewed interest in Sanskrit at academic institutions, from schools to higher educational institutions. Technological advancement and digital accessibility have allowed Sanskrit to be promoted in various ways, but the language still presents unique challenges for automated computational processing. To address this, researchers at the Indian Institute of Technology Kharagpur (IIT KGP) have developed a digital infrastructure for the efficient processing of Sanskrit texts by combining state-of-the-art machine learning techniques with traditional linguistic knowledge of the language.
"We were working on the problem of making computers adept at translating languages and thus creating useful applications for the same. We are trying to develop algorithms by which we can process Sanskrit text. If a user provides us with a text in Sanskrit or I want it translated from a book, for instance, our digital infrastructure will be able to efficiently process those," said Dr Pawan Goyal, who has been leading the research. Their work has been accepted for publication in the Computational Linguistics Journal published by the MIT Press. Research scholar Dr Amrith Krishna, currently, a postdoc at the University of Cambridge, was also a part of this research, while it was supervised by Dr Pawan Goyal.
Their research paper currently addresses the tasks of word segmentation (संधि विच्छेद), morphological parsing (पद विश्लेषण), dependency parsing (कारक विश्लेषण) and poetry-to-prose conversion of Sanskrit text (अन्वय). "Word segmentation is prevalent in the Sanskrit language. It has been passed over to generations majorly in oral mediums and not through other forms; this thus resulted in the word segmentation being retained in the written form as well. When we start analysing, we found there is no word boundary, which could have been extremely helpful for language processing," explains Dr Goyal.
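To give a rough sense of why the missing word boundaries matter, here is a minimal Python sketch of dictionary-based segmentation. It is a toy illustration only, not the IIT KGP system: the five-word lexicon, the two vowel-merging rules and the function name `segment` are all assumptions made for this example, whereas a real segmenter has to deal with a full vocabulary, many more sound changes and a way of ranking competing splits.

```python
import unicodedata
from functools import lru_cache

# Hypothetical toy lexicon (romanised); a real system needs a full dictionary.
LEXICON = {"na", "asti", "ca", "api", "iti"}

# Hypothetical, heavily simplified rule table: a merged sound mapped to the
# (end of left word, start of right word) pairs that could have produced it.
SANDHI_RULES = {
    "\u0101": [("a", "a")],  # long a (written with a macron) from a + a
    "e": [("a", "i")],       # e from a + i
}


@lru_cache(maxsize=None)
def _solve(rest):
    """Return every way to split `rest` into lexicon words."""
    if rest == "":
        return [[]]
    results = []
    for end in range(1, len(rest) + 1):
        # Case 1: a plain boundary with no sound change.
        word = rest[:end]
        if word in LEXICON:
            for tail in _solve(rest[end:]):
                results.append([word] + tail)
        # Case 2: the last character is a merged sound; undo the merge.
        if end >= 2:
            for left, right in SANDHI_RULES.get(rest[end - 1], []):
                candidate = rest[:end - 1] + left
                if candidate in LEXICON:
                    for tail in _solve(right + rest[end:]):
                        results.append([candidate] + tail)
    return results


def segment(text):
    """Segment an unspaced, romanised string using the toy lexicon and rules."""
    # Normalise so that a long a is a single code point, as the rules expect.
    return _solve(unicodedata.normalize("NFC", text))


print(segment("n\u0101sti"))  # [['na', 'asti']]
print(segment("neti"))        # [['na', 'iti']]
```

Even this toy version has to try a split at every position and undo possible sound changes, and with a realistic dictionary the number of valid-looking candidate splits for a single line grows quickly.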
Sanskrit, a classical language, has a rich literary tradition spanning millennia. Works in Sanskrit include millions of manuscripts: extensive epics; subtle and intricate philosophical, mathematical, and scientific treatises; and literary, poetic, and dramatic texts. "We have used Artificial Intelligence to find out the best solution for processing these texts. The proposed AI-based system used along with interactive tools such as the Sanskrit Heritage reader can help users in the easier analysis of these manuscripts. We can come up with word-by-word analysis and translation, the relation between words, poetry to prose conversion, search and question answering, etc. There are several applications we can also build such as automatic speech recognition in Sanskrit just like we do for Apple's Siri and Google Assistant. Such is not available for this language, and we are still working on building something on this," concludes Dr Goyal.