IIIT Bengaluru students develop web search in colloquial Kannada using community radio

Users can type in the English script and translate it into Kannada or directly type in Kannada, which will display results from the audio corpus

Published on: 08 Jan 2024, 5:19 am

Rural areas in India hold rich knowledge that is often lost or remains inaccessible on the internet because it is primarily oral. To bridge this gap, Aparna Mahava, who is pursuing her Master's in Data Science at the International Institute of Information Technology (IIIT) Bengaluru, has introduced Graama-Kannada Audio Search, a search engine built on colloquial audio content in the Kannada language, stated a report in The New Indian Express.

The application is based on community radio recordings made by an organisation called Namma Halli Radio. The recordings include interactions with villagers and communities residing in Tumakuru district of Karnataka. “Small communities have a lot of knowledge but they don’t store it formally or write books on it. This audio corpus created through the local radio show greatly helps even for people with low literacy levels,” explained Aparna.

She added that the audio search engine builds on the principles of Large Language Models (LLMs), adapted so that it can be trained efficiently on limited data, and was developed under the guidance of Srinath Srinivasa, Professor and Dean (R&D), Web Science Lab, IIIT-Bangalore.

“We fine-tune state-of-the-art Automatic Speech Recognition (ASR) models using limited audio data to reduce the Word Error Rate (WER) for colloquial audio data to acquire transcripts for the audio, followed by creating an interface to search for keywords using the simple fuzzy matching technique for n-gram inputs,” Aparna said. The fuzzy match tolerates spelling errors in the query and still returns relevant results.

With this model, the team aimed to build a search engine that individuals can use by entering relevant keywords. Popular concepts have also been compiled as ready-made options to make searching easier. The web application was developed using five hours of audio recordings from the local radio, which yielded around 150 pages of transcribed text.
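
To make the idea of fuzzy keyword matching on n-gram inputs concrete, here is a minimal Python sketch. It is an illustration only and not the team's implementation: the segment texts (English stand-ins for Kannada transcripts), the `start_sec` timestamps, the function names and the 0.8 similarity threshold are all assumptions, and the standard-library `difflib.SequenceMatcher` stands in for whatever matcher the project actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical transcript segments; real ones would come from the ASR output.
SEGMENTS = [
    {"start_sec": 12.0, "text": "villagers gather at the old temple every harvest festival"},
    {"start_sec": 95.5, "text": "a home remedy for cough uses pepper and honey"},
    {"start_sec": 210.0, "text": "a bedtime story about the clever rabbit"},
]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def fuzzy_search(query, segments, threshold=0.8):
    """Return (score, start_sec, text) for segments that approximately contain the query.

    The query is treated as an n-gram (n = number of words typed) and compared
    against every n-gram of each segment, so small spelling errors still match.
    """
    query = " ".join(query.lower().split())  # normalise case and whitespace
    n = len(query.split())
    if n == 0:
        return []

    hits = []
    for seg in segments:
        tokens = seg["text"].lower().split()
        # Best similarity between the query and any same-length n-gram.
        best = max(
            (SequenceMatcher(None, query, gram).ratio() for gram in ngrams(tokens, n)),
            default=0.0,
        )
        if best >= threshold:
            hits.append((best, seg["start_sec"], seg["text"]))

    # Most similar segments first.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


if __name__ == "__main__":
    # "tempel" is a deliberate misspelling of "temple".
    for score, start, text in fuzzy_search("tempel", SEGMENTS):
        print(f"{start:>6.1f}s  {score:.2f}  {text}")
```

In this sketch a misspelt query such as "tempel" still retrieves the segment that mentions the temple, along with the point in the recording where it occurs. Raising the threshold makes the search stricter, while lowering it tolerates more spelling variation at the cost of more false matches.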

Users can type in the English script and translate it into Kannada or directly type in Kannada, which will display results from the audio corpus. Explaining the real-world impact of this model, Aparna said that if villagers want to know more about a nearby temple, they can just search for it. “It will display all the times the temple has been mentioned, and they can learn more about it. From education and health to ancient remedies and bedtime stories, the audio will help villagers develop their local knowledge,” she said.

“The next step is to work on a voice-based search mechanism, and not just typing, as it is meant for rural areas. These individuals have very little formal education; through this feature we want to reach the last mile,” said Aparna.

Team members:

Sharath Srivatsa (PhD Scholar, Web Science Lab, IIIT Bangalore)

Sai Madhavan G. (iMTECH student, IIIT Bangalore)

T. B. Dinesh (iruWay Rural Research Lab, Janastu)
