Bengaluru-based artificial intelligence startup Sarvam AI has launched Sarvam 2B, a two-billion-parameter open-source large language model designed specifically for Indian languages.
Nandan Nilekani, co-founder of Infosys Ltd., India's second-largest tech giant, hailed the launch as being in the spirit of independence, given that the model is trained on an internal database and will be able to perform specific tasks in 10 Indian languages with great efficiency.
The launch of Sarvam 2B is part of a broader initiative by Sarvam AI to introduce a suite of products tailored for both enterprise usage and open-source communities. The firm's full-stack GenAI platform is designed to support a wide range of tasks and features voice-enabled functionalities.
It supports English as well as 10 Indian languages: Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu.
The model is trained on a data mixture containing equal parts of English and Indic tokens.
The company also launched Shuka v1, which will natively understand audio in Indian languages and is built by combining two models: Sarvam AI's in-house audio encoder, Saaras v1, and Meta's Llama3-8B-Instruct as the decoder.