Microsoft Corp. has used Karya to source local speech data for its AI products. Alphabet Inc.’s Google is leaning on Karya and other local partners to gather speech data in 85 Indian districts. Google plans to expand to every district, covering the majority language or dialect spoken in each, and to build a generative AI model for 125 Indian languages. “Over 70 Indian languages spoken by over a million people each had zero digital corpus.” Microsoft, he learned, had been paying a hefty amount to collect speech data, albeit of poor quality, to feed its AI systems and research.
Source: Mint November 03, 2023 18:55 UTC