
The appointment comes at a time when the company is moving towards building proprietary AI models after finding off-the-shelf tools insufficient for its audio storytelling platform.
Sharma has spent 13 years working in areas such as computer vision and natural language processing at companies including Meta, Tesla, and Amazon Lab126. He will lead Pocket FM’s AI research and applied innovation initiatives. He has published over 90 papers at top-tier AI conferences and contributed to Meta projects including Llama, Chameleon, and DINOv2.
Pocket FM is an audio series platform that produces episodic, long-form storytelling content such as audio dramas and serialised novels.
The technical hurdle
The company, which is based in Los Angeles and Bengaluru, claims to have over 250 million listeners globally. It plans to develop specialised AI models for creative writing and audio production rather than rely on general-purpose AI systems from Google and other providers.
“The generic models from Google or other companies are pretty good at doing a basic job,” said Prateek Dixit, Co-founder of Pocket FM, in an interview with YourStory. “But when it comes to specialised tasks and increasing quality from 80% to 95% or 99%, we face challenges there.”
Vasu Sharma, the newly appointed head of AI, said generic AI models fall short on multiple fronts for Pocket FM’s needs. According to him, current systems lack the dramatic flair essential to storytelling, are designed to sound neutral rather than emotional, and perform poorly in regional languages beyond English.
“These models are trained to be more tone agnostic and more neutral sounding,” said Sharma. “They’re intentionally non-dramatised, which is literally against the core ethos of what writing brings to the table.”
The platform also finds it challenging to manage extremely long-form content with generic models. Some of Pocket FM’s audio series span over 1,000 hours, requiring models that can maintain narrative consistency across thousands of chapters—a capability current foundation models lack.
“Present models can handle maybe a million tokens on the input side and maximum 100,000 tokens on the output side,” Sharma explained. “That’s not even close to good enough for us.”
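To put those limits in perspective, here is a rough back-of-the-envelope estimate in Python. The narration pace (150 words per minute) and the tokens-per-word ratio (1.3) are illustrative assumptions, not figures from Pocket FM; only the 1,000-hour series length and the two token limits come from the statements above.

```python
# Rough estimate of how many tokens a 1,000-hour audio series represents.
# WORDS_PER_MINUTE and TOKENS_PER_WORD are assumptions for illustration only.
WORDS_PER_MINUTE = 150   # assumed narration pace
TOKENS_PER_WORD = 1.3    # assumed tokenizer ratio for English text

SERIES_HOURS = 1_000     # "over 1,000 hours", as cited in the article
INPUT_LIMIT = 1_000_000  # "maybe a million tokens on the input side"
OUTPUT_LIMIT = 100_000   # "maximum 100,000 tokens on the output side"


def series_tokens(hours: float,
                  words_per_minute: float = WORDS_PER_MINUTE,
                  tokens_per_word: float = TOKENS_PER_WORD) -> int:
    """Approximate token count for a narrated series of the given length."""
    return round(hours * 60 * words_per_minute * tokens_per_word)


tokens = series_tokens(SERIES_HOURS)
input_ratio = tokens / INPUT_LIMIT    # how many full input windows are needed
output_ratio = tokens / OUTPUT_LIMIT  # how many maximum-length outputs are needed

print(f"Estimated tokens: {tokens:,}")
print(f"Input windows needed: {input_ratio:.1f}")
print(f"Maximum-length outputs needed: {output_ratio:.0f}")
```

Under these assumptions, a single 1,000-hour series works out to roughly 11.7 million tokens, or about a dozen full input windows, which is why a model cannot simply be handed the whole story at once.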
The company must solve problems like ensuring a concept introduced in chapter one can be recalled and utilized in chapter 1,000, while tracking individual character arcs and narrative threads across extended storylines.
Building from scratch—sort of
Pocket FM won’t train foundation models from the ground up, a process Sharma estimates costs between $10 million and $1 billion. Instead, the company will build on existing open-source models with extensive customization.
“These things do not work for us out of the box, and even with basic levels of fine-tuning, it just doesn’t work,” Sharma said. The team plans substantial modifications to address context length limitations, multilingual performance, and narrative quality.
Beyond writing models, Pocket FM is also developing proprietary text-to-speech (TTS) systems. While English TTS models are adequate, Sharma said even leading providers perform poorly on regional Indian languages, including Hindi. The company requires narration capable of dramatic dialogue delivery, a need that standard conversational AI, which may be adequate for customer service interactions, cannot meet.
Pocket FM plans to handle the need for emotional expression through reinforcement learning and reward function design that prioritise dramatic speech, according to Sharma.
Reward function design involves assigning numerical values (rewards or penalties) to an agent’s actions, guiding it to learn optimal behaviour by maximising cumulative reward.
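As a minimal sketch of that idea, and not a description of Pocket FM’s actual system, the Python example below uses a toy agent that chooses between two hypothetical delivery styles. The action names, the reward values, and the epsilon-greedy strategy are all assumptions made for illustration.

```python
import random

# Hypothetical delivery styles the agent can pick (illustrative only).
ACTIONS = ["neutral", "dramatic"]


def reward(action: str) -> float:
    """Toy reward function: reward dramatic delivery, penalise neutral delivery."""
    return 1.0 if action == "dramatic" else -0.5


def train(steps: int = 500, epsilon: float = 0.1, seed: int = 0):
    """Epsilon-greedy learning: mostly exploit the best-known action, sometimes explore."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}  # running average reward per action
    count = {a: 0 for a in ACTIONS}    # how often each action was tried
    total = 0.0                        # cumulative reward collected so far
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)                      # explore
        else:
            action = max(ACTIONS, key=lambda a: value[a])     # exploit
        r = reward(action)
        count[action] += 1
        value[action] += (r - value[action]) / count[action]  # incremental mean
        total += r
    return value, total


value, total = train()
best = max(value, key=value.get)
print(f"Learned values: {value}")
print(f"Preferred style: {best}, cumulative reward: {total}")
```

Because the reward function scores dramatic delivery higher, the agent ends up preferring it. In a production system the reward would be far harder to define, since it would have to score real speech rather than a label.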
The company’s advantage, Sharma said, is access to substantial training data from its existing audio content library.

Expansion plans
Proprietary AI infrastructure will power two tools: a public-facing platform simple enough for a 10-year-old to use, and a sophisticated internal system for the company’s network of over 3 lakh writers. The latter will help with narrative structures, scene definition, and other technical aspects of long-form storytelling.
Pocket FM is also eyeing expansion into comics, anime, and eventually feature films as AI video generation improves, though Sharma acknowledged current models aren’t yet capable of supporting those formats.
Despite high inference costs for AI-generated content, the economics remain favourable, Sharma said. “It is still a lot cheaper and faster, or even in some cases, it’s almost impossible to do it without AI.”
The company emphasises it’s pursuing quality over pure volume. “We just don’t want to generate millions and millions of hours of content with pure AI very quickly,” Sharma said. “The goal is to generate high-quality content a lot faster than what we’ve been able to do conventionally.”
Pocket FM has implemented ‘content moderation systems’ to prevent misuse of its AI tools, said Sharma, as part of a “very strict QC process” to ensure platform integrity.
Edited by Swetha Kannan
