Scaling GenAI: Production Challenges Unveiled

S1 E013

DEVOPS

Jan 11, 2024

About Speaker

Sreedhar Gade

Vice President, Engineering, Freshworks

Sreedhar, the VP of Engineering at Freshworks, has a career spanning nearly two decades at leading technology, financial, and internet organizations such as Yahoo! Inc. and Microsoft. A recognized industry leader, entrepreneur, speaker, and author, he is known for his insights on emerging trends and leadership. Sreedhar actively shares his expertise through blogs, social channels, and conferences, focusing on career guidance and leadership. He collaborates with global industry leaders, emphasizing rapid career growth strategies. He believes in democratizing success principles, ensuring equal access to information, execution plans, and mentorship for all, regardless of location or financial status.

About Host

Sathya Narayanan Nagarajan

Co-founder and CTO, Amnic

Sathya is an experienced technologist with over two decades in Artificial Intelligence (AI), Electric Vehicles (EV), and Distributed Systems. As the Co-founder and CTO of Amnic, he drives the development of a cloud intelligence platform, emphasizing efficiency, cost reduction, and reliability. Sathya has held leadership roles at Ola Electric Mobility, Ola Cabs, Yahoo, and many other internet companies. With 11 patents in AI, EV, and Distributed Systems, he is committed to knowledge sharing and guiding industry thought leaders.

Summary of Podcast

In this podcast, Sreedhar Gade discusses the complexities of scaling Generative AI (GenAI) for production, focusing on capacity planning, continuous integration, and cost optimization. During the bootstrapping phase, businesses identify use cases and decide which hyperscalers to engage with; in production, they must focus on continuous scaling, monetization, and staying competitive by adapting to new models. Sreedhar addresses the challenges of implementing Continuous Integration and Continuous Delivery (CI/CD) and Continuous Training for GenAI models, emphasizing the need for custom solutions to handle functionality and performance issues. He also discusses strategies for optimizing large language model usage, such as adjusting token usage based on model success and categorizing use cases for specialized models. He identifies capacity planning, model performance, and cost as the major challenges in scaling GenAI for production, and suggests strategies such as switching between models, dual deployment methods, and iterative model pipelines to address them. To manage and deploy public and private models together effectively, Sreedhar recommends monitoring metrics such as the number of tokens, cost, performance, system governance, acceptance rate, and the time taken for model responses. The discussion of scaling and governance continues in the next episode.
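
The metrics mentioned in the summary (tokens, cost, acceptance rate, response time) and the idea of switching between models can be illustrated with a small Python sketch. This is not the approach described in the episode; it is a minimal illustration under assumptions. The `ModelStats` class, the `pick_model` function, the per-1,000-token prices, and the acceptance threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    """Running totals for one model. All names and prices are illustrative."""
    price_per_1k_tokens: float  # assumed price, not a real vendor rate
    calls: int = 0
    tokens: int = 0
    accepted: int = 0
    total_latency_s: float = 0.0

    def record(self, tokens: int, latency_s: float, accepted: bool) -> None:
        """Add one model response to the running totals."""
        self.calls += 1
        self.tokens += tokens
        self.total_latency_s += latency_s
        self.accepted += int(accepted)

    @property
    def cost(self) -> float:
        """Total spend so far, derived from the token count."""
        return self.tokens / 1000 * self.price_per_1k_tokens

    @property
    def acceptance_rate(self) -> float:
        """Share of responses that users accepted."""
        return self.accepted / self.calls if self.calls else 0.0

    @property
    def avg_latency_s(self) -> float:
        """Mean time taken per response, in seconds."""
        return self.total_latency_s / self.calls if self.calls else 0.0


def pick_model(stats: dict[str, ModelStats], min_acceptance: float) -> str:
    """Pick the cheapest model per call that meets the acceptance floor.

    If no model meets the floor, fall back to the one with the highest
    acceptance rate. This selection rule is an assumption for illustration.
    """
    eligible = {
        name: s for name, s in stats.items()
        if s.calls and s.acceptance_rate >= min_acceptance
    }
    if eligible:
        return min(eligible, key=lambda n: eligible[n].cost / eligible[n].calls)
    return max(stats, key=lambda n: stats[n].acceptance_rate)
```

A caller would record each response against the model that produced it (for example one public and one private model) and periodically call `pick_model` to decide where new requests should go.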

About Amnic

Amnic is a cloud cost observability platform that helps businesses measure and rightsize their cloud costs. Amnic helps businesses visualize, analyze, and optimize their cloud spend, in turn building a lean cloud infrastructure. Amnic offers out-of-the-box solutions that help break down cloud bills and provide greater visibility into and understanding of cloud costs, along with recommendations to lower spend, alerts, and anomaly detection.

Amnic delivers a wide range of features including K8s visibility, cost analyzer, alerts and custom reporting, budgeting, forecasting, and smart tagging. DevOps and SRE teams rely on Amnic to deliver a simplified view into their cloud costs, allowing them to maintain governance and build a culture of cost optimization. Set up in 5 minutes and get a 30-day free trial.

Visit www.amnic.com to get started.
