PUBLISHER: IDC | PRODUCT CODE: 1817382

Beyond GenAI Model Training: Reducing Cost and Latency and Improving Scalability of AI Inferencing Workloads in Production

PUBLISHED:
PAGES: 18 Pages
DELIVERY TIME: 1-2 business days
PDF (Single User License)
USD 7500

This IDC Perspective explores the challenges and innovations in scaling generative AI (GenAI) inference workloads in production, with an emphasis on reducing cost, improving latency, and increasing scalability. It highlights techniques such as model compression, batching, caching, and parallelization for optimizing inference performance. Vendors including AWS, DeepSeek, Google, IBM, Microsoft, NVIDIA, Red Hat, Snowflake, and WRITER are driving advancements that improve the efficiency and sustainability of GenAI inference. The document advises organizations to align inference strategies with their use cases, review costs regularly, and partner with experts to ensure reliable, scalable AI deployment.

"Optimizing AI inference isn't just about speed," says Kathy Lange, research director, AI Software, IDC. "It's about engineering the trade-offs between cost, scalability, and sustainability to unlock the potential of generative AI in production, where innovation meets business impact."

Product Code: US52959725

Executive Snapshot

Situation Overview

  • What Is AI Inference, and Why Is It Important?
  • Growing Demand for Efficient AI Inference
    • The GenAI Inference Infrastructure Stack
    • Factors That Influence GenAI Inference Performance
      • Model Compression Techniques
      • Data Batching Techniques
      • Caching and Memoization Techniques
      • Efficient Data Loading and Preprocessing
      • Reducing Input and Output Sizes
      • Parallelization
      • Model Routing
      • Which Software Platform Optimization Techniques Are Considered Most Effective?
      • Test-Time Compute (aka Inference-Time Compute)
      • An Emerging Field of Research
    • Technology Supplier Innovation

Advice for the Technology Buyer

Learn More

  • Related Research
  • Synopsis