Technology

Red Hat Expands AWS Partnership to Deliver Faster, Cost-Efficient AI Inference at Enterprise Scale

Red Hat AI Inference Server support for AWS AI silicon will enter developer preview in January 2026.

Red Hat has announced an expanded collaboration with Amazon Web Services (AWS) to accelerate enterprise adoption of generative AI by optimizing AI inference across AWS infrastructure. The partnership brings together Red Hat AI, AWS AI silicon (Inferentia2 and Trainium3) and Red Hat OpenShift to deliver high-performance, cost-efficient AI deployments for hybrid cloud environments.

Meeting Enterprise Demand for Scalable AI
As enterprises rapidly adopt generative AI, demand for inference-optimized compute continues to rise. IDC forecasts that by 2027, 40% of organizations will use custom silicon, including Arm processors and AI-specific chips, to improve performance and reduce costs. Red Hat and AWS aim to meet this demand through a full-stack AI strategy that supports scalability, flexibility and cost efficiency.

Joe Fernandes, vice president and general manager of the AI Business Unit at Red Hat, said: “By enabling our enterprise-grade Red Hat AI Inference Server, built on the innovative vLLM framework, with AWS AI chips, we’re empowering organizations to deploy and scale AI workloads with enhanced efficiency and flexibility. Building on Red Hat’s open source heritage, this collaboration aims to make generative AI more accessible and cost-effective across hybrid cloud environments.”

Key Enhancements Under the Red Hat–AWS Collaboration
1. Red Hat AI Inference Server Optimized for AWS AI Chips

Red Hat AI Inference Server, powered by the vLLM framework, will run on AWS Inferentia2 and Trainium3 chips, enabling the following (a short client sketch follows this list):

  • 30–40% better price-performance than comparable GPU-based EC2 instances
  • Lower latency and faster inference for production AI workloads
  • Support for a broad range of generative AI models
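Because the server is built on vLLM, it exposes vLLM's OpenAI-compatible serving API. As a minimal, non-authoritative sketch, the snippet below queries such an endpoint with the standard openai Python client; the endpoint URL, model id, and prompt are placeholders, not details from the announcement.

```python
# Minimal client sketch: calling a Red Hat AI Inference Server endpoint
# through vLLM's OpenAI-compatible API. The base_url, model id, and
# prompt below are illustrative placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://inference.example.internal:8000/v1",  # placeholder endpoint
    api_key="EMPTY",  # vLLM's server does not require a real key by default
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Give a one-line status summary."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Because the API surface is the de facto OpenAI standard, existing applications can point at the new endpoint without code changes beyond the base URL.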

2. Native AI Support on Red Hat OpenShift

Red Hat and AWS have jointly created an AWS Neuron operator for:

  • Red Hat OpenShift
  • Red Hat OpenShift AI
  • Red Hat OpenShift Service on AWS

This integration offers enterprises a seamless way to run AI workloads using AWS accelerators within their Kubernetes environments.
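To make this concrete, here is a hedged sketch of what consuming a Neuron device from Kubernetes typically looks like, using the kubernetes Python client. It assumes the operator installs AWS's Neuron device plugin, which advertises the extended resource name aws.amazon.com/neuron; the pod name and container image are placeholders.

```python
# Hedged sketch: scheduling a pod onto a Neuron-backed node, assuming the
# Neuron device plugin exposes the extended resource "aws.amazon.com/neuron".
# Pod name and image are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="neuron-inference-demo"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="inference-server",
                image="registry.example.com/inference-server:latest",  # placeholder
                resources=client.V1ResourceRequirements(
                    limits={"aws.amazon.com/neuron": "1"}  # request one Neuron device
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

The same request can equally be expressed as a plain pod manifest; the point is that Neuron capacity is scheduled like any other Kubernetes extended resource.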

3. Simplified AI Deployment & Automation
Red Hat is making access to AWS AI chips easier through:

  • Integrated support for high-demand AI accelerators
  • The new amazon.ai Certified Ansible Collection for automating AI services on AWS (see the sketch after this list)
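The announcement does not detail the collection's contents, so the following is only a hypothetical sketch of driving such automation from Python with the ansible-runner library; the workspace path and playbook name are placeholders, and the playbook itself would reference amazon.ai collection content.

```python
# Hypothetical sketch: invoking an Ansible playbook from Python with
# ansible-runner. The private_data_dir and playbook name are placeholders;
# in practice the playbook would use amazon.ai collection content.
import ansible_runner

result = ansible_runner.run(
    private_data_dir="/tmp/ai-automation",  # runner workspace (placeholder)
    playbook="deploy_inference.yml",        # placeholder playbook
)
print(result.status, result.rc)  # e.g. "successful", 0
```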

Colin Brace, vice president of Annapurna Labs at AWS, said: “Enterprises demand solutions that deliver exceptional performance, cost efficiency, and operational choice for mission-critical AI workloads. AWS designed its Trainium and Inferentia chips to make high-performance AI inference and training more accessible and cost-effective. Our collaboration with Red Hat provides customers with a supported path to deploying generative AI at scale, combining the flexibility of open source with AWS infrastructure and purpose-built AI accelerators to accelerate time-to-value from pilot to production.”

Both companies are contributing to upstream optimization for vLLM, which powers Red Hat’s inference engine and the open-source llm-d project used for large-scale model serving. The AWS Neuron operator is now live in the Red Hat OpenShift OperatorHub.
