OpenAI and Broadcom unveil Jalapeño chip, cutting inference costs by 50%

OpenAI's first custom chip, co-developed with Broadcom in nine months, promises to halve inference costs and reduce dependence on Nvidia GPUs.

OpenAI and Broadcom unveiled Jalapeño, a custom inference chip that Broadcom's chief executive said cuts inference costs by about 50%, threatening Nvidia's dominance in AI silicon.

"By designing more of the stack ourselves, we can serve more intelligence with greater efficiency," Greg Brockman, president and co-founder of OpenAI, said in a statement. "Jalapeño is part of our long-term full-stack infrastructure strategy to make compute more abundant."

The chip, developed from initial design to tape-out in nine months, is a blank-slate architecture for large language model inference rather than an adaptation of earlier AI accelerators. Engineering samples are already running GPT-5.3-Codex-Spark at production target frequency and power, with early testing showing "substantially better" performance per watt than current state-of-the-art chips, according to OpenAI. Broadcom shares rose about 2% following the announcement, though they later traded down about 3% as the broader semiconductor sector fell.

The partnership marks a strategic shift for OpenAI, which has been one of Nvidia's largest GPU buyers since the generative AI boom began in 2022. By designing its own silicon, OpenAI aims to reduce procurement costs at a time when demand for inference compute is exploding. Initial deployment of Jalapeño-based systems is expected by the end of 2026, with plans to scale to gigawatt-level data centers alongside Microsoft and other partners.

Jalapeño is an application-specific integrated circuit, or ASIC, built for LLM inference. Unlike Nvidia's general-purpose graphics processing units, which handle training and inference across diverse workloads, an ASIC trades flexibility for efficiency on targeted tasks. OpenAI said the architecture reduces data movement and balances compute, memory, and networking resources to achieve utilization "much closer to theoretical peak performance." Broadcom contributed its Tomahawk networking silicon and chip implementation expertise, while Celestica handled board, rack, and system integration.
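
To see why utilization matters for serving costs, consider a rough back-of-the-envelope sketch. It assumes a simple model in which cost per token is the hourly cost of a system divided by the tokens it actually serves in that hour. The function name and every number below are hypothetical and chosen only for illustration; none are figures from OpenAI, Broadcom, or Nvidia, and the sketch does not describe how Jalapeño reaches its claimed savings.

```python
def cost_per_million_tokens(hourly_cost_usd, peak_tokens_per_second, utilization):
    """Cost to serve one million tokens at a given fraction of peak throughput.

    All inputs are illustrative assumptions, not vendor data.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Tokens actually served per hour at this utilization level.
    tokens_per_hour = peak_tokens_per_second * utilization * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical system: same hourly cost and peak throughput in both cases,
# with only the achieved utilization differing.
baseline = cost_per_million_tokens(10.0, 10_000, utilization=0.30)
improved = cost_per_million_tokens(10.0, 10_000, utilization=0.60)

print(f"baseline: ${baseline:.3f} per million tokens")
print(f"improved: ${improved:.3f} per million tokens")
print(f"ratio:    {improved / baseline:.2f}")
```

Under these assumptions, doubling utilization at a fixed hourly cost halves the cost per token. In practice, chip price, power, and memory and networking limits would all shift the result.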

The chip is the first in a planned multi-generation compute platform. OpenAI has also struck deals with Amazon Web Services for Trainium chips, as well as with Advanced Micro Devices and Cerebras, as part of a deliberate strategy to diversify away from Nvidia. The company said the nine-month development cycle may be the fastest ASIC development ever achieved in high-performance semiconductors, accelerated in part by OpenAI's own models helping to design and optimize the chip.

For investors, the implications cut both ways. Broadcom, whose shares have multiplied nearly sevenfold since the end of 2022, secures a high-volume custom chip customer in OpenAI, diversifying its AI revenue beyond networking. Nvidia, which has dominated the AI chip market with its GPUs, faces a top customer building an alternative for inference, the fastest-growing segment of AI compute. OpenAI did not disclose the total cost of the program or the per-chip price, but Broadcom Chief Executive Hock Tan described the collaboration as "just the beginning of a multi-generation roadmap" enabling gigawatt-scale data center deployments beginning in 2026.

This article is for informational purposes only and does not constitute investment advice.