Arm Holdings has introduced Arm Neoverse Compute Subsystems (CSS), a program intended to streamline and speed up the integration of Arm Neoverse-based technology into new computing designs. The program was officially unveiled at the Hot Chips 2023 technical conference, held at Stanford University.
Neoverse represents Arm’s cutting-edge server-side technology, designed to offer high-performance capabilities while retaining the power efficiency for which Arm’s mobile components are renowned. The CSS program empowers Arm’s partners to construct specialized silicon with greater cost-efficiency and speed compared to previous discrete IP solutions.
The inaugural CSS offering, Arm CSS N2, is rooted in the Neoverse N2 platform initially introduced in 2020. CSS N2 empowers Arm’s partners with a customizable compute subsystem, allowing them to concentrate their efforts on enhancing memory, I/O operations, acceleration, and other critical aspects.
Unlike conventional chipmakers such as Intel, Arm supplies designs only, and licensees are responsible for producing their own silicon. For a licensee, this typically entails substantial work on the underlying processor design, followed by validation, integration, and testing with any custom IP components it wishes to incorporate.
CSS is devised to furnish a more comprehensive design approach, significantly reducing the workload typically shouldered by licensees. The primary objective is to simplify and expedite the development of data-center-grade processors rooted in the Neoverse framework.
As Jeff Defilippi, senior director of product management at Arm, emphasized during a pre-briefing conference call, “If we take a step back and look at 2018 when we introduced our Neoverse product line… a majority of the cloud workloads were running on general-purpose servers.” These servers, he noted, employed general-purpose processors intended to cater to a wide range of workloads. However, a fundamental shift towards specialized processing, capable of handling sophisticated cloud-native workloads, was essential. CSS N2 is architected to meet this precise demand.
The CSS N2 design is configurable from 24 to 64 cores, with clock speeds ranging from 2.1GHz to 3.6GHz. Each core has 1MB of private L2 cache, and the cores share a 64MB system cache. The design also supports up to eight channels of DDR5 memory and 64 lanes of PCIe 5.0 connectivity. CSS N2 is aimed at silicon designs for scale-out clouds, AI applications, 5G infrastructure, data processing units (DPUs/SmartNICs), and networking equipment.
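The configuration envelope quoted above can be captured as a small range check. This is only an illustrative sketch: the class, field, and constant names are hypothetical and do not correspond to any Arm tool or deliverable, and the limits are simply the figures given in this article.

```python
from dataclasses import dataclass

# Limits as quoted in the article for CSS N2. All names here are
# hypothetical and exist only for illustration.
MIN_CORES, MAX_CORES = 24, 64
MIN_GHZ, MAX_GHZ = 2.1, 3.6
L2_PER_CORE_MB = 1        # private L2 per core
SYSTEM_CACHE_MB = 64      # shared system cache
MAX_DDR5_CHANNELS = 8
MAX_PCIE5_LANES = 64


@dataclass(frozen=True)
class CssN2Config:
    cores: int
    clock_ghz: float
    ddr5_channels: int = MAX_DDR5_CHANNELS
    pcie5_lanes: int = MAX_PCIE5_LANES

    def validate(self) -> list:
        """Return a list of out-of-range problems; empty means in range."""
        problems = []
        if not MIN_CORES <= self.cores <= MAX_CORES:
            problems.append(f"cores={self.cores} outside {MIN_CORES}-{MAX_CORES}")
        if not MIN_GHZ <= self.clock_ghz <= MAX_GHZ:
            problems.append(f"clock={self.clock_ghz}GHz outside {MIN_GHZ}-{MAX_GHZ}GHz")
        if not 1 <= self.ddr5_channels <= MAX_DDR5_CHANNELS:
            problems.append(f"ddr5_channels={self.ddr5_channels} exceeds {MAX_DDR5_CHANNELS}")
        if not 0 <= self.pcie5_lanes <= MAX_PCIE5_LANES:
            problems.append(f"pcie5_lanes={self.pcie5_lanes} exceeds {MAX_PCIE5_LANES}")
        return problems

    def total_l2_mb(self) -> int:
        """Aggregate private L2 across all cores (separate from system cache)."""
        return self.cores * L2_PER_CORE_MB


top_bin = CssN2Config(cores=64, clock_ghz=3.6)
too_big = CssN2Config(cores=128, clock_ghz=4.0)
```

Here `top_bin.validate()` returns an empty list, while `too_big.validate()` reports both the core count and the clock speed as out of range. Note that the per-core L2 and the shared system cache are tracked separately, since the former scales with core count and the latter does not.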
Arm Neoverse Compute Subsystems are readily available from Arm Holdings, offering a compelling option for those interested in crafting custom silicon solutions.
The need for faster innovation and deployment of compute engines in the field is evident, with increasing demand from both customers and chip manufacturers for improved performance and cost-effectiveness. Arm’s unveiling of the “Genesis” N2 Compute Subsystems, or CSS, intellectual property package at Hot Chips 2023 underscores the company’s commitment to accelerating the deployment of Arm CPUs.
Over the years, Arm has progressively moved towards providing comprehensive CPU solutions, allowing customers greater flexibility in designing and expediting their path to market. Initially, server chip designers faced significant challenges, as an Arm architectural license necessitated substantial investment in chip development, which was both costly and time-consuming.
Arm’s Neoverse initiative, launched in October 2018, marked a significant step in easing the chip development process. It offered not only a roadmap for server chip cores but also reference architectures that combined Arm’s own intellectual property, such as on-die mesh interconnects, with third-party memory, PCI-Express, and Ethernet controllers. These Neoverse designs were optimized for specific process nodes at Taiwan Semiconductor Manufacturing Co., further streamlining development for server chip makers.
The CSS intellectual property package from Arm is designed to accelerate chip design, ultimately saving both time and money. By providing a pre-validated, production-ready RTL deliverable, along with essential implementation components like floor plans, implementation scripts, and physical IP libraries, Arm empowers customers to optimize performance and power efficiency on leading-edge technology.
Moreover, Arm supplies a comprehensive software reference stack, encompassing firmware, power management, system management, and runtime security components. This gives customers a solid starting point and allows software development to commence immediately. In addition, Arm’s support for leading-edge technologies, such as CXL memory expansion and pooling, further enhances the value of the CSS package.
For regions or organizations that lack a deep bench of engineers experienced in advanced server CPU design, or the necessary design and testing tools, the CSS approach offers a compelling solution. It not only expedites chip development but also makes it feasible for such organizations to bring a chip to production at all, narrowing the gap in CPU innovation.
The potential savings of 80 engineer-years cited for CSS are substantial, particularly given that considerable scope for customization remains. On that basis, the value proposition of CSS-based chips compared to traditional chip designs is clear. This initiative aligns with the growing demand for specialized silicon, especially as the era of Moore’s Law draws to a close. By simplifying and accelerating chip development, CSS aims to reduce costs and time-to-market, providing a competitive edge for companies seeking customized CPU solutions.
Several critical questions arise in evaluating the value of CSS designs versus those requiring extensive chip maker efforts. These questions revolve around the cost-effectiveness of bringing a chip from concept to deployment in servers, network devices, or storage arrays, compared to using Intel or AMD x86 servers or Arm chips from Ampere Computing. Is it worth the effort?
The answer appears to be a resounding “yes.” Major players like AWS and Alibaba are venturing into crafting their own Arm chips, with Google reportedly considering a similar move. Additionally, industry giants like Microsoft, Tencent, Baidu, Alibaba, Google, and Oracle are embracing Ampere Computing’s Altra Arm chips. Arm CPUs are becoming a significant component of their server fleets, offering substantial cost savings. These companies gain more control through direct efforts and establish indirect control via close partnerships with Ampere Computing.
Despite this trend, hyperscalers and cloud builders will continue procuring Intel and AMD CPUs, primarily to support legacy Windows Server and, occasionally, Linux applications. They deliberately charge a premium for instances relying on these CPUs, as do Intel and AMD for the underlying chips. While no collusion is occurring, Intel and AMD appear content to cede a portion of the hyperscale and cloud market (15%, 20%, or potentially 25%) to Arm while keeping the lion’s share (85%, 80%, or 75%, respectively) of a larger market without engaging in a price war.
The CSS implementation centered on the “Perseus” N2 core scales from 24 to 64 cores per die, with the option of combining four such chiplets in a package to reach 256 cores in a socket, using UCI-Express (UCIe) or proprietary interconnects between chiplets according to customer preference.
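The socket-level arithmetic is simple enough to sketch. The snippet below assumes the core count applies per chiplet (24 to 64) and treats four chiplets per package as an upper bound, which is taken from the 256-core example rather than from any stated Arm limit. The function names are hypothetical.

```python
import math

MIN_CORES_PER_DIE, MAX_CORES_PER_DIE = 24, 64
# Assumption: four chiplets is used as a cap because it is the example
# given in the article, not because Arm documents it as a hard limit.
MAX_DIES_PER_PACKAGE = 4


def socket_cores(cores_per_die: int, dies: int) -> int:
    """Total cores in a socket built from identical CSS N2 chiplets."""
    if not MIN_CORES_PER_DIE <= cores_per_die <= MAX_CORES_PER_DIE:
        raise ValueError("cores_per_die must be between 24 and 64")
    if not 1 <= dies <= MAX_DIES_PER_PACKAGE:
        raise ValueError("dies must be between 1 and 4")
    return cores_per_die * dies


def dies_needed(target_cores: int, cores_per_die: int = MAX_CORES_PER_DIE) -> int:
    """Smallest number of chiplets that reaches at least target_cores."""
    dies = math.ceil(target_cores / cores_per_die)
    if dies > MAX_DIES_PER_PACKAGE:
        raise ValueError("target exceeds a four-chiplet package")
    return dies


full_socket = socket_cores(64, 4)  # 64 * 4 = 256 cores
```

Under these assumptions, a 100-core target would need two 64-core chiplets, and anything above 256 cores is rejected.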
Given the anticipated demand for high-performance computing (HPC) and artificial intelligence (AI) vector calculations in modern processors, the absence of a CSS for V2 designs is regrettable. Hopefully, such an offering will emerge in the future, particularly with the advent of V3 designs.
For now, Arm is launching the CSS initiative exclusively with the N2 designs, which sit in the middle of its roadmap.
The Genesis CSS N2 package was presented in detail, with schematics and block diagrams, by Anitha Kona, an Arm Fellow and a lead system architect at the chip IP designer.
It’s worth noting that the CSS N2 package complies with SystemReady standards, including Arm Base System Architecture 1.0, Arm Server Base System Architecture 6.1, and Arm Server Base Boot Requirements 1.2.
While the N2 core represents Arm’s initial Armv9 implementation, the V2 core isn’t far behind, considering Nvidia’s use of the V2 core in its Grace CPUs. There’s a possibility of collaboration between Nvidia and Arm in designing the V2 core, similar to the partnership between Fujitsu and Arm for the tongue-in-cheek “V0 core.”
The N2 core has two 128-bit SVE2 vector units, while the V2 core has four. Consequently, there’s a growing need for a CSS V2 offering in the near future, ideally without a code name like “Exodus.”
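To put the vector difference in concrete terms, the sketch below counts how many data elements each core could process per cycle if every 128-bit SVE2 unit were busy. This is a rough peak figure under that assumption only; it ignores issue limits, memory bandwidth, and clock speed, and the names used are hypothetical.

```python
VECTOR_BITS = 128                     # SVE2 vector width on both cores
VECTOR_UNITS = {"N2": 2, "V2": 4}     # unit counts as stated in the article


def elements_per_cycle(core: str, element_bits: int) -> int:
    """Peak elements processed per cycle, assuming all vector units are busy."""
    if VECTOR_BITS % element_bits != 0:
        raise ValueError("element size must divide the vector width")
    lanes_per_unit = VECTOR_BITS // element_bits
    return VECTOR_UNITS[core] * lanes_per_unit


n2_fp64 = elements_per_cycle("N2", 64)   # 2 units * 2 lanes = 4
v2_fp64 = elements_per_cycle("V2", 64)   # 4 units * 2 lanes = 8
```

By this simple measure the V2 core offers twice the per-cycle vector width of the N2 at any element size, which is why it is the more natural fit for HPC and AI vector work.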
In summary, Anitha Kona asserts that Genesis IP package licensees can leverage the CSS N2 stack to differentiate their products based on memory, I/O, accelerators, and physical topology. This enables them to go from project kickoff to functional silicon in 13 months, with savings equivalent to 80 engineer-years of development effort. Note that the two figures come from different Arm partners who were early adopters of the Genesis initiative; both point to the substantial time and cost savings available in Arm CPU chip design.