ARMv9 Architecture
Arm recently released Armv9, the latest generation of its architecture. Built on the widely used Armv8, it is intended to be Arm's architecture for the next decade, setting the stage for what Arm hopes will be the computing platform for the next 300 billion chips.

The Arm architecture is organized into three profiles: the A-profile for general-purpose application processors, the R-profile for real-time processors, and the M-profile for microcontrollers. Armv9 is expected to deliver performance gains of more than 30 percent over the next two generations of mobile and infrastructure CPUs, and the first mobile processors based on Armv9 CPUs are expected as early as the end of this year.
It’s been nearly 10 years since Arm first announced its Armv8 architecture in October 2011, and it has been an eventful decade in computing: the instruction set architecture has seen growing adoption from mobile devices to servers, and is now starting to gain popularity in consumer markets such as laptops and, soon, desktops. Over the years, Arm has made various updates and extensions to the ISA, some important and some easily overlooked.

What exactly is the difference between Armv9 and Armv8 that justifies such a big jump in ISA naming?
According to media analysis, purely from an ISA perspective, v9 is probably not as fundamental a jump as v8 was over v7. Armv8 introduced a completely different execution state and instruction set, AArch64, which had a much greater microarchitectural impact than AArch32, bringing a larger register file (31 general-purpose 64-bit registers), a 64-bit virtual address space, and many other improvements.
Armv9 keeps AArch64 as the baseline instruction set but adds some important functional extensions that justify a new architecture number. The new number may also let Arm achieve a kind of software rebase, covering not only the new v9 features but also the various v8 extensions released over the years.
Armv9 rests on three new pillars, which Arm describes as the main goals of the new architecture: security, AI, and improved vector and DSP capabilities. Security is a big topic for v9 and deserves a closer look at the new extensions and features, while the DSP and AI side is more straightforward. Probably the biggest new feature of Armv9-compatible CPUs that developers and users will see right away is SVE2, which becomes a baseline feature as the successor to NEON.

The Scalable Vector Extension, or SVE, was first announced in 2016 and was first implemented in Fujitsu’s A64FX CPU core, which now powers Fugaku, Japan’s supercomputer and the world’s number one. The problem with SVE was that the first iteration of this variable-vector-length SIMD instruction set was quite limited in scope and aimed mainly at HPC workloads, lacking many of the more general-purpose instructions that are still covered by NEON.
SVE2, announced in April 2019, aims to address this by supplementing the scalable SIMD instruction set with the instructions needed for the more diverse DSP-like workloads that still use NEON (Arm’s fixed-width 128-bit SIMD extension).
Besides adding a variety of modern SIMD (single instruction, multiple data) capabilities, the key benefit of SVE and SVE2 is their variable vector length: hardware can implement any length from 128 to 2048 bits in 128-bit increments, and the same code runs regardless of which length the actual hardware implements. From a pure vector processing and programming perspective, this means developers only need to compile their code once: if a future CPU has, say, a native 512-bit SIMD execution pipeline, the code can already take advantage of the full width of the unit.
Likewise, the same code runs on more conservative designs with narrower execution hardware, which matters to Arm since it designs CPUs for everything from IoT and mobile devices to data centers. SVE also does all of this within the 32-bit instruction encoding space of the Arm architecture, whereas x86 has had to add new extensions and instructions for each vector width (SSE, AVX, AVX-512).
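To make the vector-length-agnostic idea concrete, here is a small Python model of how an SVE-style loop walks over an array. It is a simulation, not SVE code: `vla_add` and its `vector_bits` parameter are hypothetical, and on real hardware the width comes from the CPU rather than from an argument. The comments point to the C intrinsics from Arm’s ACLE (`svcntw`, `svwhilelt_b32`) that play the analogous roles.

```python
def vla_add(a, b, vector_bits):
    """Element-wise add of two equal-length lists, modeled as an SVE-style
    vector-length-agnostic loop. vector_bits stands in for the hardware
    vector length; real SVE code is never told it in advance."""
    if vector_bits % 128 != 0 or not 128 <= vector_bits <= 2048:
        raise ValueError("vector length must be a multiple of 128 bits in [128, 2048]")
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    lanes = vector_bits // 32          # 32-bit lanes per vector (cf. svcntw())
    n = len(a)
    out = [0.0] * n
    i = 0
    iterations = 0
    while i < n:
        # Predicate: lanes past the end of the data are inactive
        # (cf. svwhilelt_b32(i, n)), so no scalar tail loop is needed.
        active = [i + lane < n for lane in range(lanes)]
        for lane in range(lanes):
            if active[lane]:
                out[i + lane] = a[i + lane] + b[i + lane]
        i += lanes
        iterations += 1
    return out, iterations


# The same "binary" run on three modeled vector widths.
a = [float(x) for x in range(10)]
b = [1.0] * 10
for bits in (128, 256, 512):
    result, iters = vla_add(a, b, bits)
    print(f"{bits}-bit vectors: {iters} iterations, result {result}")
```

The loop never hard-codes a width: it reads the lane count once and uses a predicate to mask off lanes past the end of the array. For the 10-element input, the modeled 128-, 256- and 512-bit machines need 3, 2 and 1 iterations respectively, while producing identical results.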

Machine learning is also an important part of Armv9, as Arm expects more and more ML workloads to become commonplace in the coming years. Running ML workloads on dedicated accelerators will naturally remain a requirement wherever performance or power efficiency is critical; however, there will still be significant adoption of new, smaller-scale ML workloads that run on the CPU.
Matrix multiplication instructions are key here, and making them a baseline feature of v9 CPUs is an important step toward broader adoption across the ecosystem.
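The article does not name the matrix instruction. As an assumption, the sketch below models the int8 matrix multiply-accumulate from Arm’s I8MM extension (SMMLA), which, within each 128-bit vector segment, multiplies a 2x8 matrix of signed bytes by an 8x2 matrix and adds the 2x2 int32 product to an accumulator. The function `smmla` is a hypothetical reference model for illustration, not an Arm API.

```python
def smmla(acc, a, b):
    """Hypothetical reference model of one 128-bit segment of an int8
    matrix multiply-accumulate (modeled on SMMLA).

    acc: 2x2 list of int32 accumulators
    a:   2x8 signed 8-bit matrix (row-major)
    b:   8x2 signed 8-bit matrix, stored as 2 rows of 8 (one row per column)
    Returns acc + a @ b as a new 2x2 list.
    """
    for m in (a, b):
        if len(m) != 2 or any(len(row) != 8 for row in m):
            raise ValueError("operands must be 2 rows of 8 values")
        if any(not -128 <= v <= 127 for row in m for v in row):
            raise ValueError("operands must be signed 8-bit values")
    return [
        [acc[i][j] + sum(a[i][k] * b[j][k] for k in range(8)) for j in range(2)]
        for i in range(2)
    ]


# A tiny quantized layer: two input rows times two weight columns.
x = [[1, 2, 3, 4, 5, 6, 7, 8],
     [-1, 0, 1, 0, -1, 0, 1, 0]]
w_cols = [[1] * 8,                       # column 0: sums each input row
          [2, 0, 0, 0, 0, 0, 0, 0]]      # column 1: 2 * first element
print(smmla([[0, 0], [0, 0]], x, w_cols))
```

Each call performs 2 x 2 x 8 = 32 multiply-accumulates, which is why a single instruction of this kind can cover a useful block of a quantized matrix product.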
Overall, SVE2 is probably the most important factor justifying the jump to the v9 name, since it is a clear ISA feature that distinguishes v9 CPUs from v8 CPUs in everyday use and gives the software ecosystem a reason to move beyond the existing v8 software stack. This has become quite an issue for Arm in the server space, where the software ecosystem still largely builds packages for Armv8.0 and therefore misses the all-important v8.1 Large System Extensions (LSE), which add atomic memory instructions.
Moving the whole software ecosystem forward, so that software can assume the features of new v9 hardware, would help move things along and may resolve some of these problems.
However, v9 is not just about SVE2 and new instructions; it also places a strong focus on security, where we will see some more radical changes. The new Arm Confidential Compute Architecture (CCA) aims to protect sensitive data with hardware-based security. So-called “Realms” can be created dynamically to shield important data and code from the rest of the system.
Armv9 introduces several security features. The first is CCA, which performs confidential computing by creating a hardware-based secure execution environment that protects portions of code and data from being accessed or modified, even by privileged software.
To protect data, Arm CCA introduces dynamically created Realms, which are available to all applications and run separately from both the secure and non-secure worlds. In business applications, for example, Realms can protect commercially sensitive data and code wherever they are in the system, whether in use, at rest, or in transit.
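The article does not describe how Realms are enforced. As an assumption based on the Realm Management Extension (RME) that underpins CCA, the toy model below assigns each physical memory granule to an owning world and checks every access against a fixed table: Non-secure software (including a hypervisor) can reach only Non-secure memory, Secure and Realm code can additionally reach their own memory, and the Root world (firmware) can reach everything. `World`, `ALLOWED` and `GranuleTable` are hypothetical names; real hardware keeps this ownership information in a Granule Protection Table managed by firmware.

```python
from enum import Enum


class World(Enum):
    """Physical address spaces in this toy model (after Arm RME)."""
    NON_SECURE = "non-secure"
    SECURE = "secure"
    REALM = "realm"
    ROOT = "root"


# Which address spaces code running in each world may touch (simplified).
ALLOWED = {
    World.NON_SECURE: {World.NON_SECURE},
    World.SECURE: {World.SECURE, World.NON_SECURE},
    World.REALM: {World.REALM, World.NON_SECURE},
    World.ROOT: set(World),
}


class GranuleTable:
    """Hypothetical per-granule ownership table checked on every access."""
    GRANULE = 4096  # bytes per granule in this sketch

    def __init__(self):
        self.owner = {}  # granule index -> World; unlisted granules are Non-secure

    def delegate(self, addr, world):
        # In a real system only Root-world firmware may change ownership.
        self.owner[addr // self.GRANULE] = world

    def may_access(self, accessor, addr):
        target = self.owner.get(addr // self.GRANULE, World.NON_SECURE)
        return target in ALLOWED[accessor]


gpt = GranuleTable()
realm_page = 0x8000_0000
gpt.delegate(realm_page, World.REALM)   # memory handed to a newly created Realm
for who in World:
    print(who.value, "->", gpt.may_access(who, realm_page + 0x10))
```

In this model the Non-secure hypervisor is refused access to the Realm’s page even though it is the most privileged software in its own world, which is the property the paragraph above describes.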
“The Memory Tagging Extension is another security technology in the Armv9 architecture,” said Richard Grisenthwaite. “After analyzing the large number of security issues reported in software around the world, we found that the root cause of many of them was actually the old problem of memory safety. These problems have plagued computing for 50 years, and two memory safety issues have been especially common for many years: buffer overflows and use-after-free. Identifying these memory safety vulnerabilities before they can be exploited is a critical step in improving software security worldwide.”
Developed through Arm’s ongoing collaboration with Google, the Memory Tagging Extension (MTE) detects spatial and temporal memory safety issues in software: it lets software associate a tag with a pointer to memory and checks that the tag is correct whenever the pointer is used. According to Grisenthwaite, MTE is an integral part of the first generation of Armv9 CPUs, which will be available next year. Software support for MTE is also being introduced in Android 11 and in open-source software.
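The tagging scheme described above can be modeled in a few lines of Python. This sketch assumes MTE’s published layout of a 4-bit tag stored in pointer bits 59:56 and one allocation tag per 16-byte granule of memory. `TaggedHeap` is a hypothetical bump allocator that assigns tags in a fixed cycle (a real allocator would pick them randomly) so that results are reproducible. It catches both bugs named in the quote: a buffer overflow into a neighbouring allocation and a use-after-free.

```python
GRANULE = 16                    # MTE tracks one allocation tag per 16 bytes
TAG_SHIFT = 56                  # logical tag sits in pointer bits 59:56
ADDR_MASK = (1 << TAG_SHIFT) - 1


class TagCheckFault(Exception):
    """Raised where real hardware would report a tag check fault."""


class TaggedHeap:
    """Hypothetical bump allocator that tags memory the way MTE does."""

    def __init__(self):
        self.mem_tags = {}      # granule index -> 4-bit tag (0 = untagged)
        self.granules = {}      # allocation base address -> granule count
        self.next_addr = 0x1000
        self.last_tag = 0

    def _tag_range(self, addr, count, tag):
        for g in range(count):
            self.mem_tags[addr // GRANULE + g] = tag

    def malloc(self, size):
        addr = self.next_addr
        count = -(-size // GRANULE)             # round up to whole granules
        self.next_addr += count * GRANULE
        self.last_tag = self.last_tag % 15 + 1  # cycle 1..15: neighbours differ
        self._tag_range(addr, count, self.last_tag)
        self.granules[addr] = count
        return (self.last_tag << TAG_SHIFT) | addr   # tagged pointer

    def free(self, ptr):
        addr = ptr & ADDR_MASK
        # Re-tag freed memory so any stale pointer no longer matches.
        self._tag_range(addr, self.granules.pop(addr), 0)

    def tag_matches(self, ptr):
        ptr_tag = (ptr >> TAG_SHIFT) & 0xF
        mem_tag = self.mem_tags.get((ptr & ADDR_MASK) // GRANULE, 0)
        return ptr_tag == mem_tag

    def access(self, ptr):
        """Model a load or store through ptr; returns the untagged address."""
        if not self.tag_matches(ptr):
            raise TagCheckFault(f"tag mismatch at {ptr & ADDR_MASK:#x}")
        return ptr & ADDR_MASK


heap = TaggedHeap()
buf = heap.malloc(32)            # two granules, tag 1
other = heap.malloc(16)          # placed right after buf, tag 2
heap.access(buf + 31)            # last byte of buf: tags match
try:
    heap.access(buf + 32)        # one past the end lands in other's granule
except TagCheckFault as err:
    print("buffer overflow caught:", err)
heap.free(buf)
try:
    heap.access(buf)             # stale pointer after free
except TagCheckFault as err:
    print("use-after-free caught:", err)
```

Because tags are tracked per 16-byte granule, an overflow that stays inside the last, partially used granule of an allocation goes unnoticed in this model; only accesses that cross into differently tagged memory fault.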
Beyond these specific improvements, Arm is promising broader performance gains with Armv9. The company expects CPU performance to rise by more than 30 percent over the next two CPU generations, with further gains from software and hardware optimizations. Arm says all existing software will run on Armv9-based processors without problems.
After the v9 announcement, Samsung Electronics, Xiaomi, Google, OPPO, and others appeared on the official list of supporting partners, but some sources said Apple is expected to ship the first Armv9 chip; after all, the world’s first 64-bit smartphone processor was Apple’s A7 from 2013, which debuted in the iPhone 5S.
Apple has long bought an architecture (instruction set) license and designed its own cores rather than using Arm’s off-the-shelf cores, which in theory lets it adopt a new architecture much faster. At the v9 launch, Arm said commercial products would arrive at the end of 2021. That timing seems too late for the A15 processor, but the M-series Mac processors might make it.