Developing Software Optimized for AWS Graviton Processors
While most scale-out applications run seamlessly on AWS Graviton, optimizing code to target the architecture unlocks additional performance and efficiency gains.
Here are the key compiler flags, libraries and architectural capabilities to leverage for maximizing Graviton’s price/performance advantages:
- Enable Arm Neoverse N1 Tuning
Specify the N1 target explicitly so the compiler can schedule for the core and vectorize math operations:
– -mcpu=neoverse-n1
– -march=armv8.2-a+fp16+rcpc+dotprod+crypto (optional; -mcpu=neoverse-n1 already implies the matching architecture features)
- Vectorize Data Parallel Code
Use the 128-bit NEON SIMD registers (Graviton3 adds 256-bit SVE) and vectorized math libraries to parallelize floating-point operations, including matrix math, encoding/decoding, and cryptography.
- Scale Thread Count
Every Graviton vCPU is a physical core (there is no SMT), so spreading work across the high core count still requires testing different thread counts to find the sweet spot for contention avoidance. Profile thread pools, queue depths, and lock hotspots.
- Configure Optimal Cache Usage
Graviton’s large L1/L2 caches reduce trips to memory. Ensure data structure sizing, layout and access patterns maximize cache hits.
- Offload to Specialist Hardware
Further acceleration is possible for machine learning workloads: Graviton3 adds bfloat16 support for faster inference, and heavier training jobs can be offloaded to dedicated accelerators such as AWS Inferentia or Trainium. Networking and storage virtualization are already offloaded to the AWS Nitro System.
Upfront performance testing and optimization analysis ensure you unlock the full capabilities of the underlying Graviton hardware. Our teams can assist in tailoring code for peak efficiency.
Get in touch to assess optimization headroom or conduct joint performance benchmarking for your apps on Graviton!