Developing Software Optimized for AWS Graviton Processors

While most scale-out applications run seamlessly on AWS Graviton, optimizing code to target the architecture unlocks additional performance and efficiency gains.


The compiler flags, libraries, and architectural features below are the main levers for getting the most out of Graviton’s price/performance:



  1. Enable Arm Neoverse N1 Tuning


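Graviton2 is built on Arm Neoverse N1 cores. Tell GCC or Clang to target them so the compiler can use the core's instruction-set extensions and tune instruction scheduling for it. Use one of the following, not both:


– -mcpu=neoverse-n1

– -march=armv8.2-a+simd (optionally with -mtune=neoverse-n1)

The -mcpu form sets the architecture and the tuning together. Later Graviton generations use different cores, so match the value to the instance family you deploy on.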
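As an illustration of wiring the tuning flag into a build script, here is a minimal Python sketch. The helper name `graviton_cflags`, the `-O2` baseline, and the generation-to-core table are assumptions made for this example. The table reflects the cores each Graviton generation is generally documented as using, and the newer names need a compiler recent enough to recognize them.

```python
import platform

# Assumed mapping from Graviton generation to the -mcpu value.
# Graviton2 uses Neoverse N1; the later entries are listed for comparison.
GRAVITON_MCPU = {
    2: "neoverse-n1",
    3: "neoverse-v1",
    4: "neoverse-v2",
}


def graviton_cflags(generation=2, machine=None):
    """Return C compiler flags for a native build on the given generation."""
    machine = machine or platform.machine()
    flags = ["-O2"]
    # Add Arm tuning only on 64-bit Arm hosts
    # ("aarch64" on Linux, "arm64" on some other systems).
    if machine.lower() in ("aarch64", "arm64"):
        flags.append("-mcpu=" + GRAVITON_MCPU[generation])
    return flags
```

The returned list can be joined into CFLAGS or passed to a build tool. On non-Arm hosts the helper leaves the flags untouched, so the same script works on a mixed fleet.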



  2. Vectorize Data Parallel Code


Use Graviton's SIMD registers (128-bit NEON on every generation, 256-bit SVE on Graviton3) and optimized math libraries to speed up data-parallel work such as matrix math, encoding/decoding, and cryptography.



  3. Scale Thread Count


Spreading work across Graviton's high core counts means testing different thread counts to find the point where contention is lowest. Profile thread pool sizes, queue depths, and lock hot spots instead of assuming one setting fits every workload.
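A thread-count sweep can be as simple as timing the same workload under several pool sizes. The sketch below is one way to do it in Python. The function names and the SHA-256 workload are placeholders chosen for illustration. Hashing is used because `hashlib` releases the interpreter lock on large buffers, so the threads really do run in parallel.

```python
import hashlib
import os
import time
from concurrent.futures import ThreadPoolExecutor


def hash_chunk(chunk):
    # Stand-in workload; replace with the real unit of work.
    return hashlib.sha256(chunk).hexdigest()


def candidate_counts(max_workers=None):
    """Powers of two below the core count, then the core count itself."""
    limit = max_workers or os.cpu_count() or 1
    counts = []
    n = 1
    while n < limit:
        counts.append(n)
        n *= 2
    counts.append(limit)
    return counts


def sweep_thread_counts(chunks, counts):
    """Run the workload once per worker count; return {count: seconds}."""
    timings = {}
    for n in counts:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=n) as pool:
            list(pool.map(hash_chunk, chunks))
        timings[n] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    data = [bytes(1024 * 1024) for _ in range(16)]  # 16 chunks of 1 MiB
    for n, secs in sweep_thread_counts(data, candidate_counts()).items():
        print(f"{n:3d} threads: {secs:.3f} s")
```

Run it on the target instance type and repeat each measurement a few times, since a single timing is noisy. The best count is often below the core count when the work contends on shared locks or memory bandwidth.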



  4. Configure Optimal Cache Usage


Graviton’s large L1/L2 caches reduce trips to memory. Size hot data structures to fit in cache, lay them out contiguously, and access them sequentially where possible so that most reads are cache hits.
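To make "sizing for cache" concrete, the following Python sketch estimates the largest square tile for a blocked matrix multiply whose working tiles all fit in a given cache. The function name is hypothetical, and the cache figures in the usage lines (64 KiB L1 data cache, 1 MiB L2 per core, 64-byte lines) are assumptions based on commonly cited Graviton2 numbers, so confirm them for your instance type.

```python
import math


def max_square_tile(cache_bytes, elem_bytes=8, tiles=3, line_bytes=64):
    """Largest tile edge such that `tiles` square tiles fit in cache_bytes.

    The edge is rounded down to a whole number of cache lines so that
    each tile row starts on a line boundary when the data is aligned.
    """
    per_line = line_bytes // elem_bytes  # elements per cache line
    edge = math.isqrt(cache_bytes // (tiles * elem_bytes))
    return max(per_line, (edge // per_line) * per_line)


if __name__ == "__main__":
    # Assumed Graviton2-style sizes; three tiles of 8-byte doubles (A, B, C).
    print("L1-sized tile edge:", max_square_tile(64 * 1024))
    print("L2-sized tile edge:", max_square_tile(1024 * 1024))
```

Treat the result as an upper bound. Real code shares the cache with the stack and other data, so a somewhat smaller tile usually performs better and is worth confirming with a benchmark.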



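  5. Offload to Specialist Hardware


Further acceleration is possible by pairing Graviton with specialized hardware. Graviton itself has no tensor processing unit, so heavy machine learning work is better placed on dedicated accelerator instances such as AWS Inferentia or Trainium. Networking is already offloaded from the CPU by the AWS Nitro cards that back every Graviton instance, rather than by DPUs integrated into the processor.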



Testing and profiling performance up front helps you unlock the full capabilities of the underlying Graviton hardware. Our teams can assist in tailoring code to reach peak efficiency.


Get in touch to assess optimization headroom or conduct joint performance benchmarking for your apps on Graviton!
