Developing Software Optimized for AWS Graviton Processors
While most scale-out applications run seamlessly on AWS Graviton, optimizing code to target the architecture unlocks additional performance and efficiency gains.
Here are the key compiler flags, libraries and architectural capabilities to leverage for maximizing Graviton’s price/performance advantages:
- Enable Arm Neoverse N1 Tuning
Specify the N1 target explicitly so the compiler can schedule for the core and vectorize math operations:
– -mcpu=neoverse-n1
– -march=armv8.2-a+fp16+rcpc+dotprod+crypto (optional; -mcpu=neoverse-n1 already implies the matching architecture features)
- Vectorize Data Parallel Code
Use the 128-bit NEON SIMD registers (Graviton3 adds 256-bit SVE) and vectorized math libraries to parallelize floating-point operations, including matrix math, encoding/decoding, and cryptography.
- Scale Thread Count
Every Graviton vCPU is a physical core (there is no SMT), so spreading work across the high core count still requires testing different thread counts to find the sweet spot for contention avoidance. Profile thread pools, queue depths, and lock hotspots.
- Configure Optimal Cache Usage
Graviton’s large L1/L2 caches reduce trips to memory. Ensure data structure sizing, layout and access patterns maximize cache hits.
- Offload to Specialist Hardware
Further acceleration is possible for machine learning workloads: Graviton3 adds bfloat16 support for faster inference, and heavier training jobs can be offloaded to dedicated accelerators such as AWS Inferentia or Trainium. Networking and storage virtualization are already offloaded to the AWS Nitro System.
Upfront performance testing and optimization analysis ensure you unlock the full capabilities of the underlying Graviton hardware. Our teams can assist in tailoring code for peak efficiency.
Get in touch to assess optimization headroom or conduct joint performance benchmarking for your apps on Graviton!