Hardware-Software Codesign

VEXP: A Low-Cost RISC-V ISA Extension for Accelerated Softmax Computation in Transformers

We design a custom RISC-V ISA extension for Bfloat16 exponentiation with 1% area overhead, achieving 162.7x latency reduction and 74.3x energy savings for Softmax, and enabling efficient end-to-end Transformer inference on multi-cluster MCU systems.
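
For orientation, the following is a minimal software sketch of where exponentiation sits inside Softmax when values are held in Bfloat16. It is not the VEXP instruction or the authors' implementation: `to_bfloat16` and `softmax_bf16` are hypothetical helper names, Bfloat16 is emulated by rounding float32 bit patterns, and the exponential is computed with the standard library rather than a hardware unit.

```python
import math
import struct


def to_bfloat16(x: float) -> float:
    """Emulate Bfloat16 by rounding a float32 bit pattern to its upper 16 bits.

    Assumption: round-to-nearest-even; NaN handling is not addressed here.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Bias that implements round-to-nearest-even on the discarded lower 16 bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + rounding_bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def softmax_bf16(xs):
    """Numerically stable Softmax with intermediates rounded to Bfloat16."""
    m = max(xs)
    # One exponentiation per element: this is the step a dedicated
    # exponentiation instruction would replace.
    exps = [to_bfloat16(math.exp(to_bfloat16(x - m))) for x in xs]
    total = sum(exps)
    return [to_bfloat16(e / total) for e in exps]


if __name__ == "__main__":
    print(softmax_bf16([1.0, 2.0, 3.0]))
```

Subtracting the maximum before exponentiating keeps every argument at or below zero, so each exponential lies in (0, 1] and cannot overflow the narrow Bfloat16 range.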