On-device neural network tuning is essential for adapting pre-trained models to individual users and environments while preserving data privacy. However, ultra-low-power edge devices face significant computational and memory constraints that make …
We design a custom RISC-V ISA extension for Bfloat16 exponentiation that incurs only a 1% area overhead, achieving a 162.7x latency reduction and 74.3x energy savings for Softmax and enabling efficient end-to-end Transformer inference on multi-cluster MCU systems.
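To illustrate why a Bfloat16 exponentiation unit targets the Softmax bottleneck, the following sketch emulates a bf16 exp step in software by truncating float32 values to their top 16 bits. This is a numerical illustration only, not the paper's hardware design; the function names `to_bf16` and `softmax_bf16` are hypothetical.

```python
import numpy as np

def to_bf16(x):
    """Truncate float32 values to Bfloat16 precision (keep the top 16 bits:
    sign, 8-bit exponent, 7-bit mantissa)."""
    u = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (u & 0xFFFF0000).view(np.float32)

def softmax_bf16(logits):
    """Numerically stable softmax whose exponentiation runs at Bfloat16
    precision, mimicking a hardware bf16 exp instruction."""
    x = to_bf16(logits)
    x = x - np.max(x)            # subtract the max for numerical stability
    e = to_bf16(np.exp(x))       # the exp step a bf16 exp unit would accelerate
    return e / np.sum(e)

probs = softmax_bf16(np.array([1.0, 2.0, 3.0], dtype=np.float32))
```

Because every element of the input passes through `exp`, accelerating that one operation in hardware dominates the Softmax latency; the division and reduction are comparatively cheap.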