On-device neural network tuning is essential for adapting pre-trained models to individual users and environments while preserving data privacy. However, ultra-low-power edge devices face significant computational and memory constraints that make …
We design a custom RISC-V ISA extension for Bfloat16 exponentiation that incurs only a 1% area overhead, achieving a 162.7x latency reduction and 74.3x energy savings for Softmax and enabling efficient end-to-end Transformer inference on multi-cluster MCU systems.
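To illustrate why a Bfloat16 exponentiation unit targets the Softmax bottleneck, the following sketch emulates a bf16 exp step in software by truncating float32 values to their top 16 bits. This is a numerical illustration only, not the paper's hardware design; the function names `to_bf16` and `softmax_bf16` are hypothetical.

```python
import numpy as np

def to_bf16(x):
    """Truncate float32 values to Bfloat16 precision (keep the top 16 bits:
    sign, 8-bit exponent, 7-bit mantissa)."""
    u = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (u & 0xFFFF0000).view(np.float32)

def softmax_bf16(logits):
    """Numerically stable softmax whose exponentiation runs at Bfloat16
    precision, mimicking a hardware bf16 exp instruction."""
    x = to_bf16(logits)
    x = x - np.max(x)            # subtract the max for numerical stability
    e = to_bf16(np.exp(x))       # the exp step a bf16 exp unit would accelerate
    return e / np.sum(e)

probs = softmax_bf16(np.array([1.0, 2.0, 3.0], dtype=np.float32))
```

Because every element of the input passes through `exp`, accelerating that one operation in hardware dominates the Softmax latency; the division and reduction are comparatively cheap.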