Project Description
Improving speculative decoding on edge GPUs through an architectural bias that induces n-gram statistics in the draft model
- Distilled drafters learn the n-gram patterns of their target models, which improves the token acceptance rate
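The token acceptance rate can be illustrated with a minimal sketch of greedy (temperature-0) speculative decoding. This is not the project's method. The drafter, target, and helper names (`speculative_round`, `acceptance_rate`, `toy_target`, `bigram_drafter`) are hypothetical. A lookup-table bigram model stands in for "n-gram statistics", and a fixed function of the last token stands in for the target model. The drafter proposes `k` tokens, the target keeps the longest prefix that agrees with its own greedy choices, and the acceptance rate is the fraction of proposed tokens that were kept.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Token = int
NextTokenFn = Callable[[Sequence[Token]], Token]


def speculative_round(prefix: Sequence[Token], draft: NextTokenFn,
                      target: NextTokenFn, k: int) -> Tuple[List[Token], int]:
    """One round of greedy speculative decoding (hypothetical helper).

    Returns (tokens appended this round, number of draft tokens accepted).
    """
    # 1) The drafter proposes k tokens autoregressively.
    proposal: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2) The target verifies. A real system scores all k positions in one
    #    batched forward pass; this toy calls the target once per position.
    out: List[Token] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if tok != expected:
            out.append(expected)       # target's correction replaces the rejected draft
            return out, len(out) - 1   # every token before the correction was accepted
        out.append(tok)
        ctx.append(tok)
    out.append(target(ctx))            # all drafts accepted: target adds a bonus token
    return out, k


def acceptance_rate(prompt: Sequence[Token], draft: NextTokenFn, target: NextTokenFn,
                    k: int, rounds: int) -> Tuple[float, List[Token]]:
    """Accepted draft tokens divided by proposed draft tokens over several rounds."""
    seq = list(prompt)
    accepted = 0
    for _ in range(rounds):
        new, n_ok = speculative_round(seq, draft, target, k)
        seq.extend(new)
        accepted += n_ok
    return accepted / (k * rounds), seq


# --- Hypothetical toy models, for illustration only ---
def toy_target(ctx: Sequence[Token]) -> Token:
    """Stand-in for the target model's greedy choice: a fixed function of the last token."""
    return (3 * ctx[-1] + 1) % 10


def bigram_drafter(table: Dict[Token, Token], default: Token = 0) -> NextTokenFn:
    """Drafter that conditions only on the previous token (a 2-gram lookup table)."""
    def draft(ctx: Sequence[Token]) -> Token:
        return table.get(ctx[-1], default)
    return draft


if __name__ == "__main__":
    full = bigram_drafter({0: 1, 1: 4, 4: 3, 3: 0})   # matches every target transition
    partial = bigram_drafter({0: 1, 1: 4, 3: 0})      # missing the 4 -> 3 transition
    for name, drafter in [("full", full), ("partial", partial)]:
        rate, seq = acceptance_rate([0], drafter, toy_target, k=4, rounds=3)
        print(name, round(rate, 3), seq)
```

In this toy setting, the drafter whose bigram table covers every transition the target makes has all of its proposals accepted. The drafter missing one transition keeps 8 of 12 proposals. Under greedy verification both runs emit exactly the target's own greedy sequence, so the acceptance rate changes speed, not output.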
Research Classification
- Electrical engineering, computer engineering, and information engineering
Research Interests
- Efficient AI
- TinyML
- Sustainable Computing
Faculty
Faculty of Applied Science