Tutorial T5

Title: Embedded Machine Learning: Design, Optimizations, and Applications

Abstract: By integrating AI into small embedded systems, we can harness the power of the billions of devices already in our lives without depending on costly extra equipment. We can build cheaper devices that adapt to our daily lives and profoundly change how we interact with the environment around us. In this 3-hour tutorial, three speakers will cover the hardware/software co-design of accelerators, performance autotuning in AI chips, and several novel applications.

List of topics and speakers

Topic 1: Fundamentals of Machine Learning Algorithm and Accelerator Co-design and Its Applications in Autonomous Systems (45 mins + 15 mins Q&A)

Abstract: High-quality machine learning (ML) solutions require joint optimization of algorithms and their hardware accelerators. In the first talk, we introduce the fundamentals of both ML accelerator and algorithm design, which pave the way to efficient, practical, system-level co-design in real applications. The talk is arranged as follows:

  • Part 1 (10–12 min): we introduce the recipes and guidelines for high-performance ML accelerator design, such as data type and precision choices, parallelization, and memory system optimizations.
  • Part 2 (10–12 min): we discuss the prevailing approaches to automated high-quality ML algorithm search, i.e., neural architecture search (NAS). Example approaches include reinforcement learning, differentiable search, and evolutionary algorithms.
  • Part 3 (5 min): we march into the joint area of accelerator and algorithm co-design, present classic and representative approaches, and discuss their exciting applications in autonomous systems. More advanced techniques and applications will be discussed in the second talk.
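As a flavor of the data-type and precision trade-offs touched on in Part 1, here is a minimal sketch (illustrative only, not taken from the tutorial materials) of symmetric fixed-point quantization, the kind of low-precision mapping an accelerator designer weighs against accuracy:

```python
# Illustrative sketch: symmetric per-tensor quantization of a weight
# vector to signed 8-bit integers, with one shared scale factor.
def quantize_symmetric(xs, bits=8):
    """Map floats to signed `bits`-bit integers with one shared scale."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 127 for int8
    scale = max(abs(x) for x in xs) / qmax     # largest value maps to qmax
    qs = [max(-qmax, min(qmax, round(x / scale))) for x in xs]
    return qs, scale

def dequantize(qs, scale):
    return [q * scale for q in qs]

weights = [0.42, -1.30, 0.07, 0.88, -0.55]     # made-up example weights
qs, scale = quantize_symmetric(weights)
recovered = dequantize(qs, scale)
max_error = max(abs(w - r) for w, r in zip(weights, recovered))
```

With round-to-nearest, the reconstruction error stays within half a quantization step (`scale / 2`), which is the basic accuracy-versus-bit-width trade-off that precision selection explores.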

Speaker: Dr. Cong (Callie) Hao is an Assistant Professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology (GaTech), Atlanta, USA, where she currently holds the Sutterfield Family Early Career Professorship. She is the principal investigator of the software/hardware co-design lab (Sharc Lab). She received her Ph.D. degree in Electrical Engineering from Waseda University, Japan, in 2017, and her M.S. and B.S. degrees in Computer Science and Engineering from Shanghai Jiao Tong University. She was a postdoctoral researcher in ECE at GaTech from September 2020 to August 2021, and previously a postdoctoral researcher in ECE at the University of Illinois at Urbana-Champaign (UIUC). Her primary research interests lie in the joint area of efficient hardware design and machine learning algorithms. She is passionate about reconfigurable and high-efficiency computing and about building useful electronic design automation tools.

Topic 2: Practice on Performance Autotuning in AI Compute Chip (45 mins + 15 mins Q&A)

Abstract: Modern AI compute SoCs have abundant computation cores, interconnects, and high-bandwidth memory (HBM) resources. Faced with an enormous variety of neural network architectures, programming AI compute chips to make full use of the system resources becomes challenging. In this talk, I will share an automated code generation framework, “Autotuning through design space pruning & schedule templates,” used to program a commercial AI compute chip and achieve improvements in both accelerator performance and coding efficiency.
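To give a rough feel for design space pruning in autotuning, the toy sketch below (all names, budgets, and the cost model are assumptions for illustration, not the framework from the talk) enumerates tiling factors for a matrix multiply, prunes points that violate compute or on-chip buffer budgets, and ranks the survivors with a simple analytical latency model:

```python
from itertools import product

# Illustrative, simplified autotuner for a 1024^3 GEMM schedule template.
# PE_LIMIT and BUF_LIMIT are assumed hardware budgets, not real chip specs.
M, N, K = 1024, 1024, 1024
PE_LIMIT = 4096          # assumed number of processing elements
BUF_LIMIT = 2 ** 18      # assumed on-chip buffer capacity (words)

def cost(tm, tn, tk):
    """Rough latency model: tiles run serially; each tile is fully parallel
    across tm*tn PEs, so its cycle count tracks the reduction depth tk."""
    tiles = (M // tm) * (N // tn) * (K // tk)
    return tiles * tk

candidates = []
for tm, tn, tk in product([16, 32, 64, 128], repeat=3):
    if tm * tn > PE_LIMIT:                        # prune: not enough PEs
        continue
    if tm * tk + tk * tn + tm * tn > BUF_LIMIT:   # prune: tiles overflow buffer
        continue
    candidates.append(((tm, tn, tk), cost(tm, tn, tk)))

best, best_cost = min(candidates, key=lambda c: c[1])
```

Pruning infeasible points before evaluation is what keeps the search tractable; a schedule template then fixes the loop structure so only these numeric factors need tuning.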

Speaker: Dr. Peipei Zhou joined the ECE department of the University of Pittsburgh as a Tenure-Track Assistant Professor in September 2021. She obtained her Ph.D. in Computer Science from the University of California, Los Angeles in 2019, supervised by Prof. Jason Cong, who leads the UCLA VAST (VLSI Architecture, Synthesis and Technology) Group and CDSC (the Center for Domain-Specific Computing). Her major interests are in customized computer architecture and programming abstractions for applications including healthcare (e.g., precision medicine) and artificial intelligence. She has received an “Outstanding Recognition in Research” award from the UCLA Samueli School of Engineering in 2019, the 2019 TCAD Donald O. Pederson Best Paper Award, a 2018 ICCAD Best Paper Nomination, and a 2018 ISPASS Best Paper Nomination.

Topic 3: Towards Independent On-Device AI: Inference without Battery and Learning without Labels (45 mins + 15 mins Q&A)

Abstract: This talk consists of two parts: inference without a battery and learning without labels for on-device machine learning algorithms. Both directions work towards a more independent on-device AI. The maturation of energy harvesting (EH) technology and the recent emergence of intermittent computing, which stores harvested energy in energy storage and supports an episode of program execution during each power cycle, create the opportunity to build sophisticated battery-less, energy-neutral sensors. By deploying lightweight DNNs onto EH-powered devices, persistent, event-driven sensing and decision capabilities can be achieved. However, harvested energy is usually weak and unpredictable, and even lightweight DNNs take multiple power cycles to finish one inference. To eliminate the indefinitely long wait to accumulate energy for one inference and to optimize accuracy, we developed a power-trace-aware and exit-guided network compression algorithm that compresses and deploys multi-exit neural networks to EH-powered microcontrollers (MCUs) and selects exits during execution according to the available energy.
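To illustrate the exit-selection idea in isolation, the sketch below is a hypothetical simplification (the exit names, energy costs, and accuracies are invented numbers, not results from the talk): given the energy available in the current power cycle, pick the deepest exit of a multi-exit network that the budget can afford, or wait if none fits.

```python
# Hypothetical multi-exit DNN profile on an energy-harvesting MCU.
# All numbers are illustrative assumptions, not measured values.
EXITS = [
    {"name": "exit1", "energy_uj": 40,  "accuracy": 0.71},   # shallowest
    {"name": "exit2", "energy_uj": 90,  "accuracy": 0.80},
    {"name": "exit3", "energy_uj": 160, "accuracy": 0.86},   # full network
]

def select_exit(available_energy_uj):
    """Pick the most accurate exit the current energy budget allows."""
    feasible = [e for e in EXITS if e["energy_uj"] <= available_energy_uj]
    if not feasible:
        return None            # too little energy: wait for the next power cycle
    return max(feasible, key=lambda e: e["accuracy"])

choice = select_exit(100)      # e.g. 100 uJ harvested this cycle
```

The actual algorithm in the talk additionally uses the harvested power trace to guide compression offline; this sketch shows only the runtime accuracy-versus-energy decision.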

Another challenge for on-device AI is how to adapt automatically to new environments without excessive human interaction. After a model is deployed on edge devices, it is desirable to learn from unlabeled data to continuously improve accuracy. Contrastive learning has demonstrated great potential in learning from unlabeled data. However, the online input data are usually non-independent and identically distributed (non-IID), and edge devices’ storage is usually too limited to hold enough representative data from different data classes. We developed a framework that automatically selects the most representative data from the unlabeled input stream and requires only a small data buffer for dynamic learning.
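As a much simpler stand-in for the representativeness-driven selection described above (the framework in the talk scores samples rather than sampling uniformly), classic reservoir sampling shows the core mechanics of keeping a small, fixed-size buffer over an unbounded stream:

```python
import random

# Baseline sketch only: uniform reservoir sampling keeps each stream item
# in the buffer with equal probability, using O(buffer_size) memory.
def reservoir_sample(stream, buffer_size, seed=0):
    rng = random.Random(seed)          # seeded for reproducibility
    buf = []
    for i, x in enumerate(stream):
        if len(buf) < buffer_size:
            buf.append(x)              # fill the buffer first
        else:
            j = rng.randrange(i + 1)   # keep item with prob buffer_size/(i+1)
            if j < buffer_size:
                buf[j] = x
    return buf

buf = reservoir_sample(range(10_000), buffer_size=32)
```

A representativeness-aware selector would replace the uniform `randrange` test with a score (e.g., diversity with respect to the buffer contents), but the fixed-buffer streaming structure is the same.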

Speaker: Dr. Jingtong Hu is currently an Associate Professor and William Kepler Whiteford Faculty Fellow in the Department of Electrical and Computer Engineering at the University of Pittsburgh, Pittsburgh, PA, USA. Before that, he was an Assistant Professor at Oklahoma State University from 2013 to 2017. He received his Ph.D. in Computer Science from the University of Texas at Dallas in 2013 and his B.E. in Computer Science and Technology from Shandong University, China, in 2007. His current research interests include hardware/software co-design for machine learning algorithms, on-device AI, and embedded systems. His work has received the Donald O. Pederson Best Paper Award from IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems in 2021 and five Best Paper Nominations from DAC, ASP-DAC, ESWEEK, and other venues. He is also the recipient of the Oklahoma State University Outstanding New Faculty Award, the Air Force Summer Faculty Fellowship, and the ACM SIGDA Meritorious Service Award. He has served as a TPC track chair for CASES, GLSVLSI, ASP-DAC, DAC, and SAC, and on the TPCs of many other international conferences such as DATE, ESWEEK, and CPS-IoT Week. He has served as a guest editor for Sensors, IEEE Transactions on Computers, and ACM Transactions on Cyber-Physical Systems, and currently serves as an executive committee member and education chair for ACM SIGDA, and as an associate editor for IEEE Embedded Systems Letters, the Journal of Systems Architecture: Embedded Software Design, and ACM Transactions on Cyber-Physical Systems.