ESWEEK 2025 Tutorials
1) Deep Software Stack Optimization for AI-Enabled Embedded Systems
Organizer: Prof. Seongsoo Hong, SNU
Email: sshong@redwood.snu.ac.kr
Full day
Abstract:
The rapid deployment of AI-enabled embedded systems across industries necessitates highly efficient deep learning (DL) inference on resource-constrained hardware. To achieve optimal performance, developers must deeply understand and optimize the system software stack. This tutorial aims to bridge the gap between these requirements and developers' working knowledge of the stack by providing attendees with an in-depth understanding of AI-enabled embedded systems, including DL inference engine internals and model optimization techniques. Specifically, this tutorial will:
- Explain the overall architecture of the system software stack for AI-enabled embedded systems.
- Introduce the internals of the LiteRT DL inference engine (a minimal inference sketch appears after this list).
- Teach optimization techniques for deep neural networks (DNNs) using a model slicer and LiteRT internal modifications.
- Provide hands-on training on the RUBIK Pi platform, reinforcing the concepts above with practical experience.
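To ground the LiteRT items above, the following minimal sketch shows the basic invocation path through the LiteRT Python interpreter: load a converted model, allocate tensors, bind an input, and run the operator graph. The package name, model path, and input handling are illustrative assumptions rather than tutorial material; the older tflite_runtime interpreter exposes the same interface.

```python
# Minimal LiteRT inference sketch (assumes the ai-edge-litert pip package;
# tflite_runtime.interpreter.Interpreter offers the same calls).
import numpy as np
from ai_edge_litert.interpreter import Interpreter

interpreter = Interpreter(model_path="model.tflite")  # placeholder path
interpreter.allocate_tensors()  # plan tensor memory before invocation

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input matching the model's expected shape and dtype.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()  # execute the operator graph

result = interpreter.get_tensor(output_details[0]["index"])
print(result.shape)
```

Engine-level optimizations such as the model slicer discussed in the tutorial operate beneath this API surface.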
2) Design Automation for ML-enabled Cyber-Physical Systems: From Verification to Synthesis
Speakers:
- Samarjit Chakraborty, UNC Chapel Hill (samarjit@cs.unc.edu)
- Jingtong Hu, University of Pittsburgh (jthu@pitt.edu)
- Qi Zhu, Northwestern University (qzhu@northwestern.edu)
Half day
Abstract:
Recent advances in machine learning are transforming the design of cyber-physical systems (CPS), but they also introduce new complexity. Current design approaches develop neural networks (NNs) and control algorithms independently, akin to the traditional separate design of controllers and their implementation platforms. Research on the co-design of NNs, control algorithms, and architectures is in a nascent stage, offering significant benefits but also considerable challenges, including tackling large design spaces, training candidate NNs, and verifying the safety of the combined NN and controller. The goal of this tutorial is to provide an introduction to this emerging but highly consequential area of neural architecture search (NAS) / CPS co-design, with applications in autonomous vehicles, robotics, and industrial automation. Topics include controller design, notions of safety in CPS, NAS algorithms, and reachability-based safety verification of NN-augmented CPS.
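As a small taste of reachability-based safety verification, the sketch below propagates an axis-aligned box of initial states through a tiny ReLU NN controller and one step of a linear plant using interval bound propagation (IBP), one of the simpler techniques in this space. All matrices, bounds, and the plant model are illustrative assumptions, not artifacts from the tutorial.

```python
# Interval bound propagation (IBP) through a ReLU NN controller, followed by
# one closed-loop step of a linear plant x' = A @ x + B @ u. The resulting
# box soundly over-approximates the reachable set of next states.
import numpy as np

def affine_bounds(lo, hi, W, b):
    """Propagate the box [lo, hi] through the affine map x -> W @ x + b."""
    center, radius = (lo + hi) / 2.0, (hi - lo) / 2.0
    c = W @ center + b
    r = np.abs(W) @ radius  # worst-case spread of the box
    return c - r, c + r

# Tiny two-layer controller u = W2 @ relu(W1 @ x + b1) + b2 (random weights).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 2)), np.zeros(8)
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)

x_lo, x_hi = np.array([-0.1, -0.1]), np.array([0.1, 0.1])  # initial set
h_lo, h_hi = affine_bounds(x_lo, x_hi, W1, b1)
h_lo, h_hi = np.maximum(h_lo, 0.0), np.maximum(h_hi, 0.0)  # ReLU is monotone
u_lo, u_hi = affine_bounds(h_lo, h_hi, W2, b2)             # control bounds

# Bound x' = A @ x + B @ u by summing the boxes of the two terms; ignoring
# the x-u correlation keeps the bound sound, just looser.
A, B = np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[0.0], [0.1]])
ax_lo, ax_hi = affine_bounds(x_lo, x_hi, A, np.zeros(2))
bu_lo, bu_hi = affine_bounds(u_lo, u_hi, B, np.zeros(2))
print("next-state box:", ax_lo + bu_lo, ax_hi + bu_hi)
```

Checking that the computed box stays inside a designated safe set, step after step, is the essence of reachability-based verification; production tools tighten these bounds considerably.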
3) CEDR: A Holistic Software and Hardware Design Environment for Hardware Agnostic Application Development and Deployment on FPGA-Integrated Heterogeneous Systems
Organizers: Serhan Gener, Sahil Hassan, Ali Akoglu
Affiliation: Department of Electrical and Computer Engineering, University of Arizona
Contact: {gener,sahilhassan,akoglu}@arizona.edu
Half day
Abstract:
As FPGAs become embedded in all layers of the computing infrastructure, from the edge to HPC scale, system designers continue to explore design methodologies that leverage increased levels of heterogeneity to push performance within target goals or constraints. In line with this objective, we have developed CEDR, an open-source, unified compilation and runtime framework designed for FPGA-integrated heterogeneous systems, as part of the DARPA DSSoC program. Our framework empowers users to seamlessly develop, compile, and deploy applications on off-the-shelf heterogeneous computing platforms. Importantly, the framework is portable across a wide range of Linux-based systems, ensuring that the effort required to migrate across systems is minimal. This tutorial builds upon the education class we conducted on the open-source CEDR framework during ESWEEK'23. Since then, CEDR has matured substantially, and we have shared it with the community through interactive tutorials at ISFPGA'24 and ISFPGA'25. For ESWEEK'25, we have partnered with AMD to offer participants a seamless, engaging experience with our framework through hands-on exercises, interaction with FPGA-based SoCs, and productive discussions around challenges in heterogeneous computing.
4) Design, Model, and Explore Approximate Arithmetic Operators for Energy-efficient AI Inference
Organizers:
- Salim Ullah (Chair of Embedded Systems, Ruhr-Universität Bochum, Germany), email: Salim.Ullah@rub.de
- Siva Satyendra Sahoo (Interuniversity Microelectronics Centre, Leuven, Belgium), email: Siva.Satyendra.Sahoo@imec.be
- Akash Kumar (Chair of Embedded Systems, Ruhr-Universität Bochum, Germany), email: Akash.Kumar@rub.de
Half day
Abstract:
As AI/ML algorithms are adopted across application domains, the demand for scalable AI/ML systems is growing rapidly. Arithmetic operations form an integral part of AI computing. Hence, reducing the power, performance, and area (PPA) costs of arithmetic operations can lead to better scalability of AI models, especially for resource-constrained edge devices. Approximate Computing (AxC) is being actively explored as a potential method for providing application-specific optimizations and thereby reducing the energy consumption of implementing AI models. While AxC can be applied across multiple layers of the computing stack, approximate arithmetic operators have a direct impact on the PPA cost of arithmetic operations during AI inference. To this end, approximate arithmetic operators that provide disproportionate gains in PPA while leveraging the error tolerance of AI models are being actively researched.
Further, the diversity of application domains and the resulting model complexity have made FPGA platforms an attractive option for deploying edge AI. However, circuit- and architecture-level optimizations developed for ASICs, especially in the implementation of approximate arithmetic operators, may not translate proportionately to FPGA-based systems. Accordingly, this tutorial provides a detailed overview of the design of approximate arithmetic operators on FPGAs. The tutorial will cover LUT-level design optimizations for arithmetic operators, the modeling of approximate operator designs to enable automated design generation, and advanced methods for design space exploration (DSE) of approximate arithmetic. We will discuss and conduct hands-on experiments with different methodologies to demonstrate the synthesis of novel application-specific approximate operators that provide different accuracy-performance trade-offs.
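For a flavor of the trade-offs involved, the sketch below emulates a lower-part-OR adder (LOA), a well-known approximate adder in which the k low-order bits are computed with a carry-free bitwise OR while the upper bits are added exactly, and measures its mean absolute error. The bit width, choice of k, and error metric are illustrative assumptions, not designs from the tutorial.

```python
# Behavioral model of a Lower-part-OR Adder (LOA). This simplest variant also
# drops the carry from the lower part into the exact upper adder.
import random

def loa_add(a: int, b: int, width: int = 8, k: int = 3) -> int:
    mask = (1 << k) - 1
    low = (a | b) & mask                # approximate lower part: carry-free OR
    high = ((a >> k) + (b >> k)) << k   # exact upper part, no carry-in
    return (high | low) & ((1 << (width + 1)) - 1)

# Mean absolute error over random 8-bit operands.
random.seed(0)
pairs = [(random.randrange(256), random.randrange(256)) for _ in range(10_000)]
mae = sum(abs((a + b) - loa_add(a, b)) for a, b in pairs) / len(pairs)
print(f"LOA mean absolute error for k=3: {mae:.2f}")
```

Growing k shortens the carry chain (fewer LUTs, less delay) at the cost of larger error, which is exactly the kind of design point a DSE flow sweeps over.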
5) Hardware-Aware Compilation and Simulation for In-Memory Computing
Presenters: Asif Ali Khan (TU Dresden), Joao Paulo Cardoso de Lima (TU Dresden), Hamid Farzaneh (TU Dresden), Jeronimo Castrillon (TU Dresden), Hadjer Benmeziane (IBM Zurich), William Simon (IBM Zurich), Corey Lammie (IBM Zurich), Abu Sebastian (IBM Zurich), Zheyu Yan (Zhejiang University), Yiyu Shi (University of Notre Dame), Sharon Hu (University of Notre Dame)
Contact: Asif Ali Khan (asif_ali.khan@tu-dresden.de)
Half day
Abstract:
Compute-in-memory (CIM) systems have been demonstrated to outperform traditional von Neumann architectures by orders of magnitude in both performance and energy efficiency, attracting significant research interest in recent years. However, despite substantial technological progress, their widespread adoption and efficient utilization remain a challenge, primarily for two reasons: (i) CIM operations are often performed in the analog domain using non-volatile memory devices and are therefore subject to circuit non-idealities and device noise, e.g., temporal variations, some of which are inherently stochastic; and (ii) CIM systems typically offer only low-level programming models, making them accessible mainly to hardware experts. Furthermore, as CIM systems continue to evolve, selecting the right memory technology and CIM system architecture, and efficiently mapping applications onto these systems, also significantly impact performance, energy consumption, and accuracy. In this tutorial, we will first provide an overview lecture discussing existing solutions for the compilation of different types of CIM systems, and then present an end-to-end deployment pipeline for deep learning (DL) models onto heterogeneous CIM-based accelerators that addresses these challenges. The pipeline takes a hardware-agnostic input representation of a DL model, e.g., a PyTorch model description, performs hardware-aware training and heterogeneous weight mapping, compiles the model within a multi-level intermediate representation (MLIR) framework, and evaluates it with an accurate CIM simulator. Critically, the tutorial includes hands-on sessions covering the various components of this pipeline. Participants will have the opportunity to adjust parameters in different modules of the flow and observe the impact on key output metrics, such as performance, energy consumption, and application accuracy.
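To illustrate the device-level effects such a pipeline must account for, the sketch below perturbs a matrix-vector product, the core CIM primitive, with conductance quantization and multiplicative device noise. This first-order error model and its parameters are illustrative assumptions, not the tutorial's simulator.

```python
# Emulating analog CIM non-idealities on a matvec: quantize weights to a
# limited number of conductance levels, then apply multiplicative noise that
# stands in for stochastic device variations.
import numpy as np

def cim_matvec(W, x, bits=4, noise_std=0.05, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    levels = 2 ** bits - 1                       # simplistic symmetric scheme
    w_max = np.abs(W).max()
    Wq = np.round(W / w_max * levels) / levels * w_max
    Wn = Wq * (1.0 + noise_std * rng.standard_normal(W.shape))
    return Wn @ x

rng = np.random.default_rng(1)
W, x = rng.normal(size=(64, 128)), rng.normal(size=128)
rel_err = np.linalg.norm(cim_matvec(W, x, rng=rng) - W @ x) / np.linalg.norm(W @ x)
print(f"relative matvec error under the 4-bit, 5% noise model: {rel_err:.3f}")
```

Hardware-aware training, as in the pipeline above, exposes the model to such perturbations during training so that deployed accuracy degrades gracefully.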