Tutorials
Tutorial 1. Disruptive Memory Technologies: A Tutorial and Unified Simulation Framework
- Jian-Jia Chen (Organizer; TU Dortmund)
- Jörg Henkel (Organizer; Karlsruhe Institute of Technology)
- Lokesh Siddhu (Organizer; Karlsruhe Institute of Technology)
- Mehdi Tahoori (Contributor; Karlsruhe Institute of Technology)
- Jürgen Teich (Contributor; Friedrich-Alexander-University Erlangen-Nurnberg)
- Jeronimo Castrillon (Contributor; TU Dresden)
This tutorial explores disruptive memory technologies and their impact on embedded systems, offering both research and practical insights. Memory has long been central to computing, and recent advancements, such as non-volatile memory (NVM) and in-memory computing, have introduced new trade-offs in energy efficiency, performance, and design. These technologies influence the entire computing stack, from programming and operating systems to micro-architectures.
The tutorial aims to demonstrate how embedded architectures can exploit these emerging technologies to improve performance, reduce power consumption, and increase efficiency. A 60-minute lecture will introduce participants to the state of the art in memory technologies, followed by a hands-on session using a unified simulation framework developed by research groups from leading institutions.
Participants will be actively involved in four practical exercises: trace-based system analysis, in-memory computing extensions, NVM cache simulations, and DRAM/NVM main memory modeling. Each exercise is designed to deepen participants' understanding of how these technologies affect system performance and design choices. This interactive approach will enable attendees to gain practical skills for exploring and modeling embedded systems with advanced memory technologies.
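To give a flavor of the trace-based analysis exercise, here is a minimal Python sketch. The per-access energy values and the trace are hypothetical placeholders, not figures or code from the tutorial's simulation framework; the sketch only illustrates the kind of DRAM-versus-NVM trade-off the exercises explore.

```python
# Hypothetical sketch of a trace-based energy comparison.
# The per-access energies below are illustrative placeholders only.
DRAM_ENERGY_PJ = {"R": 15.0, "W": 15.0}   # assumed read/write energy per access (pJ)
NVM_ENERGY_PJ  = {"R": 10.0, "W": 50.0}   # NVM reads often cheaper, writes costlier

def energy_from_trace(trace, energy_table):
    """Sum per-access energy for a trace of (operation, address) tuples."""
    return sum(energy_table[op] for op, _addr in trace)

# A toy memory-access trace: mostly reads with a few writes.
trace = [("R", 0x1000), ("R", 0x1040), ("W", 0x2000), ("R", 0x1080), ("W", 0x2040)]

print("DRAM energy (pJ):", energy_from_trace(trace, DRAM_ENERGY_PJ))
print("NVM  energy (pJ):", energy_from_trace(trace, NVM_ENERGY_PJ))
```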
Tutorial 2. Low Code, High Performance Embedded AI with MATLAB & Arm IP Explorer
The intersection of AI and embedded systems represents a frontier of technological innovation. With the exponential growth in IoT devices and the advancement of AI models, there is a pressing need for professionals who can effectively deploy AI in resource-constrained environments. The goal of this tutorial is to present a low-code end-to-end workflow in an interactive hands-on format.
The tutorial focuses on the design, optimization, and deployment of AI algorithms on Arm processors, which are typically used in power-conscious embedded devices. Using MATLAB, participants will learn to start from a high-level algorithmic design, optimize it, and auto-generate optimized C code. Arm IP Explorer complements this by offering detailed insights into performance metrics. This synergy makes it easy to fine-tune applications and achieve performant solutions backed by reliable benchmarks.
Access the prework one week before the tutorial: https://github.com/Brenda-MW/Low-Code-eAI-with-MATLAB-ARM-IPX
Tutorial 3. AMD’s Ryzen AI Neural Processing Unit Hands-on Tutorial
In this tutorial we will describe AMD's machine learning solutions built around the Ryzen AI™ platform, discuss its Neural Processing Units (NPUs), and present Riallto, an open-source exploration framework for first-time NPU users developed by teams from the AMD Research and Advanced Development group and the AMD University Program. AMD Ryzen AI is the world’s first built-in AI engine on select x86 computers. This dedicated engine is built on the AMD XDNA™ spatial dataflow NPU architecture, consisting of a tiled array of AI Engine processors, and is designed to offer lower latency and better energy efficiency. It offloads specific AI processing tasks such as background blur, facial detection, and eye-gaze correction, freeing up CPU and GPU cycles and improving overall system efficiency. With Ryzen AI-powered laptops or mini PCs, you can develop innovative applications spanning creative solutions like media editing and studio effects, as well as productivity solutions like information search, summarization, and transcription. Ryzen AI also caters to the gaming industry, providing a platform for real-time audio/video effects, image enhancement, NPC agents, reinforcement learning, and rendering applications.
Tutorial 4. Privacy-Preserving Primitives for Health Data
- Francesco Regazzoni (University of Amsterdam)
- Paulo Palmieri (University College Cork)
- Apostolos P. Fournaris (ISI)
Privacy-preserving technologies play a critical role in the development of the next generation of medical applications. Privacy of health data is of utmost importance to gain users' trust, and in most countries such data are also protected by legislation. Several initiatives and research efforts are currently under way to improve the performance of these technologies and to broaden the range of devices on which they can be used. The SECURED project is a Horizon Europe project devoted to scaling up privacy-preserving technologies for health data and medical applications. To expose the community to the main current research results and best practices in this area, and to foster the exchange of ideas among all the involved stakeholders, we propose a tutorial presenting relevant case studies and the latest achievements in the field of privacy-preserving technologies for health data. The tutorial will be organized by SECURED consortium members.
In this tutorial, we will cover the background needed on privacy-preserving techniques that can be used to handle and analyze health data, and we will show, by means of relevant health use cases, how state-of-the-art implementations of these primitives can be used on various computing devices targeting medical applications. In particular, we will introduce homomorphic encryption and secure multiparty computation concepts, discuss recent advances in these technologies, and present in detail how they can help in health-related machine learning tasks, signal processing, and time-series analysis, showing practical instances. Further, we will show optimizations that make these technologies suitable for embedded devices and discuss their limitations. Each practical instance will begin with a detailed introduction of the necessary concepts, so that attendees unfamiliar with the topic can successfully follow the whole tutorial.
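As a purely illustrative example of the secure multiparty computation idea (not the SECURED project's implementation), the Python sketch below uses additive secret sharing so that three parties can jointly compute the sum of two health readings without any single party seeing a raw value.

```python
import random

# Toy additive secret sharing over a large prime field: each party holds one
# share, and only the sum of all shares reconstructs the secret.
P = 2**61 - 1  # prime modulus (illustrative choice)

def share(secret, n_parties=3):
    """Split an integer secret into n additive shares modulo P."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Two patients' heart rates, shared among three non-colluding parties.
a_shares = share(72)
b_shares = share(88)

# Each party locally adds its two shares; no party ever sees a raw reading.
sum_shares = [(a + b) % P for a, b in zip(a_shares, b_shares)]
print("Sum of secrets:", reconstruct(sum_shares))  # prints 160
```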
Tutorial 5. Novel Toolkits toward AI for Science on Resource-Constrained Computing Systems
- Weiwen Jiang (George Mason University)
- Youzuo Lin (University of North Carolina at Chapel Hill)
- Lei Yang (George Mason University)
Full Waveform Inversion (FWI) is a technique used to visualize and analyze wave propagation through a medium in order to infer its physical properties. This method relies on computational models and algorithms to simulate and interpret the behavior of waves—such as sound, electromagnetic, or seismic waves—as they travel through different materials. By analyzing how these waves are reflected, refracted, or absorbed by the medium, FWI can provide detailed information about the medium’s internal structure, composition, and physical properties, such as density, elasticity, or internal defects. The traditional process typically involves:
- Wave Simulation: Using physics-based models to simulate how waves propagate through a medium. This may involve solving complex differential equations that describe wave behavior in different contexts (a minimal sketch of this step appears after the list).
- Data Acquisition: Collecting data on wave interactions with the medium using sensors or other measurement devices. This could include data on wave speed, direction, amplitude, and phase changes.
- Image Reconstruction: Applying computational techniques, such as inverse problems or tomographic reconstruction, to create images or maps of the medium based on the acquired wave data.
- Analysis: Interpreting the reconstructed images to deduce the physical properties of the medium. This can involve identifying features like boundaries, interfaces, or anomalies within the medium.
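For intuition about the wave-simulation step, here is a minimal Python sketch assuming a 1-D acoustic medium, a made-up velocity model, and a single source and receiver. It is illustrative only and is not part of the tutorial's toolkit.

```python
import numpy as np

# Minimal 1-D acoustic wave simulation via finite differences, illustrating
# the forward-modeling step of FWI. All parameters are made up for illustration.
nx, nt = 200, 1000         # grid points, time steps
dx, dt = 10.0, 0.001       # grid spacing (m), time step (s); CFL number ~0.26
c = np.full(nx, 2000.0)    # velocity model (m/s)
c[120:140] = 2600.0        # a faster layer produces a reflection

u_prev = np.zeros(nx)
u_curr = np.zeros(nx)
src_ix, rcv_ix = 20, 180   # source and receiver grid indices
record = []                # "recorded" wavefield at the receiver

for t in range(nt):
    lap = np.zeros(nx)
    lap[1:-1] = u_curr[2:] - 2.0 * u_curr[1:-1] + u_curr[:-2]
    u_next = 2.0 * u_curr - u_prev + (c * dt / dx) ** 2 * lap
    # Inject a Ricker-like source wavelet at one grid point.
    t0, f0 = 0.05, 25.0
    arg = (np.pi * f0 * (t * dt - t0)) ** 2
    u_next[src_ix] += (1.0 - 2.0 * arg) * np.exp(-arg)
    u_prev, u_curr = u_curr, u_next
    record.append(u_curr[rcv_ix])

print("max recorded amplitude at receiver:", max(abs(v) for v in record))
```

In full waveform inversion, such a forward simulation is run repeatedly while the velocity model is updated to minimize the misfit between simulated and measured receiver data.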
Further details are available on the associated website.
Tutorial 6. Large-Scale Spiking Neuromorphic Architectural Exploration using SANA-FE
Neuromorphic computing uses brain-inspired concepts to accelerate and efficiently execute a wide range of applications, such as mimicking biological circuits, solving NP-hard optimization problems and accelerating machine learning at the edge. In particular, neuromorphic architectures to efficiently execute Spiking Neural Networks (SNNs) have gained popularity. SNNs extend artificial neural networks (ANNs) by encoding information in time as either rates or delays between spiking events, shared between neurons via their weighted connections. SNN-based platforms are event-driven, resulting in naturally sparse, noise-tolerant and power-efficient computation.
In this half-day tutorial, we will present the state-of-the-art in scalable digital and analog spiking neuromorphic system architectures, and discuss current research trends within the neuromorphic architecture field at the system level. We will further introduce our SANA-FE tool for Simulation of Advanced Neuromorphic Architectures for Fast Exploration, which has been developed as part of a collaboration between the University of Texas at Austin and Sandia National Laboratories. SANA-FE allows for modeling and performance-power prediction of different spiking hardware architectures executing SNN applications to support rapid, early system-level design-space exploration, hardware-aware application development and system architecture co-design. The tutorial will include a hands-on component in which SANA-FE’s capabilities will be demonstrated and used to perform system design and application mapping case studies.
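As a taste of the spiking model such simulators target, the following Python sketch implements a generic leaky integrate-and-fire neuron with illustrative parameters; it is not SANA-FE's internal model or file format.

```python
import numpy as np

# Generic leaky integrate-and-fire (LIF) neuron: the membrane potential leaks
# toward rest, integrates input, and emits a spike event when it crosses a
# threshold. Parameters are illustrative and not tied to any hardware platform.
dt, tau = 1.0, 20.0            # time step and membrane time constant (ms)
v_th, v_reset = 1.0, 0.0       # spike threshold and reset potential

rng = np.random.default_rng(0)
input_current = rng.uniform(0.0, 0.12, size=200)  # random input drive per step

v = 0.0
spike_times = []
for t, i_in in enumerate(input_current):
    v += dt * (-v / tau + i_in)   # leak toward rest, then integrate input
    if v >= v_th:                 # threshold crossing -> emit a spike event
        spike_times.append(t)
        v = v_reset

print(len(spike_times), "spikes at time steps:", spike_times)
```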
Before attending this tutorial, we recommend installing Docker Desktop and downloading the SANA-FE Docker image (jamesaboyle/sana-fe), which includes all required binaries, files, and scripts for this session. Docker Desktop can be downloaded at: https://www.docker.com/products/docker-desktop/ and in-depth tutorial instructions will be available online at: https://github.com/SLAM-Lab/SANA-FE/blob/main/tutorial/TUTORIAL.md.
Tentative schedule
The schedule for our tutorial will be as follows:
- Overview (90 mins)
- Introduction to large-scale spiking architectures (60 mins)
- Overview of our neuromorphic hardware simulator SANA-FE (30 mins)
- Hands-on component (90 mins)
- Walk-through of SANA-FE installation and setup using Docker (15 mins)
- Overview of SANA-FE’s architecture and file formats (45 mins)
- Design-space exploration demonstration using SANA-FE (30 mins)
Tutorial 7. Deploying Acoustic-Based Predictive AI for Machine Health using Model-Based Design Tools
What if machines could talk? What if the hums and thrums of motors, pumps, and conveyors could speak up when trouble brews? Acoustic-based diagnostic techniques allow us to “listen” to machines and train AI models to interpret the “voice”, turning the complex sounds they make into actionable insights.
Participants will learn to train machine learning models that can interpret complex sensor data, effectively filtering out background noise to accurately predict machinery conditions. This approach leverages the Model-Based Design (MBD) tool suite to create, test, and implement algorithms tailored for embedded systems, emphasizing the importance of developing small, efficient network architectures that can perform complex tasks with minimal computational resources. The exercises are designed to offer a practical foundation in the underlying principles and technology.
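To convey the feature-extraction idea behind acoustic diagnostics, here is a small sketch written in Python purely for illustration (the tutorial itself uses MATLAB-based tools). A synthetic "machine sound" with a hypothetical fault harmonic stands in for real audio, and simple spectral band energies serve as the features a classifier could be trained on.

```python
import numpy as np

# Synthetic stand-in for machine audio: a healthy hum, a hypothetical fault
# harmonic, and background noise. All frequencies and amplitudes are made up.
fs = 8000                                      # sample rate (Hz), assumed
t = np.arange(0, 1.0, 1.0 / fs)
healthy = np.sin(2 * np.pi * 120 * t)          # fundamental hum of the machine
fault = 0.4 * np.sin(2 * np.pi * 780 * t)      # hypothetical fault harmonic
signal = healthy + fault + 0.2 * np.random.randn(t.size)

# Band-energy features: total spectral energy in a few frequency bands.
spectrum = np.abs(np.fft.rfft(signal)) ** 2
freqs = np.fft.rfftfreq(signal.size, 1.0 / fs)
bands = [(0, 300), (300, 600), (600, 1000), (1000, 2000)]
features = [spectrum[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in bands]

# A small classifier would be trained on such features to separate
# healthy from faulty operating conditions.
print("band energies:", [f"{f:.1f}" for f in features])
```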
Access the prework one week before the tutorial: https://github.com/Brenda-MW/ESWeek-Acoustic-AI-with-MBD
Tutorial 8. Understand Your FPGA Designs Better: From Rapid Simulation to On-board Profiling
- Callie Hao (Georgia Institute of Technology)
- Rishov Sarkar (Georgia Institute of Technology)
- Jiho Kim (Georgia Institute of Technology)
Understanding and optimizing FPGA design performance is critical for achieving desired outcomes in latency and throughput. Performance is typically evaluated through simulated metrics, which can be obtained via HLS synthesis reports or C/RTL co-simulation. While synthesis reports offer fast but often inaccurate estimates, C/RTL co-simulation provides more accurate results at the cost of significant time and computational resources. To bridge this gap, we introduce LightningSim, an open-source simulation tool that combines speed and accuracy, offering performance simulations that are orders of magnitude faster than traditional C/RTL co-simulation. Additionally, to address the challenge of discrepancies between simulated and real on-FPGA performance, we introduce RealProbe, an automated on-board profiling tool that precisely measures on-chip cycle counts by simply annotating HLS source code. Together, LightningSim and RealProbe empower designers with efficient and accurate tools for optimizing FPGA designs throughout the development process.
Tentative Schedule
13:30–13:40 | Opening
13:40–13:55 | Motivation: the hidden details of your FPGA design performance
13:55–14:30 | Introduction to the LightningSim tool
14:30–15:00 | Hands-on experiments using LightningSim
15:00–15:30 | Break
15:30–16:00 | Troubleshooting
16:00–16:15 | Introduction to the RealProbe tool
16:15–16:50 | Hands-on experiments using RealProbe
16:50–17:00 | Summary and closing
Tutorial 9. Generative AI for Next-generation EDA Tool-flows
Demands on the complexity and production timelines of integrated circuits are ever increasing. This puts pressure on chip designers and design processes, and ultimately results in buggy designs with potentially exploitable mistakes. Because computer chips underpin every part of modern life, enabling everything from cell phones and cars to traffic lights, pacemakers, coffee machines, and wireless headphones, such mistakes have significant consequences. This unfortunate combination of growing demand and increasing difficulty has also produced a shortage of qualified engineers, with some reports indicating that 67,000 jobs in the field remain unfilled.
Fortunately, there is a path forward. For decades, the Electronic Design Automation (EDA) field has applied the ever-increasing capabilities of machine learning and artificial intelligence to steps throughout the chip design flow. Steps such as layout, power and performance analysis and estimation, and physical design are all improved by programs that are taught rather than programmed.
In this tutorial we will explore what’s coming next: EDA applications of the newest type of artificial intelligence, generative pre-trained transformers (GPTs), also known as Large Language Models. We will show how models like the popular ChatGPT can be applied to tasks such as writing HDL, searching for and repairing bugs, and even tackling complex verification tasks like producing assertions. Rather than constraining ourselves to commercial and closed-source tooling, we’ll also show how you can train your own language models and produce designs in a fully open-source manner. We’ll discuss how commercial operators are beginning to make moves in this space (GitHub Copilot, Cadence JedAI) and reflect on the consequences for education and industry (will our designs become buggier? Will our graduating VLSI students know less?). We’ll cover all of this using a representative suite of examples ranging from the simple (basic shift registers) to the complex (AXI bus components and microprocessor designs).
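As a concrete illustration of the prompt-to-HDL idea, the sketch below asks a general-purpose LLM for a basic shift register. It assumes the OpenAI Python SDK and an API key in the environment, uses a placeholder model name, and is not the specific tool-flow covered in the tutorial.

```python
# Illustrative sketch: prompting a general-purpose LLM to draft HDL.
# Assumes the OpenAI Python SDK (openai>=1.0) and an API key in the
# environment; the model name below is a placeholder.
from openai import OpenAI

client = OpenAI()
prompt = (
    "Write synthesizable Verilog for an 8-bit shift register with an "
    "active-high synchronous reset, a serial input, and a parallel output. "
    "Return only the code."
)
response = client.chat.completions.create(
    model="gpt-4o-mini",                          # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)        # generated HDL, to be reviewed and linted
```

Any generated HDL should of course be simulated, linted, and verified before use, which is exactly the interplay between generation and verification the tutorial examines.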
Tutorial 10. Efficient Large Language Model Tuning on the Edge
Large language models (LLMs) have shown increasing power on various NLP tasks. Typically, these models are trained on a diverse range of text from books, articles, and websites to gain a broad understanding of human language and are known as the pre-trained language models (PLMs). However, task-specific data is often required to adapt PLMs to perform specific tasks or be more accurate in real-world scenarios. This fine-tuning process relies heavily on user-generated data on devices, providing a wealth of contextual insights and nuanced use cases that reflect actual human interaction and needs. In practice, it is challenging to use these devices and data securely. On-device tuning is always necessary to preserve users’ data privacy. However, finetuning LLMs introduces extremely heavy memory and computational costs, which are unacceptable to edge devices, especially commercial devices with limited onboard resources. Our tutorial will focus on efficient LLM tuning on the edge to solve these challenges. Through this tutorial, audiences can learn the background and development of LLM tuning methods. The instructors will also introduce the advanced techniques that enable efficient LLM tuning on edge devices, including back-propagation-free optimizations (e.g., zeroth-order optimization). Some instructors will also provide a live hands-on demo to let the audience conduct efficient LLM tuning.