A short course, “Radiation Considerations for Board-Level Computing Systems”, will be presented at the 2023 IEEE Nuclear and Space Radiation Effects Conference. The ultimate purpose of the radiation effects community is to enable successful system operation in radiation environments. System-level success stems from integrating an understanding of fundamental mechanisms, and sub-component and component-level responses to radiation, with system-level analysis. A spacecraft board-level computing system represents a commonly used exemplar comprised of multiple complex components.
The short course is organized into four sections, all featuring introductory material and advanced topics. The first section introduces spaceflight computing needs and challenges, considering various architectures beyond just traditional CPUs. The second section covers FPGAs, which are widely used due to low development cost and short schedules, and have increased in both capability and complexity to become bona fide Systems on Chip. The third section addresses data links, which are critical for communication between system components, including both electrical and optical connections. Finally, the last section covers artificial neural networks used for AI applications, addressing both GPUs and specialized accelerators. The topics covered should benefit people new to the field as well as experienced engineers and scientists, by providing up-to-date material and insights.
The short course is intended for radiation effects engineers, component specialists, system designers, and other technical and management personnel involved in developing reliable systems designed to operate in radiation environments. It provides a unique opportunity for IEEE NSREC attendees to benefit from the expertise of excellent instructors, along with a critical review of state-of-the-art knowledge in the field. Electronic copies of detailed course notes will be provided to each participant.
Continuing Education Units (CEUs)
Continuing Education Units (CEUs) will be available. For interested attendees, an exam will be given at the end of the short course. The course is valued at 0.6 CEUs and is endorsed by the IEEE and by the International Association for Continuing Education and Training (IACET).
Short Course Chairman
The Boeing Company
Ethan Cannon is Manager of the Advanced Microsystems Technology team in the Boeing Research & Technology—Solid-State Electronics Development organization, where his team develops revolutionary capabilities for Systems on Chip that meet current and future Military-Aerospace mission system needs. His research interests include extreme environments, high reliability applications, and hardware security. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign.
Dr. Tyler Lovelly is the Principal Investigator for Space Computing within the Space Electronics Technology program at the U.S. Air Force Research Laboratory (AFRL), where his research focuses on advancing on-board computing capabilities for next-generation space systems. He has worked in the area of aerospace and defense electronics and computing for over 14 years. His previous experience includes serving as a research group leader at the NSF Center for Space, High-Performance, and Resilient Computing (SHREC), supporting AFRL as a contractor with the Universities Space Research Association, and working for United Space Alliance supporting the NASA Space Shuttle program. He holds a Ph.D. in Electrical and Computer Engineering (ECE) from the University of Florida, and a faculty title with the Department of ECE at the University of New Mexico.
PART I – ADVANCEMENTS AND CHALLENGES WITH RADIATION-TOLERANT SPACEFLIGHT COMPUTERS
Dr. Tyler Lovelly
US Air Force Research Laboratory
On-board computing demands for space systems are continually increasing due to the need for real-time sensor and autonomous processing combined with limited communication bandwidth to ground stations. Although massive investments have been made by the electronics industry to advance the state-of-the-art in computing technologies, radiation-hardened technology requires longer lead times due to funding constraints, greater design complexity, and rigorous radiation testing and qualification requirements. Thus, the capabilities of radiation-hardened processors typically lag several technology generations behind commercial state-of-the-art technology. Due to changes in the spectrum of risk tolerance and a pivot from large and expensive long-duration missions to shorter-duration missions with more rapid technology refresh, increasing numbers of programs are considering and using small satellites, leading to high interest in leveraging commercial electronics. However, there exists little data quantifying the ability of commercial processors to operate reliably in a space radiation environment. Furthermore, it remains highly challenging to keep up with the broad, diverse, and rapidly changing landscape of available architectures such as CPUs, GPUs, FPGAs, SoCs, AI/ML accelerators, and others. During this module, Dr. Tyler Lovelly, US Air Force Research Laboratory, will provide an overview of the spaceflight computing technology area including recent advancements and challenges in designing, manufacturing, evaluating, and deploying radiation-tolerant computers to support the next generation of space systems.
Nadia Rezzak is the Senior Manager of Radiation Effects Technology and Development for the FPGA Business Unit at Microchip Technology, where she manages the radiation effects team and leads the development and validation of commercial and radiation-tolerant FPGAs. She has over 10 years of experience with radiation effects and reliability and has over 30 conference presentations and journal publications. She received an M.S. in EE from Polytech Montpellier and M.S. and Ph.D. degrees in EE from Vanderbilt University.
Pierre Maillard joined AMD’s Adaptive Embedded Computing Group (AECG) in 2013, where he is currently leading the Radiation Effects & RAS team. The team focuses on the architecture, development, and validation of commercial and radiation-tolerant FPGA/ACAP solutions for the Terrestrial (Telecom, Avionics, Automotive, Datacenter, etc.) and Space markets. He has over 20 presentations and publications in industry-leading conferences and journals. He holds 13 issued patents in the field of radiation effects on electronics. He received his M.S. in Electrical Engineering (EE) from the University of Montpellier II and his M.S. and Ph.D. in EE from Vanderbilt University.
PART II – RADIATION EFFECTS IN FPGAS AND SOCS
Dr. Nadia Rezzak and Dr. Pierre Maillard
Microchip Technology and AMD, respectively
The ability to implement complex designs and evolving algorithms in reconfigurable devices makes Field Programmable Gate Arrays (FPGAs) attractive for many Terrestrial and Space applications, compared to fixed function Application Specific Integrated Circuits (ASICs).
Dr. Nadia Rezzak (Microchip, Inc.) and Dr. Pierre Maillard (AMD, Inc.) will discuss Radiation Effects in FPGAs and SoCs. The first part of the course will address the basics of SRAM-based and non-volatile FPGA architectures and their evolution to modern, complex System on Chip (SoC) and Adaptive Compute Acceleration Platform (ACAP) devices. Then we will discuss Single Event Effects (SEE) and Total Ionizing Dose (TID) mechanisms, error classification, test methodologies, and representative results. The final section will focus on mitigation techniques and challenges to address requirements for Terrestrial (telecom, automotive, datacenters, avionics, etc.), Defense, and Space markets.
Zachary Diggins is the founder of Cyclo Technologies, Inc., a company created in 2022 that provides cloud software and engineering consulting services supporting electronics design for radiation environments. Previously, he was the lead radiation effects engineer for SpaceX’s Starlink satellite program, working 6 years on the project from pre-prototype through system deployment and activation. His interests include up-screening of commercial-off-the-shelf components and modeling system risk. He holds a Ph.D. from Vanderbilt University in Electrical Engineering, with a thesis focused on probabilistic modeling of radiation effects on systems.
PART III – RADIATION EFFECTS IN DATA LINKS
Dr. Zachary Diggins
Advances in sensor and networking payloads place ever-increasing demands on data links. Additionally, reliable communication between different components on a spacecraft is critical for safe operation, while also potentially contributing to the spacecraft power and weight through harnessing and PCB requirements, making data links a critical design consideration. In this course, Dr. Zachary Diggins, from Cyclo Technologies, will cover the radiation effects for the various data links on a single-board computer, from basic mechanisms through part selection considerations and testing strategies. Specifically, radiation effects in SerDes links for inter-chip communication will be reviewed, including clock generation and distribution considerations. Satellite bus communication protocols will be evaluated, including options for redundancy and wireless bus communication. A focused section will be included on optical communication technologies, including fiber-based and inter-satellite data links, which have total-ionizing dose and displacement damage concerns. Finally, comparisons will be made to state-of-the-art terrestrial data center architectures.
Paolo Rech received his master's and Ph.D. degrees from Padova University, Padova, Italy, in 2006 and 2009, respectively. He was then a postdoc at LIRMM in Montpellier, France. He has been an associate professor at Università di Trento, Italy, since 2022 and an associate professor at UFRGS, Brazil, since 2012. He is the 2019 Rosen Scholar Fellow at the Los Alamos National Laboratory and received the 2020 Impact in Society award from the Rutherford Appleton Laboratory, UK. In 2020 he was awarded the Marie Curie Fellowship at Politecnico di Torino, Italy. His main research interests include the evaluation and mitigation of radiation-induced effects in autonomous vehicles for automotive applications and space exploration, in large-scale HPC centers, and in quantum computers.
PART IV – EXPERIMENTAL EVALUATION OF ARTIFICIAL NEURAL NETWORKS RELIABILITY: FROM GPUS TO LOW-POWER ACCELERATORS
Prof. Paolo Rech
UFRGS (Brazil) and University of Trento (Italy)
Artificial Neural Networks are among the greatest advancements in computer science and engineering and are today used to classify or detect objects in a frame and to enable autonomous vehicles. Since neural networks are heavily used in safety-critical applications, such as automotive and aerospace, their reliability must be paramount. However, the reliability evaluation of neural network systems is extremely challenging due to the complexity of the software, which is composed of hundreds of layers, and of the underlying hardware, typically a powerful parallel device.
In this course, Prof. Paolo Rech, from UFRGS (Brazil) and University of Trento (Italy), will review fundamental concepts of Artificial Intelligence, Artificial Neural Networks, and parallel computing devices. Then, the course will detail the experimental setup required for a deep and accurate reliability evaluation of an Artificial Neural Network system. In particular, guidelines for a successful neutron or heavy ion test of Graphics Processing Units (GPUs) and low-power accelerators, such as Tensor Processing Units (TPUs) or systolic arrays, will be provided. Specific attention will be given to the choice of the software, the neural network configuration, the input dataset, and the analysis of experimental results.