Fault injection and Lockstep evaluation on Zynq UltraScale+ MPSoC
Τεχνητή εισαγωγή σφαλμάτων και αξιολόγηση λειτουργίας Lockstep στο Zynq UltraScale+ MPSoC

Master Thesis
Author
Alampasis, Nikolaos
Αλαμπάσης, Νικόλαος
Date
2025-12
Advisor
Psarakis, Michael
Ψαράκης, Μιχαήλ
Keywords
Fault injection ; Lockstep ; Cortex-R5 ; MPSoC reliability ; PMU–RPU signaling ; Soft error mitigation ; ECC protection
Abstract
This dissertation presents an experimental study of fault behavior and fault-handling mechanisms in the AMD Zynq UltraScale+ MPSoC using the Ultra96-V2 platform. The focus is on soft-error-related events and architectural mechanisms that enhance dependability in safety-oriented embedded systems, with emphasis on the dual-core Arm Cortex-R5 RPU operating in lockstep mode, ECC protection of on-chip memory (OCM), and coordination by the Platform Management Unit (PMU).
The study follows a hands-on methodology: controlled faults are injected and their effects are analyzed through firmware logs, runtime behavior, and PMU-to-RPU signaling. A FreeRTOS-based RPU application was developed to provide deterministic execution and liveness monitoring, while the PMU firmware was extended to detect ECC events and notify the RPU via Inter-Processor Interrupts (IPIs), enabling end-to-end validation of fault reporting and acknowledgement.
Three test cases are examined: correctable ECC faults handled transparently, uncorrectable ECC faults leading to escalated reporting, and lockstep mismatches triggering fail-safe recovery through RPU reset. The results highlight the distinction between recoverable memory faults and unrecoverable processor-level lockstep mismatches. Future work includes extending fault injection to additional memory regions and enhancing software recovery mechanisms.
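The mapping from the three test cases to their responses can be illustrated with a minimal C sketch. It is not taken from the thesis firmware: the type names, enumerators, and the functions `select_action` and `requires_rpu_reset` are hypothetical, and the fail-safe default for unknown fault classes is an assumption made here for illustration.

```c
#include <stdbool.h>

/* Hypothetical fault classes mirroring the three test cases in the abstract. */
typedef enum {
    FAULT_ECC_CORRECTABLE,    /* ECC error corrected by hardware */
    FAULT_ECC_UNCORRECTABLE,  /* ECC error detected but not correctable */
    FAULT_LOCKSTEP_MISMATCH   /* the two Cortex-R5 cores diverged */
} fault_class_t;

/* Hypothetical responses, one per test case. */
typedef enum {
    ACTION_LOG_AND_CONTINUE,  /* handled transparently, only logged */
    ACTION_ESCALATE_REPORT,   /* escalated reporting towards the RPU */
    ACTION_RESET_RPU          /* fail-safe recovery through RPU reset */
} fault_action_t;

/* Map a fault class to the response described for its test case. */
fault_action_t select_action(fault_class_t fault)
{
    switch (fault) {
    case FAULT_ECC_CORRECTABLE:
        return ACTION_LOG_AND_CONTINUE;
    case FAULT_ECC_UNCORRECTABLE:
        return ACTION_ESCALATE_REPORT;
    case FAULT_LOCKSTEP_MISMATCH:
        return ACTION_RESET_RPU;
    default:
        /* Assumption: an unknown fault class takes the fail-safe path. */
        return ACTION_RESET_RPU;
    }
}

/* True only when the selected response is an RPU reset. */
bool requires_rpu_reset(fault_class_t fault)
{
    return select_action(fault) == ACTION_RESET_RPU;
}
```

In this sketch only a lockstep mismatch maps to an RPU reset, reflecting the distinction the abstract draws between memory faults and processor-level lockstep mismatches.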

