Hi,
I have a sort of generic Arm architecture question that's not directly related to TF-A (other than that TF-A controls some of the registers involved in this decision), but I'm hoping that one of the experts here can still help me out or at least refer me to someone who can.
I'm trying to figure out how exception routing for SError aborts works in EL2. Specifically, I have a bootloader (BL33) running in NS-EL2 and I want the "simple" setup where it manages all its own exceptions, the same way that an OS kernel normally manages all exceptions at EL1. I assumed that I could achieve that simply by installing exception handlers, unmasking all exceptions in PSTATE, and leaving all the special trap feature bits in the system registers at 0 (disabled).
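To make that concrete, here is a rough sketch of the kind of setup I mean. It is illustrative only: the helper names, the `bl33_vectors` symbol and the `BL33_EL2_BAREMETAL` guard are made up for this mail, and I'm assuming that "special trap feature bits" means HCR_EL2.{AMO,IMO,FMO,TGE} (SCR_EL3.EA is under TF-A's control, so it is not touched here).

```c
#include <stdint.h>

/* PSTATE mask bits as laid out in the DAIF system register. */
#define DAIF_D (UINT64_C(1) << 9)
#define DAIF_A (UINT64_C(1) << 8)   /* SError mask */
#define DAIF_I (UINT64_C(1) << 7)
#define DAIF_F (UINT64_C(1) << 6)

/* HCR_EL2 routing controls (assumed to be the "trap bits" in question). */
#define HCR_EL2_FMO (UINT64_C(1) << 3)
#define HCR_EL2_IMO (UINT64_C(1) << 4)
#define HCR_EL2_AMO (UINT64_C(1) << 5)
#define HCR_EL2_TGE (UINT64_C(1) << 27)

/* Hypothetical helper: HCR_EL2 value with all routing controls at 0. */
static inline uint64_t hcr_el2_no_routing(uint64_t hcr)
{
    return hcr & ~(HCR_EL2_FMO | HCR_EL2_IMO | HCR_EL2_AMO | HCR_EL2_TGE);
}

/* Hypothetical helper: DAIF value with D, A, I and F all unmasked. */
static inline uint64_t daif_unmask_all(uint64_t daif)
{
    return daif & ~(DAIF_D | DAIF_A | DAIF_I | DAIF_F);
}

#if defined(__aarch64__) && defined(BL33_EL2_BAREMETAL)
/* Hypothetical 2KB-aligned EL2 vector table provided elsewhere. */
extern char bl33_vectors[];

static inline void bl33_exception_setup(void)
{
    uint64_t hcr;

    /* Install the EL2 exception handlers. */
    __asm__ volatile("msr vbar_el2, %0" :: "r"((uint64_t)(uintptr_t)bl33_vectors));

    /* Leave the routing controls disabled. */
    __asm__ volatile("mrs %0, hcr_el2" : "=r"(hcr));
    hcr = hcr_el2_no_routing(hcr);
    __asm__ volatile("msr hcr_el2, %0" :: "r"(hcr));
    __asm__ volatile("isb" ::: "memory");

    /* Clear PSTATE.{D,A,I,F}. */
    __asm__ volatile("msr daifclr, #0xf" ::: "memory");
}
#endif
```

The bit arithmetic is split into plain functions so it can be checked on a host build; only the guarded part touches real registers and would need to run at EL2.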
This seems to work for synchronous exceptions and external aborts, but not for SErrors. Looking at the architecture reference manual (revision L.b), table D1-14 in section D1.3.6.3 (page D1-6114), I can see that my case is represented by the first row (all special trap bits 0). It shows that SErrors caused by EL0 and EL1 are routed to EL1 as expected (though even when PSTATE.A is 1, which seems odd?), but SErrors caused by EL2 are ignored and remain pending (regardless of PSTATE.A). The "default" behavior I expected (aborts get routed to the EL that caused them if PSTATE.A is 0) instead seems to require me to enable SCTLR_EL2.NMEA. But the description of SCTLR_EL2.NMEA says that it controls whether PSTATE.A masks SError exceptions at EL2 (and that if it is 0, SError exceptions are not taken at EL2 if PSTATE.A == 1). Doesn't that imply that SError exceptions *are* taken at EL2 if PSTATE.A == 0? What does a control that seems to be about trapping masked aborts from a lower EL have to do with unmasked aborts from my current EL?
Basically, I think what I'm asking is: is that table really correct as printed (some behavior we've observed seems to indicate it is), and if so, why? In particular:
- Why do SError exceptions seem to behave differently by default in EL1 and EL2 (with regard to unmasked exceptions taken from the same exception level)?
- Why does the PSTATE.A bit only seem to apply to EL0 and EL1, not EL2 and EL3, even for exceptions taken from the same level, when this peculiarity does not seem to be mentioned anywhere else in the manual?
- Why do SError exceptions get treated so differently from external aborts in EL2/EL3, when in EL1 they seem to mostly count as the same?
- Is the current description of the NMEA bit in the SCTLR_EL2 register documentation really accurate, if it also seems to make fundamental changes to cases not really mentioned in that description?
- Is there any way for EL2 to only handle its own SError exceptions without interfering with EL1's exception handling when FEAT_DoubleFault2 is not implemented (other than flipping HCR_EL2.AMO on every EL2 entry/exit)?
And am I the only one who finds this all incredibly inconsistent and confusing?
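For clarity, the HCR_EL2.AMO workaround I mentioned would look roughly like the sketch below. Again, the function names and the `BL33_EL2_BAREMETAL` guard are hypothetical, and I'm assuming that setting AMO on entry to EL2 and clearing it again before returning to EL1 is enough to keep EL1's own SError handling unaffected.

```c
#include <stdbool.h>
#include <stdint.h>

#define HCR_EL2_AMO (UINT64_C(1) << 5)

/* Hypothetical helper: compute the HCR_EL2 value with AMO set or cleared. */
static inline uint64_t hcr_el2_set_amo(uint64_t hcr, bool route_to_el2)
{
    return route_to_el2 ? (hcr | HCR_EL2_AMO) : (hcr & ~HCR_EL2_AMO);
}

#if defined(__aarch64__) && defined(BL33_EL2_BAREMETAL)
/*
 * Hypothetical hook: call with true on every entry to EL2 and with false
 * right before every return to EL1.
 */
static inline void el2_serror_routing(bool route_to_el2)
{
    uint64_t hcr;

    __asm__ volatile("mrs %0, hcr_el2" : "=r"(hcr));
    hcr = hcr_el2_set_amo(hcr, route_to_el2);
    /* The ISB makes the new routing take effect for the code that follows. */
    __asm__ volatile("msr hcr_el2, %0\n\tisb" :: "r"(hcr) : "memory");
}
#endif
```

Doing this read-modify-write on every transition is exactly the overhead I was hoping to avoid.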
I feel like I'm missing some critical insight into how you were meant to think about this for it to make sense. I would appreciate any help in that regard!
Thanks, Julius