Hi Raghu,

Really appreciate your help.

I downloaded this software stack from git.linaro.org. It includes ATF, the kernel, edk2 and so on.
The Linaro user guide I followed is: https://git.linaro.org/landing-teams/working/arm/arm-reference-platforms.git/about/docs/rdn1edge/user-guide.rst#obtaining-the-rd-n1-edge-and-rd-n1-edge-dual-fast-model

1) What platform you are running on? Can this issue be reproduced 
outside your testing environment, perhaps on FVP or QEMU?
A: I am running on the Arm RD-N1-Edge FVP platform. The issue can be reproduced on this FVP.

2) What version of TF-A and StandaloneMM is being used? Preferably the 
commit-id, so that we can be sure we are looking at the same code.
A: TF-A: https://git.linaro.org/landing-teams/working/arm/arm-tf.git  tag: RD-INFRA-20191024-RC0
StandaloneMM seems to be built from edk2 and edk2-platforms, so I am only giving the edk2 and edk2-platforms version information. If I missed anything, please let me know.
edk2: https://git.linaro.org/landing-teams/working/arm/edk2.git  tag: RD-INFRA-20191024-RC0
edk2-platforms: https://git.linaro.org/landing-teams/working/arm/edk2-platforms.git  tag: RD-INFRA-20191024-RC0

3) What version of the kernel and sdei driver is being used?
A: kernel-release: https://git.linaro.org/landing-teams/working/arm/kernel-release.git  tag: RD-INFRA-20191024-RC0
The SDEI driver is included in the kernel. Do I need to provide a separate SDEI driver version? If so, please let me know.

4) I can't tell from looking at the log but do you know if writing 0x123 
to sdei_ras_poison causes a DMC620 interrupt or an SError or external 
abort through memory access ?
A: Sorry, the Linaro guide only mentions that it injects a DMC-620 single-bit RAS error, so I am also not sure which exception type it triggers.

BRs,
Bin Wu

------------------ Original Message ------------------
From: TF-A <tf-a-bounces@lists.trustedfirmware.org>
Sent: Tue Apr 14 01:25:47 2020
To: Raghu Krishnamurthy via TF-A <tf-a@lists.trustedfirmware.org>
Subject: Re: [TF-A] [RAS] BL32 UnRecognized Event - 0xC4000061 and BL31 Crashed
Hello,

 >>Does BL31 need to send the 0xC4000061 event to BL32 again?

I don't think it will. It is really odd that 
0xC4000061(SP_EVENT_COMPLETE_AARCH64) ever reaches the BL32/MM handler. 
This is from looking at the upstream code quickly but it definitely 
depends on the platform you are running, what version of TF-A you are 
using, build options used. Is it possible that the unhandled exception 
is occurring after successful handling of the DMC620 error but there is 
a following issue that occurs right after, causing the crash?
 From the register dump it looks like there was an Instruction abort 
exception at address 0 while running in EL3. Something seems to have 
gone seriously wrong to have 0xC4000061 ever go back to BL32 and to get 
an instruction abort at address 0.
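For reference, the value 0xC4000061 can be read as an SMC function ID using the field layout from the Arm SMC Calling Convention. The snippet below is only a minimal sketch: the helper name decode_smc_fid and the small owner-name table are made up for illustration and are not taken from TF-A or edk2.

```python
# Minimal sketch: split a 32-bit SMC function ID into its SMCCC fields.
# decode_smc_fid and OWNER_NAMES are illustrative names, not TF-A code.

# Owning Entity Numbers 0..4 as listed in the SMC Calling Convention.
OWNER_NAMES = {
    0: "Arm Architecture Calls",
    1: "CPU Service Calls",
    2: "SiP Service Calls",
    3: "OEM Service Calls",
    4: "Standard Secure Service Calls",
}


def decode_smc_fid(fid: int) -> dict:
    """Return the SMCCC fields encoded in a 32-bit function ID."""
    owner = (fid >> 24) & 0x3F          # bits [29:24]: owning entity number
    return {
        "fast_call": bool((fid >> 31) & 0x1),   # bit 31: 1 = fast call
        "smc64": bool((fid >> 30) & 0x1),       # bit 30: 1 = SMC64 convention
        "owning_entity": owner,
        "owner_name": OWNER_NAMES.get(owner, "other"),
        "function_number": fid & 0xFFFF,        # bits [15:0]
    }


if __name__ == "__main__":
    for key, value in decode_smc_fid(0xC4000061).items():
        print(f"{key}: {value}")
```

Under this layout, 0xC4000061 is a fast SMC64 call in the Standard Secure Service range with function number 0x61, which matches the SP_EVENT_COMPLETE_AARCH64 name in the log. It is the ID a secure partition issues towards EL3 when it finishes an event, so seeing it arrive as a delegated event in BL32 is unexpected.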

 >>Does the current TF-A support running RAS tests? It seems BL31 will crash.
See above. The answer really depends on the factors mentioned above.

The following would be helpful to know:
1) What platform you are running on? Can this issue be reproduced 
outside your testing environment, perhaps on FVP or QEMU?
2) What version of TF-A and StandaloneMM is being used? Preferably the 
commit-id, so that we can be sure we are looking at the same code.
3) What version of the kernel and sdei driver is being used?
4) I can't tell from looking at the log but do you know if writing 0x123 
to sdei_ras_poison causes a DMC620 interrupt or an SError or external 
abort through memory access ?

Thanks
Raghu


On 4/13/20 12:16 AM, 吴斌(郅隆) via TF-A wrote:
> Dear Friends,

> I am using TF-A to test the RAS feature.
> When I trigger a DMC620 RAS error in Linux (echo 0x123 > 
> /sys/kernel/debug/sdei_ras_poison),
> BL32 receives 
> UnRecognized Event - 0xC4000061(SP_EVENT_COMPLETE_AARCH64) and finally 
> BL31 crashes.

> In my understanding, this 0xC4000061 should be consumed by BL31, not sent 
> to BL32 again.

> A piece of error log as below:

> *************************************

> CperWrite - CperAddress@0xFF610064
> CperWrite - 1 Section@FFBE91A8, Length 80, SectionType@FFBE9138
> CperWrite - Got Error Section: Platform Memory.
> MmEntryPoint Done
> Received delegated event
> X0 :  0xC4000061
> X1 :  0x0
> X2 :  0x0
> X3 :  0x0
> Received event - 0xC4000061 on cpu 0
> UnRecognized Event - 0xC4000061
> Failed delegated event 0xC4000061, Status 0x2
> Unhandled Exception in EL3.
> x30            = 0x0000000000000000
> x0             = 0x00000000ff007e00
> x1             = 0xfffffffffffffffe
> x2             = 0x00000000600003c0
> x3             = 0x0000000000000000
> x4             = 0x0000000000000000
> x5             = 0x0000000000000000
> x6             = 0x00000000ff015080
> x7             = 0x0000000000000000
> x8             = 0x00000000c4000061
> x9             = 0x0000000000000021
> x10            = 0x0000000000000040
> x11            = 0x00000000ff00f2b0
> x12            = 0x00000000ff0118c0
> x13            = 0x0000000000000002
> x14            = 0x00000000ff016b70
> x15            = 0x00000000ff003f20
> x16            = 0x0000000000000044
> x17            = 0x00000000ff010430
> x18            = 0x0000000000000e3c
> x19            = 0x0000000000000000
> For the rest of the error log, please refer to the attachment.

> My questions are:
> 1. Does BL31 need to send the 0xC4000061 event to BL32 again?
> 2. Does the current TF-A support running RAS tests? It seems BL31 will crash.

> Appreciate your help.

> BRs,
> Bin Wu

-- 
TF-A mailing list
TF-A@lists.trustedfirmware.org
https://lists.trustedfirmware.org/mailman/listinfo/tf-a