-----Original Message----- From: TF-A tf-a-bounces@lists.trustedfirmware.org On Behalf Of Raghu Krishnamurthy via TF-A Sent: 30 April 2020 02:33 To: Manish Badarkhe Manish.Badarkhe@arm.com; tf- a@lists.trustedfirmware.org Cc: nd nd@arm.com Subject: Re: [TF-A] Need input on Errata implementation
Hi Manish,
Really appreciate you for taking time to respond to my concerns/questions.
What about this situation? NS-EL2 makes an SMC call to EL3 to get some basic information like GET_SOC_INFORMATION. This is a simple SMC and there is no call to context save or context restore. During the SMC call, if there is a speculative AT instruction on a lower EL(say NS-EL2), there could be a bad cached translation. Do you not need to apply the errata in this situation ? If not, why?
We can't simply apply this errata on reset and just leave the system.
[RK]Totally agree. See CPU_E_HANDLER_FUNC. It is not necessary that cpu_ops are only called during reset and power down. CPU_E_HANDLER_FUNC is called at runtime due EA's.
We thought of taking different approach for this errata
implementation >>where anybody disable this workaround using macro as this errata is >>applicable for most of the CPUs (by default enabled) and can't be >>placed in cpu_ops.
[RK]This is a poor approach in my view. Most CPU's is not all CPU's. The reason the errata framework exists is to apply CPU specific erratas by checking for them dynamically. Different stepping's of the same CPU's may or may not have the errata and typically you check the MIDR to know if the errata applies or not. Linux does not apply the errata to all CPU's since "most" CPU's have the issue. They check for its existence at runtime and only then apply it. TF-A should not hold itself to a lower standard.
Hi Raghu I guess this depends on what the errata workaround involves. Since this workaround applies bit setting on an out of context register, it was not expected to affect the EL3 execution performance (or the lower level EL because the bits are restored on return). Also it was thought that the act of searching through the list of compiled CPUs and checking if the workaround is applicable might be more detrimental than the unilateral application of the workaround for this case (assuming no extra barriers are added since the code path it is inserted in have them already later in the sequence).
But I agree it is more elegant to have this coupled into CPU_OPS framework. I think Manish has some ideas for this.
Best Regards Soby Mathew
-Raghu
On 4/29/20 1:35 AM, Manish Badarkhe wrote:
Hi Raghu
Just to add/correct one more thing from my previous emails that this errata
workaround proposed is
applied to both normal and secure world switches to EL3.
Thanks Manish Badarkhe
On 29/04/2020, 12:25, "TF-A on behalf of Manish Badarkhe via TF-A" <tf-a-
bounces@lists.trustedfirmware.org on behalf of tf-a@lists.trustedfirmware.org> wrote:
Hi Raghu On 29/04/2020, 02:00, "Raghu K" <raghu.ncstate@icloud.com> wrote: Hi Manish, Thanks. >> we don’t have any AT instances in minimum execution window after
context switching from S-EL(1/2)
>> to EL3 and before updating TCR register. 1) What is the minimum execution window? Does that not change based
on micro-architecture?
Not sure about exact minimum execution window. IMO, it really depend
upon when "context_save" gets called after
entering into EL3 from S-EL1/2. It may changed upon micro-architecture.
Need some experts comment here.
2) Do we know that the "execution window" is exactly the same for all
the CPU's this errata applies to?
It may be but we should not worry on that if we don’t have any AT
instruction execution in that window.
Also, it appears we are only talking about switching from S-EL1/2 to EL3.
The same issue can happen when you go from NS-EL1/EL2 to EL3 as well. There also seems to be an assumption in the patch you submitted that this errata happens only during a so called context-switch. From my reading, the cortex- Ax errata notices don't limit the errata to occur only during "context-switches" in the "conditions" section and can occur while executing ANY code, although the work around section does muddy the waters a bit.
In Linux, at NS-EL2 this workaround is already in place. Hence we just
thought of considering cases from Secure EL side to put this workaround.
Yes, errata should not limit to particular conditional section but this
particular errata is not straight-forward like another errata placed in the code currently. We can't simply apply this errata on reset and just leave the system.
Back to problem, AT instruction speculative execution using out-of-
context regime that results in page table walk and generate the incorrect
translation which are cached in TLB. To avoid this issue we thought of
disabling PTW for that particular EL.
for e.g. If AT instruction execution for EL1 present in EL3 then we have to
make sure speculative behaviour of this AT should not result in incorrect translation cached in TLB. If system is always in EL3 (if we loop-in in EL3 always without going back and forth to/from lower EL) then in that case
there is no need of this workaround. Hence we thought to put this workaround over boundary context of
context switches. When "context save" (close to EL3 entry) happened we meticulously save all EL system registers (S-EL1/S-EL2) with PTW disabled and continue EL3 execution with PTW disabled ensuring we should not cache any incorrect translation for (S-EL1/S-EL2) and during "context restore" (i.e. close to EL3 exit) again we disabled PTW, restore all system registers for EL (S-EL1/S- EL2) except TCR and then restore TCR.
3) Has there been any work done to actually reproduce this issue and
also to see that this actually fixes the issue?
No this issue is hard to reproduce. 4) Has the CPU errata framework(cpu_ops etc.) been considered to
possibly implement the errata? Sprinkling erratas through common framework code does not seem like a good idea.
We thought of taking different approach for this errata implementation
where anybody disable this workaround using macro as this errata is applicable for most of the CPUs (by default enabled) and can't be placed in cpu_ops.
Thanks Raghu Thanks Manish Badarkhe On 4/28/20 1:44 AM, Manish Badarkhe wrote: > Hi Raghu > > Please see my replies inline. > > Regards > Manish Badarkhe > > On 28/04/2020, 11:29, "Raghu Krishnamurthy"
raghu.ncstate@icloud.com wrote:
> > Hi Manish, > > Understood. > > >>Hence before entering in EL3, we ensured that PTW is disabled
(at
> context save) > > The context save and restore functions are executed in EL3. So how
are
> you disabling PTW before entering EL3 ? > > Yes, I put it wrongly. We thought "context_save/restore" is best place
to disable PTW without much affecting the
> code because we don’t have any AT instances in minimum execution
window after context switching from S-EL(1/2)
> to EL3 and before updating TCR register. > > -Raghu > > Thanks > Manish Badarkhe > > On 4/27/20 10:53 PM, Manish Badarkhe wrote: > > Hi Raghu > > > > This workaround is specifically need for speculative AT instruction
behaviour in out of context regime.
> > That means executing AT instruction for lower ELs (S-EL1/S-EL2) in
higher EL i.e. EL3.
> > > > Behaviour of AT instruction is unaltered when it get executed in
same regime (when AT instruction executed for same EL
> > where it is executing) and there is no possibility to execute AT
instruction for higher EL in lower EL.
> > > > Hence before entering in EL3, we ensured that PTW is disabled (at
context save) and restore PTW back during
> > exit of EL3. (at context restore). > > > > Thanks > > Manish Badarkhe > > > > On 28/04/2020, 01:23, "Raghu K" <raghu.ncstate@icloud.com>
wrote:
> > > > Hi Manish, > > > > >>Hence proposed solution will work as it is > > > > [RK]If you are sure go ahead. I'm not convinced, but that may
be because
> > i don't understand the errata fully/correctly. > > > > >>This workaround is very specific during context switching > > > > [RK] Context switching has many meanings depending on the
context(OS,
> > hypervisor, TF-A world switch etc). The errata document i saw
does not
> > elaborate on this. Perhaps clarifying this will help understand
why the
> > solution you proposed will work. > > > > The solution below in points 2 and 3 have the same problem on
entry and
> > exit, mentioned in my first email. Before you call > > el1_sysregs_context_save, an AT instruction could have
speculatively
> > executed through speculation of branches that occur BEFORE
you call this
> > function, when TCR still has the enable bit set. The fact that you
don't
> > have an AT instruction in the context save routine or any
routine for
> > that matter, does not guarantee that the hardware did not
speculate
> > through some other means to reach an AT instruction. The
same applies to
> > the context_restore routines. There is no guarantee right after
you
> > finish the restore routing(and hence TCR has the enable bit set),
that
> > the CPU cannot speculate to an AT instruction. > > So i'm not clear how you can say for certain that there was no > > speculative AT instruction with the proposal below. > > > > Thanks > > Raghu > > > > On 4/27/20 10:08 AM, Manish Badarkhe wrote: > > > Hi All, > > > > > > Just update/correct details. > > > > > > Thanks > > > Manish Badarkhe > > > > > > On 27/04/2020, 22:13, "TF-A on behalf of Manish Badarkhe
via TF-A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- a@lists.trustedfirmware.org> wrote:
> > > > > > Hi Raghu > > > > > > Please ignore my answer on question 2. > > > > > > With internal discussion came to below conclusion: > > > 1. This workaround is very specific during context
switching.
> > > 2 . If you check in context save routine
(el1_sysregs_context_save or el2_sysregs_context_save),
> > > As per proposed solution, First step performed is to
disable page table walk and we don’t have
> > > any AT instruction execution in context save routine. > > > This ensures that there will be no possibility of
speculative AT instruction execution without TCR update.
> > > 3. If you check in context restore routine
(el1_sysregs_context_restore or el2_sysregs_context_restore),
> > > As per proposed solution, first step performed is to
disable page table walk and we don’t have any
> > > AT instruction execution in context restore routine. > > > This ensures that there will be no possibility of
speculative AT instruction execution without TCR update.
> > > > > > Hence proposed solution will work as it is ensuring no
caching of translations in TLB while speculative AT instruction execution.
> > > > > > Thanks > > > Manish Badarkhe > > > > > > On 27/04/2020, 13:38, "TF-A on behalf of Manish Badarkhe
via TF-A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- a@lists.trustedfirmware.org> wrote:
> > > > > > Hi Raghu > > > > > > Please see my answers inline > > > > > > On 25/04/2020, 06:38, "TF-A on behalf of Raghu K via TF-
A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- a@lists.trustedfirmware.org> wrote:
> > > > > > Hi Manish, > > > > > > Before I agree or disagree with the suggested fix, the
following would
> > > be interesting to know/discuss. Please feel free to
correct me if i've
> > > misunderstood something. > > > > > > 1) Are "speculative" AT instructions subject to TCR_ELx
control bits for
> > > all the listed CPU's? I imagine the answer is yes but
would be good to
> > > get confirmation. I could not find any evidence in the
instruction
> > > description or psuedocode in the ARMv8 ARM. It is
possible to play many
> > > tricks on speculative execution of instructions such as
skipping checks
> > > and doing them only when the CPU knows the
instruction will be
> > > committed. If this is the case, changing TCR_ELx bits
may not work. The
> > > errata document is vague about how to fix it. > > > > > > The speculative AT instruction may behave as you
mentioned. We need more
> > > opinion on this. > > > Proposed fix I mentioned by referring linux workaround
for the same errata.
> > > Linux workaround is available in mainline kernel as
below:
> > >
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h... v5.7-rc3&id=bd227553ad5077f21ddb382dcd910ba46181805a
> > > > > > 2) Assuming the answer to question 1 is yes, your
proposal may not work
> > > as is. In the worst case, as soon as you enter EL3, the
very first thing
> > > that may happen, before you ever operate/write to
TCR_ELx, is a
> > > speculative AT instruction that caches a bad translation
in the TLB's.
> > > The same thing can happen on the exit path. As soon as
you restore the
> > > TCR_ELx register, the first thing that can happen is a
speculative AT
> > > that caches a bad translation. However, the el3_exit
path does have DSB
> > > before ERET, so we will not speculate to an AT
instruction if there are
> > > no branches between the instruction that sets TCR_ELx
and the ERET.
> > > Somewhere in between, it looks like we will need a
TLBI NSH to be
> > > certain there are no bad translation cached. This
obviously has a
> > > potential performance cost on the lower EL's. Every
entry into EL3
> > > flushes the TLB for lower EL's. > > > > > > Yes, this seems to be valid case during entry and exit path. > > > I am not quite sure in that case where we need to avoid
PTW.
> > > Also "TLBI NSH" works but it may cause performance
issue.
> > > Need some more opinion/thoughts on this. > > > > > > Just thinking, can sequence mentioned for context save
does not ensure that
> > > PTW is disabled? > > > Something as below as last step in ELx(1/2) context save
(elaborated more):
> > > > ·Save TCR register with PTW enable (EPD=0). (Just to
enable PTW during
> > > > restore context). Do not operate TCR_EL1x register
here just save its value to restore.
> > > > This ensures that during entry in EL3 there will be no
chance of PTW
> > > >. while executing AT instruction. > > > > > > Thanks > > > > > > Raghu > > > > > > Thanks > > > Manish Badarkhe > > > > > > On 4/24/20 2:56 AM, Manish Badarkhe via TF-A wrote: > > > > > > > > Hi All > > > > > > > > We are trying to implement errata which is applicable
for below CPUs:
> > > > > > > > <CPUs> : <Errata No.> > > > > > > > > Cortex-A53: 1530924 > > > > > > > > Cortex-A76: 1165522 > > > > Cortex-A72: 1319367 > > > > Cortex-A57: 1319537 > > > > Cortex-A55: 1530923 > > > > > > > > *Errata Description:* > > > > > > > > A speculative Address Translation (AT) instruction
translates using
> > > > registers that are associated with an out-of-context
translation
> > > > regime and caches the resulting translation in the TLB.
A subsequent
> > > > translation request that is generated when the out-
of-context
> > > > translation regime is current uses the previous cached
TLB entry
> > > > producing an incorrect virtual to physical mapping. > > > > > > > > *Probable solution is to implement below fix in
context.S file:*
> > > > > > > > *During ELx (1 or 2) context save:* > > > > > > > > ·Operate TCR_ELx(1/2) to disable page table walk by
operating EPD bits
> > > > > > > > oThis will avoid any page table walk for S-EL1 or S-EL2.
This will
> > > > help in avoiding caching of translations in TLB > > > > > > > > for S-EL1/S-EL2 in EL3. > > > > > > > > ·Save all system registers (which is already available)
except TCR
> > > > > > > > ·Clear EPD bits of TCR and then save. (Just to enable
PTW during
> > > > restore context). > > > > > > > > *During ELx (1 or 2) context restore:* > > > > > > > > * Operate TCR_ELx(1/2) to disable page table walk
by operating EPD bits
> > > > * Restore all system registers (which are saved
during context save)
> > > > except TCR register. > > > > * Restore TCR_ELx(1/2) register (which enable back
PTW).
> > > > > > > > With above we ensured that there will be no page
table walk for S-EL1
> > > > and S-EL2 in EL3. > > > > > > > > is this proper other way to fix this problem? Need
some suggestion/use
> > > > cases where and all we need this workaround in TF-A
code.
> > > > > > > > Thanks > > > > > > > > Manish Badarkhe > > > > > > > > IMPORTANT NOTICE: The contents of this email and
any attachments are
> > > > confidential and may also be privileged. If you are not
the intended
> > > > recipient, please notify the sender immediately and
do not disclose
> > > > the contents to any other person, use it for any
purpose, or store or
> > > > copy the information in any medium. Thank you. > > > > > > > > > > -- > > > TF-A mailing list > > > TF-A@lists.trustedfirmware.org > > > https://lists.trustedfirmware.org/mailman/listinfo/tf-a > > > > > > IMPORTANT NOTICE: The contents of this email and any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> > > -- > > > TF-A mailing list > > > TF-A@lists.trustedfirmware.org > > > https://lists.trustedfirmware.org/mailman/listinfo/tf-a > > > > > > IMPORTANT NOTICE: The contents of this email and any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> > > -- > > > TF-A mailing list > > > TF-A@lists.trustedfirmware.org > > > https://lists.trustedfirmware.org/mailman/listinfo/tf-a > > > > > > IMPORTANT NOTICE: The contents of this email and any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> > > > > > IMPORTANT NOTICE: The contents of this email and any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> > > > IMPORTANT NOTICE: The contents of this email and any attachments
are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
-- TF-A mailing list TF-A@lists.trustedfirmware.org https://lists.trustedfirmware.org/mailman/listinfo/tf-a
-- TF-A mailing list TF-A@lists.trustedfirmware.org https://lists.trustedfirmware.org/mailman/listinfo/tf-a
Thanks Soby. If the fix involves looping through a list of CPU's, the approach may be fine. I missed the case where you could have multiple CPU MIDR's on the same SoC(big.little's???). The ideal solution would be SoC's with CPU's *without* this errata have zero penalty(no checking for the errata, perhaps by being compiled out), and for SoC's that contain CPU's with the errata have a fairly quick check or an always compiled in errata fix. A combination of platform specific compile time flags and/or per-cpu variables or global variables should be able to achieve this.
Also do you have any insights on if the workaround needs to be applied in the scenario i mentioned below(simple SMC call without any call to context save/restore)? I'm trying to understand why the errata applies only if we hit the context-save/restore path.
-Raghu
On 5/1/20 7:02 AM, Soby Mathew wrote:
-----Original Message----- From: TF-A tf-a-bounces@lists.trustedfirmware.org On Behalf Of Raghu Krishnamurthy via TF-A Sent: 30 April 2020 02:33 To: Manish Badarkhe Manish.Badarkhe@arm.com; tf- a@lists.trustedfirmware.org Cc: nd nd@arm.com Subject: Re: [TF-A] Need input on Errata implementation
Hi Manish,
Really appreciate you for taking time to respond to my concerns/questions.
What about this situation? NS-EL2 makes an SMC call to EL3 to get some basic information like GET_SOC_INFORMATION. This is a simple SMC and there is no call to context save or context restore. During the SMC call, if there is a speculative AT instruction on a lower EL(say NS-EL2), there could be a bad cached translation. Do you not need to apply the errata in this situation ? If not, why?
We can't simply apply this errata on reset and just leave the system.
[RK]Totally agree. See CPU_E_HANDLER_FUNC. It is not necessary that cpu_ops are only called during reset and power down. CPU_E_HANDLER_FUNC is called at runtime due EA's.
We thought of taking different approach for this errata
implementation >>where anybody disable this workaround using macro as this errata is >>applicable for most of the CPUs (by default enabled) and can't be >>placed in cpu_ops.
[RK]This is a poor approach in my view. Most CPU's is not all CPU's. The reason the errata framework exists is to apply CPU specific erratas by checking for them dynamically. Different stepping's of the same CPU's may or may not have the errata and typically you check the MIDR to know if the errata applies or not. Linux does not apply the errata to all CPU's since "most" CPU's have the issue. They check for its existence at runtime and only then apply it. TF-A should not hold itself to a lower standard.
Hi Raghu I guess this depends on what the errata workaround involves. Since this workaround applies bit setting on an out of context register, it was not expected to affect the EL3 execution performance (or the lower level EL because the bits are restored on return). Also it was thought that the act of searching through the list of compiled CPUs and checking if the workaround is applicable might be more detrimental than the unilateral application of the workaround for this case (assuming no extra barriers are added since the code path it is inserted in have them already later in the sequence).
But I agree it is more elegant to have this coupled into CPU_OPS framework. I think Manish has some ideas for this.
Best Regards Soby Mathew
-Raghu
On 4/29/20 1:35 AM, Manish Badarkhe wrote:
Hi Raghu
Just to add/correct one more thing from my previous emails that this errata
workaround proposed is
applied to both normal and secure world switches to EL3.
Thanks Manish Badarkhe
On 29/04/2020, 12:25, "TF-A on behalf of Manish Badarkhe via TF-A" <tf-a-
bounces@lists.trustedfirmware.org on behalf of tf-a@lists.trustedfirmware.org> wrote:
Hi Raghu On 29/04/2020, 02:00, "Raghu K" <raghu.ncstate@icloud.com> wrote: Hi Manish, Thanks. >> we don’t have any AT instances in minimum execution window after
context switching from S-EL(1/2)
>> to EL3 and before updating TCR register. 1) What is the minimum execution window? Does that not change based
on micro-architecture?
Not sure about exact minimum execution window. IMO, it really depend
upon when "context_save" gets called after
entering into EL3 from S-EL1/2. It may changed upon micro-architecture.
Need some experts comment here.
2) Do we know that the "execution window" is exactly the same for all
the CPU's this errata applies to?
It may be but we should not worry on that if we don’t have any AT
instruction execution in that window.
Also, it appears we are only talking about switching from S-EL1/2 to EL3.
The same issue can happen when you go from NS-EL1/EL2 to EL3 as well. There also seems to be an assumption in the patch you submitted that this errata happens only during a so called context-switch. From my reading, the cortex- Ax errata notices don't limit the errata to occur only during "context-switches" in the "conditions" section and can occur while executing ANY code, although the work around section does muddy the waters a bit.
In Linux, at NS-EL2 this workaround is already in place. Hence we just
thought of considering cases from Secure EL side to put this workaround.
Yes, errata should not limit to particular conditional section but this
particular errata is not straight-forward like another errata placed in the code currently. We can't simply apply this errata on reset and just leave the system.
Back to problem, AT instruction speculative execution using out-of-
context regime that results in page table walk and generate the incorrect
translation which are cached in TLB. To avoid this issue we thought of
disabling PTW for that particular EL.
for e.g. If AT instruction execution for EL1 present in EL3 then we have to
make sure speculative behaviour of this AT should not result in incorrect translation cached in TLB. If system is always in EL3 (if we loop-in in EL3 always without going back and forth to/from lower EL) then in that case
there is no need of this workaround. Hence we thought to put this workaround over boundary context of
context switches. When "context save" (close to EL3 entry) happened we meticulously save all EL system registers (S-EL1/S-EL2) with PTW disabled and continue EL3 execution with PTW disabled ensuring we should not cache any incorrect translation for (S-EL1/S-EL2) and during "context restore" (i.e. close to EL3 exit) again we disabled PTW, restore all system registers for EL (S-EL1/S- EL2) except TCR and then restore TCR.
3) Has there been any work done to actually reproduce this issue and
also to see that this actually fixes the issue?
No this issue is hard to reproduce. 4) Has the CPU errata framework(cpu_ops etc.) been considered to
possibly implement the errata? Sprinkling erratas through common framework code does not seem like a good idea.
We thought of taking different approach for this errata implementation
where anybody disable this workaround using macro as this errata is applicable for most of the CPUs (by default enabled) and can't be placed in cpu_ops.
Thanks Raghu Thanks Manish Badarkhe On 4/28/20 1:44 AM, Manish Badarkhe wrote: > Hi Raghu > > Please see my replies inline. > > Regards > Manish Badarkhe > > On 28/04/2020, 11:29, "Raghu Krishnamurthy"
raghu.ncstate@icloud.com wrote:
> > Hi Manish, > > Understood. > > >>Hence before entering in EL3, we ensured that PTW is disabled
(at
> context save) > > The context save and restore functions are executed in EL3. So how
are
> you disabling PTW before entering EL3 ? > > Yes, I put it wrongly. We thought "context_save/restore" is best place
to disable PTW without much affecting the
> code because we don’t have any AT instances in minimum execution
window after context switching from S-EL(1/2)
> to EL3 and before updating TCR register. > > -Raghu > > Thanks > Manish Badarkhe > > On 4/27/20 10:53 PM, Manish Badarkhe wrote: > > Hi Raghu > > > > This workaround is specifically need for speculative AT instruction
behaviour in out of context regime.
> > That means executing AT instruction for lower ELs (S-EL1/S-EL2) in
higher EL i.e. EL3.
> > > > Behaviour of AT instruction is unaltered when it get executed in
same regime (when AT instruction executed for same EL
> > where it is executing) and there is no possibility to execute AT
instruction for higher EL in lower EL.
> > > > Hence before entering in EL3, we ensured that PTW is disabled (at
context save) and restore PTW back during
> > exit of EL3. (at context restore). > > > > Thanks > > Manish Badarkhe > > > > On 28/04/2020, 01:23, "Raghu K" <raghu.ncstate@icloud.com>
wrote:
> > > > Hi Manish, > > > > >>Hence proposed solution will work as it is > > > > [RK]If you are sure go ahead. I'm not convinced, but that may
be because
> > i don't understand the errata fully/correctly. > > > > >>This workaround is very specific during context switching > > > > [RK] Context switching has many meanings depending on the
context(OS,
> > hypervisor, TF-A world switch etc). The errata document i saw
does not
> > elaborate on this. Perhaps clarifying this will help understand
why the
> > solution you proposed will work. > > > > The solution below in points 2 and 3 have the same problem on
entry and
> > exit, mentioned in my first email. Before you call > > el1_sysregs_context_save, an AT instruction could have
speculatively
> > executed through speculation of branches that occur BEFORE
you call this
> > function, when TCR still has the enable bit set. The fact that you
don't
> > have an AT instruction in the context save routine or any
routine for
> > that matter, does not guarantee that the hardware did not
speculate
> > through some other means to reach an AT instruction. The
same applies to
> > the context_restore routines. There is no guarantee right after
you
> > finish the restore routing(and hence TCR has the enable bit set),
that
> > the CPU cannot speculate to an AT instruction. > > So i'm not clear how you can say for certain that there was no > > speculative AT instruction with the proposal below. > > > > Thanks > > Raghu > > > > On 4/27/20 10:08 AM, Manish Badarkhe wrote: > > > Hi All, > > > > > > Just update/correct details. > > > > > > Thanks > > > Manish Badarkhe > > > > > > On 27/04/2020, 22:13, "TF-A on behalf of Manish Badarkhe
via TF-A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- a@lists.trustedfirmware.org> wrote:
> > > > > > Hi Raghu > > > > > > Please ignore my answer on question 2. > > > > > > With internal discussion came to below conclusion: > > > 1. This workaround is very specific during context
switching.
> > > 2 . If you check in context save routine
(el1_sysregs_context_save or el2_sysregs_context_save),
> > > As per proposed solution, First step performed is to
disable page table walk and we don’t have
> > > any AT instruction execution in context save routine. > > > This ensures that there will be no possibility of
speculative AT instruction execution without TCR update.
> > > 3. If you check in context restore routine
(el1_sysregs_context_restore or el2_sysregs_context_restore),
> > > As per proposed solution, first step performed is to
disable page table walk and we don’t have any
> > > AT instruction execution in context restore routine. > > > This ensures that there will be no possibility of
speculative AT instruction execution without TCR update.
> > > > > > Hence proposed solution will work as it is ensuring no
caching of translations in TLB while speculative AT instruction execution.
> > > > > > Thanks > > > Manish Badarkhe > > > > > > On 27/04/2020, 13:38, "TF-A on behalf of Manish Badarkhe
via TF-A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- a@lists.trustedfirmware.org> wrote:
> > > > > > Hi Raghu > > > > > > Please see my answers inline > > > > > > On 25/04/2020, 06:38, "TF-A on behalf of Raghu K via TF-
A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- a@lists.trustedfirmware.org> wrote:
> > > > > > Hi Manish, > > > > > > Before I agree or disagree with the suggested fix, the
following would
> > > be interesting to know/discuss. Please feel free to
correct me if i've
> > > misunderstood something. > > > > > > 1) Are "speculative" AT instructions subject to TCR_ELx
control bits for
> > > all the listed CPU's? I imagine the answer is yes but
would be good to
> > > get confirmation. I could not find any evidence in the
instruction
> > > description or psuedocode in the ARMv8 ARM. It is
possible to play many
> > > tricks on speculative execution of instructions such as
skipping checks
> > > and doing them only when the CPU knows the
instruction will be
> > > committed. If this is the case, changing TCR_ELx bits
may not work. The
> > > errata document is vague about how to fix it. > > > > > > The speculative AT instruction may behave as you
mentioned. We need more
> > > opinion on this. > > > Proposed fix I mentioned by referring linux workaround
for the same errata.
> > > Linux workaround is available in mainline kernel as
below:
> > >
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h... v5.7-rc3&id=bd227553ad5077f21ddb382dcd910ba46181805a
> > > > > > 2) Assuming the answer to question 1 is yes, your
proposal may not work
> > > as is. In the worst case, as soon as you enter EL3, the
very first thing
> > > that may happen, before you ever operate/write to
TCR_ELx, is a
> > > speculative AT instruction that caches a bad translation
in the TLB's.
> > > The same thing can happen on the exit path. As soon as
you restore the
> > > TCR_ELx register, the first thing that can happen is a
speculative AT
> > > that caches a bad translation. However, the el3_exit
path does have DSB
> > > before ERET, so we will not speculate to an AT
instruction if there are
> > > no branches between the instruction that sets TCR_ELx
and the ERET.
> > > Somewhere in between, it looks like we will need a
TLBI NSH to be
> > > certain there are no bad translation cached. This
obviously has a
> > > potential performance cost on the lower EL's. Every
entry into EL3
> > > flushes the TLB for lower EL's. > > > > > > Yes, this seems to be valid case during entry and exit path. > > > I am not quite sure in that case where we need to avoid
PTW.
> > > Also "TLBI NSH" works but it may cause performance
issue.
> > > Need some more opinion/thoughts on this. > > > > > > Just thinking, can sequence mentioned for context save
does not ensure that
> > > PTW is disabled? > > > Something as below as last step in ELx(1/2) context save
(elaborated more):
> > > > ·Save TCR register with PTW enable (EPD=0). (Just to
enable PTW during
> > > > restore context). Do not operate TCR_EL1x register
here just save its value to restore.
> > > > This ensures that during entry in EL3 there will be no
chance of PTW
> > > >. while executing AT instruction. > > > > > > Thanks > > > > > > Raghu > > > > > > Thanks > > > Manish Badarkhe > > > > > > On 4/24/20 2:56 AM, Manish Badarkhe via TF-A wrote: > > > > > > > > Hi All > > > > > > > > We are trying to implement errata which is applicable
for below CPUs:
> > > > > > > > <CPUs> : <Errata No.> > > > > > > > > Cortex-A53: 1530924 > > > > > > > > Cortex-A76: 1165522 > > > > Cortex-A72: 1319367 > > > > Cortex-A57: 1319537 > > > > Cortex-A55: 1530923 > > > > > > > > *Errata Description:* > > > > > > > > A speculative Address Translation (AT) instruction
translates using
> > > > registers that are associated with an out-of-context
translation
> > > > regime and caches the resulting translation in the TLB.
A subsequent
> > > > translation request that is generated when the out-
of-context
> > > > translation regime is current uses the previous cached
TLB entry
> > > > producing an incorrect virtual to physical mapping. > > > > > > > > *Probable solution is to implement below fix in
context.S file:*
> > > > > > > > *During ELx (1 or 2) context save:* > > > > > > > > ·Operate TCR_ELx(1/2) to disable page table walk by
operating EPD bits
> > > > > > > > oThis will avoid any page table walk for S-EL1 or S-EL2.
This will
> > > > help in avoiding caching of translations in TLB > > > > > > > > for S-EL1/S-EL2 in EL3. > > > > > > > > ·Save all system registers (which is already available)
except TCR
> > > > > > > > ·Clear EPD bits of TCR and then save. (Just to enable
PTW during
> > > > restore context). > > > > > > > > *During ELx (1 or 2) context restore:* > > > > > > > > * Operate TCR_ELx(1/2) to disable page table walk
by operating EPD bits
> > > > * Restore all system registers (which are saved
during context save)
> > > > except TCR register. > > > > * Restore TCR_ELx(1/2) register (which enable back
PTW).
> > > > > > > > With above we ensured that there will be no page
table walk for S-EL1
> > > > and S-EL2 in EL3. > > > > > > > > is this proper other way to fix this problem? Need
some suggestion/use
> > > > cases where and all we need this workaround in TF-A
code.
> > > > > > > > Thanks > > > > > > > > Manish Badarkhe > > > > > > > > IMPORTANT NOTICE: The contents of this email and
any attachments are
> > > > confidential and may also be privileged. If you are not
the intended
> > > > recipient, please notify the sender immediately and
do not disclose
> > > > the contents to any other person, use it for any
purpose, or store or
> > > > copy the information in any medium. Thank you. > > > > > > > > > > -- > > > TF-A mailing list > > > TF-A@lists.trustedfirmware.org > > > https://lists.trustedfirmware.org/mailman/listinfo/tf-a > > > > > > IMPORTANT NOTICE: The contents of this email and any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> > > -- > > > TF-A mailing list > > > TF-A@lists.trustedfirmware.org > > > https://lists.trustedfirmware.org/mailman/listinfo/tf-a > > > > > > IMPORTANT NOTICE: The contents of this email and any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> > > -- > > > TF-A mailing list > > > TF-A@lists.trustedfirmware.org > > > https://lists.trustedfirmware.org/mailman/listinfo/tf-a > > > > > > IMPORTANT NOTICE: The contents of this email and any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> > > > > > IMPORTANT NOTICE: The contents of this email and any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> > > > IMPORTANT NOTICE: The contents of this email and any attachments
are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
-- TF-A mailing list TF-A@lists.trustedfirmware.org https://lists.trustedfirmware.org/mailman/listinfo/tf-a
-- TF-A mailing list TF-A@lists.trustedfirmware.org https://lists.trustedfirmware.org/mailman/listinfo/tf-a
Hi Raghu There would always be a build option that would disable this workaround. The runtime check based on MIDR would be an add-on on top of that. So for big-Little systems, if any one of the CPU is affected, the platform would enable the workaround. The runtime check could potentially skip the unaffected CPU in the system. The benefit of this skip may not justify the additional lookup and complexity needed to pull in the cpu_ops framework for the runtime check. Manish is doing some thinking in this direction.
For the unaffected platform, this runtime check wouldn't matter as the build option to enable the workaround would be disabled.
This is what we currently understand, the errata only happens during a context switch for the out of context translation regime. So for a simple EL1/EL2 SMC call into EL3 and return, this errata will not apply as the EL2/EL1 context is not changed. The TLB caching due to speculation in EL3 should not really matter as the lower EL context on return will be same as on entry.
The context switch happens for a NS ->EL3 -> S-EL1 SMC call. Hence the reasoning behind adding it to context save and restore sequence as this is only invoked for such a switch.
This is as per my understanding of the problem.
Best Regards Soby Mathew
-----Original Message----- From: Raghu K raghu.ncstate@icloud.com Sent: 01 May 2020 19:22 To: Soby Mathew Soby.Mathew@arm.com; Manish Badarkhe Manish.Badarkhe@arm.com Cc: tf-a@lists.trustedfirmware.org Subject: Re: [TF-A] Need input on Errata implementation
Thanks Soby. If the fix involves looping through a list of CPU's, the approach may be fine. I missed the case where you could have multiple CPU MIDR's on the same SoC(big.little's???). The ideal solution would be SoC's with CPU's *without* this errata have zero penalty(no checking for the errata, perhaps by being compiled out), and for SoC's that contain CPU's with the errata have a fairly quick check or an always compiled in errata fix. A combination of platform specific compile time flags and/or per-cpu variables or global variables should be able to achieve this.
Also do you have any insights on if the workaround needs to be applied in the scenario i mentioned below(simple SMC call without any call to context save/restore)? I'm trying to understand why the errata applies only if we hit the context-save/restore path.
-Raghu
On 5/1/20 7:02 AM, Soby Mathew wrote:
-----Original Message----- From: TF-A tf-a-bounces@lists.trustedfirmware.org On Behalf Of Raghu Krishnamurthy via TF-A Sent: 30 April 2020 02:33 To: Manish Badarkhe Manish.Badarkhe@arm.com; tf- a@lists.trustedfirmware.org Cc: nd nd@arm.com Subject: Re: [TF-A] Need input on Errata implementation
Hi Manish,
Really appreciate you for taking time to respond to my concerns/questions.
What about this situation? NS-EL2 makes an SMC call to EL3 to get some basic information like GET_SOC_INFORMATION. This is a simple SMC and there is no call to context save or context restore. During the SMC call, if there is a speculative AT instruction on a lower EL(say NS-EL2), there could be a bad cached translation. Do you not need to apply the errata in this situation ? If not, why?
We can't simply apply this errata on reset and just leave the system.
[RK]Totally agree. See CPU_E_HANDLER_FUNC. It is not necessary that cpu_ops are only called during reset and power down. CPU_E_HANDLER_FUNC is called at runtime due EA's.
We thought of taking different approach for this errata
implementation >>where anybody disable this workaround using macro as this errata is >>applicable for most of the CPUs (by default enabled) and can't be >>placed in cpu_ops.
[RK]This is a poor approach in my view. Most CPU's is not all CPU's. The reason the errata framework exists is to apply CPU specific erratas by checking for them dynamically. Different stepping's of the same CPU's may or may not have the errata and typically you check the MIDR to know if the errata applies or not. Linux does not apply the errata to all CPU's since "most" CPU's have the issue. They check for its existence at runtime and only then apply it. TF-A should not hold itself to
a lower standard.
Hi Raghu I guess this depends on what the errata workaround involves. Since this
workaround applies bit setting on an out of context register, it was not expected to affect the EL3 execution performance (or the lower level EL because the bits are restored on return). Also it was thought that the act of searching through the list of compiled CPUs and checking if the workaround is applicable might be more detrimental than the unilateral application of the workaround for this case (assuming no extra barriers are added since the code path it is inserted in have them already later in the sequence).
But I agree it is more elegant to have this coupled into CPU_OPS framework. I
think Manish has some ideas for this.
Best Regards Soby Mathew
-Raghu
On 4/29/20 1:35 AM, Manish Badarkhe wrote:
Hi Raghu
Just to add/correct one more thing from my previous emails that this errata
workaround proposed is
applied to both normal and secure world switches to EL3.
Thanks Manish Badarkhe
On 29/04/2020, 12:25, "TF-A on behalf of Manish Badarkhe via TF-A" <tf-a-
bounces@lists.trustedfirmware.org on behalf of tf-a@lists.trustedfirmware.org> wrote:
Hi Raghu On 29/04/2020, 02:00, "Raghu K" <raghu.ncstate@icloud.com> wrote: Hi Manish, Thanks. >> we don’t have any AT instances in minimum execution
window after
context switching from S-EL(1/2)
>> to EL3 and before updating TCR register. 1) What is the minimum execution window? Does that not
change based
on micro-architecture?
Not sure about exact minimum execution window. IMO, it really
depend
upon when "context_save" gets called after
entering into EL3 from S-EL1/2. It may changed upon micro-architecture.
Need some experts comment here.
2) Do we know that the "execution window" is exactly the
same for all
the CPU's this errata applies to?
It may be but we should not worry on that if we don’t have any
AT
instruction execution in that window.
Also, it appears we are only talking about switching from S-EL1/2 to
EL3.
The same issue can happen when you go from NS-EL1/EL2 to EL3 as well. There also seems to be an assumption in the patch you submitted that this errata happens only during a so called context-switch. From my reading, the cortex- Ax errata notices don't limit the errata to occur only
during "context-switches"
in the "conditions" section and can occur while executing ANY code, although the work around section does muddy the waters a bit.
In Linux, at NS-EL2 this workaround is already in place. Hence
we just
thought of considering cases from Secure EL side to put this workaround.
Yes, errata should not limit to particular conditional section
but this
particular errata is not straight-forward like another errata placed in the code currently. We can't simply apply this errata on reset and just
leave the system.
Back to problem, AT instruction speculative execution using
out-of-
context regime that results in page table walk and generate the incorrect
translation which are cached in TLB. To avoid this issue we
thought of
disabling PTW for that particular EL.
for e.g. If AT instruction execution for EL1 present in EL3
then we have to
make sure speculative behaviour of this AT should not result in incorrect translation cached in TLB. If system is always in EL3 (if we loop-in in EL3 always without going back and forth to/from lower EL) then in that case
there is no need of this workaround. Hence we thought to put this workaround over boundary context
of
context switches. When "context save" (close to EL3 entry) happened we meticulously save all EL system registers (S-EL1/S-EL2) with PTW disabled and continue EL3 execution with PTW disabled ensuring we should not cache any incorrect translation for (S-EL1/S-EL2) and during "context restore" (i.e. close to EL3 exit) again we disabled PTW, restore all system registers for EL (S-EL1/S- EL2) except TCR and then restore TCR.
3) Has there been any work done to actually reproduce this
issue and
also to see that this actually fixes the issue?
No this issue is hard to reproduce. 4) Has the CPU errata framework(cpu_ops etc.) been
considered to
possibly implement the errata? Sprinkling erratas through common framework code does not seem like a good idea.
We thought of taking different approach for this errata
implementation
where anybody disable this workaround using macro as this errata is applicable for most of the CPUs (by default enabled) and can't be placed in cpu_ops.
Thanks Raghu Thanks Manish Badarkhe On 4/28/20 1:44 AM, Manish Badarkhe wrote: > Hi Raghu > > Please see my replies inline. > > Regards > Manish Badarkhe > > On 28/04/2020, 11:29, "Raghu Krishnamurthy"
raghu.ncstate@icloud.com wrote:
> > Hi Manish, > > Understood. > > >>Hence before entering in EL3, we ensured that PTW is
disabled
(at
> context save) > > The context save and restore functions are executed in EL3. So
how
are
> you disabling PTW before entering EL3 ? > > Yes, I put it wrongly. We thought "context_save/restore"
is best place
to disable PTW without much affecting the
> code because we don’t have any AT instances in minimum
execution
window after context switching from S-EL(1/2)
> to EL3 and before updating TCR register. > > -Raghu > > Thanks > Manish Badarkhe > > On 4/27/20 10:53 PM, Manish Badarkhe wrote: > > Hi Raghu > > > > This workaround is specifically need for speculative AT
instruction
behaviour in out of context regime.
> > That means executing AT instruction for lower ELs (S-EL1/S-EL2)
in
higher EL i.e. EL3.
> > > > Behaviour of AT instruction is unaltered when it get executed
in
same regime (when AT instruction executed for same EL
> > where it is executing) and there is no possibility to execute AT
instruction for higher EL in lower EL.
> > > > Hence before entering in EL3, we ensured that PTW is disabled
(at
context save) and restore PTW back during
> > exit of EL3. (at context restore). > > > > Thanks > > Manish Badarkhe > > > > On 28/04/2020, 01:23, "Raghu K" <raghu.ncstate@icloud.com>
wrote:
> > > > Hi Manish, > > > > >>Hence proposed solution will work as it is > > > > [RK]If you are sure go ahead. I'm not convinced, but that may
be because
> > i don't understand the errata fully/correctly. > > > > >>This workaround is very specific during context switching > > > > [RK] Context switching has many meanings depending on the
context(OS,
> > hypervisor, TF-A world switch etc). The errata document i
saw
does not
> > elaborate on this. Perhaps clarifying this will help
understand
why the
> > solution you proposed will work. > > > > The solution below in points 2 and 3 have the same problem
on
entry and
> > exit, mentioned in my first email. Before you call > > el1_sysregs_context_save, an AT instruction could have
speculatively
> > executed through speculation of branches that occur
BEFORE
you call this
> > function, when TCR still has the enable bit set. The fact that
you
don't
> > have an AT instruction in the context save routine or any
routine for
> > that matter, does not guarantee that the hardware did not
speculate
> > through some other means to reach an AT instruction. The
same applies to
> > the context_restore routines. There is no guarantee right
after
you
> > finish the restore routing(and hence TCR has the enable bit
set),
that
> > the CPU cannot speculate to an AT instruction. > > So i'm not clear how you can say for certain that there was
no
> > speculative AT instruction with the proposal below. > > > > Thanks > > Raghu > > > > On 4/27/20 10:08 AM, Manish Badarkhe wrote: > > > Hi All, > > > > > > Just update/correct details. > > > > > > Thanks > > > Manish Badarkhe > > > > > > On 27/04/2020, 22:13, "TF-A on behalf of Manish Badarkhe
via TF-A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- a@lists.trustedfirmware.org> wrote:
> > > > > > Hi Raghu > > > > > > Please ignore my answer on question 2. > > > > > > With internal discussion came to below conclusion: > > > 1. This workaround is very specific during context
switching.
> > > 2 . If you check in context save routine
(el1_sysregs_context_save or el2_sysregs_context_save),
> > > As per proposed solution, First step performed is to
disable page table walk and we don’t have
> > > any AT instruction execution in context save routine. > > > This ensures that there will be no possibility of
speculative AT instruction execution without TCR update.
> > > 3. If you check in context restore routine
(el1_sysregs_context_restore or el2_sysregs_context_restore),
> > > As per proposed solution, first step performed is to
disable page table walk and we don’t have any
> > > AT instruction execution in context restore routine. > > > This ensures that there will be no possibility of
speculative AT instruction execution without TCR update.
> > > > > > Hence proposed solution will work as it is ensuring no
caching of translations in TLB while speculative AT instruction execution.
> > > > > > Thanks > > > Manish Badarkhe > > > > > > On 27/04/2020, 13:38, "TF-A on behalf of Manish
Badarkhe
via TF-A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- a@lists.trustedfirmware.org> wrote:
> > > > > > Hi Raghu > > > > > > Please see my answers inline > > > > > > On 25/04/2020, 06:38, "TF-A on behalf of Raghu K via
TF-
A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- a@lists.trustedfirmware.org> wrote:
> > > > > > Hi Manish, > > > > > > Before I agree or disagree with the suggested fix, the
following would
> > > be interesting to know/discuss. Please feel free to
correct me if i've
> > > misunderstood something. > > > > > > 1) Are "speculative" AT instructions subject to
TCR_ELx
control bits for
> > > all the listed CPU's? I imagine the answer is yes but
would be good to
> > > get confirmation. I could not find any evidence in
the
instruction
> > > description or psuedocode in the ARMv8 ARM. It is
possible to play many
> > > tricks on speculative execution of instructions such
as
skipping checks
> > > and doing them only when the CPU knows the
instruction will be
> > > committed. If this is the case, changing TCR_ELx bits
may not work. The
> > > errata document is vague about how to fix it. > > > > > > The speculative AT instruction may behave as you
mentioned. We need more
> > > opinion on this. > > > Proposed fix I mentioned by referring linux
workaround
for the same errata.
> > > Linux workaround is available in mainline kernel as
below:
> > >
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/co mmit/?h= v5.7-rc3&id=bd227553ad5077f21ddb382dcd910ba46181805a
> > > > > > 2) Assuming the answer to question 1 is yes, your
proposal may not work
> > > as is. In the worst case, as soon as you enter EL3, the
very first thing
> > > that may happen, before you ever operate/write to
TCR_ELx, is a
> > > speculative AT instruction that caches a bad
translation
in the TLB's.
> > > The same thing can happen on the exit path. As
soon as
you restore the
> > > TCR_ELx register, the first thing that can happen is a
speculative AT
> > > that caches a bad translation. However, the el3_exit
path does have DSB
> > > before ERET, so we will not speculate to an AT
instruction if there are
> > > no branches between the instruction that sets
TCR_ELx
and the ERET.
> > > Somewhere in between, it looks like we will need a
TLBI NSH to be
> > > certain there are no bad translation cached. This
obviously has a
> > > potential performance cost on the lower EL's. Every
entry into EL3
> > > flushes the TLB for lower EL's. > > > > > > Yes, this seems to be valid case during entry and exit
path.
> > > I am not quite sure in that case where we need to
avoid
PTW.
> > > Also "TLBI NSH" works but it may cause performance
issue.
> > > Need some more opinion/thoughts on this. > > > > > > Just thinking, can sequence mentioned for context
save
does not ensure that
> > > PTW is disabled? > > > Something as below as last step in ELx(1/2) context
save
(elaborated more):
> > > > ·Save TCR register with PTW enable (EPD=0). (Just
to
enable PTW during
> > > > restore context). Do not operate TCR_EL1x register
here just save its value to restore.
> > > > This ensures that during entry in EL3 there will be
no
chance of PTW
> > > >. while executing AT instruction. > > > > > > Thanks > > > > > > Raghu > > > > > > Thanks > > > Manish Badarkhe > > > > > > On 4/24/20 2:56 AM, Manish Badarkhe via TF-A
wrote:
> > > > > > > > Hi All > > > > > > > > We are trying to implement errata which is
applicable
for below CPUs:
> > > > > > > > <CPUs> : <Errata No.> > > > > > > > > Cortex-A53: 1530924 > > > > > > > > Cortex-A76: 1165522 > > > > Cortex-A72: 1319367 > > > > Cortex-A57: 1319537 > > > > Cortex-A55: 1530923 > > > > > > > > *Errata Description:* > > > > > > > > A speculative Address Translation (AT) instruction
translates using
> > > > registers that are associated with an out-of-
context
translation
> > > > regime and caches the resulting translation in the
TLB.
A subsequent
> > > > translation request that is generated when the
out-
of-context
> > > > translation regime is current uses the previous
cached
TLB entry
> > > > producing an incorrect virtual to physical mapping. > > > > > > > > *Probable solution is to implement below fix in
context.S file:*
> > > > > > > > *During ELx (1 or 2) context save:* > > > > > > > > ·Operate TCR_ELx(1/2) to disable page table walk
by
operating EPD bits
> > > > > > > > oThis will avoid any page table walk for S-EL1 or S-
EL2.
This will
> > > > help in avoiding caching of translations in TLB > > > > > > > > for S-EL1/S-EL2 in EL3. > > > > > > > > ·Save all system registers (which is already
available)
except TCR
> > > > > > > > ·Clear EPD bits of TCR and then save. (Just to
enable
PTW during
> > > > restore context). > > > > > > > > *During ELx (1 or 2) context restore:* > > > > > > > > * Operate TCR_ELx(1/2) to disable page table
walk
by operating EPD bits
> > > > * Restore all system registers (which are saved
during context save)
> > > > except TCR register. > > > > * Restore TCR_ELx(1/2) register (which enable
back
PTW).
> > > > > > > > With above we ensured that there will be no page
table walk for S-EL1
> > > > and S-EL2 in EL3. > > > > > > > > is this proper other way to fix this problem? Need
some suggestion/use
> > > > cases where and all we need this workaround in
TF-A
code.
> > > > > > > > Thanks > > > > > > > > Manish Badarkhe > > > > > > > > IMPORTANT NOTICE: The contents of this email
and
any attachments are
> > > > confidential and may also be privileged. If you are
not
the intended
> > > > recipient, please notify the sender immediately
and
do not disclose
> > > > the contents to any other person, use it for any
purpose, or store or
> > > > copy the information in any medium. Thank you. > > > > > > > > > > -- > > > TF-A mailing list > > > TF-A@lists.trustedfirmware.org > > >
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
> > > > > > IMPORTANT NOTICE: The contents of this email and
any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> > > -- > > > TF-A mailing list > > > TF-A@lists.trustedfirmware.org > > > https://lists.trustedfirmware.org/mailman/listinfo/tf-a > > > > > > IMPORTANT NOTICE: The contents of this email and any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> > > -- > > > TF-A mailing list > > > TF-A@lists.trustedfirmware.org > > > https://lists.trustedfirmware.org/mailman/listinfo/tf-a > > > > > > IMPORTANT NOTICE: The contents of this email and any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> > > > > > IMPORTANT NOTICE: The contents of this email and any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> > > > IMPORTANT NOTICE: The contents of this email and any
attachments
are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
-- TF-A mailing list TF-A@lists.trustedfirmware.org https://lists.trustedfirmware.org/mailman/listinfo/tf-a
-- TF-A mailing list TF-A@lists.trustedfirmware.org https://lists.trustedfirmware.org/mailman/listinfo/tf-a
Thanks Soby. This makes things clearer.
The benefit of this skip may not justify the additional lookup and complexity needed to pull in the cpu_ops >>framework for the runtime check
[RK]Agree. I was thinking along the lines of having the CPU reset function set a per-cpu or global flag that tells the runtime function whether or not the errata needs to be applied. At runtime, you unconditionally call the errata apply function from context save/restore(when the compile time flag is enabled) and the cpu_ops function can decide based on the flag whether or not the errata needs to be applied. This seems simple(no look ups required other than during cpu reset) although we may not know for sure until it is implemented that way. Anyway, i'll wait for Manish's solution.
So for a simple EL1/EL2 SMC call into EL3 and return, this errata will not apply
[RK] Thanks for clarifying. I did not get this from looking at the errata document. I took a closer look at the kvm patches and it does agree with your assessment. For TF-A, you probably don't require the work around during context save since all we are doing is reading registers and saving them off and there cannot be any inconsistent state. Context-restore is the window during which you can have inconsistent register state at which time you can apply the workaround. Is there a reason to do it during context-save as well(other than it being clean/symmetrical with context-restore)?
Thanks Raghu
On 5/4/20 7:38 AM, Soby Mathew wrote:
Hi Raghu There would always be a build option that would disable this workaround. The runtime check based on MIDR would be an add-on on top of that. So for big-Little systems, if any one of the CPU is affected, the platform would enable the workaround. The runtime check could potentially skip the unaffected CPU in the system. The benefit of this skip may not justify the additional lookup and complexity needed to pull in the cpu_ops framework for the runtime check. Manish is doing some thinking in this direction.
For the unaffected platform, this runtime check wouldn't matter as the build option to enable the workaround would be disabled.
This is what we currently understand, the errata only happens during a context switch for the out of context translation regime. So for a simple EL1/EL2 SMC call into EL3 and return, this errata will not apply as the EL2/EL1 context is not changed. The TLB caching due to speculation in EL3 should not really matter as the lower EL context on return will be same as on entry.
The context switch happens for a NS ->EL3 -> S-EL1 SMC call. Hence the reasoning behind adding it to context save and restore sequence as this is only invoked for such a switch.
This is as per my understanding of the problem.
Best Regards Soby Mathew
-----Original Message----- From: Raghu K raghu.ncstate@icloud.com Sent: 01 May 2020 19:22 To: Soby Mathew Soby.Mathew@arm.com; Manish Badarkhe Manish.Badarkhe@arm.com Cc: tf-a@lists.trustedfirmware.org Subject: Re: [TF-A] Need input on Errata implementation
Thanks Soby. If the fix involves looping through a list of CPU's, the approach may be fine. I missed the case where you could have multiple CPU MIDR's on the same SoC(big.little's???). The ideal solution would be SoC's with CPU's *without* this errata have zero penalty(no checking for the errata, perhaps by being compiled out), and for SoC's that contain CPU's with the errata have a fairly quick check or an always compiled in errata fix. A combination of platform specific compile time flags and/or per-cpu variables or global variables should be able to achieve this.
Also do you have any insights on if the workaround needs to be applied in the scenario i mentioned below(simple SMC call without any call to context save/restore)? I'm trying to understand why the errata applies only if we hit the context-save/restore path.
-Raghu
On 5/1/20 7:02 AM, Soby Mathew wrote:
-----Original Message----- From: TF-A tf-a-bounces@lists.trustedfirmware.org On Behalf Of Raghu Krishnamurthy via TF-A Sent: 30 April 2020 02:33 To: Manish Badarkhe Manish.Badarkhe@arm.com; tf- a@lists.trustedfirmware.org Cc: nd nd@arm.com Subject: Re: [TF-A] Need input on Errata implementation
Hi Manish,
Really appreciate you for taking time to respond to my concerns/questions.
What about this situation? NS-EL2 makes an SMC call to EL3 to get some basic information like GET_SOC_INFORMATION. This is a simple SMC and there is no call to context save or context restore. During the SMC call, if there is a speculative AT instruction on a lower EL(say NS-EL2), there could be a bad cached translation. Do you not need to apply the errata in this situation ? If not, why?
We can't simply apply this errata on reset and just leave the system.
[RK]Totally agree. See CPU_E_HANDLER_FUNC. It is not necessary that cpu_ops are only called during reset and power down. CPU_E_HANDLER_FUNC is called at runtime due EA's.
We thought of taking different approach for this errata
implementation >>where anybody disable this workaround using macro as this errata is >>applicable for most of the CPUs (by default enabled) and can't be >>placed in cpu_ops.
[RK]This is a poor approach in my view. Most CPU's is not all CPU's. The reason the errata framework exists is to apply CPU specific erratas by checking for them dynamically. Different stepping's of the same CPU's may or may not have the errata and typically you check the MIDR to know if the errata applies or not. Linux does not apply the errata to all CPU's since "most" CPU's have the issue. They check for its existence at runtime and only then apply it. TF-A should not hold itself to
a lower standard.
Hi Raghu I guess this depends on what the errata workaround involves. Since this
workaround applies bit setting on an out of context register, it was not expected to affect the EL3 execution performance (or the lower level EL because the bits are restored on return). Also it was thought that the act of searching through the list of compiled CPUs and checking if the workaround is applicable might be more detrimental than the unilateral application of the workaround for this case (assuming no extra barriers are added since the code path it is inserted in have them already later in the sequence).
But I agree it is more elegant to have this coupled into CPU_OPS framework. I
think Manish has some ideas for this.
Best Regards Soby Mathew
-Raghu
On 4/29/20 1:35 AM, Manish Badarkhe wrote:
Hi Raghu
Just to add/correct one more thing from my previous emails that this errata
workaround proposed is
applied to both normal and secure world switches to EL3.
Thanks Manish Badarkhe
On 29/04/2020, 12:25, "TF-A on behalf of Manish Badarkhe via TF-A" <tf-a-
bounces@lists.trustedfirmware.org on behalf of tf-a@lists.trustedfirmware.org> wrote:
Hi Raghu On 29/04/2020, 02:00, "Raghu K" <raghu.ncstate@icloud.com> wrote: Hi Manish, Thanks. >> we don’t have any AT instances in minimum execution
window after
context switching from S-EL(1/2)
>> to EL3 and before updating TCR register. 1) What is the minimum execution window? Does that not
change based
on micro-architecture?
Not sure about exact minimum execution window. IMO, it really
depend
upon when "context_save" gets called after
entering into EL3 from S-EL1/2. It may changed upon micro-architecture.
Need some experts comment here.
2) Do we know that the "execution window" is exactly the
same for all
the CPU's this errata applies to?
It may be but we should not worry on that if we don’t have any
AT
instruction execution in that window.
Also, it appears we are only talking about switching from S-EL1/2 to
EL3.
The same issue can happen when you go from NS-EL1/EL2 to EL3 as well. There also seems to be an assumption in the patch you submitted that this errata happens only during a so called context-switch. From my reading, the cortex- Ax errata notices don't limit the errata to occur only
during "context-switches"
in the "conditions" section and can occur while executing ANY code, although the work around section does muddy the waters a bit.
In Linux, at NS-EL2 this workaround is already in place. Hence
we just
thought of considering cases from Secure EL side to put this workaround.
Yes, errata should not limit to particular conditional section
but this
particular errata is not straight-forward like another errata placed in the code currently. We can't simply apply this errata on reset and just
leave the system.
Back to problem, AT instruction speculative execution using
out-of-
context regime that results in page table walk and generate the incorrect
translation which are cached in TLB. To avoid this issue we
thought of
disabling PTW for that particular EL.
for e.g. If AT instruction execution for EL1 present in EL3
then we have to
make sure speculative behaviour of this AT should not result in incorrect translation cached in TLB. If system is always in EL3 (if we loop-in in EL3 always without going back and forth to/from lower EL) then in that case
there is no need of this workaround. Hence we thought to put this workaround over boundary context
of
context switches. When "context save" (close to EL3 entry) happened we meticulously save all EL system registers (S-EL1/S-EL2) with PTW disabled and continue EL3 execution with PTW disabled ensuring we should not cache any incorrect translation for (S-EL1/S-EL2) and during "context restore" (i.e. close to EL3 exit) again we disabled PTW, restore all system registers for EL (S-EL1/S- EL2) except TCR and then restore TCR.
3) Has there been any work done to actually reproduce this
issue and
also to see that this actually fixes the issue?
No this issue is hard to reproduce. 4) Has the CPU errata framework(cpu_ops etc.) been
considered to
possibly implement the errata? Sprinkling erratas through common framework code does not seem like a good idea.
We thought of taking different approach for this errata
implementation
where anybody disable this workaround using macro as this errata is applicable for most of the CPUs (by default enabled) and can't be placed in cpu_ops.
Thanks Raghu Thanks Manish Badarkhe On 4/28/20 1:44 AM, Manish Badarkhe wrote: > Hi Raghu > > Please see my replies inline. > > Regards > Manish Badarkhe > > On 28/04/2020, 11:29, "Raghu Krishnamurthy"
raghu.ncstate@icloud.com wrote:
> > Hi Manish, > > Understood. > > >>Hence before entering in EL3, we ensured that PTW is
disabled
(at
> context save) > > The context save and restore functions are executed in EL3. So
how
are
> you disabling PTW before entering EL3 ? > > Yes, I put it wrongly. We thought "context_save/restore"
is best place
to disable PTW without much affecting the
> code because we don’t have any AT instances in minimum
execution
window after context switching from S-EL(1/2)
> to EL3 and before updating TCR register. > > -Raghu > > Thanks > Manish Badarkhe > > On 4/27/20 10:53 PM, Manish Badarkhe wrote: > > Hi Raghu > > > > This workaround is specifically need for speculative AT
instruction
behaviour in out of context regime.
> > That means executing AT instruction for lower ELs (S-EL1/S-EL2)
in
higher EL i.e. EL3.
> > > > Behaviour of AT instruction is unaltered when it get executed
in
same regime (when AT instruction executed for same EL
> > where it is executing) and there is no possibility to execute AT
instruction for higher EL in lower EL.
> > > > Hence before entering in EL3, we ensured that PTW is disabled
(at
context save) and restore PTW back during
> > exit of EL3. (at context restore). > > > > Thanks > > Manish Badarkhe > > > > On 28/04/2020, 01:23, "Raghu K" <raghu.ncstate@icloud.com>
wrote:
> > > > Hi Manish, > > > > >>Hence proposed solution will work as it is > > > > [RK]If you are sure go ahead. I'm not convinced, but that may
be because
> > i don't understand the errata fully/correctly. > > > > >>This workaround is very specific during context switching > > > > [RK] Context switching has many meanings depending on the
context(OS,
> > hypervisor, TF-A world switch etc). The errata document i
saw
does not
> > elaborate on this. Perhaps clarifying this will help
understand
why the
> > solution you proposed will work. > > > > The solution below in points 2 and 3 have the same problem
on
entry and
> > exit, mentioned in my first email. Before you call > > el1_sysregs_context_save, an AT instruction could have
speculatively
> > executed through speculation of branches that occur
BEFORE
you call this
> > function, when TCR still has the enable bit set. The fact that
you
don't
> > have an AT instruction in the context save routine or any
routine for
> > that matter, does not guarantee that the hardware did not
speculate
> > through some other means to reach an AT instruction. The
same applies to
> > the context_restore routines. There is no guarantee right
after
you
> > finish the restore routing(and hence TCR has the enable bit
set),
that
> > the CPU cannot speculate to an AT instruction. > > So i'm not clear how you can say for certain that there was
no
> > speculative AT instruction with the proposal below. > > > > Thanks > > Raghu > > > > On 4/27/20 10:08 AM, Manish Badarkhe wrote: > > > Hi All, > > > > > > Just update/correct details. > > > > > > Thanks > > > Manish Badarkhe > > > > > > On 27/04/2020, 22:13, "TF-A on behalf of Manish Badarkhe
via TF-A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- a@lists.trustedfirmware.org> wrote:
> > > > > > Hi Raghu > > > > > > Please ignore my answer on question 2. > > > > > > With internal discussion came to below conclusion: > > > 1. This workaround is very specific during context
switching.
> > > 2 . If you check in context save routine
(el1_sysregs_context_save or el2_sysregs_context_save),
> > > As per proposed solution, First step performed is to
disable page table walk and we don’t have
> > > any AT instruction execution in context save routine. > > > This ensures that there will be no possibility of
speculative AT instruction execution without TCR update.
> > > 3. If you check in context restore routine
(el1_sysregs_context_restore or el2_sysregs_context_restore),
> > > As per proposed solution, first step performed is to
disable page table walk and we don’t have any
> > > AT instruction execution in context restore routine. > > > This ensures that there will be no possibility of
speculative AT instruction execution without TCR update.
> > > > > > Hence proposed solution will work as it is ensuring no
caching of translations in TLB while speculative AT instruction execution.
> > > > > > Thanks > > > Manish Badarkhe > > > > > > On 27/04/2020, 13:38, "TF-A on behalf of Manish
Badarkhe
via TF-A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- a@lists.trustedfirmware.org> wrote:
> > > > > > Hi Raghu > > > > > > Please see my answers inline > > > > > > On 25/04/2020, 06:38, "TF-A on behalf of Raghu K via
TF-
A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- a@lists.trustedfirmware.org> wrote:
> > > > > > Hi Manish, > > > > > > Before I agree or disagree with the suggested fix, the
following would
> > > be interesting to know/discuss. Please feel free to
correct me if i've
> > > misunderstood something. > > > > > > 1) Are "speculative" AT instructions subject to
TCR_ELx
control bits for
> > > all the listed CPU's? I imagine the answer is yes but
would be good to
> > > get confirmation. I could not find any evidence in
the
instruction
> > > description or psuedocode in the ARMv8 ARM. It is
possible to play many
> > > tricks on speculative execution of instructions such
as
skipping checks
> > > and doing them only when the CPU knows the
instruction will be
> > > committed. If this is the case, changing TCR_ELx bits
may not work. The
> > > errata document is vague about how to fix it. > > > > > > The speculative AT instruction may behave as you
mentioned. We need more
> > > opinion on this. > > > Proposed fix I mentioned by referring linux
workaround
for the same errata.
> > > Linux workaround is available in mainline kernel as
below:
> > >
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/co mmit/?h= v5.7-rc3&id=bd227553ad5077f21ddb382dcd910ba46181805a
> > > > > > 2) Assuming the answer to question 1 is yes, your
proposal may not work
> > > as is. In the worst case, as soon as you enter EL3, the
very first thing
> > > that may happen, before you ever operate/write to
TCR_ELx, is a
> > > speculative AT instruction that caches a bad
translation
in the TLB's.
> > > The same thing can happen on the exit path. As
soon as
you restore the
> > > TCR_ELx register, the first thing that can happen is a
speculative AT
> > > that caches a bad translation. However, the el3_exit
path does have DSB
> > > before ERET, so we will not speculate to an AT
instruction if there are
> > > no branches between the instruction that sets
TCR_ELx
and the ERET.
> > > Somewhere in between, it looks like we will need a
TLBI NSH to be
> > > certain there are no bad translation cached. This
obviously has a
> > > potential performance cost on the lower EL's. Every
entry into EL3
> > > flushes the TLB for lower EL's. > > > > > > Yes, this seems to be valid case during entry and exit
path.
> > > I am not quite sure in that case where we need to
avoid
PTW.
> > > Also "TLBI NSH" works but it may cause performance
issue.
> > > Need some more opinion/thoughts on this. > > > > > > Just thinking, can sequence mentioned for context
save
does not ensure that
> > > PTW is disabled? > > > Something as below as last step in ELx(1/2) context
save
(elaborated more):
> > > > ·Save TCR register with PTW enable (EPD=0). (Just
to
enable PTW during
> > > > restore context). Do not operate TCR_EL1x register
here just save its value to restore.
> > > > This ensures that during entry in EL3 there will be
no
chance of PTW
> > > >. while executing AT instruction. > > > > > > Thanks > > > > > > Raghu > > > > > > Thanks > > > Manish Badarkhe > > > > > > On 4/24/20 2:56 AM, Manish Badarkhe via TF-A
wrote:
> > > > > > > > Hi All > > > > > > > > We are trying to implement errata which is
applicable
for below CPUs:
> > > > > > > > <CPUs> : <Errata No.> > > > > > > > > Cortex-A53: 1530924 > > > > > > > > Cortex-A76: 1165522 > > > > Cortex-A72: 1319367 > > > > Cortex-A57: 1319537 > > > > Cortex-A55: 1530923 > > > > > > > > *Errata Description:* > > > > > > > > A speculative Address Translation (AT) instruction
translates using
> > > > registers that are associated with an out-of-
context
translation
> > > > regime and caches the resulting translation in the
TLB.
A subsequent
> > > > translation request that is generated when the
out-
of-context
> > > > translation regime is current uses the previous
cached
TLB entry
> > > > producing an incorrect virtual to physical mapping. > > > > > > > > *Probable solution is to implement below fix in
context.S file:*
> > > > > > > > *During ELx (1 or 2) context save:* > > > > > > > > ·Operate TCR_ELx(1/2) to disable page table walk
by
operating EPD bits
> > > > > > > > oThis will avoid any page table walk for S-EL1 or S-
EL2.
This will
> > > > help in avoiding caching of translations in TLB > > > > > > > > for S-EL1/S-EL2 in EL3. > > > > > > > > ·Save all system registers (which is already
available)
except TCR
> > > > > > > > ·Clear EPD bits of TCR and then save. (Just to
enable
PTW during
> > > > restore context). > > > > > > > > *During ELx (1 or 2) context restore:* > > > > > > > > * Operate TCR_ELx(1/2) to disable page table
walk
by operating EPD bits
> > > > * Restore all system registers (which are saved
during context save)
> > > > except TCR register. > > > > * Restore TCR_ELx(1/2) register (which enable
back
PTW).
> > > > > > > > With above we ensured that there will be no page
table walk for S-EL1
> > > > and S-EL2 in EL3. > > > > > > > > is this proper other way to fix this problem? Need
some suggestion/use
> > > > cases where and all we need this workaround in
TF-A
code.
> > > > > > > > Thanks > > > > > > > > Manish Badarkhe > > > > > > > > IMPORTANT NOTICE: The contents of this email
and
any attachments are
> > > > confidential and may also be privileged. If you are
not
the intended
> > > > recipient, please notify the sender immediately
and
do not disclose
> > > > the contents to any other person, use it for any
purpose, or store or
> > > > copy the information in any medium. Thank you. > > > > > > > > > > -- > > > TF-A mailing list > > > TF-A@lists.trustedfirmware.org > > >
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
> > > > > > IMPORTANT NOTICE: The contents of this email and
any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> > > -- > > > TF-A mailing list > > > TF-A@lists.trustedfirmware.org > > > https://lists.trustedfirmware.org/mailman/listinfo/tf-a > > > > > > IMPORTANT NOTICE: The contents of this email and any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> > > -- > > > TF-A mailing list > > > TF-A@lists.trustedfirmware.org > > > https://lists.trustedfirmware.org/mailman/listinfo/tf-a > > > > > > IMPORTANT NOTICE: The contents of this email and any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> > > > > > IMPORTANT NOTICE: The contents of this email and any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
> > > > IMPORTANT NOTICE: The contents of this email and any
attachments
are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
-- TF-A mailing list TF-A@lists.trustedfirmware.org https://lists.trustedfirmware.org/mailman/listinfo/tf-a
-- TF-A mailing list TF-A@lists.trustedfirmware.org https://lists.trustedfirmware.org/mailman/listinfo/tf-a
Hi Raghu,
Please see my replies inline.
Thanks Manish Badarkhe
On 04/05/2020, 21:42, "Raghu K" raghu.ncstate@icloud.com wrote:
Thanks Soby. This makes things clearer.
>>The benefit of this skip may not justify the additional lookup and complexity needed to pull in the cpu_ops >>framework for the runtime check
[RK]Agree. I was thinking along the lines of having the CPU reset function set a per-cpu or global flag that tells the runtime function whether or not the errata needs to be applied. At runtime, you unconditionally call the errata apply function from context save/restore(when the compile time flag is enabled) and the cpu_ops function can decide based on the flag whether or not the errata needs to be applied. This seems simple(no look ups required other than during cpu reset) although we may not know for sure until it is implemented that way. Anyway, i'll wait for Manish's solution.
As of now we are thinking to push the patch with compile time option which is default disabled (as phase I) and then will work on design and implementation of runtime check as phase II which will help to cover such kind of erratum cases.
>>So for a simple EL1/EL2 SMC call into EL3 and return, this errata will not apply
[RK] Thanks for clarifying. I did not get this from looking at the errata document. I took a closer look at the kvm patches and it does agree with your assessment. For TF-A, you probably don't require the work around during context save since all we are doing is reading registers and saving them off and there cannot be any inconsistent state. Context-restore is the window during which you can have inconsistent register state at which time you can apply the workaround. Is there a reason to do it during context-save as well(other than it being clean/symmetrical with context-restore)?
Yes, we need to put this workaround only during restore.
Thanks Raghu
On 5/4/20 7:38 AM, Soby Mathew wrote: > Hi Raghu > There would always be a build option that would disable this workaround. The runtime check based on MIDR would be an add-on on top of that. So for big-Little systems, if any one of the CPU is affected, the platform would enable the workaround. The runtime check could potentially skip the unaffected CPU in the system. The benefit of this skip may not justify the additional lookup and complexity needed to pull in the cpu_ops framework for the runtime check. Manish is doing some thinking in this direction. > > For the unaffected platform, this runtime check wouldn't matter as the build option to enable the workaround would be disabled. > > This is what we currently understand, the errata only happens during a context switch for the out of context translation regime. So for a simple EL1/EL2 SMC call into EL3 and return, this errata will not apply as the EL2/EL1 context is not changed. The TLB caching due to speculation in EL3 should not really matter as the lower EL context on return will be same as on entry. > > The context switch happens for a NS ->EL3 -> S-EL1 SMC call. Hence the reasoning behind adding it to context save and restore sequence as this is only invoked for such a switch. > > This is as per my understanding of the problem. > > Best Regards > Soby Mathew > >> -----Original Message----- >> From: Raghu K raghu.ncstate@icloud.com >> Sent: 01 May 2020 19:22 >> To: Soby Mathew Soby.Mathew@arm.com; Manish Badarkhe >> Manish.Badarkhe@arm.com >> Cc: tf-a@lists.trustedfirmware.org >> Subject: Re: [TF-A] Need input on Errata implementation >> >> Thanks Soby. If the fix involves looping through a list of CPU's, the approach >> may be fine. I missed the case where you could have multiple CPU MIDR's on >> the same SoC(big.little's???). The ideal solution would be SoC's with CPU's >> *without* this errata have zero penalty(no checking for the errata, perhaps by >> being compiled out), and for SoC's that contain CPU's with the errata have a >> fairly quick check or an always compiled in errata fix. A combination of >> platform specific compile time flags and/or per-cpu variables or global >> variables should be able to achieve this. >> >> Also do you have any insights on if the workaround needs to be applied in the >> scenario i mentioned below(simple SMC call without any call to context >> save/restore)? I'm trying to understand why the errata applies only if we hit >> the context-save/restore path. >> >> -Raghu >> >> On 5/1/20 7:02 AM, Soby Mathew wrote: >>>> -----Original Message----- >>>> From: TF-A tf-a-bounces@lists.trustedfirmware.org On Behalf Of >>>> Raghu Krishnamurthy via TF-A >>>> Sent: 30 April 2020 02:33 >>>> To: Manish Badarkhe Manish.Badarkhe@arm.com; tf- >>>> a@lists.trustedfirmware.org >>>> Cc: nd nd@arm.com >>>> Subject: Re: [TF-A] Need input on Errata implementation >>>> >>>> Hi Manish, >>>> >>>> Really appreciate you for taking time to respond to my concerns/questions. >>>> >>>> What about this situation? NS-EL2 makes an SMC call to EL3 to get >>>> some basic information like GET_SOC_INFORMATION. This is a simple SMC >>>> and there is no call to context save or context restore. During the >>>> SMC call, if there is a speculative AT instruction on a lower EL(say >>>> NS-EL2), there could be a bad cached translation. Do you not need to >>>> apply the errata in this situation ? If not, why? >>>> >>>> >>We can't simply apply this errata on reset and just leave the system. >>>> >>>> [RK]Totally agree. See CPU_E_HANDLER_FUNC. It is not necessary that >>>> cpu_ops are only called during reset and power down. >>>> CPU_E_HANDLER_FUNC is called at runtime due EA's. >>>> >>>> >>We thought of taking different approach for this errata >>>> implementation >>where anybody disable this workaround using macro as >>>> this errata is >>applicable for most of the CPUs (by default enabled) >>>> and can't be >>placed in cpu_ops. >>>> >>>> [RK]This is a poor approach in my view. Most CPU's is not all CPU's. >>>> The reason the errata framework exists is to apply CPU specific >>>> erratas by checking for them dynamically. Different stepping's of the >>>> same CPU's may or may not have the errata and typically you check the >>>> MIDR to know if the errata applies or not. Linux does not apply the >>>> errata to all CPU's since "most" CPU's have the issue. They check for >>>> its existence at runtime and only then apply it. TF-A should not hold itself to >> a lower standard. >>> Hi Raghu >>> I guess this depends on what the errata workaround involves. Since this >> workaround applies bit setting on an out of context register, it was not >> expected to affect the EL3 execution performance (or the lower level EL >> because the bits are restored on return). Also it was thought that the act of >> searching through the list of compiled CPUs and checking if the workaround is >> applicable might be more detrimental than the unilateral application of the >> workaround for this case (assuming no extra barriers are added since the code >> path it is inserted in have them already later in the sequence). >>> But I agree it is more elegant to have this coupled into CPU_OPS framework. I >> think Manish has some ideas for this. >>> Best Regards >>> Soby Mathew >>> >>>> -Raghu >>>> >>>> On 4/29/20 1:35 AM, Manish Badarkhe wrote: >>>>> Hi Raghu >>>>> >>>>> Just to add/correct one more thing from my previous emails that >>>>> this errata >>>> workaround proposed is >>>>> applied to both normal and secure world switches to EL3. >>>>> >>>>> Thanks >>>>> Manish Badarkhe >>>>> >>>>> On 29/04/2020, 12:25, "TF-A on behalf of Manish Badarkhe via TF-A" >>>>> <tf-a- >>>> bounces@lists.trustedfirmware.org on behalf of >>>> tf-a@lists.trustedfirmware.org> >>>> wrote: >>>>> Hi Raghu >>>>> >>>>> On 29/04/2020, 02:00, "Raghu K" raghu.ncstate@icloud.com wrote: >>>>> >>>>> Hi Manish, >>>>> >>>>> Thanks. >>>>> >>>>> >> we don’t have any AT instances in minimum execution >>>>> window after >>>> context switching from S-EL(1/2) >>>>> >> to EL3 and before updating TCR register. >>>>> >>>>> 1) What is the minimum execution window? Does that not >>>>> change based >>>> on micro-architecture? >>>>> Not sure about exact minimum execution window. IMO, it really >>>>> depend >>>> upon when "context_save" gets called after >>>>> entering into EL3 from S-EL1/2. It may changed upon micro-architecture. >>>> Need some experts comment here. >>>>> 2) Do we know that the "execution window" is exactly the >>>>> same for all >>>> the CPU's this errata applies to? >>>>> It may be but we should not worry on that if we don’t have any >>>>> AT >>>> instruction execution in that window. >>>>> Also, it appears we are only talking about switching from S-EL1/2 to >> EL3. >>>> The same issue can happen when you go from NS-EL1/EL2 to EL3 as well. >>>> There also seems to be an assumption in the patch you submitted that >>>> this errata happens only during a so called context-switch. From my >>>> reading, the cortex- Ax errata notices don't limit the errata to occur only >> during "context-switches" >>>> in the "conditions" section and can occur while executing ANY code, >>>> although the work around section does muddy the waters a bit. >>>>> In Linux, at NS-EL2 this workaround is already in place. Hence >>>>> we just >>>> thought of considering cases from Secure EL side to put this workaround. >>>>> Yes, errata should not limit to particular conditional section >>>>> but this >>>> particular errata is not straight-forward like another errata placed >>>> in the code currently. We can't simply apply this errata on reset and just >> leave the system. >>>>> Back to problem, AT instruction speculative execution using >>>>> out-of- >>>> context regime that results in page table walk and generate the >>>> incorrect >>>>> translation which are cached in TLB. To avoid this issue we >>>>> thought of >>>> disabling PTW for that particular EL. >>>>> for e.g. If AT instruction execution for EL1 present in EL3 >>>>> then we have to >>>> make sure speculative behaviour of this AT should not result in >>>> incorrect translation cached in TLB. If system is always in EL3 (if >>>> we loop-in in EL3 always without going back and forth to/from lower >>>> EL) then in that case >>>>> there is no need of this workaround. >>>>> Hence we thought to put this workaround over boundary context >>>>> of >>>> context switches. When "context save" (close to EL3 entry) happened >>>> we meticulously save all EL system registers (S-EL1/S-EL2) with PTW >>>> disabled and continue EL3 execution with PTW disabled ensuring we >>>> should not cache any incorrect translation for (S-EL1/S-EL2) and >>>> during "context restore" (i.e. close to EL3 exit) again we disabled >>>> PTW, restore all system registers for EL (S-EL1/S- >>>> EL2) except TCR and then restore TCR. >>>>> 3) Has there been any work done to actually reproduce this >>>>> issue and >>>> also to see that this actually fixes the issue? >>>>> No this issue is hard to reproduce. >>>>> >>>>> 4) Has the CPU errata framework(cpu_ops etc.) been >>>>> considered to >>>> possibly implement the errata? Sprinkling erratas through common >>>> framework code does not seem like a good idea. >>>>> We thought of taking different approach for this errata >>>>> implementation >>>> where anybody disable this workaround using macro as this errata is >>>> applicable for most of the CPUs (by default enabled) and can't be >>>> placed in cpu_ops. >>>>> Thanks >>>>> Raghu >>>>> >>>>> Thanks >>>>> Manish Badarkhe >>>>> >>>>> On 4/28/20 1:44 AM, Manish Badarkhe wrote: >>>>> > Hi Raghu >>>>> > >>>>> > Please see my replies inline. >>>>> > >>>>> > Regards >>>>> > Manish Badarkhe >>>>> > >>>>> > On 28/04/2020, 11:29, "Raghu Krishnamurthy" >>>> raghu.ncstate@icloud.com wrote: >>>>> > >>>>> > Hi Manish, >>>>> > >>>>> > Understood. >>>>> > >>>>> > >>Hence before entering in EL3, we ensured that PTW is >> disabled >>>> (at >>>>> > context save) >>>>> > >>>>> > The context save and restore functions are executed in EL3. So >> how >>>> are >>>>> > you disabling PTW before entering EL3 ? >>>>> > >>>>> > Yes, I put it wrongly. We thought "context_save/restore" >>>>> is best place >>>> to disable PTW without much affecting the >>>>> > code because we don’t have any AT instances in minimum >>>>> execution >>>> window after context switching from S-EL(1/2) >>>>> > to EL3 and before updating TCR register. >>>>> > >>>>> > -Raghu >>>>> > >>>>> > Thanks >>>>> > Manish Badarkhe >>>>> > >>>>> > On 4/27/20 10:53 PM, Manish Badarkhe wrote: >>>>> > > Hi Raghu >>>>> > > >>>>> > > This workaround is specifically need for speculative AT >> instruction >>>> behaviour in out of context regime. >>>>> > > That means executing AT instruction for lower ELs (S-EL1/S-EL2) >> in >>>> higher EL i.e. EL3. >>>>> > > >>>>> > > Behaviour of AT instruction is unaltered when it get executed >> in >>>> same regime (when AT instruction executed for same EL >>>>> > > where it is executing) and there is no possibility to execute AT >>>> instruction for higher EL in lower EL. >>>>> > > >>>>> > > Hence before entering in EL3, we ensured that PTW is disabled >> (at >>>> context save) and restore PTW back during >>>>> > > exit of EL3. (at context restore). >>>>> > > >>>>> > > Thanks >>>>> > > Manish Badarkhe >>>>> > > >>>>> > > On 28/04/2020, 01:23, "Raghu K" raghu.ncstate@icloud.com >>>> wrote: >>>>> > > >>>>> > > Hi Manish, >>>>> > > >>>>> > > >>Hence proposed solution will work as it is >>>>> > > >>>>> > > [RK]If you are sure go ahead. I'm not convinced, but that may >>>> be because >>>>> > > i don't understand the errata fully/correctly. >>>>> > > >>>>> > > >>This workaround is very specific during context switching >>>>> > > >>>>> > > [RK] Context switching has many meanings depending on the >>>> context(OS, >>>>> > > hypervisor, TF-A world switch etc). The errata document i >> saw >>>> does not >>>>> > > elaborate on this. Perhaps clarifying this will help >> understand >>>> why the >>>>> > > solution you proposed will work. >>>>> > > >>>>> > > The solution below in points 2 and 3 have the same problem >> on >>>> entry and >>>>> > > exit, mentioned in my first email. Before you call >>>>> > > el1_sysregs_context_save, an AT instruction could have >>>> speculatively >>>>> > > executed through speculation of branches that occur >> BEFORE >>>> you call this >>>>> > > function, when TCR still has the enable bit set. The fact that >> you >>>> don't >>>>> > > have an AT instruction in the context save routine or any >>>> routine for >>>>> > > that matter, does not guarantee that the hardware did not >>>> speculate >>>>> > > through some other means to reach an AT instruction. The >>>> same applies to >>>>> > > the context_restore routines. There is no guarantee right >> after >>>> you >>>>> > > finish the restore routing(and hence TCR has the enable bit >> set), >>>> that >>>>> > > the CPU cannot speculate to an AT instruction. >>>>> > > So i'm not clear how you can say for certain that there was >> no >>>>> > > speculative AT instruction with the proposal below. >>>>> > > >>>>> > > Thanks >>>>> > > Raghu >>>>> > > >>>>> > > On 4/27/20 10:08 AM, Manish Badarkhe wrote: >>>>> > > > Hi All, >>>>> > > > >>>>> > > > Just update/correct details. >>>>> > > > >>>>> > > > Thanks >>>>> > > > Manish Badarkhe >>>>> > > > >>>>> > > > On 27/04/2020, 22:13, "TF-A on behalf of Manish Badarkhe >>>> via TF-A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- >>>> a@lists.trustedfirmware.org> wrote: >>>>> > > > >>>>> > > > Hi Raghu >>>>> > > > >>>>> > > > Please ignore my answer on question 2. >>>>> > > > >>>>> > > > With internal discussion came to below conclusion: >>>>> > > > 1. This workaround is very specific during context >>>> switching. >>>>> > > > 2 . If you check in context save routine >>>> (el1_sysregs_context_save or el2_sysregs_context_save), >>>>> > > > As per proposed solution, First step performed is to >>>> disable page table walk and we don’t have >>>>> > > > any AT instruction execution in context save routine. >>>>> > > > This ensures that there will be no possibility of >>>> speculative AT instruction execution without TCR update. >>>>> > > > 3. If you check in context restore routine >>>> (el1_sysregs_context_restore or el2_sysregs_context_restore), >>>>> > > > As per proposed solution, first step performed is to >>>> disable page table walk and we don’t have any >>>>> > > > AT instruction execution in context restore routine. >>>>> > > > This ensures that there will be no possibility of >>>> speculative AT instruction execution without TCR update. >>>>> > > > >>>>> > > > Hence proposed solution will work as it is ensuring no >>>> caching of translations in TLB while speculative AT instruction execution. >>>>> > > > >>>>> > > > Thanks >>>>> > > > Manish Badarkhe >>>>> > > > >>>>> > > > On 27/04/2020, 13:38, "TF-A on behalf of Manish >> Badarkhe >>>> via TF-A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- >>>> a@lists.trustedfirmware.org> wrote: >>>>> > > > >>>>> > > > Hi Raghu >>>>> > > > >>>>> > > > Please see my answers inline >>>>> > > > >>>>> > > > On 25/04/2020, 06:38, "TF-A on behalf of Raghu K via >> TF- >>>> A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- >>>> a@lists.trustedfirmware.org> wrote: >>>>> > > > >>>>> > > > Hi Manish, >>>>> > > > >>>>> > > > Before I agree or disagree with the suggested fix, the >>>> following would >>>>> > > > be interesting to know/discuss. Please feel free to >>>> correct me if i've >>>>> > > > misunderstood something. >>>>> > > > >>>>> > > > 1) Are "speculative" AT instructions subject to >> TCR_ELx >>>> control bits for >>>>> > > > all the listed CPU's? I imagine the answer is yes but >>>> would be good to >>>>> > > > get confirmation. I could not find any evidence in >> the >>>> instruction >>>>> > > > description or psuedocode in the ARMv8 ARM. It is >>>> possible to play many >>>>> > > > tricks on speculative execution of instructions such >> as >>>> skipping checks >>>>> > > > and doing them only when the CPU knows the >>>> instruction will be >>>>> > > > committed. If this is the case, changing TCR_ELx bits >>>> may not work. The >>>>> > > > errata document is vague about how to fix it. >>>>> > > > >>>>> > > > The speculative AT instruction may behave as you >>>> mentioned. We need more >>>>> > > > opinion on this. >>>>> > > > Proposed fix I mentioned by referring linux >> workaround >>>> for the same errata. >>>>> > > > Linux workaround is available in mainline kernel as >>>> below: >>>>> > > > >>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/co >>>> mmit/?h= v5.7-rc3&id=bd227553ad5077f21ddb382dcd910ba46181805a >>>>> > > > >>>>> > > > 2) Assuming the answer to question 1 is yes, your >>>> proposal may not work >>>>> > > > as is. In the worst case, as soon as you enter EL3, the >>>> very first thing >>>>> > > > that may happen, before you ever operate/write to >>>> TCR_ELx, is a >>>>> > > > speculative AT instruction that caches a bad >> translation >>>> in the TLB's. >>>>> > > > The same thing can happen on the exit path. As >> soon as >>>> you restore the >>>>> > > > TCR_ELx register, the first thing that can happen is a >>>> speculative AT >>>>> > > > that caches a bad translation. However, the el3_exit >>>> path does have DSB >>>>> > > > before ERET, so we will not speculate to an AT >>>> instruction if there are >>>>> > > > no branches between the instruction that sets >> TCR_ELx >>>> and the ERET. >>>>> > > > Somewhere in between, it looks like we will need a >>>> TLBI NSH to be >>>>> > > > certain there are no bad translation cached. This >>>> obviously has a >>>>> > > > potential performance cost on the lower EL's. Every >>>> entry into EL3 >>>>> > > > flushes the TLB for lower EL's. >>>>> > > > >>>>> > > > Yes, this seems to be valid case during entry and exit >> path. >>>>> > > > I am not quite sure in that case where we need to >> avoid >>>> PTW. >>>>> > > > Also "TLBI NSH" works but it may cause performance >>>> issue. >>>>> > > > Need some more opinion/thoughts on this. >>>>> > > > >>>>> > > > Just thinking, can sequence mentioned for context >> save >>>> does not ensure that >>>>> > > > PTW is disabled? >>>>> > > > Something as below as last step in ELx(1/2) context >> save >>>> (elaborated more): >>>>> > > > > ·Save TCR register with PTW enable (EPD=0). (Just >> to >>>> enable PTW during >>>>> > > > > restore context). Do not operate TCR_EL1x register >>>> here just save its value to restore. >>>>> > > > > This ensures that during entry in EL3 there will be >> no >>>> chance of PTW >>>>> > > > >. while executing AT instruction. >>>>> > > > >>>>> > > > Thanks >>>>> > > > >>>>> > > > Raghu >>>>> > > > >>>>> > > > Thanks >>>>> > > > Manish Badarkhe >>>>> > > > >>>>> > > > On 4/24/20 2:56 AM, Manish Badarkhe via TF-A >> wrote: >>>>> > > > > >>>>> > > > > Hi All >>>>> > > > > >>>>> > > > > We are trying to implement errata which is >> applicable >>>> for below CPUs: >>>>> > > > > >>>>> > > > > <CPUs> : <Errata No.> >>>>> > > > > >>>>> > > > > Cortex-A53: 1530924 >>>>> > > > > >>>>> > > > > Cortex-A76: 1165522 >>>>> > > > > Cortex-A72: 1319367 >>>>> > > > > Cortex-A57: 1319537 >>>>> > > > > Cortex-A55: 1530923 >>>>> > > > > >>>>> > > > > *Errata Description:* >>>>> > > > > >>>>> > > > > A speculative Address Translation (AT) instruction >>>> translates using >>>>> > > > > registers that are associated with an out-of- >> context >>>> translation >>>>> > > > > regime and caches the resulting translation in the >> TLB. >>>> A subsequent >>>>> > > > > translation request that is generated when the >> out- >>>> of-context >>>>> > > > > translation regime is current uses the previous >> cached >>>> TLB entry >>>>> > > > > producing an incorrect virtual to physical mapping. >>>>> > > > > >>>>> > > > > *Probable solution is to implement below fix in >>>> context.S file:* >>>>> > > > > >>>>> > > > > *During ELx (1 or 2) context save:* >>>>> > > > > >>>>> > > > > ·Operate TCR_ELx(1/2) to disable page table walk >> by >>>> operating EPD bits >>>>> > > > > >>>>> > > > > oThis will avoid any page table walk for S-EL1 or S- >> EL2. >>>> This will >>>>> > > > > help in avoiding caching of translations in TLB >>>>> > > > > >>>>> > > > > for S-EL1/S-EL2 in EL3. >>>>> > > > > >>>>> > > > > ·Save all system registers (which is already >> available) >>>> except TCR >>>>> > > > > >>>>> > > > > ·Clear EPD bits of TCR and then save. (Just to >> enable >>>> PTW during >>>>> > > > > restore context). >>>>> > > > > >>>>> > > > > *During ELx (1 or 2) context restore:* >>>>> > > > > >>>>> > > > > * Operate TCR_ELx(1/2) to disable page table >> walk >>>> by operating EPD bits >>>>> > > > > * Restore all system registers (which are saved >>>> during context save) >>>>> > > > > except TCR register. >>>>> > > > > * Restore TCR_ELx(1/2) register (which enable >> back >>>> PTW). >>>>> > > > > >>>>> > > > > With above we ensured that there will be no page >>>> table walk for S-EL1 >>>>> > > > > and S-EL2 in EL3. >>>>> > > > > >>>>> > > > > is this proper other way to fix this problem? Need >>>> some suggestion/use >>>>> > > > > cases where and all we need this workaround in >> TF-A >>>> code. >>>>> > > > > >>>>> > > > > Thanks >>>>> > > > > >>>>> > > > > Manish Badarkhe >>>>> > > > > >>>>> > > > > IMPORTANT NOTICE: The contents of this email >> and >>>> any attachments are >>>>> > > > > confidential and may also be privileged. If you are >> not >>>> the intended >>>>> > > > > recipient, please notify the sender immediately >> and >>>> do not disclose >>>>> > > > > the contents to any other person, use it for any >>>> purpose, or store or >>>>> > > > > copy the information in any medium. Thank you. >>>>> > > > > >>>>> > > > >>>>> > > > -- >>>>> > > > TF-A mailing list >>>>> > > > TF-A@lists.trustedfirmware.org >>>>> > > > >> https://lists.trustedfirmware.org/mailman/listinfo/tf-a >>>>> > > > >>>>> > > > IMPORTANT NOTICE: The contents of this email and >> any >>>> attachments are confidential and may also be privileged. If you are >>>> not the intended recipient, please notify the sender immediately and >>>> do not disclose the contents to any other person, use it for any >>>> purpose, or store or copy the information in any medium. Thank you. >>>>> > > > -- >>>>> > > > TF-A mailing list >>>>> > > > TF-A@lists.trustedfirmware.org >>>>> > > > https://lists.trustedfirmware.org/mailman/listinfo/tf-a >>>>> > > > >>>>> > > > IMPORTANT NOTICE: The contents of this email and any >>>> attachments are confidential and may also be privileged. If you are >>>> not the intended recipient, please notify the sender immediately and >>>> do not disclose the contents to any other person, use it for any >>>> purpose, or store or copy the information in any medium. Thank you. >>>>> > > > -- >>>>> > > > TF-A mailing list >>>>> > > > TF-A@lists.trustedfirmware.org >>>>> > > > https://lists.trustedfirmware.org/mailman/listinfo/tf-a >>>>> > > > >>>>> > > > IMPORTANT NOTICE: The contents of this email and any >>>> attachments are confidential and may also be privileged. If you are >>>> not the intended recipient, please notify the sender immediately and >>>> do not disclose the contents to any other person, use it for any >>>> purpose, or store or copy the information in any medium. Thank you. >>>>> > > >>>>> > > >>>>> > > IMPORTANT NOTICE: The contents of this email and any >>>> attachments are confidential and may also be privileged. If you are >>>> not the intended recipient, please notify the sender immediately and >>>> do not disclose the contents to any other person, use it for any >>>> purpose, or store or copy the information in any medium. Thank you. >>>>> > > >>>>> > >>>>> > IMPORTANT NOTICE: The contents of this email and any >>>>> attachments >>>> are confidential and may also be privileged. If you are not the >>>> intended recipient, please notify the sender immediately and do not >>>> disclose the contents to any other person, use it for any purpose, or >>>> store or copy the information in any medium. Thank you. >>>>> -- >>>>> TF-A mailing list >>>>> TF-A@lists.trustedfirmware.org >>>>> https://lists.trustedfirmware.org/mailman/listinfo/tf-a >>>>> >>>> -- >>>> TF-A mailing list >>>> TF-A@lists.trustedfirmware.org >>>> https://lists.trustedfirmware.org/mailman/listinfo/tf-a
-----Original Message----- From: Raghu K raghu.ncstate@icloud.com Sent: 04 May 2020 17:13 To: Soby Mathew Soby.Mathew@arm.com; Manish Badarkhe Manish.Badarkhe@arm.com Cc: tf-a@lists.trustedfirmware.org Subject: Re: [TF-A] Need input on Errata implementation
Thanks Soby. This makes things clearer.
The benefit of this skip may not justify the additional lookup and complexity needed to pull in the cpu_ops >>framework for the runtime check
[RK]Agree. I was thinking along the lines of having the CPU reset function set a per-cpu or global flag that tells the runtime function whether or not the errata needs to be applied. At runtime, you unconditionally call the errata apply function from context save/restore(when the compile time flag is enabled) and the cpu_ops function can decide based on the flag whether or not the errata needs to be applied. This seems simple(no look ups required other than during cpu reset) although we may not know for sure until it is implemented that way. Anyway, i'll wait for Manish's solution.
Hi Raghu, Yes, that would work. The only issue being reset handler runs with caches disabled which means some cache maintenance would have to be done later on prior to read with caches enabled. The approach taken for other runtime erratas has been to allocate a function pointer in the cpu_ops which then detects and applies errata if applicable. Only affected CPUs would populate the function pointer and other unaffected CPUs would leave it NULL and this is statically done at build time.
The other alternative is to introduce a new errata detect() function which will be invoked after caches are enabled after warm/cold reset and this can set the per-cpu flag/global variable.
This would involve some study and new design I expect.
So for a simple EL1/EL2 SMC call into EL3 and return, this errata will not apply
[RK] Thanks for clarifying. I did not get this from looking at the errata document. I took a closer look at the kvm patches and it does agree with your assessment. For TF-A, you probably don't require the work around during context save since all we are doing is reading registers and saving them off and there cannot be any inconsistent state. Context-restore is the window during which you can have inconsistent register state at which time you can apply the workaround. Is there a reason to do it during context-save as well(other than it being clean/symmetrical with context-restore)?
Yes, agree that we only need to do this only during restore. Given that the static option is low impact, we can go with static option for now as Manish indicates and then pickup the new design for runtime detection as an enhancement. It may not matter much for this workaround, but in general would be good to do in preparation for future such runtime errata workarounds.
Best regards Soby Mathew
Thanks Raghu
On 5/4/20 7:38 AM, Soby Mathew wrote:
Hi Raghu There would always be a build option that would disable this workaround.
The runtime check based on MIDR would be an add-on on top of that. So for big-Little systems, if any one of the CPU is affected, the platform would enable the workaround. The runtime check could potentially skip the unaffected CPU in the system. The benefit of this skip may not justify the additional lookup and complexity needed to pull in the cpu_ops framework for the runtime check. Manish is doing some thinking in this direction.
For the unaffected platform, this runtime check wouldn't matter as the
build option to enable the workaround would be disabled.
This is what we currently understand, the errata only happens during a
context switch for the out of context translation regime. So for a simple EL1/EL2 SMC call into EL3 and return, this errata will not apply as the EL2/EL1 context is not changed. The TLB caching due to speculation in EL3 should not really matter as the lower EL context on return will be same as on entry.
The context switch happens for a NS ->EL3 -> S-EL1 SMC call. Hence the
reasoning behind adding it to context save and restore sequence as this is only invoked for such a switch.
This is as per my understanding of the problem.
Best Regards Soby Mathew
-----Original Message----- From: Raghu K raghu.ncstate@icloud.com Sent: 01 May 2020 19:22 To: Soby Mathew Soby.Mathew@arm.com; Manish Badarkhe Manish.Badarkhe@arm.com Cc: tf-a@lists.trustedfirmware.org Subject: Re: [TF-A] Need input on Errata implementation
Thanks Soby. If the fix involves looping through a list of CPU's, the approach may be fine. I missed the case where you could have multiple CPU MIDR's on the same SoC(big.little's???). The ideal solution would be SoC's with CPU's *without* this errata have zero penalty(no checking for the errata, perhaps by being compiled out), and for SoC's that contain CPU's with the errata have a fairly quick check or an always compiled in errata fix. A combination of platform specific compile time flags and/or per-cpu variables or global variables should be able to achieve this.
Also do you have any insights on if the workaround needs to be applied in the scenario i mentioned below(simple SMC call without any call to context save/restore)? I'm trying to understand why the errata applies only if we hit the context-save/restore path.
-Raghu
On 5/1/20 7:02 AM, Soby Mathew wrote:
-----Original Message----- From: TF-A tf-a-bounces@lists.trustedfirmware.org On Behalf Of Raghu Krishnamurthy via TF-A Sent: 30 April 2020 02:33 To: Manish Badarkhe Manish.Badarkhe@arm.com; tf- a@lists.trustedfirmware.org Cc: nd nd@arm.com Subject: Re: [TF-A] Need input on Errata implementation
Hi Manish,
Really appreciate you for taking time to respond to my
concerns/questions.
What about this situation? NS-EL2 makes an SMC call to EL3 to get some basic information like GET_SOC_INFORMATION. This is a simple SMC and there is no call to context save or context restore. During the SMC call, if there is a speculative AT instruction on a lower EL(say NS-EL2), there could be a bad cached translation. Do you not need to apply the errata in this situation ? If not, why?
>We can't simply apply this errata on reset and just leave the
system.
[RK]Totally agree. See CPU_E_HANDLER_FUNC. It is not necessary that cpu_ops are only called during reset and power down. CPU_E_HANDLER_FUNC is called at runtime due EA's.
>We thought of taking different approach for this errata
implementation >>where anybody disable this workaround using
macro
as this errata is >>applicable for most of the CPUs (by default enabled) and can't be >>placed in cpu_ops.
[RK]This is a poor approach in my view. Most CPU's is not all CPU's. The reason the errata framework exists is to apply CPU specific erratas by checking for them dynamically. Different stepping's of the same CPU's may or may not have the errata and typically you check the MIDR to know if the errata applies or not. Linux does not apply the errata to all CPU's since "most" CPU's have the issue. They check for its existence at runtime and only then apply it. TF-A should not hold itself to
a lower standard.
Hi Raghu I guess this depends on what the errata workaround involves. Since this
workaround applies bit setting on an out of context register, it was not expected to affect the EL3 execution performance (or the lower level EL because the bits are restored on return). Also it was thought that the act of searching through the list of compiled CPUs and checking if the workaround is applicable might be more detrimental than the unilateral application of the workaround for this case (assuming no extra barriers are added since the code path it is
inserted in have them already later in the sequence).
But I agree it is more elegant to have this coupled into CPU_OPS framework. I
think Manish has some ideas for this.
Best Regards Soby Mathew
-Raghu
On 4/29/20 1:35 AM, Manish Badarkhe wrote:
Hi Raghu
Just to add/correct one more thing from my previous emails that this errata
workaround proposed is
applied to both normal and secure world switches to EL3.
Thanks Manish Badarkhe
On 29/04/2020, 12:25, "TF-A on behalf of Manish Badarkhe via TF-A" <tf-a-
bounces@lists.trustedfirmware.org on behalf of tf-a@lists.trustedfirmware.org> wrote:
Hi Raghu On 29/04/2020, 02:00, "Raghu K" <raghu.ncstate@icloud.com>
wrote:
Hi Manish, Thanks. >> we don’t have any AT instances in minimum execution
window after
context switching from S-EL(1/2)
>> to EL3 and before updating TCR register. 1) What is the minimum execution window? Does that not
change based
on micro-architecture?
Not sure about exact minimum execution window. IMO, it
really depend
upon when "context_save" gets called after
entering into EL3 from S-EL1/2. It may changed upon micro-
architecture.
Need some experts comment here.
2) Do we know that the "execution window" is exactly
the same for all
the CPU's this errata applies to?
It may be but we should not worry on that if we don’t have
any AT
instruction execution in that window.
Also, it appears we are only talking about switching
from S-EL1/2 to
EL3.
The same issue can happen when you go from NS-EL1/EL2 to EL3 as
well.
There also seems to be an assumption in the patch you submitted that this errata happens only during a so called context-switch. From my reading, the cortex- Ax errata notices don't limit the errata to occur only
during "context-switches"
in the "conditions" section and can occur while executing ANY code, although the work around section does muddy the waters a bit.
In Linux, at NS-EL2 this workaround is already in place.
Hence we just
thought of considering cases from Secure EL side to put this
workaround.
Yes, errata should not limit to particular conditional
section but this
particular errata is not straight-forward like another errata placed in the code currently. We can't simply apply this errata on reset and just
leave the system.
Back to problem, AT instruction speculative execution using
out-of-
context regime that results in page table walk and generate the incorrect
translation which are cached in TLB. To avoid this issue we
thought of
disabling PTW for that particular EL.
for e.g. If AT instruction execution for EL1 present in EL3
then we have to
make sure speculative behaviour of this AT should not result in incorrect translation cached in TLB. If system is always in EL3 (if we loop-in in EL3 always without going back and forth to/from lower EL) then in that case
there is no need of this workaround. Hence we thought to put this workaround over boundary
context of
context switches. When "context save" (close to EL3 entry) happened we meticulously save all EL system registers (S-EL1/S-EL2) with PTW disabled and continue EL3 execution with PTW disabled ensuring we should not cache any incorrect translation for (S-EL1/S-EL2) and during "context restore" (i.e. close to EL3 exit) again we disabled PTW, restore all system registers for EL (S-EL1/S- EL2) except TCR and then restore TCR.
3) Has there been any work done to actually reproduce
this issue and
also to see that this actually fixes the issue?
No this issue is hard to reproduce. 4) Has the CPU errata framework(cpu_ops etc.) been
considered to
possibly implement the errata? Sprinkling erratas through common framework code does not seem like a good idea.
We thought of taking different approach for this errata
implementation
where anybody disable this workaround using macro as this errata is applicable for most of the CPUs (by default enabled) and can't be placed in cpu_ops.
Thanks Raghu Thanks Manish Badarkhe On 4/28/20 1:44 AM, Manish Badarkhe wrote: > Hi Raghu > > Please see my replies inline. > > Regards > Manish Badarkhe > > On 28/04/2020, 11:29, "Raghu Krishnamurthy"
raghu.ncstate@icloud.com wrote:
> > Hi Manish, > > Understood. > > >>Hence before entering in EL3, we ensured that PTW is
disabled
(at
> context save) > > The context save and restore functions are executed in EL3.
So
how
are
> you disabling PTW before entering EL3 ? > > Yes, I put it wrongly. We thought "context_save/restore"
is best place
to disable PTW without much affecting the
> code because we don’t have any AT instances in
minimum execution
window after context switching from S-EL(1/2)
> to EL3 and before updating TCR register. > > -Raghu > > Thanks > Manish Badarkhe > > On 4/27/20 10:53 PM, Manish Badarkhe wrote: > > Hi Raghu > > > > This workaround is specifically need for speculative AT
instruction
behaviour in out of context regime.
> > That means executing AT instruction for lower ELs (S-
EL1/S-EL2)
in
higher EL i.e. EL3.
> > > > Behaviour of AT instruction is unaltered when it get
executed
in
same regime (when AT instruction executed for same EL
> > where it is executing) and there is no possibility to
execute AT
instruction for higher EL in lower EL.
> > > > Hence before entering in EL3, we ensured that PTW is
disabled
(at
context save) and restore PTW back during
> > exit of EL3. (at context restore). > > > > Thanks > > Manish Badarkhe > > > > On 28/04/2020, 01:23, "Raghu K"
wrote:
> > > > Hi Manish, > > > > >>Hence proposed solution will work as it is > > > > [RK]If you are sure go ahead. I'm not convinced, but
that may
be because
> > i don't understand the errata fully/correctly. > > > > >>This workaround is very specific during context
switching
> > > > [RK] Context switching has many meanings depending
on the
context(OS,
> > hypervisor, TF-A world switch etc). The errata
document i
saw
does not
> > elaborate on this. Perhaps clarifying this will help
understand
why the
> > solution you proposed will work. > > > > The solution below in points 2 and 3 have the same
problem
on
entry and
> > exit, mentioned in my first email. Before you call > > el1_sysregs_context_save, an AT instruction could
have
speculatively
> > executed through speculation of branches that occur
BEFORE
you call this
> > function, when TCR still has the enable bit set. The
fact that
you
don't
> > have an AT instruction in the context save routine or
any
routine for
> > that matter, does not guarantee that the hardware
did not
speculate
> > through some other means to reach an AT instruction.
The
same applies to
> > the context_restore routines. There is no guarantee
right
after
you
> > finish the restore routing(and hence TCR has the
enable bit
set),
that
> > the CPU cannot speculate to an AT instruction. > > So i'm not clear how you can say for certain that there
was
no
> > speculative AT instruction with the proposal below. > > > > Thanks > > Raghu > > > > On 4/27/20 10:08 AM, Manish Badarkhe wrote: > > > Hi All, > > > > > > Just update/correct details. > > > > > > Thanks > > > Manish Badarkhe > > > > > > On 27/04/2020, 22:13, "TF-A on behalf of Manish
Badarkhe
via TF-A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- a@lists.trustedfirmware.org> wrote:
> > > > > > Hi Raghu > > > > > > Please ignore my answer on question 2. > > > > > > With internal discussion came to below
conclusion:
> > > 1. This workaround is very specific during
context
switching.
> > > 2 . If you check in context save routine
(el1_sysregs_context_save or el2_sysregs_context_save),
> > > As per proposed solution, First step
performed is to
disable page table walk and we don’t have
> > > any AT instruction execution in context save
routine.
> > > This ensures that there will be no possibility
of
speculative AT instruction execution without TCR update.
> > > 3. If you check in context restore routine
(el1_sysregs_context_restore or el2_sysregs_context_restore),
> > > As per proposed solution, first step
performed is to
disable page table walk and we don’t have any
> > > AT instruction execution in context restore
routine.
> > > This ensures that there will be no possibility
of
speculative AT instruction execution without TCR update.
> > > > > > Hence proposed solution will work as it is
ensuring no
caching of translations in TLB while speculative AT instruction
execution.
> > > > > > Thanks > > > Manish Badarkhe > > > > > > On 27/04/2020, 13:38, "TF-A on behalf of Manish
Badarkhe
via TF-A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- a@lists.trustedfirmware.org> wrote:
> > > > > > Hi Raghu > > > > > > Please see my answers inline > > > > > > On 25/04/2020, 06:38, "TF-A on behalf of
Raghu K via
TF-
A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- a@lists.trustedfirmware.org> wrote:
> > > > > > Hi Manish, > > > > > > Before I agree or disagree with the
suggested fix, the
following would
> > > be interesting to know/discuss. Please feel
free to
correct me if i've
> > > misunderstood something. > > > > > > 1) Are "speculative" AT instructions subject
to
TCR_ELx
control bits for
> > > all the listed CPU's? I imagine the answer is
yes but
would be good to
> > > get confirmation. I could not find any
evidence in
the
instruction
> > > description or psuedocode in the ARMv8
ARM. It is
possible to play many
> > > tricks on speculative execution of
instructions such
as
skipping checks
> > > and doing them only when the CPU knows
the
instruction will be
> > > committed. If this is the case, changing
TCR_ELx bits
may not work. The
> > > errata document is vague about how to fix it. > > > > > > The speculative AT instruction may behave as
you
mentioned. We need more
> > > opinion on this. > > > Proposed fix I mentioned by referring linux
workaround
for the same errata.
> > > Linux workaround is available in mainline
kernel as
below:
> > >
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ co mmit/?h= v5.7-
rc3&id=bd227553ad5077f21ddb382dcd910ba46181805a
> > > > > > 2) Assuming the answer to question 1 is yes,
your
proposal may not work
> > > as is. In the worst case, as soon as you enter
EL3, the
very first thing
> > > that may happen, before you ever
operate/write to
TCR_ELx, is a
> > > speculative AT instruction that caches a bad
translation
in the TLB's.
> > > The same thing can happen on the exit path.
As
soon as
you restore the
> > > TCR_ELx register, the first thing that can
happen is a
speculative AT
> > > that caches a bad translation. However, the
el3_exit
path does have DSB
> > > before ERET, so we will not speculate to an
AT
instruction if there are
> > > no branches between the instruction that
sets
TCR_ELx
and the ERET.
> > > Somewhere in between, it looks like we will
need a
TLBI NSH to be
> > > certain there are no bad translation cached.
This
obviously has a
> > > potential performance cost on the lower EL's.
Every
entry into EL3
> > > flushes the TLB for lower EL's. > > > > > > Yes, this seems to be valid case during entry
and exit
path.
> > > I am not quite sure in that case where we need
to
avoid
PTW.
> > > Also "TLBI NSH" works but it may cause
performance
issue.
> > > Need some more opinion/thoughts on this. > > > > > > Just thinking, can sequence mentioned for
context
save
does not ensure that
> > > PTW is disabled? > > > Something as below as last step in ELx(1/2)
context
save
(elaborated more):
> > > > ·Save TCR register with PTW enable
(EPD=0). (Just
to
enable PTW during
> > > > restore context). Do not operate TCR_EL1x
register
here just save its value to restore.
> > > > This ensures that during entry in EL3 there
will be
no
chance of PTW
> > > >. while executing AT instruction. > > > > > > Thanks > > > > > > Raghu > > > > > > Thanks > > > Manish Badarkhe > > > > > > On 4/24/20 2:56 AM, Manish Badarkhe via
TF-A
wrote:
> > > > > > > > Hi All > > > > > > > > We are trying to implement errata which is
applicable
for below CPUs:
> > > > > > > > <CPUs> : <Errata No.> > > > > > > > > Cortex-A53: 1530924 > > > > > > > > Cortex-A76: 1165522 > > > > Cortex-A72: 1319367 > > > > Cortex-A57: 1319537 > > > > Cortex-A55: 1530923 > > > > > > > > *Errata Description:* > > > > > > > > A speculative Address Translation (AT)
instruction
translates using
> > > > registers that are associated with an out-
of-
context
translation
> > > > regime and caches the resulting
translation in the
TLB.
A subsequent
> > > > translation request that is generated when
the
out-
of-context
> > > > translation regime is current uses the
previous
cached
TLB entry
> > > > producing an incorrect virtual to physical
mapping.
> > > > > > > > *Probable solution is to implement below
fix in
context.S file:*
> > > > > > > > *During ELx (1 or 2) context save:* > > > > > > > > ·Operate TCR_ELx(1/2) to disable page
table walk
by
operating EPD bits
> > > > > > > > oThis will avoid any page table walk for S-
EL1 or S-
EL2.
This will
> > > > help in avoiding caching of translations in
TLB
> > > > > > > > for S-EL1/S-EL2 in EL3. > > > > > > > > ·Save all system registers (which is already
available)
except TCR
> > > > > > > > ·Clear EPD bits of TCR and then save. (Just
to
enable
PTW during
> > > > restore context). > > > > > > > > *During ELx (1 or 2) context restore:* > > > > > > > > * Operate TCR_ELx(1/2) to disable page
table
walk
by operating EPD bits
> > > > * Restore all system registers (which are
saved
during context save)
> > > > except TCR register. > > > > * Restore TCR_ELx(1/2) register (which
enable
back
PTW).
> > > > > > > > With above we ensured that there will be
no page
table walk for S-EL1
> > > > and S-EL2 in EL3. > > > > > > > > is this proper other way to fix this problem?
Need
some suggestion/use
> > > > cases where and all we need this
workaround in
TF-A
code.
> > > > > > > > Thanks > > > > > > > > Manish Badarkhe > > > > > > > > IMPORTANT NOTICE: The contents of this
and
any attachments are
> > > > confidential and may also be privileged. If
you are
not
the intended
> > > > recipient, please notify the sender
immediately
and
do not disclose
> > > > the contents to any other person, use it
for any
purpose, or store or
> > > > copy the information in any medium.
Thank you.
> > > > > > > > > > -- > > > TF-A mailing list > > > TF-A@lists.trustedfirmware.org > > >
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
> > > > > > IMPORTANT NOTICE: The contents of this email
and
any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank
you.
> > > -- > > > TF-A mailing list > > > TF-A@lists.trustedfirmware.org > > >
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
> > > > > > IMPORTANT NOTICE: The contents of this email
and any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank
you.
> > > -- > > > TF-A mailing list > > > TF-A@lists.trustedfirmware.org > > >
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
> > > > > > IMPORTANT NOTICE: The contents of this email and
any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank
you.
> > > > > > IMPORTANT NOTICE: The contents of this email and any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank
you.
> > > > IMPORTANT NOTICE: The contents of this email and any
attachments
are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
-- TF-A mailing list TF-A@lists.trustedfirmware.org https://lists.trustedfirmware.org/mailman/listinfo/tf-a
-- TF-A mailing list TF-A@lists.trustedfirmware.org https://lists.trustedfirmware.org/mailman/listinfo/tf-a
Thanks Soby, Manish. Agree. Patch looks good.
-Raghu
On 5/5/20 1:35 AM, Soby Mathew wrote:
-----Original Message----- From: Raghu K raghu.ncstate@icloud.com Sent: 04 May 2020 17:13 To: Soby Mathew Soby.Mathew@arm.com; Manish Badarkhe Manish.Badarkhe@arm.com Cc: tf-a@lists.trustedfirmware.org Subject: Re: [TF-A] Need input on Errata implementation
Thanks Soby. This makes things clearer.
The benefit of this skip may not justify the additional lookup and complexity needed to pull in the cpu_ops >>framework for the runtime check
[RK]Agree. I was thinking along the lines of having the CPU reset function set a per-cpu or global flag that tells the runtime function whether or not the errata needs to be applied. At runtime, you unconditionally call the errata apply function from context save/restore(when the compile time flag is enabled) and the cpu_ops function can decide based on the flag whether or not the errata needs to be applied. This seems simple(no look ups required other than during cpu reset) although we may not know for sure until it is implemented that way. Anyway, i'll wait for Manish's solution.
Hi Raghu, Yes, that would work. The only issue being reset handler runs with caches disabled which means some cache maintenance would have to be done later on prior to read with caches enabled. The approach taken for other runtime erratas has been to allocate a function pointer in the cpu_ops which then detects and applies errata if applicable. Only affected CPUs would populate the function pointer and other unaffected CPUs would leave it NULL and this is statically done at build time.
The other alternative is to introduce a new errata detect() function which will be invoked after caches are enabled after warm/cold reset and this can set the per-cpu flag/global variable.
This would involve some study and new design I expect.
So for a simple EL1/EL2 SMC call into EL3 and return, this errata will not apply
[RK] Thanks for clarifying. I did not get this from looking at the errata document. I took a closer look at the kvm patches and it does agree with your assessment. For TF-A, you probably don't require the work around during context save since all we are doing is reading registers and saving them off and there cannot be any inconsistent state. Context-restore is the window during which you can have inconsistent register state at which time you can apply the workaround. Is there a reason to do it during context-save as well(other than it being clean/symmetrical with context-restore)?
Yes, agree that we only need to do this only during restore. Given that the static option is low impact, we can go with static option for now as Manish indicates and then pickup the new design for runtime detection as an enhancement. It may not matter much for this workaround, but in general would be good to do in preparation for future such runtime errata workarounds.
Best regards Soby Mathew
Thanks Raghu
On 5/4/20 7:38 AM, Soby Mathew wrote:
Hi Raghu There would always be a build option that would disable this workaround.
The runtime check based on MIDR would be an add-on on top of that. So for big-Little systems, if any one of the CPU is affected, the platform would enable the workaround. The runtime check could potentially skip the unaffected CPU in the system. The benefit of this skip may not justify the additional lookup and complexity needed to pull in the cpu_ops framework for the runtime check. Manish is doing some thinking in this direction.
For the unaffected platform, this runtime check wouldn't matter as the
build option to enable the workaround would be disabled.
This is what we currently understand, the errata only happens during a
context switch for the out of context translation regime. So for a simple EL1/EL2 SMC call into EL3 and return, this errata will not apply as the EL2/EL1 context is not changed. The TLB caching due to speculation in EL3 should not really matter as the lower EL context on return will be same as on entry.
The context switch happens for a NS ->EL3 -> S-EL1 SMC call. Hence the
reasoning behind adding it to context save and restore sequence as this is only invoked for such a switch.
This is as per my understanding of the problem.
Best Regards Soby Mathew
-----Original Message----- From: Raghu K raghu.ncstate@icloud.com Sent: 01 May 2020 19:22 To: Soby Mathew Soby.Mathew@arm.com; Manish Badarkhe Manish.Badarkhe@arm.com Cc: tf-a@lists.trustedfirmware.org Subject: Re: [TF-A] Need input on Errata implementation
Thanks Soby. If the fix involves looping through a list of CPU's, the approach may be fine. I missed the case where you could have multiple CPU MIDR's on the same SoC(big.little's???). The ideal solution would be SoC's with CPU's *without* this errata have zero penalty(no checking for the errata, perhaps by being compiled out), and for SoC's that contain CPU's with the errata have a fairly quick check or an always compiled in errata fix. A combination of platform specific compile time flags and/or per-cpu variables or global variables should be able to achieve this.
Also do you have any insights on if the workaround needs to be applied in the scenario i mentioned below(simple SMC call without any call to context save/restore)? I'm trying to understand why the errata applies only if we hit the context-save/restore path.
-Raghu
On 5/1/20 7:02 AM, Soby Mathew wrote:
-----Original Message----- From: TF-A tf-a-bounces@lists.trustedfirmware.org On Behalf Of Raghu Krishnamurthy via TF-A Sent: 30 April 2020 02:33 To: Manish Badarkhe Manish.Badarkhe@arm.com; tf- a@lists.trustedfirmware.org Cc: nd nd@arm.com Subject: Re: [TF-A] Need input on Errata implementation
Hi Manish,
Really appreciate you for taking time to respond to my
concerns/questions.
What about this situation? NS-EL2 makes an SMC call to EL3 to get some basic information like GET_SOC_INFORMATION. This is a simple SMC and there is no call to context save or context restore. During the SMC call, if there is a speculative AT instruction on a lower EL(say NS-EL2), there could be a bad cached translation. Do you not need to apply the errata in this situation ? If not, why?
>>We can't simply apply this errata on reset and just leave the
system.
[RK]Totally agree. See CPU_E_HANDLER_FUNC. It is not necessary that cpu_ops are only called during reset and power down. CPU_E_HANDLER_FUNC is called at runtime due EA's.
>>We thought of taking different approach for this errata
implementation >>where anybody disable this workaround using
macro
as this errata is >>applicable for most of the CPUs (by default enabled) and can't be >>placed in cpu_ops.
[RK]This is a poor approach in my view. Most CPU's is not all CPU's. The reason the errata framework exists is to apply CPU specific erratas by checking for them dynamically. Different stepping's of the same CPU's may or may not have the errata and typically you check the MIDR to know if the errata applies or not. Linux does not apply the errata to all CPU's since "most" CPU's have the issue. They check for its existence at runtime and only then apply it. TF-A should not hold itself to
a lower standard.
Hi Raghu I guess this depends on what the errata workaround involves. Since this
workaround applies bit setting on an out of context register, it was not expected to affect the EL3 execution performance (or the lower level EL because the bits are restored on return). Also it was thought that the act of searching through the list of compiled CPUs and checking if the workaround is applicable might be more detrimental than the unilateral application of the workaround for this case (assuming no extra barriers are added since the code path it is
inserted in have them already later in the sequence).
But I agree it is more elegant to have this coupled into CPU_OPS framework. I
think Manish has some ideas for this.
Best Regards Soby Mathew
-Raghu
On 4/29/20 1:35 AM, Manish Badarkhe wrote: > Hi Raghu > > Just to add/correct one more thing from my previous emails that > this errata workaround proposed is > applied to both normal and secure world switches to EL3. > > Thanks > Manish Badarkhe > > On 29/04/2020, 12:25, "TF-A on behalf of Manish Badarkhe via TF-A" > <tf-a- bounces@lists.trustedfirmware.org on behalf of tf-a@lists.trustedfirmware.org> wrote: > Hi Raghu > > On 29/04/2020, 02:00, "Raghu K" raghu.ncstate@icloud.com
wrote:
> Hi Manish, > > Thanks. > > >> we don’t have any AT instances in minimum execution > window after context switching from S-EL(1/2) > >> to EL3 and before updating TCR register. > > 1) What is the minimum execution window? Does that not > change based on micro-architecture? > Not sure about exact minimum execution window. IMO, it > really depend upon when "context_save" gets called after > entering into EL3 from S-EL1/2. It may changed upon micro-
architecture.
Need some experts comment here. > 2) Do we know that the "execution window" is exactly > the same for all the CPU's this errata applies to? > It may be but we should not worry on that if we don’t have > any AT instruction execution in that window. > Also, it appears we are only talking about switching > from S-EL1/2 to
EL3.
The same issue can happen when you go from NS-EL1/EL2 to EL3 as
well.
There also seems to be an assumption in the patch you submitted that this errata happens only during a so called context-switch. From my reading, the cortex- Ax errata notices don't limit the errata to occur only
during "context-switches"
in the "conditions" section and can occur while executing ANY code, although the work around section does muddy the waters a bit. > In Linux, at NS-EL2 this workaround is already in place. > Hence we just thought of considering cases from Secure EL side to put this
workaround.
> Yes, errata should not limit to particular conditional > section but this particular errata is not straight-forward like another errata placed in the code currently. We can't simply apply this errata on reset and just
leave the system.
> Back to problem, AT instruction speculative execution using > out-of- context regime that results in page table walk and generate the incorrect > translation which are cached in TLB. To avoid this issue we > thought of disabling PTW for that particular EL. > for e.g. If AT instruction execution for EL1 present in EL3 > then we have to make sure speculative behaviour of this AT should not result in incorrect translation cached in TLB. If system is always in EL3 (if we loop-in in EL3 always without going back and forth to/from lower EL) then in that case > there is no need of this workaround. > Hence we thought to put this workaround over boundary > context of context switches. When "context save" (close to EL3 entry) happened we meticulously save all EL system registers (S-EL1/S-EL2) with PTW disabled and continue EL3 execution with PTW disabled ensuring we should not cache any incorrect translation for (S-EL1/S-EL2) and during "context restore" (i.e. close to EL3 exit) again we disabled PTW, restore all system registers for EL (S-EL1/S- EL2) except TCR and then restore TCR. > 3) Has there been any work done to actually reproduce > this issue and also to see that this actually fixes the issue? > No this issue is hard to reproduce. > > 4) Has the CPU errata framework(cpu_ops etc.) been > considered to possibly implement the errata? Sprinkling erratas through common framework code does not seem like a good idea. > We thought of taking different approach for this errata > implementation where anybody disable this workaround using macro as this errata is applicable for most of the CPUs (by default enabled) and can't be placed in cpu_ops. > Thanks > Raghu > > Thanks > Manish Badarkhe > > On 4/28/20 1:44 AM, Manish Badarkhe wrote: > > Hi Raghu > > > > Please see my replies inline. > > > > Regards > > Manish Badarkhe > > > > On 28/04/2020, 11:29, "Raghu Krishnamurthy" raghu.ncstate@icloud.com wrote: > > > > Hi Manish, > > > > Understood. > > > > >>Hence before entering in EL3, we ensured that PTW is
disabled
(at > > context save) > > > > The context save and restore functions are executed in EL3.
So
how
are > > you disabling PTW before entering EL3 ? > > > > Yes, I put it wrongly. We thought "context_save/restore" > is best place to disable PTW without much affecting the > > code because we don’t have any AT instances in > minimum execution window after context switching from S-EL(1/2) > > to EL3 and before updating TCR register. > > > > -Raghu > > > > Thanks > > Manish Badarkhe > > > > On 4/27/20 10:53 PM, Manish Badarkhe wrote: > > > Hi Raghu > > > > > > This workaround is specifically need for speculative AT
instruction
behaviour in out of context regime. > > > That means executing AT instruction for lower ELs (S-
EL1/S-EL2)
in
higher EL i.e. EL3. > > > > > > Behaviour of AT instruction is unaltered when it get
executed
in
same regime (when AT instruction executed for same EL > > > where it is executing) and there is no possibility to
execute AT
instruction for higher EL in lower EL. > > > > > > Hence before entering in EL3, we ensured that PTW is
disabled
(at
context save) and restore PTW back during > > > exit of EL3. (at context restore). > > > > > > Thanks > > > Manish Badarkhe > > > > > > On 28/04/2020, 01:23, "Raghu K"
wrote: > > > > > > Hi Manish, > > > > > > >>Hence proposed solution will work as it is > > > > > > [RK]If you are sure go ahead. I'm not convinced, but
that may
be because > > > i don't understand the errata fully/correctly. > > > > > > >>This workaround is very specific during context
switching
> > > > > > [RK] Context switching has many meanings depending
on the
context(OS, > > > hypervisor, TF-A world switch etc). The errata
document i
saw
does not > > > elaborate on this. Perhaps clarifying this will help
understand
why the > > > solution you proposed will work. > > > > > > The solution below in points 2 and 3 have the same
problem
on
entry and > > > exit, mentioned in my first email. Before you call > > > el1_sysregs_context_save, an AT instruction could
have
speculatively > > > executed through speculation of branches that occur
BEFORE
you call this > > > function, when TCR still has the enable bit set. The
fact that
you
don't > > > have an AT instruction in the context save routine or
any
routine for > > > that matter, does not guarantee that the hardware
did not
speculate > > > through some other means to reach an AT instruction.
The
same applies to > > > the context_restore routines. There is no guarantee
right
after
you > > > finish the restore routing(and hence TCR has the
enable bit
set),
that > > > the CPU cannot speculate to an AT instruction. > > > So i'm not clear how you can say for certain that there
was
no
> > > speculative AT instruction with the proposal below. > > > > > > Thanks > > > Raghu > > > > > > On 4/27/20 10:08 AM, Manish Badarkhe wrote: > > > > Hi All, > > > > > > > > Just update/correct details. > > > > > > > > Thanks > > > > Manish Badarkhe > > > > > > > > On 27/04/2020, 22:13, "TF-A on behalf of Manish
Badarkhe
via TF-A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- a@lists.trustedfirmware.org> wrote: > > > > > > > > Hi Raghu > > > > > > > > Please ignore my answer on question 2. > > > > > > > > With internal discussion came to below
conclusion:
> > > > 1. This workaround is very specific during
context
switching. > > > > 2 . If you check in context save routine (el1_sysregs_context_save or el2_sysregs_context_save), > > > > As per proposed solution, First step
performed is to
disable page table walk and we don’t have > > > > any AT instruction execution in context save
routine.
> > > > This ensures that there will be no possibility
of
speculative AT instruction execution without TCR update. > > > > 3. If you check in context restore routine (el1_sysregs_context_restore or el2_sysregs_context_restore), > > > > As per proposed solution, first step
performed is to
disable page table walk and we don’t have any > > > > AT instruction execution in context restore
routine.
> > > > This ensures that there will be no possibility
of
speculative AT instruction execution without TCR update. > > > > > > > > Hence proposed solution will work as it is
ensuring no
caching of translations in TLB while speculative AT instruction
execution.
> > > > > > > > Thanks > > > > Manish Badarkhe > > > > > > > > On 27/04/2020, 13:38, "TF-A on behalf of Manish
Badarkhe
via TF-A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- a@lists.trustedfirmware.org> wrote: > > > > > > > > Hi Raghu > > > > > > > > Please see my answers inline > > > > > > > > On 25/04/2020, 06:38, "TF-A on behalf of
Raghu K via
TF-
A" <tf-a-bounces@lists.trustedfirmware.org on behalf of tf- a@lists.trustedfirmware.org> wrote: > > > > > > > > Hi Manish, > > > > > > > > Before I agree or disagree with the
suggested fix, the
following would > > > > be interesting to know/discuss. Please feel
free to
correct me if i've > > > > misunderstood something. > > > > > > > > 1) Are "speculative" AT instructions subject
to
TCR_ELx
control bits for > > > > all the listed CPU's? I imagine the answer is
yes but
would be good to > > > > get confirmation. I could not find any
evidence in
the
instruction > > > > description or psuedocode in the ARMv8
ARM. It is
possible to play many > > > > tricks on speculative execution of
instructions such
as
skipping checks > > > > and doing them only when the CPU knows
the
instruction will be > > > > committed. If this is the case, changing
TCR_ELx bits
may not work. The > > > > errata document is vague about how to fix it. > > > > > > > > The speculative AT instruction may behave as
you
mentioned. We need more > > > > opinion on this. > > > > Proposed fix I mentioned by referring linux
workaround
for the same errata. > > > > Linux workaround is available in mainline
kernel as
below: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ co mmit/?h= v5.7-
rc3&id=bd227553ad5077f21ddb382dcd910ba46181805a
> > > > > > > > 2) Assuming the answer to question 1 is yes,
your
proposal may not work > > > > as is. In the worst case, as soon as you enter
EL3, the
very first thing > > > > that may happen, before you ever
operate/write to
TCR_ELx, is a > > > > speculative AT instruction that caches a bad
translation
in the TLB's. > > > > The same thing can happen on the exit path.
As
soon as
you restore the > > > > TCR_ELx register, the first thing that can
happen is a
speculative AT > > > > that caches a bad translation. However, the
el3_exit
path does have DSB > > > > before ERET, so we will not speculate to an
AT
instruction if there are > > > > no branches between the instruction that
sets
TCR_ELx
and the ERET. > > > > Somewhere in between, it looks like we will
need a
TLBI NSH to be > > > > certain there are no bad translation cached.
This
obviously has a > > > > potential performance cost on the lower EL's.
Every
entry into EL3 > > > > flushes the TLB for lower EL's. > > > > > > > > Yes, this seems to be valid case during entry
and exit
path.
> > > > I am not quite sure in that case where we need
to
avoid
PTW. > > > > Also "TLBI NSH" works but it may cause
performance
issue. > > > > Need some more opinion/thoughts on this. > > > > > > > > Just thinking, can sequence mentioned for
context
save
does not ensure that > > > > PTW is disabled? > > > > Something as below as last step in ELx(1/2)
context
save
(elaborated more): > > > > > ·Save TCR register with PTW enable
(EPD=0). (Just
to
enable PTW during > > > > > restore context). Do not operate TCR_EL1x
register
here just save its value to restore. > > > > > This ensures that during entry in EL3 there
will be
no
chance of PTW > > > > >. while executing AT instruction. > > > > > > > > Thanks > > > > > > > > Raghu > > > > > > > > Thanks > > > > Manish Badarkhe > > > > > > > > On 4/24/20 2:56 AM, Manish Badarkhe via
TF-A
wrote:
> > > > > > > > > > Hi All > > > > > > > > > > We are trying to implement errata which is
applicable
for below CPUs: > > > > > > > > > > <CPUs> : <Errata No.> > > > > > > > > > > Cortex-A53: 1530924 > > > > > > > > > > Cortex-A76: 1165522 > > > > > Cortex-A72: 1319367 > > > > > Cortex-A57: 1319537 > > > > > Cortex-A55: 1530923 > > > > > > > > > > *Errata Description:* > > > > > > > > > > A speculative Address Translation (AT)
instruction
translates using > > > > > registers that are associated with an out-
of-
context
translation > > > > > regime and caches the resulting
translation in the
TLB.
A subsequent > > > > > translation request that is generated when
the
out-
of-context > > > > > translation regime is current uses the
previous
cached
TLB entry > > > > > producing an incorrect virtual to physical
mapping.
> > > > > > > > > > *Probable solution is to implement below
fix in
context.S file:* > > > > > > > > > > *During ELx (1 or 2) context save:* > > > > > > > > > > ·Operate TCR_ELx(1/2) to disable page
table walk
by
operating EPD bits > > > > > > > > > > oThis will avoid any page table walk for S-
EL1 or S-
EL2.
This will > > > > > help in avoiding caching of translations in
TLB
> > > > > > > > > > for S-EL1/S-EL2 in EL3. > > > > > > > > > > ·Save all system registers (which is already
available)
except TCR > > > > > > > > > > ·Clear EPD bits of TCR and then save. (Just
to
enable
PTW during > > > > > restore context). > > > > > > > > > > *During ELx (1 or 2) context restore:* > > > > > > > > > > * Operate TCR_ELx(1/2) to disable page
table
walk
by operating EPD bits > > > > > * Restore all system registers (which are
saved
during context save) > > > > > except TCR register. > > > > > * Restore TCR_ELx(1/2) register (which
enable
back
PTW). > > > > > > > > > > With above we ensured that there will be
no page
table walk for S-EL1 > > > > > and S-EL2 in EL3. > > > > > > > > > > is this proper other way to fix this problem?
Need
some suggestion/use > > > > > cases where and all we need this
workaround in
TF-A
code. > > > > > > > > > > Thanks > > > > > > > > > > Manish Badarkhe > > > > > > > > > > IMPORTANT NOTICE: The contents of this
and
any attachments are > > > > > confidential and may also be privileged. If
you are
not
the intended > > > > > recipient, please notify the sender
immediately
and
do not disclose > > > > > the contents to any other person, use it
for any
purpose, or store or > > > > > copy the information in any medium.
Thank you.
> > > > > > > > > > > > > -- > > > > TF-A mailing list > > > > TF-A@lists.trustedfirmware.org > > > >
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
> > > > > > > > IMPORTANT NOTICE: The contents of this email
and
any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank
you.
> > > > -- > > > > TF-A mailing list > > > > TF-A@lists.trustedfirmware.org > > > >
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
> > > > > > > > IMPORTANT NOTICE: The contents of this email
and any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank
you.
> > > > -- > > > > TF-A mailing list > > > > TF-A@lists.trustedfirmware.org > > > >
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
> > > > > > > > IMPORTANT NOTICE: The contents of this email and
any
attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank
you.
> > > > > > > > > IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank
you.
> > > > > > > IMPORTANT NOTICE: The contents of this email and any > attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. > -- > TF-A mailing list > TF-A@lists.trustedfirmware.org > https://lists.trustedfirmware.org/mailman/listinfo/tf-a
>
TF-A mailing list TF-A@lists.trustedfirmware.org https://lists.trustedfirmware.org/mailman/listinfo/tf-a
tf-a@lists.trustedfirmware.org