Hello Sandrine,
IMO, the code review process has now evolved from the previous version. I am not too clear on the final outcome.
Does it make sense to add a "lifecycle of a review" section to the initial proposal?
-Varun
-----Original Message-----
From: TF-A <tf-a-bounces(a)lists.trustedfirmware.org> On Behalf Of Sandrine Bailleux via TF-A
Sent: Monday, July 6, 2020 1:53 AM
To: Julius Werner <jwerner(a)chromium.org>
Cc: tf-a <tf-a(a)lists.trustedfirmware.org>
Subject: Re: [TF-A] Announcing some changes around the code review label in Gerrit
External email: Use caution opening links or attachments
Hi Julius and all,
As agreed and announced last week, I've now reintroduced the Code-Review label in addition to the Code-Owner-Review and Maintainer-Review ones.
The Code-Review label is purely informational and won't influence whether a patch is submittable in Gerrit. Anyone should be able to vote on this label, if you face any issue please let me know.
On 7/1/20 11:12 PM, Julius Werner wrote:
>> BTW, any help for the tooling is welcome! If you've got plugin
>> configuration files or scripts we could reuse (with an appropriate
>> license) or even tips on how to best set this up, please feel free to
>> share these.
>
> The Chromium Gerrit has an owners-enforcement system but I'm not
> familiar with how exactly they set it up. I think they're using this
> plugin
> https://gerrit.googlesource.com/plugins/find-owners/+/refs/heads/maste
> r and this is where it's integrated into Gerrit submission rules:
> https://chromium.googlesource.com/chromiumos/+/refs/meta/config/rules.
> pl . It works by having a file called OWNERS in certain directories
> which sets the owners for the subtree below it, so we would have to
> rewrite the current code owner list in that format if we wanted to use
> it.
This "find-owners" plugin sounds promising! Thanks for the pointers.
I notice the "reviewers" plugin [1] is installed on review.tf.org but it does not sound as configurable as "find-owners".
Rewriting the current code owners list to use the OWNERS format does not sound like a big task to me (just a dull one... could be scripted,
though) so this approach definitely sounds worth investigating to me.
[1]
https://review.trustedfirmware.org/plugins/reviewers/Documentation/index.ht…
> There also seems to be this completely separate plugin that claims to
> do roughly the same thing, I have no idea where the difference is
> between the two:
> https://gerrit.googlesource.com/plugins/owners/+/refs/heads/master
I notice the "find-owners" plugin documentation mentions that "this plugin works with Gerrit projects that use Android or Chromium compatible OWNERS files" so maybe they started off from the generic "owners" plugin and customized it for Android/Chromium's needs? Just a guess.
Anyway, worth a look as well, thanks!
[2]
https://gerrit.googlesource.com/plugins/find-owners/+/refs/heads/master/src…
>> One option as you say would be to allow code owners to self-review their
>> patches but I am not convinced we would gain anything out of this. It
>> sounds like a tick-box exercise to me, an admin overhead just to get the
>> patch through the system and I would like to avoid that as much as
>> possible. It is likely that a patch submitter has already self-reviewed
>> his code before posting a patch anyway.
>>
>> The alternative we've been discussing in the team is to call out another
>> reviewer in these situations. I think that there is still value in
>> having a second fresh pair of eyes on a patch. Even if the reviewer has
>> no particular expertise on this specific module, they can still catch
>> potential logic problems or structural issues in the code.
>
> I definitely did not mean to suggest that these patches should not be
> reviewed at all -- I was just talking about the Code-Owner-Review
> label. There would still be a maintainer review, of course, and it
> seems like a good idea to get additional reviews from other people
> too. They just couldn't set the COR+1 bit once we start ACLing it.
Right, I see (and agree with you).
> Basically, each of these bits have their own purpose, and I would see
> the purpose of the COR+1 bit to be that the person most familiar with
> that particular piece of code has made sure that it fits in and
> doesn't cause unexpected problems with certain quirky configurations
> or something like that. That's important when, say, I make a generic
> refactoring that touches a platform I'm unfamiliar with, but if the
> code owner adds to their own platform that's probably already a given
> and they are likely still the best person to judge that, even for
> their own code.
Yes, that makes sense to me.
> That all code gets reviewed by people other than the author I would
> see as a somewhat orthogonal concern that should be checked
> independently. So maybe the rule could be that if the code owner sets
> COR+1 for themselves, there needs to be at least one Code-Review+1 (if
> we reintroduce that label) from another person (who is also not the
> one setting Maintainer-Review+1) to make sure the minimum amount of
> reviews stays the same. Or maybe these should all be completely
> independent checks so every patch needs a Code-Owner-Review+1 (can be
> from the author), a Maintainer-Review+1 (not from the author) and at
> least two Code-Review+1 from people other than the author -- with the
> normal expectation that when a maintainer or code owner reviews a
> patch, they would also set Code-Review+1 (as a general sign that they
> reviewed the patch) in addition to their special meaning flag.
I still don't see the benefit of code owners setting COR+1 for themselves. I would think that a patch owner already reviews his own patch before posting it for review so a self COR+1 is just reinstating the obvious in my eyes.
Your first suggestion (i.e. having at least one CR+1 from another person when a patch author cannot find another code owner to review their
patches) sounds aligned with what I proposed earlier.
Your second suggestion sounds a bit too heavy to me. The fact that people would have to replay their vote on several labels sounds like an admin overhead to me. I guess this would simplify the ACL configuration but I would prefer to avoid this if we can.
So I suggest the following. We could have 2 ways to get a patch approved.
1) For patches that have multiple code owners:
* COR+1 (not from patch owner)
* MR+1 (not from patch owner, not from the person setting COR+1)
2) For patches that a have single code owner:
* CR+1 (not from patch owner)
* MR+1 (not from patch owner, not from the person setting CR+1)
In 2), I like your suggestion of mandating 2 CR+1 but I fear this might be difficult to achieve in practice (simply because we might not have enough reviewers for this).
Note that in some cases, we would need to mix both policies. If a patch modifies several modules, some of them having multiple code owners and others having a single code owner, we would apply one or the other policy for each module.
How does that sound?
Also, 1) assumes we always want different individuals doing the code-owner review and the maintainer review. I personally don't feel strongly about this split and I would be fine with the same person doing both types of review, as they are typically looking at different criteria.
What do others think?
Regards,
Sandrine
--
TF-A mailing list
TF-A(a)lists.trustedfirmware.org
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
Thanks Andrew for raising the issue and Marc and Raghu for your inputs.
The effect of stage 2 when Stage 1 is disabled is something that was overlooked in the workaround implementation. After some discussions this is what we currently understand regarding the problem.
1. EL3 does not modify the EL2 stage 1 translations and hence any speculative execution of AT execution in EL3 and resulting PTW and TLB caching should not be a problem as we always return to the same context in EL2. TF-A in EL3 need not worry about this. Also, we expect that future CPUs having v8.4 extensions for S-EL2 will have already corrected this errata.
2. Whenever the EL1/0 Stage 1 translations are switched in EL3 as part of switching the worlds, it should take Stage 2 into account. Hence disabling the Stage 1 MMU is not effective, as Marc pointed out, because speculative PTW can still take place using Stage 2 identity mapping. After discussion with James, we also realized that flipping the SCR_EL3.NS bit could take Stage 2 out of context (Stage 2 is not valid in secure world) leaving Stage 1 pointing to invalid translations.
As discussed in previous emails, on Entry to EL3, if TCR_EL1.EPDx = 1 and SCTLR_EL3.M = 1, any speculative AT PTWs for EL1 context can be prevented (including Stage 2).
So the fix as we currently understand would involve the following sequence :
a. On Entry to EL3, save the incoming SCTLR_EL1.M and TCR_EL1.EPDx bits and set them (ensure TCR_EL1.EPDx =1 prior to SCTLR_EL1.M =1 using isb())
b. Prior to Exit from EL3, after the target context is restored, restore the SCTLR_EL1.M and TCR_EL1.EPDx bits.
The above sequence now means that any use of AT instruction targeted at lower EL from EL3 that require PTW will fault. So prior to use of AT, ensure the PTW are re-enabled and disabled back again after the AT instructions.
If the above sequence is agreed upon to resolve the errata, then we can work on a patch for the same. I suspect current el1 register save and restore sequence in TF-A is a bit unwieldy and we may need to analyze all the entry points to EL3 to ensure we cover everything.
Best Regards
Soby Mathew
> -----Original Message-----
> From: TF-A <tf-a-bounces(a)lists.trustedfirmware.org> On Behalf Of Andrew
> Scull via TF-A
> Sent: 02 July 2020 09:20
> To: Raghu K <raghu.ncstate(a)icloud.com>
> Cc: android-kvm(a)google.com; willdeacon(a)google.com; Marc Zyngier
> <mzyngier(a)google.com>; tf-a(a)lists.trustedfirmware.org
> Subject: Re: [TF-A] Erroneous speculative AT workaround
>
> On Wed, Jul 01, 2020 at 05:47:00PM -0700, Raghu K wrote:
> > This is interesting. It appears that there is no way on entry to EL3
> > to guarantee that the out-of-context(el2 and el1) translation regimes
> > are in a consistent state and on every entry into EL3, we have to
> > conservatively assume that it is in an inconsistent state. This is
> > because of the situation Andrew mentioned(interrupts to EL3 can occur at any
> time).
> >
> > If this is the case, on EL3 entry:
> > 1) For EL1, we will need to save SCTLR_EL1, set SCTLR_EL1.M = 1,.EPDx
> > = 0
>
> This would still be racing against any potential speculative execution of an AT
> instruction upon the switch to EL3, IIUC. The window would be much smaller
> but not entirely eliminated.
>
> For KVM, this would be enough as KVM will have already applied this
> workaround (with Marc's corrections) whenever it is going to enter an
> inconsistent state. However, other EL2 software may choose to handle the
> errata differently, possibly going to the lengths of ensuring that no AT
> instruction is ever mapped executable.
> --
> TF-A mailing list
> TF-A(a)lists.trustedfirmware.org
> https://lists.trustedfirmware.org/mailman/listinfo/tf-a
This is interesting. It appears that there is no way on entry to EL3 to
guarantee that the out-of-context(el2 and el1) translation regimes are
in a consistent state and on every entry into EL3, we have to
conservatively assume that it is in an inconsistent state. This is
because of the situation Andrew mentioned(interrupts to EL3 can occur at
any time).
If this is the case, on EL3 entry:
1) For EL1, we will need to save SCTLR_EL1, set SCTLR_EL1.M = 1,.EPDx = 0
2) Set whatever bits we need to for EL2 and S2 translations to not
succeed(What are these?)
3) DSB, to ensure no speculative AT can be issued until completion of
DSB, so any AT that occurs will not fill the TLB with bad translations.
On exit(right before ERET), we need to restore the registers saved on
entry, and have the ERET followed by a DSB so that there can be no
speculative execution of AT instructions.
Thanks
Raghu
On 7/1/20 5:54 AM, Marc Zyngier via TF-A wrote:
> Hi Manish,
>
> On Wed, 1 Jul 2020 at 13:14, Manish Badarkhe <Manish.Badarkhe(a)arm.com> wrote:
>> Hi Andrew,
>>
>> As per current implementation, in “el1_sysregs_context_restore” routine do below things:
>>
>> 1. TCR_EL1.EPD0 = 1
>> 2. TCR_EL1.EPD1 = 1
>> 3. SCTR_EL1.M = 0
>> 4. Isb
>> Code snippet:
>> mrs x9, tcr_el1
>> orr x9, x9, #TCR_EPD0_BIT
>> orr x9, x9, #TCR_EPD1_BIT
>> msr tcr_el1, x9
>> mrs x9, sctlr_el1
>> bic x9, x9, #SCTLR_M_BIT
>> msr sctlr_el1, x9
>> isb
>> This is to avoid PTW through while updating system registers at step 5
> Unfortunately, this doesn't prevent anything.
>
> If SCTLR_EL1.M is clear, TCR_EL1.EPDx don't mean much (S1 MMU is
> disabled, no S1 page table walk), and you can still have S2 PTWs
> (using an idmap for S1) and creating TLB corruption if these entry
> alias with any S1 mapping that exists at EL1.
>
> Which is why KVM does *set* SCTLR_EL1.M, which prevents the use of a
> 1:1 mapping at S1, and at which point the TCR_EL1.EPDx bits are
> actually useful in preventing a PTW.
>
>> 5. Restore all system registers for El1 except SCTLR_EL1 and TCR_EL1
>> 6. isb()
>> 7. restore SCTLR_EL1 and TCR_EL1
>> Code Snippet:
>> ldr x9, [x0, #CTX_SCTLR_EL1] -> saved value from "el2_sysregs_context_save"
>> msr sctlr_el1, x9
>> ldr x9, [x0, #CTX_TCR_EL1]
>> msr tcr_el1, x9
>>
>> As per above steps. SCTLR_EL1 get restored back with actual settings at step 7.
>> Similar flow is present for “el2_sysregs_context_restore” to restore SCTLR_EL1 register.
>>
>> In conclusion, this routine temporarily clear M bit of SCTLR_EL1 to avoid speculation but restored it back
>> to its original setting while leaving back to its caller. Please let us know whether this align with KVM
>> workaround for speculative AT erratum.
> It doesn't, unfortunately. I believe this code actively creates
> problems on a system that is affected by speculative AT execution.
>
> I don't understand your rationale for touching SCTLR_EL2.M either if
> you are not context-switching the EL2 S1 state: as far as I understand
> no affected cores have S-EL2, so no switch should happen at this
> stage.
>
> Thanks,
>
> M.
Hi Andrew,
As per current implementation, in “el1_sysregs_context_restore” routine do below things:
1. TCR_EL1.EPD0 = 1
2. TCR_EL1.EPD1 = 1
3. SCTR_EL1.M = 0
4. Isb
Code snippet:
mrs x9, tcr_el1
orr x9, x9, #TCR_EPD0_BIT
orr x9, x9, #TCR_EPD1_BIT
msr tcr_el1, x9
mrs x9, sctlr_el1
bic x9, x9, #SCTLR_M_BIT
msr sctlr_el1, x9
isb
This is to avoid PTW through while updating system registers at step 5
5. Restore all system registers for El1 except SCTLR_EL1 and TCR_EL1
6. isb()
7. restore SCTLR_EL1 and TCR_EL1
Code Snippet:
ldr x9, [x0, #CTX_SCTLR_EL1] -> saved value from "el2_sysregs_context_save"
msr sctlr_el1, x9
ldr x9, [x0, #CTX_TCR_EL1]
msr tcr_el1, x9
As per above steps. SCTLR_EL1 get restored back with actual settings at step 7.
Similar flow is present for “el2_sysregs_context_restore” to restore SCTLR_EL1 register.
In conclusion, this routine temporarily clear M bit of SCTLR_EL1 to avoid speculation but restored it back
to its original setting while leaving back to its caller. Please let us know whether this align with KVM
workaround for speculative AT erratum.
Thanks
Manish Badarkhe
On 01/07/2020, 17:02, "TF-A on behalf of Joanna Farley via TF-A" <tf-a-bounces(a)lists.trustedfirmware.org on behalf of tf-a(a)lists.trustedfirmware.org> wrote:
Thanks Andrew for the notification. We will look into it and get back to this thread.
Thanks
Joanna
On 01/07/2020, 10:16, "TF-A on behalf of Andrew Scull via TF-A" <tf-a-bounces(a)lists.trustedfirmware.org on behalf of tf-a(a)lists.trustedfirmware.org> wrote:
Whilst looking at the speculative AT workaround in KVM, I compared it
against the workaround in TF-A and noticed an inconsistency whereby TF-A
**breaks** KVM's workaround.
In `el1_sysregs_context_restore`, the M bit of SCTRL_EL1 is cleared
however Linux requires this to be set for its workaround to be correct.
If an exception is taken to EL3 partway through a VM context switch,
e.g. a secure interrupt, causing a switch to the secure world, TF-A will
reintroduce the possibility of TLB corruption.
The above explains how it is broken for Linux's chosen workaround
however TF-A will also have to be compatible with whatever workaround
the EL2 software is using.
Starting this thread with the issue identified and we can add more
details as needed.
Thanks
Andrew
--
TF-A mailing list
TF-A(a)lists.trustedfirmware.org
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
--
TF-A mailing list
TF-A(a)lists.trustedfirmware.org
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
+Updated comment for SCTLR_EL2 register restoration.
On 01/07/2020, 17:45, "TF-A on behalf of Manish Badarkhe via TF-A" <tf-a-bounces(a)lists.trustedfirmware.org on behalf of tf-a(a)lists.trustedfirmware.org> wrote:
Hi Andrew,
As per current implementation, in “el1_sysregs_context_restore” routine do below things:
1. TCR_EL1.EPD0 = 1
2. TCR_EL1.EPD1 = 1
3. SCTR_EL1.M = 0
4. Isb
Code snippet:
mrs x9, tcr_el1
orr x9, x9, #TCR_EPD0_BIT
orr x9, x9, #TCR_EPD1_BIT
msr tcr_el1, x9
mrs x9, sctlr_el1
bic x9, x9, #SCTLR_M_BIT
msr sctlr_el1, x9
isb
This is to avoid PTW through while updating system registers at step 5
5. Restore all system registers for El1 except SCTLR_EL1 and TCR_EL1
6. isb()
7. restore SCTLR_EL1 and TCR_EL1
Code Snippet:
ldr x9, [x0, #CTX_SCTLR_EL1] -> saved value from "el2_sysregs_context_save"
msr sctlr_el1, x9
ldr x9, [x0, #CTX_TCR_EL1]
msr tcr_el1, x9
As per above steps. SCTLR_EL1 get restored back with actual settings at step 7.
Similar flow is present for “el2_sysregs_context_restore” to restore SCTLR_EL2 register.
In conclusion, this routine temporarily clear M bit of SCTLR_EL1 to avoid speculation but restored it back
to its original setting while leaving back to its caller. Please let us know whether this align with KVM
workaround for speculative AT erratum.
Thanks
Manish Badarkhe
On 01/07/2020, 17:02, "TF-A on behalf of Joanna Farley via TF-A" <tf-a-bounces(a)lists.trustedfirmware.org on behalf of tf-a(a)lists.trustedfirmware.org> wrote:
Thanks Andrew for the notification. We will look into it and get back to this thread.
Thanks
Joanna
On 01/07/2020, 10:16, "TF-A on behalf of Andrew Scull via TF-A" <tf-a-bounces(a)lists.trustedfirmware.org on behalf of tf-a(a)lists.trustedfirmware.org> wrote:
Whilst looking at the speculative AT workaround in KVM, I compared it
against the workaround in TF-A and noticed an inconsistency whereby TF-A
**breaks** KVM's workaround.
In `el1_sysregs_context_restore`, the M bit of SCTRL_EL1 is cleared
however Linux requires this to be set for its workaround to be correct.
If an exception is taken to EL3 partway through a VM context switch,
e.g. a secure interrupt, causing a switch to the secure world, TF-A will
reintroduce the possibility of TLB corruption.
The above explains how it is broken for Linux's chosen workaround
however TF-A will also have to be compatible with whatever workaround
the EL2 software is using.
Starting this thread with the issue identified and we can add more
details as needed.
Thanks
Andrew
--
TF-A mailing list
TF-A(a)lists.trustedfirmware.org
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
--
TF-A mailing list
TF-A(a)lists.trustedfirmware.org
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
--
TF-A mailing list
TF-A(a)lists.trustedfirmware.org
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
Thanks Andrew for the notification. We will look into it and get back to this thread.
Thanks
Joanna
On 01/07/2020, 10:16, "TF-A on behalf of Andrew Scull via TF-A" <tf-a-bounces(a)lists.trustedfirmware.org on behalf of tf-a(a)lists.trustedfirmware.org> wrote:
Whilst looking at the speculative AT workaround in KVM, I compared it
against the workaround in TF-A and noticed an inconsistency whereby TF-A
**breaks** KVM's workaround.
In `el1_sysregs_context_restore`, the M bit of SCTRL_EL1 is cleared
however Linux requires this to be set for its workaround to be correct.
If an exception is taken to EL3 partway through a VM context switch,
e.g. a secure interrupt, causing a switch to the secure world, TF-A will
reintroduce the possibility of TLB corruption.
The above explains how it is broken for Linux's chosen workaround
however TF-A will also have to be compatible with whatever workaround
the EL2 software is using.
Starting this thread with the issue identified and we can add more
details as needed.
Thanks
Andrew
--
TF-A mailing list
TF-A(a)lists.trustedfirmware.org
https://lists.trustedfirmware.org/mailman/listinfo/tf-a
Whilst looking at the speculative AT workaround in KVM, I compared it
against the workaround in TF-A and noticed an inconsistency whereby TF-A
**breaks** KVM's workaround.
In `el1_sysregs_context_restore`, the M bit of SCTRL_EL1 is cleared
however Linux requires this to be set for its workaround to be correct.
If an exception is taken to EL3 partway through a VM context switch,
e.g. a secure interrupt, causing a switch to the secure world, TF-A will
reintroduce the possibility of TLB corruption.
The above explains how it is broken for Linux's chosen workaround
however TF-A will also have to be compatible with whatever workaround
the EL2 software is using.
Starting this thread with the issue identified and we can add more
details as needed.
Thanks
Andrew
Hi Sandrine,
Sounds like good changes in general! I'm curious what the ACLs are for
Code-Owner-Review? Is it tied to docs/about/maintainers.rst or just
based on the honor system? (I notice that I seem to be able to give a
+1 for code I'm not owning, but maybe that's because I am a
maintainer?) Also, are code owners allowed to +1 themselves (I think
we said we didn't want maintainers to do that, but for code owners I
could see how we might want to allow it since there are usually not
that many)? What do we do when someone uploads the first patch for a
new platform, do they just COR+1 themselves (since there is no code
owner yet)?
I think it might still be useful to retain the existing Code-Review as
a +1/-1 label next to the two new ones, just to allow other interested
parties to record their opinion (without it having any enforcing
effect on the merge). In other Gerrit instances I have used people
often like to give CR+1 as a "I'm not the one who needs to approve
this but I have looked at it and think it's fine" or a CR-1 as a "I
can't stop you from doing this but I still want to be clear that I
don't think it's a good idea". It allows people outside the official
approval process a clearer way to participate and can aid the official
approvers in their decisions (e.g. when I am reviewing a patch as a
maintainer that already has a CR-1 from someone else I know to pay
extra attention to their concerns, and it's more visible than just
some random comment further up in the list). What do you think?
Best regards,
Julius