On Tue, Dec 3, 2019 at 4:58 PM Sandeep Tripathy via TF-A
<tf-a(a)lists.trustedfirmware.org> wrote:
>
> Hi,
> The system shutdown and system reset PSCI calls are invoked by the last
> core in the system. The PSCI library does not flush caches for that last
> core/cluster and does not follow the core/cluster power-down sequence.
> This may cause issues if the system is not standalone, i.e. if it is a
> slave or node in a bigger system with other coherent masters/PEs.
>
I am not sure if system off/reset is the right API to call in such a setup.
> Please clarify whether the PSCI spec expects 'shutdown/reset/reset2' to
> deliberately skip the core/cluster shutdown sequence.
>
Yes, and IIRC this has been discussed in the past [1]. I expected to get
some closure on the open questions in that thread. Quite a few questions
were raised by a few people and I am not sure they were all answered. I
need to dig, but AFAIK they were not.
--
Regards,
Sudeep
[1] https://lkml.org/lkml/2019/1/18/16
Hi,
In TF-A, I usually see the code below in the PSCI suspend or off code path:
/* Prevent interrupts from spuriously waking up this cpu */
plat_arm_gic_cpuif_disable();
But to my understanding, before calling psci_suspend(), the normal world
(e.g. the Linux kernel) has already disabled all interrupts at the CPU
level, so is preventing interrupts here meant to guard against interrupts
from the secure world?
Another question: for Cortex-A55, is this still necessary? The Cortex-A55
TRM says that when the CORE_PWRDN_EN bit is set, executing WFI automatically
masks all interrupts and wake-up events in the core. Am I right?
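For reference, here is a minimal sketch of the ordering I am asking about,
roughly as it appears on the ARM reference platforms (the wrapper function
name is made up and the exact call site differs per platform):

#include <arch_helpers.h>
#include <plat/arm/common/plat_arm.h>

/*
 * Illustrative sketch only. The secure-side GIC CPU interface is disabled
 * before the core enters WFI, independently of whatever the normal world
 * (e.g. Linux) has already masked at its own exception level.
 */
static void example_pwr_domain_suspend_tail(void)
{
	/* Prevent interrupts (including those targeting the secure world)
	 * from spuriously waking up this cpu. */
	plat_arm_gic_cpuif_disable();

	/* ... platform power controller programming would go here ... */

	/* Enter the low-power state. On Cortex-A55 the
	 * CPUPWRCTLR_EL1.CORE_PWRDN_EN bit must have been set beforehand
	 * by the CPU library's power-down handler. */
	wfi();
}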
Thanks in advance
Hi Soby,
Thanks for your response.
>>it is needed to ensure the ordering of the succeeding sev().
Agree. Thanks for the clarification.
>>Was this an issue that actually manifested on hardware or is this
something that you caught while reviewing the code?
Noticed it while reviewing the code; I have not observed it on hardware.
Thanks
-Raghu
On November 26, 2019 at 8:55 AM, Soby Mathew <Soby.Mathew(a)arm.com> wrote:
On 26/11/2019 16:30, Raghupathy Krishnamurthy via TF-A wrote:
Hello!
Reposting this from (https://developer.trustedfirmware.org/T589).
bakery_lock_get() uses a dmbld() after lock acquisition, which is insufficient for a lock acquire: with just dmbld(), stores in the critical section can be reordered before the dmbld(), and indeed before the lock acquisition has taken place. Similarly, bakery_lock_release() only uses dmbst(). A load in the critical section could be reordered after the dmbst() and after the write to the lock data structure that releases the lock. This is likely less of a problem, but lock release needs to provide release semantics, and dmbst() is insufficient. For example, a load in the critical section on CPU1 can be reordered after the store for the lock release, and it could read from a store executed on CPU2 in the same critical section, since CPU2 saw the lock-release store first and raced into the critical section.
Hi Raghu,
You are right on this. The dmbld() and dmbst() do not provide
sufficient guarantees in the cases you mention.
Was this an issue that actually manifested on hardware or is this
something that you caught while reviewing the code?
Also, the dsb() after the write to the lock seems unnecessary. Am I missing something here? It looks like the same issue is present in bakery_lock_normal as well.
If you are referring to the dsb() at this line:
https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/lib/locks/…
it is needed to ensure the ordering of the succeeding sev().
Best Regards
Soby Mathew
Thanks
Raghu
Hi Sandeep
> -----Original Message-----
> From: TF-A <tf-a-bounces(a)lists.trustedfirmware.org> On Behalf Of Sandeep
> Tripathy via TF-A
> Sent: 03 December 2019 16:59
>
> Hi,
> The system shutdown and system reset PSCI calls are invoked by the last core in
> the system.
That may be the case in practice but these functions can be called by any core in the system at any time, not just the last core.
> The PSCI library does not flush caches for the last core/cluster and
> does not follow the core/cluster power-down sequence.
Correct. Not only that, it doesn't do anything with the other cores in the system that may be running at that time.
> This may cause issues if the system is not standalone, i.e. if the
> system is a slave or node in a bigger system with other coherent
> masters/PEs.
>
What issue specifically? Are the other coherent masters expecting to have the same view of the node's memory that the node had prior to calling system off/suspend? If so, then the node is not really independent of the wider system, and system off/reset are probably not the right calls to use. Perhaps SYSTEM_SUSPEND is what you're looking for? That function requires all other cores to be off before the call and will perform cache maintenance on the last core.
> Please clarify whether the PSCI spec expects 'shutdown/reset/reset2' to
> deliberately skip the core/cluster shutdown sequence.
>
Yes, the spec was deliberately relaxed between versions 0.2 and 1.0 to allow implementations to just "pull the plug", since there would be no observable difference to the normal world caller between this behaviour and attempting to shut down cleanly, which would be difficult and inherently racy anyway.
> Otherwise, if it makes sense, the following is a patch to take care of this.
> https://review.trustedfirmware.org/#/c/TF-A/trusted-firmware-a/+/2364/
> issue: https://developer.trustedfirmware.org/T566
Thanks but I don't think we want this.
Regards
Dan.
Hi,
The system shutdown and system reset PSCI calls are invoked by the last
core in the system. The PSCI library does not flush caches for that last
core/cluster and does not follow the core/cluster power-down sequence.
This may cause issues if the system is not standalone, i.e. if it is a
slave or node in a bigger system with other coherent masters/PEs.
Please clarify whether the PSCI spec expects 'shutdown/reset/reset2' to
deliberately skip the core/cluster shutdown sequence.
Otherwise, if it makes sense, the following is a patch to take care of this.
https://review.trustedfirmware.org/#/c/TF-A/trusted-firmware-a/+/2364/
issue: https://developer.trustedfirmware.org/T566
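For illustration only (this is not the patch itself; see the Gerrit link
above), the kind of last-core clean-up being proposed looks roughly like
the sketch below, using generic TF-A helpers. The GIC call is specific to
ARM platforms and the function name is made up:

#include <arch.h>
#include <arch_helpers.h>
#include <plat/arm/common/plat_arm.h>

/* Rough sketch of a last-core power-down sequence before system off. */
static void last_core_prepare_for_system_off(void)
{
	/* Stop pending interrupts from interfering with the sequence. */
	plat_arm_gic_cpuif_disable();

	/* Turn off the EL3 data cache ... */
	write_sctlr_el3(read_sctlr_el3() & ~SCTLR_C_BIT);
	isb();

	/* ... then clean and invalidate the caches by set/way so that
	 * other coherent masters see a consistent view of memory. */
	dcsw_op_all(DCCISW);
}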
Thanks
Sandeep
Hello Sumit,
Sorry for not getting back to you earlier on this. I started looking at
your patches and although I've not completely got my head around them
yet, I've got some early comments and questions. I've tried to classify
them by themes below.
First of all, let me say that I'm no security expert and I did not know
about the concept of authenticated encryption/decryption before looking
at your patches. I spent a couple of hours reading about such algorithms
in general and AES-GCM in particular. Most of what I've learned so far
is based on my understanding of RFC 5116 [1].
What's the motivation for choosing GCM to start with? I've seen that it
is free of patent [2], which I am guessing was a strong argument for it.
I've also read that it is supposed to be quite lightweight and can take
full advantage of parallel processing, although I've not looked into the
details. Were these the reasons? Any other reasons?
Key management
--------------
fip_file_read() retrieves the key from the platform and stores it in a
buffer on the stack. I don't see any code wiping it out of memory once
we're done with it. Did I miss it? Unlike the root of trust public key,
this is a (symmetric) secret key so it is sensitive data that we must
not leave up for grabs, even if the stack is in Trusted RAM and even
though it's likely to be overwritten by subsequent stack usage.
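For instance, something along these lines right after the key has been
consumed (a sketch only; the wrapper and buffer names are made up, and
zeromem() is the existing helper from lib/utils):

#include <lib/utils.h>

/*
 * Sketch: scrub the symmetric key from the stack once the image has been
 * decrypted. 'key_buf' stands for whatever local buffer fip_file_read()
 * uses; it is not the actual variable name in the patch.
 */
static int decrypt_with_scrubbed_key(void)
{
	unsigned char key_buf[32];
	int ret;

	/* ... fetch the key into key_buf and run the AES-GCM decryption ... */
	ret = 0;

	/*
	 * Wipe the secret before returning. zeromem() is an out-of-line
	 * TF-A helper, so it is less likely than a plain memset() to be
	 * optimised away as a dead store.
	 */
	zeromem(key_buf, sizeof(key_buf));

	return ret;
}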
Also, I am still trying to get my head around how this would integrate
with a cryptographic engine where the key does not leave the chip. I can
imagine that we could get the address of the encrypted firmware image
from the FIP, pass that to a cryptographic engine, request it to decrypt
it and store the result somewhere in Trusted RAM. In this case, we
wouldn't call plat_get_fip_encryption_key(). Do you have any idea how we
would pull this off, i.e. how the different modules (IO layer, crypto
module, image parser module, ...) would integrate together?
I have some concerns around the generation of the initialization vectors
in the encrypt_fw tool. Right now, IVs are simply a random sequence of
bytes (obtained through a call to OpenSSL's RAND_bytes() API). Now, I
would imagine that RAND_bytes() is typically based on a good random
number generator and thus will generate different sequences every time
it is called. At least, as long as it is called from the same machine
every time. But what if we encrypt a new FIP bundle from a different
machine, say in the context of a firmware update? Is it not possible
that it might choose the same IV out of bad luck?
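Concretely, I assume the tool does something close to the sketch below (not
the exact patch code; names are illustrative). The question is how unlikely
a repeated IV really is when FIP bundles for the same key are produced on
different machines over time:

#include <stdio.h>
#include <openssl/rand.h>

#define GCM_IV_LEN	12	/* 96-bit nonce, the usual AES-GCM choice */

/* Sketch of purely random IV generation, with no device- or
 * machine-unique component mixed in. */
static int generate_gcm_iv(unsigned char iv[GCM_IV_LEN])
{
	/* RAND_bytes() returns 1 on success. */
	if (RAND_bytes(iv, GCM_IV_LEN) != 1) {
		fprintf(stderr, "Failed to generate a random IV\n");
		return -1;
	}
	return 0;
}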
Perhaps that's an issue left to provisioning/manufacturing time and is
out of the scope here. But it worries me because AFAIU, the security of
AES-GCM is critically undermined if the same nonce is used multiple
times with the same key (see section 5.1.1. "Nonce reuse" in RFC 5116).
If the encryption key is the SSK (rather than the BSSK) then I guess the
probability is even higher, as it is shared amongst a class of devices.
Impact on memory footprint and performance
------------------------------------------
Do you know what the performance impact is when this feature is enabled
in TF-A, to decrypt images at boot time? Obviously it depends on the
platform and whether there is a dedicated cryptographic engine, and I
suppose you cannot really get any relevant measurements out of QEMU but
I would be interested if you've got any rough numbers.
And what's the memory footprint impact? IIUC, AES-GCM barely inflates
the data it encrypts: the size of the ciphertext is that of the
plaintext plus the size of the authentication tag. So I guess there's no
real impact on flash storage and Trusted RAM
usage to hold decrypted firmware. But what about the mbedTLS primitives
to decrypt the images? How much code and data does this add?
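For reference, I assume the firmware-side decryption boils down to the
standard mbedTLS GCM sequence, something like the sketch below (not the
actual crypto module code; names and error handling are simplified), i.e.
it pulls in the AES and GCM modules on top of what trusted boot already
needs:

#include <mbedtls/gcm.h>

/*
 * Sketch of an AES-GCM authenticated decryption with mbedTLS. Buffer and
 * length names are illustrative.
 */
static int gcm_decrypt(const unsigned char *key, unsigned int key_bits,
		       const unsigned char *iv, size_t iv_len,
		       const unsigned char *tag, size_t tag_len,
		       const unsigned char *in, unsigned char *out,
		       size_t len)
{
	mbedtls_gcm_context ctx;
	int rc;

	mbedtls_gcm_init(&ctx);
	rc = mbedtls_gcm_setkey(&ctx, MBEDTLS_CIPHER_ID_AES, key, key_bits);
	if (rc == 0) {
		/* Verifies the tag and decrypts in one call; fails if the
		 * ciphertext or additional data was tampered with. */
		rc = mbedtls_gcm_auth_decrypt(&ctx, len, iv, iv_len,
					      NULL, 0, /* no additional data */
					      tag, tag_len, in, out);
	}
	mbedtls_gcm_free(&ctx);
	return rc;
}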
encrypt_fw tool
---------------
We have some floating ideas around re-implementing the tools (fiptool,
certtool) in a scripting language (possibly python) in the future and
also doing a better job at sharing a common description of the list of
images to boot/authenticate between the firmware and the host tools. But
we're not there yet, so I agree that implementing this new tool in C
from the same "mold" as fiptool and certtool is what makes the most
sense today. It's just another tool we will have to rework if and when
we get there.
I did not understand why this new tool needs to know what image it is
encrypting. For example, one possible invocation could be:
tools/encrypt_fw/encrypt_fw \
-k 1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef \
--soc-fw bl31.bin \
--soc-fw-enc bl31_enc.bin \
--tos-fw bl32.bin \
--tos-fw-enc bl32_enc.bin
Why not invoke the tool once per image instead? As in:
encrypt_fw -k key -in ifile -out ofile
for BL31, then for BL32? Does the tool do anything different based on
the type of image it receives?
Regards,
Sandrine
[1] https://tools.ietf.org/html/rfc5116
[2]
https://csrc.nist.gov/csrc/media/projects/block-cipher-techniques/documents…
Hi All,
The build system of TF-A has become complex and loaded with technical debt over the years, and it's time to do something about it.
We have made some plans and done some prototyping work to move to a CMake-based solution, and we would like to get feedback on the idea.
Why CMake?
In summary, CMake is a mature tool with wide acceptance in C and C++ projects.
It also has the benefit of decreasing fragmentation in the developer community if we sync up with TF-M.
How will it happen?
This will be a slow process where the old build system will co-exist for a period with the new one. How long that period will be is an open question.
For a more detailed summary please see https://developer.trustedfirmware.org/w/tf_a/cmake-buildsystem-proposal/
The design discussion will follow the design review proposal process of TF.org, as described on this page:
https://ci.trustedfirmware.org/job/tf-m-build-test-nightly/lastSuccessfulBu…
You can find the patch for capturing the design decisions and discussion here: https://review.trustedfirmware.org/#/c/TF-A/trusted-firmware-a/+/2662/
If you would like to contribute or have an opinion or any ideas please reply to this email or add a comment on Gerrit (link above).
Regards,
Balint
On 26/11/2019 16:30, Raghupathy Krishnamurthy via TF-A wrote:
> Hello!
>
> Reposting this from (https://developer.trustedfirmware.org/T589).
>
> bakery_lock_get() uses a dmbld() after lock acquisition, which is insufficient for a lock acquire: with just dmbld(), stores in the critical section can be reordered before the dmbld(), and indeed before the lock acquisition has taken place. Similarly, bakery_lock_release() only uses dmbst(). A load in the critical section could be reordered after the dmbst() and after the write to the lock data structure that releases the lock. This is likely less of a problem, but lock release needs to provide release semantics, and dmbst() is insufficient. For example, a load in the critical section on CPU1 can be reordered after the store for the lock release, and it could read from a store executed on CPU2 in the same critical section, since CPU2 saw the lock-release store first and raced into the critical section.
Hi Raghu,
You are right on this. The dmbld() and dmbst() do not provide
sufficient guarantees in the cases you mention.
Was this an issue that actually manifested on hardware or is this
something that you caught while reviewing the code?
> Also, the dsb() after the write to the lock seems unnecessary. Am I missing something here? It looks like the same issue is present in bakery_lock_normal as well.
>
If you are referring to the dsb() at this line:
https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/tree/lib/locks/…
it is needed to ensure the ordering of the succeeding sev().
Best Regards
Soby Mathew
> Thanks
> Raghu
>
Hello!
Reposting this from (https://developer.trustedfirmware.org/T589).
bakery_lock_get() uses a dmbld() after lock acquisition, which is insufficient for a lock acquire: with just dmbld(), stores in the critical section can be reordered before the dmbld(), and indeed before the lock acquisition has taken place. Similarly, bakery_lock_release() only uses dmbst(). A load in the critical section could be reordered after the dmbst() and after the write to the lock data structure that releases the lock. This is likely less of a problem, but lock release needs to provide release semantics, and dmbst() is insufficient. For example, a load in the critical section on CPU1 can be reordered after the store for the lock release, and it could read from a store executed on CPU2 in the same critical section, since CPU2 saw the lock-release store first and raced into the critical section. Also, the dsb() after the write to the lock seems unnecessary. Am I missing something here? It looks like the same issue is present in bakery_lock_normal as well.
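To make the point concrete, one possible fix with the existing barrier
helpers would be roughly the following (a sketch only, assuming the
dmbish() wrapper from arch_helpers.h; it is not a tested patch and the
helper function names are made up):

#include <arch_helpers.h>

/*
 * Sketch: use a full inner-shareable DMB on both sides so that the lock
 * acquisition orders all loads and stores of the critical section after
 * it, and the release orders them before the store that drops the lock.
 */
static inline void bakery_lock_acquire_barrier(void)
{
	/* Previously dmbld(): loads-only, so critical-section stores
	 * could float above the lock acquisition. */
	dmbish();
}

static inline void bakery_lock_release_barrier(void)
{
	/* Previously dmbst(): stores-only, so critical-section loads
	 * could sink below the lock release. */
	dmbish();
}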
Thanks
Raghu