I measured the psa_call() roundtrip time with NO NS Gate in the psa_call() path, zero invecs and zero outvecs, using the GCC compiler with -O3 optimization level, and one secure partition. Our TF-M code base is a snapshot of the TF-M repo from January 18, 2020.
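For reference, a minimal sketch of how the NS side of this measurement can be structured (assumptions: psa_call() takes the FF-M 1.0 type argument, the handle comes from psa_connect(), and MY_SP_MSG_TYPE_NULL and secureTSP are the ones shared with the SP handler shown below):

    #include "psa/client.h"
    /* plus the device's CMSIS header for DWT/CoreDebug */

    extern volatile uint32_t *secureTSP;   /* timestamp written by the SP handler */

    void measure_roundtrip(psa_handle_t handle)
    {
        uint32_t start, end;

        CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;   /* enable trace unit */
        DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;              /* start cycle counter */

        start = DWT->CYCCNT;
        psa_call(handle, MY_SP_MSG_TYPE_NULL, NULL, 0, NULL, 0);
        end = DWT->CYCCNT;

        /* *secureTSP - start : NS psa_call() to the SP handler (entry path)   */
        /* end - *secureTSP   : psa_reply() back to the NS caller (return path) */
        /* end - start        : full roundtrip                                  */
    }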
The psa_call() handler code in the SP is:
    switch (msg.type) {
    case PSA_IPC_CONNECT:
        if (inuse) {
            r = PSA_ERROR_CONNECTION_REFUSED;
        } else {
            inuse = 1;
            myClientId = msg.client_id;
            r = PSA_SUCCESS;
        }
        psa_reply(msg.handle, r);
        break;
    case PSA_IPC_DISCONNECT:
        assert(inuse == 1);
        inuse = 0;
        myClientId = 0;
        psa_reply(msg.handle, PSA_SUCCESS);
        break;
    case MY_SP_MSG_TYPE_NULL:   /* used to measure round trip time */
        *secureTSP = DWT->CYCCNT;
        psa_reply(msg.handle, PSA_SUCCESS);
        break;
    }
    return;
On a platform with zero wait states, the time from the NS psa_call() to the SP msg handler's "case MY_SP_MSG_TYPE_NULL:" statement is 2280 CPU cycles, and the return path to the NS caller is 1088 CPU cycles.
The total psa_call() roundtrip time is therefore 3368 CPU cycles.
Alan
From: Reinhard Keil [mailto:Reinhard.Keil@arm.com] Sent: Wednesday, April 8, 2020 11:51 PM To: Shreve, Erik; DeMars, Alan Cc: tf-m@lists.trustedfirmware.org; nd Subject: [EXTERNAL] RE: [TF-M] Multi-threaded single-scheduler model proposal
Erik,
I believe we should measure the timing behaviour and document it first before coming to conclusions.
Personally I have not reviewed the IPC mode. I was just reviewing the SFN (aka Library mode) and measured the current implementation (actually it was an RC3+patches version). My results and assessment are here: https://lists.trustedfirmware.org/pipermail/tf-m/2020-March/000805.html - as expected for v1, it has "room for improvement".
Have you already done similar tests with the IPC model?
Do you have timing measurements of the HW crypto accelerator operations? From the STM32L5 data sheet we get 410 cycles as the maximum time for an AES decrypt operation with a 256-bit key (most operations seem to take less than 100 cycles). At the STM32L5's 110 MHz maximum clock, 410 cycles is roughly 3.7 usec.
The fact that crypto is time consuming is not new for system designers using a single-core processor. As said, the solution in today's applications is:
* Crypto runs at "Normal" priority
* Time-critical execution runs at "High" priority - these threads can preempt execution of "Normal" priority threads.
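A minimal sketch of that arrangement with CMSIS-RTOS2 (the thread bodies crypto_thread/control_thread are placeholders, not an existing API):

    #include "cmsis_os2.h"

    extern void crypto_thread(void *argument);    /* runs the crypto service calls */
    extern void control_thread(void *argument);   /* time-critical work */

    void start_threads(void)
    {
        /* Crypto work runs at "Normal" priority... */
        const osThreadAttr_t crypto_attr = {
            .name = "crypto", .priority = osPriorityNormal
        };
        osThreadNew(crypto_thread, NULL, &crypto_attr);

        /* ...so the time-critical thread at "High" priority preempts it. */
        const osThreadAttr_t control_attr = {
            .name = "control", .priority = osPriorityHigh
        };
        osThreadNew(control_thread, NULL, &control_attr);
    }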
Have a happy Easter and stay healthy!
Reinhard
From: Shreve, Erik e-shreve@ti.com Sent: Tuesday, April 7, 2020 3:24 PM To: Reinhard Keil Reinhard.Keil@arm.com; DeMars, Alan ademars@ti.com Cc: tf-m@lists.trustedfirmware.org; nd nd@arm.com Subject: RE: [TF-M] Multi-threaded single-scheduler model proposal
Reinhard,
I'm happy to engage in discussion over email; I didn't mean to give any other impression. Also, I appreciate your feedback on the proposal.
Elegance is in the eye of the beholder, but that said, I think the IPC model does have an elegance to it. However, its elegance imposes limitations that complicate the entirety of _some_ systems - leading to less overall elegance in _those_ systems. The IPC model isn't bad or inappropriate; it's a great solution. It just doesn't cover everything (what could?). Further, if a middle ground is needed but no 'elegant' solution exists, then a solution a bit less than elegant may be required. Also, no debate that the IPC model is not much different from today's practice of running mbedTLS in a dedicated thread, but that is not how we see crypto used in our ecosystem.
There are many industrial and automotive use cases where determinism is required. Further, both of these markets are seeing an uptick in security interest and even regulatory pressure. And these markets don't like change: change implies reopening products to costly verification and validation activities, and risk. I can't share particulars on this email list, but I can say that any system supporting multiple concurrent connections, where different connections have different priorities and at least some have deterministic response requirements, will have difficulty adopting the IPC model. (Of course there are other ways to solve this, such as an increased clock rate or duplicated HW accelerators, but these increase die cost and can hurt power performance.) Keep in mind that not all connections are TLS connections over the internet where a few milliseconds would be noise.
But based on your statement "Maybe there are other solutions than extending the RTOS kernel," I'm wondering if I've not communicated the idea well enough. The RTOS kernels would only be extended by calling the tz_context APIs during task switching - something that is already occurring in the IPC model; see the sketch below. (You mention that using the tz_context APIs is "tricky"; can you elaborate on this?) There is then a TF-M-to-RTOS layer allowing secure code to signal semaphore and mutex usage to the RTOS; the RTOS kernel itself has no changes for this layer. Thus I see the impact to the RTOS kernels as minimal. And since the RTOS retains the same control over when tasks run, there is minimal (if any) impact to application code.
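For concreteness, a sketch of the only kernel extension involved - the CMSIS tz_context calls in the scheduler (the thread structure and hook names here are hypothetical; TZ_AllocModuleContext_S/TZ_StoreContext_S/TZ_LoadContext_S are the standard CMSIS tz_context API):

    #include "tz_context.h"

    typedef struct thread {
        TZ_MemoryId_t tz_memory;   /* 0 if the thread never calls secure code */
        /* ... other kernel bookkeeping ... */
    } thread_t;

    /* At creation time, a thread that will call secure services gets a
       secure-side context allocated for it. */
    void thread_create_hook(thread_t *t, TZ_ModuleId_t module)
    {
        t->tz_memory = TZ_AllocModuleContext_S(module);   /* 0 on failure */
    }

    /* On every context switch, save/restore the secure stack context. */
    void context_switch_hook(thread_t *prev, thread_t *next)
    {
        if (prev->tz_memory != 0U) {
            TZ_StoreContext_S(prev->tz_memory);
        }
        if (next->tz_memory != 0U) {
            TZ_LoadContext_S(next->tz_memory);
        }
    }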
I think one of your concerns is that adoption of TF-M may be negatively impacted if difficult integration with each RTOS is required. Yes? If so, I share your concern about adoption, but I think the bigger hindrance is not the RTOS integration but application-level integration. After all, there are far fewer RTOSes than applications in the world. Likely our different experiences are leading us to different conclusions here despite a shared concern. But providing a few well-selected options can maximize adoption for the effort. At the end of the Tech Forum meeting someone suggested a model like this could replace (or upgrade) the Library model.
Regarding HW run times, the specifics really are in cycle counts if we want to compare them with the cost of task switching. Many ECC (and certainly RSA/DSA/DH) math operations take much longer (millions of cycles) than the time to switch a task.
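To put rough numbers on that comparison (illustrative figures, not measurements): at 100 MHz, a 2-million-cycle ECC operation takes 20 ms, while a task switch of a few hundred cycles costs a few microseconds - well under 0.1% of the operation.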
Finally, regarding minimum viable product (MVP): I understand the purpose of an MVP to be either a platform for gaining feedback toward a product launch, or a minimum product that is useful to early adopters. Either way, it seems that with the release of TF-M 1.0 this has been achieved. So the next steps are to incorporate feedback and grow the market of the product. Growing the market use of TF-M is what I seek to do with this proposal.
Again, appreciate the discussion.
Erik Shreve, PSEM Software Security Engineer & Architect (CMCU Platform Development)
From: Reinhard Keil [mailto:Reinhard.Keil@arm.com] Sent: Friday, April 03, 2020 7:26 AM To: Shreve, Erik; DeMars, Alan Cc: tf-m@lists.trustedfirmware.org; nd Subject: [EXTERNAL] RE: [TF-M] Multi-threaded single-scheduler model proposal
Erik, Alan,
Sorry, I did not have enough time to participate in the whole meeting yesterday, so I kicked off some discussion beforehand. It's really great to see your engagement here.
Let me summarize:
TF-M should work with many different RTOSes, as the various CSPs currently have preferences (Azure=ThreadX, AWS=FreeRTOS, etc.). To make it easy to work with this diverse ecosystem, we should aim for simplicity in the TF-M/RTOS interaction. Also, tz_context slows down overall thread switching and is tricky to use.
I agree with you that we will need a middle ground. As you say, the qualifier in "No impact to time deterministic execution on the NS side unless two threads call secure services" is the issue. However, I cannot see an elegant solution today.
My question is: what use cases do you see where two different threads need to call secure services? In such use cases, is timing a critical factor?
Alan raised: "In a real world use case we are faced with, the process kicked off by the secure service may take a long time (i.e. several ms)". This is correct, but does it really make sense to schedule CPU execution? HW accelerator math operations take around 1 usec each, while a thread switch itself costs on the order of microseconds, so scheduling around them is not economical. IMHO it is no different from today's implementation where, for example, mbedTLS runs in a thread.
For time critical applications you would solve that problem with thread priorities, where:
* Crypto is in "Normal" priority
* Time critical execution is "High" priority
I believe we should focus first on a minimum viable product and then analyze the real-world problems that come with it. Maybe there are other solutions than extending the RTOS kernel.
Have a good weekend. Reinhard
From: Shreve, Erik <e-shreve@ti.com> Sent: Thursday, April 2, 2020 4:55 PM To: DeMars, Alan <ademars@ti.com>; Reinhard Keil <Reinhard.Keil@arm.com>; tf-m@lists.trustedfirmware.org Cc: nd <nd@arm.com> Subject: RE: [TF-M] Multi-threaded single-scheduler model proposal
I see the discussion of today's tech forum agenda has begun ahead of time. :)
Individual public key operations can take milliseconds even when accelerated with HW. Further, the HW accelerators operate as math coprocessors, with a series of math operations stitched together by SW.
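To illustrate the "stitched together by SW" point (everything here is hypothetical - ec_point_double/ec_point_add and the HW primitives behind them stand in for a vendor driver, not a real API):

    #include <stdint.h>

    typedef struct { uint32_t x[8], y[8], z[8]; } ec_point_t;  /* 256-bit curve */

    /* Each of these wraps a handful of short HW modular-math operations. */
    extern void ec_point_set_infinity(ec_point_t *p);
    extern void ec_point_double(ec_point_t *r, const ec_point_t *p);
    extern void ec_point_add(ec_point_t *r, const ec_point_t *a, const ec_point_t *b);

    /* Double-and-add scalar multiplication: 256 doublings plus up to 256
       additions, each a dozen or so HW calls - thousands of SW-stitched
       HW operations for a single public key operation. */
    void ec_scalar_mult(ec_point_t *r, const uint32_t k[8], const ec_point_t *p)
    {
        ec_point_set_infinity(r);
        for (int i = 255; i >= 0; i--) {
            ec_point_double(r, r);
            if ((k[i / 32] >> (i % 32)) & 1U) {
                ec_point_add(r, r, p);
            }
        }
    }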
No doubt existing models in TF-M have the benefit of simplicity for the secure code analysis. However, their simplicity complicates the scheduling of the non-trusted code. The qualifier in this statement "No impact to time deterministic execution on the NS side unless two threads call secure services" is the issue. I believe we need a middle ground to drive additional adoption.
Erik Shreve, PSEM Software Security Engineer & Architect (CMCU Platform Development)
From: TF-M [mailto:tf-m-bounces@lists.trustedfirmware.org] On Behalf Of DeMars, Alan via TF-M Sent: Thursday, April 02, 2020 9:47 AM To: Reinhard Keil; tf-m@lists.trustedfirmware.org Cc: nd Subject: [EXTERNAL] Re: [TF-M] Multi-threaded single-scheduler model proposal
I used a crypto accelerator as a hypothetical example. In a real world use case we are faced with, the process kicked off by the secure service may take a long time (i.e. several ms). It is not acceptable to be parked in WFE during that time.
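For readers unfamiliar with the pattern, "parked in WFE" means something like the following (operation_complete is a hypothetical flag set from an interrupt handler; __WFE() is the CMSIS intrinsic):

    volatile int operation_complete;

    while (!operation_complete) {
        __WFE();   /* core sleeps here; only an event or interrupt wakes it */
    }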
Alan
From: Reinhard Keil [mailto:Reinhard.Keil@arm.com] Sent: Thursday, April 2, 2020 6:52 AM To: DeMars, Alan; tf-m@lists.trustedfirmware.org Cc: nd Subject: [EXTERNAL] RE: [TF-M] Multi-threaded single-scheduler model proposal
Alan,
"I was afraid that this was the proposal. No lower priority NS threads can run while waiting for the secure interrupt. Only higher priority threads that are initiated by a NS interrupt can run."
You are correct, scheduling of lower priority NS threads would not be possible. This is definitely a shortcoming of the solution.
May I ask: how long does a hardware crypto operation take? How much time could be used for low priority NS thread execution?
Reinhard