Hi Sudeep,
On Mon, Dec 9, 2019 at 10:40 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
On Mon, Dec 09, 2019 at 10:14:25PM +0530, Sandeep Tripathy wrote:
Hi Sudeep, On Mon, Dec 9, 2019 at 9:07 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
On Mon, Dec 09, 2019 at 08:36:18PM +0530, Sandeep Tripathy wrote:
Hi Sudeep, I am being very specific about the core caches. The app/driver, or any entity for that matter, is updating cacheable coherent memory range(s).
Are we referring to OS apps/drivers or to something on the other masters? Please be as specific as possible.
Linux applications and drivers running on an Arm cluster, sharing DDR over CCN with other masters. I'm not sure the specifics will make it any clearer; for example, a Broadcom SmartNIC or any accelerator on PCIe. I think the details will only distract the discussion toward platform specifics, whereas the issue is not platform specific.
Sure. Now I feel it has nothing to do with an external master, just the Linux OS (slave) and its logging application. Please shout if that's not the case. The producer is the OS while the consumer is an external master/slave.
Yes in this example.
Assume an application is logging some data to the said coherent, cacheable DDR; on a reboot/shutdown notification it will stop logging.
The application has to terminate cleanly when SIGTERM is sent (maybe using an appropriate handler), and it can intimate the same to the consumers so that they can consume the data before it is lost.
The DDR is never powered off in this scenario, so when and how to consume the log is up to the (consumer) application design. Assume it is an incrementing log, i.e. after reboot this (producer) master will continue to dump more records onto it. How would you suggest handling this? In this case both producer and consumer deliberately asked for coherent memory, so why should they also have to consider possible data loss just because the platform does not give the coherency guarantee, merely because it would add some time to flush the core caches? If they had been given a non-cached (coherent) memory range, they would not have to do anything, isn't it?
That's exactly why I kept mentioning notification. It need not be the generic shutdown one; it can be from the logging producer to the logging consumers.
If the CPU caches are not flushed, the log is lost. The caches may be small, but the buffer can be huge for a range flush.
If both producers and consumers are aware that logging has stopped due to shutdown or reboot, we need not worry about caches here.
On a shutdown/reboot notification, what ought they to do?
If on the same slave OS, then stop all ongoing transactions; there are hooks to do that. Simply put, it can map to the remove-device calls. And the other masters also have notifications to do the same.
Done: all transactions stopped, but the data still lies in the local caches.
If we ask them each to do their respective range flushes (cache maintenance of coherent memory), that is less generic than having the core infrastructure give the coherency guarantee.
Sure, but I need clarity on the above to answer this.
(1) OS: especially something more than just halting the secondary (other) cores, or maybe cpuhp of the secondary (other) cores, etc.; (2) TF-A: PSCI shutdown/reset/reset2 doing a graceful power-down to take care of the initiating core. Maybe this can be put under a flag to choose between a faster reboot and a graceful power-down sequence, if at all it qualifies as generic?
I think we have had enough discussion so far to tell that this is not a generic requirement.
Thanks for the quick responses. Looks like we finally have to agree to disagree.
Yes, I definitely disagree with the attempted approach, both in the OS and in TF-A, to solve the issue as you have described it (or rather, how I have understood it).
[...]
And what's done in those masters with *this particular* notification ?
Anything, but preferably not cache maintenance.
And what does that *anything* include in your case? Please be specific, with an example. I am still failing to understand the role of this fancy master who wants its slave to take care of cache maintenance for it.
"Anything" can also be nothing, or, as you mentioned above, it can just stop further transactions; but it is agnostic of the underlying caches.
OK, but with more info, I feel that involving caches in your problem is itself incorrect.
Negative.
Why can't it stop snooping (or request firmware to stop snooping) into the caches that belong to, or are maintained by, the other slave (OS)?
Sure: the respective clusters are to be taken out of the snoop domain by firmware as part of the PSCI platform-specific hooks.
OK
But what about their caches? I don't think there is a pull mechanism by which the CCN or other interconnects can voluntarily pull the dirty lines out of the exiting cores' caches as an alternative to having them flushed by the respective cores.
OK. So why are there dirty lines in the local caches? Why can't the driver/device take action on shutdown?
Since the driver/app is working on coherent memory, by convention it can assume no explicit maintenance is needed.
Yes, but they (the consumers in your case) need to be aware of the start and stop of the producer.
That is a valid constraint, but we should not force the application to do coherency management when it deliberately asked for coherent memory.
-- Regards, Sudeep
Thanks Sandeep