Hi Sudeep,
On Mon, Dec 9, 2019 at 10:40 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
On Mon, Dec 09, 2019 at 10:14:25PM +0530, Sandeep Tripathy wrote:
Hi Sudeep, On Mon, Dec 9, 2019 at 9:07 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
On Mon, Dec 09, 2019 at 08:36:18PM +0530, Sandeep Tripathy wrote:
Hi Sudeep, I am being very specific about the core caches. The app/driver, or any entity for that matter, is updating cacheable coherent memory range(s).
Are we referring to OS apps/drivers or to something on the other masters? Please be as specific as possible.
Linux applications and drivers running on an Arm cluster, sharing DDR over CCN with other masters. I'm not sure the specifics will make it any clearer; for example, a Broadcom SmartNIC or any accelerator on PCIe. I think the details will only distract the discussion toward platform specifics, whereas the issue is not platform specific.
Sure. Now I feel it has nothing to do with an external master, just the Linux OS (slave) and its logging application. Please shout if that's not the case. The producer is the OS while the consumer is an external master/slave.
Yes in this example.
Assume an application is logging some data to the said coherent, cacheable DDR; on a reboot/shutdown notification it will stop logging.
The application has to terminate cleanly when SIGTERM is sent (maybe using an appropriate handler), and it can intimate the same to the consumers so that they can consume the data before it is lost.
The DDR is never powered off in this scenario, so when and how to consume the log is up to the (consumer) application design. Assume it is an incrementing log, i.e. after reboot this (producer) master will continue to dump more records onto it. How would you suggest handling this? In this case both producer and consumer deliberately asked for coherent memory, so why should they also have to consider possible data loss just because the platform does not give the coherency guarantee, merely because it would add some time to flush the core caches? If they had been given a non-cached (coherent) memory range, they would not have to do anything, isn't it?
That's exactly why I kept mentioning notification. It need not be the generic shutdown one; it can be from the logging producer to the logging consumers.
If the CPU caches are not flushed, the log is lost. The caches may be small, but the buffer can be huge for a range flush.
If both producers and consumers are aware that logging has stopped due to shutdown or reboot, we need not worry about caches here.
On a shutdown/reboot notification, what ought they to do?
If on the same slave OS, then stop all ongoing transactions; there are hooks to do that. Simply put, it can map to the remove-device calls. And the other masters also have notifications to do the same.
Done: all transactions stopped, but the data still lies in the local caches.
If we ask them each to do their respective range flushes (cache maintenance of coherent memory), that is less generic than having the core infrastructure give the coherency guarantee.
Sure, but I need clarity on the above to answer this.
(1) OS: especially something more than just halting the secondary (other) cores, or maybe cpuhp of the secondary (other) cores, etc.; (2) TF-A: PSCI shutdown/reset/reset2 doing a graceful power-down to take care of the initiating core. Maybe this can be put under a flag to choose between a faster reboot and a graceful power-down sequence, if at all it qualifies as generic?
I think we have had enough discussion so far to tell that this is not a generic requirement.
Thanks for the quick responses. Looks like we finally have to agree to disagree.
Yes, I definitely disagree with the attempted approach, both in the OS and in TF-A, to solve the issue as you have described it (or rather, how I have understood it).
[...]
And what's done in those masters with *this particular* notification ?
Anything, but preferably not cache maintenance.
And what does that *anything* include in your case? Please be specific, with an example. I am still failing to understand the role of this fancy master who wants its slave to take care of cache maintenance for it.
"Anything" can also be nothing, or, as you mentioned above, it can just stop further transactions; but it is agnostic of the underlying caches.
OK, but with more info, I feel that involving caches in your problem is itself incorrect.
Negative.
Why can't it stop snooping (or request firmware to stop snooping) into the caches that belong to, or are maintained by, the other slave (OS)?
Sure: the respective clusters are to be taken out of the snoop domain by firmware as part of the PSCI platform-specific hooks.
OK
But what about their caches? I don't think there is a pull mechanism by which the CCN or other interconnects can voluntarily pull the dirty lines out of the exiting cores' caches as an alternative to having them flushed by the respective cores.
OK. So why are there dirty lines in the local caches? Why can't the driver/device take action on shutdown?
Since the driver/app is working on coherent memory, by convention it can assume no explicit maintenance is needed.
Yes, but they (the consumers in your case) need to be aware of the start and stop of the producer.
That is a valid constraint, but we should not force the application to do coherency management when it deliberately asked for coherent memory.
-- Regards, Sudeep
Thanks Sandeep