Re: [Orekit Developers] Work In Progress for thread safety

On Fri, Apr 13, 2012 at 9:54 AM, Luc Maisonobe <Luc.Maisonobe@c-s.fr> wrote:

Le 12/04/2012 23:37, Thomas Neidhart a écrit :

> After debugging into it, I found two problems:
>
> - in the TimeStampedCache, the quantum is stored as an int
> but it can lead to over/under-flows when doing the following
> comparison in selectSlot (and maybe somewhere else):
>
> if (slots.isEmpty() ||
> slots.get(index).getEarliestQuantum() > dateQuantum +
> NEW_RANGE_FACTOR * neighborsSize ||
> slots.get(index).getLatestQuantum() < dateQuantum -
> NEW_RANGE_FACTOR * neighborsSize) {
>
> the result is, that more slots as necessary are created.
> For consistency, all uses of a quantum should be long instead of int

Good catch!
We should be cautious here. Simply changing the types to long may simply
move the problem without solving it. The quantum step in the cache
constructor is computed as halfSpan / Integer.MAX_VALUE. Blindly
changing int to long would lead to change this statement to halfSpan /
Long.MAX_VALUE. After this change, the earliest and latest dates would
still be of the order of magnitude of min and max values for a long, so
the overflow would still occur!

So I would suggest to both use long and at the same time use a fixed
quantum set. Using 1 microsecond as the quantum step seems reasonable
for separating entries even for very fast pace dynamics (when dealing
with attitude and transient modes), and allows a signed long to support
a range from t0 - 292271 years to t0 + 292271 years.

If we use a fixed quantum step, we don't need anymore to limit the
earliest/latest dates to finite values as explained in the generator
interface javadoc. This would also be a nice improvement as this
limitation is quite awkward.

Yes, I totally agree. In my local fix I have just changed the data type of the dateQuantum and kept the quantum step untouched.

> - UTCScale$Generator: the generator defines its earliest and latest
> date somewhere in the range of 1960 - 2012. When providing dates
> that are outside this range, a new slot is created, which is not
> necessary, as there is no more offset data available.
>
> I would propose to add a method like this to the generator, and call
> it instead of cache.getNeighbors directly:
>
> private UTCTAIOffset[] getNeighbors(final AbsoluteDate date) {
> AbsoluteDate corrected = date;
> final AbsoluteDate latest = cache.getLatest().getDate();
> final AbsoluteDate earliest = cache.getEarliest().getDate();
> if (date.compareTo(latest) > 0) {
> corrected = latest;
> }
> if (date.compareTo(earliest) < 0) {
> corrected = earliest;
> }
> return cache.getNeighbors(corrected);
> }
>
> The idea is to limit the date to the available range of the cache.

I understand your rationale. Shouldn't this however be handled at cache
level as a general feature ? There is such a protection in the Slot
constructor for example, it could perhaps be moved a cache level.

Actually, debugging the Cache was not so easy, and there were lots of side-effects from AbsoluteDate.toString in a debugging view, as it uses the UTC scale to display a date (and this is evaluated for any watches, affecting the cache ;-( ).
So I would suggest to implement a fail fast strategy: whenever there is something unexpected -> throw an exception

In the case of the range: imho, the cache should only provide neighbors in the range the generator can provide. Any requests outside that range should result in an exception.
Specific uses of the cache have to take make sure that the cache is called properly, as the assumption for the UTCScale may not be true for other use-cases I guess?

> - There should be a possibility to limit the calls to
> Generator#generate. In the case of the UTCScale, there is no use in
> calling it multiple times, as it will not provide additional data,
> but there is an unnecessary overhead involved (i.e. checking the
> result).

I'm not sure about that. The case for UTC where we know one call
generates everything is quite specific. In ephemerides cases the number
of calls is also limited by the files contents, but we avoid reading all
files at first, so we don't know beforehand when we will have exhausted
the files (and we often stop computation before exhausting them as they
cover very large ranges). In analytical model cases (like
precession-nutation), there is no limit to the number of calls.

One way could be to say that a limit count equal to zero or less than
zero means no limits.

>
> I have done a first test to reduce the overhead by returning an empty
> list after the first call to the generator, which works, but there
> are surely better solutions:
>
> private static final List<UTCTAIOffset> EMPTY =
> new ArrayList<UTCTAIOffset>();
> private boolean generated = false;
>
> public List<UTCTAIOffset> generate(final UTCTAIOffset existing,
> final AbsoluteDate date) {
> // everything has been generated at construction
> List<UTCTAIOffset> reply = generated ? EMPTY : offsets;
> generated = true;
> return reply;
> }

If we decide to set a limit on the number of calls, it would be trivial
to implement at cache level, as there is already a counter for calls.
Also in the case of UTC, we limit generation because we know nothing
more can be added, but in other cases this could be used as a safety
net. So in the first case, no error should be triggered if the limit is
reached, but in the other case an exception should be thrown.

Another possibility would be to improve the checks the cache does. It
already knows the earliest/latest date and should not generate more than
that. I tried to do that, but obviously failed. There is a dirty trick
in the UTC case because there are regular entries between 1960 and 2012
(for now), and also two entries at infinity, in the past and in the
future. Perhaps these two entries mess up the cache.

Yes, I would favor something like that, my initial fix was just quick and dirty, but a real solution should limit the call to generate itself.
I did not spend much time digging into the logic of the cache where he decides to generate new entries, but I will try to do so this evening.

> For the first two issues, I can provide a patch, the third one needs to
> be discussed more in detail I guess.
>
> btw. I know its early work in progress :-), but having more fine-grained
> control over the cache settings would be good, i.e. having specific
> default values for each cache, e.g. UTCScale cache instead of having
> default values for all caches. If you agree, I can also provide patches
> here.

Yes, please do it. Also if you think we can add more control methods in
the same spirit as the various getters for generation calls, evictions,
number of slots and so on, do not hesitate to add them.

ok I will do so.