
Re: [Orekit Users] Test failure




Walter Grossman <w.grossman@ieee.org> wrote:

Thanks for the prompt response. I will do my best. Let me also add that there
was a warning that 2 tests were skipped.

The skipped tests are expected; they correspond to one of the classes
considered experimental as of 9.2.


I cloned the repository using git. The jar is orekit-9.2.jar.

Ubuntu 16.04 LTS
Intel® Core™ i5-3320M CPU @ 2.60GHz × 4
Intel® Ivybridge Mobile
64-bit

openjdk version "1.8.0_171"
OpenJDK Runtime Environment (build 1.8.0_171-8u171-b11-0ubuntu0.16.04.1-b11)
OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)

Maybe I should switch to Oracle Java?

No, most of the Orekit developers use Linux and OpenJDK.

I'll have a quick look at this, but this may be a numerical glitch. Increasing
the tolerance seems fine to me.
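
For instance, a minimal sketch of what relaxing the tolerance could look
like (assuming the assertion you quoted below, with the threshold widened
by one order of magnitude; the exact value is a judgment call):

    Assert.assertEquals(0.687998, covariances.getEntry(6, 6), 1.0e-4);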

best regards,
Luc


On Mon, Jun 4, 2018 at 9:54 AM, MAISONOBE Luc <luc.maisonobe@c-s.fr> wrote:

Hi Walter,

Walter Grossman <w.grossman@ieee.org> wrote:


I am a newbie to Orekit. I ran the tests and got a "near-miss" failure. I
resolved it by relaxing the precision. How do I know if I am OK?



OrbitDeterminationTest.testW3B:384 expected:<0.687998> but was:<0.6880143632396981>

I found this line:

    Assert.assertEquals(0.687998, covariances.getEntry(6, 6), 1.0e-5);


Is the problem that the acceptance criterion is too tight? Why?


The test tolerance is intentionally extremely small, see below
for the rationale for this stringent choice. The test should however
succeed with the current settings. Could you tell us which version
of Orekit you use (development version from the git repository, released
version?) and with which Java environment (OS, JVM version, processor)?

Some tests in Orekit are built in several stages. First, the test is
created without any thresholds and only outputs its results, which the
developer compares with whatever is available to gain confidence in them.
This may be runs of other reference programs if available, another
independent implementation using different algorithms, or a sensitivity
analysis performed with the program under test itself. This validation
phase may be quite long. Once the developers are convinced the
implementation is good, they run the test one last time and register its
output as the reference values, with a stringent threshold, in order to
turn the test into a non-regression test.

The threshold is therefore not an indication that the results are very
good; it is only a way for us to ensure that any change in the code that
affects this part will break the test and force developers to look at this
code again and decide what to do. They can decide that the changes that
broke the test are valid and only altered the results in an acceptable way
(sometimes even improving them), in which case they change either the
reference value or the threshold. Or they can decide that the changes in
fact triggered something unexpected and that they should improve their new
code so the test passes again without changing it. In summary: thresholds
for non-regression tests are small so that they act as a fuse; people
notice when the fuse blows and can then make a decision.
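
As an illustration only (the class and helper names below are
hypothetical, not the actual Orekit test code), this two-stage pattern
could be sketched in JUnit as:

    import org.junit.Assert;
    import org.junit.Test;

    public class EstimationNonRegressionTest {

        // Hypothetical computation standing in for a real orbit
        // determination run.
        private double runEstimation() {
            return 0.687998;
        }

        // Stage 1 (validation): no threshold yet, the test only prints
        // its result so the developer can compare it against reference
        // programs, independent implementations, or sensitivity analyses.
        @Test
        public void validateEstimation() {
            System.out.println("estimated value: " + runEstimation());
        }

        // Stage 2 (non-regression): the validated output is registered
        // as the reference value with a stringent threshold; any later
        // change affecting this code path blows the fuse and breaks
        // the test.
        @Test
        public void testEstimationNonRegression() {
            Assert.assertEquals(0.687998, runEstimation(), 1.0e-5);
        }
    }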

best regards,
Luc