Orekit provides an automatic filtering feature for data. When the default configuration is
used, this feature is activated with an initial set of predefined filters for data loaded
through a DataProvidersManager instance. It can also be used for explicit loading of
application data.
All filters implement the DataFilter interface and can be registered to a FiltersManager
instance. In the default configuration, three filters are registered to the FiltersManager
contained in the DataProvidersManager:
- GzipFilter, which uncompresses files compressed with the gzip algorithm
- UnixCompressFilter, which uncompresses files compressed with the Unix compress algorithm
- HatanakaCompressFilter, which uncompresses RINEX files compressed with the Hatanaka method

At load time, all filters that can be applied to a set of data will be applied. If for
example a file is both encrypted and compressed (in any order) and filters exist for
uncompression and for deciphering, then both filters will be applied in the right order to
the data retrieved by the DataProvider before it is fed to the DataLoader (or to the parsers
set up by users in explicit loading of application data).
The following class diagram shows the main classes and interfaces involved in this feature.
The filtering principle is based on a stack of DataSource instances. The instance at the
bottom is created either by a DataProvider (when using DataProvidersManager) or manually
(when loading data explicitly), and it reads bytes or characters directly from storage.
Upwards in the stack, one will find instances added by the
FiltersManager.applyRelevantFilters method as needed, each one reading data from the
underlying stack element and providing filtered data to the next element upward.
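The stacking principle can be sketched with a small self-contained model. This is only an illustration under stated assumptions: the Source and Filter types, the suffixFilter helper, the .enc extension and the toy gz(...)/enc(...) encodings are all hypothetical stand-ins, not the Orekit classes or real algorithms. The sketch shows how restarting the loop over the filters each time a layer is added lets two filters be applied in the right order, whichever way the file was encoded.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

/** Simplified stand-in for the filter stack; these are NOT the Orekit classes. */
final class FilterStackSketch {

    /** A named source whose content is produced only when the opener is called. */
    static final class Source {
        final String name;
        final Supplier<String> opener;
        Source(String name, Supplier<String> opener) {
            this.name = name;
            this.opener = opener;
        }
    }

    /** A filter returns either a new source layered on the original, or the original itself. */
    interface Filter {
        Source filter(Source original);
    }

    /** Builds a hypothetical filter that acts only on names ending with the given suffix. */
    static Filter suffixFilter(String suffix, UnaryOperator<String> decoder) {
        return original -> {
            if (!original.name.endsWith(suffix)) {
                return original; // decline: hand back the very same instance
            }
            String stripped = original.name.substring(0, original.name.length() - suffix.length());
            // the new layer reads from the layer below only when it is itself opened
            return new Source(stripped, () -> decoder.apply(original.opener.get()));
        };
    }

    /** Toy decoder: strips a "tag(...)" wrapper, standing in for a real algorithm. */
    static String unwrap(String tag, String content) {
        String prefix = tag + "(";
        if (!content.startsWith(prefix) || !content.endsWith(")")) {
            throw new IllegalStateException("content is not wrapped in " + tag + "(...)");
        }
        return content.substring(prefix.length(), content.length() - 1);
    }

    static final List<Filter> FILTERS = Arrays.asList(
        suffixFilter(".gz",  content -> unwrap("gz", content)),
        suffixFilter(".enc", content -> unwrap("enc", content)));

    /** Restarts the loop over all filters each time one of them adds a layer. */
    static Source applyRelevantFilters(List<Filter> filters, Source raw) {
        Source top = raw;
        boolean layerAdded = true;
        while (layerAdded) {
            layerAdded = false;
            for (Filter filter : filters) {
                Source filtered = filter.filter(top);
                if (filtered != top) {
                    top = filtered;
                    layerAdded = true;
                    break;
                }
            }
        }
        return top;
    }
}
```

In this model, a source named oem.txt.gz.enc holding enc(gz(hello)) and a source named oem.txt.enc.gz holding gz(enc(hello)) both end up with a top-level source named oem.txt that provides hello when opened.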
In the DataProvidersManager case, if at the end the name part of the top-level DataSource
matches the name that the DataLoader instance expects, then the data stream at the top of
the stack is opened. This is where the lazy opening occurs, and it generally ends up with
all the intermediate bytes or characters streams being opened as well. The opened stream is
then passed to the DataLoader to be parsed. If on the other hand the name part of the
DataSource does not match the name that the DataLoader instance expects, then the data
stream is not opened, the full stack is discarded and the next resource/file from the
DataProvider is considered for filtering and loading.
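The way the name check gates the lazy opening can be illustrated with a short, self-contained sketch. It is not Orekit code: the Source class, the open counter, the feedIfSupported helper and the regular expression are assumptions made for illustration, and the two sample names are only there to have one name that matches and one that does not.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.regex.Pattern;

/** Illustrative model of lazy opening; these are NOT the Orekit classes. */
final class LazyOpeningSketch {

    /** Counts how many times storage is actually read. */
    static final AtomicInteger OPEN_COUNT = new AtomicInteger();

    /** A named source whose content is produced only when the opener is called. */
    static final class Source {
        final String name;
        final Supplier<String> opener;
        Source(String name, Supplier<String> opener) {
            this.name = name;
            this.opener = opener;
        }
    }

    /** Hypothetical name pattern a loader might accept (an assumption for this sketch). */
    static final Pattern SUPPORTED = Pattern.compile("[a-z]{3}\\d{4}f10_prd\\.txt");

    /** Creates a bottom-of-stack source that records each access to storage. */
    static Source rawSource(String name, String content) {
        return new Source(name, () -> {
            OPEN_COUNT.incrementAndGet();
            return content;
        });
    }

    /** Opens the source only if its name matches; otherwise discards it unopened. */
    static String feedIfSupported(Source top) {
        return SUPPORTED.matcher(top.name).matches() ? top.opener.get() : null;
    }
}
```

With this model, calling feedIfSupported on a source named tai-utc.dat returns null and leaves the counter untouched, whereas a source named may2019f10_prd.txt is opened exactly once.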
In the explicit loading case, the application can decide on its own to open or discard the
top-level DataSource, or select the appropriate parser based on the source name without
having to bother about extensions like ‘.gz’, as they will already have been handled by
lower-level filters.
One example will explain this method more clearly. Consider a DirectoryCrawler configured to
look into a directory tree containing the files tai-utc.dat and MSAFE/may2019f10_prd.txt.gz.
Consider also one of the default filters, GzipFilter, which uncompresses files with the .gz
extension (the default filters also include UnixCompressFilter and HatanakaCompressFilter,
which are omitted for clarity). Finally, consider MarshallSolarActivityFutureEstimation,
which implements DataLoader and can load files whose names follow the pattern
mmmyyyyf10_prd.txt (among others).
When the tai-utc.dat file is considered by the DirectoryCrawler, a DataSource is created for
it. Then the filters are checked (only one filter shown in the diagram), and all of them
decline to act on the file, so they all return the same DataSource that was created for the
raw file. At the end of the filters loop, the name (which is still tai-utc.dat) is checked
against the pattern expected by the data loader. As it does not match, the stack composed of
only one DataSource is discarded. During all these checks, the file has not been opened at
all; only its name has been considered.
The DirectoryCrawler then considers the next directory, and in this directory the next file,
which is may2019f10_prd.txt.gz. A new DataSource is created for it and the filters are
checked. As the extension is .gz, the GzipFilter considers it can act on the file, so it
creates and returns a new DataSource with its name set to may2019f10_prd.txt (the .gz
extension has been removed) and its lazy stream opener set to insert an uncompressing
algorithm between the raw file bytes stream and the uncompressed bytes stream it will
provide. The loop is restarted, but no other filter applies, so at the end the stack contains
two DataSource instances: the bottom one reads from storage and provides gzip-compressed
data, and the top one reads the gzip-compressed data, uncompresses it and provides
uncompressed data. As the name of the top instance matches the expected pattern for MSAFE
data, the MarshallSolarActivityFutureEstimation will be able to load it. At this stage, the
DirectoryCrawler calls the method to open the bytes stream at the top level of the stack.
This method then asks the underlying DataSource to open its stream (which is the raw file
data), feeds the gzip uncompression algorithm with this data and provides the output
uncompressed data as a newly opened bytes stream. The data loader then parses the data,
without knowing that it is uncompressed on the fly.
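The two-level stack of this walkthrough can be reproduced with the gzip classes of the standard Java library. The sketch below is a simplified model under stated assumptions, not the Orekit implementation: the Opener and Source types and the gzipFilter, gzip, readAll and rawGzipSource helpers are hypothetical names, and the compressed data is built in memory so that no file is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Simplified model of a gzip layer on top of a raw source; NOT the Orekit classes. */
final class GzipLayerSketch {

    /** Opens a byte stream on demand, mirroring the idea of a lazy stream opener. */
    interface Opener {
        InputStream open() throws IOException;
    }

    static final class Source {
        final String name;
        final Opener opener;
        Source(String name, Opener opener) {
            this.name = name;
            this.opener = opener;
        }
    }

    /** Adds an uncompressing layer for ".gz" names, otherwise returns the source unchanged. */
    static Source gzipFilter(Source original) {
        if (!original.name.endsWith(".gz")) {
            return original;
        }
        String stripped = original.name.substring(0, original.name.length() - 3);
        // nothing is opened here: the raw stream is requested only when this layer is opened
        return new Source(stripped, () -> new GZIPInputStream(original.opener.open()));
    }

    /** Compresses text in memory so the sketch needs no file on disk. */
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a bottom-of-stack source providing gzip-compressed bytes. */
    static Source rawGzipSource(String name, String text) {
        byte[] compressed = gzip(text);
        return new Source(name, () -> new ByteArrayInputStream(compressed));
    }

    /** Opens the top of the stack and reads everything it provides. */
    static String readAll(Source top) {
        try (InputStream in = top.opener.open()) {
            ByteArrayOutputStream result = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            for (int n = in.read(buffer); n != -1; n = in.read(buffer)) {
                result.write(buffer, 0, n);
            }
            return new String(result.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Applying gzipFilter to a source created with rawGzipSource("may2019f10_prd.txt.gz", text) yields a source named may2019f10_prd.txt, and readAll on that top-level source returns the original text. The code calling readAll never deals with the compression, which is the role played by the data loader in the walkthrough.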
When loading data explicitly, the application is responsible for setting up the
FiltersManager and calling it. The following example shows how to load CCSDS Orbit Ephemeris
Messages from files selected by users in a graphical interface, where some files may have
been compressed with either gzip or Unix compress, and may have been ciphered on disk as they
contain sensitive information (they may be both compressed and ciphered, in any order):
import java.io.File;
import java.io.FileInputStream;
import org.orekit.data.DataSource;
import org.orekit.data.FiltersManager;
import org.orekit.data.GzipFilter;
import org.orekit.data.UnixCompressFilter;
import org.orekit.files.ccsds.ndm.ParserBuilder;
import org.orekit.files.ccsds.ndm.odm.oem.Oem;

// MyOwnDecipheringFilter, secretKey, dataContext and userInterface are provided by the application

// set up filters manager, this may be done only once at application startup if needed
FiltersManager manager = new FiltersManager();
manager.addFilter(new GzipFilter());
manager.addFilter(new UnixCompressFilter());
manager.addFilter(new MyOwnDecipheringFilter(secretKey));

// set up builder for CCSDS files parsers
ParserBuilder parserBuilder = new ParserBuilder(dataContext);

// parse files
for (final File file : userInterface.getFilesToProcess()) {

    // set up raw data source, which may be compressed and/or ciphered
    DataSource rawSource = new DataSource(file.getName(), () -> new FileInputStream(file));

    // apply relevant filters (this only stacks lazy sources, the file is not opened yet)
    DataSource filteredSource = manager.applyRelevantFilters(rawSource);

    // parse the file, which is uncompressed and deciphered on the fly
    Oem oem = parserBuilder.buildOemParser().parse(filteredSource);

    // process the ephemeris
    ...

}