As explained on the filtering page, the FiltersManager.applyRelevantFilters method works by looping over all registered DataFilter instances and calling their filter method with the current DataSource. If the filter method returns the exact same instance that was passed to it, the filter does not act on that source. In this case, the FiltersManager.applyRelevantFilters method just continues its loop and checks the next filter. If the filter method returns a DataSource instance different from the one that was passed to it, the filter does act on the data stream. In this case, the FiltersManager.applyRelevantFilters method sets the current DataSource to the returned value and restarts its loop from the beginning.
This algorithm allows the same filter to be applied several times if needed, and it also allows the filters to be applied in any order, regardless of the order in which they were registered to the FiltersManager.
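The restart-on-change loop described above can be sketched as follows. This is a simplified stand-in and not the Orekit source code: NamedSource, FilterLoopSketch and suffixFilter are hypothetical names introduced for illustration, and only names are tracked, not data streams.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Stand-in for a data source: only the name matters in this sketch. */
final class NamedSource {
    final String name;
    NamedSource(final String name) { this.name = name; }
}

/** Hypothetical re-creation of the restart-on-change loop (not Orekit code). */
final class FilterLoopSketch {

    /** Build a stand-in filter that strips a suffix, or returns its argument untouched. */
    static UnaryOperator<NamedSource> suffixFilter(final String suffix) {
        return s -> s.name.endsWith(suffix)
                    ? new NamedSource(s.name.substring(0, s.name.length() - suffix.length()))
                    : s;
    }

    /** Apply filters until a full pass leaves the source unchanged. */
    static NamedSource applyRelevantFilters(final List<UnaryOperator<NamedSource>> filters,
                                            final NamedSource original) {
        NamedSource current = original;
        boolean changed = true;
        while (changed) {
            changed = false;
            for (final UnaryOperator<NamedSource> filter : filters) {
                final NamedSource filtered = filter.apply(current);
                if (filtered != current) {
                    // identity comparison: a new instance means the filter acted
                    current = filtered;
                    changed = true;
                    break; // restart from the first registered filter
                }
            }
        }
        return current;
    }

    public static void main(final String[] args) {
        final List<UnaryOperator<NamedSource>> filters = new ArrayList<>();
        filters.add(suffixFilter(".gz"));
        filters.add(suffixFilter(".xor"));
        // the .gz filter is applied twice, around the .xor filter
        System.out.println(applyRelevantFilters(filters, new NamedSource("data.txt.gz.xor.gz")).name);
    }
}
```

With the name data.txt.gz.xor.gz, the gzip stand-in is applied twice and the XOR stand-in once, whatever the registration order, which is the behavior the restart is meant to provide.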
Users may benefit from this general feature to add their own filters. One example could be a deciphering algorithm if sensitive data should be stored enciphered and should be deciphered on the fly when data is loaded.
Given the way the applyRelevantFilters method works, the filter method must check the DataSource passed to it and either return that same parameter if it should not filter it, or return a new DataSource if it should filter it.
There is one important caveat to understand when implementing a custom filter: filters must never open the DataSource by themselves, regardless of whether they return the original instance or a filtered instance when their filter method is called. The rationale is that it is the upper layer that decides whether or not to open the returned value, and that a DataSource can be opened only once; this is the core principle of the lazy opening provided by DataSource.
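The lazy-opening rule can be illustrated with a small self-contained sketch. LazySource, its Opener interface and the OPEN_COUNT counter are hypothetical stand-ins written for this illustration; they only mimic the idea and are not the Orekit DataSource API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a lazily opened source (not the Orekit DataSource class). */
final class LazySource {

    /** Functional interface mimicking an opener that is meant to be called once. */
    interface Opener {
        InputStream openOnce() throws IOException;
    }

    final String name;
    final Opener opener;

    LazySource(final String name, final Opener opener) {
        this.name   = name;
        this.opener = opener;
    }
}

final class LazyFilterSketch {

    /** Counts how many times a raw stream was really opened. */
    static final AtomicInteger OPEN_COUNT = new AtomicInteger();

    /** Build a raw source whose opener records each opening. */
    static LazySource rawSource(final String name, final byte[] content) {
        return new LazySource(name, () -> {
            OPEN_COUNT.incrementAndGet();
            return new ByteArrayInputStream(content);
        });
    }

    /** Decide from the name only; the original opener is captured, not called. */
    static LazySource filter(final LazySource original) {
        if (!original.name.endsWith(".xor")) {
            return original;
        }
        final String fName = original.name.substring(0, original.name.length() - ".xor".length());
        return new LazySource(fName, () -> original.opener.openOnce());
    }

    public static void main(final String[] args) throws IOException {
        final LazySource filtered = filter(rawSource("data.txt.xor", new byte[] { 1, 2, 3 }));
        System.out.println(OPEN_COUNT.get()); // 0: filtering opened nothing
        try (InputStream in = filtered.opener.openOnce()) {
            System.out.println(OPEN_COUNT.get()); // 1: opened on demand by the caller
        }
    }
}
```

The filter only wraps the original opener in a lambda, so the underlying stream is opened at most once, and only if the caller decides to open the filtered source.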
A consequence of this caveat is that a filter cannot peek at the first few bytes of the data stream referenced by a DataSource, for example in an attempt to look for a magic number in a header. This is why, for example, the GzipFilter looks for a .gz suffix in the name and does not look for the 0x1f8b magic number at the start of the file.
As applyRelevantFilters restarts its loop from the beginning each time a filter is added to the stack, some care must be taken to avoid stacking an infinite number of instances of the same filter on top of each other. This means that the filtered DataSource returned after filtering should be recognized as already filtered and not matched again by the same filter. If the check is based on file name extensions (like .gz for gzip-compressed files), then if the original DataSource has a name of the form base.ext.gz, the filtered file should have a name of the form base.ext. Another point is that if a filter does not act on a DataSource, then it must return the same instance that was passed to it. It must not simply create a transparent filter that just passes names and data stream unchanged, otherwise it would be considered a valid filter and added again and again until either a stack overflow or a memory exhaustion error occurs.
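The two pitfalls above can be made visible with a bounded experiment. The sketch below uses hypothetical stand-ins (Src, StackingSketch, stackDepth) rather than Orekit classes, and caps the number of iterations so that the faulty filters show up as hitting the limit instead of looping forever.

```java
import java.util.function.UnaryOperator;

/** Stand-in source carrying only a name (hypothetical, not an Orekit class). */
final class Src {
    final String name;
    Src(final String name) { this.name = name; }
}

final class StackingSketch {

    /** Correct: strips ".gz", so its own output no longer matches. */
    static final UnaryOperator<Src> STRIP_GZ = s ->
        s.name.endsWith(".gz") ? new Src(s.name.substring(0, s.name.length() - 3)) : s;

    /** Wrong: keeps the ".gz" name, so the output matches again and again. */
    static final UnaryOperator<Src> KEEP_NAME = s ->
        s.name.endsWith(".gz") ? new Src(s.name) : s;

    /** Wrong: a "transparent" wrapper is a new instance even when nothing is filtered. */
    static final UnaryOperator<Src> TRANSPARENT = s -> new Src(s.name);

    /** Count how often a filter gets stacked, giving up at the given limit. */
    static int stackDepth(final UnaryOperator<Src> filter, final Src source, final int limit) {
        Src current = source;
        int depth = 0;
        while (depth < limit) {
            final Src filtered = filter.apply(current);
            if (filtered == current) {
                return depth; // same instance: the loop would stop here
            }
            current = filtered;
            depth++;
        }
        return limit; // never settled: an unbounded loop would not terminate
    }

    public static void main(final String[] args) {
        final Src source = new Src("base.ext.gz");
        System.out.println(stackDepth(STRIP_GZ, source, 1000));    // 1
        System.out.println(stackDepth(KEEP_NAME, source, 1000));   // 1000
        System.out.println(stackDepth(TRANSPARENT, source, 1000)); // 1000
    }
}
```

Only the filter that both renames its output and returns the unchanged instance when it does not apply settles after one application.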
The filtering part itself is implemented by opening the data stream from the underlying original DataSource, reading raw data from it, processing this data (uncompressing, deciphering, …) and returning the result as another stream.
The following example shows how to do that for a dummy deciphering algorithm based on a simple XOR (this is a toy example only, not intended to be secure at all).
    import java.io.IOException;
    import java.io.InputStream;

    import org.orekit.data.DataFilter;
    import org.orekit.data.DataSource;

    public class XorFilter implements DataFilter {

        /** Suffix for XOR ciphered files. */
        private static final String SUFFIX = ".xor";

        /** Highly secret key. */
        private static final int KEY = 0x3b;

        /** {@inheritDoc} */
        @Override
        public DataSource filter(final DataSource original) {
            final String            oName   = original.getName();
            final DataSource.Opener oOpener = original.getOpener();
            if (oName.endsWith(SUFFIX)) {
                // strip the suffix so the filtered source is not matched again
                final String fName = oName.substring(0, oName.length() - SUFFIX.length());
                // the lambda defers opening: the original source is not opened here
                final DataSource.StreamOpener fOpener =
                    () -> new XORInputStream(oName, oOpener.openStreamOnce());
                return new DataSource(fName, fOpener);
            } else {
                // not a ciphered file: return the very same instance
                return original;
            }
        }

        /** Filtering of XOR ciphered stream. */
        private static class XORInputStream extends InputStream {

            /** File name. */
            private final String name;

            /** Underlying ciphered stream. */
            private final InputStream input;

            /** Indicator for end of input. */
            private boolean endOfInput;

            /** Simple constructor.
             * @param name file name
             * @param input underlying ciphered stream
             */
            XORInputStream(final String name, final InputStream input) {
                this.name       = name;
                this.input      = input;
                this.endOfInput = false;
            }

            /** {@inheritDoc} */
            @Override
            public int read() throws IOException {
                if (endOfInput) {
                    // we have reached end of data
                    return -1;
                }
                final int raw = input.read();
                if (raw < 0) {
                    endOfInput = true;
                    return -1;
                } else {
                    return raw ^ KEY;
                }
            }

            /** {@inheritDoc} */
            @Override
            public void close() throws IOException {
                // release the underlying stream as well
                input.close();
            }

        }

    }
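The stream logic of the example above can be checked in isolation, without Orekit on the classpath. The following sketch is an assumption-laden test harness, not part of the library: XorRoundTrip, xor and readAll are hypothetical names, and the data is an in-memory byte array. It relies on the fact that XOR with a fixed key is its own inverse, so applying the wrapper twice should give back the clear text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class XorRoundTrip {

    /** Same toy key as in the example above. */
    static final int KEY = 0x3b;

    /** Wrap a stream so that every byte read is XORed with the key. */
    static InputStream xor(final InputStream input) {
        return new InputStream() {
            @Override
            public int read() throws IOException {
                final int raw = input.read();
                return raw < 0 ? -1 : raw ^ KEY;
            }
            @Override
            public void close() throws IOException {
                input.close();
            }
        };
    }

    /** Read a stream to its end, byte by byte. */
    static byte[] readAll(final InputStream in) throws IOException {
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int b = in.read(); b >= 0; b = in.read()) {
            out.write(b);
        }
        return out.toByteArray();
    }

    public static void main(final String[] args) throws IOException {
        final byte[] clear      = "EOP data".getBytes(StandardCharsets.UTF_8);
        final byte[] ciphered   = readAll(xor(new ByteArrayInputStream(clear)));
        final byte[] deciphered = readAll(xor(new ByteArrayInputStream(ciphered)));
        System.out.println(new String(deciphered, StandardCharsets.UTF_8)); // prints "EOP data"
    }
}
```

The same wrapper can be used to produce enciphered test files for the filter, since enciphering and deciphering are the same operation for this toy algorithm.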