
Enhance the CachingLayer #904

Open
nimo23 opened this issue Oct 24, 2022 · 11 comments
Labels
enhancement Issue describing or discussing an enhancement for this library

Comments

@nimo23
Contributor

nimo23 commented Oct 24, 2022

@nimo23 nimo23 added the enhancement Issue describing or discussing an enhancement for this library label Oct 24, 2022
@nimo23 nimo23 changed the title Improve CachingLayer Enhance the CachingLayer Oct 24, 2022
@dazre

dazre commented Oct 25, 2022

IMHO, the CachingLayer in its current implementation does not solve the problem of re-calculating indicators. Within a single process, an indicator with a given set of parameters is calculated only once, which is good. In practice, however, optimization means sweeping over indicator parameters, which forces constant recalculation. Likewise, during research we run our strategies over and over again, and each time the indicators are recalculated from scratch.

On the other hand, when a strategy runs as a trading robot, caching is largely redundant because for most indicators only the latest value matters.

So, in my opinion, caching could be improved by persisting indicator calculation results to disk. Ideally one would use memcached, Redis, MongoDB or Hazelcast for storage, but even plain writes to disk would significantly increase the effectiveness of the cache.
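A minimal sketch of that idea, assuming plain Java serialization and a results map keyed by bar index; the class name DiskResultStore is illustrative only, and a real setup would more likely use Redis or Hazelcast as suggested above:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: persists already-calculated indicator values between runs so a
// repeated backtest or optimization sweep can reload them instead of recalculating.
final class DiskResultStore {

    @SuppressWarnings("unchecked")
    static Map<Integer, Double> load(File file) throws IOException, ClassNotFoundException {
        if (!file.exists()) {
            return new HashMap<>(); // nothing persisted yet: start with an empty result map
        }
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (Map<Integer, Double>) in.readObject();
        }
    }

    static void save(File file, Map<Integer, Double> results) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(new HashMap<>(results)); // HashMap is Serializable
        }
    }
}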

@nimo23
Contributor Author

nimo23 commented Oct 25, 2022

Would it make sense to just remove the cache from the project entirely?

Why cache at all when most users never actually benefit from it, because:

  • For backtesting: They only calculate the values once.
  • For trading: They are only interested in the latest value.

Am I missing some other broad use cases that would justify the "caching layer" in ta4j?

If someone wants to save or cache the values somewhere, they can do that themselves; the project doesn't need a cache for any indicator calculation.

I think we should take two steps:

  1. Remove the CachedIndicator completely.
  2. (Optionally) add a new caching layer (if really required) that uses composition instead of inheritance (a sketch follows below).

Step 1 can be done immediately.
Step 2 can be done (if really required) in any future version of ta4j.
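For illustration only, a minimal sketch of what step 2 could look like, assuming nothing about ta4j internals: the indicator has a cache (composition) instead of being a CachedIndicator (inheritance). IndexCache and MyIndicator are hypothetical names, not existing ta4j classes.

import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical cache object that any indicator can hold as a field.
final class IndexCache<T> {
    private final Map<Integer, T> results = new HashMap<>();

    T getOrCompute(int index, IntFunction<T> calculation) {
        // calculate once per index, then reuse the stored result
        return results.computeIfAbsent(index, calculation::apply);
    }
}

// Hypothetical indicator delegating to its cache instead of extending CachedIndicator.
final class MyIndicator {
    private final IndexCache<Double> cache = new IndexCache<>();

    double getValue(int index) {
        return cache.getOrCompute(index, this::calculate);
    }

    private Double calculate(int index) {
        return 0.0; // placeholder for the actual indicator formula
    }
}

Note that a recursive indicator (one whose calculate reads earlier indices via getValue) would need a cache that tolerates reentrant lookups, which is the job RecursiveCachedIndicator handles today.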

@dazre

dazre commented Oct 25, 2022

  1. Remove the CachedIndicator completely.

As far as I can see, the caching layer is used by many indicators, and I cannot estimate what performance impact removing CachedIndicator would have.

@nimo23
Contributor Author

nimo23 commented Oct 25, 2022

Yes, and we also need it for its subclass RecursiveCachedIndicator.

@team172011
Member

team172011 commented Oct 25, 2022

Why cache at all when most users never actually benefit from it

As far as I can see, the caching layer is used by many indicators, and I cannot estimate what performance impact removing CachedIndicator would have

The cache is very important for a lot of use cases.

  • A basic feature of ta4j is to combine Indicators with other Indicators (for example Bollinger Bands may use an EMAIndicator that is also used by an MACDIndicator).
    • So for each index the Indicator#getValue method will be called several times.
  • You can also have several Strategies and/or Rules that are based on a lot of Indicators and those Indicators again may call each other.
  • Also, if you work with indicators like the PreviousValueIndicator, without a cache you could easily double the amount of calculations done in an indicator.

So far I have no metrics, but I think the caching layer, which ensures a calculation only needs to be executed once per index, has a very big impact on performance.
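A simplified, self-contained illustration of this effect (plain Java on purpose, not the real ta4j classes): an uncached recursive EMA re-runs its whole chain of previous values on every getValue call, so two consumers roughly double the work and the total grows quadratically with the series length.

// Hypothetical uncached EMA: every getValue(index) recomputes the entire chain 0..index.
final class UncachedEma {
    private final double[] closes;
    private final double multiplier;
    long calculations = 0; // counts how often the formula runs

    UncachedEma(double[] closes, int barCount) {
        this.closes = closes;
        this.multiplier = 2.0 / (barCount + 1);
    }

    double getValue(int index) {
        calculations++;
        if (index == 0) {
            return closes[0];
        }
        double previous = getValue(index - 1); // without a cache this recursion repeats every call
        return (closes[index] - previous) * multiplier + previous;
    }

    public static void main(String[] args) {
        double[] closes = new double[500];
        java.util.Arrays.fill(closes, 100.0);
        UncachedEma ema = new UncachedEma(closes, 20);
        // Two consumers (think Bollinger Bands and MACD) each reading every index:
        for (int consumer = 0; consumer < 2; consumer++) {
            for (int i = 0; i < closes.length; i++) {
                ema.getValue(i);
            }
        }
        // Prints about 250,000 calculations for 500 bars; a per-index cache would need only 500.
        System.out.println("calculations without cache: " + ema.calculations);
    }
}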

Remove the CachedIndicator completely.
(Optionally) add a new caching layer (if really required) that uses composition instead of inheritance.

I would prefer to have a cache enabled by default and to add the possibility to disable it.
Example code could look like:

BarSeries barSeries = new BaseBarSeriesBuilder()
      .withName("AAPL live trading")
      .withCache(BaseCacheBuilder.NO_CACHE)
      .build();

If we implement the cache this way, I would suggest using Caffeine as a third-party cache provider for the default "BaseCache" implementation.

@nimo23
Contributor Author

nimo23 commented Oct 25, 2022

I would prefer to have a cache enabled by default and to add the possibility to disable it.

After considering the importance of caching in ta4j, I have a few questions:

  1. Why would you want to disable the cache? Which use cases justify disabling it in ta4j?
  2. What would be the benefit of adding an external caching provider? Currently our cache is only a single class, which might only need to be enhanced slightly to meet the missing requirements.

The missing requirements are to avoid errors in indicator calculations due to the things described in #902 (comment). We should also discuss the following steps to clarify whether much reworking of the caching layer is necessary at all:

  1. If I go through all indicators (inheriting CachedIndicator) that use instance variables to store their calculated values and change those into method-local variables, then I do not need to rework the caching layer, right? And even if I do rework the caching layer, we would still need to change those instance variables anyway, right?
  2. You said: using the date of a bar (i.e. a Map<LocalDateTime, Num>) would be better. If we change this, can we keep using instance variables in cached indicators as usual and avoid reworking the caching layer?

@team172011
Member

Why would you want to disable the cache? Which use cases justify disabling it in ta4j?

The possibility of disabling the cache would only be a side effect of a "pluggable" cache.

  • For example, some advanced ta4j users could benefit from disabling the cache if they use only a few indicators for high-performance real-time trading, or if they need a very low memory footprint.
  • As mentioned above, the "no_cache" option would only be a nice side effect. If the cache were pluggable and designed like BarSeries (with BaseBarSeries as a reference implementation), Bar (with BaseBar as a reference implementation) or TradingRecord (with BaseTradingRecord...), users could implement their own cache for specific needs (from very simple in-memory caching up to complex distributed/persisted caching) or just use the BaseCache implementation if they do not care. A sketch of such a contract follows below.
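For illustration, a sketch of what such a pluggable contract could look like if it followed the BarSeries/BaseBarSeries pattern; IndicatorValueCache and NoCache are hypothetical names, not existing ta4j API.

import java.util.function.IntFunction;

// Hypothetical contract: the indicator asks the cache for a value and passes in its calculation.
// A "BaseCache" reference implementation would store results in memory (see the composition
// sketch further above); users could plug in distributed or persisted implementations instead.
interface IndicatorValueCache<T> {
    T getOrCompute(int index, IntFunction<T> calculation);
}

// The "no_cache" side effect: never store anything, always recalculate, zero memory footprint.
final class NoCache<T> implements IndicatorValueCache<T> {
    @Override
    public T getOrCompute(int index, IntFunction<T> calculation) {
        return calculation.apply(index);
    }
}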

What would be the benefit of adding an external caching provider? Currently our cache is only a single class, which might only need to be enhanced slightly to meet the missing requirements.

There are fairly common issues and bug reports that are caused by or related to our current caching logic, and I don't think those problems will be solved by adding more and more special cases to this:

 public T getValue(int index) {
        BarSeries series = getBarSeries();
        if (series == null) {
            // Series is null; the indicator doesn't need cache.
            // (e.g. simple computation of the value)
            // --> Calculating the value
            T result = calculate(index);
            if (log.isTraceEnabled()) {
                log.trace("{}({}): {}", this, index, result);
            }
            return result;
        }

        // Series is not null

        final int removedBarsCount = series.getRemovedBarsCount();
        final int maximumResultCount = series.getMaximumBarCount();

        T result;
        if (index < removedBarsCount) {
            // Result already removed from cache
            if (log.isTraceEnabled()) {
                log.trace("{}: result from bar {} already removed from cache, use {}-th instead",
                        getClass().getSimpleName(), index, removedBarsCount);
            }
            increaseLengthTo(removedBarsCount, maximumResultCount);
            highestResultIndex = removedBarsCount;
            result = results.get(0);
            if (result == null) {
                // It should be "result = calculate(removedBarsCount);".
                // We use "result = calculate(0);" as a workaround
                // to fix issue #120 (https://github.com/mdeverdelhan/ta4j/issues/120).
                result = calculate(0);
                results.set(0, result);
            }
        } else {
            if (index == series.getEndIndex()) {
                // Don't cache result if last bar
                result = calculate(index);
            } else {
                increaseLengthTo(index, maximumResultCount);
                if (index > highestResultIndex) {
                    // Result not calculated yet
                    highestResultIndex = index;
                    result = calculate(index);
                    results.set(results.size() - 1, result);
                } else {
                    // Result covered by current cache
                    int resultInnerIndex = results.size() - 1 - (highestResultIndex - index);
                    result = results.get(resultInnerIndex);
                    if (result == null) {
                        result = calculate(index);
                        results.set(resultInnerIndex, result);
                    }
                }
            }

        }
        if (log.isTraceEnabled()) {
            log.trace("{}({}): {}", this, index, result);
        }
        return result;
    }

    /**
     * Increases the size of cached results buffer.
     *
     * @param index     the index to increase length to
     * @param maxLength the maximum length of the results buffer
     */
    private void increaseLengthTo(int index, int maxLength) {
        if (highestResultIndex > -1) {
            int newResultsCount = Math.min(index - highestResultIndex, maxLength);
            if (newResultsCount == maxLength) {
                results.clear();
                results.addAll(Collections.nCopies(maxLength, null));
            } else if (newResultsCount > 0) {
                results.addAll(Collections.nCopies(newResultsCount, null));
                removeExceedingResults(maxLength);
            }
        } else {
            // First use of cache
            assert results.isEmpty() : "Cache results list should be empty";
            results.addAll(Collections.nCopies(Math.min(index + 1, maxLength), null));
        }
    }

    /**
     * Removes the N first results which exceed the maximum bar count. (i.e. keeps
     * only the last maximumResultCount results)
     *
     * @param maximumResultCount the number of results to keep
     */
    private void removeExceedingResults(int maximumResultCount) {
        int resultCount = results.size();
        if (resultCount > maximumResultCount) {
            // Removing old results
            final int nbResultsToRemove = resultCount - maximumResultCount;
            if (nbResultsToRemove == 1) {
                results.remove(0);
            } else {
                results.subList(0, nbResultsToRemove).clear();
            }
        }
    }

If I go through all indicators (inheriting CachedIndicator) that use instance variables to store their calculated values and change those into method-local variables, then I do not need to rework the caching layer, right? And even if I do rework the caching layer, we would still need to change those instance variables anyway, right?

Yes, independently of the cache fix or enhancement we need to eliminate all usages of instance variables that store context information that depends on an index.

You said: using the date of a bar (i.e. a Map<LocalDateTime, Num>) would be better. If we change this, can we keep using instance variables in cached indicators as usual and avoid reworking the caching layer?

Using a Map<LocalDateTime, Num> would not mean we could keep using instance variables, and it would still mean reworking the caching layer. But I would prefer to just use a caching library like Caffeine and replace the CachedIndicator#results: ArrayList<T> "cache" with something like LoadingCache<LocalDateTime, T>.
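A rough sketch of that replacement, assuming Caffeine is on the classpath; everything except the Caffeine API (Caffeine.newBuilder, LoadingCache) is hypothetical and heavily simplified.

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.time.LocalDateTime;

// Hypothetical indicator whose per-bar results are cached by Caffeine, keyed by the bar's
// end time instead of a positional index, so removed or shifted bars cannot corrupt lookups.
final class CaffeineBackedIndicator {

    private final LoadingCache<LocalDateTime, Double> results;

    CaffeineBackedIndicator(int maximumBarCount) {
        this.results = Caffeine.newBuilder()
                .maximumSize(maximumBarCount) // plays the role of maximumBarCount eviction
                .build(this::calculate);      // the loader runs only on a cache miss
    }

    double getValue(LocalDateTime barEndTime) {
        return results.get(barEndTime);       // "calculate once" per bar end time
    }

    private Double calculate(LocalDateTime barEndTime) {
        return 0.0; // placeholder: map the end time back to a bar and run the indicator formula
    }
}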

@nimo23
Contributor Author

nimo23 commented Oct 26, 2022

Yes, independently of the cache fix or enhancement we need to eliminate all usages of instance variables that store context information that depends on an index.

I think before providing a cache fix or optional caching, we should do #906.

@team172011
Member

Draft: #907

@sergproua

sergproua commented Mar 4, 2023

Hi Simon (@team172011). I was thinking of implementing a cache myself and came across this ticket. I've reviewed your great work and was wondering what the reason was for removing CacheProvider from BaseBarSeries, since it completely decouples the cache implementation from CachedIndicator. I have use cases where I need partial cache busting (rewind) as well as different cache implementations for different bar series instances, and going back to a cache factory via BarSeries would be perfect for that.
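For illustration, a hypothetical sketch of the partial-busting idea (the names are made up, not taken from the draft): a cache contract with an explicit invalidation point, so a rewind can drop results from a given index onwards while keeping earlier ones.

import java.util.function.IntFunction;

// Hypothetical contract extending the basic "get or compute" idea with partial invalidation.
interface RewindableCache<T> {
    T getOrCompute(int index, IntFunction<T> calculation);

    // Drop cached results for index, index + 1, ... so they are recalculated after a rewind.
    void invalidateFrom(int index);
}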

@team172011
Member

team172011 commented Mar 7, 2023

@sergproua I think one problem with the CacheProvider approach is that the CacheProvider does not know the Num type, or more generally the Indicator<T> value type (mostly Boolean or a Num). So the cache instantiation must be located in the corresponding indicator in order to have the correct type.
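To make that typing issue concrete, a hypothetical sketch (none of these names exist in ta4j): a series-level provider can only create storage generically, so the indicator itself has to request a cache with its own value type (Num, Boolean, ...).

import java.util.HashMap;
import java.util.Map;

// Hypothetical provider: it can create storage, but the value type T is only known
// at the call site, i.e. inside the concrete indicator.
interface CacheProvider {
    <T> Map<Integer, T> newCache();
}

final class MapCacheProvider implements CacheProvider {
    @Override
    public <T> Map<Integer, T> newCache() {
        return new HashMap<>();
    }
}

// Hypothetical Boolean-valued indicator: only here is it known that the cache holds Booleans,
// which is why the typed cache has to be instantiated inside the indicator itself.
final class SomeBooleanIndicator {
    private final Map<Integer, Boolean> cache;

    SomeBooleanIndicator(CacheProvider provider) {
        this.cache = provider.newCache();
    }
}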
