
Enhance the CachingLayer #904

Open
nimo23 opened this issue Oct 24, 2022 · 11 comments
Labels
enhancement Issue describing or discussing an enhancement for this library

Comments

@nimo23
Contributor

nimo23 commented Oct 24, 2022

@nimo23 nimo23 added the enhancement Issue describing or discussing an enhancement for this library label Oct 24, 2022
@nimo23 nimo23 changed the title Improve CachingLayer Enhance the CachingLayer Oct 24, 2022
@dazre

dazre commented Oct 25, 2022

IMHO, the CachingLayer in its current implementation does not solve the problem of re-calculating indicators. Within a single process, an indicator with a given set of parameters is calculated only once, which is good. In practice, however, optimization means sweeping over indicator parameters, which forces constant recalculation. Likewise, during research we run our strategies over and over again, and each time the indicators are recalculated from scratch.

On the other hand, when a strategy runs as a trading robot, caching is largely redundant because for most indicators only the latest value matters.

So, in my opinion, caching could be improved by persisting indicator calculation results to disk. Ideally one would use memcached, Redis, MongoDB or Hazelcast for storage, but even plain writes to disk would significantly increase the effectiveness of the cache.
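A minimal sketch of that idea, assuming plain Java serialization and a results map keyed by bar index; the class name DiskResultStore is illustrative only, and a real setup would more likely use Redis or Hazelcast as suggested above:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: persists already-calculated indicator values between runs so a
// repeated backtest or optimization sweep can reload them instead of recalculating.
final class DiskResultStore {

    @SuppressWarnings("unchecked")
    static Map<Integer, Double> load(File file) throws IOException, ClassNotFoundException {
        if (!file.exists()) {
            return new HashMap<>(); // nothing persisted yet: start with an empty result map
        }
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (Map<Integer, Double>) in.readObject();
        }
    }

    static void save(File file, Map<Integer, Double> results) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(new HashMap<>(results)); // HashMap is Serializable
        }
    }
}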

@nimo23
Contributor Author

nimo23 commented Oct 25, 2022

Would it make sense to just remove the cache from the project entirely?

Why cache at all when most users never actually benefit from it, because:

  • For backtesting: They only calculate the values once.
  • For trading: They are only interested in the latest value.

Am I missing some other broad use cases that would justify the "caching layer" in ta4j?

If someone wants to save or cache the values somewhere, they can do that themselves; the project doesn't need a cache for any indicator calculation.

I think we should take two steps:

  1. Remove the CachedIndicator completely.
  2. (Optionally) add a new caching layer (if really required) that uses composition instead of inheritance (a sketch follows below).

Step 1 can be done immediately.
Step 2 can be done (if really required) in any future version of ta4j.
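For illustration only, a minimal sketch of what step 2 could look like, assuming nothing about ta4j internals: the indicator has a cache (composition) instead of being a CachedIndicator (inheritance). IndexCache and MyIndicator are hypothetical names, not existing ta4j classes.

import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical cache object that any indicator can hold as a field.
final class IndexCache<T> {
    private final Map<Integer, T> results = new HashMap<>();

    T getOrCompute(int index, IntFunction<T> calculation) {
        // calculate once per index, then reuse the stored result
        return results.computeIfAbsent(index, calculation::apply);
    }
}

// Hypothetical indicator delegating to its cache instead of extending CachedIndicator.
final class MyIndicator {
    private final IndexCache<Double> cache = new IndexCache<>();

    double getValue(int index) {
        return cache.getOrCompute(index, this::calculate);
    }

    private Double calculate(int index) {
        return 0.0; // placeholder for the actual indicator formula
    }
}

Note that a recursive indicator (one whose calculate reads earlier indices via getValue) would need a cache that tolerates reentrant lookups, which is the job RecursiveCachedIndicator handles today.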

@dazre

dazre commented Oct 25, 2022

  1. Remove the CachedIndicator completely.

As far as I can see, the caching layer is used by many indicators, and I cannot estimate what performance impact removing CachedIndicator would have.

@nimo23
Contributor Author

nimo23 commented Oct 25, 2022

Yes, and we also need it for its subclass RecursiveCachedIndicator.

@team172011
Member

team172011 commented Oct 25, 2022

Why cache at all when most users never actually benefit from it

As far as I can see, the caching layer is used by many indicators, and I cannot estimate what performance impact removing CachedIndicator would have

The cache is very important for a lot of use cases.

  • A basic feature of ta4j is to combine Indicators with other Indicators (for example Bollinger Bands may use an EMAIndicator that is also used by an MACDIndicator).
    • So for each index the Indicator#getValue method will be called several times.
  • You can also have several Strategies and/or Rules that are based on a lot of Indicators and those Indicators again may call each other.
  • Also, if you work with indicators like the PreviousValueIndicator, without a cache you could easily double the amount of calculations done in an indicator.

So far I have no metrics, but I think the caching layer, which ensures a calculation only needs to be executed once per index, has a very big impact on performance.
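A simplified, self-contained illustration of this effect (plain Java on purpose, not the real ta4j classes): an uncached recursive EMA re-runs its whole chain of previous values on every getValue call, so two consumers roughly double the work and the total grows quadratically with the series length.

// Hypothetical uncached EMA: every getValue(index) recomputes the entire chain 0..index.
final class UncachedEma {
    private final double[] closes;
    private final double multiplier;
    long calculations = 0; // counts how often the formula runs

    UncachedEma(double[] closes, int barCount) {
        this.closes = closes;
        this.multiplier = 2.0 / (barCount + 1);
    }

    double getValue(int index) {
        calculations++;
        if (index == 0) {
            return closes[0];
        }
        double previous = getValue(index - 1); // without a cache this recursion repeats every call
        return (closes[index] - previous) * multiplier + previous;
    }

    public static void main(String[] args) {
        double[] closes = new double[500];
        java.util.Arrays.fill(closes, 100.0);
        UncachedEma ema = new UncachedEma(closes, 20);
        // Two consumers (think Bollinger Bands and MACD) each reading every index:
        for (int consumer = 0; consumer < 2; consumer++) {
            for (int i = 0; i < closes.length; i++) {
                ema.getValue(i);
            }
        }
        // Prints about 250,000 calculations for 500 bars; a per-index cache would need only 500.
        System.out.println("calculations without cache: " + ema.calculations);
    }
}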

Remove the CachedIndicator completely.
(Optionally) add a new caching layer (if really required) that uses composition instead of inheritance.

I would prefer to have a cache enabled by default and to add the possibility to disable it.
Example code could look like:

BarSeries barSeries = new BaseBarSeriesBuilder()
      .withName("AAPL live trading")
      .withCache(BaseCacheBuilder.NO_CACHE)
      .build();

If we implement the cache this way, I would suggest using Caffeine as a third-party cache provider for the default "BaseCache" implementation.

@nimo23
Contributor Author

nimo23 commented Oct 25, 2022

I would prefer to have a cache enabled by default and to add the possibility to disable it.

After considering the importance of caching in ta4j, I have a few questions:

  1. Why would you want to disable the cache? Which use cases justify disabling it in ta4j?
  2. What would be the benefit of adding an external caching provider? Currently our cache is only a single class, which might only need to be enhanced slightly to meet the missing requirements.

The missing requirements are to avoid errors in indicator calculations due to the things described in #902 (comment). We should also discuss the following steps to clarify whether much reworking of the caching layer is necessary at all:

  1. If I go through all indicators (inheriting CachedIndicator) that use instance variables to store their calculated values and change those into method-local variables, then I do not need to rework the caching layer, right? And even if I do rework the caching layer, we would still need to change those instance variables anyway, right?
  2. You said: using the date of a bar (i.e. a Map<LocalDateTime, Num>) would be better. If we change this, can we keep using instance variables in cached indicators as usual and avoid reworking the caching layer?

@team172011
Member

Why would you want to disable the cache? Which use cases justify disabling it in ta4j?

The possibility of disabling the cache would only be a side effect of a "pluggable" cache.

  • For example, some advanced ta4j users could benefit from disabling the cache if they use only a few indicators for high-performance real-time trading, or if they need a very low memory footprint.
  • As mentioned above, the "no_cache" option would only be a nice side effect. If the cache were pluggable and designed like BarSeries (with BaseBarSeries as a reference implementation), Bar (with BaseBar as a reference implementation) or TradingRecord (with BaseTradingRecord...), users could implement their own cache for specific needs (from very simple in-memory caching up to complex distributed/persisted caching) or just use the BaseCache implementation if they do not care. A sketch of such a contract follows below.
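For illustration, a sketch of what such a pluggable contract could look like if it followed the BarSeries/BaseBarSeries pattern; IndicatorValueCache and NoCache are hypothetical names, not existing ta4j API.

import java.util.function.IntFunction;

// Hypothetical contract: the indicator asks the cache for a value and passes in its calculation.
// A "BaseCache" reference implementation would store results in memory (see the composition
// sketch further above); users could plug in distributed or persisted implementations instead.
interface IndicatorValueCache<T> {
    T getOrCompute(int index, IntFunction<T> calculation);
}

// The "no_cache" side effect: never store anything, always recalculate, zero memory footprint.
final class NoCache<T> implements IndicatorValueCache<T> {
    @Override
    public T getOrCompute(int index, IntFunction<T> calculation) {
        return calculation.apply(index);
    }
}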

What would be the benefit of adding an external caching provider? Currently our cache is only a single class, which might only need to be enhanced slightly to meet the missing requirements.

There are fairly common issues and bug reports that are caused by or related to our current caching logic, and I don't think those problems will be solved by adding more and more special cases to this:

 public T getValue(int index) {
        BarSeries series = getBarSeries();
        if (series == null) {
            // Series is null; the indicator doesn't need cache.
            // (e.g. simple computation of the value)
            // --> Calculating the value
            T result = calculate(index);
            if (log.isTraceEnabled()) {
                log.trace("{}({}): {}", this, index, result);
            }
            return result;
        }

        // Series is not null

        final int removedBarsCount = series.getRemovedBarsCount();
        final int maximumResultCount = series.getMaximumBarCount();

        T result;
        if (index < removedBarsCount) {
            // Result already removed from cache
            if (log.isTraceEnabled()) {
                log.trace("{}: result from bar {} already removed from cache, use {}-th instead",
                        getClass().getSimpleName(), index, removedBarsCount);
            }
            increaseLengthTo(removedBarsCount, maximumResultCount);
            highestResultIndex = removedBarsCount;
            result = results.get(0);
            if (result == null) {
                // It should be "result = calculate(removedBarsCount);".
                // We use "result = calculate(0);" as a workaround
                // to fix issue #120 (https://github.com/mdeverdelhan/ta4j/issues/120).
                result = calculate(0);
                results.set(0, result);
            }
        } else {
            if (index == series.getEndIndex()) {
                // Don't cache result if last bar
                result = calculate(index);
            } else {
                increaseLengthTo(index, maximumResultCount);
                if (index > highestResultIndex) {
                    // Result not calculated yet
                    highestResultIndex = index;
                    result = calculate(index);
                    results.set(results.size() - 1, result);
                } else {
                    // Result covered by current cache
                    int resultInnerIndex = results.size() - 1 - (highestResultIndex - index);
                    result = results.get(resultInnerIndex);
                    if (result == null) {
                        result = calculate(index);
                        results.set(resultInnerIndex, result);
                    }
                }
            }

        }
        if (log.isTraceEnabled()) {
            log.trace("{}({}): {}", this, index, result);
        }
        return result;
    }

    /**
     * Increases the size of cached results buffer.
     *
     * @param index     the index to increase length to
     * @param maxLength the maximum length of the results buffer
     */
    private void increaseLengthTo(int index, int maxLength) {
        if (highestResultIndex > -1) {
            int newResultsCount = Math.min(index - highestResultIndex, maxLength);
            if (newResultsCount == maxLength) {
                results.clear();
                results.addAll(Collections.nCopies(maxLength, null));
            } else if (newResultsCount > 0) {
                results.addAll(Collections.nCopies(newResultsCount, null));
                removeExceedingResults(maxLength);
            }
        } else {
            // First use of cache
            assert results.isEmpty() : "Cache results list should be empty";
            results.addAll(Collections.nCopies(Math.min(index + 1, maxLength), null));
        }
    }

    /**
     * Removes the N first results which exceed the maximum bar count. (i.e. keeps
     * only the last maximumResultCount results)
     *
     * @param maximumResultCount the number of results to keep
     */
    private void removeExceedingResults(int maximumResultCount) {
        int resultCount = results.size();
        if (resultCount > maximumResultCount) {
            // Removing old results
            final int nbResultsToRemove = resultCount - maximumResultCount;
            if (nbResultsToRemove == 1) {
                results.remove(0);
            } else {
                results.subList(0, nbResultsToRemove).clear();
            }
        }
    }

If I go through all indicators (inheriting CachedIndicator) that use instance variables to store their calculated values and change those into method-local variables, then I do not need to rework the caching layer, right? And even if I do rework the caching layer, we would still need to change those instance variables anyway, right?

Yes, independently of the cache fix or enhancement we need to eliminate all usages of instance variables that store context information that depends on an index.

You said: using the date of a bar (i.e. a Map<LocalDateTime, Num>) would be better. If we change this, can we keep using instance variables in cached indicators as usual and avoid reworking the caching layer?

Using a Map<LocalDateTime, Num> would not mean we could keep using instance variables, and it would still mean reworking the caching layer. But I would prefer to just use a caching library like Caffeine and replace the CachedIndicator#results: ArrayList<T> "cache" with something like LoadingCache<LocalDateTime, T>.
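A rough sketch of that replacement, assuming Caffeine is on the classpath; everything except the Caffeine API (Caffeine.newBuilder, LoadingCache) is hypothetical and heavily simplified.

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.time.LocalDateTime;

// Hypothetical indicator whose per-bar results are cached by Caffeine, keyed by the bar's
// end time instead of a positional index, so removed or shifted bars cannot corrupt lookups.
final class CaffeineBackedIndicator {

    private final LoadingCache<LocalDateTime, Double> results;

    CaffeineBackedIndicator(int maximumBarCount) {
        this.results = Caffeine.newBuilder()
                .maximumSize(maximumBarCount) // plays the role of maximumBarCount eviction
                .build(this::calculate);      // the loader runs only on a cache miss
    }

    double getValue(LocalDateTime barEndTime) {
        return results.get(barEndTime);       // "calculate once" per bar end time
    }

    private Double calculate(LocalDateTime barEndTime) {
        return 0.0; // placeholder: map the end time back to a bar and run the indicator formula
    }
}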

@nimo23
Contributor Author

nimo23 commented Oct 26, 2022

Yes, independently of the cache fix or enhancement we need to eliminate all usages of instance variables that store context information that depends on an index.

I think before providing a cache fix or optional caching, we should do #906.

@team172011
Member

Draft: #907

@sergproua

sergproua commented Mar 4, 2023

Hi Simon (@team172011). I was thinking of implementing a cache myself and came across this ticket. I've reviewed your great work and was wondering what the reason was for removing CacheProvider from BaseBarSeries, since it completely decouples the cache implementation from CachedIndicator. I have use cases where I need partial cache busting (rewind) as well as different cache implementations for different bar series instances, and going back to a cache factory via BarSeries would be perfect for that.
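For illustration, a hypothetical sketch of the partial-busting idea (the names are made up, not taken from the draft): a cache contract with an explicit invalidation point, so a rewind can drop results from a given index onwards while keeping earlier ones.

import java.util.function.IntFunction;

// Hypothetical contract extending the basic "get or compute" idea with partial invalidation.
interface RewindableCache<T> {
    T getOrCompute(int index, IntFunction<T> calculation);

    // Drop cached results for index, index + 1, ... so they are recalculated after a rewind.
    void invalidateFrom(int index);
}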

@team172011
Member

team172011 commented Mar 7, 2023

@sergproua I think one problem with the CacheProvider approach is that the CacheProvider does not know the Num type, or more generally the Indicator<T> value type (mostly Boolean or a Num). So the cache instantiation must be located in the corresponding indicator in order to have the correct type.
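To make that typing issue concrete, a hypothetical sketch (none of these names exist in ta4j): a series-level provider can only create storage generically, so the indicator itself has to request a cache with its own value type (Num, Boolean, ...).

import java.util.HashMap;
import java.util.Map;

// Hypothetical provider: it can create storage, but the value type T is only known
// at the call site, i.e. inside the concrete indicator.
interface CacheProvider {
    <T> Map<Integer, T> newCache();
}

final class MapCacheProvider implements CacheProvider {
    @Override
    public <T> Map<Integer, T> newCache() {
        return new HashMap<>();
    }
}

// Hypothetical Boolean-valued indicator: only here is it known that the cache holds Booleans,
// which is why the typed cache has to be instantiated inside the indicator itself.
final class SomeBooleanIndicator {
    private final Map<Integer, Boolean> cache;

    SomeBooleanIndicator(CacheProvider provider) {
        this.cache = provider.newCache();
    }
}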
