fast-select

Compact in-memory read-only storage with lock-free, ultra-fast querying by any attribute, released under the Apache 2.0 license.

Key Properties

  • Compact
    • No Java object overhead (on average about 10x smaller than the object representation)
    • Compact string representation (UTF-8 instead of Java's UTF-16)
    • String data compression
    • Small metadata footprint (no index overhead)
  • Fast
    • All dimensions are available for search
    • Uses data statistics to avoid full scans
    • Column-oriented
    • Thread-safe and lock-free
  • Supports fast save/load to/from disk
  • Small jar file
  • Apache 2.0
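
To make "column-oriented" and "data statistics to avoid full scans" more concrete, here is a minimal plain-Java sketch of one common form of such statistics: per-block min/max values over a primitive column array. The class `BlockStatsSketch`, its block size, and the min/max scheme are assumptions made for illustration; they are not fast-select classes and may differ from what the library does internally.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of block statistics; not a fast-select class. */
final class BlockStatsSketch {
    static final int BLOCK_SIZE = 4; // tiny on purpose, to keep the demo readable

    private final int[] column;   // one primitive array per column, no per-row objects
    private final int[] blockMin; // smallest value in each block
    private final int[] blockMax; // largest value in each block
    int blocksScanned;            // blocks actually read by the last query

    BlockStatsSketch(int[] column) {
        this.column = column;
        int blocks = (column.length + BLOCK_SIZE - 1) / BLOCK_SIZE;
        blockMin = new int[blocks];
        blockMax = new int[blocks];
        for (int b = 0; b < blocks; b++) {
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int i = b * BLOCK_SIZE; i < blockEnd(b); i++) {
                min = Math.min(min, column[i]);
                max = Math.max(max, column[i]);
            }
            blockMin[b] = min;
            blockMax[b] = max;
        }
    }

    private int blockEnd(int block) {
        return Math.min(column.length, (block + 1) * BLOCK_SIZE);
    }

    /** Returns row positions whose value equals the argument, skipping blocks that cannot match. */
    List<Integer> positionsEqualTo(int value) {
        blocksScanned = 0;
        List<Integer> result = new ArrayList<>();
        for (int b = 0; b < blockMin.length; b++) {
            if (value < blockMin[b] || value > blockMax[b]) {
                continue; // the statistics prove this block has no match
            }
            blocksScanned++;
            for (int i = b * BLOCK_SIZE; i < blockEnd(b); i++) {
                if (column[i] == value) {
                    result.add(i);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BlockStatsSketch sketch =
                new BlockStatsSketch(new int[]{1, 2, 3, 4, 10, 11, 12, 13, 3, 20});
        System.out.println(sketch.positionsEqualTo(3) + ", blocks scanned: " + sketch.blocksScanned);
    }
}
```

The statistics cost two ints per block, which is why this kind of pruning keeps the metadata footprint small compared with a full index.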

Use Cases

  • Speed up analytical querying by caching the main data in a compact, query-optimized form instead of using an expensive solution
  • Separate ETL and analytic load by keeping the main data optimized for processing and adding a compact model optimized for querying
  • Sub-second querying of historical data by loading a portion of the data on demand in seconds

How to use

Create Data Class

public class Data {
    public byte a;
    public byte b;
}

Build storage

FastSelect<Data> database = new FastSelectBuilder<>(Data.class).create();

// add your data
database.addAll(new ArrayList<Data>(...)); 

Aggregate

If you just need the equivalent of select F1, F2, count(X) ... group by F1, F2:

MultiGroupCountCallback callback = new MultiGroupCountCallback(database.getColumnsByNames().get("a"));
database.select(
  new Request[] {new IntRequest("a", new int[]{12, 3})}, 
  callback);
callback.getCounters(); // your result here grouped by field 'a'
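
For readers who want to see what the query above computes, the following plain-Java sketch performs the same kind of filtered group-by count over a single column array. It does not use any fast-select classes; `GroupCountSketch` and `countByValue` are hypothetical names, and the sample values are made up.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of "where a in (...) group by a"; not a fast-select class. */
final class GroupCountSketch {

    /** Counts rows per column value, keeping only rows whose value is in {@code accepted}. */
    static Map<Integer, Integer> countByValue(byte[] column, int[] accepted) {
        Map<Integer, Integer> counters = new TreeMap<>();
        for (byte value : column) {
            for (int candidate : accepted) {
                if (value == candidate) {
                    counters.merge((int) value, 1, Integer::sum);
                    break; // a row is counted at most once
                }
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        byte[] a = {12, 3, 7, 12, 3, 12};
        // Same filter as the example above: a in (12, 3)
        System.out.println(countByValue(a, new int[]{12, 3}));
    }
}
```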

Starting from version 3.2.0, for more sophisticated and flexible cases you can use AggregateCallback, which supports fast user-defined aggregation, so you don't need to worry about aggregation key performance, etc.

FastSelect fastSelect = ...;
final ByteData data = fastSelect.getData("columnWithData");
AggregateCallback<MutableInt> callback = new AggregateCallback<>(
    new Aggregator<MutableInt>() {
        @Override
        public MutableInt create(int position) {
            // called the first time a unique key is encountered
            return new MutableInt(data.data[position]);
        }

        @Override
        public void aggregate(MutableInt agg, int position) {
            // called for every subsequent row with the same key
            agg.add(data.data[position]);
        }
    },
    fastSelect.getColumnsByNames().get("aggregationColumn1"),
    fastSelect.getColumnsByNames().get("aggregationColumn2")
    // you can specify any amount of columns with any type
);
fastSelect.select(new Request[0], callback);
Map<AggregateKey, MutableInt> result = callback.getResult();
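
The create/aggregate contract used above can be illustrated without the library. The sketch below groups rows by two key columns and sums a third one; `AggregateSketch`, `PositionAggregator`, `groupBy` and `sumBy` are hypothetical names that only mirror the shape of the callback, and the `List<Integer>` key is a simple stand-in, not the library's AggregateKey.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the create/aggregate contract; not a fast-select class. */
final class AggregateSketch {

    /** Stand-in for an aggregator that works on row positions. */
    interface PositionAggregator<T> {
        T create(int position);               // first row seen for a key
        void aggregate(T agg, int position);  // every later row for that key
    }

    /** Groups row positions by the pair (key1[p], key2[p]) and feeds them to the aggregator. */
    static <T> Map<List<Integer>, T> groupBy(int[] key1, int[] key2, PositionAggregator<T> aggregator) {
        Map<List<Integer>, T> result = new LinkedHashMap<>();
        for (int p = 0; p < key1.length; p++) {
            List<Integer> key = Arrays.asList(key1[p], key2[p]);
            T agg = result.get(key);
            if (agg == null) {
                result.put(key, aggregator.create(p));
            } else {
                aggregator.aggregate(agg, p);
            }
        }
        return result;
    }

    /** Sums {@code values} per key pair; a one-element long[] serves as a mutable sum. */
    static Map<List<Integer>, long[]> sumBy(int[] key1, int[] key2, final int[] values) {
        return groupBy(key1, key2, new PositionAggregator<long[]>() {
            @Override
            public long[] create(int position) {
                return new long[]{values[position]};
            }

            @Override
            public void aggregate(long[] agg, int position) {
                agg[0] += values[position];
            }
        });
    }

    public static void main(String[] args) {
        Map<List<Integer>, long[]> sums =
                sumBy(new int[]{1, 1, 2, 1}, new int[]{9, 9, 9, 8}, new int[]{5, 7, 1, 2});
        for (Map.Entry<List<Integer>, long[]> e : sums.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue()[0]);
        }
    }
}
```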

Select the first 25 items from a sorted dataset

ListLimitCallback<DemoData> callback = new ListLimitCallback<>(25);
fastSelect.selectAndSort(where, callback, "a");
callback.getResult();

Filter a dataset, get the total, and render only one page

// get a reference to the underlying column data
final IntData id = (IntData) fastSelect.getColumnsByNames().get("id").data;

List<Integer> positions = fastSelect.selectPositions(new Request[] {...});
Collections.sort(positions, new Comparator<Integer>() {
    @Override
    public int compare(Integer p1, Integer p2) {
        // Integer.compare avoids the overflow that plain subtraction can cause
        return Integer.compare(id.data[p1], id.data[p2]);
    }
});

// render one page: rows 10..19, or fewer if the result is shorter
List<Map<String, String>> page = new ArrayList<>();
for (int i = 10; i < Math.min(20, positions.size()); i++) {
    int p = positions.get(i);
    Map<String, String> row = new HashMap<>();
    row.put("id", String.valueOf(id.data[p])); // map values are strings
    page.add(row);
}

int total = positions.size();
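
The sort-then-slice step can be pulled into a small helper that is safe for any offset. This is a plain-Java sketch with a hypothetical `PagingSketch.page` method; it works on a list of row positions and a raw `int[]` sort column rather than on fast-select types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical paging helper; not a fast-select class. */
final class PagingSketch {

    /** Sorts positions by the given column and returns at most {@code limit} of them from {@code offset}. */
    static List<Integer> page(List<Integer> positions, int[] sortColumn, int offset, int limit) {
        List<Integer> sorted = new ArrayList<>(positions); // leave the caller's list untouched
        sorted.sort((p1, p2) -> Integer.compare(sortColumn[p1], sortColumn[p2]));
        int from = Math.min(offset, sorted.size());
        int to = Math.min(from + limit, sorted.size());
        return new ArrayList<>(sorted.subList(from, to));
    }

    public static void main(String[] args) {
        int[] id = {50, 10, 40, 20, 30};
        List<Integer> positions = Arrays.asList(0, 1, 2, 3, 4);
        System.out.println(page(positions, id, 1, 2)); // second and third rows by id
        System.out.println(positions.size());          // the total is independent of the page
    }
}
```

Clamping both ends of the range means a request past the last page returns an empty list instead of throwing.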

Combine filters by AND

Just add more requests

fastSelect.select(new Request[] {
    new IntRequest("id", 12),
    new StringLikeRequest("name", "bim"), // name like '%bim%'
    ...
});

Combine filters by OR

Wrap the requests that should be combined by OR in an OrRequest

new OrRequest(
    new IntRequest("id", 12),
    new StringLikeRequest("name", "bim") // name like '%bim%'
)
)
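
As a rough mental model of how AND and OR combinations nest, the sketch below expresses filters as predicates over row positions using only the JDK. `FilterCombinationSketch`, `allOf`, `anyOf` and `matchingPositions` are hypothetical names; the library's Request classes are not used here and may evaluate filters quite differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical model of AND/OR filter combination; not a fast-select class. */
final class FilterCombinationSketch {

    /** AND: a row matches only if every filter accepts it. */
    static IntPredicate allOf(IntPredicate... filters) {
        return position -> {
            for (IntPredicate filter : filters) {
                if (!filter.test(position)) return false;
            }
            return true;
        };
    }

    /** OR: a row matches if at least one filter accepts it. */
    static IntPredicate anyOf(IntPredicate... filters) {
        return position -> {
            for (IntPredicate filter : filters) {
                if (filter.test(position)) return true;
            }
            return false;
        };
    }

    static List<Integer> matchingPositions(int rowCount, IntPredicate filter) {
        List<Integer> result = new ArrayList<>();
        for (int p = 0; p < rowCount; p++) {
            if (filter.test(p)) result.add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] ids = {12, 12, 7, 12};
        String[] names = {"bimbo", "alpha", "bim", "xbamx"};
        IntPredicate idIs12 = p -> ids[p] == 12;
        IntPredicate likeBim = p -> names[p].contains("bim");
        IntPredicate likeBam = p -> names[p].contains("bam");
        // id = 12 AND (name like '%bim%' OR name like '%bam%')
        System.out.println(matchingPositions(ids.length, allOf(idIs12, anyOf(likeBim, likeBam))));
    }
}
```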

JMX

To publish information about a FastSelect instance over JMX, you can use the built-in class FastSelectMXBeanImpl from the package com.github.terma.fastselect.jmx. It provides read-only information such as:

  • size (count of records)
  • allocated size
  • used memory
  • columns (type, name, memory)

To register a FastSelect instance with JMX:

FastSelect<Object> fastSelect = ...;
FastSelectMXBean fastSelectMXBean = new FastSelectMXBeanImpl(fastSelect);
MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
mbs.registerMBean(fastSelectMXBean, new ObjectName("fastselect:type=mbeanname"));

Unregister

Use the standard approach for MBeans:

String mbeanName = ...;
MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
mbs.unregisterMBean(new ObjectName("fastselect:type=mbeanname"));
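
The register/unregister lifecycle can be tried end to end with only the JDK. In the sketch below, `StorageStatsMXBean` and `StorageStats` are hypothetical stand-ins for FastSelectMXBean and FastSelectMXBeanImpl, and the object name `fastselect:type=sketch` is an arbitrary example. Building the ObjectName once and reusing it for both calls avoids the two strings drifting apart.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical JMX lifecycle demo using only JDK classes. */
final class JmxLifecycleSketch {

    /** Stand-in MXBean interface; JMX requires it to be public and end in "MXBean". */
    public interface StorageStatsMXBean {
        int getSize();
    }

    static final class StorageStats implements StorageStatsMXBean {
        private final int size;

        StorageStats(int size) {
            this.size = size;
        }

        @Override
        public int getSize() {
            return size;
        }
    }

    /** Registers a bean, checks it is visible, unregisters it, and checks it is gone. */
    static boolean registerThenUnregister(String name) {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        try {
            ObjectName objectName = new ObjectName(name); // one name for both calls
            mbs.registerMBean(new StorageStats(42), objectName);
            boolean wasRegistered = mbs.isRegistered(objectName);
            mbs.unregisterMBean(objectName);
            return wasRegistered && !mbs.isRegistered(objectName);
        } catch (JMException e) {
            throw new IllegalStateException("JMX registration failed for " + name, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(registerThenUnregister("fastselect:type=sketch"));
    }
}
```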

More use cases can be found in the javadoc for the callbacks package.

Low Cardinality Strings

fast-select provides very compact storage for small Java types and String, which saves memory because there is no Java object overhead and strings use a compact UTF-8 encoding. However, you can get even better results for low-cardinality columns. Take a look at:

  • com.github.terma.fastselect.data.StringCompressedByteData
  • com.github.terma.fastselect.data.StringCompressedShortData
  • com.github.terma.fastselect.data.StringCompressedIntData
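
The general technique behind compact low-cardinality string columns is dictionary encoding: each distinct string is stored once and every row holds only a small numeric code. The sketch below shows the idea with byte codes in plain Java. `DictionaryColumnSketch` is a hypothetical class; that the StringCompressed*Data classes work this way, with the Byte/Short/Int suffix indicating the code width, is an assumption based on their names, not something stated above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical dictionary-encoded string column; not a fast-select class. */
final class DictionaryColumnSketch {
    private final List<String> dictionary = new ArrayList<>(); // code -> distinct value
    private final Map<String, Byte> codes = new HashMap<>();   // distinct value -> code
    private byte[] data = new byte[16];                         // one byte per row
    private int size;

    void add(String value) {
        Byte code = codes.get(value);
        if (code == null) {
            if (dictionary.size() > Byte.MAX_VALUE) {
                throw new IllegalStateException("more than 128 distinct values");
            }
            code = (byte) dictionary.size();
            dictionary.add(value);
            codes.put(value, code);
        }
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2); // grow the row array when full
        }
        data[size++] = code;
    }

    String get(int position) {
        if (position < 0 || position >= size) {
            throw new IndexOutOfBoundsException("position " + position);
        }
        return dictionary.get(data[position]);
    }

    int size() {
        return size;
    }

    int distinctCount() {
        return dictionary.size();
    }

    public static void main(String[] args) {
        DictionaryColumnSketch currency = new DictionaryColumnSketch();
        for (String value : new String[]{"USD", "EUR", "USD", "USD"}) {
            currency.add(value);
        }
        System.out.println(currency.get(2) + ", rows: " + currency.size()
                + ", distinct: " + currency.distinctCount());
    }
}
```

With this layout a column of a million rows and a handful of distinct values costs roughly one byte per row plus the dictionary, which is why a narrower code type is worth choosing when the cardinality allows it.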