In [1]:
import sys
sys.path.append('..')

# Sampling: Almost twice

## Explanation
The function `indices_twice` also calls `indices_overlap`, but then combines the examples that occur in only one set (the non-overlapping ones) into additional BWS sets.

<img alt="Connect not overlapped examples to new BWS sets." src="bwsample-twice.png" width="300px">
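The idea can be sketched in plain Python. This is a hypothetical re-implementation for illustration only, not the code of `bwsample`: the names `sketch_overlap` and `sketch_twice` are made up, shuffling is left out, and `n_items >= 2` is assumed. The strided grouping was chosen so that the sketch reproduces the outputs shown in this notebook.

```python
def sketch_overlap(n_sets, n_items):
    """Chain the sets so that neighbours share one item; the last set wraps to item 0."""
    n_examples = n_sets * (n_items - 1)
    sets = []
    for s in range(n_sets):
        start = s * (n_items - 1)
        sets.append([(start + j) % n_examples for j in range(n_items)])
    return sets, n_examples


def sketch_twice(n_sets, n_items):
    """Add extra sets built from the items that occur only once so far."""
    sets, n_examples = sketch_overlap(n_sets, n_items)
    # Items strictly inside a set are not shared with a neighbouring set.
    middle = [item for s in sets for item in s[1:-1]]
    # Only complete extra sets are formed; leftover items stay at one occurrence.
    n_new = len(middle) // n_items
    usable = middle[: n_new * n_items]
    extra = [usable[k::n_new] for k in range(n_new)]
    return sets + extra, n_examples
```

Under these assumptions, `sketch_twice(6, 4)` returns the same nine sets as the demo below.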

## Demo

In [2]:
from bwsample import indices_twice

n_sets, n_items, shuffle = 6, 4, False

bwsindices, n_examples = indices_twice(n_sets, n_items, shuffle)

bwsindices

[[0, 1, 2, 3],
 [3, 4, 5, 6],
 [6, 7, 8, 9],
 [9, 10, 11, 12],
 [12, 13, 14, 15],
 [15, 16, 17, 0],
 [1, 5, 10, 14],
 [2, 7, 11, 16],
 [4, 8, 13, 17]]

## Problems
The function does **not** guarantee that all examples occur twice across BWS sets.
The reason is that, for `n_items > 2`, the numbers `n_sets` and `n_items` must share a common divisor greater than 1.
For example, if `n_sets=7` and `n_items=3` are two different prime numbers, then leftover examples are unavoidable.
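A quick way to anticipate the leftover is to count the non-overlapping examples. The sketch below assumes, as the outputs in this notebook suggest, that each overlapping set contributes `n_items - 2` examples that occur only once and that these are grouped into complete sets of `n_items`. The helper name `leftover_examples` is hypothetical and not part of `bwsample`.

```python
def leftover_examples(n_sets, n_items):
    """Number of examples that cannot be placed into a complete extra BWS set."""
    # Each overlapping set has n_items - 2 items that it shares with no neighbour.
    n_once = n_sets * (n_items - 2)
    # Extra sets hold n_items each; the remainder keeps a single occurrence.
    return n_once % n_items


print(leftover_examples(6, 4))  # 12 once-only examples fill 3 sets of 4
print(leftover_examples(7, 3))  # 7 once-only examples fill 2 sets of 3, 1 is left
```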

In [3]:
from bwsample import indices_twice

n_sets, n_items, shuffle = 7, 3, False

bwsindices, n_examples = indices_twice(n_sets, n_items, shuffle)

bwsindices

[[0, 1, 2],
 [2, 3, 4],
 [4, 5, 6],
 [6, 7, 8],
 [8, 9, 10],
 [10, 11, 12],
 [12, 13, 0],
 [1, 5, 9],
 [3, 7, 11]]

When we count the occurrences of each item,
we see that one item occurs only once.

In [4]:
import itertools
from collections import Counter
counts = list(Counter(itertools.chain(*bwsindices)).values())
print(counts)
all(c == 2 for c in counts)

[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1]


False
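To find out which examples are affected, the counter can be filtered by item instead of only listing the counts. The snippet below is a small self-contained sketch; the list is copied from the output for `n_sets=7` and `n_items=3` above.

```python
from collections import Counter
from itertools import chain

# BWS sets copied from the n_sets=7, n_items=3 output above.
bwsindices = [
    [0, 1, 2], [2, 3, 4], [4, 5, 6], [6, 7, 8], [8, 9, 10],
    [10, 11, 12], [12, 13, 0], [1, 5, 9], [3, 7, 11],
]

counts = Counter(chain.from_iterable(bwsindices))
# Keep every item that does not occur exactly twice.
not_twice = sorted(item for item, c in counts.items() if c != 2)
print(not_twice)
```

Here only item `13` is reported: it sits in the middle of the last overlapping set and did not fit into a complete extra set.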

The reason is as follows: if `n_items` is a prime number, `n_sets` must be a multiple of `n_items` for every example to occur twice, e.g.

In [5]:
from bwsample import indices_twice

n_items, shuffle = 3, False
n_sets = 3 * n_items

bwsindices, n_examples = indices_twice(n_sets, n_items, shuffle)

bwsindices

[[0, 1, 2],
 [2, 3, 4],
 [4, 5, 6],
 [6, 7, 8],
 [8, 9, 10],
 [10, 11, 12],
 [12, 13, 14],
 [14, 15, 16],
 [16, 17, 0],
 [1, 7, 13],
 [3, 9, 15],
 [5, 11, 17]]
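If `n_sets` can be chosen freely, one option is to round it up to the next value that leaves no remainder. This is only a sketch: the helper `next_valid_n_sets` is hypothetical, and it rests on the assumption that `indices_twice` groups the `n_sets * (n_items - 2)` once-only examples into complete sets of `n_items`, as the outputs above suggest.

```python
def next_valid_n_sets(n_sets, n_items):
    """Smallest value >= n_sets for which no once-only example is left over."""
    candidate = n_sets
    # A multiple of n_items always satisfies the condition, so the loop ends
    # after at most n_items - 1 increments.
    while (candidate * (n_items - 2)) % n_items != 0:
        candidate += 1
    return candidate


print(next_valid_n_sets(7, 3))  # rounds up to a multiple of 3
print(next_valid_n_sets(7, 4))  # for n_items=4 an even n_sets is enough
```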