NAME

Set::Similarity::BV - similarity measures for sets using fast bit vectors (BV)

SYNOPSIS

use Set::Similarity::BV::Dice;

# object method
my $dice = Set::Similarity::BV::Dice->new;
my $similarity = $dice->similarity('af09ff','9c09cc');

# class method
$similarity = Set::Similarity::BV::Dice->similarity('af09ff','9c09cc');

DESCRIPTION

This is the base class. It mainly provides helper and convenience methods.

Use one of the child classes:

Set::Similarity::BV::Cosine

Set::Similarity::BV::Dice

Set::Similarity::BV::Jaccard

Set::Similarity::BV::Overlap

Overlap coefficient

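The overlap coefficient is the size of the intersection divided by the size of the smaller of the two sets:

( A intersect B ) / min(A,B)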
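A minimal sketch of the overlap formula on hex-encoded bit vectors, using only core Perl. The helpers `count_bits` and `overlap_hex` are hypothetical names for illustration and are not part of this module; the sketch assumes both strings have the same, even number of hex digits.

```perl
use strict;
use warnings;

# Hypothetical helper: number of set bits in a byte string.
sub count_bits { return unpack('%32b*', $_[0]) }

# Hypothetical helper: overlap coefficient of two hex-encoded bit vectors.
# Assumes both strings have the same, even number of hex digits.
sub overlap_hex {
    my ($hex1, $hex2) = @_;
    my ($bv1, $bv2) = map { pack('H*', $_) } $hex1, $hex2;

    my $size1  = count_bits($bv1);
    my $size2  = count_bits($bv2);
    my $common = count_bits($bv1 & $bv2);   # string bitwise AND

    my $min = $size1 < $size2 ? $size1 : $size2;
    return $min ? $common / $min : 0;       # convention chosen here for an empty set
}

printf "%.4f\n", overlap_hex('af09ff', '9c09cc');   # 9 / min(16, 10) = 0.9000
```

Returning 0 when one set is empty is a choice made for this sketch; the module may handle that case differently.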

Jaccard Index

The Jaccard coefficient measures similarity between sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:

( A intersect B ) / (A union B)

The Tanimoto coefficient is the ratio of the number of features common to both sets to the total number of features, i.e.

( A intersect B ) / ( A + B - ( A intersect B ) ) # the same as Jaccard

The range is 0 to 1 inclusive.
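The following sketch checks that the Jaccard and Tanimoto forms give the same value, since the size of the union equals A + B - ( A intersect B ). The helper names are hypothetical and not part of this module; both hex strings are assumed to have the same, even number of hex digits.

```perl
use strict;
use warnings;

# Hypothetical helper: number of set bits in a byte string.
sub count_bits { return unpack('%32b*', $_[0]) }

# Hypothetical helper: returns the Jaccard and Tanimoto forms for two
# hex-encoded bit vectors with the same, even number of hex digits.
sub jaccard_and_tanimoto {
    my ($hex1, $hex2) = @_;
    my ($bv1, $bv2) = map { pack('H*', $_) } $hex1, $hex2;

    my $common = count_bits($bv1 & $bv2);   # size of the intersection
    my $union  = count_bits($bv1 | $bv2);   # size of the union
    my $sum    = count_bits($bv1) + count_bits($bv2);

    return (0, 0) unless $union;            # convention chosen here for two empty sets
    return ($common / $union, $common / ($sum - $common));
}

my ($jaccard, $tanimoto) = jaccard_and_tanimoto('af09ff', '9c09cc');
printf "%.4f %.4f\n", $jaccard, $tanimoto;   # 9 / 17 in both forms: 0.5294 0.5294
```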

Dice coefficient

The Dice coefficient is the number of features common to both sets relative to the average size of the two sets, i.e.

( A intersect B ) / ( 0.5 * ( A + B ) ) # the same as Sørensen

The weighting factor comes from the 0.5 in the denominator. The range is 0 to 1.
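As a worked example, the Dice formula can be evaluated from three counts. The function `dice_from_counts` below is a hypothetical helper for illustration, not part of this module; the counts are those of the hex strings used in the SYNOPSIS, computed by hand.

```perl
use strict;
use warnings;

# Hypothetical helper: Dice coefficient from precomputed counts.
sub dice_from_counts {
    my ($common, $size1, $size2) = @_;
    my $sum = $size1 + $size2;
    return 0 unless $sum;          # convention chosen here for two empty sets
    return 2 * $common / $sum;     # same value as $common / (0.5 * $sum)
}

# 'af09ff' has 16 bits set, '9c09cc' has 10, and 9 bits are set in both.
printf "%.4f\n", dice_from_counts(9, 16, 10);   # 18 / 26 = 0.6923
```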

METHODS

All methods can be used as class or object methods.

new

$object = Set::Similarity::BV->new();

similarity

my $similarity = $object->similarity($hex1,$hex2);

$hex1 and $hex2 are strings of hexadecimal characters.

from_integers

my $similarity = $object->from_integers($AoI1,$AoI2);

Croaks if called directly. This method should be implemented in a child module.
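The pattern described here, a base method that croaks until a child class overrides it, might look roughly like the sketch below. The package names `My::Similarity::Base` and `My::Similarity::Always` are hypothetical stand-ins, and the child's return value is a placeholder rather than a real measure.

```perl
use strict;
use warnings;

# Hypothetical base class, standing in for the real one.
package My::Similarity::Base;
use Carp 'croak';

sub new { my $class = shift; return bless {}, ref $class || $class }

# Abstract method: child classes are expected to override it.
sub from_integers { croak 'Method from_integers not implemented' }

# Hypothetical child class that supplies the measure.
package My::Similarity::Always;
our @ISA = ('My::Similarity::Base');

sub from_integers { return 1 }   # placeholder measure, for illustration only

package main;

my $value = My::Similarity::Always->from_integers([1], [1]);
my $died  = !eval { My::Similarity::Base->from_integers([1], [1]); 1 };
print "child: $value, base croaked: ", ($died ? 'yes' : 'no'), "\n";
# prints "child: 1, base croaked: yes"
```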

intersection

my $intersection_size = $object->intersection($AoI1,$AoI2);

$AoI is an array reference of integers. Returns the length of the intersection.

combined_length

my $set_size_sum = $object->combined_length($AoI1,$AoI2);

$AoI is an array reference of integers.
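One way to read intersection and combined_length is sketched below, under the assumption that each integer holds one word of the bit vector and that both arrays are aligned word by word. This is an illustration of that reading, not the module's actual implementation; `popcount`, `intersection_size` and `combined_size` are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: number of set bits in a non-negative integer.
sub popcount {
    my $binary = sprintf '%b', $_[0];
    return $binary =~ tr/1//;
}

# Assumption: element i of each array is word i of the bit vector.
sub intersection_size {
    my ($aoi1, $aoi2) = @_;
    my $last = $#$aoi1 < $#$aoi2 ? $#$aoi1 : $#$aoi2;
    my $size = 0;
    $size += popcount($aoi1->[$_] & $aoi2->[$_]) for 0 .. $last;
    return $size;
}

# Sum of the set sizes, i.e. A + B in the formulas above.
sub combined_size {
    my ($aoi1, $aoi2) = @_;
    my $size = 0;
    $size += popcount($_) for @$aoi1, @$aoi2;
    return $size;
}

my @first  = (0xaf, 0x09, 0xff);
my @second = (0x9c, 0x09, 0xcc);
print intersection_size(\@first, \@second), ' ',
      combined_size(\@first, \@second), "\n";   # 9 26
```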

min

my $min = $object->min($int1,$int2);

Returns the smaller of the two integers.

bits

my $bits = $object->bits($int);

Returns the number of bits set in the integer $int.
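Counting set bits can be done in several ways. The sketch below repeatedly clears the lowest set bit, which is one common technique and not necessarily what this module does. The name `count_set_bits` is hypothetical, and the input is assumed to be a non-negative integer.

```perl
use strict;
use warnings;

# Hypothetical helper: counts set bits by clearing the lowest set bit
# until the value is zero. Assumes a non-negative integer.
sub count_set_bits {
    my ($int) = @_;
    my $count = 0;
    while ($int) {
        $int &= $int - 1;   # clears the lowest set bit
        $count++;
    }
    return $count;
}

print join(' ', map { count_set_bits($_) } 0, 0x09, 0xaf, 0xff), "\n";   # 0 2 6 8
```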

SEE ALSO

Set::Similarity::BV::Cosine

Set::Similarity::BV::Dice

Set::Similarity::BV::Jaccard

Set::Similarity::BV::Overlap

SOURCE REPOSITORY

http://github.com/wollmers/Set-Similarity-BV

AUTHOR

Helmut Wollmersdorfer, <helmut.wollmersdorfer@gmail.com>

COPYRIGHT AND LICENSE

Copyright (C) 2016 by Helmut Wollmersdorfer

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
