Implementing Gini coefficient in metric.py #72

Closed
FanwangM opened this issue May 13, 2022 · 4 comments

@FanwangM
Collaborator

No description provided.

@FanwangM
Collaborator Author

The related paper is Journal of Computational Chemistry 2016, 37, 2091–2097.

This relates to #4.

@Khaleeh
Collaborator

Khaleeh commented May 17, 2022

Just as a warning: @PaulWAyers and I tried to implement this and concluded that the equation is wrong.

@PaulWAyers
Member

I'll look at it again and figure out the right equation and put it here.

@PaulWAyers
Member

PaulWAyers commented May 17, 2022

The equation given is correct only for the case where the data is uniformly distributed. In that case, Eq. (1) in the paper is identical to the second expression on the "alternative expressions" list on Wikipedia.

  • bitstrings
  • lists/arrays of descriptors where the data is centered and normalized.

Then, for each molecule, we have a feature vector (or bitstring) of length L, and count(i), where i = 0, 1, ..., L-1, is the sum of the feature values for feature i over all the molecules, i.e.
count(i) = sum(m in molecules) features(m, i)
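
As a rough illustration of the count vector, here is a minimal NumPy sketch. The `features` array is made-up example data (rows are molecules, columns are features), not anything from the package.

```python
import numpy as np

# Hypothetical feature matrix: 3 molecules (rows) x 4 features (columns).
features = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
])

# count(i) = sum over molecules m of features(m, i),
# i.e. a column-wise sum giving one entry per feature.
count = features.sum(axis=0)
print(count)  # [3 1 1 0]
```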

  1. sort the count vector in increasing order. This is just np.sort(count).
  2. evaluate Eq. (1) with the sorted vectors.
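
The two steps above could be sketched as follows. This assumes Eq. (1) has the form of the sorted-data expression on Wikipedia, G = 2 * sum_i(i * y_i) / (n * sum_i(y_i)) - (n + 1) / n with y sorted in increasing order and i = 1..n; the paper's equation is not reproduced in this thread, so treat that as an assumption. The function name `gini_sorted` and the zero-sum check are illustrative choices, not the package's API.

```python
import numpy as np


def gini_sorted(count):
    """Gini coefficient from a presorted-style formula (illustrative sketch)."""
    # Step 1: sort the count vector in increasing order.
    y = np.sort(np.asarray(count, dtype=float))
    n = y.size
    total = y.sum()
    if total == 0:
        # Assumption: treat an all-zero count vector as undefined.
        raise ValueError("Gini coefficient is undefined when the counts sum to zero.")
    # Step 2: evaluate the rank-weighted sum with 1-based ranks.
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * y) / (n * total) - (n + 1.0) / n


print(gini_sorted([3, 1, 1, 0]))
```

With equal counts the result is 0, and with all weight on a single feature out of n it is (n - 1) / n.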

I think it is nice to avoid sorting. Then you can use:
sum(i,j) |count(i)-count(j)| / (2 * L**2 * mean(count(:)))

This is the third equation in the first line of https://en.wikipedia.org/wiki/Gini_coefficient#Definition and it might be a bit slower than the presorted version, but it does save the work (and code complexity) from the sort.
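
A minimal sketch of the sort-free version, using NumPy broadcasting to form all pairwise differences. The name `gini_pairwise` and the zero-mean check are assumptions made for illustration. Note that this builds an L x L array, so memory grows quadratically with the number of features.

```python
import numpy as np


def gini_pairwise(count):
    """Gini coefficient from the mean absolute difference (illustrative sketch)."""
    y = np.asarray(count, dtype=float)
    L = y.size
    mean = y.mean()
    if mean == 0:
        # Assumption: treat a zero-mean count vector as undefined.
        raise ValueError("Gini coefficient is undefined when the mean count is zero.")
    # |count(i) - count(j)| for every ordered pair (i, j), via broadcasting.
    diffs = np.abs(y[:, None] - y[None, :])
    return diffs.sum() / (2.0 * L**2 * mean)


print(gini_pairwise([3, 1, 1, 0]))
```

For the same input, this should agree with the sorted expression up to floating-point rounding.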

Just using Eq. (1) is fine though.

JackyZzZz pushed a commit to JackyZzZz/Selector that referenced this issue Jun 14, 2024
FanwangM added a commit that referenced this issue Jul 2, 2024
Add Gini coefficient, fixes #72