Git Analytics Sample Datasets

Datasets are in CSV format, gzipped for storage efficiency. Each dataset consists of three files:

<project>_<importance>_feature_vectors.csv.gz - NumPy array saved using numpy.savetxt()
<project>_<importance>_labels.csv.gz - Single-column binary label for each feature vector. The label is the result of comparing the commit's guilt value with a threshold, where the threshold is chosen so that the number of labeled commits (value=1) roughly corresponds to the number of actual bugs (subject to the minimum importance value)
<project>_<importance>_info.csv.gz - Other information for each feature vector:
Git commit ID (SHA)
Actual guilt value associated with each commit
Ordering number of each commit (1 for the first commit in the repo)
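
The three-file layout described above can be read with the Python standard library alone. The following is a minimal sketch, not the author's loader: the function names (`dataset_paths`, `read_rows`, `load_dataset`) are hypothetical, and it assumes comma-delimited files with no header row and one row per commit in each file. If the files were written with the numpy.savetxt() default, the delimiter is a single space and should be passed explicitly.

```python
import csv
import gzip
from pathlib import Path

# The three file kinds that make up one dataset.
FILE_KINDS = ("feature_vectors", "labels", "info")


def dataset_paths(directory, project, importance):
    """Map each file kind to its expected path for one dataset."""
    base = Path(directory)
    return {
        kind: base / f"{project}_{importance}_{kind}.csv.gz"
        for kind in FILE_KINDS
    }


def read_rows(path, delimiter=","):
    """Read a gzipped delimited text file into a list of string rows."""
    # ASSUMPTION: comma-delimited, no header row. Blank lines are skipped.
    with gzip.open(path, "rt", newline="") as handle:
        return [row for row in csv.reader(handle, delimiter=delimiter) if row]


def load_dataset(directory, project, importance, delimiter=","):
    """Load feature vectors, labels and info rows for one dataset."""
    paths = dataset_paths(directory, project, importance)
    features = [
        [float(value) for value in row]
        for row in read_rows(paths["feature_vectors"], delimiter)
    ]
    # Labels may be stored in float notation (e.g. 1.0e+00), so parse
    # as float first and then convert to int.
    labels = [int(float(row[0])) for row in read_rows(paths["labels"], delimiter)]
    info = read_rows(paths["info"], delimiter)
    # ASSUMPTION: the three files are row-aligned, one row per commit.
    if not (len(features) == len(labels) == len(info)):
        raise ValueError("expected the same number of rows in all three files")
    return features, labels, info
```

For example, `load_dataset(".", "nova", "critical")` would read the three `nova_critical_*` files from the current directory. The info rows are returned as strings because the column order inside the info file is not confirmed here.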

The sample datasets are all derived from OpenStack. <project> values include:

cinder
glance
heat
nova
swift

The guilt value is computed for each commit based on the association of the commit with any subsequent bug fix. Fixes for higher-importance bugs (e.g., Critical vs. High vs. Medium vs. Low) get higher weight in the guilt calculation. Standard <importance> values include:

critical - only includes Critical bug fixes
highplus - includes Critical and High bug fixes
medplus - includes Critical, High and Medium bug fixes
lowplus - includes Critical, High, Medium and Low bug fixes
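
The labels file is described as the result of comparing each commit's guilt value with a threshold chosen so that the number of labeled commits roughly matches the number of actual bugs. The sketch below shows one way such a threshold could be picked. It is an illustration under stated assumptions, not the procedure used to build these datasets: the function names are hypothetical, the guilt values are made up, and the use of `>=` for the comparison is a guess.

```python
def choose_threshold(guilt_values, target_count):
    """Pick a guilt threshold that labels about target_count commits."""
    if target_count <= 0 or not guilt_values:
        return float("inf")  # no commit can meet this threshold
    ranked = sorted(guilt_values, reverse=True)
    # Use the guilt of the target_count-th highest commit as the threshold.
    index = min(target_count, len(ranked)) - 1
    return ranked[index]


def label_commits(guilt_values, threshold):
    """Return 1 for commits whose guilt meets the threshold, else 0."""
    # ASSUMPTION: commits exactly at the threshold are labeled 1.
    return [1 if guilt >= threshold else 0 for guilt in guilt_values]


# Made-up guilt values for six commits, with three known bugs.
guilt = [0.0, 0.9, 0.1, 0.4, 0.0, 0.6]
threshold = choose_threshold(guilt, target_count=3)
labels = label_commits(guilt, threshold)
print(threshold, labels)  # 0.4 [0, 1, 0, 1, 0, 1]
```

When several commits share the threshold value, more than `target_count` commits are labeled, which is one reason the match can only be approximate.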

Files in the repository:

cinder_critical_feature_vectors.csv.gz
cinder_critical_info.csv.gz
cinder_critical_labels.csv.gz
cinder_highplus_feature_vectors.csv.gz
cinder_highplus_info.csv.gz
cinder_highplus_labels.csv.gz
cinder_lowplus_feature_vectors.csv.gz
cinder_lowplus_info.csv.gz
cinder_lowplus_labels.csv.gz
cinder_medplus_feature_vectors.csv.gz
cinder_medplus_info.csv.gz
cinder_medplus_labels.csv.gz
glance_critical_feature_vectors.csv.gz
glance_critical_info.csv.gz
glance_critical_labels.csv.gz
glance_feature_vectors.csv.gz
glance_highplus_feature_vectors.csv.gz
glance_highplus_info.csv.gz
glance_highplus_labels.csv.gz
glance_info.csv.gz
glance_labels.csv.gz
glance_lowplus_feature_vectors.csv.gz
glance_lowplus_info.csv.gz
glance_lowplus_labels.csv.gz
glance_medplus_feature_vectors.csv.gz
glance_medplus_info.csv.gz
glance_medplus_labels.csv.gz
heat_critical_feature_vectors.csv.gz
heat_critical_info.csv.gz
heat_critical_labels.csv.gz
heat_highplus_feature_vectors.csv.gz
heat_highplus_info.csv.gz
heat_highplus_labels.csv.gz
heat_lowplus_feature_vectors.csv.gz
heat_lowplus_info.csv.gz
heat_lowplus_labels.csv.gz
heat_medplus_feature_vectors.csv.gz
heat_medplus_info.csv.gz
heat_medplus_labels.csv.gz
nova_critical_feature_vectors.csv.gz
nova_critical_info.csv.gz
nova_critical_labels.csv.gz
nova_highplus_feature_vectors.csv.gz
nova_highplus_info.csv.gz
nova_highplus_labels.csv.gz
nova_lowplus_feature_vectors.csv.gz
nova_lowplus_info.csv.gz
nova_lowplus_labels.csv.gz
nova_medplus_feature_vectors.csv.gz
nova_medplus_info.csv.gz
nova_medplus_labels.csv.gz
swift_critical_feature_vectors.csv.gz
swift_critical_info.csv.gz
swift_critical_labels.csv.gz
swift_highplus_feature_vectors.csv.gz
swift_highplus_info.csv.gz
swift_highplus_labels.csv.gz
swift_lowplus_feature_vectors.csv.gz
swift_lowplus_info.csv.gz
swift_lowplus_labels.csv.gz
swift_medplus_feature_vectors.csv.gz
swift_medplus_info.csv.gz
swift_medplus_labels.csv.gz

williamsdoug/GitAnalyticsDatasets

Folders and files

Latest commit

History

Repository files navigation

Git Analytics Sample Datasets

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages