This script identifies simple cyclic dependencies in the ecosystem.
It searches for cycles of length 2; i.e., pairs of projects A and B such that A -> B and B -> A, where A =/= B.
Self-dependencies (A -> A) need no special handling, as these are ignored during the dataset generation process.

In addition, it writes the projects involved in such a cycle to an output file.
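
To make the definition concrete, here is a minimal sketch on a toy dependency map. The names `toy_dependency_map` and `find_two_cycles` are hypothetical, and the assumed layout (project ID mapped to the set of IDs it depends on) is only a guess at the structure that `load_dependency_map` returns.

```python
# Hypothetical toy data: project ID -> set of project IDs it depends on.
toy_dependency_map = {
    1: {2, 3},  # 1 -> 2 and 1 -> 3
    2: {1},     # 2 -> 1, which closes the cycle with project 1
    3: {4},     # 3 -> 4, no cycle
    4: set(),
}


def find_two_cycles(dependency_map):
    """Returns every length-2 cycle once, as a (smaller ID, larger ID) pair."""
    cycles = set()
    for focal, dependencies in dependency_map.items():
        for other in dependencies:
            # A 2-cycle exists when ``other`` depends on ``focal`` as well.
            if focal != other and focal in dependency_map.get(other, ()):
                cycles.add((min(focal, other), max(focal, other)))
    return cycles


toy_cycles = find_two_cycles(toy_dependency_map)
print(toy_cycles)
```

Ordering each pair means that the cycle between projects 1 and 2 is stored once, regardless of which project is visited first.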

In [None]:
# Forces packages to be reloaded.

%reload_ext autoreload
%autoreload 2

In [3]:
from python_proj.data_preprocessing.sliding_window_features.dependent_ecosystem_experience import load_dependency_map

dependency_map, proj_name_to_id = load_dependency_map()

Attempting quick load dependencies.
Loading projects and dependencies from: "/workspaces/msc_thesis/data/libraries/npm-libraries-1.6.0-2020-01-12/ql_dependencies.csv".
Finished quick load!
Loaded 631066 projects and 1695834 projects with dependencies.
Loaded dependency data in 0:00:28.838419.


In [4]:
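# Stores each 2-cycle once, as an ordered (smaller ID, larger ID) pair.
dep_cycles = set()

for focal, dependencies in dependency_map.items():
    for other in dependencies:
        # A 2-cycle exists when ``other`` depends on ``focal`` as well.
        if other in dependency_map and focal in dependency_map[other]:
            entry = (focal, other) if focal < other else (other, focal)
            dep_cycles.add(entry)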

print(f'{len(dep_cycles)=}')

len(dep_cycles)=37637


Conclusion: the dataset contains cyclic dependencies (37637 pairs); i.e., pairs of projects for which both A -> B and B -> A exist.

A consequence of this is that, when calculating experience for project A, both the dependency experience and the inverse dependency experience contain the experience acquired in project B; i.e., it is counted twice. Consequently, when calculating `non_dep_exp = eco_exp - dep_exp - inv_dep_exp`, the result can be negative.
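
The double counting can be illustrated with a small worked example. All numbers and names below are hypothetical, and the way the three experience terms are aggregated (a plain sum over projects) is an assumption made for illustration, not necessarily how the actual features are computed.

```python
# Hypothetical experience of one developer, counted per project.
exp_per_project = {"B": 5, "C": 2}

# Project A and project B form a 2-cycle: A -> B and B -> A.
dependencies_of_a = {"B"}  # projects that A depends on
dependents_of_a = {"B"}    # projects that depend on A (inverse dependencies)

eco_exp = sum(exp_per_project.values())                           # 5 + 2 = 7
dep_exp = sum(exp_per_project[p] for p in dependencies_of_a)      # 5
inv_dep_exp = sum(exp_per_project[p] for p in dependents_of_a)    # 5, B again

# B is subtracted twice, so the result drops below zero.
non_dep_exp = eco_exp - dep_exp - inv_dep_exp
print(f'{non_dep_exp=}')

# Counting every related project once keeps the result non-negative.
related = dependencies_of_a | dependents_of_a
non_dep_exp_dedup = eco_exp - sum(exp_per_project[p] for p in related)
print(f'{non_dep_exp_dedup=}')
```

Here the naive subtraction yields -3, whereas counting project B once yields 2, which is exactly the experience acquired in the unrelated project C.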

In [None]:
# Writes the IDs and names of all projects involved in a 2-cycle to a CSV file.

from csv import writer

from python_proj.utils.util import flip_dict, flatten
from python_proj.utils.exp_utils import BASE_PATH

project_id_to_name = flip_dict(proj_name_to_id)
cyclic_projects = set(flatten(dep_cycles))


output_file_name = f'{BASE_PATH}/cyclic_dependency_projects.csv'
print(f'{output_file_name=}')
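# ``newline=''`` is what the csv module recommends for files passed to a writer.
with open(output_file_name, 'w', encoding='utf-8', newline='') as output_file:
    csv_writer = writer(output_file)
    csv_writer.writerow(['proj-id', 'proj-name'])

    for project_id in cyclic_projects:
        # Falls back to an empty name when the ID has no known name, so that
        # a stale name from an earlier iteration is never written.
        project_name = project_id_to_name.get(project_id, '')
        csv_writer.writerow([project_id, project_name])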

output_file_name='/workspaces/msc_thesis/data//cyclic_dependency_projects.csv'
