# Import Libs

In [None]:
%pip install factor_analyzer

In [None]:
import pandas as pd
from factor_analyzer import FactorAnalyzer, Rotator
import matplotlib.pyplot as plt
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity
from factor_analyzer.factor_analyzer import calculate_kmo

# Tasks 1-7. Given variables

## Pre-task execution

In [None]:
file_name = 'Export_Data.xlsx'

df = pd.read_excel(file_name)
df.head()

In [None]:
factors = [
    'ArmsexportsSIPRItrendindica', # Arms export
    'Commercialserviceexportscurr', # Commercial service exports
    'Communicationscomputeretc', # Communications, computer
    'Travelservicesofcommercial', # Travel services
    'Hightechnologyexportsofma', # High-technology exports
    'Agriculturalrawmaterialsexpor', # Agricultural raw materials exports
    'Foodexportsofmerchandisee', # Food exports
    'Transportservicesofservice', # Transport services
    'Fuelexportsofmerchandisee', # Fuel exports
]

factors_index = [
    'Arms export',
    'Commercial service exports',
    'Communications, computer',
    'Travel services',
    'High-technology exports',
    'Agricultural raw materials exports',
    'Food exports',
    'Transport services',
    'Fuel exports'
]

df_factors = df[factors]
df_factors = df_factors.dropna()
df_factors.head()

## Task 1. Factor analysis

### Number of factors

In [None]:
# Use the number of variables (columns); len(df_factors) would count rows
fa = FactorAnalyzer(n_factors=df_factors.shape[1], rotation='varimax', method='principal')
fa.fit(df_factors)

In [None]:
ev, v = fa.get_eigenvalues()
ev # 9 eigenvalues

In [None]:
ev[ev > 1]

Four eigenvalues are greater than 1, so we retain 4 factors for the factor analysis.
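
The eigenvalue-greater-than-1 cut-off used above is commonly called the Kaiser criterion. Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue above 1 marks a factor that carries more variance than a single standardized variable. A minimal sketch of the rule, using made-up eigenvalues (the list `example_ev` and the helper `kaiser_count` are illustrative, not part of the notebook):

```python
# Hypothetical eigenvalues for 9 standardized variables (illustrative only,
# not the values returned by fa.get_eigenvalues() above).
example_ev = [2.4, 1.8, 1.5, 1.1, 0.8, 0.6, 0.4, 0.3, 0.1]


def kaiser_count(eigenvalues, threshold=1.0):
    """Count eigenvalues strictly greater than the threshold."""
    return sum(1 for value in eigenvalues if value > threshold)


# Eigenvalues of a correlation matrix sum to the number of variables.
total = sum(example_ev)
n_retained = kaiser_count(example_ev)
print(f"sum = {total:.1f}, retained = {n_retained}")
```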

### Scree Plot

In [None]:
plt.figure(figsize=(15, 7))
plt.scatter(range(1, df_factors.shape[1]+1), ev)
plt.plot(range(1, df_factors.shape[1]+1), ev)
plt.axhline(y=1, color='r', linestyle='-')
plt.title('Scree Plot')
plt.xlabel('Factors')
plt.ylabel('Eigenvalue')
plt.grid()
plt.show()

### Final model

In [None]:
fa = FactorAnalyzer(n_factors=4, rotation='varimax', method='principal') 
fa.fit(df_factors)

## Task 2. Data factorability

In [None]:
calculate_bartlett_sphericity(df_factors)  # Bartlett's test of sphericity: (chi-square, p-value)

In [None]:
_, kmo_model = calculate_kmo(df_factors)
kmo_model  # overall KMO

H0: the variables are uncorrelated (the correlation matrix is an identity matrix).

Since the p-value < 0.05, H0 is rejected: there are statistically significant correlations between the variables.

KMO = 0.56, a borderline value. <br>KMO >= 0.5.<br>This means factor analysis is possible, but the sampling adequacy is rated as miserable.
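
The verbal rating comes from Kaiser's commonly cited labels for the overall KMO value. As a rough sketch, the mapping can be written as a small helper (`kmo_label` is a hypothetical function, not part of `factor_analyzer`):

```python
def kmo_label(kmo):
    """Return Kaiser's verbal label for an overall KMO value."""
    # Lower bound of each band, checked from the highest band down.
    bands = [
        (0.9, "marvelous"),
        (0.8, "meritorious"),
        (0.7, "middling"),
        (0.6, "mediocre"),
        (0.5, "miserable"),
    ]
    for lower_bound, label in bands:
        if kmo >= lower_bound:
            return label
    return "unacceptable"


print(kmo_label(0.56))
```

In the notebook this could be called as `kmo_label(kmo_model)` right after `calculate_kmo`.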



## Task 3. Communalities

In [None]:
fa.get_communalities()

In [None]:
comm = pd.DataFrame(data=fa.get_communalities(), 
                       index=df_factors.columns, 
                       columns=['Communalities'])
comm.index = factors_index
comm.sort_values('Communalities') 

C = communality <br><br>
High-technology exports: low C => the variable is poorly explained by the model <br>
Fuel exports: high C => the variable is well explained by the model <br>
Communications, computer: high C => the variable is well explained by the model <br>
Travel services: high C => the variable is well explained by the model <br>
Food exports: high C => the variable is well explained by the model <br>
Transport services: high C => the variable is well explained by the model <br>
Agricultural raw materials exports: high C => the variable is well explained by the model <br>
Commercial service exports: high C => the variable is well explained by the model <br>
Arms export: high C => the variable is well explained by the model
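
For an orthogonal rotation such as varimax, a variable's communality is the sum of its squared loadings over the retained factors, i.e. the share of its variance that the factors explain. A small sketch with made-up loadings (the dictionary `example_loadings` and the helper `communality` are illustrative, not the notebook's values):

```python
# Hypothetical loadings of two variables on four factors (illustrative only).
example_loadings = {
    "Variable A": [0.9, 0.1, 0.2, 0.0],
    "Variable B": [0.3, 0.2, 0.1, 0.2],
}


def communality(loadings):
    """Sum of squared loadings: the share of a variable's variance explained."""
    return sum(value ** 2 for value in loadings)


for name, row in example_loadings.items():
    print(f"{name}: {communality(row):.2f}")
```

Here "Variable A" is dominated by one strong loading and ends up with a high communality, while "Variable B" has only weak loadings and a low one.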

## Task 4. Cumulative variance

In [None]:
cumulative_variances = fa.get_factor_variance()[2]
cumulative_variances

In [None]:
factor_variance = pd.DataFrame(data=fa.get_factor_variance()[1:3],
                              index=['Proportional variance', 'Cumulative variance'],
                              columns=['Factor_1', 'Factor_2', 'Factor_3', 'Factor_4'])
factor_variance

**Factor 1** explains 20.3%, so a model with 1 factor explains 20.3% of the total variance of the original 9 variables <br>
**Factor 2** explains 17.8%, so a model with 2 factors explains 38.1% of the total variance of the original 9 variables <br>
**Factor 3** explains 20.8%, so a model with 3 factors explains 58.9% of the total variance of the original 9 variables <br>
**Factor 4** explains 19.7%, so a model with 4 factors explains 78.7% of the total variance of the original 9 variables
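
The cumulative figures are simply the running sum of the per-factor proportions. A quick check using the rounded percentages quoted above (the variable names are illustrative):

```python
from itertools import accumulate

# Proportional variances as reported in the text above (rounded to 3 decimals).
proportional = [0.203, 0.178, 0.208, 0.197]

# Cumulative variance is the running sum of the proportional variances.
cumulative = list(accumulate(proportional))

for i, (prop, cum) in enumerate(zip(proportional, cumulative), start=1):
    print(f"Factor {i}: proportional = {prop:.1%}, cumulative = {cum:.1%}")
```

Because the inputs are already rounded, the last running sum comes out at about 78.6% rather than the 78.7% that `get_factor_variance()` reports from the unrounded values.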

## Task 5. Loading matrix

In [None]:
fa.loadings_

In [None]:
loading_matrix = pd.DataFrame(data=fa.loadings_, 
                       index=df_factors.columns, 
                       columns=['Factor_1', 'Factor_2', 'Factor_3', 'Factor_4'])
loading_matrix.index = factors_index

In [None]:
loading_matrix

**Arms export** loads mostly (high) on **Factor 3** (positive loading) <br>
**Commercial service exports** loads mostly (high) on **Factor 3** (positive loading) <br>
**Communications, computer** loads mostly (high) on **Factor 1** (positive loading) <br>
**Travel services** loads mostly (high) on **Factor 1** (negative loading) <br>
**High-technology exports** loads moderately on **Factor 1 and Factor 3** (positive loadings) and on **Factor 2 and Factor 4** (negative loadings) <br>
**Agricultural raw materials exports** loads mostly (high) on **Factor 4** (positive loading) <br>
**Food exports** loads mostly (high) on **Factor 4** (positive loading) <br>
**Transport services** loads mostly (high) on **Factor 2** (positive loading) <br>
**Fuel exports** loads mostly (high) on **Factor 2** (positive loading)
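
Reading the matrix this way amounts to picking, for each variable, the factor with the largest absolute loading and noting its sign. A minimal sketch on made-up loadings (`example_loadings` and `dominant_factor` are hypothetical names, and the numbers are not the notebook's output):

```python
# Hypothetical loading matrix (illustrative values, not the notebook's output).
example_loadings = {
    "Variable A": [0.85, 0.10, -0.05, 0.12],
    "Variable B": [-0.78, 0.20, 0.15, 0.02],
    "Variable C": [0.05, 0.10, 0.91, -0.20],
}


def dominant_factor(row):
    """Return (1-based factor number, sign) of the largest absolute loading."""
    index = max(range(len(row)), key=lambda i: abs(row[i]))
    sign = "positive" if row[index] >= 0 else "negative"
    return index + 1, sign


for name, row in example_loadings.items():
    factor, sign = dominant_factor(row)
    print(f"{name}: Factor {factor} ({sign} loading)")
```

With the `loading_matrix` DataFrame built above, `loading_matrix.abs().idxmax(axis=1)` should give the dominant factor per variable in one call.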

## Task 6. Describe factors

**Factor 1** <br>
The country concentrates on high-tech exports and on communications and computer exports, which is why its travel services exports are low. We can call such countries 'High-tech countries'. <br>
This factor reflects the share of high-tech and computer exports and the influence of travel exports on the country's economy. <br>
**FACTOR NAME** = High-tech countries

**Factor 2** <br>
The country concentrates on transport services and on what supports transport (fuel). We can call such countries 'Transport countries'. <br>
This factor reflects the share of transport services exports and of fuel exports of the country <br>
**FACTOR NAME** = Transport countries

**Factor 3** <br>
The country concentrates on arms and commercial services. We can call such countries 'Commercial and Military Countries' <br>
This factor reflects the share of arms exports and commercial service exports of the country. <br>
**FACTOR NAME** = Commercial and Military Countries

**Factor 4** <br>
The country concentrates on agricultural exports (food and raw materials). We can call such countries 'Agricultural countries' <br>
This factor reflects the share of agricultural exports of the country <br>
**FACTOR NAME** = Agricultural countries

In [None]:
factor_names = [
    'High-tech countries', 
    'Transport countries', 
    'Commercial and Military Countries', 
    'Agricultural countries'
]

## Task 7. Save factors to variable

In [None]:
transformed = pd.DataFrame(data=fa.transform(df_factors),
                           index=df_factors.index,
                           columns=factor_names)
transformed

In [None]:
df = pd.concat([df, transformed], axis=1)

In [None]:
df[df['High-tech countries'].notna()]

# Task 8. My own variables

## Pre-task execution

In [None]:
file_name = 'Export_Data.xlsx'

df = pd.read_excel(file_name)
df.head()

In [None]:
factors = [
    'ExportsofgoodsandservicesB', # Exports of goods and services
    'Exportsofgoodsandservicesa', # Exports of goods and services (growth)
    'GoodsexportsBoPcurrentUS', # Goods exports
    'Commercialserviceexportscurr', # Commercial service exports
    'Hightechnologyexportscurrent', # High-technology exports
    'Transportservicesofservice', # Transport services
    'Travelservicesofserviceex', # Travel services
    'Agriculturalrawmaterialsexpor', # Agricultural raw materials exports
    'ArmsexportsSIPRItrendindica', # Arms exports
    'Foodexportsofmerchandisee', # Food exports
]

factors_index = [
    'Exports of goods and services',
    'Exports of goods and services (growth)',
    'Goods exports',
    'Commercial service exports',
    'High-technology exports',
    'Transport services',
    'Travel services',
    'Agricultural raw materials exports',
    'Arms exports',
    'Food exports'
]

df_factors = df[factors]
df_factors = df_factors.dropna()
df_factors

## Task 1. Data factorability

In [None]:
calculate_bartlett_sphericity(df_factors)  # Bartlett's test of sphericity: (chi-square, p-value)

In [None]:
_, kmo_model = calculate_kmo(df_factors)
kmo_model  # overall KMO

H0: the variables are uncorrelated (the correlation matrix is an identity matrix).

Since the p-value < 0.05, H0 is rejected: there are statistically significant correlations between the variables.

KMO = 0.62, a moderate value. <br>KMO >= 0.6.<br>This means factor analysis is possible, but the sampling adequacy is rated as mediocre.



## Task 2. Factor analysis

### Number of factors

In [None]:
# Use the number of variables (columns); len(df_factors) would count rows
fa = FactorAnalyzer(n_factors=df_factors.shape[1], rotation='varimax', method='principal')
fa.fit(df_factors)

In [None]:
ev, v = fa.get_eigenvalues()
ev # 10 eigenvalues

In [None]:
ev[ev > 1]

Three eigenvalues are greater than 1, so we retain 3 factors for the factor analysis.

### Scree Plot

In [None]:
plt.figure(figsize=(15, 7))
plt.scatter(range(1, df_factors.shape[1]+1), ev)
plt.plot(range(1, df_factors.shape[1]+1), ev)
plt.axhline(y=1, color='r', linestyle='-')
plt.title('Scree Plot')
plt.xlabel('Factors')
plt.ylabel('Eigenvalue')
plt.grid()
plt.show()

### Final model

In [None]:
fa = FactorAnalyzer(n_factors=3, rotation='varimax', method='principal') 
fa.fit(df_factors)

## Task 3. Communalities

In [None]:
fa.get_communalities()

In [None]:
comm = pd.DataFrame(data=fa.get_communalities(), 
                       index=df_factors.columns, 
                       columns=['Communalities'])
comm.index = factors_index
comm.sort_values('Communalities') 

C = communality <br><br>
Arms exports: mid C => the variable is moderately well explained by the model <br>
Travel services: mid C => the variable is moderately well explained by the model <br>
Exports of goods and services (growth): mid C => the variable is moderately well explained by the model <br>
Agricultural raw materials exports: mid C => the variable is moderately well explained by the model <br>
Transport services: high C => the variable is well explained by the model <br>
High-technology exports: high C => the variable is well explained by the model <br>
Food exports: high C => the variable is well explained by the model <br>
Commercial service exports: high C => the variable is well explained by the model <br>
Goods exports: high C => the variable is well explained by the model <br>
Exports of goods and services: high C => the variable is well explained by the model
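
The low/mid/high labels can be made reproducible by fixing explicit cut-offs. The notebook does not state which thresholds were used, so the 0.4 and 0.7 values below are assumptions chosen purely for illustration, as is the helper `communality_band`:

```python
def communality_band(value, low=0.4, high=0.7):
    """Classify a communality; the 0.4 and 0.7 cut-offs are assumptions."""
    if value < low:
        return "low"
    if value < high:
        return "mid"
    return "high"


# Hypothetical communalities (illustrative only).
for value in (0.25, 0.55, 0.85):
    print(value, communality_band(value))
```

Applied to the table above, this could look like `comm['Communalities'].map(communality_band)`.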

## Task 4. Cumulative variance

In [None]:
cumulative_variances = fa.get_factor_variance()[2]
cumulative_variances

In [None]:
factor_variance = pd.DataFrame(data=fa.get_factor_variance()[1:3],
                              index=['Proportional variance', 'Cumulative variance'],
                              columns=['Factor_1', 'Factor_2', 'Factor_3'])
factor_variance

**Factor 1** explains 42.02%, so a model with 1 factor explains 42.02% of the total variance of the original 10 variables <br>
**Factor 2** explains 20.75%, so a model with 2 factors explains 62.78% of the total variance of the original 10 variables <br>
**Factor 3** explains 12.18%, so a model with 3 factors explains 74.96% of the total variance of the original 10 variables

## Task 5. Loading matrix

In [None]:
fa.loadings_

In [None]:
loading_matrix = pd.DataFrame(data=fa.loadings_, 
                       index=df_factors.columns, 
                       columns=['Factor_1', 'Factor_2', 'Factor_3'])
loading_matrix.index = factors_index

In [None]:
loading_matrix

**Exports of goods and services** loads mostly (high) on **Factor 1** (positive loading) <br>
**Exports of goods and services (growth)** loads mostly (mid) on **Factor 3** (positive loading) <br>
**Goods exports** loads mostly (high) on **Factor 1** (positive loading) <br>
**Commercial service exports** loads mostly (high) on **Factor 1** (negative loading) <br>
**High-technology exports** loads mostly (high) on **Factor 1** (positive loading) <br>
**Transport services** loads mostly (high) on **Factor 3** (negative loading) <br>
**Travel services** loads moderately on **Factor 2 and Factor 3** (positive loadings) <br>
**Agricultural raw materials exports** loads mostly (high) on **Factor 2** (positive loading) <br>
**Arms exports** loads mostly (high) on **Factor 1** (positive loading) <br>
**Food exports** loads mostly (high) on **Factor 2** (positive loading)
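
A variable that loads on two factors at once, as Travel services does here, is usually called a cross-loading. One way to flag such cases is to list every factor whose absolute loading passes a threshold. The sketch below uses made-up loadings, and the 0.4 threshold is an assumed rule of thumb rather than something taken from this notebook (`salient_factors` and `example_rows` are hypothetical names):

```python
def salient_factors(row, threshold=0.4):
    """Return 1-based numbers of factors whose |loading| meets the threshold.

    The 0.4 threshold is an assumed rule of thumb, not taken from the notebook.
    """
    return [i + 1 for i, value in enumerate(row) if abs(value) >= threshold]


# Hypothetical loadings on three factors (illustrative only).
example_rows = {
    "Variable A": [0.88, 0.05, 0.10],
    "Variable B": [0.10, 0.52, 0.47],
}

for name, row in example_rows.items():
    factors_hit = salient_factors(row)
    kind = "cross-loading" if len(factors_hit) > 1 else "clean"
    print(f"{name}: factors {factors_hit} ({kind})")
```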

## Task 6. Describe factors

**Factor 1** <br>
The country concentrates on exports of goods, services, high-tech and arms. We can call such countries 'Developed countries' (a high volume of exports indicates a high level of development) <br>
This factor reflects exports of goods and services, of goods, of commercial services, of high-tech and of arms. <br>
**FACTOR NAME** = Developed countries

**Factor 2** <br>
The country concentrates on exports of travel services, agricultural raw materials and food. We can call such countries 'Agricultural countries' <br>
This factor reflects exports of travel services, agricultural raw materials and food <br>
**FACTOR NAME** = Agricultural countries

**Factor 3** <br>
The country has a moderate to low concentration on travel exports. Moreover, these countries show annual growth in exports of goods and services, while their transport services exports go down. We can call such countries 'Developing countries' <br>
This factor reflects transport exports and the influence of export growth on transport services. <br>
**FACTOR NAME** = Developing countries

In [None]:
factor_names = [
    'Developed countries',
    'Agricultural countries',
    'Developing countries'
]

## Task 7. Save factors to variable

In [None]:
transformed = pd.DataFrame(data=fa.transform(df_factors),
                           index=df_factors.index,
                           columns=factor_names)
transformed

In [None]:
df = pd.concat([df, transformed], axis=1)

In [None]:
df[df['Developed countries'].notna()]