The NACE Rev. 2 classification (Statistical Classification of Economic Activities in the European Community) is a standardized system for categorizing economic activities. It consists of 21 broad sections (identified by letters A–U), further divided into divisions, groups, and classes.

Here are the 21 broad sections of NACE Rev. 2:


A. Agriculture, Forestry and Fishing

    Crop and animal production, hunting and related service activities
    Forestry and logging
    Fishing and aquaculture

B. Mining and Quarrying

    Mining of coal and lignite
    Extraction of crude petroleum and natural gas
    Mining of metal ores
    Other mining and quarrying
    Mining support service activities

C. Manufacturing

    Manufacture of food products, beverages, and tobacco products
    Manufacture of textiles, clothing, leather, and related products
    Manufacture of wood and paper products
    Manufacture of chemicals, pharmaceuticals, rubber, and plastic products
    Manufacture of basic metals and fabricated metal products
    Manufacture of machinery, equipment, and transport vehicles
    Other manufacturing activities

D. Electricity, Gas, Steam, and Air Conditioning Supply

    Production and distribution of electricity
    Distribution of gaseous fuels
    Steam and air conditioning supply

E. Water Supply; Sewerage, Waste Management, and Remediation Activities

    Water collection, treatment, and supply
    Sewerage
    Waste collection, treatment, and disposal activities
    Remediation activities and other waste management services

F. Construction

    Construction of buildings
    Civil engineering
    Specialized construction activities

G. Wholesale and Retail Trade; Repair of Motor Vehicles and Motorcycles

    Wholesale and retail trade and repair of motor vehicles
    Wholesale trade (except motor vehicles)
    Retail trade (except motor vehicles)

H. Transportation and Storage

    Land transport and transport via pipelines
    Water transport
    Air transport
    Warehousing and support activities for transportation
    Postal and courier activities

I. Accommodation and Food Service Activities

    Accommodation
    Food and beverage service activities

J. Information and Communication

    Publishing activities
    Motion picture, video, and television production, sound recording, and music publishing
    Telecommunications
    Computer programming, consultancy, and related activities
    Information service activities

K. Financial and Insurance Activities

    Financial service activities
    Insurance, reinsurance, and pension funding
    Activities auxiliary to financial services and insurance

L. Real Estate Activities

    Real estate activities

M. Professional, Scientific, and Technical Activities

    Legal and accounting activities
    Management consultancy
    Architectural and engineering activities
    Scientific research and development
    Advertising and market research
    Other professional, scientific, and technical activities
    Veterinary activities

N. Administrative and Support Service Activities

    Rental and leasing activities
    Employment activities
    Travel agency, tour operator, and reservation services
    Security and investigation activities
    Services to buildings and landscape activities
    Office administrative and support activities

O. Public Administration and Defence; Compulsory Social Security

    Administration of the state, economic and social policy
    Defence activities
    Provision of services for the community

P. Education

    Education

Q. Human Health and Social Work Activities

    Human health activities
    Residential care activities
    Social work activities without accommodation

R. Arts, Entertainment, and Recreation

    Creative, arts, and entertainment activities
    Libraries, archives, museums, and other cultural activities
    Sports, amusement, and recreation activities

S. Other Service Activities

    Activities of membership organizations
    Repair of computers and personal goods
    Other personal service activities

T. Activities of Households as Employers; Undifferentiated Goods- and Services-Producing Activities of Households for Own Use

    Activities of households as employers
    Undifferentiated goods- and services-producing activities of households for own use

U. Activities of Extraterritorial Organizations and Bodies

    Activities of international organizations (e.g., the United Nations, embassies)

These sections form the broadest categories of the NACE Rev. 2 classification. They are further broken down into more detailed divisions, groups, and classes.



<span style="color: DodgerBlue;">
==============================================================================================  
    
### Libraries

==============================================================================================  
</span>

In [1]:
import pandas as pd
import numpy as np
import csv

In [3]:
pd.set_option('display.max_columns', 1000)
pd.set_option('display.max_rows', 1000)

<span style="color: DodgerBlue;">
==============================================================================================  
    
### **data_import.ipynb** starts from the naio dataset to prepare this reduced version.  
### source: https://ec.europa.eu/eurostat/databrowser/view/naio_10_cp1610__custom_13696232/default/table  

### year 2022, using the table at basic prices [naio_10_cp1610__custom_13696232], i.e., without VAT

### Columns "Added value, gross" and "Compensation of employees" are transposed from the related rows

==============================================================================================  
</span>  

In [None]:
mini_naio = pd.read_pickle('mini_naio.xp')
sbs = pd.read_pickle('sbs.xp')
lc = pd.read_pickle('lc.xp')
nama = pd.read_pickle('nama_assets.xp')

# Unit of measure:   Million euro
mini_naio

<span style="color: DodgerBlue;">
==============================================================================================  
    
### Calculating the shares of production of intermediate goods, investment goods, 
### and consumption goods

==============================================================================================  
</span>  

In [None]:
mini_naio['Consumption good share'] = mini_naio['Final consumption expenditure by households']/\
    (mini_naio['Final consumption expenditure by households']+ mini_naio['Gross fixed capital formation']\
    +mini_naio['Total intermediate goods'])

mini_naio['Investment good share'] = mini_naio['Gross fixed capital formation']/\
    (mini_naio['Final consumption expenditure by households']+ mini_naio['Gross fixed capital formation']\
    +mini_naio['Total intermediate goods'])

mini_naio['Intermediate good share'] = mini_naio['Total intermediate goods']/\
    (mini_naio['Final consumption expenditure by households']+ mini_naio['Gross fixed capital formation']\
    +mini_naio['Total intermediate goods'])

pd.set_option('display.max_colwidth', 100)
mini_naio
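The three shares above use the same denominator, so for every sector they should sum to 1. The following is a minimal sketch of that check on invented figures: the `toy` DataFrame and its values are assumptions made for illustration, and only the column names are taken from `mini_naio`.

```python
import pandas as pd

# Toy stand-in for mini_naio: the figures are invented, only the column names match
toy = pd.DataFrame({
    'Final consumption expenditure by households': [50.0, 10.0],
    'Gross fixed capital formation': [25.0, 30.0],
    'Total intermediate goods': [25.0, 60.0],
})

# Name of each share column -> column holding the corresponding use
use_columns = {
    'Consumption good share': 'Final consumption expenditure by households',
    'Investment good share': 'Gross fixed capital formation',
    'Intermediate good share': 'Total intermediate goods',
}

# Common denominator, row by row; skipna=False propagates NaN like the "+" operator does
total_uses = toy[list(use_columns.values())].sum(axis=1, skipna=False)

for share_name, use_name in use_columns.items():
    toy[share_name] = toy[use_name] / total_uses

# Sanity check: the three shares of each row add up to 1
share_sum = toy[list(use_columns)].sum(axis=1)
print(toy[list(use_columns)])
print(share_sum)
```

Computing the denominator once avoids repeating the same expression three times; the result is the same as in the cell above as long as the column names match.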

<span style="color: DodgerBlue;">
==============================================================================================  
    
### **sbs** reports the number of employees by sector and by dimensional class of the firms
### Sectors are less detailed than in naio

### The total number of employees, from sbs['Total'].sum(), is 156,946,662 (about 157 million); this is not consistent with the sum of the classes, so we use the sum of the classes instead, see below
==============================================================================================  
</span>  

In [None]:
pd.set_option('display.max_colwidth', 100)
sbs

In [None]:
pd.set_option('future.no_silent_downcasting', True)
sbs=sbs.fillna(0, inplace=False)

In [None]:
#sbs

In [None]:
# We add a row to sbs to manage sectors 61-64 of naio
# to fix Total and Total2 we use the same wage as sectors 59-60, with a naio value of compensation of employees of
# 51425.54 + 52098.60, with 2192933 workers from sbs row 15, giving 0.0472080725 million per capita, i.e. 47208.0725 €
# the sum of the compensations of employees over the 61-64 interval is 80847.76+8549.41+35311.70+44398.89 => 169107.76 => 169107
# columns 3 and 5 are ignored; the distribution is quite arbitrary, but on the EU27 scale these are tiny values

sbs.loc[16]=['Artificial row related to naio 61-64 sectors',169107,0,89107,0,20000,20000,20000,20000]
sbs

<span style="color: DodgerBlue;">
==============================================================================================  
    
### The Labor Cost table, 2023, from https://ec.europa.eu/eurostat/databrowser/product/page/lc_lci_lev__custom_13900260, is used only for rows 864-923 of ff, to set the wage, using row 26 of lc (Other service activities)
### We do this because the sbs table does not provide employee data for these sectors

ff.iloc[864:879, ff.columns.get_loc('Wage')] = lc['lc'].loc[26] * 1e3 #mini_naio['Compensation of employees'].loc[61]  
ff.iloc[879:894, ff.columns.get_loc('Wage')] = lc['lc'].loc[26] * 1e3 #mini_naio['Compensation of employees'].loc[62]  
ff.iloc[894:909, ff.columns.get_loc('Wage')] = lc['lc'].loc[26] * 1e3 #mini_naio['Compensation of employees'].loc[63]  
ff.iloc[909:924, ff.columns.get_loc('Wage')] = lc['lc'].loc[26] * 1e3 #mini_naio['Compensation of employees'].loc[64]  

==============================================================================================  
</span>  


In [None]:
#data 2023
pd.set_option('display.max_colwidth', 100)
lc

## computing labor quantity for ff

<span style="color: DodgerBlue;">
==============================================================================================  
    
### The number of firms is obtained from the classes of **sbs**, dividing the number of employees in each class by a central value of the class (the divisors 1.6, 13.45, 30.2, 98.8 and 1144.3 in the cell below)

### The upper value of the open last class is set to 1850 by choice; using a different upper value changes the total number of firms only slightly

### (https://www.statista.com/statistics/1248775/number-of-businesses-eu/ reports, for year 2022, 25 million non-financial business activities, certainly including independent workers not included in the IOT)

### Agriculture values are set consistently with this [source](https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Farmers_and_the_agricultural_labour_force_-_statistics#:~:text=Agriculture%20remains%20a%20big%20employer%20in%20the%20EU%3B%20about%208.7,an%20estimated%208.7%20million%20persons), rounding 8.7 million to 9; the owners of the farms have their compensation in (Added value, gross - Compensation of employees), a large amount for agricultural sectors

==============================================================================================  
</span>  

In [None]:
column_names = ['#','NACE definition', 'type', 'dimensional class', 'Share of firms',\
                'Share of firms per sbs sector', 'sbs reference row'] 

# type = consumption / investment / intermediate good

# dimensional class gives the range of number of workers per class

# share of firms is computed by considering the weight of each subsector and each type (C-I-Int) on the total

# share of firms per sbs is computed by considering the weight of each subsector in the sbs sector 
# considering compensation of employees and not considering type (agriculture is calculated aside)

ff = pd.DataFrame(columns=column_names)

workforce = sbs['From 0 to 9 persons employed'].sum()+\
sbs['From 10 to 19 persons employed'].sum()+\
sbs['From 20 to 49 persons employed'].sum()+\
sbs['From 50 to 249 persons employed'].sum()+\
sbs['250 persons employed or more'].sum()

agriculture = 9000000
workforce +=  agriculture
agricultureFirms = 10000000

# 0-9 in test-ff starts from 0, accepting firms with 0 employees, 1.6 is the calculated mean of the class
numberOfFirms = sbs['From 0 to 9 persons employed'].sum() /1.6\
              + sbs['From 10 to 19 persons employed'].sum() / 13.45\
              + sbs['From 20 to 49 persons employed'].sum() / 30.2\
              + sbs['From 50 to 249 persons employed'].sum() / 98.8\
              + sbs['250 persons employed or more'].sum() / 1144.3\
              + agricultureFirms 
numberOfFirms = int(numberOfFirms)
(workforce-agriculture, agriculture,numberOfFirms)
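The firm count above can also be written with a dictionary of divisors, which makes it easy to try a different value for the open class. This is only a sketch: `estimate_firms` is a hypothetical helper, the `toy_sbs` figures are invented, and the divisors are the ones used in the cell above.

```python
import pandas as pd

# Divisor (mean firm size) per sbs size class, copied from the cell above
MEAN_SIZE = {
    'From 0 to 9 persons employed': 1.6,
    'From 10 to 19 persons employed': 13.45,
    'From 20 to 49 persons employed': 30.2,
    'From 50 to 249 persons employed': 98.8,
    '250 persons employed or more': 1144.3,
}

def estimate_firms(sbs_table, mean_size, extra_firms=0):
    """Hypothetical helper: employees per class divided by the class mean, plus extra firms."""
    firms = sum(sbs_table[size_class].sum() / mean
                for size_class, mean in mean_size.items())
    return firms + extra_firms

# Toy stand-in for sbs: two sectors with invented employee counts
toy_sbs = pd.DataFrame({
    'From 0 to 9 persons employed': [1600, 3200],
    'From 10 to 19 persons employed': [1345, 0],
    'From 20 to 49 persons employed': [3020, 0],
    'From 50 to 249 persons employed': [9880, 0],
    '250 persons employed or more': [11443, 0],
})

toy_firms = estimate_firms(toy_sbs, MEAN_SIZE)

# Same estimate with a different (assumed) mean for the open class
alternative = dict(MEAN_SIZE, **{'250 persons employed or more': 2000.0})
toy_firms_alt = estimate_firms(toy_sbs, alternative)
print(toy_firms, toy_firms_alt)
```

With the real `sbs` table, calling `estimate_firms(sbs, MEAN_SIZE, extra_firms=agricultureFirms)` should reproduce the value computed above before the `int()` truncation.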


<span style="color: DodgerBlue;">
==============================================================================================  
    
### Share calculation for agriculture, using compensation of employees; in this case it would also be possible to use the whole added value
==============================================================================================  
</span>  

In [None]:
agric_tot = (mini_naio['Compensation of employees'].loc[1] \
               + mini_naio['Compensation of employees'].loc[2]\
               + mini_naio['Compensation of employees'].loc[3])

agricultureSectorWeight = mini_naio['Compensation of employees'].loc[1]/ agric_tot 
silvicultureSectorWeight = mini_naio['Compensation of employees'].loc[2]/ agric_tot
fishingSectorWeight = mini_naio['Compensation of employees'].loc[3]/ agric_tot

# agriculture
share = (agricultureFirms/numberOfFirms) * mini_naio['Compensation of employees'].loc[1]/ agric_tot \
                                    * mini_naio['Consumption good share'].loc[1]
ff.loc[0] = [1, "Agriculture", "C", "From 0 to 9 persons employed", share, agricultureSectorWeight, np.nan]

share = (agricultureFirms/numberOfFirms) * mini_naio['Compensation of employees'].loc[1]/ agric_tot \
                                    * mini_naio['Investment good share'].loc[1]
ff.loc[1] = [1,"Agriculture", "I", "From 0 to 9 persons employed", share, agricultureSectorWeight, np.nan]

share = (agricultureFirms/numberOfFirms) * mini_naio['Compensation of employees'].loc[1]/ agric_tot \
                                    * mini_naio['Intermediate good share'].loc[1]
ff.loc[2] = [1, "Agriculture", "Int", "From 0 to 9 persons employed", share, agricultureSectorWeight, np.nan]

# silviculture
share = (agricultureFirms/numberOfFirms) * mini_naio['Compensation of employees'].loc[2]/ agric_tot \
                                    * mini_naio['Consumption good share'].loc[2]
ff.loc[3] = [2, "Silviculture", "C", "From 0 to 9 persons employed", share, silvicultureSectorWeight, np.nan]

share = (agricultureFirms/numberOfFirms) * mini_naio['Compensation of employees'].loc[2]/ agric_tot \
                                    * mini_naio['Investment good share'].loc[2]
ff.loc[4] = [2, "Silviculture", "I", "From 0 to 9 persons employed", share, silvicultureSectorWeight, np.nan]

share = (agricultureFirms/numberOfFirms) * mini_naio['Compensation of employees'].loc[2]/ agric_tot \
                                    * mini_naio['Intermediate good share'].loc[2]
ff.loc[5] = [2, "Silviculture", "Int", "From 0 to 9 persons employed", share, silvicultureSectorWeight, np.nan]

# fishing (no investments, but with the row anyway)
share = (agricultureFirms/numberOfFirms) * mini_naio['Compensation of employees'].loc[3]/ agric_tot\
                                    * mini_naio['Consumption good share'].loc[3]              
ff.loc[6] = [3, "Fishing", "C", "From 0 to 9 persons employed", share, fishingSectorWeight, np.nan]

share = (agricultureFirms/numberOfFirms) * mini_naio['Compensation of employees'].loc[3]/ agric_tot\
                                    * mini_naio['Investment good share'].loc[3]              
ff.loc[7] = [3, "Fishing", "I", "From 0 to 9 persons employed", share, fishingSectorWeight, np.nan]

share = (agricultureFirms/numberOfFirms) * mini_naio['Compensation of employees'].loc[3]/ agric_tot\
                                    * mini_naio['Intermediate good share'].loc[3]               
ff.loc[8] = [3, "Fishing", "Int", "From 0 to 9 persons employed", share, fishingSectorWeight, np.nan]

print("****************", numberOfFirms)

In the following cell, we compute the share of firms for each row of the firm-feature file that we are generating. Each row specifies the NACE sector, the type of good it produces (consumption, investment, or intermediate), and its dimensional class in terms of number of employees.
We calculate the share as the ratio between the number of firms of the sector (estimated from the number of workers per dimensional class in standard cases, with some exceptions, e.g. agriculture) and the total number of firms. Within each sbs sector, the naio subsectors are weighted by their Compensation of employees, and the split among consumption, investment, and intermediate goods uses the three shares computed above in mini_naio.

<span style="color: DodgerBlue;">
==============================================================================================  
    
### Share calculation by sectors and dimensions
### The function shareCalculation() uses the Compensation of employees as weights

### Problem: the case of Real estate services, with a very high capital intensity and Compensation of employees representing only 5% of the Added value, gross (ff rows 609-639)

### Also in this case, the compensation of employees gives a reasonable estimate of the number of firms; the recipe, which is very large, accounts for the correct amount of capital

==============================================================================================  
</span>  

In [None]:
def shareCalculation(ffRow,miniNaioRow, miniNaioRange, naceDef,sbsRow):
    global numberOfFirms
    
    totalCompensationPerSector = sum(list(mini_naio['Compensation of employees'].loc[i] for i in miniNaioRange))
    sectorWeight = mini_naio['Compensation of employees'].loc[miniNaioRow] / totalCompensationPerSector
    
    

    share = sectorWeight * (sbs['From 0 to 9 persons employed'][sbsRow] / 1.6) / numberOfFirms * mini_naio['Consumption good share'].loc[miniNaioRow]
    ff.loc[ffRow] = [miniNaioRow, naceDef, "C", "From 0 to 9 persons employed", share, sectorWeight, sbsRow]

    share = sectorWeight * (sbs['From 10 to 19 persons employed'][sbsRow] / 13.45) / numberOfFirms * mini_naio['Consumption good share'].loc[miniNaioRow]
    ff.loc[ffRow+1] = [miniNaioRow, naceDef, "C", "From 10 to 19 persons employed", share, sectorWeight, sbsRow]

    share = sectorWeight * (sbs['From 20 to 49 persons employed'][sbsRow] / 30.2) / numberOfFirms * mini_naio['Consumption good share'].loc[miniNaioRow]
    ff.loc[ffRow+2] = [miniNaioRow, naceDef, "C", "From 20 to 49 persons employed", share, sectorWeight, sbsRow]

    share = sectorWeight * (sbs['From 50 to 249 persons employed'][sbsRow] / 98.8) / numberOfFirms * mini_naio['Consumption good share'].loc[miniNaioRow]
    ff.loc[ffRow+3] = [miniNaioRow, naceDef, "C", "From 50 to 249 persons employed", share, sectorWeight, sbsRow]

    share = sectorWeight * (sbs['250 persons employed or more'][sbsRow] / 1144.3) / numberOfFirms * mini_naio['Consumption good share'].loc[miniNaioRow]
    ff.loc[ffRow+4] = [miniNaioRow, naceDef, "C", "250 persons employed or more", share, sectorWeight, sbsRow]

    share = sectorWeight * (sbs['From 0 to 9 persons employed'][sbsRow] / 1.6) / numberOfFirms * mini_naio['Investment good share'].loc[miniNaioRow]
    ff.loc[ffRow+5] = [miniNaioRow, naceDef, "I", "From 0 to 9 persons employed", share, sectorWeight, sbsRow]
    
    share = sectorWeight * (sbs['From 10 to 19 persons employed'][sbsRow] / 13.45) / numberOfFirms * mini_naio['Investment good share'].loc[miniNaioRow]
    ff.loc[ffRow+6] = [miniNaioRow, naceDef, "I", "From 10 to 19 persons employed", share, sectorWeight, sbsRow]

    share = sectorWeight * (sbs['From 20 to 49 persons employed'][sbsRow] / 30.2) / numberOfFirms * mini_naio['Investment good share'].loc[miniNaioRow]
    ff.loc[ffRow+7] = [miniNaioRow, naceDef, "I", "From 20 to 49 persons employed", share, sectorWeight, sbsRow]

    share = sectorWeight * (sbs['From 50 to 249 persons employed'][sbsRow] / 98.8) / numberOfFirms * mini_naio['Investment good share'].loc[miniNaioRow]
    ff.loc[ffRow+8] = [miniNaioRow, naceDef, "I", "From 50 to 249 persons employed", share, sectorWeight, sbsRow]

    share = sectorWeight * (sbs['250 persons employed or more'][sbsRow] / 1144.3) / numberOfFirms * mini_naio['Investment good share'].loc[miniNaioRow]
    ff.loc[ffRow+9] = [miniNaioRow, naceDef, "I", "250 persons employed or more", share, sectorWeight, sbsRow]

    share = sectorWeight * (sbs['From 0 to 9 persons employed'][sbsRow] / 1.6) / numberOfFirms * mini_naio['Intermediate good share'].loc[miniNaioRow]
    ff.loc[ffRow+10] = [miniNaioRow, naceDef, "Int", "From 0 to 9 persons employed", share, sectorWeight, sbsRow]
    
    share = sectorWeight * (sbs['From 10 to 19 persons employed'][sbsRow] / 13.45) / numberOfFirms * mini_naio['Intermediate good share'].loc[miniNaioRow]
    ff.loc[ffRow+11] = [miniNaioRow, naceDef, "Int", "From 10 to 19 persons employed", share, sectorWeight, sbsRow]

    share = sectorWeight * (sbs['From 20 to 49 persons employed'][sbsRow] / 30.2) / numberOfFirms * mini_naio['Intermediate good share'].loc[miniNaioRow]
    ff.loc[ffRow+12] = [miniNaioRow, naceDef, "Int", "From 20 to 49 persons employed", share, sectorWeight, sbsRow]

    share = sectorWeight * (sbs['From 50 to 249 persons employed'][sbsRow] / 98.8) / numberOfFirms * mini_naio['Intermediate good share'].loc[miniNaioRow]
    ff.loc[ffRow+13] = [miniNaioRow, naceDef, "Int", "From 50 to 249 persons employed", share, sectorWeight, sbsRow]

    share = sectorWeight * (sbs['250 persons employed or more'][sbsRow] / 1144.3) / numberOfFirms * mini_naio['Intermediate good share'].loc[miniNaioRow]
    ff.loc[ffRow+14] = [miniNaioRow, naceDef, "Int", "250 persons employed or more", share, sectorWeight, sbsRow]
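The fifteen assignments in `shareCalculation()` follow one pattern: three good types times five size classes. The sketch below expresses the same formula with two nested loops. It is not the notebook's function: `share_rows` is a hypothetical helper that returns the rows instead of writing into the global `ff`, and the toy inputs are invented.

```python
import pandas as pd

# Divisors per size class, copied from shareCalculation()
SIZE_CLASS_MEANS = {
    'From 0 to 9 persons employed': 1.6,
    'From 10 to 19 persons employed': 13.45,
    'From 20 to 49 persons employed': 30.2,
    'From 50 to 249 persons employed': 98.8,
    '250 persons employed or more': 1144.3,
}
# Good types in the same order as in shareCalculation(): C, I, Int
GOOD_TYPE_COLUMNS = {
    'C': 'Consumption good share',
    'I': 'Investment good share',
    'Int': 'Intermediate good share',
}

def share_rows(mini_naio, sbs, number_of_firms, naio_row, naio_range, nace_def, sbs_row):
    """Hypothetical helper: build the 15 ff rows of one naio sector as a list."""
    compensation = mini_naio['Compensation of employees']
    sector_weight = compensation.loc[naio_row] / sum(compensation.loc[i] for i in naio_range)
    rows = []
    for good_type, share_column in GOOD_TYPE_COLUMNS.items():
        for size_class, mean_size in SIZE_CLASS_MEANS.items():
            firms_in_class = sbs.loc[sbs_row, size_class] / mean_size
            share = (sector_weight * firms_in_class / number_of_firms
                     * mini_naio.loc[naio_row, share_column])
            rows.append([naio_row, nace_def, good_type, size_class,
                         share, sector_weight, sbs_row])
    return rows

# Invented toy inputs, only to exercise the helper
toy_naio = pd.DataFrame({
    'Compensation of employees': [30.0, 70.0],
    'Consumption good share': [0.5, 0.5],
    'Investment good share': [0.2, 0.2],
    'Intermediate good share': [0.3, 0.3],
}, index=[4, 5])
# One sbs row holding roughly 100 firms in each size class
toy_sbs = pd.DataFrame([{c: 100 * m for c, m in SIZE_CLASS_MEANS.items()}])

rows = share_rows(toy_naio, toy_sbs, 1000, 5, [4, 5], 'toy sector', 0)
toy_ff = pd.DataFrame(rows, columns=['#', 'NACE definition', 'type', 'dimensional class',
                                     'Share of firms', 'Share of firms per sbs sector',
                                     'sbs reference row'])
print(len(toy_ff), toy_ff['Share of firms'].sum())
```

Because the three good shares of a sector sum to 1, the shares of its fifteen rows add up to the sector weight times the sector's estimated firms over the total number of firms, which gives a quick consistency check.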
    

In [None]:
r=9

shareCalculation(r, 4, [4], mini_naio.iloc[4,0],0)
r=r+15

for i in range(5,24):
     shareCalculation(r + (i-5) * 15, i, range(5,24), mini_naio.iloc[i,0],1)
r=r+15*(24-5)

shareCalculation(r,24, [24],mini_naio.iloc[24,0],2)
r=r+15

for i in range(25,27):
     shareCalculation(r + (i-25) * 15, i, range(25,27), mini_naio.iloc[i,0],3)
r=r+15*(27-25)

shareCalculation(r,27, [27], mini_naio.iloc[27,0],4)
r=r+15

for i in range(28,31):
     shareCalculation(r + (i-28) * 15, i,range(28,31),mini_naio.iloc[i,0],5)
r=r+15*(31-28)

for i in range(31,36):
     shareCalculation(r + (i-31) * 15, i, range(31,36),mini_naio.iloc[i,0],6)
r=r+15*(36-31)

shareCalculation(r,36, [36], mini_naio.iloc[36,0],7)
r=r+15

for i in range(37,41):
     shareCalculation(r + (i-37) * 15, i, range(37,41),mini_naio.iloc[i,0],8)
r=r+15*(41-37)

for i in range(41,44):
     shareCalculation(r + (i-41) * 15, i, range(41,44),mini_naio.iloc[i,0],9)
r=r+15*(44-41)

for i in range(44,46):
     shareCalculation(r + (i-44) * 15, i, range(44,46),mini_naio.iloc[i,0],10)
r=r+15*(46-44)

for i in range(46,51):
     shareCalculation(r + (i-46) * 15, i, range(46,51),mini_naio.iloc[i,0],11)
r=r+15*(51-46)

for i in range(51,56):
     shareCalculation(r + (i-51) * 15, i, range(51,56),mini_naio.iloc[i,0],12)
r=r+15*(56-51)

shareCalculation(r,56, [56], mini_naio.iloc[56,0],13)
r=r+15

for i in range(57,59):
     shareCalculation(r + (i-57) * 15, i, range(57,59),mini_naio.iloc[i,0],14)
r=r+15*(59-57)

for i in range(59,61):
     shareCalculation(r + (i-59) * 15, i, range(59,61),mini_naio.iloc[i,0],15)
r=r+15*(61-59)


# using here range(61,65) instead of range(61,66) to exclude the last row, sector 65, whose NaN would make
# the sum of the Compensations of employees NaN and hence also the weights of the other sectors
# of the same group
for i in range(61,65):
     shareCalculation(r + (i-61) * 15, i, range(61,65),mini_naio.iloc[i,0],16) # using 16, the artificial row

print("****************", numberOfFirms)        

In [None]:
pd.set_option('display.float_format', '{:.10f}'.format)
pd.set_option("display.max_rows", None)
#pd.set_option("display.max_rows", 100)

In [None]:

ff["Share of firms"].sum()


<span style="color: DodgerBlue;">
==============================================================================================  
    
### Dimensional class limits from sbs, plus the option for the open class, see above

==============================================================================================  
</span>  

In [None]:
#L min & L max

#the interval will generate a random uniform number r to be corrected with math.ceil(r)
#in this way, e.g., the interval [0;1[ will produce 1, ..., that [8;9[ will produce 9

ff["L min"]=0
ff["L min"]=np.where(ff["dimensional class"]=="From 0 to 9 persons employed",0,ff["L min"])
ff["L min"]=np.where(ff["dimensional class"]=="From 10 to 19 persons employed",10,ff["L min"])
ff["L min"]=np.where(ff["dimensional class"]=="From 20 to 49 persons employed",20,ff["L min"])
ff["L min"]=np.where(ff["dimensional class"]=="From 50 to 249 persons employed",50,ff["L min"])
ff["L min"]=np.where(ff["dimensional class"]=="250 persons employed or more",250,ff["L min"])

ff["L max"]=0
ff["L max"]=np.where(ff["dimensional class"]=="From 0 to 9 persons employed",9,ff["L max"])
ff["L max"]=np.where(ff["dimensional class"]=="From 10 to 19 persons employed",19,ff["L max"])
ff["L max"]=np.where(ff["dimensional class"]=="From 20 to 49 persons employed",49,ff["L max"])
ff["L max"]=np.where(ff["dimensional class"]=="From 50 to 249 persons employed",249,ff["L max"])
ff["L max"]=np.where(ff["dimensional class"]=="250 persons employed or more",1850,ff["L max"])

# when considering agriculture we must achieve the total number of 19 mln of workers 
# i.e. 10 mln of agricultural firms on 20 mln of firms - eurostat
# assumption on agricultural firms based on Istat and Eurostat data; this justifies L max = 2
# while for the other sectors we consider the avg values of each dimensional class

ff.loc[:8, "L max"] = 2

# when ff is read into the simulation code, the case "L max" == 2 is handled in the special way described below
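The class limits can also be kept in a single lookup table and applied with `Series.map`. This is a sketch on a toy frame, assuming the same class labels and the same limits as in the cell above; `CLASS_LIMITS` and `toy_ff` are names introduced here for illustration.

```python
import math
import pandas as pd

# (L min, L max) per dimensional class; 1850 is the chosen upper value of the open class
CLASS_LIMITS = {
    'From 0 to 9 persons employed': (0, 9),
    'From 10 to 19 persons employed': (10, 19),
    'From 20 to 49 persons employed': (20, 49),
    'From 50 to 249 persons employed': (50, 249),
    '250 persons employed or more': (250, 1850),
}

# Toy stand-in for ff with one row per class
toy_ff = pd.DataFrame({'dimensional class': list(CLASS_LIMITS)})
toy_ff['L min'] = toy_ff['dimensional class'].map(lambda c: CLASS_LIMITS[c][0])
toy_ff['L max'] = toy_ff['dimensional class'].map(lambda c: CLASS_LIMITS[c][1])
print(toy_ff)

# Rounding rule described in the comments above: a draw in [0;1[ gives 1, one in [8;9[ gives 9
draws = [math.ceil(0.3), math.ceil(8.7)]
print(draws)
```

Unlike the chained `np.where` calls, which leave 0 for an unrecognized label, this lookup raises a `KeyError`, so a misspelled class name is noticed immediately.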

<span style="color: DodgerBlue;">
==============================================================================================  
    
### Agriculture case

==============================================================================================  
</span>  

We have 10,000,000 firms and 9,000,000 workers.
Let $x_1$ be the number of firms with 0 workers, $x_2$ the number of firms with 1 worker, and $x_3$ the number of firms with 2 workers:

$0 x_1 + 1 x_2 + 2 x_3 = 9,000,000$

$x_1 + x_2 + x_3  = 10,000,000$

if $x_3 = 1,000,000$

$0 x_1 + 1 x_2 = 7,000,000$

$  x_1 +   x_2 = 9,000,000$


giving

$x_2 = 7,000,000$

$x_1 = 2,000,000$
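The same result can be checked numerically. The sketch below fixes $x_3$ at the value chosen in the text and solves the remaining two equations with NumPy; the variable names are introduced here for illustration only.

```python
import numpy as np

workers = 9_000_000
firms = 10_000_000
x3 = 1_000_000  # firms with 2 workers, the value chosen in the text

# Unknowns: x1 (firms with 0 workers) and x2 (firms with 1 worker)
#   0*x1 + 1*x2 = workers - 2*x3
#   1*x1 + 1*x2 = firms - x3
A = np.array([[0.0, 1.0],
              [1.0, 1.0]])
b = np.array([workers - 2 * x3, firms - x3], dtype=float)

x1, x2 = np.linalg.solve(A, b)
print(round(x1), round(x2))
```

Choosing a different $x_3$ changes the split between $x_1$ and $x_2$, while the totals of firms and workers stay fixed.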



<span style="color: DodgerBlue;">
==============================================================================================  
    
### Firms in absolute numbers and new column of correct row totals, named 'Total2'

==============================================================================================  
</span>  

In [None]:
ff['Firms in absolute numbers'] = ff['Share of firms'] * numberOfFirms 
# European data in real world -> CAVEAT: != number of firms in the model

In [None]:
# eliminating nan
ff["Share of firms per sbs sector"]         = np.where(pd.isna(ff["Share of firms per sbs sector"]),
                                               0,
                                               ff["Share of firms per sbs sector"])
ff["Firms in absolute numbers"]             = np.where(pd.isna(ff["Firms in absolute numbers"]),
                                               0,
                                               ff["Firms in absolute numbers"])

### From here onwards: drafts and checks

In [None]:
#Eliminating NaN in sbs
sbs.loc[ 3,"250 persons employed or more"]=0
sbs.loc[ 9,"250 persons employed or more"]=0
sbs.loc[10,"250 persons employed or more"]=0

sbs['Total'].sum() 

In [None]:
sbs['Total2']=sbs['From 0 to 9 persons employed']+\
sbs['From 10 to 19 persons employed']+\
sbs['From 20 to 49 persons employed']+\
sbs['From 50 to 249 persons employed']+\
sbs['250 persons employed or more']

In [None]:
sbs['Total2'].sum()

In [None]:
# check the number of firms in agriculture


# how to filter one single sector or group of sectors
#ff[ff['#'].isin([1,2,3])]
#ff[ff['#'].isin([4])]

ff[ff['#'].isin([1,2,3])]["Firms in absolute numbers"].sum()
#remember ff[ ~ ff['#'].isin([1,2,3])]["Firms in absolute numbers"].sum()

<span style="color: DodgerBlue;">
==============================================================================================  
    
### Capital stocks 

==============================================================================================  
</span>  

## computing capital quantity for ff

To assess the quantity of capital, our starting point is Appendix B of Priori, Terna _et al._ (2025). There, we roughly state the following relation.

To set a proportion of, e.g., $\frac{1}{2}$ and $\frac{1}{2}$ for the global compensations of labor and productive capital, we need a recipe $\frac{K}{L}$ such that, in one time unit, the compensation of capital equals that of labor. Here $w$ is the compensation of one unit of labor per time unit (a month), $n$ is the number of time units in a year, and $r$ is the annual compensation rate of capital:

$\frac{Kr}{n}=Lw$

$K=\frac{Lwn}{r}$

with $w=1$, $L=1$, $n=12$, $r=0.10$, we obtain $K=120$. 

In real life, a proportion of $120$ to $1$ between the productive capital per worker and the monthly compensation of a worker is not unrealistic.
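The relation can be evaluated directly. This is a small sketch with the values used in the text; `capital_for_equal_shares` is a hypothetical helper name.

```python
def capital_for_equal_shares(L, w, n, r):
    """Capital K such that the per-period capital compensation K*r/n equals L*w."""
    return L * w * n / r

# Values used in the text: one worker, monthly wage 1, 12 months, 10% annual rate
K = capital_for_equal_shares(L=1, w=1, n=12, r=0.10)

# Check the two sides of K*r/n = L*w
capital_compensation = K * 0.10 / 12
labor_compensation = 1 * 1
print(K, capital_compensation, labor_compensation)
```

A lower $r$ requires proportionally more capital for the same split: halving $r$ doubles $K$.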



We looked for some confirmation of this intuition in real data. Unfortunately, Eurostat does not provide data on non-financial capital stocks, so we had to look for this information in Istat data:
- http://dati.istat.it/Index.aspx?QueryId=37156 provides data on gross and net non-financial capital stocks and depreciation for NACE sectors
- http://dati.istat.it/Index.aspx?QueryId=12581 provides data on employment by sector in Italy
- we complement the employment data with the LC table, to incorporate information on labor cost (even though it reflects values for the EU and not only for Italy, for which data are not available).

In particular, we check this intuition on two examples: Manufacturing, which is a labor-intensive sector, and Coke, petrol, chemistry and pharmaceutical, which is a capital-intensive one.
- the non-financial capital stock for manufacturing is 485,357,000,000 euros, whereas the number of employed persons is 3,972,000. So the ratio is about 120,000 euros per employed person.
- the non-financial capital stock for chemistry and others is about 60,000,000,000 euros (28,859,300,000 + 30,830,800,000, because coke and petrol vs. chemistry and pharmaceutical are classified as separate sectors) and the number of employed persons (here classified as an aggregate sector) is 195,000. So the resulting ratio is about 300,000 euros per employed person.
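The two ratios can be recomputed from the figures quoted in the list. The sketch uses only those numbers; the variable names are introduced here for illustration.

```python
# Figures quoted in the text (capital stocks in euros, employed persons in units)
manufacturing_capital = 485_357_000_000
manufacturing_employed = 3_972_000

# Coke and petrol plus chemistry and pharmaceutical, reported as separate sectors
chemistry_capital = 28_859_300_000 + 30_830_800_000
chemistry_employed = 195_000

# Capital per employed person
manufacturing_ratio = manufacturing_capital / manufacturing_employed
chemistry_ratio = chemistry_capital / chemistry_employed

print(round(manufacturing_ratio), round(chemistry_ratio))
```

The exact values are slightly above the rounded figures of 120,000 and 300,000 used in the text.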

Since the recipe relates a monthly compensation of $1$ for one unit of labor to a capital value of $120$, comparing it with real data requires the annual compensation of workers per sector (reported in the LC table).

The value of $120$ is ten times the annual compensation of a worker ($12$ in the example above), in the same way that the ratio of about 300,000 is roughly ten times the LC annual compensation of 32,000 for coke, petrol, chemistry and pharmaceutical (considered a sub-sector of manufacturing in the LC table). For the manufacturing sector as a whole, the values are close but do not match exactly, because we need to consider a higher capital compensation $r$. 

This makes sense, as manufacturing can be considered a "light" sector where capital obtains a higher compensation per unit than in coke, petrol, chemistry and pharmaceutical, a heavy sector with a lower capital compensation per unit.



<span style="color: DodgerBlue;">
==============================================================================================  
    
### Capital stocks from Italian data

==============================================================================================  
</span>  

In [None]:
cap_stocks = pd.read_excel("capital_stocks_istat_net.xlsx", sheet_name=0) #we use only year 2016 (the last with full entries)
# keep in mind that these units are millions (i.e. x 1,000,000)

cap_stocks = cap_stocks.drop(cap_stocks.index[0:9])
cap_stocks = pd.DataFrame([cap_stocks.iloc[:,0],cap_stocks['Unnamed: 7']])
cap_stocks = cap_stocks.T
cap_stocks.columns = ['NACE sector', 'Net cap stock' ]
cap_stocks = cap_stocks.drop(cap_stocks.index[-1])

pd.set_option('display.max_colwidth', None)
pd.options.display.float_format = '{:.1f}'.format

cap_stocks = cap_stocks.reset_index(drop=True)
cap_stocks = cap_stocks.drop(cap_stocks.index[[0,2,3,5,22,23,27,33,34,39,42,45]]) 
cap_stocks = cap_stocks.reset_index(drop=True)


cap_stocks

<span style="color: DodgerBlue;">
==============================================================================================  
    
### Consistent employees number from Italian data

==============================================================================================  
</span>  

In [None]:
employed = pd.read_excel("occupati_istat.xlsx", sheet_name=0)
# here values are in thousands (i.e., x 1,000)

employed = employed.drop(employed.index[0:8])
employed = pd.DataFrame([employed.iloc[:,0],employed['Unnamed: 5']])
employed = employed.T
employed.columns = ['NACE sector', 'employed' ]

employed = employed.drop(employed.index[-3:]) # drop the last three rows

pd.set_option('display.max_colwidth', None)
pd.options.display.float_format = '{:.1f}'.format

employed = employed.reset_index(drop=True)
employed = employed.drop(employed.index[[0,2,3,5,15]]) 
# some rows contain aggr values of successive observations, we drop them
employed = employed.reset_index(drop=True) 

employed_num = employed['employed']

employed

<span style="color: DodgerBlue;">
==============================================================================================  
    
### Millions of euros divided by thousands of employees => the recipes are in thousands of euros per employee, 
### and capital is expressed in the same unit

==============================================================================================  
</span>  

In [None]:
ff['Recipe'] = 0
ff['Recipe']=ff['Recipe'].astype(float) # to avoid a 64bit warning that will raise an error in 
                                        # future versions of pandas

ff.iloc[0:9, ff.columns.get_loc('Recipe')] = cap_stocks.iloc[0,1] / employed_num.iloc[0]
ff.iloc[9:24, ff.columns.get_loc('Recipe')] = cap_stocks.iloc[1,1] / employed_num.iloc[1]
ff.iloc[24:39, ff.columns.get_loc('Recipe')] = cap_stocks.iloc[2,1] / employed_num.iloc[2]

ff.iloc[39:54, ff.columns.get_loc('Recipe')] = (cap_stocks.iloc[3,1] + cap_stocks.iloc[4,1]) / employed_num.iloc[3]
ff.iloc[54:69, ff.columns.get_loc('Recipe')] = (cap_stocks.iloc[3,1] + cap_stocks.iloc[4,1]) / employed_num.iloc[3]
ff.iloc[69:84, ff.columns.get_loc('Recipe')] = (cap_stocks.iloc[3,1] + cap_stocks.iloc[4,1]) / employed_num.iloc[3]
ff.iloc[84:99, ff.columns.get_loc('Recipe')] = (cap_stocks.iloc[3,1] + cap_stocks.iloc[4,1]) / employed_num.iloc[3]

ff.iloc[99:114, ff.columns.get_loc('Recipe')] = \
                (cap_stocks.iloc[5,1] + cap_stocks.iloc[6,1] + cap_stocks.iloc[7,1]) / employed_num.iloc[4]
ff.iloc[114:129, ff.columns.get_loc('Recipe')] =\
                (cap_stocks.iloc[5,1] + cap_stocks.iloc[6,1] + cap_stocks.iloc[7,1]) / employed_num.iloc[4]
ff.iloc[129:144, ff.columns.get_loc('Recipe')] = \
                (cap_stocks.iloc[5,1] + cap_stocks.iloc[6,1] + cap_stocks.iloc[7,1]) / employed_num.iloc[4]

ff.iloc[144:159, ff.columns.get_loc('Recipe')] = (cap_stocks.iloc[8,1] + cap_stocks.iloc[9,1]) / employed_num.iloc[5]
ff.iloc[159:174, ff.columns.get_loc('Recipe')] = (cap_stocks.iloc[8,1] + cap_stocks.iloc[9,1]) / employed_num.iloc[5]
ff.iloc[174:189, ff.columns.get_loc('Recipe')] = (cap_stocks.iloc[8,1] + cap_stocks.iloc[9,1]) / employed_num.iloc[5]
ff.iloc[189:204, ff.columns.get_loc('Recipe')] = (cap_stocks.iloc[8,1] + cap_stocks.iloc[9,1]) / employed_num.iloc[5]

ff.iloc[204:219, ff.columns.get_loc('Recipe')] = \
                (cap_stocks.iloc[10,1] + cap_stocks.iloc[11,1] + cap_stocks.iloc[12,1]) / employed_num.iloc[6]
ff.iloc[219:234, ff.columns.get_loc('Recipe')] = \
                (cap_stocks.iloc[10,1] + cap_stocks.iloc[11,1] + cap_stocks.iloc[12,1]) / employed_num.iloc[6]
ff.iloc[234:249, ff.columns.get_loc('Recipe')] = \
                (cap_stocks.iloc[10,1] + cap_stocks.iloc[11,1] + cap_stocks.iloc[12,1]) / employed_num.iloc[6]

ff.iloc[249:264, ff.columns.get_loc('Recipe')] = cap_stocks.iloc[13,1] / employed_num.iloc[7]
ff.iloc[264:279, ff.columns.get_loc('Recipe')] = cap_stocks.iloc[13,1] / employed_num.iloc[7]

ff.iloc[279:294, ff.columns.get_loc('Recipe')] = cap_stocks.iloc[14,1] / employed_num.iloc[8]
ff.iloc[294:309, ff.columns.get_loc('Recipe')] = cap_stocks.iloc[14,1] / employed_num.iloc[8]

ff.iloc[309:324, ff.columns.get_loc('Recipe')] = (cap_stocks.iloc[15,1] + cap_stocks.iloc[16,1]) / employed_num.iloc[9]
ff.iloc[324:339, ff.columns.get_loc('Recipe')] = (cap_stocks.iloc[15,1] + cap_stocks.iloc[16,1]) / employed_num.iloc[9]
ff.iloc[339:354, ff.columns.get_loc('Recipe')] = (cap_stocks.iloc[15,1] + cap_stocks.iloc[16,1]) / employed_num.iloc[9]

ff.iloc[354:369, ff.columns.get_loc('Recipe')] = cap_stocks.iloc[17,1] / employed_num.iloc[10]

ff.iloc[369:504, ff.columns.get_loc('Recipe')] = \
                (cap_stocks.iloc[18,1] + cap_stocks.iloc[19,1] + cap_stocks.iloc[20,1]) / employed_num.iloc[11]


ff.iloc[504:564, ff.columns.get_loc('Recipe')] = \
                (cap_stocks.iloc[21,1] + cap_stocks.iloc[22,1] + cap_stocks.iloc[23,1]) / employed_num.iloc[12]

ff.iloc[564:609, ff.columns.get_loc('Recipe')] = cap_stocks.iloc[24,1] / employed_num.iloc[13]

ff.iloc[609:639, ff.columns.get_loc('Recipe')] = cap_stocks.iloc[25,1] / employed_num.iloc[14]

ff.iloc[639:774, ff.columns.get_loc('Recipe')] = (cap_stocks.iloc[28,1] + cap_stocks.iloc[29,1]) / employed_num.iloc[15]

ff.iloc[774:834, ff.columns.get_loc('Recipe')] = (cap_stocks.iloc[30,1] + \
                cap_stocks.iloc[31,1] + cap_stocks.iloc[32,1] + cap_stocks.iloc[33,1]) / employed_num.iloc[16]

ff.iloc[834:909, ff.columns.get_loc('Recipe')] =\
                (cap_stocks.iloc[34,1] + cap_stocks.iloc[35,1] + cap_stocks.iloc[36,1]) / employed_num.iloc[17]

# rows 909-939 have no capital stock, so their recipe stays 0
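
The block-by-block assignments above all follow one pattern: a range of `ff` rows receives the sum of one or more capital-stock rows divided by one employment row. A possible, more compact way to express that pattern is a lookup table driving a loop. The sketch below uses invented toy data and a hypothetical `blocks` list; it is not the mapping used in this notebook.

```python
import pandas as pd

# Toy stand-ins for the notebook's objects (values are invented for illustration)
cap_stocks = pd.Series([100.0, 50.0, 30.0, 20.0])  # net capital stock, millions
employed_num = pd.Series([10.0, 5.0, 4.0])         # employed, thousands
ff = pd.DataFrame({"Recipe": [0.0] * 9})

# (first ff row, one-past-last ff row, capital-stock rows to sum, employment row)
blocks = [
    (0, 3, [0], 0),
    (3, 6, [1], 1),
    (6, 9, [2, 3], 2),  # two capital-stock rows share one employment row
]

recipe_col = ff.columns.get_loc("Recipe")
for start, stop, cap_rows, emp_row in blocks:
    # millions / thousands => thousands of euros per employee
    ff.iloc[start:stop, recipe_col] = cap_stocks.iloc[cap_rows].sum() / employed_num.iloc[emp_row]

print(ff["Recipe"].tolist())
```

Keeping the mapping in one table makes it easier to check that the row ranges are contiguous and that each capital-stock row is used once.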

In [None]:
# when L min is 0, use at least L min = 1, i.e., K min = Recipe
ff['K min'] = np.where(ff['L min'] == 0, ff['Recipe'], ff['L min'] * ff['Recipe'])
ff['K max'] = ff['L max'] * ff['Recipe']

<span style="color: DodgerBlue;">
==============================================================================================  
    
### Cost of capital, considering a risk-free rent value 
### Drastically reduced for the real estate services sector, which has a very high capital intensity, 
### with compensation of employees representing 5% of gross added value

==============================================================================================  
</span>  

In [None]:
ff['Cost of capital'] = 0.15
ff.iloc[609:624, ff.columns.get_loc('Cost of capital')] = 0.0 # imputed rents in naio have VA 0
ff.iloc[624:639, ff.columns.get_loc('Cost of capital')] = 0.03 # real estate needs to compensate for out-of-scale recipes
ff['Cost of capital'] = ff['Cost of capital'].apply(lambda x: format(x, '.2f'))

<span style="color: DodgerBlue;">
==============================================================================================  
    
### Wages
###  
### using Compensations of employees and sbs number of Employees

==============================================================================================  
</span>  

In [None]:
ff['Wage'] = 0
ff['Wage']=ff['Wage'].astype(float) # to avoid a 64bit warning that will raise an error in 
                                        # future versions of pandas


ff.iloc[0:9, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[1:3].sum() * 1e6 / agriculture
ff.iloc[9:24, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[4]*1e6 / sbs['Total2'].loc[0]
ff.iloc[24:39, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[5:23].sum() * 1e6 /sbs['Total2'].loc[1]
ff.iloc[39:54, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[5:23].sum() * 1e6 /sbs['Total2'].loc[1]                                      
ff.iloc[54:69, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[5:23].sum() * 1e6 /sbs['Total2'].loc[1]    
ff.iloc[69:84, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[5:23].sum() * 1e6 /sbs['Total2'].loc[1]
ff.iloc[84:99, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[5:23].sum() * 1e6 /sbs['Total2'].loc[1]
ff.iloc[99:114, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[5:23].sum() * 1e6 /sbs['Total2'].loc[1]
ff.iloc[114:129, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[5:23].sum() * 1e6 / sbs['Total2'].loc[1]
ff.iloc[129:144, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[5:23].sum() * 1e6 / sbs['Total2'].loc[1]
ff.iloc[144:159, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[5:23].sum() * 1e6 / sbs['Total2'].loc[1]
ff.iloc[159:174, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[5:23].sum() * 1e6 / sbs['Total2'].loc[1]
ff.iloc[174:189, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[5:23].sum() * 1e6 / sbs['Total2'].loc[1]
ff.iloc[189:204, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[5:23].sum() * 1e6 / sbs['Total2'].loc[1]
ff.iloc[204:219, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[5:23].sum() * 1e6 / sbs['Total2'].loc[1]
ff.iloc[219:234, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[5:23].sum() * 1e6 / sbs['Total2'].loc[1]
ff.iloc[234:249, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[5:23].sum() * 1e6 / sbs['Total2'].loc[1]
ff.iloc[249:264, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[5:23].sum() * 1e6 / sbs['Total2'].loc[1]
ff.iloc[264:279, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[5:23].sum() * 1e6 / sbs['Total2'].loc[1]
ff.iloc[279:294, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[5:23].sum() * 1e6 / sbs['Total2'].loc[1]
ff.iloc[294:309, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[5:23].sum() * 1e6 / sbs['Total2'].loc[1]
ff.iloc[309:324, ff.columns.get_loc('Wage')] =  mini_naio['Compensation of employees'].loc[24] * 1e6 / sbs['Total2'].loc[2]                                  
ff.iloc[324:339, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[25:26].sum() * 1e6 /sbs['Total2'].loc[3]
ff.iloc[339:354, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[25:26].sum() * 1e6 /sbs['Total2'].loc[3]
ff.iloc[354:369, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[27] * 1e6 /sbs['Total2'].loc[4]
ff.iloc[369:384, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[28:30].sum() * 1e6 /sbs['Total2'].loc[5]
ff.iloc[384:399, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[28:30].sum() * 1e6 /sbs['Total2'].loc[5]
ff.iloc[399:414, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[28:30].sum() * 1e6 /sbs['Total2'].loc[5]
ff.iloc[414:429, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[31:35].sum() * 1e6 /sbs['Total2'].loc[6]
ff.iloc[429:444, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[31:35].sum() * 1e6 /sbs['Total2'].loc[6]
ff.iloc[444:459, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[31:35].sum() * 1e6 /sbs['Total2'].loc[6]
ff.iloc[459:474, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[31:35].sum() * 1e6 /sbs['Total2'].loc[6]
ff.iloc[474:489, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[31:35].sum() * 1e6 /sbs['Total2'].loc[6]
ff.iloc[489:504, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[36] * 1e6 /sbs['Total2'].loc[7]
ff.iloc[504:519, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[37:40].sum() * 1e6 /sbs['Total2'].loc[8]
ff.iloc[519:534, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[37:40].sum() * 1e6 /sbs['Total2'].loc[8]
ff.iloc[534:549, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[37:40].sum() * 1e6 /sbs['Total2'].loc[8]
ff.iloc[549:564, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[37:40].sum() * 1e6 /sbs['Total2'].loc[8]
ff.iloc[564:579, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[41:43].sum() * 1e6 /sbs['Total2'].loc[9]
ff.iloc[579:594, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[41:43].sum() * 1e6 /sbs['Total2'].loc[9]
ff.iloc[594:609, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[41:43].sum() * 1e6 /sbs['Total2'].loc[9]
ff.iloc[609:624, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[44:45].sum() * 1e6 /sbs['Total2'].loc[10]
ff.iloc[624:639, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[44:45].sum() * 1e6 /sbs['Total2'].loc[10]
ff.iloc[639:654, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[46:50].sum() * 1e6 /sbs['Total2'].loc[11]
ff.iloc[654:669, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[46:50].sum() * 1e6 /sbs['Total2'].loc[11]
ff.iloc[669:684, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[46:50].sum() * 1e6 /sbs['Total2'].loc[11]
ff.iloc[684:699, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[46:50].sum() * 1e6 /sbs['Total2'].loc[11]
ff.iloc[699:714, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[46:50].sum() * 1e6 /sbs['Total2'].loc[11]
ff.iloc[714:729, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[51:55].sum() * 1e6 /sbs['Total2'].loc[12]
ff.iloc[729:744, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[51:55].sum() * 1e6 /sbs['Total2'].loc[12]
ff.iloc[744:759, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[51:55].sum() * 1e6 /sbs['Total2'].loc[12]
ff.iloc[759:774, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[51:55].sum() * 1e6 /sbs['Total2'].loc[12]
ff.iloc[774:789, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[51:55].sum() * 1e6 /sbs['Total2'].loc[12]
ff.iloc[789:804, ff.columns.get_loc('Wage')]= mini_naio['Compensation of employees'].loc[56] * 1e6 /sbs['Total2'].loc[13]
ff.iloc[804:819, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[57:58].sum() * 1e6 /sbs['Total2'].loc[14]
ff.iloc[819:834, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[57:58].sum() * 1e6 /sbs['Total2'].loc[14]
ff.iloc[834:849, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[59:60].sum() * 1e6 /sbs['Total2'].loc[15]
ff.iloc[849:864, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[59:60].sum() * 1e6 /sbs['Total2'].loc[15]
ff.iloc[864:879, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[61:64].sum() * 1e6 /sbs['Total2'].loc[16]
ff.iloc[879:894, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[61:64].sum() * 1e6 /sbs['Total2'].loc[16] 
ff.iloc[894:909, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[61:64].sum() * 1e6 /sbs['Total2'].loc[16] 
ff.iloc[909:924, ff.columns.get_loc('Wage')] = mini_naio['Compensation of employees'].loc[61:64].sum() * 1e6 /sbs['Total2'].loc[16] 
ff.iloc[924:939, ff.columns.get_loc('Wage')] = 0 #mini_naio['Compensation of employees'].loc[65]  is nan


#ff

In [None]:
mini_naio['Compensation of employees'].loc[1:65].sum()

<span style="color: DodgerBlue;">
==============================================================================================  
    
### display a few rows of the ff table, for control purposes

==============================================================================================  
</span>  

In [None]:
#pd.set_option('display.float_format', '{:.5f}'.format)
ff[ff['#'].isin([65])]

<span style="color: DodgerBlue;">
==============================================================================================  
    
### intermediate inputs (calculations made by intermediate65x65.ipynb)

==============================================================================================  
</span>  

In [None]:
naio_io = pd.read_pickle('naio_io.xp') #shares

In [None]:
naio_io

#### In test_ff and in model, load naio_io_N.xp which contains only the numerical part of the table, as a numpy array

<span style="color: DodgerBlue;">
==============================================================================================  
    
### corrections on AV

==============================================================================================  
</span>

We adjust ADDED VALUE items by using the ratio between simulated and actual values that we calculated in comparingActual&SimulatedData.ipynb.
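
As a toy illustration of this adjustment, the sketch below expands a short vector of ratios so that each sector's ratio is repeated once per `ff` row belonging to it (3 rows for the first three sectors, 15 for the others), divides a column by it, and replaces the infinities produced by zero ratios with 0. The ratios and values are invented, and `compare` is assumed here to be a one-dimensional Series.

```python
import numpy as np
import pandas as pd

# Invented simulated/actual ratios for four sectors
compare = pd.Series([2.0, 4.0, 0.0, 5.0])

# First three sectors expand to 3 rows each, the remaining ones to 15 rows each
head = compare[0:3].repeat(3).reset_index(drop=True)
tail = compare[3:].repeat(15).reset_index(drop=True)
expanded = pd.concat([head, tail], ignore_index=True)

# A toy column to be corrected, one entry per expanded row
values = pd.Series(np.full(len(expanded), 10.0))

# Division by a zero ratio yields inf, which is then set to 0
adjusted = (values / expanded).replace([np.inf, -np.inf], 0)

print(len(expanded))
print(adjusted.tolist()[:9])
```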

In [None]:
compare = pd.read_pickle('./av_ratio_sim_act.xp')

"""
the structure of expanded_compare is:
row 1 => 3 rows
row 2 => 3 rows
row 3 => 3 rows
for Agriculture, Forestry, Fishing

then
row 4  => 15 rows
up to
row 64 => 15 rows
(row 65 was dropped above)

for a total of 924 rows (3*3 + 61*15)
"""

expanded1_compare = compare[0:3].repeat(3).reset_index(drop=True) # repeats each row of compare 3 times
expanded2_compare = compare[3:64].repeat(15).reset_index(drop=True) # repeats each row of compare 15 times

In [None]:
#expanded1_compare

In [None]:
#expanded2_compare

In [None]:
expanded_compare = pd.concat([expanded1_compare.squeeze(), expanded2_compare.squeeze()], ignore_index=True)

In [None]:
#expanded_compare

In [None]:
ff['Wage'] = ff['Wage']/ expanded_compare 

In [None]:
ff['K min'] =  ff['K min'] / expanded_compare

In [None]:
ff['K max']  = ff['K max'] / expanded_compare

In [None]:
ff['Recipe']  = ff['Recipe'] / expanded_compare

In [None]:
ff=ff.replace([np.inf,-np.inf],0)

## ff-output with labor and capital

In [None]:
ff.to_csv("ff_with_class_limits.csv",index=False)

In [None]:
ff[ff['#'].isin([44])]

<span style="color: DodgerBlue;">
==============================================================================================  
    
### Investments, start, a look to nama

==============================================================================================  
</span>

In [None]:
#pd.set_option('display.max_colwidth', None)  

In [None]:
#nama

In [None]:
# swap two rows together with their index labels
# positions to swap
i, j = 43, 44  # swap the rows at positions 43 and 44

# Swap row contents
temp = nama.iloc[i].copy()
nama.iloc[i] = nama.iloc[j]
nama.iloc[j] = temp

# Swap index labels
new_index = nama.index.tolist()
new_index[i], new_index[j] = new_index[j], new_index[i]
nama.index = new_index

In [None]:
#nama

In [None]:
#eliminating negative values
nama[nama < 0] = 0

In [None]:
nama

In [None]:
# total inv. from naio 3320258.70
nama.iloc[:,1:].sum()

In [None]:
nama.iloc[:,1:].sum().sum()

<span style="color: DodgerBlue;">
==============================================================================================  
    
### create investment table, basic version with shares (no nama)

==============================================================================================  
</span>

In [None]:
subset_index = mini_naio.index[1:66]

invTable=pd.DataFrame(index=subset_index,dtype='float64')
invTable[mini_naio['IND_USE (Labels)']]=0.0
invTable.rename(columns={'Total': 'sectors'}, inplace=True)

In [None]:
invTable['sectors']=nama.index.tolist()

In [None]:
invTable

In [None]:
GrossFixedCapitalFormation = mini_naio['Gross fixed capital formation'][1:67].to_list()

In [None]:
GrossFixedCapitalFormation

In [None]:
tot = sum(GrossFixedCapitalFormation)
tot

In [None]:
shares = [sect / tot for sect in GrossFixedCapitalFormation]

In [None]:
tot2 = sum(shares) 
tot2

In [None]:
invTable.iloc[:,1:66] = shares[0:65]

In [None]:
pd.options.display.float_format = '{:.4f}'.format
invTable

In [None]:
#TEST 1
#anticipated for the control in TEST 1 and repeated below (from a run of test_ff with 1,000,000 firms, reported to EU scale)
with open('buyingSectorsPurchases.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    buyingSectorsPurchases = [float(row[0]) for row in reader]

In [None]:
buyingSectorsPurchases

In [None]:
t=int(sum(buyingSectorsPurchases))

In [None]:
#! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! 
#if test_ff finds that its internal value of the total (integer part) of buying sectors purchases
#has changed, firm-feature-generation has to be rerun
with open("control4text_ff.txt", "w") as f:
    f.write(str(t))

In [None]:
#TEST 1
#anticipated for the control in TEST 1 and repeated below
correctionCoef = sum(buyingSectorsPurchases)/mini_naio.iloc[0,6]

In [None]:
#TEST 1
#control test to understand how to fix invTableNama below
v=[0]*3
for i in range(65):
    v[0]+=invTable.iloc[i,20]*buyingSectorsPurchases[i]
    v[1]+=invTable.iloc[i,21]*buyingSectorsPurchases[i]
    v[2]+=invTable.iloc[i,27]*buyingSectorsPurchases[i]
v

In [None]:
invTable.to_pickle("./invTableNoNama.xp")

In [None]:
invTable.shape

<span style="color: DodgerBlue;">
==============================================================================================  
    
### create investment table, nama version, with raw corrections

==============================================================================================  
</span>

In [None]:
pd.options.display.float_format = '{:.1f}'.format

In [None]:
subset_index = mini_naio.index[1:66]

invTableNama=pd.DataFrame(index=subset_index,dtype='float64')
invTableNama[mini_naio['IND_USE (Labels)']]= 0.0 #= mini_naio['Gross fixed capital formation'][1:67].to_list()
invTableNama.rename(columns={'Total': 'sectors'}, inplace=True)
invTableNama['sectors']=nama.index.tolist()
#result of the above operations: invTableNama filled with zeroes, col. 0 with sectors from nama, col. names with naio sectors

with open('buyingSectorsPurchases.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    buyingSectorsPurchases = [float(row[0]) for row in reader]
invTableNama['Totals'] = buyingSectorsPurchases
#result of the above operations: the table is always filled with zeroes, but with one more col containing the buying values 
#coming from test_ff; NB, those values are independent from the distribution of purchases within test_ff
#invTableNama['Totals'].sum() gives np.float64(3085530.281151042) which is the sum of SIM. EU SCALE buying sectors (from a 1,000,000
#firm sample)

#invTableNama['Totals'].sum()
invTableNama


In [None]:
for i in range(len(buyingSectorsPurchases)):
    print(i,buyingSectorsPurchases[i])

In [None]:
sum(buyingSectorsPurchases)

In [None]:
# share of each sector in total gross fixed capital formation
rowStructure = np.array(GrossFixedCapitalFormation) / sum(GrossFixedCapitalFormation)
for i in range(65):  
    invTableNama.iloc[i, 1:66] = invTableNama.iloc[i, 66] * rowStructure

# summing cols
totals = invTableNama.iloc[:, 1:].sum(numeric_only=True)
# Insert a label for the new row
totals_row = pd.Series(['Totals'] + totals.tolist(), index=invTableNama.columns)
# Append the row to the DataFrame
invTableNama = pd.concat([invTableNama, totals_row.to_frame().T], ignore_index=True)

#the global total is 3085534.6, same value above
#now invTableNama contains absolute values, consistent with the buying values from test_ff and structure coming from naio
#the total value is less than the Gross Fixed Capital Formation of naio (3320258.70), but here we have only the substitutions

#by construction, we have the same total for the marginal values of rows (purchases by sectors, broken down by the composition  
#of gross capital formation) and columns (each of them contains the result of applying the same share to a list of addenda summing up to 
#the total 3085534.6 and the sum of all the shares giving 1)

invTableNama

In [None]:
#TEST

correctionCoef = sum(buyingSectorsPurchases)/mini_naio.iloc[0,6]

In [None]:
#TEST 
#control test to understand how to fix invTableNama below
v=[0]*3
for i in range(65):
    v[0]+=invTableNama.iloc[i,20] #*buyingSectorsPurchases[i]
    v[1]+=invTableNama.iloc[i,21] #*buyingSectorsPurchases[i]
    v[2]+=invTableNama.iloc[i,27] #*buyingSectorsPurchases[i]
v

In [None]:
#TEST 
sum(buyingSectorsPurchases)

In [None]:
#TEST 
print(GrossFixedCapitalFormation[19]*correctionCoef,\
      GrossFixedCapitalFormation[20]*correctionCoef,\
      GrossFixedCapitalFormation[26]*correctionCoef)

The results above, calculated in two ways, correspond correctly.

**Gross capital formation** in mini_naio covers both substitutions and new investment; our analysis here, used to build the investment
IO transaction table, takes the buyer-side volumes from the simulation of the firms' substitutions.

The total of gross capital formation, i.e., **3320258.70** (mini_naio.iloc[0,6]), has to be rescaled to the amount of investment on the buyer side of the simulation, sum(buyingSectorsPurchases), using a correctionCoef.
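
A small numerical sketch of this rescaling, with invented figures for three buying and three selling sectors, may help. When each buyer's total is split with the same gross-capital-formation shares, the column totals of the resulting table coincide with the gross capital formation multiplied by the correction coefficient, and the row totals coincide with the simulated purchases. The names below are illustrative only.

```python
import numpy as np

# Invented toy figures
buying = np.array([30.0, 50.0, 20.0])  # simulated purchases by buying sector
gfcf = np.array([60.0, 90.0, 50.0])    # gross fixed capital formation by product

# Rescale the statistical total to the simulated total
correction_coef = buying.sum() / gfcf.sum()

# Shares of each product in total gross fixed capital formation
shares = gfcf / gfcf.sum()

# Each buyer's total is split across products using the same shares
table = np.outer(buying, shares)

col_totals_from_table = table.sum(axis=0)    # first way: summing the table columns
col_totals_rescaled = gfcf * correction_coef  # second way: rescaling the statistics

print(col_totals_from_table, col_totals_rescaled)
```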

In [None]:
sum(buyingSectorsPurchases)

In [None]:
correctionCoef = sum(buyingSectorsPurchases)/mini_naio.iloc[0,6]

In [None]:
correctionCoef

In [None]:
#construction special case
constructions = (nama['Dwellings'] + nama['Other buildings']).squeeze()
totConstructions = constructions.sum()
#shares from nama
constructionShares = [x / totConstructions for x in constructions]
constructionShares.append(1.0)
constructionShares = pd.Series(constructionShares)


#setting constructions in invTableNama
salesFromMiniNaioConstructions = mini_naio.iloc[27,6]*correctionCoef
invTableNama['Constructions and construction works'] = constructionShares * salesFromMiniNaioConstructions

In [None]:
totConstructions

In [None]:
nama['Dwellings'].sum()+nama['Other buildings'].sum()

In [None]:
invTableNama['Constructions and construction works']

In [None]:
invTableNama

In [None]:
invTableNama.shape

In [None]:
#TEST after constructions, but before reproportioning
#control test to understand how to fix invTableNama
v=0
for i in range(65):
    v+=invTableNama.iloc[i,27] #*buyingSectorsPurchases[i]
v

In [None]:
#TEST
sum(buyingSectorsPurchases)

In [None]:
#TEST
print(GrossFixedCapitalFormation[26]*correctionCoef)

In [None]:
#as an example, row 0 true sum
invTableNama.iloc[0,1:-1].sum()

In [None]:
#as an example, row 0 expected sum
invTableNama.iloc[0,-1]

In [None]:
#reproportion without modifying the nama sectors

In [None]:
#METHOD 1
#reconcile row totals, not modifying the cols listed in a vector
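
The row-reconciliation idea can be sketched on a toy table. In each row, the gap between the target total and the current sum is absorbed by the columns that may be modified. Only when that would push them below zero are they set to zero and the remainder passed to the protected columns. The function name `reconcile_rows` and the data are hypothetical. This is a simplified reading of the procedure, not the notebook's implementation.

```python
import numpy as np

def reconcile_rows(table, row_totals, fixed_cols):
    """Rescale each row to its target total, touching fixed_cols only if needed."""
    table = np.asarray(table, dtype=float).copy()
    free_cols = [j for j in range(table.shape[1]) if j not in fixed_cols]
    for i in range(table.shape[0]):
        diff = row_totals[i] - table[i].sum()
        free_sum = table[i, free_cols].sum()
        fixed_sum = table[i, fixed_cols].sum()
        if free_sum <= 0:
            continue  # nothing adjustable in this row
        factor = 1.0 + diff / free_sum
        if factor >= 0.0:
            table[i, free_cols] *= factor
        else:
            # the free columns cannot absorb the whole reduction:
            # zero them and pass the remainder to the fixed columns
            table[i, free_cols] = 0.0
            if fixed_sum > 0:
                table[i, fixed_cols] *= 1.0 + (diff + free_sum) / fixed_sum
    return table

toy = [[10.0, 20.0, 30.0],
       [40.0, 5.0, 5.0]]
result = reconcile_rows(toy, row_totals=[90.0, 30.0], fixed_cols=[0])
print(result)
```

In the first toy row the protected column keeps its value; in the second, the free columns are exhausted and the protected column is scaled down as well.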

In [None]:
#num to be used
col_indices = list(enumerate(invTableNama.columns))
col_indices

In [None]:
#other sectors to be corrected in invTableNama, besides 'constructions' (special case, above)

In [None]:
nama.columns

Sector associations (to be used!) applying shares on the left to cols on the right

    from nama 'Dwellings'                       [1] to mini_naio [27] (done above)

    from nama 'Other buildings'                 [2] to mini_naio [27] (done above)

    from nama 'Transport equipment'             [3] to mini_naio [20, 21]

    from nama 'Computer hardware'               [4] to mini_naio [17]

    from nama 'Telecommunications'              [5] to mini_naio [39]

    from nama 'Other machinery'                 [6] to mini_naio [18, 19, 23]

    from nama 'Cultivated biological resources' [7] to mini_naio [1, 2, 3]

    from nama 'R & D'                           [8] to mini_naio [48]

    from nama 'Computer software and db'        [9] to mini_naio [40] 
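
The associations listed above (other than the construction case, already handled) can also be written as a mapping from nama column positions to mini_naio row positions. The sketch below only encodes that list and shows, on invented numbers, how one column of nama shares distributes a sales total across buying sectors. The names `associations`, `pairs` and the `toy_*` variables are illustrative assumptions.

```python
import pandas as pd

# nama column position -> mini_naio row positions, as listed above
associations = {
    3: [20, 21],      # Transport equipment
    4: [17],          # Computer hardware
    5: [39],          # Telecommunications
    6: [18, 19, 23],  # Other machinery
    7: [1, 2, 3],     # Cultivated biological resources
    8: [48],          # R & D
    9: [40],          # Computer software and db
}

# Flatten into (nama column, mini_naio row) pairs
pairs = [(col, row) for col, rows in associations.items() for row in rows]

# Toy illustration of applying one column of shares to a sales total
toy_nama_col = pd.Series([2.0, 6.0, 12.0])  # invented values by buying sector
toy_sales = 50.0                            # invented rescaled sales total
toy_column = toy_nama_col / toy_nama_col.sum() * toy_sales

print(len(pairs), toy_column.tolist())
```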

In [None]:
#substitutions

def substitutions(namaCol,mini_naioRow):
    tot = nama.iloc[:,namaCol].sum()

    #shares from nama
    namaShares = [x / tot for x in nama.iloc[:,namaCol]]
    namaShares.append(1.0)
    namaShares = pd.Series(namaShares)

    salesFromMiniNaio = mini_naio.iloc[mini_naioRow,6]*correctionCoef
    invTableNamaCol=mini_naioRow
    invTableNama.iloc[:,invTableNamaCol] = namaShares * salesFromMiniNaio

substitutions(3,20)
substitutions(3,21)
substitutions(4,17)
substitutions(5,39)
substitutions(6,18)
substitutions(6,19)
substitutions(6,23)
substitutions(7,1)
substitutions(7,2)
substitutions(7,3)
substitutions(8,48)
substitutions(9,40)

In [None]:
#**********************************

# Get all rows and columns except the last one in both directions (the totals) for the IPF calculation below
data_part_for_IPF = invTableNama.iloc[:-1, 1:-1]
data_partNu =data_part_for_IPF.to_numpy()

#**********************************

In [None]:
#Numbers of the cols to be excluded
excluded_cols = [1,2,3,17,18,19,20,21,23,27,39,40,48]
#excluded_cols = [27]
# Get all column indices
all_cols = list(range(invTableNama.shape[1]))
all_cols[1:-1]
#given a row, e.g., 0
invTableNama.iloc[0,1:-1].sum()

In [None]:
#all_cols[1:-1]

In [None]:
# Get columns to modify (i.e., those NOT in exclude list)
cols_to_modify = [i for i in all_cols[1:-1] if i not in excluded_cols]
#cols_to_modify

In [None]:
#rows
new_sum=invTableNama.iloc[:,1:-1].sum(axis=1)

In [None]:
#new_sum

In [None]:
#invTableNama['Totals']

In [None]:
diff=invTableNama['Totals'] - new_sum
#diff

In [None]:
#within each row
cols_to_modify_sum = invTableNama.iloc[:, cols_to_modify].sum(axis=1)

In [None]:
#cols_to_modify_sum

In [None]:
type(cols_to_modify_sum)

In [None]:
#within each row
excluded_cols_sum = invTableNama.iloc[:, excluded_cols].sum(axis=1)

In [None]:
#excluded_cols_sum

In [None]:
#for each row
correctionsCols_to_modify = [1.0]*len(diff)
correctionsExcludedCols   = [1.0]*len(diff)
for i in range(len(diff)-1):
    diff2 = 0.0 #residual left for the excluded cols; stays 0.0 when they are not touched
    if cols_to_modify_sum[i] > 0: 
        correctionsCols_to_modify[i] = 1.0 + diff[i]/cols_to_modify_sum[i]
        if correctionsCols_to_modify[i] < 0.0:
            correctionsCols_to_modify[i] = 0.0
            diff2=diff[i]+cols_to_modify_sum[i] #if here, diff<0 and only partially compensated by cols_to_modify_sum[i]
            correctionsExcludedCols[i]=1+diff2/excluded_cols_sum[i]
    if i==26: print(diff[i],diff2,correctionsCols_to_modify[i],correctionsExcludedCols[i])

In [None]:
#for i in range(len(correctionsCols_to_modify)):
#    print(i,correctionsCols_to_modify[i])

In [None]:
#for i in range(len(correctionsExcludedCols)):
#    print(i,correctionsExcludedCols[i])

In [None]:
#APPLY CORRECTIONS
for i in range(len(diff)-1):
    for j in range(1,67):
        if j in cols_to_modify:
            invTableNama.iloc[i,j]*=correctionsCols_to_modify[i]
        else:
            invTableNama.iloc[i,j]*=correctionsExcludedCols[i]

In [None]:
# Get all rows and columns except the last one in both directions (the totals)
data_part = invTableNama.iloc[:-1, 1:-1]

#Fill last column with row totals
invTableNama.iloc[:-1, -1] = data_part.sum(axis=1)

#THE LAST COL HAS CHANGED if compared with buying values, see below activating the print

# Fill last row with column totals
invTableNama.iloc[-1, 1:-1] = data_part.sum(axis=0)

#THE LAST ROW HAS CHANGED if compared with GrossFixedCapitalFormation[col]*correctionCoef, see below activating the print


In [None]:
#sum of the last col excluding the last row
invTableNama.iloc[:-1, -1].sum()

In [None]:
#sum of the last row excluding the last col
invTableNama.iloc[-1, 1:-1].sum()

In [None]:
# Fill bottom-right cell with grand total
invTableNama.iloc[-1, -1] = data_part.values.sum()
invTableNama.iloc[-1, -1]

In [None]:
invTableNama.shape[0]

In [None]:
invTableNama.shape[1]

In [None]:
invTableNama

In [None]:
invTableNama.iloc[:-3, 1:-1] #= invTableNama.iloc[:-3, 1:-1].div(invTableNama.iloc[:-3, -1], axis=0)
for i in range(len(diff)-3):
    for j in range(1,66):
        invTableNama.iloc[i, j]/=invTableNama.iloc[i, 66]


In [None]:
pd.options.display.float_format = '{:.4f}'.format
invTableNama

In [None]:
invTableNama.iloc[:-1,:-1].shape

In [None]:
invTableNama.iloc[:-1,:-1].to_pickle("./invTableNama.xp")

<span style="color: DodgerBlue;">
==============================================================================================  
    
### create investment table, nama, with IPF version

IPF (Iterative Proportional Fitting), also known as the Furness algorithm. See:  
Michael Lahr and Louis De Mesnard. Biproportional techniques in input-output analysis: table updating and structural analysis. Economic Systems Research, 16(2):115–134, 2004. (Refers to Furness.)  
Kenneth P. Furness. Time function iteration. Traffic Engineering and Control, 7(7):458–460, 1965. (Difficult to find.)

For a direct presentation, use [_Iterative proportional fitting_ in Wikipedia](https://en.wikipedia.org/wiki/Iterative_proportional_fitting)

==============================================================================================  
</span>
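
As a compact statement of what one IPF iteration does, the two rescaling steps can be written as below. The notation is introduced here for illustration only and does not come from the notebook: $T^{(k)}$ is the current matrix, $r_i$ are the target row totals, and $c_j$ are the target column totals.

$$
T^{(k+\frac12)}_{ij} = T^{(k)}_{ij}\,\frac{r_i}{\sum_{j'} T^{(k)}_{ij'}},
\qquad
T^{(k+1)}_{ij} = T^{(k+\frac12)}_{ij}\,\frac{c_j}{\sum_{i'} T^{(k+\frac12)}_{i'j}}
$$

Both steps are multiplicative, so a cell that is zero in the seed stays zero, and a non-negative seed yields a non-negative result.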

In [None]:
np.set_printoptions(threshold=np.inf)

In [None]:
#data_partNu was saved above, after the nama column insertion but before the raw correction was applied
#data_partNu

In [None]:
len(buyingSectorsPurchases)

In [None]:
len(GrossFixedCapitalFormation)

In [None]:
correctionCoef

In [None]:
GrossFixedCapitalFormationCorrected = np.array(GrossFixedCapitalFormation)*correctionCoef

In [None]:
GrossFixedCapitalFormationCorrected.sum()
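
IPF can match both sets of targets only if the row totals and the column totals add up to the same grand total, which is presumably why the sum is inspected here. A minimal sketch of that check follows; the function names and the toy numbers are hypothetical, not taken from the notebook.

```python
import numpy as np

def marginals_agree(row_totals, col_totals, rtol=1e-9):
    """Return True when both sets of target totals share the same grand total."""
    return bool(np.isclose(np.sum(row_totals), np.sum(col_totals), rtol=rtol))

def rescale_cols_to_rows(row_totals, col_totals):
    """Scale the column totals so that their sum equals the sum of the row totals."""
    col_totals = np.asarray(col_totals, dtype=float)
    return col_totals * (np.sum(row_totals) / col_totals.sum())

rows = np.array([30.0, 70.0])   # sums to 100
cols = np.array([40.0, 50.0])   # sums to 90, so the targets are inconsistent
print(marginals_agree(rows, cols))

cols_fixed = rescale_cols_to_rows(rows, cols)
print(marginals_agree(rows, cols_fixed))
```

In the notebook the analogous comparison would be between `np.sum(buyingSectorsPurchases)` and `GrossFixedCapitalFormationCorrected.sum()`.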

In [None]:
def ipf(matrix_seed, row_totals, col_totals, max_iter=1000, tol=1e-6):
    """
    Runs the IPF algorithm to adjust the matrix to row and column totals.

    Args:
        matrix_seed (np.ndarray): initial matrix (seed).
        row_totals (np.ndarray): array of row totals.
        col_totals (np.ndarray): array of column totals.
        max_iter (int): maximum number of iterations.
        tol (float): convergence tolerance.

    Returns:
        np.ndarray: estimated matrix.
    """
    T = matrix_seed.astype(float)
    count=0
    for _ in range(max_iter):
        count+=1
        # Rescale each row so that it sums to its target row total
        row_sums = T.sum(axis=1, keepdims=True)
        np.divide(T, row_sums, out=T, where=row_sums != 0)
        T *= row_totals[:, np.newaxis]

        # Rescale each column so that it sums to its target column total
        col_sums = T.sum(axis=0, keepdims=True)
        np.divide(T, col_sums, out=T, where=col_sums != 0)
        T *= col_totals[np.newaxis, :]

        # Convergence control
        if np.all(np.abs(T.sum(axis=1) - row_totals) < tol) and \
           np.all(np.abs(T.sum(axis=0) - col_totals) < tol):
            break

    print('iterations = ',count)

    return T
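
A toy run can make the behaviour of the function above concrete. So that the sketch runs on its own, it restates the same loop compactly under the hypothetical name `ipf_sketch`; the 2×3 seed and the target totals are invented for illustration.

```python
import numpy as np

def ipf_sketch(seed, row_totals, col_totals, max_iter=1000, tol=1e-8):
    """Compact restatement of the alternating row/column rescaling, for illustration."""
    T = seed.astype(float)
    for _ in range(max_iter):
        # Row step: divide by current row sums (skipping zero rows), then scale to targets.
        rs = T.sum(axis=1, keepdims=True)
        T = np.divide(T, rs, out=np.zeros_like(T), where=rs != 0) * row_totals[:, None]
        # Column step: same idea along the other axis.
        cs = T.sum(axis=0, keepdims=True)
        T = np.divide(T, cs, out=np.zeros_like(T), where=cs != 0) * col_totals[None, :]
        # Columns are exact after the column step, so only the rows need checking.
        if np.allclose(T.sum(axis=1), row_totals, atol=tol, rtol=0):
            break
    return T

seed = np.array([[1.0, 2.0, 0.0],
                 [3.0, 4.0, 5.0]])
row_targets = np.array([40.0, 60.0])        # sums to 100
col_targets = np.array([30.0, 50.0, 20.0])  # sums to 100

fitted = ipf_sketch(seed, row_targets, col_targets)
print(fitted.round(3))
print(fitted.sum(axis=1), fitted.sum(axis=0))
```

The printed row and column sums should reproduce the targets, and the cell that was zero in the seed should still be zero in the fitted matrix.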


In [None]:
bSP = np.array(buyingSectorsPurchases)

In [None]:
np.sum(data_partNu)

In [None]:
data_part_adj = ipf(data_partNu,bSP,GrossFixedCapitalFormationCorrected,max_iter=1000, tol=1e-2)

In [None]:
np.sum(data_part_adj)

In [None]:
data_part_adj.shape

In [None]:
tot=0
for i in range(65):
    for j in range(65):
        tot+=abs(data_part_adj[i,j]-data_partNu[i,j])
tot     

The total absolute difference between the adjusted matrix and the seed is very large.

In [None]:
#row test
for i in range(65):
    print(np.sum(data_part_adj[i, :]),bSP[i],np.sum(data_part_adj[i, :])-bSP[i])

In [None]:
#col test
for j in range(65):
    print(np.sum(data_part_adj[:, j]),GrossFixedCapitalFormationCorrected[j],\
          np.sum(data_part_adj[:, j])-GrossFixedCapitalFormationCorrected[j])

In [None]:
for i in range(65):
    for j in range(65):
        if data_part_adj[i,j]<0: print('error')
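
The checks in the last few cells (total absolute difference, row test, column test, sign test) can also be done without loops. The helpers below are a hypothetical sketch on toy arrays; in the notebook the arguments would be `data_part_adj`, `data_partNu`, `bSP` and `GrossFixedCapitalFormationCorrected`.

```python
import numpy as np

def compare_tables(adjusted, seed):
    """Summarize how far `adjusted` moved from `seed` and whether it stayed non-negative."""
    adjusted = np.asarray(adjusted, dtype=float)
    seed = np.asarray(seed, dtype=float)
    return {
        "total_abs_diff": float(np.abs(adjusted - seed).sum()),
        "has_negative": bool((adjusted < 0).any()),
    }

def max_marginal_errors(adjusted, row_totals, col_totals):
    """Return the largest absolute row-total and column-total deviations."""
    adjusted = np.asarray(adjusted, dtype=float)
    row_err = np.abs(adjusted.sum(axis=1) - np.asarray(row_totals, dtype=float)).max()
    col_err = np.abs(adjusted.sum(axis=0) - np.asarray(col_totals, dtype=float)).max()
    return float(row_err), float(col_err)

adj = np.array([[1.0, 2.0], [3.0, 4.0]])
ref = np.array([[1.5, 2.0], [2.0, 4.0]])
summary = compare_tables(adj, ref)
errors = max_marginal_errors(adj, [3.0, 7.0], [4.0, 6.5])
print(summary, errors)
```

Reporting only the maximum deviation keeps the output short; the per-row printout in the cells above remains useful when a specific sector has to be inspected.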

In [None]:
#Iterative Proportional Fitting table
invTableNamaIPF = pd.DataFrame(data_part_adj)

In [None]:
invTableNamaIPF.insert(0, 'tmp', '')

In [None]:
invTableNamaIPF.columns=[mini_naio['IND_USE (Labels)']]

In [None]:
invTableNamaIPF.rename(columns={'Total': 'sectors'}, inplace=True)

In [None]:
invTableNamaIPF['sectors']=nama.index.tolist()

In [None]:
invTableNamaIPF['Totals'] = ''

In [None]:
invTableNamaIPF.loc[len(invTableNamaIPF)] = [''] * invTableNamaIPF.shape[1]

In [None]:
#Fill last column with row totals
invTableNamaIPF.iloc[:-1, -1] = data_part_adj.sum(axis=1)

# Fill last row with column totals
invTableNamaIPF.iloc[-1, 1:-1] = data_part_adj.sum(axis=0)

# grand total
invTableNamaIPF.iloc[-1,-1] = data_part_adj.sum()

In [None]:
invTableNamaIPF

In [None]:
for i in range(63): # rows 63 and 64 have zero totals, so they are skipped to avoid dividing by zero
    for j in range(1,66):
        invTableNamaIPF.iloc[i, j]/=invTableNamaIPF.iloc[i, 66]
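
The same conversion to row shares can be sketched with `DataFrame.div`, guarding zero totals instead of hard-coding which rows to skip. The frame `flows` is a hypothetical toy, and setting the shares of an all-zero row to 0 is an assumption of this sketch rather than something the notebook specifies.

```python
import numpy as np
import pandas as pd

flows = pd.DataFrame([[2.0, 6.0],
                      [0.0, 0.0],   # a row whose total is zero
                      [1.0, 3.0]], columns=["x", "y"])

row_tot = flows.sum(axis=1)

# Zero totals become NaN so the division yields NaN rather than inf or a warning;
# those NaN shares are then set to 0.
shares = flows.div(row_tot.replace(0, np.nan), axis=0).fillna(0.0)
print(shares)
```

Every row with a non-zero total then sums to 1, which is a quick check to run on the normalized table.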

In [None]:
invTableNamaIPF

In [None]:
invTableNamaIPF.iloc[:-1,:-1].shape

In [None]:
invTableNamaIPF.iloc[:-1,:-1].to_pickle("./invTableNamaIPF.xp")