# Vergabe NRW Data Wrangling / creating csv data_nrw_clean

### When this data was collected?
07 /01/2021  , see Vergabe NRW CSV file

### What does this script do?
This script analizes how what data is available in Vergabe NRW. Its intention is to search for vulnerabilities, and raise up questions about the codebook.

### Step 1: Defining the problem: What we want to see with this data?
    
During almost two years, the Federal and regional governments in Germany needed to aqquire a series of things to deal with COVID-19 pandemics. Due to factors such as lack of vendors, scarcity of material, urgency of purchesing among others, it is expected that many of these purchases are expected to have been made under extraordinary arrangements, involving exclusion from bidding, among others. But should all the purchases be made in this regime.

**We want to discover if there is red flags for mismanagement in COVID-19 measures by the government of Nordrhein-Westfallen in Germany.** To do that, we need to answer the following questions:

> 1. Are all the purchases regarding COVID-19 available ?
> 2. How many purchases were made on an extraordinary basis (no bidding, direct negotiation with seller, etc) ?
> 3. When these purchases were made?
> 4. By the time of extraordinary purchases, was there actually a lack of vendors of urgency of for this purchase?

And how can we beggin to answer those questions?

1. Are all the purchases regarding COVID-19 available ?
    - 1.1 Look at CPV Codes
    - 1.2 Compare the purchases made it here with external databases (there is another Vergabe portal)
    - 1.3 Validate the data with local data (for example, the city of Köln, which might have their own transparency portal)
    - 1.4 Validate the data of a municipality with a Fredom of Information Request
<br><br>
2. How many purchases were made on an extraordinary basis (no bidding, direct negotiation with seller, etc) ?
    - 2.1 See if the data from Vergabe NRW has information about the bidding format and type of contract
<br><br>    
3. When these purchases were made?
    - 3.1 See if the data from Vergabe NRW has information about the data of purchase 
    - 3.2 See if the data from Vergabe NRW has information about the date of contracts
<br><br>
4. By the time of extraordinary purchases, was there actually a lack of vendors of urgency of for this purchase?
    - 4.1 Cross the date information with COVID-19 cases in Germany
    - 4.2 Cross the date information with Hospital Capacity
    - 4.3 See if we find intresting information, milestones or similar information in news articles

### Step 2: Exploring the data

In this specific notebook I'll see the state of art of the data collected from Vergabe NRW and see what problems does it has. The following will be verified:



1. Is there duplicates?
2. Is there missing data?
    - what is this missing data doing in my db?
    
3. Is there encoding problem?
4. Do the cities have always the same name? Or do they have different writings?
5. Is there any information regarding costs? Are the derzeitige Werten (valores correntes)?
6. What each collumn means?
7. What each observation is?
8. Which of the above questions (Step 1) can we answer?

In [1]:
import pandas as pd
import matplotlib 
import numpy as np
import janitor

In [2]:
# This does not belong to the analysis, I'm just creating a function to help me lates:

def glimpse(df, maxvals=10, maxlen=110):
    print('Shape: ', df.shape)
    
    def pad(y):
        max_len = max([len(x) for x in y])
        return [x.ljust(max_len) for x in y]
    
    # Column Name
    toprnt = pad(df.columns.tolist())
    
    # Column Type
    toprnt = pad([toprnt[i] + ' ' + str(df.iloc[:,i].dtype) for i in range(df.shape[1])])
    
    # Num NAs
    num_nas = [df.iloc[:,i].isnull().sum() for i in range(df.shape[1])]
    num_nas_ratio = [int(round(x*100/df.shape[0])) for x in num_nas]
    num_nas_str = [str(x) + ' (' + str(y) + '%)' for x,y in zip(num_nas, num_nas_ratio)]
    max_len = max([len(x) for x in num_nas_str])
    num_nas_str = [x.rjust(max_len) for x in num_nas_str]
    toprnt = [x + ' ' + y + ' NAs' for x,y in zip(toprnt, num_nas_str)]
    
    # Separator
    toprnt = [x + ' : ' for x in toprnt]
    
    # Values
    toprnt = [toprnt[i] + ', '.join([str(y) for y in df.iloc[:min([maxvals,df.shape[0]]), i]]) for i in range(df.shape[1])]
    
    # Trim to maxlen
    toprnt = [x[:min(maxlen, len(x))] for x in toprnt]
    
    for x in toprnt:
        print(x)
        

####sec function:
def print_full(x):
    pd.set_option('display.max_rows', len(x))
    print(x)
    pd.reset_option('display.max_rows')

In [4]:
#importing scraped data
data_nrw = pd.read_csv("df_vergabe_nrw_jan2021.csv", low_memory=False)

We have already some problems in importing this data: DtypeWarning: Columns (14) have mixed types.Specify dtype option on import or set low_memory=False.

In [5]:
#Verificando as cinco primeiras colunas
data_nrw.head()

Unnamed: 0.1,Unnamed: 0,_id,created_at,updated_at,result_id,result_title,result_description,result_procedure_type,result_order_type,result_publication_date,...,result_buyer_postal_code,result_seller_name,result_seller_town,result_seller_country,result_geo_lon,result_geo_lat,result_value,result_created_at,result_updated_at,result_buyer_country
0,0,CXPNY42D0ZS,2022-01-04T23:00:18.455Z,2022-01-06T23:00:14.633Z,CXPNY42D0ZS,Öffnen/Verschließen von Türen/Toren aller Art ...,['Schüsseldienste kamen im Kalenderjahr 2021 z...,Öffentliche Ausschreibung,UVGO,,...,44139.0,,,,7.46023,51.49958,,2022-01-04T23:00:18.455Z,2022-01-06T23:00:14.633Z,
1,1,CXPNY42D474,2021-05-26T22:20:10.194Z,2021-06-08T22:19:38.411Z,,,,,,,...,,,,,,,,2021-05-26T22:20:10.194Z,2021-06-08T22:19:38.411Z,
2,2,CXPNY42D4QL,2021-05-26T22:18:23.840Z,2021-05-31T22:17:09.010Z,,,,,,,...,,,,,,,,2021-05-26T22:18:23.840Z,2021-05-31T22:17:09.010Z,
3,3,CXPNY42DR40,2020-05-11T00:12:51.144Z,2020-06-26T23:21:14.223Z,,,,,,,...,,,,,,,,2020-05-11T00:12:51.144Z,2020-06-26T23:21:14.223Z,
4,4,CXPNY42DRDN,2020-04-20T22:14:01.166Z,2020-05-19T00:33:44.632Z,CXPNY42DRDN,Videobeobachtung Dortmund Münsterstraße,"['Software, Hardware, Installations- und Monta...",Öffentliche Ausschreibung,UVGO,,...,44139.0,,,,7.46023,51.49958,,2020-04-20T22:14:01.166Z,2020-05-19T00:33:44.632Z,


What period is this data about?

In [6]:
print("min created_at", min(data_nrw.created_at), "\n max created_at",max(data_nrw.created_at),
      "\n min updated_at", min(data_nrw.updated_at), "\n max updated_at", max(data_nrw.updated_at))

min created_at 2019-12-17T14:29:56.168Z 
 max created_at 2022-01-07T02:11:51.498Z 
 min updated_at 2019-12-17T14:43:09.787Z 
 max updated_at 2022-01-07T02:12:23.505Z


### Is there duplicates?

In [8]:
glimpse(data_nrw)

Shape:  (47502, 24)
Unnamed: 0               int64        0 (0%) NAs : 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
_id                      object       0 (0%) NAs : CXPNY42D0ZS, CXPNY42D474, CXPNY42D4QL, CXPNY42DR40, CXPNY42
created_at               object       0 (0%) NAs : 2022-01-04T23:00:18.455Z, 2021-05-26T22:20:10.194Z, 2021-05
updated_at               object       0 (0%) NAs : 2022-01-06T23:00:14.633Z, 2021-06-08T22:19:38.411Z, 2021-05
result_id                object  10930 (23%) NAs : CXPNY42D0ZS, nan, nan, nan, CXPNY42DRDN, nan, CXPNY42YT80, 
result_title             object  11462 (24%) NAs : Öffnen/Verschließen von Türen/Toren aller Art im Wege der E
result_description       object  10930 (23%) NAs : ['Schüsseldienste kamen im Kalenderjahr 2021 zu insgesamt 4
result_procedure_type    object  13554 (29%) NAs : Öffentliche Ausschreibung, nan, nan, nan, Öffentliche Aussc
result_order_type        object  13554 (29%) NAs : UVGO, nan, nan, nan, UVGO, nan, OTHER, nan, UVGO, UVGO
result_publicatio

In [9]:
#Olhando qtde de linhas e colunas do meu df
data_nrw.shape

(47502, 24)

In [7]:
#Removendo duplicatas:

data_nrw = data_nrw.drop_duplicates()

If we have any duplicates, it is not due to scraping, because the number of hits specified in the API and our final DB lenght is the same (47502) - see file '1. Vergabe NRW CSV file' - so we are cheking now if there are any duplicates in the original database. We are here considering duplicates when two lines (two inputs) are 100% equal. 


In [10]:
data_nrw.shape

(47502, 24)

No complete duplicates.
Now we need to verify if we have any ID duplicated:

In [11]:
#See if we have duplicated ID:

id_test = data_nrw.groupby(by = '_id').agg({'_id':['count']})

In [12]:
id_test.shape

(47502, 1)

There are no repetead IDs.
Now we need to see if the `result_id` is always equal to the `_id` . I already know that the main answer is no, since I have NaNs in `result_id` and I don't have them in `_id`, so I'll first drop the NaN in `result_id` and then check if the Ids might be different.

In [13]:
x = data_nrw
x = x[x['result_id'].notna()]

x.isnull().sum() #we do not have empty columns in result_id

#See if they are the same
x['_id'].equals(x['result_id'])

True

**Answer: We do not have duplicate entries** 
<br><br>
### 2. Is there missing data?

As we saw above, we do have both NaN data and empty Data:

**Missing observations without excluding any data (original Dataset)**

- The information missing the most is about the seller: te columns `result_seller_name`, `result_seller_town`, `result_seller_country`, `result_geo_lon` and `result_value` have all 97% of missing data.<br><br>
- The columns `result_buyer_country`comes in seccond place with 94% of missing data. A possibility for it is that maybe all purchases are made in Germany. Need to dig into this information later.<br><br>
- The column `result_publication_date` has 78% of the missing data. <br><br>
- The columns `result_id`, `result_title`, `result_description`, `result_procedure_type`, `result_order_type`, `result_cpv_codes`, `result_buyer_name`, `result_buyer_address`, `result_buyer_town`, `result_buyer_postal_code`, `result_geo_lon` and `result_geo_lat` have from 23% to 29% of missing data.

However, if we consider that the valid bidinds are the ones where `result_id`!= NaN, then we have the following scenario:



In [14]:
y = data_nrw
y = y[y['result_id'].notna()]

glimpse(y)

Shape:  (36572, 24)
Unnamed: 0               int64        0 (0%) NAs : 0, 4, 6, 7, 8, 9, 10, 11, 12, 13
_id                      object       0 (0%) NAs : CXPNY42D0ZS, CXPNY42DRDN, CXPNY42YT80, CXPNY42YWAX, CXPNY43
created_at               object       0 (0%) NAs : 2022-01-04T23:00:18.455Z, 2020-04-20T22:14:01.166Z, 2019-12
updated_at               object       0 (0%) NAs : 2022-01-06T23:00:14.633Z, 2020-05-19T00:33:44.632Z, 2020-01
result_id                object       0 (0%) NAs : CXPNY42D0ZS, CXPNY42DRDN, CXPNY42YT80, CXPNY42YWAX, CXPNY43
result_title             object     532 (1%) NAs : Öffnen/Verschließen von Türen/Toren aller Art im Wege der E
result_description       object       0 (0%) NAs : ['Schüsseldienste kamen im Kalenderjahr 2021 zu insgesamt 4
result_procedure_type    object    2624 (7%) NAs : Öffentliche Ausschreibung, Öffentliche Ausschreibung, Öffen
result_order_type        object    2624 (7%) NAs : UVGO, UVGO, OTHER, nan, UVGO, UVGO, UVGO, UVGO, UVGO, UVGO
result_pu

**Missing observations whithin valid result_id:**

- 96% of missing data in columns `result_seller_name`, `result_seller_town`, `result_seller_country`, `result_value`, and 93% in `result_buyer_country` <br><br>
- 71% of missing data in column `result_publication_date`<br><br>
- 7% of missing data in columns `result_procedure_type` and `result_order_type` <br><br>

Therefore, we will create a **new working dataframe with which we will work up now:** 

In [15]:
data_nrw_valid_ids = y

#Exportando em CSV
data_nrw_valid_ids.to_csv(r'data_nrw_valid_ids.csv')

## 3 Is there any encoding problem? / 4. Do the cities have always the same name? Or do they have different writings?

We have two ways to solve this: lat lon or city name. 

Let's clean the city names to see how many unique values do we actually have:

In [29]:
#creating a function to clean german names:

def clean_german(df, x):
    return (
        df
        .replace({x: {'\W': ' '}}, regex = True)
        .replace({x: {'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'β': 'ss'}},  regex = True)
        .replace({x: {'\s{2,}': ''}}) #more than one whitespace
        .replace({x: {'^\s+': ''}})   #whitespace at the beggining
        .replace({x: {'\s+$': ''}})   #whitespace at the end
    )

In [33]:
#creating a new column with cleaned and fixed names:

data_nrw_clean = data_nrw_valid_ids

# Cria nova coluna "clean buyer town" e tranforma em lowecase
data_nrw_clean['clean_town'] = data_nrw_clean['result_buyer_town'].str.lower()

data_nrw_clean = clean_german(data_nrw_clean, 'clean_town')

#other required replacements:
data_nrw_clean = data_nrw_clean.replace({'clean_town': {'schleiden / 53937' : 'schleiden', '59192' :  'bergkamen', 
                                             '50259 pulheim' : 'pulheim',
                                             'stadt pulheim' : 'pulheim',
                                             'euv stadtbetrieb castrop rauxel aoer' : 'castrop rauxel',
                                              'schleiden   53937': 'schleiden',
                                              'gronau  westf  ': 'gronau'}}, regex = True)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data_nrw_clean['clean_town'] = data_nrw_clean['result_buyer_town'].str.lower()


In [34]:
data_nrw_clean.clean_town.unique()

array(['dortmund', 'herne', 'duesseldorf', 'essen', 'koeln', 'bonn',
       'gummersbach', 'aachen', 'moenchengladbach', 'soest', 'bochum',
       'schwelm', 'wuppertal', 'gelsenkirchen', 'siegen', 'siegburg',
       'sankt augustin', 'paderborn', 'lemgo', 'ratingen', 'krefeld',
       'schwerte', 'duisburg', 'hamm', 'muenster', 'steinfurt', 'werl',
       'wesel', 'hagen', 'castrop rauxel', 'meschede', 'olpe',
       'warburg scherfede', 'huerth', 'horn bad meinberg', 'detmold',
       'willich', 'emmerich am rhein', 'oberhausen', 'dueren', 'monschau',
       'heinsberg', 'selm', 'herford', 'billerbeck', 'rosendahl',
       'hilchenbach', 'iserlohn', 'attendorn', 'froendenberg',
       'remscheid', 'euskirchen', 'geldern', 'bielefeld', 'juelich',
       'bergisch gladbach', 'hattingen', 'kerpen', 'recklinghausen',
       'arnsberg', 'minden', 'coesfeld', 'osnabrueck', 'dillenburg',
       'netphen', 'leverkusen', 'berlin', 'werne', 'unna', 'boenen',
       'holzwickede', 'schhwelm', '

Ok, now I need to see in the clean Data Frame which municipalities are not mentioned:

Ok, we have here a first problem, since according to wikipedia NRW has 396 Städten. So some municipalities are not listed here. Let's continuing checking the funny characters and then we will see which municipalities are missing and which are here.

In [None]:
#openning oficial names from  cities in NRW:
stadt_nrw_oficial = pd.read_csv("georef-germany-gemeinde.csv", sep=';').clean_names()

#cleaning names:

stadt_nrw_oficial['clean_town'] = stadt_nrw_oficial['gemeinde_name_short_'].str.lower()

stadt_nrw_oficial = clean_german(stadt_nrw_oficial, 'clean_town')
stadt_nrw_oficial['clean_town'] = stadt_nrw_oficial['clean_town'].replace({'\s{2,}': ' '}).str.strip()


#creating the series in alphabetical order
oficial_names = stadt_nrw_oficial['clean_town'].sort_values(ascending=True)

In [None]:
oficial_names.unique()

Let's look at the names:

In [None]:
stadt_nrw_oficial['gemeinde_name_short_'].unique()

In [None]:
#cleaning names:

stadt_nrw_oficial['gemeinde_name_short_'] = stadt_nrw_oficial['gemeinde_name_short_'].str.lower()

dict_changes = {'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'β': 'ss'}
    
stadt_nrw_oficial.replace({'gemeinde_name_short_': dict_changes }, regex = True, inplace = True)

#creating the series in alphabetical order
oficial_names = stadt_nrw_oficial['gemeinde_name_short_'].sort_values(ascending=True)

First, see if there is any town name in data_nrw_clean that is not at the oficial names:

In [None]:
df1 = data_nrw_clean[~data_nrw_clean['result_buyer_town'].isin(oficial_names)]
df1.result_buyer_town.unique()

In [None]:
#using the function I created above
print_full(oficial_names)

In [None]:
dict_new_changes = {'warburg-scherfede' : 'warburg', 
                    'froendenberg$' : 'froendenberg/ruhr',
                    'schhwelm' : 'schwelm',
                    'stadt pulheim' : 'pulheim',
                    'emmerich$':'emmerich am rhein', 
                    'euv stadtbetrieb castrop-rauxel aoer' :'castrop-rauxel',
                    'bergisch-gladbach' : 'bergisch gladbach',
                    'bergsich gladbach' : 'bergisch gladbach',
                    'hennef$' : 'hennef (sieg)',
                    'langenfeld' : 'langenfeld (rhld.)',
                    'leichlingen' : 'leichlingen (rhld.)',
                    'troisdorf-sieglar' : 'troisdorf',
                    'swisttal-miel' : 'swisttal',
                    'bruehl-ost' : 'bruehl',
                    'bielefdeld': 'bielefeld',
                    'gronau$' : 'gronau (westf.)',
                    'delbrueck-westenholz': 'delbrueck',
                    'sundern' : 'sundern (sauerland)',
                    'stolberg' : 'stolberg (rhld.)',
                    'stolbeerg' : 'stolberg (rhld.)',
                    'nettersheim-zingsheim': 'nettersheim',
                    'uebach palenberg' : 'uebach-palenberg'}



#list of municipalities that don't belong to NRW:
rem = ['osnabrueck', 'dillenburg', 'berlin' ]

#cleaning:
data_nrw_clean.replace({'result_buyer_town': dict_new_changes }, regex = True, inplace = True)

#removing:
data_nrw_clean = data_nrw_clean[~data_nrw_clean['result_buyer_town'].isin(rem)]


# cheking again non-matching names:
df1 = oficial_names[~data_nrw_clean['result_buyer_town'].isin(oficial_names)]
df1.result_buyer_town.unique()

Ok, now all the names are the correct speling names. Let's save this clean DB


In [None]:
data_nrw_clean.to_csv('cleaned_df_vergabe_nrw_jan2021.csv')

In [None]:
data_nrw_clean = pd.read_csv('cleaned_df_vergabe_nrw_jan2021.csv')

### This is the DF that we are going to work from now on!

### 4.1 What cities are missing?

In [None]:
cities_with_information = data_nrw_clean['result_buyer_town'].sort_values(ascending=True)
cities_with_information = cities_with_information.unique()

df_missing = stadt_nrw_oficial[~stadt_nrw_oficial['gemeinde_name_short_'].isin(cities_with_information)]
missing_cities = df_missing.gemeinde_name_short_.unique()

In [None]:
len(missing_cities) #181
len(oficial_names) #396
len(cities_with_information) #216 = 215 cities + nan (empty spaces)

**Conclusion**

So, we have information over 215 cities and 181 cities are missing. The list of the missing cities is shown bellow. A follow-up question: does this cities have something in common ? Or are they not procuring?

In [None]:
 missing_cities

In [None]:
missing_cities = pd.DataFrame(missing_cities)
missing_cities.to_csv('missing_cities.csv')

In [None]:
data_nrw_clean['result_buyer_town'].isna().sum()

We have 10,930 entries with no information regarding buyer town. Let's take a look on it 

In [None]:
glimpse(stadt_nrw_oficial)

In [None]:
glimpse(data_nrw_clean)

In [None]:
#removing the naresult_id values
data_nrw_clean = data_nrw_clean[data_nrw_clean.result_id.notnull()]

In [None]:
data_nrw_clean.to_csv('cleaned_df_vergabe_nrw_jan2021.csv')

### THIS IS OUR DF RIGHT NOW ^^^^^^^^^^^^^^^^ 

**Finding** 

Mara just checked 'kalkar' and this city is publishing at their own website and at the on service.bund.de . We need to check if other municipalities are also publishing there!

## 5.Is there any information regarding costs? Are the derzeitige Werten (valores correntes)?


We have just one column that mention value ( `result_value` ) and this column has 97% of Nans considering already non-nan `result_id` . Let's see how this column looks like when we have the values:

In [None]:
pd.read_csv('cleaned_df_vergabe_nrw_jan2021.csv')

In [None]:
non_empy_value = data_nrw_clean[data_nrw_clean.result_value.notnull()]
non_empy_value[1:5]

In [None]:
non_empy_value.shape

Only 1436 values are non-nan

In [None]:
values = non_empy_value.result_value
values.dtypes # dtype('float64')

print('minimum value:', min(values), ' | max value', max(values))

In [None]:
#median:
values.median()

In [None]:
print_full(values)

We have many cases with value == 1.00 or value == 0.01... let's check unporobable values:

In [None]:
df_values = pd.DataFrame({'values' : values})
df_values.value_counts()

We have 131 entries with a 0.01 value, 77 with 1.00 value and 5 with 0.10 value.
Let's see with values <2, which 

In [None]:
#df1 = non_empy_value.query("result_value < 2")
df1 = data_nrw_clean[['result_value', 'result_buyer_town']]
df1['valiable_value'] = np.where(df1['result_value'] < 2, '0', '1')
df1.drop("result_value", axis=1, inplace=True)
#print_full(df1.value_counts())
print_full(df1.value_counts())

In [None]:
1436 - 131 -77 -5

So far, it looks that have or not a value looks random

In [None]:
data_nrw_clean.shape

In [None]:
glimpse(data_nrw_clean)

In [None]:
data_nrw_clean.query('result_id == "CXPNY4ZDEGR"')

In [None]:
data_nrw_clean['result_buyer_town'].value_counts()

In [None]:
data_nrw_clean['result_description'].unique()

In [None]:
data_nrw_clean['result_seller_name'].unique()

We found two different documentations for this data. In none of those it specifies all the information retrieved in the data, but only the columns *created_at* and *updated_at* . All the other explanations are so far a guess based on our observation. 

| Column Name | Description |
| ----------- | ----------- |
|'Unnamed: 0',| Index created while retaining the data, only information that is not original from the Vergabe NRW|
|'_id'| Id of input |
|'created_at| indicates when the document was first indexed. This value is also returned via the HTTP response header Date when retrieving a single document|
|'updated_at'|indicates when the document was *last updated*. This does not necessarily mean that the content has changed, only that it was last updated. This value is also returned via the HTTP response header Last-Modified when retrieving a single document. |
|'result_id| Id of a valid result. It has nan values and, when its non-nan, the value is the same as the '_id' |
|'result_title'| Short description of the object of the tender or what is that tender for |
|'result_description'| Description of the purchases / services related to the tender. |
|'result_procedure_type'| Assume one of the following values: 'Ex post Veröffentlichung (§ 30 Abs. 1)','Ex ante Veröffentlichung (Binnenmarktrelevanz)', 'Ex post Veröffentlichung', 'Ex post Veröffentlichung (§ 19 Abs.2)', 'Verhandlungsvergabe mit öffentlichem Teilnahmewettbewerb', 'Beschränkte Ausschreibung mit öffentlichem Teilnahmewettbewerb', 'Ex post Veröffentlichung (Binnenmarktrelevanz)', 'Ex post Veröffentlichung (§ 20 Abs.3)','Ex ante Veröffentlichung', 'Beschränkte Ausschreibung mit Teilnahmewettbewerb','Teilnahmewettbewerb', 'Ex ante Veröffentlichung (§ 19 Abs. 5)' |
|'result_order_type'| assume values 'UVGO', 'OTHER', 'VOB', 'VOL' or empty|
|'result_publication_date'| Publication of the tender (?) |
|'result_cpv_codes'| CPV codes related to te tender. It can have more than one. |
|'result_buyer_name'| Office for whom the tender is destinated, for ex: 'Polizeipräsidium Dortmund - ZA 13.1 - Zentrale Vergabestelle' |
|'result_buyer_address'| Address of the buyer's office |
|'result_buyer_town'| Town where the buyer's office is located|
|'result_buyer_postal_code'| Postal code of the buyer's office |
|'result_seller_name'| Company which owned the tender|
|'result_seller_town'| City where the winning company is located|
|'result_seller_country' | Country where the winning company is located |
|'result_geo_lon' | We are not sure if this lon refers to the buyer or seller|
|'result_geo_lat'| We are not sure if this lat refers to the buyer or seller|
|'result_value'| Value of the tender |
|'result_created_at'| data of creating of result (its non-nan when result_id is non-nan) | 
|'result_updated_at'| data of result update |
|'result_buyer_country'| Country where buyer's office is located|




Código para converter esse script para html:

"/Users/user/Documents/Scripts e notebooks/Python notebooks/COVID19_DE_BR/Germany/2_vergabe_nrw_data_wrangling.ipynb"


jupyter nbconvert --to html --template hidecode "/Users/user/Documents/Scripts e notebooks/Python notebooks/COVID19_DE_BR/Germany/2_vergabe_nrw_data_wrangling.ipynb"

jupyter nbconvert "/Users/user/Documents/Scripts e notebooks/Python notebooks/COVID19_DE_BR/Germany/2_vergabe_nrw_data_wrangling.ipynb"

 --no-input
