# NLTK BOOK. Chapter 2.

In [None]:
%pprint

In [None]:
import nltk
from nltk.corpus import state_union, brown, stopwords, gutenberg, wordnet as wn, nps_chat, udhr, names
from nltk import Text, FreqDist, ConditionalFreqDist, ConcordanceIndex

from matplotlib import pyplot
import random

We define some variables that will be used in several exercises in this chapter.

In [None]:
punctuation = ". , : ; ! ? - < > ' ''  ` `` -- .' ,' ?' !' * --'".split() +\
'," ." ?" !" "'.split()
stops = nltk.corpus.stopwords.words('english')
stops_punct = stops + punctuation

### Exercise 4.

Read in the texts of the State of the Union addresses, using the state_union corpus reader. Count occurrences of men, women, and people in each document. What has happened to the usage of these words over time?

In [None]:
su_cfd = ConditionalFreqDist(
    (fileid[:4], target)
    for fileid in state_union.fileids()
    for word in state_union.words(fileid)
    for target in ['men', 'women', 'people']
    if word.lower() == target
)
su_cfd.tabulate()

"People" has always been used more than "men" and "women".

Before the late 1970s, "men" was slightly more frequent than "women", since it was sometimes used in place of "people". This is especially noticeable in President Johnson's addresses from 1965 to 1967.

From 1978 onward, with few exceptions, "women" is used more than "men", as it is mentioned in the context of discrimination.

The small uptick in "men" and "women" (relative to "people") in 2004-2006, during George W. Bush's presidency, could reflect both the president's linguistic preferences and the Iraq war, since "our men and women" is used several times to refer to the soldiers fighting in Iraq.

### Exercise 5

Investigate the holonym-meronym relations for some nouns. Remember that there are three kinds of holonym-meronym relation, so you need to use: member_meronyms(), part_meronyms(), substance_meronyms(), member_holonyms(), part_holonyms(), and substance_holonyms().

In [None]:
my_synsets = [wn.synset('nose.n.01'), wn.synset('telephone.n.01'), wn.synset('tree.n.01'), wn.synset('water.n.01')]

for synset in my_synsets:
    print(synset.name(), '\t', synset.definition())
    print('is member of {} '.format(synset.member_holonyms()))
    print('its members are {}'.format(synset.member_meronyms()))
    print('is part of {} '.format(synset.part_holonyms()))
    print('its parts are {}'.format(synset.part_meronyms()))
    print('is as a substance used in {} '.format(synset.substance_holonyms()))
    print('is made of the following substances {}'.format(synset.substance_meronyms()))
    print('\n')

### Exercise 7

According to Strunk and White's Elements of Style, the word however, used at the start of a sentence, means "in whatever way" or "to whatever extent", and not "nevertheless". They give this example of correct usage: However you advise him, he will probably do as he thinks best. (http://www.bartleby.com/141/strunk3.html) Use the concordance tool to study actual usage of this word in the various texts we have been considering. See also the LanguageLog posting "Fossilized prejudices about 'however'" at http://itre.cis.upenn.edu/~myl/languagelog/archives/001913.html

In [None]:
for category in brown.categories():
    text = brown.words(categories=category)
    print("Category: {}".format(category))
    # print_concordance() prints its own output and returns None
    ConcordanceIndex(text).print_concordance("However", width=200, lines=100)
    print('\n')
        

NLTK's `concordance` method is case-insensitive, so I use the ConcordanceIndex class instead, which does allow retrieving concordances only for the capitalized form, that is, only for cases where "however" begins a sentence.

Of the more than 170 sentence-initial uses of "however" in the Brown Corpus, only 8, in my opinion, could be replaced with "to whatever extent", which suggests that this usage is not very frequent.
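
The effect of case sensitivity on the search can be illustrated without NLTK. The following is a minimal sketch on a toy token list; `keyword_in_context` is a hypothetical helper written for this note, not part of NLTK, and it only mimics the idea of a concordance.

```python
def keyword_in_context(tokens, target, window=3, case_sensitive=True):
    """Return (left, match, right) context tuples for each hit of target.

    Hypothetical helper for illustration only; not an NLTK function.
    """
    hits = []
    for i, token in enumerate(tokens):
        if case_sensitive:
            same = token == target
        else:
            same = token.lower() == target.lower()
        if same:
            left = ' '.join(tokens[max(0, i - window):i])
            right = ' '.join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


# Toy data: one sentence-initial "However" and one mid-sentence "however"
toy_tokens = ("However you advise him , he will do as he likes . "
              "He is , however , polite .").split()

strict = keyword_in_context(toy_tokens, "However")
loose = keyword_in_context(toy_tokens, "However", case_sensitive=False)
print(len(strict), len(loose))
```

With a case-sensitive match only the capitalized, sentence-initial token is found; the case-insensitive match also picks up the mid-sentence occurrence.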

### Exercise 8.

Define a conditional frequency distribution over the Names corpus that allows you to see which initial letters are more frequent for males vs. females.

In [None]:
for fileid in names.fileids():
    print("There are {} {} names".format(len(names.words(fileid)), fileid[:-4]))

gender_cfd = ConditionalFreqDist(
          (fileid[:-4], first_name[0])
          for fileid in names.fileids()
          for first_name in names.words(fileid)              
)
gender_cfd.tabulate()

There are more female than male names in the corpus, so to compare the frequencies properly we should first normalize them. Even so, at a glance it is clear that the share of names starting with "H", "O", "Q", "T", "U", "W", "X", "Y", "Z" is higher among male names than among female names.
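
One way to normalize is to turn each gender's counts into shares of that gender's total. The sketch below uses `collections.Counter` on invented toy name lists (they are not taken from the Names corpus), and `initial_letter_shares` is a hypothetical helper name.

```python
from collections import Counter


def initial_letter_shares(name_list):
    """Map each initial letter to its share (0 to 1) of the names in name_list."""
    counts = Counter(name[0].upper() for name in name_list if name)
    total = sum(counts.values())
    return {letter: count / total for letter, count in counts.items()}


# Toy lists of different sizes, invented for illustration
toy_female = ["Anna", "Alice", "Beth", "Helen"]
toy_male = ["Henry", "Hugo"]

female_shares = initial_letter_shares(toy_female)
male_shares = initial_letter_shares(toy_male)
print(female_shares["H"], male_shares["H"])
```

Because each share is divided by its own list's size, the two genders can be compared directly even though the lists differ in length.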

### Exercise 9

Pick a pair of texts and study the differences between them, in terms of vocabulary, vocabulary richness, genre, etc. Can you find pairs of words which have quite different meanings across the two texts, such as monstrous in Moby Dick and in Sense and Sensibility?

In [None]:
files = ['chesterton-thursday.txt', 'carroll-alice.txt']
texts = [Text(gutenberg.words(file), file) for file in files]
words = "rich poor".split()
row_format = '{:<25} {:<10} {:<10} {:<10} {:<10}'

def text_richness(text):
    tokens = len(text)
    types = len(set(text))
    lexical_diversity = round(types / tokens, 2)
    # Count the tokens that are stopwords
    stops_num = 0
    for word in text:
        if word in stops:
            stops_num += 1
    stops_part = round(stops_num / tokens, 2)
    return tokens, types, lexical_diversity, stops_part

def get_dist(text, stops=None):
    text = [w.lower() for w in text]
    if stops:
        # Filter with the list passed in, not the global stops_punct
        excluded = set(stops)
        text = [w for w in text if w not in excluded]
    fdist = FreqDist(text)
    return fdist

def get_collocations(text):
    return text.collocations(35)

def get_concordance(word, text):
    return text.concordance(word)
    
    
print(row_format.format('', 'tokens', 'types', 'diversity', 'stopwords'))

for text in texts:
    tokens, types, lexical_diversity, stops_part = text_richness(text)
    print(row_format.format(text.name, tokens, types, lexical_diversity, stops_part))

print('\n\n')

for text in texts:
    print("Collocations: ", text.name)
    get_collocations(text)  # collocations() prints its own output and returns None
    print('\n\n')

for text in texts:
    print(get_dist(text, stops + punctuation).most_common(50))
    print('\n\n')

for word in words:
    for text in texts:
        print(word, ': ', text.name)
        get_concordance(word, text)  # concordance() prints its own output and returns None
        print('\n\n')

I compared Chesterton's "The Man Who Was Thursday" with Carroll's "Alice in Wonderland".

The two texts have nearly identical lexical diversity scores (0.1 and 0.09), and the proportion of each text made up of stopwords is also almost the same (0.39 and 0.38).

The collocations of "The Man Who Was Thursday" suggest that it could be a detective novel ("Inspector Ratcliffe", "Scotland Yard", "police station"). The collocations of "Alice" suggest that many of the characters are animals ("Mock Turtle", "March Hare", etc.), so it is probably a fantasy tale. Some collocations also seem to indicate that the book contains a lot of dialogue ("said Alice", "trembling voice", "dead silence", "yer honor", "Alice replied").

Finally, I looked at the concordances of "rich" and "poor" in both texts. In Chesterton's text both words are used in relation to the economic situation of a person or entity ("rich man", "the poor have been rebels") as well as figuratively: "rich" in the sense of "abundant" or "sumptuous" ("rich atmosphere") and "poor" in the sense of "unfortunate" ("poor old Colonel"). In Carroll's book material wealth does not seem to matter much. "Rich" is used only once, to describe the soup, while "poor" is used 27 times, almost always to express compassion ("poor Alice", "poor little things", "my poor hands").

### Exercise 10

Read the BBC News article: UK's Vicky Pollards 'left behind' http://news.bbc.co.uk/1/hi/education/6173441.stm. The article gives the following statistic about teen language: "the top 20 words used, including yeah, no, but and like, account for around a third of all words." How many word types account for a third of all word tokens, for a variety of text sources? What do you conclude about this statistic? Read more about this on LanguageLog, at http://itre.cis.upenn.edu/~myl/languagelog/archives/003993.html.

In [None]:
categories = ['adventure', 'belles_lettres', 'editorial', 'fiction', 'government', 'hobbies',
'humor', 'learned', 'lore', 'mystery', 'news', 'religion', 'reviews', 'romance',
'science_fiction']


def types_in_part(fdist, fraction):
    # Number of tokens corresponding to the given fraction (0-1) of all tokens
    mark = fdist.N() * fraction
    tokens = 0
    for count, (word, freq) in enumerate(fdist.most_common(), start=1):
        tokens += freq
        if tokens >= mark:
            return count
    # Only reached for an empty distribution
    return len(fdist)

brown_cfdist = ConditionalFreqDist(
        (category, word.lower())
        for category in categories
        for word in brown.words(categories=category)
        if word.lower() not in stops_punct
    )

for category in brown_cfdist:
    length = len(brown_cfdist[category])
    types = types_in_part(brown_cfdist[category], 1/3)
    print("Category: {}. \nNumber of types in category: {}. "
          "\nWord types accounting for 1/3 of tokens: {}."
          "\n% of types accounting for 1/3 of tokens: {}.\n"
          .format(category, length, types, round(types / length * 100, 2)))
    

I think that to compare these numbers we should look at the percentage, not the absolute number, of types that make up 1/3 of the tokens.

It seems that "science fiction", "humor" and "reviews" are the categories in which the most common words account for the smallest share of the total number of tokens. For now, I cannot think of an explanation.

### Exercise 13

What percentage of noun synsets have no hyponyms? You can get all noun synsets using wn.all_synsets('n').

In [None]:
all_sets = list(wn.all_synsets('n'))
hypless = 0

for synset in all_sets:
    if not synset.hyponyms():
        hypless += 1

print('{} percent of noun synsets have no hyponyms.'.format(round(hypless / len(all_sets) * 100, 2)))

### Exercise 14

Define a function supergloss(s) that takes a synset s as its argument and returns a string consisting of the concatenation of the definition of s, and the definitions of all the hypernyms and hyponyms of s.

In [None]:
def supergloss(s):
    data = []
    data.append(s.definition())
    for nyms in [s.hypernyms(), s.hyponyms()]:
        for synset in nyms:
            data.append(synset.definition())
    return '\n '.join(data)

print(supergloss(wn.synset('table.n.01')))
    

### Exercise 15

Write a program to find all words that occur at least three times in the Brown Corpus.

In [None]:
def word_occurrence(text, min_freq):
    fdist = FreqDist(w.lower() for w in text)
    # Keep the word types that reach the minimum frequency
    return sorted(w for w in fdist if fdist[w] >= min_freq)

brown_3 = word_occurrence(brown.words(), 3)
print("There are {} words that occur at least 3 times in the Brown corpus.".format(len(brown_3)))

### Exercise 16

Write a program to generate a table of lexical diversity scores (i.e. token/type ratios), as we saw in 1.1. Include the full set of Brown Corpus genres (nltk.corpus.brown.categories()). Which genre has the lowest diversity (greatest number of tokens per type)? Is this what you would have expected?

In [None]:
for category in brown.categories():
    words = brown.words(categories=category)
    # Type/token ratio: a lower value means lower lexical diversity
    lexical_diversity = len(set(words)) / len(words)
    print(category, round(lexical_diversity, 3), len(words))

The three categories with the lowest lexical diversity are "learned", "belles lettres" and "government". For "government" this could be explained by the fact that it is a very specific domain that almost entirely excludes vocabulary typical of colloquial language, journalism, etc. For "belles lettres" and "learned" the only explanation I can think of is that they are longer than the rest of the Brown subcorpora (a text of 10,000 words would have lower lexical diversity than one of 100). The diversity in the "science fiction", "reviews" and "religion" categories could be explained in the same way.
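
The dependence of the type/token ratio on text length can be seen on a toy example. The sketch below also shows one possible way to reduce the effect: averaging the ratio over consecutive segments of equal size. This is a swapped-in technique, not something the exercise asks for, and `mean_segment_ttr` is a hypothetical helper name.

```python
def type_token_ratio(tokens):
    """Number of distinct tokens divided by the total number of tokens."""
    return len(set(tokens)) / len(tokens)


def mean_segment_ttr(tokens, segment_size):
    """Average the type/token ratio over consecutive, equally sized segments.

    A trailing segment shorter than segment_size is ignored.
    """
    segments = [tokens[i:i + segment_size]
                for i in range(0, len(tokens) - segment_size + 1, segment_size)]
    if not segments:
        raise ValueError("text is shorter than one segment")
    return sum(type_token_ratio(s) for s in segments) / len(segments)


# Toy text: the same 11-token sentence repeated 50 times
toy_tokens = "the cat saw the dog and the dog saw the cat".split() * 50

print(type_token_ratio(toy_tokens[:11]))  # ratio for a short sample
print(type_token_ratio(toy_tokens))       # ratio for the whole, longer text
print(mean_segment_ttr(toy_tokens, 11))   # length-controlled ratio
```

The vocabulary is the same in the short sample and in the full text, yet the plain ratio drops sharply as the text grows, while the segment average stays put.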

### Exercise 17

Write a function that finds the 50 most frequently occurring words of a text that are not stopwords.

In [None]:
def most_frequent_50(text):
    text = [word for word in text if word.lower() not in stops]
    fdist = FreqDist(text)
    return fdist.most_common(50)

most_frequent_50(brown.words(categories='fiction'))

### Exercise 18

Write a program to print the 50 most frequent bigrams (pairs of adjacent words) of a text, omitting bigrams that contain stopwords.

In [None]:
def most_frequent_bigrams(sents, num):
    bigrams = []
    for sent in sents:
        bigrams += list(nltk.bigrams(sent))
    bigrams = [(w_1, w_2) for (w_1, w_2) in bigrams 
               if w_1.lower() not in stops_punct 
               and w_2.lower() not in stops_punct]
    fdist = FreqDist(bigrams)
    return fdist.most_common(num)

most_frequent_bigrams(brown.sents(categories='fiction'), 50)

Besides the stopwords, I also omitted punctuation marks.

Instead of passing the function a list of words, I pass it a list of sentences, which avoids counting as a bigram the combination of the last word of one sentence and the first word of the next.
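
The difference between the two approaches shows up on a two-sentence toy example. This is a plain-Python sketch using `zip` rather than `nltk.bigrams`; both function names are hypothetical.

```python
def bigrams_within_sentences(sentences):
    """Collect adjacent word pairs without crossing sentence boundaries."""
    pairs = []
    for sentence in sentences:
        pairs.extend(zip(sentence, sentence[1:]))
    return pairs


def bigrams_over_flat_text(sentences):
    """Collect adjacent word pairs after flattening all sentences into one list."""
    flat = [word for sentence in sentences for word in sentence]
    return list(zip(flat, flat[1:]))


toy_sents = [["Dogs", "bark"], ["Cats", "purr"]]

within = bigrams_within_sentences(toy_sents)
flat = bigrams_over_flat_text(toy_sents)
print(within)
print(flat)
```

The flattened version contains the extra pair formed by the last word of the first sentence and the first word of the second, which is exactly the spurious bigram the sentence-based version avoids.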

### Exercise 19

Write a program to create a table of word frequencies by genre, like the one given in 1 for modals. Choose your own words and try to find words whose presence (or absence) is typical of a genre. Discuss your findings.

In [None]:
my_words = ["can't", "cannot"]
categories = brown.categories()

cfd = ConditionalFreqDist(
    (category, target)
    for category in categories
    for word in brown.words(categories=category)
    for target in my_words
    if word.lower() == target
)

cfd.tabulate()

We compared the use of "can't" and "cannot" in texts from different categories. As expected, "can't" is more common than "cannot" in the categories that correspond to novels, since these often imitate colloquial language. In the news, "can't" is used as often as "cannot", which could be attributed to the abundance of direct quotations in news text. In the remaining categories "cannot" is more frequent.

### Exercise 20

Write a function word_freq() that takes a word and the name of a section of the Brown Corpus as arguments, and computes the frequency of the word in that section of the corpus.

In [None]:
def word_freq(word, section):
    text = brown.words(categories=section)
    text = [w.lower() for w in text]
    fdist = FreqDist(text)
    return fdist[word]

word_freq('my', 'fiction')

### Exercise 22

Define a function hedge(text) which processes a text and produces a new version with the word 'like' between every third word.

In [None]:
def hedge(text):
    new_text = []
    order = 0
    for word in text:
        new_text.append(word)
        order += 1
        if order == 3:
            if word not in punctuation:
                new_text.append('like')
                order = 0
            else:
                order = 2
    return new_text

hedge(brown.words(categories='fiction'))[:70]

### Exercise 23

Let f(w) be the frequency of a word w in free text. Suppose that all the words of a text are ranked according to their frequency, with the most frequent word first. Zipf's law states that the frequency of a word type is inversely proportional to its rank (i.e. f × r = k, for some constant k). For example, the 50th most common word type should occur three times as frequently as the 150th most common word type.

a. Write a function to process a large text and plot word frequency against word rank using pylab.plot. Do you confirm Zipf's law? (Hint: it helps to use a logarithmic scale). What is going on at the extreme ends of the plotted line?

In [None]:
def plot_rank_freq(text):
    text = [w.lower() for w in text]
    fdist = FreqDist(text)
    word_freq = fdist.most_common()
    ranks, freqs = list(), list()
    for count, tupla in enumerate(word_freq):
        ranks.append(count+1)
        freqs.append(tupla[1])
    pyplot.plot(ranks, freqs)
    pyplot.title('Word rank against word frequency')
    pyplot.xlabel('Rank')
    pyplot.ylabel('Frequency')
    pyplot.yscale('log')
    pyplot.xscale('log')
    pyplot.show()

plot_rank_freq(brown.words())

Zipf's law does not hold at the extremes. At the top, the most frequent types rarely match the exact ratios the law predicts, such as the most common type being exactly twice as frequent as the second. At the other extreme, type number 25000, type number 50000 and every type between and after them will have a frequency of 1.
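
Besides inspecting the plot, the fit can be summarized with a single number: if f × r = k, then log f = log k − log r, so a straight line fitted to the log-log points should have a slope close to -1. The sketch below computes that slope with ordinary least squares in plain Python. `zipf_slope` is a hypothetical helper, and the toy data is constructed to follow the law exactly, so it says nothing about real text.

```python
import math
from collections import Counter


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    Needs at least two distinct word types.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Toy tokens built so that frequency * rank == 60 for ranks 1 to 6
toy_tokens = []
for rank in range(1, 7):
    toy_tokens.extend(["w{}".format(rank)] * (60 // rank))

print(round(zipf_slope(toy_tokens), 6))
```

Applied to a real corpus, a slope that drifts away from -1 when only the head or only the tail of the ranking is used would be one way to quantify the deviations at the extremes.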

b. Generate random text, e.g., using random.choice("abcdefg "), taking care to include the space character. You will need to import random first. Use the string concatenation operator to accumulate characters into a (very) long string. Then tokenize this string, and generate the Zipf plot as before, and compare the two plots. What do you make of Zipf's Law in the light of this?

In [None]:
alpha_string = "abcdefg "
random_string = str()
for i in range(2000000):
    random_string += random.choice(alpha_string)
random_words = random_string.split()

plot_rank_freq(random_words)
    

This plot deviates even further from Zipf's law, and not only at the extremes. I think this is because words of the same length tend to have very similar frequencies. In a distribution drawn from ordinary text a two-letter type may occur once or 1000 times, whereas here that will not happen. For the same reason, certain frequencies simply never occur, hence the "staircase" shape of the plot.
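
The staircase can be made concrete with a small calculation. Assuming each character is drawn uniformly from 7 letters plus a space, and that tokens are the non-empty runs of letters between spaces, every specific word of length n has the same probability, 1 / (7 · 8^n), and there are 7^n such words. The function name below is hypothetical and the model is a simplification of the simulation above.

```python
def random_word_probability(length, letters=7):
    """Probability that a token equals one specific word of the given length.

    Assumes characters are drawn uniformly from `letters` letters plus a
    space, and that tokens are the non-empty runs of letters between spaces.
    """
    return 1 / (letters * (letters + 1) ** length)


for length in range(1, 5):
    n_words = 7 ** length  # number of distinct words of this length
    print(length, n_words, random_word_probability(length))

# Sanity check: the probabilities of all possible words should add up to about 1
total = sum(7 ** n * random_word_probability(n) for n in range(1, 200))
print(total)
```

Under these assumptions each step of the staircase is a group of 7^n equally likely words, and moving from one length to the next divides the expected frequency by 8.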

### Exercise 24

Modify the text generation program in 2.2 further, to do the following tasks:

a. Store the n most likely words in a list words then randomly choose a word from the list using random.choice(). (You will need to import random first.)  


b. Select a particular genre, such as a section of the Brown Corpus, or a genesis translation, one of the Gutenberg texts, or one of the Web texts. Train the model on this corpus and get it to generate random text. You may have to experiment with different start words. How intelligible is the text? Discuss the strengths and weaknesses of this method of generating random text.

In [None]:
def generate_model(cfdist, word, length, candidates_number):
    for i in range(length):
        print(word, end=' ')
        candidates = [candidate for (candidate, frequency) in cfdist[word].most_common(candidates_number)]
        word = random.choice(candidates)

categories = ['fiction', 'news']
candidates_nums = [3,6,10,15,40]
words = ['the', 'orange']

for category in categories:
    bigrams = nltk.bigrams(brown.words(categories=category))
    cfd = nltk.ConditionalFreqDist(bigrams)
    print(category)
    print('\n')
    for word in words:
        print('The word is {}'.format(word), '\n')
        for cand_num in candidates_nums:
            print('Choosing from {} bigrams:'.format(cand_num), end=' ')
            try:
                generate_model(cfd, word, 40, cand_num)
            except IndexError:
                print("No such word.")
                break
            else:
                print('\n')
        print('\n') 
    print('\n')            
                

I think the "intelligibility" of the generated text does not change with the number of bigrams from which the next word is chosen at random; what does change, obviously, is the diversity of the words used.

Also, the less frequent the start word (for example, "orange"), the more likely we are to generate similar results, at least in the first part.

### Exercise 25

Define a function find_language() that takes a string as its argument, and returns a list of languages that have that string as a word. Use the udhr corpus and limit your searches to files in the Latin-1 encoding.

In [None]:
languages = [language for language in udhr.fileids() if 'Latin1' in language]

def find_language(word):
    languages_with_word = [language for language in languages
                          if word.lower() in [w.lower() for w in udhr.words(language)]]
    return languages_with_word

find_language('human')

### Exercise 26

What is the branching factor of the noun hypernym hierarchy? I.e. for every noun synset that has hyponyms — or children in the hypernym hierarchy — how many do they have on average? You can get all noun synsets using wn.all_synsets('n').

In [None]:
all_sets = list(wn.all_synsets('n'))
total_hyponyms = 0
sets_with_hyponyms = 0

for sset in all_sets:
    if sset.hyponyms():
        total_hyponyms += len(sset.hyponyms())
        sets_with_hyponyms += 1
average_hyponyms = total_hyponyms / sets_with_hyponyms
round(average_hyponyms, 2)


### Exercise 27

The polysemy of a word is the number of senses it has. Using WordNet, we can determine that the noun dog has 7 senses with: len(wn.synsets('dog', 'n')). Compute the average polysemy of nouns, verbs, adjectives and adverbs according to WordNet.

In [None]:
poses = ['n', 'v', 'a', 'r']

for pos in poses:
    all_sets = wn.all_synsets(pos)
    words = [lemma.name() 
             for synset in list(all_sets) 
             for lemma in synset.lemmas()]
    words = set(words)
    synsets_number = 0
    for word in words:
        synsets_number += len(wn.synsets(word, pos))
    average_synsets_number = synsets_number / len(words)
    print(pos, round(average_synsets_number, 2))
        

### Exercise 28

Use one of the predefined similarity measures to score the similarity of each of the following pairs of words. Rank the pairs in order of decreasing similarity. How close is your ranking to the order given here, an order that was established experimentally by (Miller & Charles, 1998): car-automobile, gem-jewel, journey-voyage, boy-lad, coast-shore, asylum-madhouse, magician-wizard, midday-noon, furnace-stove, food-fruit, bird-cock, bird-crane, tool-implement, brother-monk, lad-brother, crane-implement, journey-car, monk-oracle, cemetery-woodland, food-rooster, coast-hill, forest-graveyard, shore-woodland, monk-slave, coast-forest, lad-wizard, chord-smile, glass-magician, rooster-voyage, noon-string.

In [None]:
pairs = [{'words': ('car', 'automobile')}, {'words': ('gem', 'jewel')}, {'words': ('journey', 'voyage')},
         {'words': ('boy', 'lad')}, {'words': ('coast', 'shore')}, {'words': ('asylum', 'madhouse')}, 
         {'words': ('magician', 'wizard')}, {'words': ('midday', 'noon')}, {'words': ('furnace', 'stove')}, 
         {'words': ('food', 'fruit')}, {'words': ('bird', 'cock')}, {'words': ('bird', 'crane')}, 
         {'words': ('tool', 'implement')}, {'words': ('brother', 'monk')}, {'words': ('lad', 'brother')}, 
         {'words': ('crane', 'implement')}, {'words': ('journey', 'car')}, {'words': ('monk', 'oracle')}, 
         {'words': ('cemetery', 'woodland')}, {'words': ('food', 'rooster')}, {'words': ('coast', 'hill')}, 
         {'words': ('forest', 'graveyard')}, {'words': ('shore', 'woodland')}, {'words': ('monk', 'slave')},
         {'words': ('coast', 'forest')}, {'words': ('lad', 'wizard')}, {'words': ('chord', 'smile')}, 
         {'words': ('glass', 'magician')}, {'words': ('rooster', 'voyage')}, {'words': ('noon', 'string')}]


for pair in pairs:
    # compare each word1 synset with each word2 synset
    # and get the highest score
    highest_score = 0
    for synset1 in wn.synsets(pair['words'][0]):
        for synset2 in wn.synsets(pair['words'][1]):
            score = wn.path_similarity(synset1, synset2)
            # path_similarity returns None when no path connects the synsets
            if score is not None and score > highest_score:
                highest_score = score
    pair['score'] = round(highest_score, 2)

sorted_by_score = sorted(pairs, key=lambda k: k['score'], reverse=True)

for elm in sorted_by_score:
    print("Pair: {}. Score: {}\n".format(elm['words'], elm['score']))

Miller and Charles's ranking with scores can be seen, for example, [here](https://arxiv.org/ftp/arxiv/papers/1204/1204.0245.pdf).

The ranking generated by WordNet generally does not stray far from Miller and Charles's, although there are exceptions: for example, for Miller and Charles the pair 'lad-wizard' is fifth from the bottom, whereas here it sits in the middle of the ranking with a score of 0.2.
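
How close two rankings are can also be expressed as a number. The sketch below uses Spearman's rank correlation, a swapped-in measure that the exercise does not prescribe, in its simple form for rankings without ties. The second ordering is hypothetical, invented only to exercise the function; it is not the output of the code above. Path similarity produces many tied scores, so real data would need a tie-aware version.

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation for two orderings of the same items (no ties)."""
    if len(set(ranking_a)) != len(ranking_a) or set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must contain the same distinct items")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two items")
    position_b = {item: i for i, item in enumerate(ranking_b)}
    # Sum of squared differences between each item's two positions
    d_squared = sum((i - position_b[item]) ** 2
                    for i, item in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


reference = ["car-automobile", "gem-jewel", "journey-voyage", "boy-lad"]
# Hypothetical alternative ordering with the last two pairs swapped
hypothetical = ["car-automobile", "gem-jewel", "boy-lad", "journey-voyage"]

print(spearman_rho(reference, reference))
print(spearman_rho(reference, hypothetical))
print(spearman_rho(reference, list(reversed(reference))))
```

A value of 1 means identical order and -1 means exactly reversed order, so a value near 1 would support the impression that the two rankings largely agree.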

## Extra exercise

We found that Figure 1.2 in Chapter 2 does not correspond to the function that supposedly generates it. The function, which is meant "to examine the differences in word lengths for a selection of languages included in the udhr corpus", returns a distribution with absolute values, whereas the plot in the book shows percentages. This is my attempt to modify the function to generate a plot similar to the one in the book. To keep things simple, I included only two languages in my distribution.

In [None]:
from copy import copy, deepcopy

languages = ['Chickasaw', 'English']
cfd = nltk.ConditionalFreqDist(
      (lang, len(word))
      for lang in languages
      for word in udhr.words(lang + '-Latin1'))

rel_cfd = deepcopy(cfd)  # Make a copy of the conditional distribution

# Make copies of each frequency distribution
rel_fd1 = copy(cfd['Chickasaw'])
rel_fd2 = copy(cfd['English'])

def fdist_per(fd, i=100):
    """In a frequency distribution, replace absolute frequencies
    by relative frequencies per every i items.
    """
    length = fd.N()
    for key, value in fd.items():
        fd[key] = round(value / length * i, 3)
    return fd

# Replace the distributions of absolute values with distributions of relative values
rel_cfd['Chickasaw'] = fdist_per(rel_fd1)
rel_cfd['English'] = fdist_per(rel_fd2)


rel_cfd.plot(cumulative=True)
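
The conversion from absolute counts to cumulative percentages, which is what the plot displays, can also be sketched without NLTK. The helper name `cumulative_percentages` and the toy sentence are assumptions made for this illustration.

```python
from collections import Counter
from itertools import accumulate


def cumulative_percentages(lengths):
    """Map each word length to the percentage of words at or below that length."""
    counts = Counter(lengths)
    total = sum(counts.values())
    keys = sorted(counts)
    running_totals = accumulate(counts[key] for key in keys)
    return {key: 100 * running / total
            for key, running in zip(keys, running_totals)}


# Toy sample of word lengths, invented for illustration
toy_lengths = [len(word) for word in
               "all human beings are born free and equal".split()]

print(cumulative_percentages(toy_lengths))
```

The last value is always 100, which makes curves for languages with different numbers of words directly comparable.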