# Short spaCy demo


Import `spacy` and load the language model that fits your needs.
The [spaCy usage guide](https://spacy.io/usage) lists installation recommendations for your operating system and type of task.


In [None]:
!pip install -U pip setuptools wheel
!pip install -U spacy
!python -m spacy download en_core_web_sm

In [119]:
import spacy 
from spacy import displacy
nlp = spacy.load("en_core_web_sm")

## Exploring spaCy

In [85]:
text ='''Mannheim, officially the University City of Mannheim (German: Universitätsstadt Mannheim), is the second-largest city in the German state of Baden-Württemberg after the state capital of Stuttgart, and Germany's 21st-largest city, with a 2020 population of 309,119 inhabitants.[5] The city is the cultural and economic centre of the Rhine-Neckar Metropolitan Region, Germany's seventh-largest metropolitan region with nearly 2.4 million inhabitants and over 900,000 employees.[6]
Mannheim is located at the confluence of the Rhine and the Neckar in the Kurpfalz (Electoral Palatinate) region of northwestern Baden-Württemberg. The city lies in the Upper Rhine Plain, Germany's warmest region. Together with Hamburg, Mannheim is the only city bordering two other federal states. It forms a continuous conurbation of around 480,000 inhabitants with Ludwigshafen am Rhein in the neighbouring state of Rhineland-Palatinate, on the other side of the Rhine. Some northern suburbs of Mannheim belong to Hesse. Upstream along the Neckar lies Heidelberg, the fifth-largest city of Baden-Württemberg and the third-largest of the Rhine-Neckar Region.
Mannheim is unusual among German cities in that the city center's streets and avenues are laid out in a grid pattern, leading to its nickname Quadratestadt (Square City). Within a ring of avenues surrounding the city centre, there are squares numbered from A1 to U6 instead of street names. At the southern base of that system sits Mannheim Palace, one of the largest palace complexes in the world, and the second-largest in Baroque style after Versailles. It was the former home of the Prince-elector of the Electoral Palatinate, and now houses the University of Mannheim, which repeatedly receives top marks in business administration and is sometimes known as the "Harvard of Germany".[7][8][9] 
The Mannheim May Market is the largest regional consumer exhibition of Germany.[10] 
The civic symbol of Mannheim is the Romanesque Mannheim Water Tower, completed in 1886 and rising to 60 metres (200 feet) above the highest point of the art nouveau area Friedrichsplatz. Mannheim is well-known for its inventions, including the automobile,[11][12] the bicycle,[13][12] and the tractor,[12] which is why the city is often called the "city of inventions".[14][15][16] The city is the starting and finishing point of the Bertha Benz Memorial Route that follows the tracks of the first long-distance automobile trip in history.'''


Calling `nlp` on the text creates a `Doc` object, often referred to as the doc container. This container stores a lot of information about the text.

In [86]:
doc = nlp(text)

Let's inspect what is inside.

In [42]:
print(doc)

Mannheim, officially the University City of Mannheim (German: Universitätsstadt Mannheim), is the second-largest city in the German state of Baden-Württemberg after the state capital of Stuttgart, and Germany's 21st-largest city, with a 2020 population of 309,119 inhabitants.[5] The city is the cultural and economic centre of the Rhine-Neckar Metropolitan Region, Germany's seventh-largest metropolitan region with nearly 2.4 million inhabitants and over 900,000 employees.[6]
Mannheim is located at the confluence of the Rhine and the Neckar in the Kurpfalz (Electoral Palatinate) region of northwestern Baden-Württemberg. The city lies in the Upper Rhine Plain, Germany's warmest region. Together with Hamburg, Mannheim is the only city bordering two other federal states. It forms a continuous conurbation of around 480,000 inhabitants with Ludwigshafen am Rhein in the neighbouring state of Rhineland-Palatinate, on the other side of the Rhine. Some northern suburbs of Mannheim belong to Hesse

In [93]:
print(f"The length of the original str is: {len(text)}")
print(f"The length of the doc container object is: {len(doc)}")

The length of the original str is: 2462
The length of the doc container object is: 452
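
The two lengths differ because `len(text)` counts characters, while `len(doc)` counts tokens. A minimal sketch of the same effect on a short sentence, assuming a blank English pipeline (`spacy.blank("en")`, tokenizer only) instead of `en_core_web_sm`, so that it runs without a model download; the names `sample` and `sample_doc` are illustrative only:

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only), not en_core_web_sm.
nlp_blank = spacy.blank("en")

sample = "Mannheim is a city."
sample_doc = nlp_blank(sample)

print(len(sample))       # number of characters in the string
print(len(sample_doc))   # number of tokens in the Doc
print([token.text for token in sample_doc])

# Tokenization is non-destructive: the Doc still holds the original text.
print(sample_doc.text == sample)
```

The final period becomes a token of its own, which is why the token count is not simply the number of whitespace-separated words.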


Iterating over the `doc` container yields its tokens:

In [98]:
[e for e in doc][:10]  # the first 10 tokens in the doc container

[Mannheim, ,, officially, the, University, City, of, Mannheim, (, German]

In [101]:
f = doc[0]

In [103]:
f.ent_type_  # use tab completion on f to get an overview of the available attributes and methods

'GPE'

In [105]:
spacy.explain('GPE')  # ask spaCy what the label GPE means

'Countries, cities, states'
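
Besides `ent_type_`, each token carries further attributes. The sketch below prints a few lexical ones for the opening words of the text. It assumes a blank English pipeline (tokenizer only), so attributes that need a trained component, such as `ent_type_`, stay empty here; with the `doc` from above they would be filled in.

```python
import spacy

# Assumption: tokenizer-only pipeline, no trained components.
nlp_blank = spacy.blank("en")
sample_doc = nlp_blank("Mannheim, officially the University City")

for token in sample_doc:
    # i: position in the Doc, idx: character offset in the original string
    print(token.i, token.idx, repr(token.text),
          token.is_alpha, token.is_punct, token.is_title)

# Without a named-entity recognizer, no entity type is assigned.
print(repr(sample_doc[0].ent_type_))
```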

In [117]:
lemmas = [token.lemma_ for token in doc]
print("Let's inspect the results!\n\n")
print(f"lemmatized:\n\n{lemmas[:25]}\n\n")
print(f"original: \n\n{text.split()[:25]}")

Let's inspect the results!


lemmatized:

['Mannheim', ',', 'officially', 'the', 'University', 'City', 'of', 'Mannheim', '(', 'german', ':', 'Universitätsstadt', 'Mannheim', ')', ',', 'be', 'the', 'second', '-', 'large', 'city', 'in', 'the', 'german', 'state']


original: 

['Mannheim,', 'officially', 'the', 'University', 'City', 'of', 'Mannheim', '(German:', 'Universitätsstadt', 'Mannheim),', 'is', 'the', 'second-largest', 'city', 'in', 'the', 'German', 'state', 'of', 'Baden-Württemberg', 'after', 'the', 'state', 'capital', 'of']
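
Because the two lists above are split differently, it is hard to see which words the lemmatizer actually changed. Pairing each token with its own lemma makes this easier to read. The sketch below is hand-built so that it runs without a trained model: the words come from the text and the lemmas are copied from the last ten entries of the lemma output above. Passing `lemmas=` to `Doc` assumes spaCy v3.

```python
import spacy
from spacy.tokens import Doc

# Assumption: an empty English vocab is enough for a hand-built Doc.
vocab = spacy.blank("en").vocab

# Words taken from the text; lemmas copied from the output above.
words = ["is", "the", "second", "-", "largest", "city", "in", "the", "German", "state"]
lemmas = ["be", "the", "second", "-", "large", "city", "in", "the", "german", "state"]
sample_doc = Doc(vocab, words=words, lemmas=lemmas)

# Keep only the tokens whose lemma differs from the surface form.
changed = [(t.text, t.lemma_) for t in sample_doc if t.text != t.lemma_]
print(changed)
```

The same list comprehension works unchanged on the `doc` created by `nlp(text)`.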


## Entities

In [118]:
[(e, type(e)) for e in doc.ents]  # all entities in the doc container; each one is a Span

[(Mannheim, spacy.tokens.span.Span),
 (University City, spacy.tokens.span.Span),
 (Mannheim, spacy.tokens.span.Span),
 (German, spacy.tokens.span.Span),
 (Universitätsstadt Mannheim, spacy.tokens.span.Span),
 (second, spacy.tokens.span.Span),
 (German, spacy.tokens.span.Span),
 (Baden-Württemberg, spacy.tokens.span.Span),
 (Stuttgart, spacy.tokens.span.Span),
 (Germany, spacy.tokens.span.Span),
 (21st, spacy.tokens.span.Span),
 (2020, spacy.tokens.span.Span),
 (309,119, spacy.tokens.span.Span),
 (Germany, spacy.tokens.span.Span),
 (seventh, spacy.tokens.span.Span),
 (nearly 2.4 million, spacy.tokens.span.Span),
 (over 900,000, spacy.tokens.span.Span),
 (Mannheim, spacy.tokens.span.Span),
 (Rhine, spacy.tokens.span.Span),
 (Neckar, spacy.tokens.span.Span),
 (Kurpfalz, spacy.tokens.span.Span),
 (Baden-Württemberg, spacy.tokens.span.Span),
 (the Upper Rhine Plain, spacy.tokens.span.Span),
 (Germany, spacy.tokens.span.Span),
 (Hamburg, spacy.tokens.span.Span),
 (Mannheim, spacy.tokens.span
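
Each entity is a `Span`, that is, a slice of the doc that can cover several tokens and carries a label. To show what a `Span` exposes without relying on a model's predictions, the sketch below sets two entities by hand on a blank pipeline. The sentence, the offsets, and the labels are illustrative assumptions, not output of `en_core_web_sm`.

```python
import spacy

# Assumption: blank pipeline without NER, so doc.ents starts out empty.
nlp_blank = spacy.blank("en")
sample_doc = nlp_blank("The University of Mannheim is in Mannheim.")

# Entities set by hand from character offsets (illustrative, not model output).
org = sample_doc.char_span(4, 26, label="ORG")
gpe = sample_doc.char_span(33, 41, label="GPE")
sample_doc.ents = [org, gpe]

for ent in sample_doc.ents:
    # A Span knows its text, its label, its character offsets and its token count.
    print(ent.text, ent.label_, ent.start_char, ent.end_char, len(ent))
```

On the `doc` from above, `[(ent.text, ent.label_) for ent in doc.ents]` gives the same kind of overview for the entities the model found.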

In [120]:
displacy.render(doc, style="ent")
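
Outside a notebook, `displacy.render` can return the markup as a string instead of displaying it. A minimal sketch, assuming a blank pipeline with one hand-set entity so that it runs without a model; the file name in the comment is hypothetical:

```python
import spacy
from spacy import displacy

# Assumption: blank pipeline with a hand-set entity, not model output.
nlp_blank = spacy.blank("en")
sample_doc = nlp_blank("Mannheim is a city.")
sample_doc.ents = [sample_doc.char_span(0, 8, label="GPE")]

# jupyter=False returns the markup as a string; page=True wraps it in a full HTML page.
html = displacy.render(sample_doc, style="ent", jupyter=False, page=True)
print(html[:80])

# To keep the visualization, write the string to a file, for example:
# pathlib.Path("entities.html").write_text(html, encoding="utf-8")
```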




## Part of Speech

In [125]:
doc = nlp("hello my name is Anne")

In [126]:
displacy.render(doc)

In [121]:
spacy.explain('INTJ')

'interjection'

In [122]:
spacy.explain('poss')

'possession modifier'

Whether something is a noun or a verb is called its **part of speech**. The attribute `.pos_` of each token in the doc container gives you its part of speech.

The relationships between the words are referred to as **dependencies**. You can access them through the attribute `.dep_`.

In [127]:
for x in doc:
    print(x, x.pos_, x.dep_)

hello INTJ intj
my PRON poss
name NOUN nsubj
is AUX ROOT
Anne PROPN attr


For more detail, see the spaCy documentation on linguistic features: https://spacy.io/usage/linguistic-features