# Intake for data scientists

Intake makes it easy to load data in many different formats and of many different types. For a complete overview, have a look at the [Plugin Directory](https://intake.readthedocs.io/en/latest/plugin-directory.html) and the [Intake Project Dashboard](https://intake.github.io/status/). Intake then loads the data into common in-memory containers such as Pandas DataFrames, NumPy arrays or Python lists. From there the data is easy to explore and can also be processed on distributed systems. If a plugin you need is missing, you can write your own, as described in [Making Drivers](https://intake.readthedocs.io/en/latest/making-plugins.html).

## Loading a data source

In the following, we read two CSV files as a single data source and then access them through an Intake catalog.

```python
import intake
ds = intake.open_csv('states_*.csv')
print(ds)
```

```
<intake.source.csv.CSVSource object at 0x7fc5575ccd50>
```
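A single `open_csv` call can cover both files because of the wildcard in `states_*.csv`. The CSV driver accepts a glob pattern and treats all matching files as one data source. The sketch below uses only the standard library `glob` module to show which files such a pattern would match. The file names `states_1.csv`, `states_2.csv` and `notes.txt` are hypothetical, and Intake's own pattern expansion, done through its file-system layer, may differ in details.

```python
import glob
import os
import tempfile


def matching_files(directory: str, pattern: str) -> list[str]:
    """Return the sorted base names of all files in `directory` that match `pattern`."""
    paths = glob.glob(os.path.join(directory, pattern))
    return sorted(os.path.basename(p) for p in paths)


if __name__ == "__main__":
    # Throwaway directory with hypothetical example files
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("states_1.csv", "states_2.csv", "notes.txt"):
            open(os.path.join(tmp, name), "w").close()
        print(matching_files(tmp, "states_*.csv"))
        # prints ['states_1.csv', 'states_2.csv']
```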


Intake's `open_*` functions open many different kinds of data sources. Depending on the data format or service, they accept different arguments.

### Configuring the search path for data sources

Intake looks for catalog files in a list of paths, taken from the `catalog_path` setting in the Intake configuration file and from the environment variable `INTAKE_PATH`. The paths are separated by colons (by semicolons on Windows). When you import `intake`, all entries from all catalogs found this way are available as part of a global catalog, `intake.cat`.
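The following sketch shows how a value in the `INTAKE_PATH` format could be split and searched. It is not Intake's implementation: the helper `catalog_files_on_path` is hypothetical, and treating every `*.yml` and `*.yaml` file as a catalog is an assumption. `os.pathsep` is `:` on Linux and macOS and `;` on Windows, which matches the separators described above.

```python
import os
import tempfile
from pathlib import Path


def catalog_files_on_path(path_list: str) -> list[Path]:
    """Split an INTAKE_PATH-style string and collect the YAML files in each directory.

    Hypothetical helper for illustration; Intake's own lookup may differ.
    """
    found = []
    # os.pathsep is ':' on POSIX systems and ';' on Windows
    for entry in path_list.split(os.pathsep):
        if not entry:
            continue  # ignore empty segments, e.g. from a trailing separator
        directory = Path(entry)
        if directory.is_dir():
            for pattern in ("*.yml", "*.yaml"):
                found.extend(sorted(directory.glob(pattern)))
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        (Path(a) / "us_states.yml").write_text("sources: {}\n")
        (Path(b) / "readme.txt").write_text("not a catalog\n")
        search_path = os.pathsep.join([a, b])  # shaped like an INTAKE_PATH value
        print([p.name for p in catalog_files_on_path(search_path)])
        # prints ['us_states.yml']
```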

## Reading data

Intake reads data into containers of different formats:

* tables into Pandas DataFrames
* multidimensional arrays into NumPy arrays
* semi-structured data into Python lists of objects, usually dictionaries

To find out which container format Intake uses for the data, check the `container` attribute:

```python
ds.container
```

```
'dataframe'
```

Besides `dataframe`, the result can also be `ndarray` or `python`.
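Because `container` is a plain string, code that handles arbitrary sources can check it before reading. The sketch below is a hypothetical helper, not part of Intake. It only assumes that a source object has a `container` attribute and a `read()` method, as `ds` does. The type names in the mapping follow the list of containers above.

```python
# In-memory type that read() is expected to return for each container value
EXPECTED_TYPES = {
    "dataframe": "pandas.DataFrame",
    "ndarray": "numpy.ndarray",
    "python": "list",
}


def read_checked(source):
    """Read `source` after checking that its container value is a known one.

    Returns a tuple (expected type name, data). Hypothetical helper: any
    object with a `container` attribute and a `read()` method is accepted.
    """
    kind = source.container
    if kind not in EXPECTED_TYPES:
        raise ValueError(f"unknown container type: {kind!r}")
    return EXPECTED_TYPES[kind], source.read()
```

With the CSV source from above, `read_checked(ds)` would return `'pandas.DataFrame'` together with the DataFrame itself.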

```python
df = ds.read()
df.head()
```

| | state | slug | code | nickname | website | admission_date | admission_number | capital_city | capital_url | population | population_rank | constitution_url | state_flag_url | state_seal_url | map_image_url | landscape_background_url | skyline_background_url | twitter_url | facebook_url |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Alabama | alabama | AL | Yellowhammer State | http://www.alabama.gov | 1819-12-14 | 22 | Montgomery | http://www.montgomeryal.gov | 4833722 | 23 | http://alisondb.legislature.state.al.us/alison... | https://cdn.civil.services/us-states/flags/ala... | https://cdn.civil.services/us-states/seals/ala... | https://cdn.civil.services/us-states/maps/alab... | https://cdn.civil.services/us-states/backgroun... | https://cdn.civil.services/us-states/backgroun... | https://twitter.com/alabamagov | https://www.facebook.com/alabamagov |
| 1 | Alaska | alaska | AK | The Last Frontier | http://alaska.gov | 1959-01-03 | 49 | Juneau | http://www.juneau.org | 735132 | 47 | http://www.legis.state.ak.us/basis/folioproxy.... | https://cdn.civil.services/us-states/flags/ala... | https://cdn.civil.services/us-states/seals/ala... | https://cdn.civil.services/us-states/maps/alas... | https://cdn.civil.services/us-states/backgroun... | https://cdn.civil.services/us-states/backgroun... | https://twitter.com/alaska | https://www.facebook.com/AlaskaLocalGovernments |
| 2 | Arizona | arizona | AZ | The Grand Canyon State | https://az.gov | 1912-02-14 | 48 | Phoenix | https://www.phoenix.gov | 6626624 | 15 | http://www.azleg.gov/Constitution.asp | https://cdn.civil.services/us-states/flags/ari... | https://cdn.civil.services/us-states/seals/ari... | https://cdn.civil.services/us-states/maps/ariz... | https://cdn.civil.services/us-states/backgroun... | https://cdn.civil.services/us-states/backgroun... | | |
| 3 | Arkansas | arkansas | AR | The Natural State | http://arkansas.gov | 1836-06-15 | 25 | Little Rock | http://www.littlerock.org | 2959373 | 32 | http://www.arkleg.state.ar.us/assembly/Summary... | https://cdn.civil.services/us-states/flags/ark... | https://cdn.civil.services/us-states/seals/ark... | https://cdn.civil.services/us-states/maps/arka... | https://cdn.civil.services/us-states/backgroun... | https://cdn.civil.services/us-states/backgroun... | https://twitter.com/arkansasgov | https://www.facebook.com/Arkansas.gov |
| 4 | California | california | CA | Golden State | http://www.ca.gov | 1850-09-09 | 31 | Sacramento | http://www.cityofsacramento.org | 38332521 | 1 | http://www.leginfo.ca.gov/const-toc.html | https://cdn.civil.services/us-states/flags/cal... | https://cdn.civil.services/us-states/seals/cal... | https://cdn.civil.services/us-states/maps/cali... | https://cdn.civil.services/us-states/backgroun... | https://cdn.civil.services/us-states/backgroun... | https://twitter.com/cagovernment | |


```python
for chunk in ds.read_chunked():
    print(f'Chunk: {len(chunk)}')
```

```
Chunk: 24
Chunk: 26
```
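Reading chunk by chunk pays off when the data does not fit into memory at once, because each chunk can be processed and then discarded. The two chunks here (24 and 26 rows) presumably correspond to the two files matched by `states_*.csv`, giving 50 rows in total. The sketch below shows this aggregation pattern with a hypothetical helper `count_rows`. It accepts any iterable of sized chunks, so in the notebook you could pass `ds.read_chunked()` directly; here, plain lists stand in for the DataFrame chunks.

```python
from collections.abc import Iterable, Sized


def count_rows(chunks: Iterable[Sized]) -> int:
    """Sum the lengths of all chunks.

    If `chunks` is a generator (like read_chunked()), only one chunk
    needs to be held in memory at a time.
    """
    total = 0
    for chunk in chunks:
        total += len(chunk)  # len() of a DataFrame is its number of rows
    return total


if __name__ == "__main__":
    # Stand-ins for the two chunks printed above
    chunks = ([None] * 24, [None] * 26)
    print(count_rows(chunks))  # prints 50
```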


```python
# to_dask() returns a lazy Dask DataFrame; head() only computes the first partition
ddf = ds.to_dask()
ddf.head()
```

| | state | slug | code | nickname | website | admission_date | admission_number | capital_city | capital_url | population | population_rank | constitution_url | state_flag_url | state_seal_url | map_image_url | landscape_background_url | skyline_background_url | twitter_url | facebook_url |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Alabama | alabama | AL | Yellowhammer State | http://www.alabama.gov | 1819-12-14 | 22 | Montgomery | http://www.montgomeryal.gov | 4833722 | 23 | http://alisondb.legislature.state.al.us/alison... | https://cdn.civil.services/us-states/flags/ala... | https://cdn.civil.services/us-states/seals/ala... | https://cdn.civil.services/us-states/maps/alab... | https://cdn.civil.services/us-states/backgroun... | https://cdn.civil.services/us-states/backgroun... | https://twitter.com/alabamagov | https://www.facebook.com/alabamagov |
| 1 | Alaska | alaska | AK | The Last Frontier | http://alaska.gov | 1959-01-03 | 49 | Juneau | http://www.juneau.org | 735132 | 47 | http://www.legis.state.ak.us/basis/folioproxy.... | https://cdn.civil.services/us-states/flags/ala... | https://cdn.civil.services/us-states/seals/ala... | https://cdn.civil.services/us-states/maps/alas... | https://cdn.civil.services/us-states/backgroun... | https://cdn.civil.services/us-states/backgroun... | https://twitter.com/alaska | https://www.facebook.com/AlaskaLocalGovernments |
| 2 | Arizona | arizona | AZ | The Grand Canyon State | https://az.gov | 1912-02-14 | 48 | Phoenix | https://www.phoenix.gov | 6626624 | 15 | http://www.azleg.gov/Constitution.asp | https://cdn.civil.services/us-states/flags/ari... | https://cdn.civil.services/us-states/seals/ari... | https://cdn.civil.services/us-states/maps/ariz... | https://cdn.civil.services/us-states/backgroun... | https://cdn.civil.services/us-states/backgroun... | | |
| 3 | Arkansas | arkansas | AR | The Natural State | http://arkansas.gov | 1836-06-15 | 25 | Little Rock | http://www.littlerock.org | 2959373 | 32 | http://www.arkleg.state.ar.us/assembly/Summary... | https://cdn.civil.services/us-states/flags/ark... | https://cdn.civil.services/us-states/seals/ark... | https://cdn.civil.services/us-states/maps/arka... | https://cdn.civil.services/us-states/backgroun... | https://cdn.civil.services/us-states/backgroun... | https://twitter.com/arkansasgov | https://www.facebook.com/Arkansas.gov |
| 4 | California | california | CA | Golden State | http://www.ca.gov | 1850-09-09 | 31 | Sacramento | http://www.cityofsacramento.org | 38332521 | 1 | http://www.leginfo.ca.gov/const-toc.html | https://cdn.civil.services/us-states/flags/cal... | https://cdn.civil.services/us-states/seals/cal... | https://cdn.civil.services/us-states/maps/cali... | https://cdn.civil.services/us-states/backgroun... | https://cdn.civil.services/us-states/backgroun... | https://twitter.com/cagovernment | |


```python
cat = intake.open_catalog('us_states.yml')
```
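The contents of `us_states.yml` are not shown here. A minimal catalog with a single entry, `states`, could look like the YAML that the sketch below writes. The description, the relative path and the use of `{{ CATALOG_DIR }}` (Intake's placeholder for the directory that contains the catalog file) are assumptions, not the original file. Writing the file from Python keeps the example inside the notebook.

```python
import tempfile
from pathlib import Path

# Hypothetical minimal catalog; the real us_states.yml may contain more fields.
# The urlpath is quoted because YAML would otherwise parse '{{' as a mapping.
CATALOG_YAML = """\
sources:
  states:
    description: US state information
    driver: csv
    args:
      urlpath: '{{ CATALOG_DIR }}/states_*.csv'
"""


def write_catalog(directory: str) -> Path:
    """Write the example catalog as us_states.yml into `directory` and return its path."""
    path = Path(directory) / "us_states.yml"
    path.write_text(CATALOG_YAML)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_catalog(tmp).read_text())
```

After `write_catalog('.')` in the directory that holds the CSV files, `intake.open_catalog('us_states.yml')` should list a single entry, `states`, as in the output below.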

```python
list(cat)
```

```
['states']
```

```python
cat.states.to_dask()[['state', 'slug']].head()
```

| | state | slug |
| --- | --- | --- |
| 0 | Alabama | alabama |
| 1 | Alaska | alaska |
| 2 | Arizona | arizona |
| 3 | Arkansas | arkansas |
| 4 | California | california |


```python
cat.states(csv_kwargs={'header': None, 'skiprows': 1}).read().head()
```

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Alabama | alabama | AL | Yellowhammer State | http://www.alabama.gov | 1819-12-14 | 22 | Montgomery | http://www.montgomeryal.gov | 4833722 | 23 | http://alisondb.legislature.state.al.us/alison... | https://cdn.civil.services/us-states/flags/ala... | https://cdn.civil.services/us-states/seals/ala... | https://cdn.civil.services/us-states/maps/alab... | https://cdn.civil.services/us-states/backgroun... | https://cdn.civil.services/us-states/backgroun... | https://twitter.com/alabamagov | https://www.facebook.com/alabamagov |
| 1 | Alaska | alaska | AK | The Last Frontier | http://alaska.gov | 1959-01-03 | 49 | Juneau | http://www.juneau.org | 735132 | 47 | http://www.legis.state.ak.us/basis/folioproxy.... | https://cdn.civil.services/us-states/flags/ala... | https://cdn.civil.services/us-states/seals/ala... | https://cdn.civil.services/us-states/maps/alas... | https://cdn.civil.services/us-states/backgroun... | https://cdn.civil.services/us-states/backgroun... | https://twitter.com/alaska | https://www.facebook.com/AlaskaLocalGovernments |
| 2 | Arizona | arizona | AZ | The Grand Canyon State | https://az.gov | 1912-02-14 | 48 | Phoenix | https://www.phoenix.gov | 6626624 | 15 | http://www.azleg.gov/Constitution.asp | https://cdn.civil.services/us-states/flags/ari... | https://cdn.civil.services/us-states/seals/ari... | https://cdn.civil.services/us-states/maps/ariz... | https://cdn.civil.services/us-states/backgroun... | https://cdn.civil.services/us-states/backgroun... | | |
| 3 | Arkansas | arkansas | AR | The Natural State | http://arkansas.gov | 1836-06-15 | 25 | Little Rock | http://www.littlerock.org | 2959373 | 32 | http://www.arkleg.state.ar.us/assembly/Summary... | https://cdn.civil.services/us-states/flags/ark... | https://cdn.civil.services/us-states/seals/ark... | https://cdn.civil.services/us-states/maps/arka... | https://cdn.civil.services/us-states/backgroun... | https://cdn.civil.services/us-states/backgroun... | https://twitter.com/arkansasgov | https://www.facebook.com/Arkansas.gov |
| 4 | California | california | CA | Golden State | http://www.ca.gov | 1850-09-09 | 31 | Sacramento | http://www.cityofsacramento.org | 38332521 | 1 | http://www.leginfo.ca.gov/const-toc.html | https://cdn.civil.services/us-states/flags/cal... | https://cdn.civil.services/us-states/seals/cal... | https://cdn.civil.services/us-states/maps/cali... | https://cdn.civil.services/us-states/backgroun... | https://cdn.civil.services/us-states/backgroun... | https://twitter.com/cagovernment | |
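Calling a catalog entry with keyword arguments returns a source with those arguments added or overridden. Here, `csv_kwargs` is handed to the underlying CSV reader: `skiprows=1` drops the original header line, and `header=None` numbers the columns 0 to 18 instead of naming them. The stdlib-only sketch below mimics that effect on a small CSV string. The helper `read_without_header` and the sample data are hypothetical. They only illustrate what the two options do, not how pandas implements them.

```python
import csv
import io


def read_without_header(text: str, skiprows: int = 1) -> list[dict[int, str]]:
    """Skip the first `skiprows` CSV records and key each value by its column number.

    Mimics the effect of pandas' header=None combined with skiprows.
    """
    rows = csv.reader(io.StringIO(text))
    result = []
    for record_no, row in enumerate(rows):
        if record_no < skiprows:
            continue  # drop the original header record(s)
        result.append(dict(enumerate(row)))  # columns become 0, 1, 2, ...
    return result


if __name__ == "__main__":
    sample = "state,slug,code\nAlabama,alabama,AL\nAlaska,alaska,AK\n"
    for record in read_without_header(sample):
        print(record)
    # prints {0: 'Alabama', 1: 'alabama', 2: 'AL'}
    #        {0: 'Alaska', 1: 'alaska', 2: 'AK'}
```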
