Kaggle offers simple hands-on tutorials for beginners. This post reproduces its Pandas course, which currently has six lessons.  
Original course: https://www.kaggle.com/learn/pandas

In [2]:
import pandas as pd

# Creating, Reading & Writing
## Creating data
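Pandas has two core objects: DataFrame and Series.  
A DataFrame is a two-dimensional table. Its vertical slices are the columns, and the labels of its rows form the index. For example, to create a DataFrame named fruits: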

In [3]:
# Method 1: enter the table row by row, then supply the column and index labels separately
fruits = pd.DataFrame([[10, 20], [30, 40]], columns=["Apples", "Bananas"], index=["Price", "Amount"])

# Method 2: give the data column by column as a dict (column name -> values), then supply the index separately
fruits = pd.DataFrame({"Apples": [10, 30], "Bananas": [20, 40]}, index=["Price", "Amount"])

fruits

        Apples  Bananas
Price       10       20
Amount      30       40
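As a quick check, the two construction methods above build the same table. This sketch rebuilds both and compares them with `DataFrame.equals`, which compares labels, values and dtypes. The variable names `by_rows` and `by_columns` are just for illustration.

```python
import pandas as pd

# Method 1: rows first, then label the columns and the index
by_rows = pd.DataFrame([[10, 20], [30, 40]],
                       columns=["Apples", "Bananas"],
                       index=["Price", "Amount"])

# Method 2: a dict mapping each column name to its values
by_columns = pd.DataFrame({"Apples": [10, 30], "Bananas": [20, 40]},
                          index=["Price", "Amount"])

print(by_rows.equals(by_columns))        # True
print(by_rows.loc["Price", "Bananas"])   # 20
```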


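A Series is a one-dimensional sequence of values, i.e. a single column. A DataFrame can be viewed as several Series placed side by side, all sharing the same index. A Series has no per-column names, only a single overall name. For example, to create a Series stored in the variable things and named Dinner: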

In [4]:
# Same idea: first enter the values, then supply the index and the name
things = pd.Series([1, 2, 3], index=["Milk", "Eggs", "Spam"], name="Dinner")

things

Milk    1
Eggs    2
Spam    3
Name: Dinner, dtype: int64
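The claim that a DataFrame is a set of Series sharing one index can be checked directly. In this sketch, selecting one column of the fruits table returns a Series that keeps the DataFrame's index. Going the other way, Series with the same index can be combined back into the same DataFrame.

```python
import pandas as pd

fruits = pd.DataFrame({"Apples": [10, 30], "Bananas": [20, 40]},
                      index=["Price", "Amount"])

# One column of a DataFrame is a Series named after the column
apples_col = fruits["Apples"]
print(type(apples_col).__name__)               # Series
print(apples_col.name)                         # Apples
print(apples_col.index.equals(fruits.index))   # True

# Series sharing the same index combine into a DataFrame
apples = pd.Series([10, 30], index=["Price", "Amount"])
bananas = pd.Series([20, 40], index=["Price", "Amount"])
rebuilt = pd.DataFrame({"Apples": apples, "Bananas": bananas})
print(rebuilt.equals(fruits))                  # True
```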

Note: in the examples above, if no index is given, pandas uses consecutive integers starting from 0.
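The default labels look like this. Without an explicit index, pandas uses a `RangeIndex` of 0, 1, 2, and so on. A DataFrame built from a list of lists without `columns=` gets the same kind of default column labels.

```python
import pandas as pd

s = pd.Series([1, 2, 3])                      # no index given
print(list(s.index))                          # [0, 1, 2]

df = pd.DataFrame([[10, 20], [30, 40]])       # neither index nor columns given
print(list(df.index), list(df.columns))       # [0, 1] [0, 1]
```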
## Reading file
Data can be stored in many file formats. The most basic is CSV (comma-separated values), a table whose fields are separated by commas. Use the read_csv() function to load such a file into a DataFrame:

In [None]:
# pandas automatically adds an index starting from 0
wine_reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv")

# If the file already contains its own index, e.g. in the first column, pass index_col=0 to use that column as the index
wine_reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
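The following sketch shows what `index_col=0` changes. Instead of the Kaggle file, it reads a made-up two-row CSV held in memory through `io.StringIO`, with the same unnamed-first-column layout. Without `index_col=0`, pandas treats that column as ordinary data and labels it `Unnamed: 0`.

```python
import io
import pandas as pd

# Hypothetical CSV: the first column is an unnamed row index
csv_text = """,country,points
0,Italy,87
1,Portugal,87
"""

without = pd.read_csv(io.StringIO(csv_text))
print(list(without.columns))      # ['Unnamed: 0', 'country', 'points']

with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(with_index.columns))   # ['country', 'points']
print(list(with_index.index))     # [0, 1]
```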

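Another common data source is a SQL database (SQL stands for Structured Query Language). SQL databases can hold very large amounts of data. There are many kinds of SQL databases, and each needs its own connector, so reading from them is less convenient than reading a CSV file. When the course was written, SQLite was the only kind supported on Kaggle. For example: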

In [None]:
import sqlite3

conn = sqlite3.connect("../input/188-million-us-wildfires/FPA_FOD_20170508.sqlite")
fires = pd.read_sql_query("SELECT * FROM fires", conn)
# SELECT starts a query, * means all columns, and FROM fires reads from the table named fires
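Here is a self-contained version of the same idea. It uses an in-memory SQLite database with a small, hypothetical `fires` table. The real Kaggle table has a different schema, so the columns `fire_year` and `state` are made up. The second query shows that a `WHERE` clause filters rows before they reach pandas, and that values can be passed via `params` rather than pasted into the SQL string.

```python
import sqlite3
import pandas as pd

# In-memory database with a made-up "fires" table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fires (fire_year INTEGER, state TEXT)")
conn.executemany("INSERT INTO fires VALUES (?, ?)",
                 [(2005, "CA"), (2010, "TX"), (2015, "CA")])
conn.commit()

# All columns, all rows
everything = pd.read_sql_query("SELECT * FROM fires ORDER BY fire_year", conn)
# Only matching rows; "?" is filled from params by the sqlite3 driver
ca_only = pd.read_sql_query(
    "SELECT * FROM fires WHERE state = ? ORDER BY fire_year", conn, params=("CA",))
conn.close()

print(everything.shape)                 # (3, 2)
print(ca_only["fire_year"].tolist())    # [2005, 2015]
```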

## Writing file
Use the to_csv() function to write data to a CSV file:

In [None]:
wine_reviews.to_csv("wine_reviews.csv")
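By default, `to_csv` also writes the index as an unnamed first column, which is why reading the file back needs `index_col=0`. This sketch writes to an in-memory buffer instead of a file and uses a small made-up DataFrame. It also shows `index=False`, which leaves the index out entirely.

```python
import io
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "Portugal"], "points": [87, 87]})

buf = io.StringIO()
df.to_csv(buf)                              # the index is written as the first column
print(buf.getvalue().splitlines()[0])       # ,country,points

# Reading back with index_col=0 restores the same table
restored = pd.read_csv(io.StringIO(buf.getvalue()), index_col=0)
print(restored.equals(df))                  # True

# With no path, to_csv returns a string; index=False drops the index column
print(df.to_csv(index=False).splitlines()[0])   # country,points
```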

# Indexing, Selecting & Assigning

In [7]:
import pandas as pd
reviews = pd.read_csv("wine-reviews/winemag-data-130k-v2.csv", index_col=0)
pd.set_option("display.max_rows", 5)  # the dataset is large, so show at most 5 rows

In [8]:
reviews

,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss
129970,France,"Big, rich and off-dry, this is powered by inte...",Lieu-dit Harth Cuvée Caroline,90,21.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...,Gewürztraminer,Domaine Schoffit


In [20]:
reviews.head(n=1)  # n defaults to 5 if omitted

,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia


To select a single column, use **DataFrame.column** or **DataFrame["column"]**. The attribute form only works when the column name is a valid Python identifier.

In [12]:
reviews.country      # attribute access
reviews['country']   # equivalent bracket access; only this last line's result is displayed

0            Italy
1         Portugal
            ...   
129969      France
129970      France
Name: country, Length: 129971, dtype: object
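The two access styles differ in a couple of edge cases, which this sketch shows on a small hypothetical DataFrame. A column name containing a space can only be reached with brackets. A column whose name matches a DataFrame method (here `count`) is shadowed by that method under attribute access.

```python
import pandas as pd

# Hypothetical columns: one name with a space, one that clashes with a method
df = pd.DataFrame({"country": ["Italy", "US"],
                   "taster name": ["Kerin", "Paul"],
                   "count": [3, 5]})

print(df.country.equals(df["country"]))   # True: both forms work here
print(df["taster name"].tolist())         # ['Kerin', 'Paul']: brackets only
print(callable(df.count))                 # True: this is the DataFrame.count method
print(df["count"].tolist())               # [3, 5]: brackets reach the column
```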

To select a single element at a given row and column, use **DataFrame["column"][index_label]**: first pick the column, then the row label within it.

In [13]:
reviews['country'][0]

'Italy'
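The chained form above first builds a column Series and then indexes into it. This sketch, on a made-up two-row DataFrame, compares it with single-step lookups via `loc` and `at`. For assignment, the single-step forms are the safe choice: chained assignment such as `df["points"][0] = 90` may not modify `df`, and pandas warns about it.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "Portugal"], "points": [87, 87]})

a = df["country"][0]        # chained: column first, then row label 0
b = df.loc[0, "country"]    # one step: (row label, column label)
c = df.at[0, "country"]     # one step, for a single scalar
print(a, b, c)              # Italy Italy Italy

# Assign through loc so the change lands in df itself
df.loc[0, "points"] = 90
print(df["points"].tolist())   # [90, 87]
```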

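Index-based selection: select data by its numeric position, using iloc.  
Both loc and iloc are row-first, column-second, the opposite of the bracket syntax above, which picks the column first. This makes retrieving rows slightly easier than retrieving columns.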

In [14]:
reviews.iloc[0]         # select the first row

country                                                    Italy
description    Aromas include tropical fruit, broom, brimston...
                                     ...                        
variety                                              White Blend
winery                                                   Nicosia
Name: 0, Length: 13, dtype: object

In [15]:
reviews.iloc[:, 0]         # select the first column; ":" means all rows

0            Italy
1         Portugal
            ...   
129969      France
129970      France
Name: country, Length: 129971, dtype: object

In [16]:
reviews.iloc[:3, 0]        # the first 3 values of the first column

0       Italy
1    Portugal
2          US
Name: country, dtype: object

In [17]:
reviews.iloc[[0, 3, 5], 0]      # the values at positions 0, 3 and 5 (1st, 4th and 6th) of the first column

0    Italy
3       US
5    Spain
Name: country, dtype: object

In [23]:
reviews.iloc[-1]      # select the last row: negative positions count from the end, just like Python lists

country                                                   France
description    Big, rich and off-dry, this is powered by inte...
                                     ...                        
variety                                           Gewürztraminer
winery                                          Domaine Schoffit
Name: 129970, Length: 13, dtype: object

Label-based selection: select data by its index and column labels, using loc

In [29]:
reviews.loc[0, 'country']

'Italy'

In [27]:
reviews.loc[:, ['taster_name', 'taster_twitter_handle', 'points']]

,taster_name,taster_twitter_handle,points
0,Kerin O’Keefe,@kerinokeefe,87
1,Roger Voss,@vossroger,87
...,...,...,...
129969,Roger Voss,@vossroger,90
129970,Roger Voss,@vossroger,90


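To summarize the differences between iloc and loc:  
* iloc selects by integer position; loc selects by label (index and column names)  
* When slicing, iloc excludes the end point (half-open, like Python's range), while loc includes both end points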
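The slicing difference is easy to see on a small made-up DataFrame. With the default integer labels, the same `0:2` slice returns two rows under `iloc` and three under `loc`. With string labels, `loc` slices by name and still includes the end label.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "Portugal", "US", "US"]})

print(len(df.iloc[0:2]))   # 2: positions 0 and 1 (stop excluded)
print(len(df.loc[0:2]))    # 3: labels 0, 1 and 2 (stop included)

letters = pd.DataFrame({"v": [1, 2, 3]}, index=["a", "b", "c"])
print(letters.loc["a":"b", "v"].tolist())   # [1, 2]: "b" included
print(letters.iloc[0:1, 0].tolist())        # [1]: position 1 excluded
```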

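Conditional selection: comparing a column with a value produces a boolean Series, one True/False per row. Passing that Series to loc keeps only the rows where it is True.

In [31]: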
reviews.country == 'Italy'

0          True
1         False
          ...  
129969    False
129970    False
Name: country, Length: 129971, dtype: bool

In [32]:
reviews.loc[reviews.country == 'Italy']

,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
6,Italy,"Here's a bright, informal red that opens with ...",Belsito,87,16.0,Sicily & Sardinia,Vittoria,,Kerin O’Keefe,@kerinokeefe,Terre di Giurfo 2013 Belsito Frappato (Vittoria),Frappato,Terre di Giurfo
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129961,Italy,"Intense aromas of wild cherry, baking spice, t...",,90,30.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,COS 2013 Frappato (Sicilia),Frappato,COS
129962,Italy,"Blackberry, cassis, grilled herb and toasted a...",Sàgana Tenuta San Giacomo,90,40.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Cusumano 2012 Sàgana Tenuta San Giacomo Nero d...,Nero d'Avola,Cusumano


In [33]:
reviews.loc[(reviews.country == 'Italy') & (reviews.points >= 90)]

,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
120,Italy,"Slightly backward, particularly given the vint...",Bricco Rocche Prapó,92,70.0,Piedmont,Barolo,,,,Ceretto 2003 Bricco Rocche Prapó (Barolo),Nebbiolo,Ceretto
130,Italy,"At the first it was quite muted and subdued, b...",Bricco Rocche Brunate,91,70.0,Piedmont,Barolo,,,,Ceretto 2003 Bricco Rocche Brunate (Barolo),Nebbiolo,Ceretto
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129961,Italy,"Intense aromas of wild cherry, baking spice, t...",,90,30.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,COS 2013 Frappato (Sicilia),Frappato,COS
129962,Italy,"Blackberry, cassis, grilled herb and toasted a...",Sàgana Tenuta San Giacomo,90,40.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Cusumano 2012 Sàgana Tenuta San Giacomo Nero d...,Nero d'Avola,Cusumano


In [34]:
reviews.loc[(reviews.country == 'Italy') | (reviews.points >= 90)]

,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
6,Italy,"Here's a bright, informal red that opens with ...",Belsito,87,16.0,Sicily & Sardinia,Vittoria,,Kerin O’Keefe,@kerinokeefe,Terre di Giurfo 2013 Belsito Frappato (Vittoria),Frappato,Terre di Giurfo
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss
129970,France,"Big, rich and off-dry, this is powered by inte...",Lieu-dit Harth Cuvée Caroline,90,21.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...,Gewürztraminer,Domaine Schoffit
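The examples above combine conditions with `&` (and) and `|` (or). `~` (not) works the same way. This sketch uses a made-up three-row table. Each condition must be wrapped in parentheses because `&` and `|` bind more tightly than `==` and `>=` in Python. Python's own `and` does not work here: it tries to reduce a whole Series to a single True/False and raises `ValueError`.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "Italy", "France"],
                   "points": [87, 92, 90]})

both = df.loc[(df.country == "Italy") & (df.points >= 90)]
either = df.loc[(df.country == "Italy") | (df.points >= 90)]
not_italy = df.loc[~(df.country == "Italy")]
print(len(both), len(either), len(not_italy))   # 1 3 1

try:
    df.loc[(df.country == "Italy") and (df.points >= 90)]
except ValueError as err:
    print("ValueError:", err)   # the truth value of a Series is ambiguous
```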


isin selects the rows whose value appears in a given list:

In [35]:
reviews.loc[reviews.country.isin(['Italy', 'France'])]

,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
6,Italy,"Here's a bright, informal red that opens with ...",Belsito,87,16.0,Sicily & Sardinia,Vittoria,,Kerin O’Keefe,@kerinokeefe,Terre di Giurfo 2013 Belsito Frappato (Vittoria),Frappato,Terre di Giurfo
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss
129970,France,"Big, rich and off-dry, this is powered by inte...",Lieu-dit Harth Cuvée Caroline,90,21.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...,Gewürztraminer,Domaine Schoffit
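In effect, `isin` is shorthand for several `==` checks joined by `|`. This sketch demonstrates that on a small made-up column. It also applies `~` to keep the rows whose value is *not* in the list.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "US", "France", "Spain"]})

via_isin = df.country.isin(["Italy", "France"])
via_or = (df.country == "Italy") | (df.country == "France")
print(via_isin.equals(via_or))                    # True
print(df.loc[via_isin, "country"].tolist())       # ['Italy', 'France']
print(df.loc[~via_isin, "country"].tolist())      # ['US', 'Spain']
```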


isnull (and its opposite, notnull) checks whether values are missing (NaN):

In [36]:
reviews.loc[reviews.price.notnull()]

,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss
129970,France,"Big, rich and off-dry, this is powered by inte...",Lieu-dit Harth Cuvée Caroline,90,21.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...,Gewürztraminer,Domaine Schoffit
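This sketch mirrors the first three reviews shown earlier: the Italian wine has no price. `isnull` and `notnull` return complementary boolean masks. `isna`/`notna` are aliases for the same checks.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "Portugal", "US"],
                   "price": [np.nan, 15.0, 14.0]})

print(df.price.isnull().tolist())     # [True, False, False]
print(df.price.notnull().tolist())    # [False, True, True]
print(df.loc[df.price.notnull(), "country"].tolist())   # ['Portugal', 'US']
print(df.price.isna().equals(df.price.isnull()))        # True
```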


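Assigning data: assigning a single value to a column sets it for every row, and assigning to a new column name creates that column: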

In [37]:
reviews['critic'] = 'everyone'
reviews['critic']

0         everyone
1         everyone
            ...   
129969    everyone
129970    everyone
Name: critic, Length: 129971, dtype: object
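Besides a single value, a column can be assigned any iterable of matching length, or a result computed from other columns. This sketch uses a small made-up DataFrame. The `index_backwards` column is an illustration that counts down from the number of rows. `is_italy` is a hypothetical derived column.

```python
import pandas as pd

df = pd.DataFrame({"country": ["Italy", "Portugal", "US"]})

df["critic"] = "everyone"                          # scalar: broadcast to every row
df["index_backwards"] = range(len(df), 0, -1)      # iterable: one value per row
df["is_italy"] = df["country"] == "Italy"          # derived from another column

print(df["index_backwards"].tolist())   # [3, 2, 1]
print(set(df["critic"]))                # {'everyone'}
print(df["is_italy"].tolist())          # [True, False, False]
```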

# Summary functions & maps

# Grouping & Sorting

# Data types & dealing with missing data

# Renaming & Combining