# Indexing, selecting, and assigning reference

- Almost every data operation starts by selecting specific values of a pandas DataFrame or Series, so selection is a foundational skill for data science.

In [1]:
import pandas as pd
file_path = "C:/Users/teamo/PycharmProjects/Data-Analyse/Winemagz/winemag-data-130k-v2.csv"
reviews = pd.read_csv(file_path)
pd.set_option("display.max_rows",5)

In [2]:
reviews.head()

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks


In [3]:
desc = reviews.description
desc

0         Aromas include tropical fruit, broom, brimston...
1         This is ripe and fruity, a wine that is smooth...
                                ...                        
129969    A dry style of Pinot Gris, this is crisp with ...
129970    Big, rich and off-dry, this is powered by inte...
Name: description, Length: 129971, dtype: object

In [4]:
desc = reviews['description']
desc

0         Aromas include tropical fruit, broom, brimston...
1         This is ripe and fruity, a wine that is smooth...
                                ...                        
129969    A dry style of Pinot Gris, this is crisp with ...
129970    Big, rich and off-dry, this is powered by inte...
Name: description, Length: 129971, dtype: object

In [5]:
type(desc)

pandas.core.series.Series
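
Cells 3 and 4 return the same Series. A minimal sketch on a hypothetical toy frame (the `toy` data and the `taster name` column are made up for illustration) shows that the two forms are interchangeable, except that only the bracket form can reach a column whose name is not a valid Python identifier:

```python
import pandas as pd

# Hypothetical toy data standing in for the wine reviews
toy = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "points": [87, 87, 90],
    "taster name": ["Kerin", "Roger", "Paul"],  # note the space in the name
})

by_attribute = toy.country      # attribute access
by_brackets = toy["country"]    # bracket access

# Both forms return the same Series
same = by_attribute.equals(by_brackets)

# A name containing a space can only be reached with brackets
tasters = toy["taster name"]
```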

## 1. iloc[]

In [6]:
help(reviews.iloc)

Help on _iLocIndexer in module pandas.core.indexing object:

class _iLocIndexer(_LocationIndexer)
 |  Purely integer-location based indexing for selection by position.
 |  
 |  ``.iloc[]`` is primarily integer position based (from ``0`` to
 |  ``length-1`` of the axis), but may also be used with a boolean
 |  array.
 |  
 |  Allowed inputs are:
 |  
 |  - An integer, e.g. ``5``.
 |  - A list or array of integers, e.g. ``[4, 3, 0]``.
 |  - A slice object with ints, e.g. ``1:7``.
 |  - A boolean array.
 |  - A ``callable`` function with one argument (the calling Series, DataFrame
 |    or Panel) and that returns valid output for indexing (one of the above).
 |    This is useful in method chains, when you don't have a reference to the
 |    calling object, but would like to base your selection on some value.
 |  
 |  ``.iloc`` will raise ``IndexError`` if a requested indexer is
 |  out-of-bounds, except *slice* indexers which allow out-of-bounds
 |  indexing (this conforms with python/num

### 1) Pass iloc[] the index of the value you want to access

In [7]:
first_description = reviews.description.iloc[0]
first_description

"Aromas include tropical fruit, broom, brimstone and dried herb. The palate isn't overly expressive, offering unripened apple, citrus and dried sage alongside brisk acidity."

### 2) Pass the index of the object to access; the result can be a Series or a single value

In [8]:
# Example 1
first_row = reviews.iloc[0]
first_row

Unnamed: 0              0
country             Italy
                 ...     
variety       White Blend
winery            Nicosia
Name: 0, Length: 14, dtype: object

In [9]:
type(first_row)

pandas.core.series.Series

In [10]:
# Example 2
type(first_description)

str
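
What comes back depends on what is indexed and how. A small sketch on hypothetical data (`toy` is made up for illustration) shows the three cases side by side:

```python
import pandas as pd

# Hypothetical two-row frame
toy = pd.DataFrame({
    "description": ["Aromas include tropical fruit", "This is ripe and fruity"],
    "points": [87, 87],
})

value = toy.description.iloc[0]   # Series + integer -> a single value (here a str)
row = toy.iloc[0]                 # DataFrame + integer -> one row as a Series
frame = toy.iloc[[0]]             # DataFrame + list -> a one-row DataFrame
```

Wrapping the integer in a list is a common way to keep the result a DataFrame when later code expects one.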

### 3) Pass iloc[] the indices of a group of values in the DataFrame (a slice can express the index range)

In [15]:
first_ten_descriptions = reviews.description.iloc[:10]
first_ten_descriptions

0    Aromas include tropical fruit, broom, brimston...
1    This is ripe and fruity, a wine that is smooth...
                           ...                        
8    Savory dried thyme notes accent sunnier flavor...
9    This has great depth of flavor with its fresh ...
Name: description, Length: 10, dtype: object

In [18]:
reviews.iloc[:11,0]

0      0
1      1
      ..
9      9
10    10
Name: Unnamed: 0, Length: 11, dtype: int64
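
`iloc` slices follow Python's convention: the start is included and the end is excluded, so `:10` yields 10 rows and `:11` yields 11. A minimal sketch on hypothetical data (`toy` is made up for illustration):

```python
import pandas as pd

# Hypothetical five-row frame
toy = pd.DataFrame({"country": list("ABCDE"), "points": [80, 85, 90, 95, 100]})

first_three = toy.iloc[:3]      # positions 0, 1, 2
first_col = toy.iloc[:3, 0]     # same rows, first column only -> Series
last_two = toy.iloc[-2:]        # negative positions count from the end
```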

## 2. loc[]

### 1) Pass the row index labels of the objects to access

In [None]:
indices = [1, 2, 3, 5, 8]
sample_reviews = reviews.loc[indices]
sample_reviews
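
With the default index, labels and positions happen to coincide, which hides the difference between `loc` and `iloc`. The sketch below gives a hypothetical frame a non-default index (`toy` and its labels are made up) to show that `loc` looks up labels, not positions:

```python
import pandas as pd

# Hypothetical frame whose index labels differ from the row positions
toy = pd.DataFrame({"points": [87, 90, 95]}, index=[10, 20, 30])

by_label = toy.loc[[10, 30]]      # rows labelled 10 and 30
by_position = toy.iloc[[0, 2]]    # rows at positions 0 and 2
same = by_label.equals(by_position)

# There is no row labelled 0, so loc raises KeyError
try:
    toy.loc[0]
    missing = False
except KeyError:
    missing = True
```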

### 2) Pass the row indices of the objects to access, optionally specifying which columns of the DataFrame to show

In [None]:
cols = ['country', 'province', 'region_1', 'region_2']
indices = [0, 1, 10, 100]
df = reviews.loc[indices, cols]
df

### Here, `indices` holds the row indices and `cols` is a list of strings naming columns of the DataFrame.

In [None]:
type(cols)

### 3) Continuing the example above, `indices` can also be written as a slice giving the desired index range.

In [None]:
cols = ['country', 'variety']
df = reviews.loc[:90, cols]
df
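
One difference from `iloc` is worth checking here: a `loc` slice includes its end label, so on the default index `reviews.loc[:90, cols]` returns 91 rows, not 90. A minimal sketch on hypothetical data (`toy` is made up for illustration):

```python
import pandas as pd

# Hypothetical five-row frame with the default index 0..4
toy = pd.DataFrame({
    "country": ["Italy", "Portugal", "US", "US", "France"],
    "variety": ["White Blend", "Portuguese Red", "Pinot Gris", "Riesling", "Gamay"],
})

with_loc = toy.loc[:2, ["country"]]    # labels 0, 1, 2 -> 3 rows (end included)
with_iloc = toy.iloc[:2, [0]]          # positions 0, 1 -> 2 rows (end excluded)
```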

## 3. Filter for the values you need with a condition when creating a DataFrame

In [None]:
# Example 1
italian_wines = reviews[reviews.country == 'Italy']
italian_wines
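
The comparison inside the brackets produces a boolean Series, and indexing with it keeps the rows where it is `True`. A sketch on hypothetical data (`toy` is made up for illustration) makes the intermediate mask visible:

```python
import pandas as pd

# Hypothetical three-row frame
toy = pd.DataFrame({"country": ["Italy", "US", "Italy"], "points": [87, 90, 92]})

mask = toy.country == "Italy"   # boolean Series, one entry per row
italian = toy[mask]             # keep rows where the mask is True
italian_loc = toy.loc[mask]     # loc accepts the same mask
```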

In [None]:
# Example 2
top_oceania_wine = reviews.loc[(reviews.country.isin(['Australia', 'New Zealand']))
                              & (reviews.points >= 95)]
top_oceania_wine
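
Conditions are combined element-wise with `&` (and) and `|` (or). Each comparison needs its own parentheses because `&` and `|` bind more tightly than `>=` or `==`, and Python's `and`/`or` keywords do not work on a Series. A minimal sketch on hypothetical data (`toy` is made up for illustration):

```python
import pandas as pd

# Hypothetical five-row frame
toy = pd.DataFrame({
    "country": ["Australia", "New Zealand", "Italy", "Australia", "US"],
    "points": [96, 90, 97, 88, 85],
})

in_oceania = toy.country.isin(["Australia", "New Zealand"])   # membership test

top_oceania = toy.loc[in_oceania & (toy.points >= 95)]   # both conditions hold
either = toy.loc[in_oceania | (toy.points >= 95)]        # at least one holds
```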