# Introduction to Python


Recommended communities:

- Anaconda
- GitHub
- Stack Overflow



# Setting up the environment

Installing Anaconda directly is recommended: its default installation includes most of the packages needed for data science.

- Downloads: https://www.anaconda.com/download/#windows
- Documentation: https://docs.anaconda.com/


# Common IDEs

- PyCharm
- **Jupyter Notebook**
- Spyder
- Rodeo

# Managing envs / packages

- GUI: use Anaconda Navigator
- Command line: use the conda tools
    - https://conda.io

# Jupyter Notebook basics

## Starting Jupyter Notebook

```shell
$ jupyter notebook
[I 15:20:52.739 NotebookApp] Serving notebooks from local directory:
/home/wesm/code/pydata-book
[I 15:20:52.739 NotebookApp] 0 active kernels
[I 15:20:52.739 NotebookApp] The Jupyter Notebook is running at:
http://localhost:8888/
[I 15:20:52.740 NotebookApp] Use Control-C to stop this server and shut down
all kernels (twice to skip confirmation).
Created new window in existing browser session.
```

## For help and keyboard shortcuts, see Help in the toolbar

## Basic cell

```python
import numpy as np
import pandas as pd
```

```python
# code and output in the same cell
data = {i: np.random.randn() for i in range(7)}
data
```

```
{0: -1.0567078173668294,
 1: 0.6999336144861797,
 2: -0.12320427981953767,
 3: -0.08124380994239834,
 4: -1.4348227343016708,
 5: 0.7503583818141778,
 6: -0.6350639269907561}
```

## Tab Completion

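```python
an_apple = 27
an_example = 42
# put the cursor after "an" and press Tab: the completions include an_apple and an_example
an
```

```python
# press Tab after "np." to list the attributes of the numpy module
np.
```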

## Viewing documentation and source code

```python
np.arange?
```

```python
?np.arange
```

```python
# go deeper: ?? also shows the source code when it is available
??pd.DataFrame
```

```python
?help
```

```python
help(np.arange)
```

```
Help on built-in function arange in module numpy.core.multiarray:

arange(...)
    arange([start,] stop[, step,], dtype=None)
    
    Return evenly spaced values within a given interval.
    
    Values are generated within the half-open interval ``[start, stop)``
    (in other words, the interval including `start` but excluding `stop`).
    For integer arguments the function is equivalent to the Python built-in
    `range <http://docs.python.org/lib/built-in-funcs.html>`_ function,
    but returns an ndarray rather than a list.
    
    When using a non-integer step, such as 0.1, the results will often not
    be consistent.  It is better to use ``linspace`` for these cases.
    
    Parameters
    ----------
    start : number, optional
        Start of interval.  The interval includes this value.  The default
        start value is 0.
    stop : number
        End of interval.  The interval does not include this value, except
        in some cases where `step` is not an integer and
```

Click just after `read_csv(` in the cell below and press Shift+Tab to display its signature and docstring.

```python
pd.read_csv(
```
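
The `?`, `??` and Shift+Tab shortcuts only work in IPython/Jupyter. In a plain Python script, the standard-library `inspect` module exposes the same signature and docstring information. A minimal sketch, using a small illustrative function `f` (it mirrors the one in the `sample.py` example below, with a docstring added as an assumption):

```python
import inspect

def f(x, y, z):
    """Return (x + y) / z."""
    return (x + y) / z

print(inspect.signature(f))  # (x, y, z)
print(inspect.getdoc(f))     # Return (x + y) / z.
```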

## Magic functions


```python
# %timeit runs the statement many times and reports the time per loop
%timeit np.random.randint(10)
```

```
The slowest run took 14.12 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.68 µs per loop
```

```python
# %time runs the statement once
%time np.random.randint(10)
```

```
CPU times: user 25 µs, sys: 1e+03 ns, total: 26 µs
Wall time: 27.2 µs

7
```

```python
?%save
```

```python
%%writefile sample.py

def f(x, y, z):
    return (x + y) / z

a = 5
b = 6
c = 7.5

result = f(a, b, c)
print("results: {}".format(result))
```

```
Overwriting sample.py
```

```python
# %load pastes the contents of sample.py into this cell
%load sample.py
```

```python
# %run executes sample.py as a script
%run sample.py
```

```
results: 1.4666666666666666
```
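
`%timeit` and `%time` exist only in IPython. In a plain Python script, the standard-library `timeit` module gives a comparable measurement. A minimal sketch; it times the standard-library `random.randint` instead of `np.random.randint` so that it has no third-party dependency, and the loop count `n` is an arbitrary choice:

```python
import timeit

n = 10_000
# time n calls of random.randint, repeated 5 times (stdlib stand-in for np.random.randint)
timings = timeit.repeat("random.randint(0, 9)", setup="import random",
                        number=n, repeat=5)
best = min(timings) / n  # seconds per call in the fastest repetition
print(f"best of 5: {best * 1e6:.3f} µs per loop")
```

Taking the minimum over repetitions, as the old `%timeit` "best of 3" output did, reduces the influence of other processes on the measurement.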


# Python Language Basics

## Language Semantics

### Indentation, not braces

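```python
array, pivot = [3, 8, 1, 9, 4], 5  # example data so the loop runs
less, greater = [], []
for x in array:
    if x < pivot:
        less.append(x)
    else:
        greater.append(x)
```

```python
a = 5; b = 6; c = 7  # semicolons separate several statements on one line
```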

### Everything is an object
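
Numbers, strings, functions and modules are all Python objects: each has a type, and each can be stored in a variable or a container, passed to a function, or returned from one. A small illustration, where `add_one` is a hypothetical helper:

```python
import math

def add_one(x):
    return x + 1

# an int, a string, a function and a module all have a type
for obj in [5, 'foo', add_one, math]:
    print(type(obj).__name__)   # prints int, str, function, module (one per line)

# functions are ordinary values: they can be stored in a list and called later
operations = [add_one, math.sqrt]
results = [op(16) for op in operations]
print(results)   # [17, 4.0]
```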

### Comments

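```python
import io

# stand-in for an open file, so the example runs
file_handle = io.StringIO('foo\n\nfood\n')
results = []
for line in file_handle:
    # keep the empty lines for now
    # if len(line) == 0:
    #   continue
    results.append(line.replace('foo', 'bar'))
```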

```python
print("Reached this line")  # Simple status report
```

### Function and object method calls

```python
result = f(x, y, z)
g()
```


```python
obj.some_method(x, y, z)
```


```python
result = f(a, b, c, d=5, e='foo')
```
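
The calls above use placeholder names. As a runnable sketch, here is a hypothetical `f` whose last two parameters have default values, called with positional and keyword arguments, followed by a method call on a list object:

```python
def f(a, b, c, d=5, e='foo'):
    # hypothetical function: a, b, c are positional; d and e have defaults
    return (a + b + c + d, e)

print(f(1, 2, 3))                 # (11, 'foo'): d and e take their defaults
print(f(1, 2, 3, d=10, e='bar'))  # (16, 'bar')
print(f(1, 2, 3, e='baz'))        # (11, 'baz'): keyword arguments can be skipped or reordered

obj = [3, 1, 2]
obj.sort(reverse=True)  # a method call: sort() is invoked on the list object obj
print(obj)              # [3, 2, 1]
```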

## Variables and argument passing

```python
a = [1, 2, 3]
b = a        # b now refers to the same list object as a
a.append(4)
b
```

```
[1, 2, 3, 4]
```

```python
a = 4.5
b = 2
# String formatting, to be visited later
print('a is {0}, b is {1}'.format(type(a), type(b)))
a / b
```

```
a is <class 'float'>, b is <class 'int'>
2.25
```

```python
a = 5
isinstance(a, int)
```

```
True
```

```python
a = 5; b = 4.5
isinstance(a, (int, float))
```

```
True
```

```python
isinstance(b, (int, float))
```

```
True
```
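
When an object is passed to a function, the parameter is bound to the same object, so changes made through it (such as appending to a list) are visible to the caller, while rebinding the parameter to a new object is not. A minimal sketch with two hypothetical helper functions:

```python
def append_element(some_list, element):
    # mutates the caller's list in place
    some_list.append(element)

def rebind(some_list):
    # rebinding the local name does not affect the caller's variable
    some_list = [0]
    return some_list

data = [1, 2, 3]
append_element(data, 4)
print(data)        # [1, 2, 3, 4]

new = rebind(data)
print(data, new)   # [1, 2, 3, 4] [0]
```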

## Attributes and methods

```python
In [1]: a = 'foo'

In [2]: a.<Press Tab>
a.capitalize  a.index       a.join        a.rjust       a.swapcase
a.center      a.isalnum     a.ljust       a.rpartition  a.title
a.count       a.isalpha     a.lower       a.rsplit      a.translate
a.encode      a.isdigit     a.lstrip      a.rstrip      a.upper
a.endswith    a.islower     a.partition   a.split       a.zfill
a.expandtabs  a.isspace     a.replace     a.splitlines
a.find        a.istitle     a.rfind       a.startswith
a.format      a.isupper     a.rindex      a.strip
```
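
Attributes and methods can also be looked up by name with the built-in `dir`, `getattr` and `hasattr` functions, which is what tab completion relies on conceptually. A short illustration:

```python
a = 'foo'

# list the public attributes and methods of a str object
public = [name for name in dir(a) if not name.startswith('_')]
print(public[:5])

# look up a method by name, then call it
split = getattr(a, 'split')
print(split('o'))             # ['f', '', '']
print(hasattr(a, 'upper'))    # True
print(hasattr(a, 'decode'))   # False: str has no decode method in Python 3
```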

## Imports

```python
# some_module.py
PI = 3.14159

def f(x):
    return x + 2

def g(a, b):
    return a + b
```
```python
import some_module
result = some_module.f(5)
pi = some_module.PI

from some_module import f, g, PI
result = g(5, PI)

import some_module as sm
from some_module import PI as pi, g as gf

r1 = sm.f(pi)
r2 = gf(6, pi)
```

## Binary operators and comparisons

```python
print(5 - 7)
print(12 + 21.5)
print(5 <= 2)
```

```
-2
33.5
False
```

```python
a = [1, 2, 3]
b = a
c = list(a)

print(a is b)
print(a is not c)
```

```
True
True
```

```python
a == c
```

```
True
```

```python
a = None
a is None
```

```
True
```
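
Besides `+`, `-`, `<=`, `is` and `==`, Python has several other binary operators. A few of them applied to two integers:

```python
a, b = 7, 2

print(a / b)    # 3.5: true division always returns a float
print(a // b)   # 3: floor division drops the fractional part
print(a % b)    # 1: remainder
print(a ** b)   # 49: power
print(a & b)    # 2: bitwise AND (0b111 & 0b010)
print(a | b)    # 7: bitwise OR
print(a ^ b)    # 5: bitwise XOR
print(a != b)   # True
```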

## Mutable and immutable objects

```python
a_list = ['foo', 2, [4, 5]]
a_list[2] = (3, 4)
a_list
```

```
['foo', 2, (3, 4)]
```

```python
a_tuple = (3, 5, (4, 5))
a_tuple[1] = 'four'
```

```
TypeError: 'tuple' object does not support item assignment
```
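
Immutability is shallow: the slots of a tuple cannot be reassigned, but a mutable object stored inside a tuple can still be modified in place. A short illustration:

```python
a_tuple = (3, 5, [4, 5])

try:
    a_tuple[2] = [0]       # reassigning a slot is not allowed
except TypeError as exc:
    print(exc)             # 'tuple' object does not support item assignment

a_tuple[2].append(6)       # but the list inside the tuple is mutable
print(a_tuple)             # (3, 5, [4, 5, 6])
```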

## Strings

```python
a = 'one way of writing a string'
b = "another way"

c = """
This is a longer string that
spans multiple lines
"""
```

```python
c.count('\n')
```

```
3
```

```python
c.count('n')
```

```
4
```

```python
a = 'this is a string'
a[10] = 'f'
```

```
TypeError: 'str' object does not support item assignment
```

```python
b = a.replace('string', 'longer string')
b
```

```
'this is a longer string'
```

```python
a = 5.6
s = str(a)
print(s)
```

```
5.6
```

```python
s = 'python'
list(s)
```

```
['p', 'y', 't', 'h', 'o', 'n']
```

```python
s = r'this\has\no\special\characters'
s
```

```
'this\\has\\no\\special\\characters'
```

```python
a = 'this is the first half '
b = 'and this is the second half'
a + b
```

```
'this is the first half and this is the second half'
```

```python
template = '{0:.2f} {1:s} are worth US${2:d}'
template.format(4.5560, 'Argentine Pesos', 1)
```

```
'4.56 Argentine Pesos are worth US$1'
```
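
In Python 3.6 and later, f-strings accept the same format specifications inline, next to the values they format. The `template.format(...)` example can be written this way; the variable names are illustrative:

```python
amount = 4.5560
currency = 'Argentine Pesos'
usd = 1

# {value:spec} uses the same format mini-language as str.format
s = f'{amount:.2f} {currency:s} are worth US${usd:d}'
print(s)   # 4.56 Argentine Pesos are worth US$1
```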

## Booleans

```python
True and True
False or True
```

```
True
```
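
`and`, `or` and `not` also work on non-boolean values: `0`, `None` and empty containers count as false, most other values as true, and `and`/`or` return one of their operands rather than a strict `True`/`False`. A quick check:

```python
values = [0, 1, '', 'foo', [], [1, 2], None]
flags = [bool(v) for v in values]
print(flags)            # [False, True, False, True, False, True, False]

print(not [])           # True
print([] or 'default')  # default: or returns its first truthy operand (else the last one)
print(0 and 5)          # 0: and returns its first falsy operand (else the last one)
```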

## Dates and times

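```python
from datetime import datetime, date, time
dt = datetime(2011, 10, 29, 20, 30, 21)
```

```python
# each comment shows the value of the expression on that line
dt.day                                    # 29
dt.minute                                 # 30
dt.date()                                 # datetime.date(2011, 10, 29)
dt.time()                                 # datetime.time(20, 30, 21)
dt.strftime('%m/%d/%Y %H:%M')             # '10/29/2011 20:30'
datetime.strptime('20091031', '%Y%m%d')   # datetime.datetime(2009, 10, 31, 0, 0)
dt.replace(minute=0, second=0)            # datetime.datetime(2011, 10, 29, 20, 0)
```

```python
dt2 = datetime(2011, 11, 15, 22, 30)
delta = dt2 - dt
print(delta)   # 17 days, 1:59:39
type(delta)    # datetime.timedelta
```

```python
print(dt)      # 2011-10-29 20:30:21
dt + delta     # datetime.datetime(2011, 11, 15, 22, 30)
```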
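
A `timedelta` can also be constructed directly and added to or subtracted from a `datetime`. A short sketch reusing the same `dt`; the size of the shift is arbitrary:

```python
from datetime import datetime, timedelta

dt = datetime(2011, 10, 29, 20, 30, 21)
shift = timedelta(days=2, hours=3)

print(dt + shift)               # 2011-10-31 23:30:21
print(dt - shift)               # 2011-10-27 17:30:21
print(shift.total_seconds())    # 183600.0
```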

## Control Flow

### if, elif, and else

#### single condition

```python
x = -5  # example value
if x < 0:
    print("It's negative")
```

#### multiple conditions

```python
x = 3  # example value
if x < 0:
    print("It's negative")
elif x == 0:
    print('Equal to zero')
elif 0 < x < 5:
    print('Positive but smaller than 5')
else:
    print('Positive and larger than or equal to 5')
```

#### or condition

```python
a = 5; b = 7
c = 8; d = 4

if a < b or c > d:
    print('Made it')
```

### Chain conditions

```python
if 4 > 3 > 2 > 1:
    print("Chain condition is True !")
```

```
Chain condition is True !
```

```python
if 4 > 3 > 2 < 1:
    print("Chain condition is True !")
else:
    print("Chain condition is Not True !")
```

```
Chain condition is Not True !
```
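
A chained comparison such as `a < b < c` is evaluated as `a < b and b < c`, with the middle operand evaluated only once. The second example therefore expands to:

```python
chained = 4 > 3 > 2 < 1
expanded = (4 > 3) and (3 > 2) and (2 < 1)
print(chained, expanded)   # False False
```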


## for loops

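```python
# 1: iterate over a collection
collection = ['a', 'b', 'c']  # example data
for value in collection:
    print(value)  # do something with value

# 2: continue skips the rest of the current iteration
sequence = [1, 2, None, 4, None, 5]
total = 0

for value in sequence:
    if value is None:
        continue
    total += value
# total == 12

# 3: break exits the loop entirely
sequence = [1, 2, 0, 4, 6, 5, 2, 1]
total_until_5 = 0
for value in sequence:
    if value == 5:
        break
    total_until_5 += value
# total_until_5 == 13
```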

```python
# nested loops: print the (i, j) pairs with j <= i
for i in range(4):
    for j in range(4):
        if j > i:
            break
        print((i, j))
```

```
(0, 0)
(1, 0)
(1, 1)
(2, 0)
(2, 1)
(2, 2)
(3, 0)
(3, 1)
(3, 2)
(3, 3)
```


## while loops

```python
x = 256
total = 0

while x > 0:
    if total > 500:
        break
    total += x
    x = x // 2

print(x)
```

```
4
```


## range

```python
help(range)
```

```
Help on class range in module builtins:

class range(object)
 |  range(stop) -> range object
 |  range(start, stop[, step]) -> range object
 |  
 |  Return an object that produces a sequence of integers from start (inclusive)
 |  to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
 |  start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
 |  These are exactly the valid indices for a list of 4 elements.
 |  When step is given, it specifies the increment (or decrement).
 |  
 |  Methods defined here:
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(self, key, /)
 |      Return self[key].
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(self, /)
 |      Return hash(self).
 |  
 |  __iter__(se
```

```python
range(10)
```

```
range(0, 10)
```

```python
list(range(10))
```

```
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

```python
list(range(0, 20, 2))
```

```
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

```python
list(range(5, 0, -1))
```

```
[5, 4, 3, 2, 1]
```

```python
[x for x in range(10) if not x % 3]
```

```
[0, 3, 6, 9]
```

```python
list(filter(lambda x: not x % 3, range(10)))
```

```
[0, 3, 6, 9]
```
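
A `range` object does not store its elements; it computes them on demand. That is why `range(10)` displays as `range(0, 10)` rather than as a list, and it still supports `len`, indexing, slicing and membership tests without building a list. A short illustration:

```python
r = range(0, 20, 2)

print(len(r))    # 10
print(r[3])      # 6
print(r[-1])     # 18
print(r[2:5])    # range(4, 10, 2): slicing a range returns another range
print(10 in r)   # True
print(11 in r)   # False
```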