## Common ways to import and export a DataFrame
**CSV files:** pd.read_csv | df.to_csv

**JSON:** pd.read_json | df.to_json

**HTML:** pd.read_html | df.to_html

**Clipboard:** pd.read_clipboard | df.to_clipboard

**Excel files:** pd.read_excel | df.to_excel

**SQL databases:** pd.read_sql | df.to_sql

Summary:
- Import with functions on the pandas module (pd), named read_xxx
- Export with methods on the DataFrame (df), named to_xxx
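
The naming pattern can be tried without touching any files. The sketch below is only an illustration: `df_demo` and its two rows are made up, and an in-memory `io.StringIO` buffer stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical two-row table used only for this illustration.
df_demo = pd.DataFrame({"name": ["Ann", "Bob"], "score": [4.5, 6.7]})

# Export: to_xxx methods live on the DataFrame.
# Called without a path, they return the serialized text as a string.
csv_text = df_demo.to_csv(index=False)
json_text = df_demo.to_json()

# Import: read_xxx functions live on the pandas module and return a new DataFrame.
from_csv = pd.read_csv(io.StringIO(csv_text))
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv)
print(from_json)
```

The other formats follow the same pattern, although Excel and SQL need an extra engine or a database connection.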

In [4]:
import pandas as pd

In [32]:
df = pd.read_excel('xl/course_participants.xlsx')

In [33]:
print(type(df))

<class 'pandas.core.frame.DataFrame'>


In [34]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   user_id    4 non-null      int64  
 1   name       4 non-null      object 
 2   age        4 non-null      int64  
 3   country    4 non-null      object 
 4   score      4 non-null      float64
 5   continent  4 non-null      object 
dtypes: float64(1), int64(2), object(3)
memory usage: 324.0+ bytes


In [35]:
df

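|   | user_id | name | age | country | score | continent |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 1001 | Mark | 55 | Italy | 4.5 | Europe |
| 1 | 1000 | John | 33 | USA | 6.7 | America |
| 2 | 1002 | Tim | 41 | USA | 3.9 | America |
| 3 | 1003 | Jenny | 12 | Germany | 9.0 | Europe |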


In [40]:
df = df.rename(columns={'user_id':'uid'})
df

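|   | uid | name | age | country | score | continent |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 1001 | Mark | 55 | Italy | 4.5 | Europe |
| 1 | 1000 | John | 33 | USA | 6.7 | America |
| 2 | 1002 | Tim | 41 | USA | 3.9 | America |
| 3 | 1003 | Jenny | 12 | Germany | 9.0 | Europe |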


In [43]:
df = df.reset_index().set_index('uid')
df

| uid | index | name | age | country | score | continent |
| --- | --- | --- | --- | --- | --- | --- |
| 1001 | 0 | Mark | 55 | Italy | 4.5 | Europe |
| 1000 | 1 | John | 33 | USA | 6.7 | America |
| 1002 | 2 | Tim | 41 | USA | 3.9 | America |
| 1003 | 3 | Jenny | 12 | Germany | 9.0 | Europe |


In [46]:
df = df.drop(columns=['index'])

In [47]:
df

| uid | name | age | country | score | continent |
| --- | --- | --- | --- | --- | --- |
| 1001 | Mark | 55 | Italy | 4.5 | Europe |
| 1000 | John | 33 | USA | 6.7 | America |
| 1002 | Tim | 41 | USA | 3.9 | America |
| 1003 | Jenny | 12 | Germany | 9.0 | Europe |
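
The reset_index / set_index / drop sequence above works, but the detour through an `index` column is not required, because `set_index` replaces the default RangeIndex without keeping it. A minimal sketch, assuming a small stand-in table (`df_demo` holds two rows copied from the output above rather than the Excel file itself):

```python
import pandas as pd

# Stand-in for the table loaded from Excel (two rows copied from the output above).
df_demo = pd.DataFrame(
    {
        "uid": [1001, 1000],
        "name": ["Mark", "John"],
        "age": [55, 33],
    }
)

# Route used in the notebook: keep the old index as a column, then drop it.
long_way = df_demo.reset_index().set_index("uid").drop(columns=["index"])

# Shorter route: set_index discards the default RangeIndex on its own.
short_way = df_demo.set_index("uid")

print(short_way.equals(long_way))
```

`reset_index` is mainly useful when the current index holds data that should be kept as a column.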


In [48]:
print(type(df))

<class 'pandas.core.frame.DataFrame'>


In [49]:
df.loc[1001, :] = ['Judy', 7, 'China', 7.5, 'Asia']

In [50]:
df

| uid | name | age | country | score | continent |
| --- | --- | --- | --- | --- | --- |
| 1001 | Judy | 7 | China | 7.5 | Asia |
| 1000 | John | 33 | USA | 6.7 | America |
| 1002 | Tim | 41 | USA | 3.9 | America |
| 1003 | Jenny | 12 | Germany | 9.0 | Europe |


In [52]:
df.loc[1000, :] = ['Sophia', 5, 'China', 8, 'Asia']

In [53]:
df

| uid | name | age | country | score | continent |
| --- | --- | --- | --- | --- | --- |
| 1001 | Judy | 7 | China | 7.5 | Asia |
| 1000 | Sophia | 5 | China | 8.0 | Asia |
| 1002 | Tim | 41 | USA | 3.9 | America |
| 1003 | Jenny | 12 | Germany | 9.0 | Europe |
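
Both assignments above target uid labels that already exist, so the rows are overwritten. Assigning to a label that does not exist yet appends a row instead. A minimal sketch of both cases, assuming a made-up two-row table and a hypothetical new uid 1004:

```python
import pandas as pd

# Hypothetical stand-in table indexed by uid.
df_demo = pd.DataFrame(
    {"name": ["Tim", "Jenny"], "age": [41, 12], "score": [3.9, 9.0]},
    index=pd.Index([1002, 1003], name="uid"),
)

# Existing label: the whole row is overwritten, one value per column.
df_demo.loc[1002, :] = ["Sophia", 5, 8.0]

# New label (hypothetical uid 1004): a row is appended to the DataFrame.
df_demo.loc[1004] = ["Judy", 7, 7.5]

print(df_demo)
```

The list must contain one value per column, in column order. A mistyped label therefore adds a new row silently instead of raising an error.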


In [54]:
df.to_csv('xl/course_participants.csv')
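
Because uid is the index here, to_csv writes it as the first column of the file. A minimal sketch of reading such a file back, using an in-memory buffer and two made-up rows in place of the actual file:

```python
import io

import pandas as pd

# Hypothetical stand-in table indexed by uid.
df_demo = pd.DataFrame(
    {"name": ["Judy", "Sophia"], "score": [7.5, 8.0]},
    index=pd.Index([1001, 1000], name="uid"),
)

# to_csv writes the index as the first column by default.
# The buffer stands in for a file on disk.
buffer = io.StringIO()
df_demo.to_csv(buffer)

# Without index_col, uid comes back as an ordinary column next to a new RangeIndex.
buffer.seek(0)
plain = pd.read_csv(buffer)

# With index_col, uid is restored as the index.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col="uid")

print(plain)
print(restored)
```

If the index carries no information, `to_csv(..., index=False)` leaves it out of the file.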

In [55]:
df1 = pd.read_csv('xl/GDP-China.csv')
df1

|   | 年份 | 国民总收入 | 国内生产总值 | 第一产业增加值 | 第二产业增加值 | 第三产业增加值 | 人均国内生产总值 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2018 | 896915.6 | 900309.5 | 64734.0 | 366000.9 | 469574.6 | 64644 |
| 1 | 2017 | 820099.5 | 820754.3 | 62099.5 | 332742.7 | 425912.1 | 59201 |
| 2 | 2016 | 737074.0 | 740060.8 | 60139.2 | 296547.7 | 383373.9 | 53680 |
| 3 | 2015 | 683390.5 | 685992.9 | 57774.6 | 282040.3 | 346178.0 | 50028 |
| 4 | 2014 | 642097.6 | 641280.6 | 55626.3 | 277571.8 | 308082.5 | 47005 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 62 | 1956 | 1030.7 | 1030.7 | 443.9 | 280.4 | 306.4 | 166 |
| 63 | 1955 | 911.6 | 911.6 | 421.0 | 221.5 | 269.1 | 150 |
| 64 | 1954 | 859.8 | 859.8 | 392.0 | 210.8 | 257.0 | 144 |
| 65 | 1953 | 824.4 | 824.4 | 378.0 | 191.6 | 254.8 | 142 |


In [56]:
print(type(df1))

<class 'pandas.core.frame.DataFrame'>


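**Note:**

After importing a large dataset (for example, one with thousands of rows), the first steps are usually:
- Call the info method to get an overview of the columns, dtypes, and non-null counts
- Call the head and tail methods to look at the first 5 and last 5 rows
- Call the describe method to get summary statistics for the table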
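
The three steps can be sketched on a small made-up table. `df_demo` and its column names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for a freshly imported table with 12 rows.
df_demo = pd.DataFrame(
    {
        "year": list(range(2000, 2012)),
        "value": [float(i) for i in range(12)],
    }
)

df_demo.info()             # column names, non-null counts, dtypes, memory usage
print(df_demo.head())      # first 5 rows by default
print(df_demo.tail(3))     # last 3 rows; the default is also 5
print(df_demo.describe())  # count, mean, std, min, quartiles, max per numeric column
```

Both head and tail accept a row count, which helps when 5 rows are not enough to judge how the data is sorted.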

In [57]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 67 entries, 0 to 66
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   年份        67 non-null     int64  
 1   国民总收入     67 non-null     float64
 2   国内生产总值    67 non-null     float64
 3   第一产业增加值   67 non-null     float64
 4   第二产业增加值   67 non-null     float64
 5   第三产业增加值   67 non-null     float64
 6   人均国内生产总值  67 non-null     int64  
dtypes: float64(5), int64(2)
memory usage: 3.8 KB


In [59]:
df1.head()

|   | 年份 | 国民总收入 | 国内生产总值 | 第一产业增加值 | 第二产业增加值 | 第三产业增加值 | 人均国内生产总值 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2018 | 896915.6 | 900309.5 | 64734.0 | 366000.9 | 469574.6 | 64644 |
| 1 | 2017 | 820099.5 | 820754.3 | 62099.5 | 332742.7 | 425912.1 | 59201 |
| 2 | 2016 | 737074.0 | 740060.8 | 60139.2 | 296547.7 | 383373.9 | 53680 |
| 3 | 2015 | 683390.5 | 685992.9 | 57774.6 | 282040.3 | 346178.0 | 50028 |
| 4 | 2014 | 642097.6 | 641280.6 | 55626.3 | 277571.8 | 308082.5 | 47005 |


In [60]:
df1.tail()

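|   | 年份 | 国民总收入 | 国内生产总值 | 第一产业增加值 | 第二产业增加值 | 第三产业增加值 | 人均国内生产总值 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 62 | 1956 | 1030.7 | 1030.7 | 443.9 | 280.4 | 306.4 | 166 |
| 63 | 1955 | 911.6 | 911.6 | 421.0 | 221.5 | 269.1 | 150 |
| 64 | 1954 | 859.8 | 859.8 | 392.0 | 210.8 | 257.0 | 144 |
| 65 | 1953 | 824.4 | 824.4 | 378.0 | 191.6 | 254.8 | 142 |
| 66 | 1952 | 679.1 | 679.1 | 342.9 | 141.1 | 195.1 | 119 |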


In [63]:
df1.loc[:, ['国民总收入', '国内生产总值', '人均国内生产总值']].describe()

|   | 国民总收入 | 国内生产总值 | 人均国内生产总值 |
| --- | --- | --- | --- |
| count | 67.0 | 67.0 | 67.0 |
| mean | 126149.523881 | 126622.786567 | 9499.313433 |
| std | 225878.714753 | 226568.590697 | 16477.251117 |
| min | 679.1 | 679.1 | 119.0 |
| 25% | 1925.45 | 1925.45 | 252.0 |
| 50% | 9123.6 | 9098.9 | 866.0 |
| 75% | 114878.3 | 116290.25 | 9111.5 |
| max | 896915.6 | 900309.5 | 64644.0 |
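
Selecting columns with `.loc[:, [...]]` before calling describe, as in the cell above, gives the same result as indexing with a list of column names. A minimal sketch on a made-up table (`df_demo` and its columns are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in with two numeric columns and one text column.
df_demo = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [10, 20, 30], "label": ["x", "y", "z"]}
)

# Explicit form: all rows, the listed columns.
via_loc = df_demo.loc[:, ["a", "b"]].describe()

# Shorthand: a list inside the brackets selects the same columns.
via_brackets = df_demo[["a", "b"]].describe()

print(via_loc.equals(via_brackets))
print(via_loc)
```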
