## Creating a DataFrame

- pandas broadcasts scalar values: column B is given a single value, but the resulting column has 3 rows.


In [2]:
import numpy as np
import pandas as pd

dft = pd.DataFrame(
    {
        "A" : np.random.rand(3),
        "B" : 1,
        "C" : "foo",
        "D" : pd.Timestamp("20230321"),
        "E" : pd.Series([1.0] * 3).astype("float32"),
        "F" : False,
        "G" : pd.Series([1] * 3, dtype="int8"),
    }
)
dft

          A  B    C          D    E      F  G
0  0.271319  1  foo 2023-03-21  1.0  False  1
1  0.836963  1  foo 2023-03-21  1.0  False  1
2  0.499924  1  foo 2023-03-21  1.0  False  1
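Broadcasting needs something that fixes the number of rows. In the cell above, the array and Series columns supply a length of 3. As a minimal sketch (the names `scalars`, `raised`, and `broadcast_df` are illustrative, not from the original notebook), an all-scalar dict needs an explicit `index`:

```python
import pandas as pd

scalars = {"B": 1, "C": "foo"}

# With only scalars, pandas cannot infer a row count and raises ValueError.
try:
    pd.DataFrame(scalars)
    raised = False
except ValueError:
    raised = True

# An explicit index fixes the length, so each scalar is broadcast to 3 rows.
broadcast_df = pd.DataFrame(scalars, index=range(3))

print(raised)
print(broadcast_df.shape)
```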


## Inspecting dtypes

- Use `df.dtypes` to see the dtype of each column, or `df.info()` to print a fuller summary.

In [3]:
dft.dtypes

A           float64
B             int64
C            object
D    datetime64[ns]
E           float32
F              bool
G              int8
dtype: object

In [4]:
dft.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   A       3 non-null      float64       
 1   B       3 non-null      int64         
 2   C       3 non-null      object        
 3   D       3 non-null      datetime64[ns]
 4   E       3 non-null      float32       
 5   F       3 non-null      bool          
 6   G       3 non-null      int8          
dtypes: bool(1), datetime64[ns](1), float32(1), float64(1), int64(1), int8(1), object(1)
memory usage: 242.0+ bytes
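The `+` in `memory usage: 242.0+ bytes` signals a lower bound: object columns are counted by pointer size only. The sketch below uses a small illustrative frame (`df`, `shallow`, `deep`, and `numeric_cols` are names chosen here) to show how `memory_usage(deep=True)` and `select_dtypes` can complement `dtypes` and `info()`:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [0.1, 0.2, 0.3],
        "C": pd.Series(["foo"] * 3, dtype="object"),
        "G": pd.Series([1] * 3, dtype="int8"),
    }
)

# Shallow usage counts only the pointers stored in an object column.
shallow = df.memory_usage()
# deep=True also measures the Python objects those pointers refer to.
deep = df.memory_usage(deep=True)

# Pick columns by dtype family instead of reading dtypes by eye.
numeric_cols = df.select_dtypes(include="number").columns.tolist()

print(shallow["C"], deep["C"])
print(numeric_cols)
```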


## Type conversion

- Convert types with `df.astype("target_type")`.
- Specify the type at creation time with the `dtype` parameter.
- Write a custom function to perform the conversion.
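As a minimal sketch of the second bullet, `dtype` can be passed when the object is constructed. The names `close_df` and `volume` are illustrative. Note that `dtype` on `pd.DataFrame` applies to every column, so a per-column type is set on the `Series` instead:

```python
import pandas as pd

# dtype on the DataFrame constructor applies to all columns.
close_df = pd.DataFrame({"close": [1, 2, 3, 4, 5]}, dtype="float32")

# For a single column, set dtype on the Series.
volume = pd.Series([10, 20, 30], dtype="int8")

print(close_df.dtypes)
print(volume.dtype)
```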

In [6]:
temp_df = pd.DataFrame({"close" : [1, 2, 3, 4, 5]})
temp_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   close   5 non-null      int64
dtypes: int64(1)
memory usage: 168.0 bytes


In [8]:
temp_df = temp_df.astype("float32")
temp_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   close   5 non-null      float32
dtypes: float32(1)
memory usage: 148.0 bytes
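`astype` also accepts a dict that maps column names to target types, which leaves the other columns untouched. A small sketch with a hypothetical two-column frame (`prices` and `converted` are illustrative names):

```python
import pandas as pd

prices = pd.DataFrame({"close": [1, 2, 3], "volume": [100, 200, 300]})

# Only "close" is converted; "volume" keeps its original integer dtype.
converted = prices.astype({"close": "float32"})

print(converted.dtypes)
```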


In [10]:
temp_df = pd.DataFrame({"close" : [1, "app", 3, 4, 5]})
temp_df = temp_df.astype("float32")
temp_df.info()

ValueError: could not convert string to float: 'app'

In [13]:
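def my_convert_func(value) -> float:
    """Convert one cell to float, mapping any string to the placeholder 123."""
    if isinstance(value, str):
        # Equivalent to value.replace(value, '123'): the whole string becomes '123'
        new_value = '123'
    else:
        new_value = value

    return np.float64(new_value)

# Apply the converter element by element; the column becomes float64
temp_df['close'] = temp_df['close'].apply(my_convert_func)
# info() prints its summary and returns None, so print() also emits "None"
print(temp_df.info())
print(temp_df)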


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   close   5 non-null      float64
dtypes: float64(1)
memory usage: 168.0 bytes
None
   close
0    1.0
1  123.0
2    3.0
3    4.0
4    5.0
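When a missing value is acceptable in place of a placeholder such as 123, `pd.to_numeric` with `errors="coerce"` may be a simpler alternative to a custom function: values that cannot be parsed become `NaN`. A minimal sketch (`raw` is an illustrative name):

```python
import pandas as pd

raw = pd.DataFrame({"close": [1, "app", 3, 4, 5]})

# Unparseable entries such as "app" are turned into NaN instead of raising.
raw["close"] = pd.to_numeric(raw["close"], errors="coerce")

# The column is now numeric, so a further astype succeeds.
raw["close"] = raw["close"].astype("float32")

print(raw)
```

Whether `NaN` or a sentinel value is the better choice depends on how the downstream code treats missing data.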


## AKShare example

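- Fetch price and volume data with `stock_zh_a_hist()`.
- The returned column names are in Chinese: 日期 (date), 开盘 (open), 收盘 (close), 最高 (high), 最低 (low), 成交量 (volume), 成交额 (turnover amount), 振幅 (amplitude), 涨跌幅 (percent change), 涨跌额 (price change), 换手率 (turnover rate).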

In [4]:
import akshare as ak
import pandas as pd

stock_zh_a_hist_df = ak.stock_zh_a_hist()
stock_zh_a_hist_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7633 entries, 0 to 7632
Data columns (total 11 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   日期      7633 non-null   object 
 1   开盘      7633 non-null   float64
 2   收盘      7633 non-null   float64
 3   最高      7633 non-null   float64
 4   最低      7633 non-null   float64
 5   成交量     7633 non-null   int64  
 6   成交额     7633 non-null   float64
 7   振幅      7633 non-null   float64
 8   涨跌幅     7633 non-null   float64
 9   涨跌额     7633 non-null   float64
 10  换手率     7633 non-null   float64
dtypes: float64(9), int64(1), object(1)
memory usage: 656.1+ KB


In [None]:
stock_zh_a_hist_df['日期'] = pd.to_datetime(stock_zh_a_hist_df['日期'])
stock_zh_a_hist_df.info()
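Once the date column holds datetimes, the `.dt` accessor and date-based slicing become available. The sketch below uses a hand-made stand-in for the AKShare result so that it runs without network access. The frame `sample` and its values are made up for illustration; only the column names 日期 and 收盘 are taken from the output above.

```python
import pandas as pd

# Hand-made stand-in for the AKShare result (values are illustrative only).
sample = pd.DataFrame(
    {
        "日期": ["2023-03-20", "2023-03-21", "2023-03-22"],
        "收盘": [12.8, 12.9, 12.7],
    }
)
sample["日期"] = pd.to_datetime(sample["日期"])

# The .dt accessor exposes date components such as the weekday name.
weekdays = sample["日期"].dt.day_name().tolist()

# With a DatetimeIndex, rows can be sliced by date strings.
by_date = sample.set_index("日期")
subset = by_date.loc["2023-03-21":]

print(weekdays)
print(subset)
```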