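One morning nine years ago, the rising sun and the sound of students reading aloud were sweeping away the surrounding chill. At the free-throw line behind the teaching building, I, a first-year high school student holding a ball I could barely dribble, made the first swish of my life. The moment I heard that crisp snap of the net, I knew that basketball, the sport I had once feared most, would reverse its role and become the love of my life, accompanying me through every season until the end of my days.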

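It has been more than a month since I joined the data science section of the WeChat public account “运筹OR帷幄”, and I kept wondering what my first original article should be about: a technical piece, or an analytical one? After going back and forth I realized I was overthinking it and tying my own hands. So I decided to let go and write whatever came to mind. That is how the original title, “Shot Data Analysis”, turned into the one you see now.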

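Recently, remarks by Rockets general manager Daryl Morey, followed by further comments from commissioner Adam Silver, pushed the NBA into the center of a storm of public opinion. When the clouds over the relationship between the NBA and China will clear is still unknown, but I hope, and believe, that clear skies will come.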

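The United States is the birthplace of basketball, and its professional league still represents the highest level of the game worldwide. The NBA's official statistics site, https://stats.nba.com, offers not only a large amount of aggregated data but also a large amount of raw data. There you can find detailed personal information for every player, a detailed record of every game, and text and video records of every shot by every player. Put the CBA's statistics site next to the NBA's and the gap is immediately obvious.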

# 1 Getting the data

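I found the address (URL) of the player shot data page in the article “<a href="https://mp.weixin.qq.com/s/Qevx7ijb-ymn1YGpBw51Sw">NBA Player Shot Data Visualization</a>” from the WeChat public account “法纳斯特”. The scheme, host, and path of that URL are https://stats.nba.com/stats/shotchartdetail? , and its query takes 19 parameters, including the season type (SeasonType) and the player ID (PlayerID). If you are wondering what scheme, host, path, and query mean, read the Jianshu post “<a href="https://www.jianshu.com/p/406d19dfabd3">Quickly Understand the Structure of a URL</a>” or the HTTP section of my own post “<a href="https://www.longzf.com/CS_intro/5/">Introduction to Computer Science (5): Computer Networks</a>”.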
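As a small illustration of those four URL parts, here is a sketch using Python's standard library. The two query parameters are only a sample of the 19 the endpoint expects, and the player ID 51 is borrowed from the output further down purely as an example value.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

# Base endpoint from the article; only two of the 19 query parameters are
# included here, just to show where the query part sits.
base = "https://stats.nba.com/stats/shotchartdetail"
query = urlencode({"SeasonType": "Regular Season", "PlayerID": 51})
full_url = base + "?" + query

parts = urlsplit(full_url)
print(parts.scheme)           # https
print(parts.netloc)           # stats.nba.com
print(parts.path)             # /stats/shotchartdetail
print(parse_qs(parts.query))  # the query decoded back into a dict of lists
```

No request is sent here; the snippet only builds the URL string and takes it apart again.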

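Searching GitHub for NBA, I found a project called nba_py. In its documentation I found the URL for retrieving ID information for all players. Its scheme, host, and path are https://stats.nba.com/stats/commonallplayers? , and its query takes 3 parameters: LeagueID, Season, and IsOnlyCurrentSeason. The documentation lists many more URLs but does not say what data each one returns; I will look into them when I have time.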

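With that, the route to the data is in place. Whenever data acquisition comes up I get worked up and cannot help complaining about our statistics education. In my view, the first course in an undergraduate statistics program should be data acquisition, not mathematical analysis and advanced algebra. Mathematics certainly matters, but without having seen the complex and varied data of the real world, and without knowing how to obtain it, the study of statistics is a river without a source, a tree without roots. It is hard to imagine that someone who spends all day immersed in formulas and rarely touches real data will come up with many new ideas for data analysis.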

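Crawling this dataset involves more than a thousand network requests. The first request fetches the player ID information, and each later request fetches one player's regular-season shot data. Every player has a different page address, so there are as many requests as there are players. Note that, for ease of demonstration, the Python code below requests only 5 players' pages; to request all players, change playerIDList[0:5] to playerIDList.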

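On my personal computer the full crawl took nearly 5 hours. The resulting dataset has more than 4 million rows, and the file is nearly 900 MB. I have uploaded this data and the player ID data to Baidu Netdisk; the link is https://pan.baidu.com/s/1DJJLZWDZdvgxFXJwi0QUfg&shfl=shareset and the extraction code is 4kcf.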

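To speed up the crawl I tried coroutines (with the asyncio and aiohttp libraries) and multiprocessing (with the multiprocessing library). Unfortunately, with coroutines the server forcibly closed the connection, and with multiprocessing a large number of pages failed with connection timeouts. If you know how to use multiprocessing or coroutines to crawl this data more efficiently, please leave a comment or contact me directly (WeChat ID: xiaozhou13171317). Thank you very much. Five hours is really, really too slow!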
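One direction that might be worth trying, offered as an untested suggestion rather than a known fix: instead of firing many requests at once, retry each failed request after a growing pause. The helper name fetch_with_retry, the delays, and the stand-in flaky_fetch function are all assumptions made for this sketch; whether the server tolerates this pattern is not something the article confirms.

```python
import time

def fetch_with_retry(fetch, player_id, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(player_id); on failure wait base_delay * 2**attempt and retry."""
    for attempt in range(retries + 1):
        try:
            return fetch(player_id)
        except Exception:
            if attempt == retries:
                raise  # out of retries: let the caller record the failure
            sleep(base_delay * 2 ** attempt)

# Stand-in for the real request so the sketch runs offline:
# it fails twice with a simulated timeout, then succeeds.
calls = {"n": 0}

def flaky_fetch(player_id):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated timeout")
    return {"player_id": player_id, "rows": []}

waits = []  # record the requested pauses instead of actually sleeping
result = fetch_with_retry(flaky_fetch, 51, sleep=waits.append)
print(result, waits)
```

In a real crawl, fetch would wrap the requests.get call used below, and the default sleep would pause for real.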

In [17]:
import os

import pandas as pd
import requests


# Fetch player ID information
url = "https://stats.nba.com/stats/commonallplayers?"
params = {
    "LeagueID": "00",
    "Season": "2019",
    "IsOnlyCurrentSeason": 0
}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
           + 'AppleWebKit/537.36 (KHTML, like Gecko) '
           + 'Chrome/77.0.3865.90 Safari/537.36'}
try:
    idInfo = (requests.get(url, params=params, headers=headers)
              .json()["resultSets"][0])
except Exception as e:
    print("\nError: failed to fetch player ID information; "
          "check your network connection and restart the program!")
    exit()
else:
    print("\nSuccess: player ID information fetched\n")
    idInfo = pd.DataFrame(idInfo["rowSet"], columns=idInfo["headers"])
    playerIDList = idInfo["PERSON_ID"].tolist()


# Fetch regular-season shot data for each player
shotFrames, errorList, emptyList = [], [], []
# To request every player's page, change playerIDList[0:5] to playerIDList
for i, playerID in enumerate(playerIDList[0:5]):
    url = 'https://stats.nba.com/stats/shotchartdetail?'
    params = {
        "SeasonType": "Regular Season",
        "TeamID": 0,
        "PlayerID": playerID,
        "PlayerPosition": '',
        "GameID": '',
        "Outcome": '',
        "Location": '',
        "Month": 0,
        "SeasonSegment": '',
        "DateFrom": '',
        "DateTo": '',
        "OpponentTeamID": 0,
        "VsConference": '',
        "VsDivision": '',
        "RookieYear": '',
        "GameSegment": '',
        "Period": 0,
        "LastNGames": 0,
        "ContextMeasure": "FGA",
    }
    try:
        shotDFSec = (requests.get(url, params=params,
                     headers=headers).json()["resultSets"][0])
    except Exception as e:
        errorList.append(playerID)
        print('Error: failed to fetch data for player {0} (ID:{1})'.format(
              i + 1, playerID))
    else:
        print('Success: fetched data for player {0} (ID:{1})'.format(
              i + 1, playerID))
        if shotDFSec["rowSet"] != []:
            shotFrames.append(pd.DataFrame(shotDFSec["rowSet"],
                                           columns=shotDFSec["headers"]))
        else:
            emptyList.append(playerID)
            print('Warning: no data for player {0} (ID:{1})'.format(
                  i + 1, playerID))

# DataFrame.append was removed in pandas 2.0, so concatenate the pieces once
shotDF = pd.concat(shotFrames) if shotFrames else pd.DataFrame()

if emptyList != []:
    print('Warning: no data for the following player IDs\n{0}\n'.format(emptyList))

if errorList != []:
    print('Error: failed to fetch data for the following player IDs\n{0}\n'.format(errorList))


# Save the data to an external file
shotDF.to_csv('shotInfo.csv')
print('Data written to external file:', os.path.join(os.getcwd(), 'shotInfo.csv'))


Success: player ID information fetched

Success: fetched data for player 1 (ID:76001)
Warning: no data for player 1 (ID:76001)
Success: fetched data for player 2 (ID:76002)
Warning: no data for player 2 (ID:76002)
Success: fetched data for player 3 (ID:76003)
Warning: no data for player 3 (ID:76003)
Success: fetched data for player 4 (ID:51)
Success: fetched data for player 5 (ID:1505)
Warning: no data for the following player IDs
[76001, 76002, 76003]

Data written to external file: F:\shotInfo.csv


# 2 Data overview

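Many thanks to Rudolf Höhn, a data scientist at Unit8, for his article "From Pandas-wan to Pandas-master" on the blogging platform Medium. To process this dataset I used the convert_df function he defines in that article; after applying it to the shotDF data frame, memory consumption dropped from 3643 MB to 134 MB. Look closely and convert_df does just one thing: when the number of distinct values in a column is less than 50% of the number of rows, it converts that column to the category type. As old Laozi's Tao Te Ching puts it: all things begin simply; the great Way is simple, and complexity grows out of it.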

In [16]:
import pandas as pd

# Show all rows and columns when displaying a data frame
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Function that converts data frame column types
def convert_df(df: pd.DataFrame, deep_copy: bool = True) -> pd.DataFrame:
    """Automatically converts columns that are worth stored as
    ``categorical`` dtype.
    Parameters
    ----------
    df: pd.DataFrame
        Data frame to convert.
    deep_copy: bool
        Whether or not to perform a deep copy of the original data frame.
    Returns
    -------
    pd.DataFrame
        Optimized copy of the input data frame.
    """
    return df.copy(deep=deep_copy).astype({
        col: 'category' for col in df.columns
        if df[col].nunique() / df[col].shape[0] < 0.5})


# Read the external file, convert column types, and reduce memory use
shotDF = (pd.read_csv('F:/web_crawler_results/NBA/shotInfo.csv')
            .iloc[:, 1:]
            .pipe(convert_df))
print('DataFrame shape:', shotDF.shape)

# Show the first 5 rows
shotDF.head()

DataFrame shape: (4463258, 24)


Unnamed: 0,GRID_TYPE,GAME_ID,GAME_EVENT_ID,PLAYER_ID,PLAYER_NAME,TEAM_ID,TEAM_NAME,PERIOD,MINUTES_REMAINING,SECONDS_REMAINING,EVENT_TYPE,ACTION_TYPE,SHOT_TYPE,SHOT_ZONE_BASIC,SHOT_ZONE_AREA,SHOT_ZONE_RANGE,SHOT_DISTANCE,LOC_X,LOC_Y,SHOT_ATTEMPTED_FLAG,SHOT_MADE_FLAG,GAME_DATE,HTM,VTM
0,Shot Chart Detail,20000054,369,51,Mahmoud Abdul-Rauf,1610612763,Vancouver Grizzlies,3,0,38,Missed Shot,Jump Shot,2PT Field Goal,Mid-Range,Center(C),16-24 ft.,20,64,200,1,0,20001106,VAN,ATL
1,Shot Chart Detail,20000143,131,51,Mahmoud Abdul-Rauf,1610612763,Vancouver Grizzlies,2,9,22,Made Shot,Jump Shot,2PT Field Goal,In The Paint (Non-RA),Center(C),8-16 ft.,9,1,97,1,1,20001118,VAN,DAL
2,Shot Chart Detail,20000174,313,51,Mahmoud Abdul-Rauf,1610612763,Vancouver Grizzlies,3,6,42,Missed Shot,Jump Shot,2PT Field Goal,Mid-Range,Right Side(R),16-24 ft.,18,163,82,1,0,20001124,DET,VAN
3,Shot Chart Detail,20000174,352,51,Mahmoud Abdul-Rauf,1610612763,Vancouver Grizzlies,3,2,42,Missed Shot,Jump Shot,2PT Field Goal,Mid-Range,Left Side Center(LC),16-24 ft.,16,-111,127,1,0,20001124,DET,VAN
4,Shot Chart Detail,20000174,360,51,Mahmoud Abdul-Rauf,1610612763,Vancouver Grizzlies,3,2,18,Missed Shot,Jump Shot,2PT Field Goal,Mid-Range,Right Side(R),8-16 ft.,15,150,49,1,0,20001124,DET,VAN
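To get a feel for why the category conversion saves so much memory, here is a minimal sketch on synthetic data. The frame, its size, and the random sampling are assumptions for illustration only; the three team names are taken from the output above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a low-cardinality column such as TEAM_NAME.
rng = np.random.default_rng(0)
teams = ["Vancouver Grizzlies", "Dallas Mavericks", "Houston Rockets"]
demo = pd.DataFrame({"TEAM_NAME": rng.choice(teams, size=100_000)})

# Distinct values divided by row count: the quantity convert_df compares with 0.5.
ratio = demo["TEAM_NAME"].nunique() / len(demo)

before = demo.memory_usage(deep=True).sum()
after = demo.astype({"TEAM_NAME": "category"}).memory_usage(deep=True).sum()
print(ratio, before, after)
```

A category column stores each distinct string once plus a small integer code per row, which is why columns with few distinct values shrink the most.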


In [119]:
def exam_col_value(df, col):
    """Summarize one column: position, name, distinct values, rows with nulls."""
    # Accept either a positional index or a column name
    if isinstance(col, int):
        colName = df.columns[col]
        colIndex = col
    else:
        colName = col
        colIndex = df.columns.tolist().index(col)

    dfCol = df[colName]
    values = dfCol.drop_duplicates().sort_values().tolist()
    valuesCount = len(values)
    nullMark = dfCol.isnull()

    # Row labels of the missing entries, or None when the column is complete
    if nullMark.any():
        nullIndex = dfCol[nullMark].index.tolist()
    else:
        nullIndex = None

    examResult = {
        'col_index': colIndex,
        'col_name': colName,
        'values_count': valuesCount,
        'values': values,
        'null_index': nullIndex
    }

    return examResult

examDfResultL = []
for col in shotDF.columns:
    examDfResultL.append(exam_col_value(shotDF, col))

pd.DataFrame(examDfResultL).to_csv('shotDF_exam_result.csv')

### The ACTION_TYPE column has atypical missing values: they are recorded as No Shot rather than left empty


In [26]:
shotDF[shotDF.ACTION_TYPE=='No Shot']

Unnamed: 0,GRID_TYPE,GAME_ID,GAME_EVENT_ID,PLAYER_ID,PLAYER_NAME,TEAM_ID,TEAM_NAME,PERIOD,MINUTES_REMAINING,SECONDS_REMAINING,EVENT_TYPE,ACTION_TYPE,SHOT_TYPE,SHOT_ZONE_BASIC,SHOT_ZONE_AREA,SHOT_ZONE_RANGE,SHOT_DISTANCE,LOC_X,LOC_Y,SHOT_ATTEMPTED_FLAG,SHOT_MADE_FLAG,GAME_DATE,HTM,VTM
11188,Shot Chart Detail,29700014,462,949,Shareef Abdur-Rahim,1610612763,Vancouver Grizzlies,4,3,55,Missed Shot,No Shot,2PT Field Goal,Restricted Area,Center(C),Less Than 8 ft.,0,0,0,1,0,19971031,VAN,DAL
78619,Shot Chart Detail,21500105,11,2754,Tony Allen,1610612763,Memphis Grizzlies,1,10,45,Missed Shot,No Shot,2PT Field Goal,Restricted Area,Center(C),Less Than 8 ft.,0,2,3,1,0,20151109,LAC,MEM
91101,Shot Chart Detail,21601141,174,202329,Al-Farouq Aminu,1610612757,Portland Trail Blazers,2,7,40,Made Shot,No Shot,2PT Field Goal,Restricted Area,Center(C),Less Than 8 ft.,0,-4,-1,1,1,20170401,POR,PHX
106876,Shot Chart Detail,21601191,185,1626147,Justin Anderson,1610612755,Philadelphia 76ers,2,4,46,Missed Shot,No Shot,2PT Field Goal,Restricted Area,Center(C),Less Than 8 ft.,0,-9,2,1,0,20170408,PHI,MIL
106877,Shot Chart Detail,21601191,285,1626147,Justin Anderson,1610612755,Philadelphia 76ers,3,9,4,Made Shot,No Shot,2PT Field Goal,Restricted Area,Center(C),Less Than 8 ft.,0,0,1,1,1,20170408,PHI,MIL
123152,Shot Chart Detail,20000256,216,1000,Shandon Anderson,1610612745,Houston Rockets,2,0,20,Made Shot,No Shot,2PT Field Goal,Restricted Area,Center(C),Less Than 8 ft.,0,0,0,1,1,20001205,HOU,DAL
131681,Shot Chart Detail,21601191,39,203507,Giannis Antetokounmpo,1610612749,Milwaukee Bucks,1,7,3,Made Shot,No Shot,2PT Field Goal,Restricted Area,Center(C),Less Than 8 ft.,0,0,1,1,1,20170408,PHI,MIL
176526,Shot Chart Detail,21601125,267,2772,Trevor Ariza,1610612745,Houston Rockets,2,1,21,Made Shot,No Shot,2PT Field Goal,Restricted Area,Center(C),Less Than 8 ft.,0,2,7,1,1,20170330,POR,HOU
264726,Shot Chart Detail,21500105,431,2440,Matt Barnes,1610612763,Memphis Grizzlies,4,10,17,Made Shot,No Shot,2PT Field Goal,Restricted Area,Center(C),Less Than 8 ft.,2,-25,8,1,1,20151109,LAC,MEM
311602,Shot Chart Detail,21500104,173,203382,Aron Baynes,1610612765,Detroit Pistons,2,6,10,Missed Shot,No Shot,2PT Field Goal,Restricted Area,Center(C),Less Than 8 ft.,0,3,0,1,0,20151109,GSW,DET
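If the No Shot marker is to be treated as a genuine missing value, one option is to mask it out so that isnull() and dropna() can see it. This is a sketch on a four-row toy frame, not the author's procedure; on the real category-typed column the same idea should apply, but that is untested here.

```python
import pandas as pd

demo = pd.DataFrame({
    "ACTION_TYPE": ["Jump Shot", "No Shot", "Layup Shot", "No Shot"],
    "SHOT_MADE_FLAG": [0, 1, 1, 0],
})

# mask() replaces entries where the condition holds with NaN.
demo["ACTION_TYPE"] = demo["ACTION_TYPE"].mask(demo["ACTION_TYPE"] == "No Shot")
n_missing = int(demo["ACTION_TYPE"].isnull().sum())
print(n_missing)
```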


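This detailed shot dataset has 24 variables and 4,463,258 rows. One small regret: it covers only the last 20-odd years, because the NBA's official statistics site records detailed shot data only from the 96-97 season onward. The 24 variables mean the following: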

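* GRID_TYPE: what the data represents; always Shot Chart Detail here
* GAME_ID: identifies the game; each game has its own ID
* GAME_EVENT_ID: identifies an event within a game; each shot attempt in a game has its own ID
* PLAYER_ID: identifies the player; each player has their own ID
* PLAYER_NAME: player name
* TEAM_ID: identifies the team; each team has its own ID
* TEAM_NAME: team name
* PERIOD: the period in which the shot was taken; 1, 2, 3, or 4 in regulation, with the first overtime recorded as 5, the second as 6, and so on
* MINUTES_REMAINING: the minutes part of the time left in the period when the shot was taken
* SECONDS_REMAINING: the seconds part of the time left in the period when the shot was taken
* EVENT_TYPE: whether the shot went in; Made Shot if it did, Missed Shot otherwise
* ACTION_TYPE: type of shot, 9 major categories and 70 subcategories in total
* SHOT_TYPE: whether the attempt is a two-point or three-point field goal (2PT Field Goal or 3PT Field Goal)
* SHOT_ZONE_BASIC: basic shot zone
* SHOT_ZONE_AREA: specific shot area
* SHOT_ZONE_RANGE: shot distance range
* SHOT_DISTANCE: shot distance
* LOC_X: X coordinate of the shot location
* LOC_Y: Y coordinate of the shot location
* SHOT_ATTEMPTED_FLAG: whether a shot was attempted; always 1 here
* SHOT_MADE_FLAG: whether the shot went in; 1 if it did, 0 otherwise
* GAME_DATE: date of the game
* HTM: home team abbreviation
* VTM: visiting team abbreviation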

### Inspect the values of GAME_ID

In [44]:
import numpy as np
gameID = shotDF.GAME_ID.tolist()
np.array(set(list(map(lambda x: int(str(x)[-4:]), gameID))))

array({1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 2

### Use the structure of GAME_ID to check whether each season's games are complete

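The games are complete. From the 96-97 season through the 03-04 season the NBA had only 29 teams, each playing 82 regular-season games; since every game involves two teams, the normal total is $29\times 82/2=29\times 41=1189$ games. The 98-99 season was cut to 50 regular-season games per team by the labor dispute, for only 725 games in total. From the 04-05 season on there are 30 teams and the normal total is 1230 games. The 11-12 season was cut to 66 games per team by the labor dispute, for only 990 games in total, and in the 12-13 season one game was canceled because of the Boston bombing, leaving 1229 games.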
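The arithmetic behind those totals can be checked in a few lines. The function name expected_games is made up for this sketch; the team counts and schedule lengths are the ones quoted above.

```python
def expected_games(teams, games_per_team):
    # Every game is shared by two teams, so halve the team-game total.
    return teams * games_per_team // 2

print(expected_games(29, 82))      # 1189: 96-97 through 03-04
print(expected_games(29, 50))      # 725:  98-99, shortened season
print(expected_games(30, 82))      # 1230: 04-05 onward
print(expected_games(30, 66))      # 990:  11-12, shortened season
print(expected_games(30, 82) - 1)  # 1229: 12-13, one game canceled
```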

In [52]:
gameID = shotDF.GAME_ID.unique().tolist()
year = list(map(lambda x: str(x)[1:3], gameID))
gameNum = list(map(lambda x: int(str(x)[-4:]), gameID))
df_yn = pd.DataFrame(np.array([year, gameNum]).T, columns=['season','game_num'])
df_yn.groupby('season').count()

season,game_num
0,1189
1,1189
2,1189
3,1189
4,1230
5,1230
6,1230
7,1230
8,1230
9,1230


The game missing from the 12-13 season is that season's game number 1214, the Boston Celtics against the Indiana Pacers; Wikipedia has a record of it.

In [79]:
a = df_yn[df_yn.season=='12'].game_num.tolist()
b = sorted(list(map(int,a)))
c = np.array(b[1:])-np.array(b[:-1])
np.where(c!=1)
print(b[1212],b[1213])

1213 1215
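The same gap search can be written as a small reusable helper. This is a sketch on a toy list that mimics one season's game numbers with 1214 left out; the function name missing_numbers is an assumption, not part of the original notebook.

```python
def missing_numbers(sorted_nums):
    """Return the integers skipped between consecutive entries of a sorted list."""
    missing = []
    for prev, cur in zip(sorted_nums, sorted_nums[1:]):
        missing.extend(range(prev + 1, cur))
    return missing

# Toy stand-in for one season's game numbers, with 1214 left out.
game_nums = list(range(1, 1214)) + list(range(1215, 1231))
print(len(game_nums), missing_numbers(game_nums))
```

Unlike indexing into the differences by hand, this reports every gap, including gaps wider than one game.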


### Inspect the values of GAME_EVENT_ID

In [78]:
shotDF[shotDF.GAME_ID == 20000143].sort_values(by='GAME_EVENT_ID')

Unnamed: 0,GRID_TYPE,GAME_ID,GAME_EVENT_ID,PLAYER_ID,PLAYER_NAME,TEAM_ID,TEAM_NAME,PERIOD,MINUTES_REMAINING,SECONDS_REMAINING,EVENT_TYPE,ACTION_TYPE,SHOT_TYPE,SHOT_ZONE_BASIC,SHOT_ZONE_AREA,SHOT_ZONE_RANGE,SHOT_DISTANCE,LOC_X,LOC_Y,SHOT_ATTEMPTED_FLAG,SHOT_MADE_FLAG,GAME_DATE,HTM,VTM
363028,Shot Chart Detail,20000143,2,1710,Mike Bibby,1610612763,Vancouver Grizzlies,1,11,46,Missed Shot,Jump Shot,2PT Field Goal,In The Paint (Non-RA),Center(C),Less Than 8 ft.,4,-42,13,1,0,20001118,VAN,DAL
982894,Shot Chart Detail,20000143,4,93,Hubert Davis,1610612742,Dallas Mavericks,1,11,40,Made Shot,Layup Shot,2PT Field Goal,Restricted Area,Center(C),Less Than 8 ft.,0,7,-2,1,1,20001118,VAN,DAL
1053527,Shot Chart Detail,20000143,5,1722,Michael Dickerson,1610612763,Vancouver Grizzlies,1,11,21,Missed Shot,Jump Shot,2PT Field Goal,Mid-Range,Right Side Center(RC),16-24 ft.,21,142,158,1,0,20001118,VAN,DAL
1269228,Shot Chart Detail,20000143,7,714,Michael Finley,1610612742,Dallas Mavericks,1,11,15,Missed Shot,Jump Shot,2PT Field Goal,Mid-Range,Right Side(R),8-16 ft.,12,129,13,1,0,20001118,VAN,DAL
1689200,Shot Chart Detail,20000143,9,970,Othella Harrington,1610612763,Vancouver Grizzlies,1,11,5,Made Shot,Layup Shot,2PT Field Goal,Restricted Area,Center(C),Less Than 8 ft.,1,-4,13,1,1,20001118,VAN,DAL
2956975,Shot Chart Detail,20000143,13,959,Steve Nash,1610612742,Dallas Mavericks,1,10,43,Missed Shot,Jump Shot,2PT Field Goal,In The Paint (Non-RA),Center(C),Less Than 8 ft.,6,-12,63,1,0,20001118,VAN,DAL
982895,Shot Chart Detail,20000143,16,93,Hubert Davis,1610612742,Dallas Mavericks,1,10,14,Made Shot,Jump Shot,3PT Field Goal,Above the Break 3,Left Side Center(LC),24+ ft.,27,-189,198,1,1,20001118,VAN,DAL
3405417,Shot Chart Detail,20000143,17,735,Bryant Reeves,1610612763,Vancouver Grizzlies,1,9,55,Missed Shot,Jump Shot,2PT Field Goal,Restricted Area,Center(C),Less Than 8 ft.,3,-33,7,1,0,20001118,VAN,DAL
1689201,Shot Chart Detail,20000143,19,970,Othella Harrington,1610612763,Vancouver Grizzlies,1,9,52,Made Shot,Jump Shot,2PT Field Goal,Mid-Range,Left Side(L),8-16 ft.,14,-105,97,1,1,20001118,VAN,DAL
1269229,Shot Chart Detail,20000143,20,714,Michael Finley,1610612742,Dallas Mavericks,1,9,41,Missed Shot,Jump Shot,3PT Field Goal,Right Corner 3,Right Side(R),24+ ft.,23,235,19,1,0,20001118,VAN,DAL


In [85]:
np.array(shotDF.GAME_EVENT_ID.unique().tolist())

array([ 369,  131,  313,  352,  360,  433,   86,   97,  107,  380,  432,
        445,  268,  282,  301,  310,  318,  161,  214,  225,  231,  399,
        411,  508,  521,  316,  320,  331,  371,  373,  382,  448,  469,
        486,  128,  133,  149,  343,  444,  111,  137,  159,  166,  367,
        389,  394,  400,  420,  454,  358,  363,  375,  406,  421,  431,
        440,  463,  487,  119,  122,  163,  335,  321,  346,  356,  390,
        147,  126,  134,  151,  101,  109,  273,  278,  302,  319,   99,
        106,  132,  105,  125,  140,  350,  372,  381,  402,  139,  157,
        178,  127,  129,  336,  413,   74,   84,   92,  306,  328,  379,
         94,  100,  264,  274,  284,  295,  117,  123,  407,  422,  446,
         88,  113,  386,   72,   77,  114,  345,  392,  130,  142,  164,
        376,  428,  354,  366,   89,  124,  339,  355,  110,  169,  174,
        341,  353,  368,  398,  304,  308,  311,  324,  330,  322,  325,
        370,  145,  148,  154,  200,  202,  290,  3

### Inspect the values of PLAYER_ID and look for a pattern

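Sorted by ID, no pattern is apparent. The values look arbitrary and range from 1-digit to 7-digit integers (for example 2 and 1626147).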

In [83]:
idInfo = pd.read_csv('F:/web_crawler_results/NBA/idInfo.csv', index_col=0)
idInfo.sort_values(by='PERSON_ID')

Unnamed: 0,PERSON_ID,DISPLAY_LAST_COMMA_FIRST,DISPLAY_FIRST_LAST,ROSTERSTATUS,FROM_YEAR,TO_YEAR,PLAYERCODE,TEAM_ID,TEAM_CITY,TEAM_NAME,TEAM_ABBREVIATION,TEAM_CODE,GAMES_PLAYED_FLAG,OTHERLEAGUE_EXPERIENCE_CH
3606,2,"Scott, Byron",Byron Scott,0,1983,1996,byron_scott,0,,,,,Y,0
2438,3,"Long, Grant",Grant Long,0,1988,2002,grant_long,0,,,,,Y,0
3581,7,"Schayes, Dan",Dan Schayes,0,1981,1998,dan_schayes,0,,,,,Y,0
4015,9,"Threatt, Sedale",Sedale Threatt,0,1983,1996,sedale_threatt,0,,,,,Y,0
2212,12,"King, Chris",Chris King,0,1993,1998,chris_king,0,,,,,Y,0
3209,15,"Piatkowski, Eric",Eric Piatkowski,0,1994,2007,eric_piatkowski,0,,,,,Y,0
1062,17,"Drexler, Clyde",Clyde Drexler,0,1983,1997,clyde_drexler,0,,,,,Y,0
111,21,"Anthony, Greg",Greg Anthony,0,1991,2001,greg_anthony,0,,,,,Y,0
3792,22,"Smits, Rik",Rik Smits,0,1988,1999,rik_smits,0,,,,,Y,0
3467,23,"Rodman, Dennis",Dennis Rodman,0,1986,1999,dennis_rodman,0,,,,,Y,0


### Inspect PLAYER_NAME and check whether one name corresponds to more than one ID

There are 2114 players in total but only 2103 distinct names, so 11 names are each shared by two players.

In [86]:
# The columns are category-typed; missing values are not counted by nunique()
print(shotDF.PLAYER_ID.nunique())
print(shotDF.PLAYER_NAME.nunique())

2114
2103


In [9]:
import numpy as np

df2 = (shotDF[['PLAYER_ID', 'PLAYER_NAME']]
.drop_duplicates()
.sort_values(by='PLAYER_NAME')
.assign(dup=lambda df: np.append(np.array(df.PLAYER_NAME[1:])==np.array(df.PLAYER_NAME[:-1]),False))
)

df2[df2.dup==True]

Unnamed: 0,PLAYER_ID,PLAYER_NAME,dup
3669903,1520,Charles Smith,True
2169012,203187,Chris Johnson,True
4412100,202874,Chris Wright,True
546060,200793,Dee Brown,True
3412482,779,Glen Rice,True
4344701,200766,Marcus Williams,True
2090924,2229,Mike James,True
1228975,121,Patrick Ewing,True
4365985,199,Reggie Williams,True
3713020,120,Steven Smith,True
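An alternative way to list shared names is to count distinct IDs per name with groupby. The toy frame below uses made-up IDs purely for illustration; it is a sketch of the idea, not a rerun on the real data.

```python
import pandas as pd

# Toy data; the IDs are invented for illustration, not taken from the dataset.
demo = pd.DataFrame({
    "PLAYER_ID":   [1, 1, 2, 3, 4],
    "PLAYER_NAME": ["Dee Brown", "Dee Brown", "Dee Brown", "Steve Nash", "Mike Bibby"],
})

# Number of distinct IDs behind each name; names with more than one are shared.
ids_per_name = demo.groupby("PLAYER_NAME")["PLAYER_ID"].nunique()
shared = ids_per_name[ids_per_name > 1]
print(shared)
```

On category-typed columns like those in shotDF, passing observed=True to groupby is likely needed to avoid rows for unused categories.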


### Understanding TEAM_ID and TEAM_NAME

In [53]:
exam_value(shotDF, 'TEAM_ID')


Data frame column: TEAM_ID
Number of distinct values: 30
Values:
[1610612737 1610612738 1610612739 1610612740 1610612741 1610612742
 1610612743 1610612744 1610612745 1610612746 1610612747 1610612748
 1610612749 1610612750 1610612751 1610612752 1610612753 1610612754
 1610612755 1610612756 1610612757 1610612758 1610612759 1610612760
 1610612761 1610612762 1610612763 1610612764 1610612765 1610612766]


In [31]:
shotDF.TEAM_NAME.unique()

[Vancouver Grizzlies, Sacramento Kings, Denver Nuggets, Dallas Mavericks, Orlando Magic, ..., Boston Celtics, Seattle SuperSonics, Washington Bullets, New Orleans/Oklahoma City Hornets, Los Angeles Lakers]
Length: 38
Categories (38, object): [Vancouver Grizzlies, Sacramento Kings, Denver Nuggets, Dallas Mavericks, ..., Seattle SuperSonics, Washington Bullets, New Orleans/Oklahoma City Hornets, Los Angeles Lakers]

In [15]:
shotDF[['TEAM_ID', 'TEAM_NAME']].drop_duplicates().sort_values(by='TEAM_ID')

Unnamed: 0,TEAM_ID,TEAM_NAME
4449,1610612737,Atlanta Hawks
51042,1610612738,Boston Celtics
21362,1610612739,Cleveland Cavaliers
48833,1610612740,New Orleans Hornets
29059,1610612740,New Orleans Pelicans
94603,1610612740,New Orleans/Oklahoma City Hornets
49583,1610612741,Chicago Bulls
1699,1610612742,Dallas Mavericks
1443,1610612743,Denver Nuggets
21398,1610612744,Golden State Warriors


### Check that PERIOD, MINUTES_REMAINING, and SECONDS_REMAINING are plausible

In [38]:
np.sort(np.array(shotDF.PERIOD.drop_duplicates()))

array([1, 2, 3, 4, 5, 6, 7, 8], dtype=int64)

In [27]:
np.sort(np.array(shotDF.MINUTES_REMAINING.drop_duplicates()))

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12], dtype=int64)

In [26]:
np.sort(np.array(shotDF.SECONDS_REMAINING.drop_duplicates()))

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59], dtype=int64)
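These three columns can be combined into a single game clock. The sketch below assumes the standard NBA format of 12-minute quarters and 5-minute overtime periods; the function name seconds_elapsed is hypothetical, and the first sample call uses the PERIOD, MINUTES_REMAINING, and SECONDS_REMAINING values from the first row shown earlier.

```python
def seconds_elapsed(period, minutes_remaining, seconds_remaining):
    """Seconds since tip-off, assuming 12-minute quarters and 5-minute overtimes."""
    if period <= 4:
        period_start = (period - 1) * 12 * 60
        period_len = 12 * 60
    else:
        period_start = 4 * 12 * 60 + (period - 5) * 5 * 60
        period_len = 5 * 60
    return period_start + period_len - (minutes_remaining * 60 + seconds_remaining)

print(seconds_elapsed(3, 0, 38))  # 2122: 38 seconds before the end of the third quarter
print(seconds_elapsed(5, 5, 0))   # 2880: the start of the first overtime
```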

In [18]:
import numpy as np
nadict = {}
for col in shotDF.columns:
    shotDFcol = shotDF[col]
    if any(shotDFcol.isnull()):
        indexList = shotDFcol[shotDFcol.isnull()].index.tolist()
        nadict[col] = indexList
        print('Column {0} has missing values at row indexes {1}'.format(col, indexList))

nadict
#shotDF.at[2870012, 'SHOT_TYPE'] = '2PT Field Goal'

Column PLAYER_NAME has missing values at row indexes [805770, 3656898, 3656899, 3656900, 3656901]
Column SHOT_TYPE has missing values at row indexes [2870012]


{'PLAYER_NAME': [805770, 3656898, 3656899, 3656900, 3656901],
 'SHOT_TYPE': [2870012]}

In [30]:
idInfo = shotDF[['PLAYER_ID', 'PLAYER_NAME']].drop_duplicates().dropna(how='any')
for i in nadict['PLAYER_NAME']:
    # Take the single matching name as a scalar; assigning the whole Series to one cell fails
    shotDF.at[i, 'PLAYER_NAME'] = idInfo[idInfo.PLAYER_ID==shotDF.at[i, 'PLAYER_ID']].PLAYER_NAME.iloc[0]

Unnamed: 0,PLAYER_ID,PLAYER_NAME
0,51,Mahmoud Abdul-Rauf
1443,1505,Tariq Abdul-Wahad
3169,949,Shareef Abdur-Rahim
14685,203518,Alex Abrines
15473,101165,Alex Acker
15565,203112,Quincy Acy
16878,200801,Hassan Adams
17026,1629121,Jaylen Adams
17136,203919,Jordan Adams
17228,203500,Steven Adams


In [17]:
playerID_na = shotDF.iloc[[805770, 3656898, 3656899, 3656900, 3656901], 3].unique().tolist()
playerID_na
for i in playerID_na:
    print(shotDF[shotDF.PLAYER_ID==i]['PLAYER_NAME'].unique())

[Bimbo Coles, NaN]
Categories (1, object): [Bimbo Coles]
[Lionel Simmons, NaN]
Categories (1, object): [Lionel Simmons]
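Since each of those IDs has exactly one known name, the gaps can also be filled in one vectorized step from an ID-to-name lookup. The frame below is a toy with placeholder IDs; this is a sketch of the approach on plain string columns, and the category-typed columns in shotDF may need converting first.

```python
import numpy as np
import pandas as pd

# Toy rows; the IDs are placeholders, not the real PLAYER_ID values.
demo = pd.DataFrame({
    "PLAYER_ID":   [10, 10, 20, 20],
    "PLAYER_NAME": ["Bimbo Coles", np.nan, np.nan, "Lionel Simmons"],
})

# Build a PLAYER_ID -> PLAYER_NAME lookup from the rows that do have a name.
id_to_name = (demo.dropna(subset=["PLAYER_NAME"])
                  .drop_duplicates("PLAYER_ID")
                  .set_index("PLAYER_ID")["PLAYER_NAME"])

# Fill only the missing names, looking each one up by its ID.
demo["PLAYER_NAME"] = demo["PLAYER_NAME"].fillna(demo["PLAYER_ID"].map(id_to_name))
print(demo)
```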


In [9]:
shotDF.PERIOD.unique()

[3, 2, 4, 1, 5, 6, 7, 8]
Categories (8, int64): [3, 2, 4, 1, 5, 6, 7, 8]

In [16]:
list(shotDF.ACTION_TYPE.unique())

['Jump Shot',
 'Running Jump Shot',
 'Layup Shot',
 'Turnaround Jump Shot',
 'Driving Layup Shot',
 'Tip Shot',
 'Slam Dunk Shot',
 'Dunk Shot',
 'Driving Dunk Shot',
 'Finger Roll Shot',
 'Hook Shot',
 'Reverse Layup Shot',
 'Jump Hook Shot',
 'Fadeaway Jump Shot',
 'Alley Oop Layup shot',
 'Alley Oop Dunk Shot',
 'Driving Finger Roll Shot',
 'Running Dunk Shot',
 'Turnaround Hook Shot',
 'Running Layup Shot',
 'Reverse Dunk Shot',
 'Driving Hook Shot',
 'Running Hook Shot',
 'Follow Up Dunk Shot',
 'Jump Bank Shot',
 'Running Finger Roll Shot',
 'Turnaround Fadeaway shot',
 'Step Back Jump shot',
 'No Shot',
 'Floating Jump shot',
 'Pullup Jump shot',
 'Driving Floating Jump Shot',
 'Tip Layup Shot',
 'Cutting Layup Shot',
 'Driving Finger Roll Layup Shot',
 'Driving Reverse Layup Shot',
 'Pullup Bank shot',
 'Running Pull-Up Jump Shot',
 'Putback Layup Shot',
 'Cutting Dunk Shot',
 'Running Reverse Layup Shot',
 'Driving Floating Bank Jump Shot',
 'Putback Dunk Shot',
 'Driving Slam

In [11]:
shotDF.SHOT_TYPE.unique()

[2PT Field Goal, 3PT Field Goal, NaN]
Categories (2, object): [2PT Field Goal, 3PT Field Goal]

In [12]:
shotDF.SHOT_ZONE_AREA.unique()

[Center(C), Right Side(R), Left Side Center(LC), Right Side Center(RC), Left Side(L), Back Court(BC)]
Categories (6, object): [Center(C), Right Side(R), Left Side Center(LC), Right Side Center(RC), Left Side(L), Back Court(BC)]

In [13]:
shotDF.SHOT_ZONE_RANGE.unique()

[16-24 ft., 8-16 ft., Less Than 8 ft., 24+ ft., Back Court Shot]
Categories (5, object): [16-24 ft., 8-16 ft., Less Than 8 ft., 24+ ft., Back Court Shot]

In [14]:
shotDF.SHOT_DISTANCE.unique()

[20, 9, 18, 16, 15, ..., 84, 72, 87, 89, 88]
Length: 90
Categories (90, int64): [20, 9, 18, 16, ..., 72, 87, 89, 88]

In [15]:
shotDF.HTM.unique()

[VAN, DET, NYK, PHX, DEN, ..., NOK, CHA, OKC, NOP, BKN]
Length: 36
Categories (36, object): [VAN, DET, NYK, PHX, ..., CHA, OKC, NOP, BKN]

In [16]:
shotDF.VTM.unique()

[ATL, DAL, VAN, HOU, GSW, ..., CHA, NOK, OKC, BKN, NOP]
Length: 36
Categories (36, object): [ATL, DAL, VAN, HOU, ..., NOK, OKC, BKN, NOP]

In [18]:
shotDF.LOC_X.unique()

[64, 1, 163, -111, 150, ..., -248, -250, -247, -249, 249]
Length: 501
Categories (501, int64): [64, 1, 163, -111, ..., -250, -247, -249, 249]

In [17]:
shotDF.LOC_Y.unique()

[200, 97, 82, 127, 49, ..., 860, 881, 862, 849, 871]
Length: 917
Categories (917, int64): [200, 97, 82, 127, ..., 881, 862, 849, 871]
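The sample rows shown earlier suggest how LOC_X, LOC_Y, and SHOT_DISTANCE relate: the coordinates appear to be in tenths of a foot measured from the basket, with the distance truncated to whole feet. That reading is an inference from the displayed rows, not something the NBA site documents here, so treat the sketch below as a consistency check only.

```python
import math

def distance_ft(loc_x, loc_y):
    # Assumption: coordinates are tenths of a foot, measured from the basket.
    return int(math.hypot(loc_x, loc_y) / 10)

# (LOC_X, LOC_Y, SHOT_DISTANCE) copied from the first five rows of shotDF.head().
sample = [(64, 200, 20), (1, 97, 9), (163, 82, 18), (-111, 127, 16), (150, 49, 15)]
matches = [distance_ft(x, y) == d for x, y, d in sample]
print(matches)
```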

# 3 Data analysis

## 3.1 How NBA players' shot selection has changed

In [19]:
shotDF.GAME_DATE.unique()

[20001106, 20001118, 20001124, 20001127, 20001211, ..., 20150212, 20061225, 19961225, 20031102, 19971231]
Length: 3646
Categories (3646, int64): [20001106, 20001118, 20001124, 20001127, ..., 20061225, 19961225, 20031102, 19971231]

In [20]:
shotDF.SHOT_ZONE_BASIC.unique()

[Mid-Range, In The Paint (Non-RA), Restricted Area, Right Corner 3, Above the Break 3, Left Corner 3, Backcourt]
Categories (7, object): [Mid-Range, In The Paint (Non-RA), Restricted Area, Right Corner 3, Above the Break 3, Left Corner 3, Backcourt]

In [24]:
def proc_date(x):
    strx = str(x)
    year, month = strx[0:4], strx[4:6]
    if int(month) < 9:
        return str(int(year) - 1) + '-' + year[2:]
    else:
        return year + '-' + str(int(year) + 1)[2:]

def proc_zone(x):
    if x == 'Mid-Range':
        return '2 PT (Mid-Range)'
    elif x == 'In The Paint (Non-RA)' or x == 'Restricted Area':
        return '2 PT (Paint)'
    else:
        return '3 PT'

shotDF = (shotDF.assign(SEASON=lambda df: df.GAME_DATE.apply(proc_date),
                        SHOT_ZONE=lambda df: df.SHOT_ZONE_BASIC.apply(proc_zone)
                       )
                .pipe(convert_df))
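The season rule in proc_date (games before September belong to the season that started the previous year) can also be expressed without a row-by-row apply. The following is a sketch of an equivalent vectorized version on four sample dates; the variable names are made up, and it has not been timed against the full dataset.

```python
import pandas as pd

# Dates in the same YYYYMMDD integer form as GAME_DATE.
dates = pd.Series([20001106, 19970401, 20151109, 19991231])

year = dates // 10000
month = dates // 100 % 100
# Starting year of the season: the game's year from September on, else the year before.
start = year.where(month >= 9, year - 1)
season = start.astype(str) + "-" + ((start + 1) % 100).astype(str).str.zfill(2)
print(season.tolist())
```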

In [25]:
shotDF.dtypes

GRID_TYPE              category
GAME_ID                category
GAME_EVENT_ID          category
PLAYER_ID              category
PLAYER_NAME            category
TEAM_ID                category
TEAM_NAME              category
PERIOD                 category
MINUTES_REMAINING      category
SECONDS_REMAINING      category
EVENT_TYPE             category
ACTION_TYPE            category
SHOT_TYPE              category
SHOT_ZONE_BASIC        category
SHOT_ZONE_AREA         category
SHOT_ZONE_RANGE        category
SHOT_DISTANCE          category
LOC_X                  category
LOC_Y                  category
SHOT_ATTEMPTED_FLAG    category
SHOT_MADE_FLAG         category
GAME_DATE              category
HTM                    category
VTM                    category
SEASON                 category
SHOT_ZONE              category
dtype: object

In [26]:
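# shotDF2 is not defined in the cells shown, so shotDF is used here;
# normalize='index' turns each season's counts into shares that sum to 1
shotDF2_crosstab = pd.crosstab(shotDF.SEASON, shotDF.SHOT_ZONE, normalize='index')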
shotDF2_crosstab

SHOT_ZONE,2 PT (Mid-Range),2 PT (Paint),3 PT
SEASON,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1996-97,0.399253,0.492887,0.10786
1997-98,0.368495,0.473673,0.157833
1998-99,0.370688,0.46138,0.167932
1999-00,0.382379,0.451204,0.166417
2000-01,0.379246,0.450846,0.169908
2001-02,0.371222,0.44762,0.181158
2002-03,0.369363,0.449245,0.181392
2003-04,0.356459,0.456753,0.186788
2004-05,0.354376,0.449946,0.195678
2005-06,0.344422,0.453513,0.202065


In [27]:
import pyecharts.options as opts
from pyecharts.charts import Line

# Season labels such as '96-97', and the three share series rounded to 4 decimals
datax = list(map(lambda x: str(x)[2:7], shotDF2_crosstab.index.tolist()))
datay = np.round(shotDF2_crosstab.values.T, 4)

c = (
     Line()
     .add_xaxis(datax)
     .add_yaxis("2 PT (mid-range)", datay[0].tolist())
     .add_yaxis("2 PT (paint)", datay[1].tolist())
     .add_yaxis("3 PT", datay[2].tolist())
     .set_series_opts(
        label_opts=opts.LabelOpts(is_show=False)
     )
     .set_global_opts(
         title_opts=opts.TitleOpts(title="Changes in NBA players' shot selection"),
         xaxis_opts=opts.AxisOpts(
                                  axistick_opts=opts.AxisTickOpts(is_align_with_label=True),
                                  axislabel_opts=opts.LabelOpts(rotate=45, font_size=12, margin=14)
         ),
     )
)

c.render_notebook()

In [1]:
from matplotlib import pyplot as plt
from matplotlib.patches import Arc, Circle, Rectangle, Polygon
import numpy as np

In [2]:
def Arc_fill(center, radius, theta1, theta2, resolution=50, **kwargs):
    # generate the points
    theta = np.linspace(np.radians(theta1), np.radians(theta2), resolution)
    points = np.vstack((radius*np.cos(theta) + center[0], 
                        radius*np.sin(theta) + center[1]))
    # build the polygon and add it to the axes
    poly = Polygon(points.T, closed=True, **kwargs)
    return poly

def draw_ball_field(color='#003370', lw=2):
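    # Create a figure of size (5.36, 5.06) inches
    plt.figure(figsize=(5.36, 5.06), frameon=False)
    # Get the current Axes object to draw on, and turn its frame off
    # (recent matplotlib versions no longer accept keyword arguments in gca)
    ax = plt.gca()
    ax.set_frame_on(False)
    # Set the axis limits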
    ax.set_xlim(-268, 268)
    ax.set_ylim(440.5, -65.5)
    # Remove the axis ticks
    ax.set_xticks([])
    ax.set_yticks([])
    # Add an annotation
    # plt.annotate('By xiao F', xy=(100, 160), xytext=(178, 418))
    # Fill the court with a background color
    lines_outer_rec = Rectangle(xy=(-268, -65.5), width=536, height=506,
                                color='#f1f1f1', fill=True, zorder=0)
    # Put the court fill on the bottom layer
    # lines_outer_rec.set_zorder(0)
    # Add the rectangle to ax
    ax.add_patch(lines_outer_rec)
    # Draw the rim, radius 7.5
    circle_ball = Circle(xy=(0, 0), radius=7.5, linewidth=lw, color=color,
                         fill=False, zorder=4)
    # Add the circle to ax
    ax.add_patch(circle_ball)
    # Draw the restricted area
    restricted_arc = Arc(xy=(0, 0), width=80, height=80, theta1=0,
                         theta2=180, linewidth=lw, color=color, 
                         fill=False, zorder=4)
    ax.add_patch(restricted_arc)
    # Draw the backboard, size (60, 1)
    plate = Rectangle(xy=(-30, -7.5), width=60, height=-1, linewidth=lw,
                      color=color, fill=False, zorder=4)
    # Add the rectangle to ax
    ax.add_patch(plate)
    # Draw the outline of the two-point lane (the paint), size (160, 190)
    outer_rec_fill = Rectangle(xy=(-80, -47.5), width=160, height=190,
                               linewidth=lw, color="#fefefe", fill=True, zorder=2)
    outer_rec = Rectangle(xy=(-80, -47.5), width=160, height=190,
                          linewidth=lw, color=color, fill=False, zorder=4)
    # Add the rectangles to ax
    ax.add_patch(outer_rec_fill)
    ax.add_patch(outer_rec)
    # Draw the lane space marks where players line up for free throws
    lane_space_left1 = Rectangle(xy=(-90, 20.5), width=10, height=0,
                                 linewidth=lw, color=color,
                                 fill=False, zorder=4)
    lane_space_left2 = Rectangle(xy=(-90, 30.5), width=10, height=0,
                                 linewidth=lw, color=color,
                                 fill=False, zorder=4)
    lane_space_left3 = Rectangle(xy=(-90, 60.5), width=10, height=0,
                                 linewidth=lw, color=color,
                                 fill=False, zorder=4)
    lane_space_left4 = Rectangle(xy=(-90, 90.5), width=10, height=0,
                                 linewidth=lw, color=color,
                                 fill=False, zorder=4)
    lane_space_right1 = Rectangle(xy=(80, 20.5), width=10, height=0,
                                 linewidth=lw, color=color,
                                 fill=False, zorder=4)
    lane_space_right2 = Rectangle(xy=(80, 30.5), width=10, height=0,
                                 linewidth=lw, color=color,
                                 fill=False, zorder=4)
    lane_space_right3 = Rectangle(xy=(80, 60.5), width=10, height=0,
                                 linewidth=lw, color=color,
                                 fill=False, zorder=4)
    lane_space_right4 = Rectangle(xy=(80, 90.5), width=10, height=0,
                                 linewidth=lw, color=color,
                                 fill=False, zorder=4)
    ax.add_patch(lane_space_left1)
    ax.add_patch(lane_space_left2)
    ax.add_patch(lane_space_left3)
    ax.add_patch(lane_space_left4)
    ax.add_patch(lane_space_right1)
    ax.add_patch(lane_space_right2)
    ax.add_patch(lane_space_right3)
    ax.add_patch(lane_space_right4)
    # Draw the free-throw circle, radius 60: solid top half, dashed bottom half
    circle_punish1 = Arc(xy=(0, 142.5), width=120, height=120, theta1=0,
                         theta2=180, linewidth=lw, color=color,
                         fill=False, zorder=4)
    circle_punish2 = Arc(xy=(0, 142.5), width=120, height=120, theta1=180,
                         theta2=360, linewidth=lw, linestyle='--',
                         color=color, fill=False, zorder=4)
    # Unused alternative: a single solid circle instead of the two arcs
    # circle_punish = Circle(xy=(0, 142.5), radius=60, linewidth=lw,
    #                       color=color, fill=False)
    # Add both arcs to the axes
    ax.add_patch(circle_punish1)
    ax.add_patch(circle_punish2)
    # Draw the hash marks of the lower defensive box
    hash_marks_left1 = Rectangle(xy=(-110, -47.5), width=0, height=5,
                                 linewidth=lw, color=color,
                                 fill=False, zorder=4)
    hash_marks_right1 = Rectangle(xy=(110, -47.5), width=0, height=5,
                                  linewidth=lw, color=color,
                                  fill=False, zorder=4)
    hash_marks_left2 = Rectangle(xy=(-50, 82.5), width=5, height=0,
                                 linewidth=lw, color=color,
                                 fill=False, zorder=4)
    hash_marks_right2 = Rectangle(xy=(45, 82.5), width=5, height=0,
                                  linewidth=lw, color=color,
                                  fill=False, zorder=4)
    ax.add_patch(hash_marks_left1)
    ax.add_patch(hash_marks_right1)
    ax.add_patch(hash_marks_left2)
    ax.add_patch(hash_marks_right2)
    # Draw the left corner three-point line; the grey rectangle shades the
    # area between the two corner lines
    three_left_rec_fill = Rectangle(xy=(-220, -47.5), width=440, height=140,
                                    ec="#dfdfdf", fc="#dfdfdf",
                                    fill=True, zorder=1)
    three_left_rec = Rectangle(xy=(-220, -47.5), width=0, height=140,
                               linewidth=lw, color=color, fill=False, zorder=4)
    # Add the fill and the line to the axes
    ax.add_patch(three_left_rec_fill)
    ax.add_patch(three_left_rec)
    # Draw the right corner three-point line
    three_right_rec = Rectangle(xy=(220, -47.5), width=0, height=140,
                                linewidth=lw, color=color,
                                fill=False, zorder=4)
    # Add the rectangle to the axes
    ax.add_patch(three_right_rec)
    # Draw the three-point arc: centre (0, 0), radius 238.66, from 22.8 to
    # 157.2 degrees
    three_arc_fill = Arc_fill(center=(0, 0), radius=238.66, theta1=22.8,
                              theta2=157.2, resolution=50, linewidth=0,
                              ec="#dfdfdf", fc="#dfdfdf", fill=True, zorder=1)
    three_arc = Arc(xy=(0, 0), width=477.32, height=477.32, theta1=22.8,
                    theta2=157.2, linewidth=lw, color=color,
                    fill=False, zorder=4)
    # Add the fill and the arc to the axes
    ax.add_patch(three_arc_fill)
    ax.add_patch(three_arc)
    # Draw the midcourt area markers
    midcourt_area_marker_left = Rectangle(xy=(-250, 232.5), width=30, height=0,
                                          color=color, linewidth=lw,
                                          fill=False, zorder=4)
    midcourt_area_marker_right = Rectangle(xy=(220, 232.5), width=30, height=0,
                                           color=color, linewidth=lw,
                                           fill=False, zorder=4)
    ax.add_patch(midcourt_area_marker_left)
    ax.add_patch(midcourt_area_marker_right)
    # Draw the outer half circle at center court, radius 60
    center_outer_arc = Arc(xy=(0, 422.5), width=120, height=120, theta1=180,
                           theta2=0, linewidth=lw, color=color,
                           fill=False, zorder=4)
    # Add the arc to the axes
    ax.add_patch(center_outer_arc)
    # Draw the inner half circle at center court, radius 20
    center_inner_arc = Arc(xy=(0, 422.5), width=40, height=40, theta1=180,
                           theta2=0, linewidth=lw, color=color,
                           fill=False, zorder=4)
    # Add the arc to the axes
    ax.add_patch(center_inner_arc)
    # Draw the outer boundary of the half court, size (500, 470)
    lines_outer_rec = Rectangle(xy=(-250, -47.5), width=500, height=470,
                                linewidth=lw, color=color,
                                fill=False, zorder=4)
    # Add the rectangle to the axes
    ax.add_patch(lines_outer_rec)
    return ax
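

# ---------------------------------------------------------------------------
# Illustrative sketch, not part of the original drawing function.
# The three-point arc above uses radius 238.66 and the angles 22.8 / 157.2.
# Those numbers appear to follow from where the corner lines end:
# x = +/-220 and y = -47.5 + 140 = 92.5, with the hoop at the origin.
# The helper name `three_point_arc_params` is hypothetical; it only
# re-derives the constants so that the arc meets the corner lines exactly.
# ---------------------------------------------------------------------------
import math


def three_point_arc_params(corner_x=220.0, corner_top_y=92.5):
    """Return (radius, theta1, theta2) in court units and degrees for an arc
    centred on (0, 0) that passes through the top ends of both corner lines."""
    # Distance from the hoop to the top end of a corner line
    radius = math.hypot(corner_x, corner_top_y)
    # Angle of the right-hand end point, measured from the positive x axis
    theta1 = math.degrees(math.atan2(corner_top_y, corner_x))
    # The left-hand end point is the mirror image about the y axis
    theta2 = 180.0 - theta1
    return radius, theta1, theta2


radius, theta1, theta2 = three_point_arc_params()
print(round(radius, 2), round(theta1, 1), round(theta2, 1))
# prints: 238.66 22.8 157.2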