# college-football: a college football scraping Python library
This is a quick demonstration of my project, **college-football**. It is a work in progress, but currently supports gathering data about individual college football players.

This project draws upon data provided by [Sports Reference LLC](https://www.sports-reference.com). It uses the **requests** and **BeautifulSoup** libraries to query the website and **pandas** to create data tables.

Future work will involve structuring the repository properly for package distribution through pip, as well as creating more robust error checking and correction.

In [1]:
import college_football as cfb

First, we will create a player - let's pick Ian Book (go Irish).

In [2]:
p = cfb.Player('Ia', 'Boo', 'Notre Dam', '2017-2020')

Searching for Ia Boo (Notre Dam, 2017-2020)...
Name changed to Ian Book.
Team changed to Notre Dame, years changed to 2016-2020.
Found Ian Book (Notre Dame, 2016-2020).


Oops! We typed in some fields wrong (actually all fields). **college-football** is flexible and can utilize Sports Reference's search function to find the best fit for us and correct our parameters.
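The correction above comes from Sports Reference's own search. Purely as a local analogy, and not the mechanism the library uses, the standard-library `difflib` module can illustrate how a partial input might be matched against known candidates. The candidate list and the `best_match` helper here are hypothetical.

```python
import difflib

def best_match(query, candidates, cutoff=0.6):
    """Return the candidate most similar to `query`, or None if nothing is close."""
    matches = difflib.get_close_matches(query, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Hypothetical candidate list; the real library asks Sports Reference instead.
schools = ["Notre Dame", "Kent State", "Ohio State", "Georgia"]
corrected = best_match("Notre Dam", schools)
```

A purely local approach like this only works when the candidate list is known in advance, which is presumably one reason to lean on the site's search instead.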

Now that we've got our player object, we can retrieve some data! Let's get some passing summary information for Ian.

In [3]:
p.get_passing_summary()

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,Passing,Passing,Passing,Passing,Passing,Passing,Passing,Passing,Passing
Unnamed: 0_level_1,Year,School,Conf,Class,Pos,G,Cmp,Att,Pct,Yds,Y/A,AY/A,TD,Int,Rate
0,*2017,Notre Dame,Ind,SO,QB,10.0,46,75,61.3,456,6.1,4.7,4,4,119.3
1,*2018,Notre Dame,Ind,JR,QB,10.0,214,314,68.2,2628,8.4,8.6,19,7,154.0
2,*2019,Notre Dame,Ind,SR,QB,13.0,240,399,60.2,3034,7.6,8.6,34,6,149.1
3,*2020,Notre Dame,ACC,SR,QB,12.0,228,353,64.6,2830,8.0,8.5,15,3,144.3
4,Career,Notre Dame,0,0,0,0.0,728,1141,63.8,8948,7.8,8.3,72,20,147.0
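
The `Unnamed: N_level_0` labels in this output are the placeholders pandas generates for blank cells in a two-row header. If flat column names are easier to work with, one option is to collapse each (group, name) pair into a single string. This is a minimal sketch: `flatten_columns` is a hypothetical helper, not part of **college-football**, and it works on plain tuples so it runs without pandas installed.

```python
def flatten_columns(columns):
    """Collapse (group, name) header pairs into single strings.

    Placeholder groups such as 'Unnamed: 1_level_0' are dropped, so
    ('Unnamed: 1_level_0', 'Year') becomes 'Year' and
    ('Passing', 'Yds') becomes 'Passing_Yds'.
    """
    flat = []
    for group, name in columns:
        if str(group).startswith("Unnamed"):
            flat.append(str(name))
        else:
            flat.append(f"{group}_{name}")
    return flat

# A few header pairs copied from the passing summary above.
header = [
    ("Unnamed: 1_level_0", "Year"),
    ("Unnamed: 2_level_0", "School"),
    ("Passing", "Cmp"),
    ("Passing", "Yds"),
]
flat_header = flatten_columns(header)
```

Iterating over a pandas `MultiIndex` yields tuples, so on a DataFrame with exactly two header levels something like `df.columns = flatten_columns(df.columns)` should apply the same idea.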


**college-football** currently supports all possible summary tables given by Sports Reference. These are listed below:
- **Passing**: `get_passing_summary()`
- **Rushing/Receiving**: `get_rushing_receiving_summary()`
- **Punting/Kicking**: `get_punting_kicking_summary()`
- **Returns**: `get_return_summary()`
- **Defense**: `get_defense_summary()`
- **Scoring**: `get_scoring_summary()`

Attempting to access a table with no records raises an exception. Ian does not have any kickoff or punt returns, so calling `get_return_summary()` on him fails.

In [4]:
p.get_return_summary()

Exception: No return stats available for Ian Book.
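
Because a missing table raises instead of returning an empty frame, a script that wants every available summary may need to catch the exception method by method. The sketch below accepts any object exposing the six `get_*_summary()` methods listed above. `collect_summaries` and the stand-in `FakePlayer` are hypothetical and exist only so the example runs without network access.

```python
SUMMARY_METHODS = [
    "get_passing_summary",
    "get_rushing_receiving_summary",
    "get_punting_kicking_summary",
    "get_return_summary",
    "get_defense_summary",
    "get_scoring_summary",
]

def collect_summaries(player):
    """Call each summary method, keeping results and recording failures."""
    tables, missing = {}, {}
    for name in SUMMARY_METHODS:
        try:
            tables[name] = getattr(player, name)()
        except Exception as exc:  # the demo output shows a plain Exception
            missing[name] = str(exc)
    return tables, missing

class FakePlayer:
    """Stand-in for cfb.Player: every summary works except returns."""
    def __getattr__(self, name):
        if name == "get_return_summary":
            def fail():
                raise Exception("No return stats available for Ian Book.")
            return fail
        return lambda: f"<table from {name}>"

tables, missing = collect_summaries(FakePlayer())
```

With a real player object, `tables` would hold the DataFrames that exist and `missing` would explain which ones were skipped.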

In order to show all of the tables at once, we need a special player. Julian Edelman was an incredibly versatile quarterback at Kent State and has entries in each of these tables.

In [5]:
p2 = cfb.Player('Julian', 'Edelman', 'Kent State', '2006-2008')

Searching for Julian Edelman (Kent State, 2006-2008)...
Found Julian Edelman (Kent State, 2006-2008).


In [6]:
p2.get_rushing_receiving_summary()

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,Rushing,Rushing,Rushing,Rushing,Receiving,Receiving,Receiving,Receiving,Scrimmage,Scrimmage,Scrimmage,Scrimmage
Unnamed: 0_level_1,Year,School,Conf,Class,Pos,G,Att,Yds,Avg,TD,Rec,Yds,Avg,TD,Plays,Yds,Avg,TD
0,2006,Kent State,MAC,SO,QB,11.0,169,658,3.9,7,0.0,0.0,0.0,0.0,169,658,3.9,7
1,2007,Kent State,MAC,JR,QB,8.0,118,455,3.9,2,0.0,0.0,0.0,0.0,118,455,3.9,2
2,2008,Kent State,MAC,SR,QB,12.0,215,1370,6.4,13,1.0,11.0,11.0,0.0,216,1381,6.4,13
3,Career,Kent State,0,0,0,0.0,502,2483,4.9,22,1.0,11.0,11.0,0.0,503,2494,5.0,22


In [7]:
p2.get_punting_kicking_summary()

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,Punting,Punting,Punting,Kicking,Kicking,Kicking,Kicking,Kicking,Kicking,Kicking
Unnamed: 0_level_1,Year,School,Conf,Class,Pos,G,Punts,Yds,Avg,XPM,XPA,XP%,FGM,FGA,FG%,Pts
0,2006,Kent State,MAC,SO,QB,11.0,4.0,147.0,36.8,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2007,Kent State,MAC,JR,QB,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2008,Kent State,MAC,SR,QB,12.0,4.0,157.0,39.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Career,Kent State,0,0,0,0.0,8.0,304.0,38.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [8]:
p2.get_return_summary()

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,Punt Ret,Punt Ret,Punt Ret,Punt Ret,Kick Ret,Kick Ret,Kick Ret,Kick Ret
Unnamed: 0_level_1,Year,School,Conf,Class,Pos,G,Ret,Yds,Avg,TD,Ret,Yds,Avg,TD
0,2006,Kent State,MAC,SO,QB,11.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2007,Kent State,MAC,JR,QB,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2008,Kent State,MAC,SR,QB,12.0,6.0,25.0,4.2,0.0,0.0,0.0,0.0,0.0
3,Career,Kent State,0,0,0,0.0,6.0,25.0,4.2,0.0,0.0,0.0,0.0,0.0


In [9]:
p2.get_defense_summary()

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,Tackles,Tackles,Tackles,Tackles,Tackles,Def Int,Def Int,Def Int,Def Int,Def Int,Fumbles,Fumbles,Fumbles,Fumbles
Unnamed: 0_level_1,Year,School,Conf,Class,Pos,G,Solo,Ast,Tot,Loss,Sk,Int,Yds,Avg,TD,PD,FR,Yds,TD,FF
0,2006,Kent State,MAC,SO,QB,11.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2007,Kent State,MAC,JR,QB,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2008,Kent State,MAC,SR,QB,12.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Career,Kent State,0,0,0,0.0,2.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [10]:
p2.get_scoring_summary()

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,Touchdowns,Touchdowns,Touchdowns,Touchdowns,Touchdowns,Touchdowns,Touchdowns,Touchdowns,Kicking,Kicking,Unnamed: 16_level_0,Unnamed: 17_level_0,Unnamed: 18_level_0
Unnamed: 0_level_1,Year,School,Conf,Class,Pos,G,Rush,Rec,Int,FR,PR,KR,Oth,Tot,XPM,FGM,2PM,Sfty,Pts
0,2006,Kent State,MAC,SO,QB,11.0,7,0.0,0.0,0.0,0.0,0.0,0.0,7,0.0,0.0,0.0,0.0,42
1,2007,Kent State,MAC,JR,QB,8.0,2,0.0,0.0,0.0,0.0,0.0,0.0,2,0.0,0.0,0.0,0.0,12
2,2008,Kent State,MAC,SR,QB,12.0,13,0.0,0.0,0.0,0.0,0.0,0.0,13,0.0,0.0,1.0,0.0,80
3,Career,Kent State,0,0,0,0.0,22,0.0,0.0,0.0,0.0,0.0,0.0,22,0.0,0.0,1.0,0.0,134


We can also view game logs and season splits for specific years. Let's try this with our jack of all trades, Mr. Edelman.

In [11]:
p2.get_game_logs(2008)

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,Passing,Passing,Passing,Passing,...,Punt Ret,Punt Ret,Tackles,Tackles,Tackles,Tackles,Tackles,Punting,Punting,Punting
Unnamed: 0_level_1,Rk,Date,School,Unnamed: 3_level_1,Opponent,Unnamed: 5_level_1,Cmp,Att,Pct,Yds,...,Avg,TD,Solo,Ast,Tot,Loss,Sk,Punts,Yds,Avg
0,1.0,2008-08-30,Kent State,0,Boston College,L,10,14,71.4,123,...,0.0,0,0.0,0.0,0.0,0.0,0.0,1,44,44.0
1,2.0,2008-09-06,Kent State,@,Iowa State,L,10,22,45.5,171,...,0.0,0,1.0,0.0,1.0,0.0,0.0,0,0,0.0
2,3.0,2008-09-13,Kent State,0,Delaware State,W,10,20,50.0,113,...,0.0,0,0.0,0.0,0.0,0.0,0.0,0,0,0.0
3,4.0,2008-09-20,Kent State,@,Louisiana,L,10,20,50.0,81,...,0.0,0,0.0,0.0,0.0,0.0,0.0,0,0,0.0
4,5.0,2008-09-27,Kent State,@,Ball State,L,13,24,54.2,177,...,0.0,0,0.0,0.0,0.0,0.0,0.0,0,0,0.0
5,6.0,2008-10-04,Kent State,0,Akron,L,17,31,54.8,157,...,0.0,0,0.0,0.0,0.0,0.0,0.0,1,38,38.0
6,7.0,2008-10-11,Kent State,0,Ohio,L,15,25,60.0,200,...,0.0,0,0.0,0.0,0.0,0.0,0.0,0,0,0.0
7,8.0,2008-10-25,Kent State,@,Miami (OH),W,10,17,58.8,107,...,0.0,0,0.0,0.0,0.0,0.0,0.0,0,0,0.0
8,9.0,2008-11-01,Kent State,@,Bowling Green State,L,16,32,50.0,219,...,0.0,0,0.0,0.0,0.0,0.0,0.0,0,0,0.0
9,10.0,2008-11-12,Kent State,0,Temple,W,18,26,69.2,232,...,4.0,0,0.0,0.0,0.0,0.0,0.0,0,0,0.0


In [12]:
p2.get_splits(2007)

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Passing,Passing,Passing,Passing,Passing,Passing,Passing,Rushing,Rushing,Rushing,Rushing,Punting,Punting,Punting
Unnamed: 0_level_1,Split,Value,Cmp,Att,Pct,Yds,TD,Int,Rate,Att,Yds,Avg,TD,Punts,Yds,Avg
0,Total,0,98,189,51.9,1318,7,9,113.1,118,455,3.9,2,1,24,24.0
1,0,0,Passing,Passing,Passing,Passing,Passing,Passing,Passing,Rushing,Rushing,Rushing,Rushing,Punting,Punting,Punting
2,Split,Value,Cmp,Att,Pct,Yds,TD,Int,Rate,Att,Yds,Avg,TD,Punts,Yds,Avg
3,Place,Home,45,79,57.0,655,4,3,135.7,43,183,4.3,1,0,0,0
4,0,Road,53,110,48.2,663,3,6,96.9,75,272,3.6,1,1,24,24.0
5,0,0,Passing,Passing,Passing,Passing,Passing,Passing,Passing,Rushing,Rushing,Rushing,Rushing,Punting,Punting,Punting
6,Split,Value,Cmp,Att,Pct,Yds,TD,Int,Rate,Att,Yds,Avg,TD,Punts,Yds,Avg
7,Result,Win,47,84,56.0,597,5,3,128.2,27,65,2.4,1,0,0,0
8,0,Loss,51,105,48.6,721,2,6,101.1,91,390,4.3,1,1,24,24.0
9,0,0,Passing,Passing,Passing,Passing,Passing,Passing,Passing,Rushing,Rushing,Rushing,Rushing,Punting,Punting,Punting
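
The splits output above repeats its two header lines inside the body (rows 1, 2, 5, 6, and 9), and the `Split` label appears only on the first row of each group. One way to tidy this is sketched below on plain lists of strings. It assumes the `0` in the first two columns is filler for a blank cell, which is how it appears in this output; `clean_splits` is a hypothetical helper, not part of the library.

```python
def clean_splits(rows, header):
    """Drop repeated header rows and carry each Split label down its group."""
    cleaned = []
    current_split = None
    for row in rows:
        split, value = row[0], row[1]
        # Repeated header lines: either the column names themselves, or the
        # group-label line, which has filler in both of the first two cells.
        if row == header or (split == "0" and value == "0"):
            continue
        if split != "0":
            current_split = split
        cleaned.append([current_split] + row[1:])
    return cleaned

# First few rows of the 2007 splits above, trimmed to four columns.
header = ["Split", "Value", "Cmp", "Att"]
rows = [
    ["Total", "0", "98", "189"],
    ["0", "0", "Passing", "Passing"],
    ["Split", "Value", "Cmp", "Att"],
    ["Place", "Home", "45", "79"],
    ["0", "Road", "53", "110"],
]
cleaned = clean_splits(rows, header)
```

After cleaning, the `Road` row carries the `Place` label from the row above it, and the embedded header lines are gone.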


If we try to use a year outside of the 2006-2008 seasons that he played in, **college-football** will raise an exception.

In [13]:
p2.get_splits(2009)

Exception: No splits available for 2009.
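
A script can sidestep this exception by checking the requested season against the player's years before asking for logs or splits. This is a minimal sketch, assuming the years string takes one of the two forms used in this demo: a single year (`'2012'`) or a hyphenated range (`'2006-2008'`). `parse_years` and `played_in` are hypothetical helpers.

```python
def parse_years(years):
    """Parse '2012' or '2006-2008' into an inclusive (first, last) pair."""
    parts = [int(p) for p in years.split("-")]
    if len(parts) == 1:
        return parts[0], parts[0]
    if len(parts) == 2 and parts[0] <= parts[1]:
        return parts[0], parts[1]
    raise ValueError(f"Unrecognized years string: {years!r}")

def played_in(years, season):
    """Return True if `season` falls inside the years string."""
    first, last = parse_years(years)
    return first <= season <= last

edelman_years = "2006-2008"
ok_2007 = played_in(edelman_years, 2007)
ok_2009 = played_in(edelman_years, 2009)
```

The check is only as good as the years string it is given, so it makes sense to use the corrected range printed by the search rather than the one originally typed.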

**college-football** is tailored for interactive use in **IPython** and in Jupyter notebooks like this one. These exceptions may be frustrating in traditional scripts, but ideally they guide the user toward the correct name, team, and years for their intended use.

I'm thinking of a player named John Smith. He played recently, so let's try this:

In [19]:
p3 = cfb.Player('John', 'Smith', 'Georgia', '2012')

Searching for John Smith (Georgia, 2012)...


Exception: Player not found. Please make sure that name, school, and years are correct.

I guess we got that wrong. Unfortunately, John Smith is a pretty ambiguous name, and neither the team nor the year matches anything from Sports Reference. What if I knew he played for San Jose State?

In [22]:
p3 = cfb.Player('John', 'Smith', 'San Jose State', '2012')

Searching for John Smith (San Jose State, 2012)...
Year does not match any results. Choosing match by name and team.
Years changed to 2005-2006.
Found John Smith (San Jose State, 2005-2006).


There are still some bugs to be worked out. Two notable issues occur in the following situations:
- Player page does not contain number/year icon to reset school name and years played (this comes up for less well-documented players from many years ago)
- Cannot access game logs and splits for years that a transferred player was on a previous team

The examples below trigger each of these issues in turn:

In [23]:
p4 = cfb.Player('John', 'Smith', 'Colgate', '1970')

Searching for John Smith (Colgate, 1970)...
Year does not match any results. Choosing match by name and team.


TypeError: 'NoneType' object is not subscriptable

Justin Fields played for Georgia in 2018 before transferring to Ohio State in 2019. While **college-football** can access the summary information from his career at both schools, it cannot currently retrieve logs and splits from his year at UGA. This is functionality I intend to add in future versions.

In [26]:
p5 = cfb.Player('Justin', 'Fields', 'Ohio State', '2018-2020')
p5.get_passing_summary()

Searching for Justin Fields (Ohio State, 2018-2020)...
Years changed to 2019-2020.
Found Justin Fields (Ohio State, 2019-2020).


Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,Passing,Passing,Passing,Passing,Passing,Passing,Passing,Passing,Passing
Unnamed: 0_level_1,Year,School,Conf,Class,Pos,G,Cmp,Att,Pct,Yds,Y/A,AY/A,TD,Int,Rate
0,*2018,Georgia,SEC,FR,QB,12.0,27,39,69.2,328,8.4,10.5,4,0,173.7
1,*2019,Ohio State,Big Ten,SO,QB,14.0,238,354,67.2,3273,9.2,11.2,41,3,181.4
2,*2020,Ohio State,Big Ten,JR,QB,8.0,158,225,70.2,2100,9.3,10.1,22,6,175.6
3,Career,Overall,0,0,0,0.0,423,618,68.4,5701,9.2,10.7,67,9,178.8
4,0,Georgia,0,0,0,0.0,27,39,69.2,328,8.4,10.5,4,0,173.7
5,0,Ohio State,0,0,0,0.0,396,579,68.4,5373,9.3,10.8,63,9,179.1


In [27]:
p5.get_game_logs(2019)

Unnamed: 0_level_0,Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,Passing,Passing,Passing,Passing,Passing,Passing,Passing,Rushing,Rushing,Rushing,Rushing
Unnamed: 0_level_1,Rk,Date,School,Unnamed: 3_level_1,Opponent,Unnamed: 5_level_1,Cmp,Att,Pct,Yds,TD,Int,Rate,Att,Yds,Avg,TD
0,1.0,2019-08-31,Ohio State,0,Florida Atlantic,W,18,25,72.0,234,4,0,203.4,12,61,5.1,1
1,2.0,2019-09-07,Ohio State,0,Cincinnati,W,20,25,80.0,224,2,0,181.7,9,42,4.7,2
2,3.0,2019-09-14,Ohio State,@,Indiana,W,14,24,58.3,199,3,0,169.2,4,11,2.8,1
3,4.0,2019-09-21,Ohio State,0,Miami (OH),W,14,21,66.7,223,4,0,218.7,9,36,4.0,2
4,5.0,2019-09-28,Ohio State,@,Nebraska,W,15,21,71.4,212,3,0,203.4,12,72,6.0,1
5,6.0,2019-10-05,Ohio State,0,Michigan State,W,17,25,68.0,206,2,1,155.6,11,61,5.5,1
6,7.0,2019-10-18,Ohio State,@,Northwestern,W,18,23,78.3,194,4,0,206.5,6,8,1.3,0
7,8.0,2019-10-26,Ohio State,0,Wisconsin,W,12,22,54.5,167,2,0,148.3,13,28,2.2,1
8,9.0,2019-11-09,Ohio State,0,Maryland,W,16,25,64.0,200,3,0,170.8,5,28,5.6,1
9,10.0,2019-11-16,Ohio State,@,Rutgers,W,15,19,78.9,305,4,0,283.3,3,30,10.0,0


In [28]:
p5.get_game_logs(2018)

Exception: No game logs available for 2018.
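
One possible starting point for the transfer case is the summary table itself, which already lists a school for every season. The sketch below builds a season-to-school lookup from (Year, School) pairs like those in the Justin Fields output, stripping the leading `*` from the year. `school_by_season` is a hypothetical helper and does not fetch anything; how the library would then use the result to request the right page is left open.

```python
def school_by_season(rows):
    """Map each season to its school, skipping non-season rows like 'Career'."""
    lookup = {}
    for year, school in rows:
        label = str(year).lstrip("*")
        # Only four-digit labels are seasons; 'Career' and filler '0' are not.
        if len(label) == 4 and label.isdigit():
            lookup[int(label)] = school
    return lookup

# (Year, School) pairs copied from the passing summary above.
rows = [
    ("*2018", "Georgia"),
    ("*2019", "Ohio State"),
    ("*2020", "Ohio State"),
    ("Career", "Overall"),
    ("0", "Georgia"),
    ("0", "Ohio State"),
]
schools = school_by_season(rows)
```

Looking up `schools[2018]` then gives the school whose logs would be needed for that season.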

Congrats! You made it this far and hopefully you aren't too bored. If you're looking to analyze college football information in Python, I hope this can help.