In [None]:
import pandas
pandas.__version__

CSV is based on 
https://www.bowl.com/Open_Championships/Open_Championships_Home/Past_Results_and_History/

Load the file into Pandas

In [None]:
dframe = pandas.read_csv("bowling_stats.csv")
dframe.head()

By default, pandas treats the first row as a header, so the first data row was consumed as column names. Pass `header=None` to specify that the file has no header row.

In [None]:
dframe = pandas.read_csv("bowling_stats.csv",header=None)
dframe.head()
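
The effect of `header=None` can be checked on a small in-memory example. This is only a sketch: the two rows below are made up for illustration and are not taken from `bowling_stats.csv`.

```python
import io
import pandas

# Hypothetical two-row CSV with no header line (not the real data).
sample = "2001,Anytown,ZZ,10\n2002,Othertown,ZZ,12\n"

# Default behaviour: the first data row is consumed as column names.
with_header = pandas.read_csv(io.StringIO(sample))

# header=None: every row is data and the columns get integer labels.
no_header = pandas.read_csv(io.StringIO(sample), header=None)

print(len(with_header), list(with_header.columns))
print(len(no_header), list(no_header.columns))
```

With the default settings one row of data is silently lost to the header, which is easy to miss when only looking at `head()`.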

As a check, look at the CSV content

In [None]:
!head bowling_stats.csv

_Problem_: the last column contains a comma (used as a thousands separator) and is not wrapped in double quotes, so the parser splits it into two fields.
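
To illustrate the problem, here is a minimal sketch using made-up rows (not the real file). The column names `count1`, `count2`, and `count` are assumptions for the example. It also shows that, if the counts had been quoted, `read_csv` could parse them directly with `thousands=','`.

```python
import io
import pandas

# Made-up rows: one count written as 12,345 without quotes, one small count.
unquoted = "1990,Anytown,ZZ,12,345\n1901,Othertown,ZZ,41\n"
names = ['year', 'city', 'state', 'count1', 'count2']
split = pandas.read_csv(io.StringIO(unquoted), header=None, names=names)
print(split)  # 12,345 lands in two columns; the short row gets NaN in count2

# The same rows with the large count quoted parse as a single number.
quoted = '1990,Anytown,ZZ,"12,345"\n1901,Othertown,ZZ,41\n'
clean = pandas.read_csv(io.StringIO(quoted), header=None,
                        names=['year', 'city', 'state', 'count'],
                        thousands=',')
print(clean)
```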

Label the columns to make manipulation easier to understand.

In [None]:
dframe.columns=['year','city','state','count1','count2']
dframe.head()

Alternatively, we could have labeled the columns at load time using the following:

In [None]:
dframe = pandas.read_csv("bowling_stats.csv",
                            header=None,
                            names=['year','city','state','count1','count2'])

dframe.head()

Since the comma is also the field delimiter, we'll need to recombine the two count columns.

What about the early counts?

In [None]:
dframe.tail()

Combining the two columns will be tricky because the second count column is sometimes NaN

In [None]:
dframe.dtypes

A first attempt: combine columns 4 and 5 directly as col_4 * 1000 + col_5. Rows where column 5 is NaN come out as NaN.

In [None]:
dframe['count1']*1000 + dframe['count2']
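
The NaN behaviour of this direct sum can be seen on a two-row sketch. The values are made up for illustration; only the column names match the notebook.

```python
import pandas

# Made-up values: the second row has no thousands part, so count2 is NaN.
demo = pandas.DataFrame({'count1': [12, 41],
                         'count2': [345.0, float('nan')]})

# Any arithmetic involving NaN yields NaN, so the second row is lost.
naive = demo['count1'] * 1000 + demo['count2']
print(naive)
```

The first row combines correctly, but the second becomes NaN even though `count1` alone already holds the full value.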

What to do: if the 5th column is not NaN, combine columns 4 and 5 as col_4 * 1000 + col_5; otherwise keep column 4 unchanged.

In [None]:
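def merge_columns(row):
    # No second field: count1 already holds the full value.
    if pandas.isna(row['count2']):
        return row['count1']
    else:
        # count1 is the thousands part, count2 the remainder.
        return row['count1']*1000+row['count2']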

Apply the function to each row

In [None]:
dframe['total']=dframe.apply(merge_columns,axis=1)
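
As an alternative to a row-wise `apply`, the same rule can be written with whole-column operations. This is a sketch on made-up values, not a replacement the notebook depends on: compute the combined value everywhere, then fall back to `count1` where the result is NaN.

```python
import pandas

# Made-up values with the same column names as the notebook.
demo = pandas.DataFrame({'count1': [12, 41],
                         'count2': [345.0, float('nan')]})

# Combined value; NaN wherever count2 is missing.
combined = demo['count1'] * 1000 + demo['count2']

# Fill those gaps with count1, matched row by row on the index.
demo['total'] = combined.fillna(demo['count1'])
print(demo)
```

For a table of this size the difference in speed is negligible; the column-wise form mainly avoids writing a helper function.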

In [None]:
dframe.head()

In [None]:
dframe.tail()

Plot the total versus the year

In [None]:
import matplotlib
import matplotlib.pyplot as plt
matplotlib.__version__

In [None]:
dframe

In [None]:
dframe.plot(x='year',y='total')

In [None]:
dframe

In [None]:
dframe.dtypes

Maybe the x-axis labels broke because the 'year' column isn't numeric.

Let's force the year to be numeric:

In [None]:
dframe['year']=pandas.to_numeric(dframe['year'])
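
If the conversion raises because some entry is not a plain number, one way to locate the offending rows without scanning the whole table is `errors='coerce'`, which turns unparseable entries into NaN. The series below is made up for illustration.

```python
import pandas

# Made-up year column with one entry that is not a plain number.
years = pandas.Series(['1999', '2000', '2001-02', '2003'])

# Unparseable entries become NaN instead of raising an error.
as_numbers = pandas.to_numeric(years, errors='coerce')

# Show the original text of every entry that failed to convert.
bad_rows = years[as_numbers.isna()]
print(bad_rows)
```

Note that this mask would also flag entries that were already missing before the conversion, so the result still deserves a quick look.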

In [None]:
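# Use the full option name; the short form 'max_rows' can be ambiguous in newer pandas versions.
pandas.set_option('display.max_rows', 500)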

In [None]:
dframe

One row (index 73) has a 'year' entry that cannot be converted to a number.

One solution would be to manually clean up the CSV.

Alternatively, we can drop that row and see if that solves the problem.

In [None]:
dframe=dframe.drop(73)

In [None]:
dframe['year']=pandas.to_numeric(dframe['year'])

No errors reported, so let's look at the plot

In [None]:
dframe.plot(x='year',y='total',style='o')
plt.tick_params(labelsize=14)
plt.xlabel('year',fontsize=14)
plt.ylabel('attendance',fontsize=14)
plt.show()

Just to confirm, let's check the data type of each column

In [None]:
dframe.dtypes