# Exploring physics with python and jupyter notebook
### Lesson 3, importing and manipulating data
## Data from an experiment

One of the main uses of python is to view, manipulate and analyze data. 
In order to work with data, we must first import it into variables. 
In this lesson we will learn how to enter data directly or import it from external files. In particular, we will learn how to import data from the video analysis application [Tracker](http://physlets.org/tracker/), from Microsoft Excel and from text files. 

First load the pylab libraries by running the next code cell.

In [None]:
%pylab

### Entering data by hand
Perhaps the simplest way to enter data into jupyter notebook is to type it manually.
Consider the following table of measured times and positions of a body moving on a straight line. 

| t[s]| x[m]|
| ----| ----|
| 0.1 | 1.1 |
| 0.2 | 2.3 |
| 0.3 | 4.5 |
| 0.4 | 7.9 |
| 0.5 |12.3 |

We can create variables, say with names similar to the headers for convenience, and enter the data by hand by using the array command as done in the next code cell.


In [None]:
t=array([0.1,0.2,0.3,0.4,0.5])
x=array([1.1,2.3,4.5,7.9,12.3])

Now we can plot the data, or perform other calculations on it.
The following cell plots the data.

In [None]:
print('t=',t)
print('x=',x)
plot(t,x,'.')
xlim(0,0.55)
ylim(0,13)
xlabel('t[s]')
ylabel('x[m]')

While the above way is simple and easy to understand, it may take a long time if there are many values to enter.
A slightly better way is to select each column and use copy-paste. If you copy each column and paste it you get:

In [None]:
0.1
0.2
0.3
0.4
0.5

1.1
2.3
4.5
7.9
12.3


Now, in order to enter these columns as arrays into variables, we can manually add the array commands, square brackets and commas as follows:

In [None]:
t=array([     
0.1,
0.2,
0.3,
0.4,
0.5
])

x=array([
1.1,
2.3,
4.5,
7.9,
12.3
        ])

print('t=',t)
print('x=',x)


Note that we added commas after each value except for the last one.

This is a good time to point out that the command 'array' converts the list of numbers surrounded by square brackets to an array.
In python, a list is a built-in type that can contain, well, a list of values.
An array is not a built-in type; it is provided by the NumPy library (which is why we must run the %pylab command before using it), and it differs from a list in a few important aspects:

1. All the elements of an array share a single type, usually numerical, whereas a list can mix values of different types.
2. Arithmetic operations on arrays act on every element, whereas on lists '+' and '*' join or repeat the list and the other arithmetic operations are not defined.
3. Arrays, once created, have a fixed size, whereas additional elements can be added to lists.

When using python to perform scientific calculations we will almost always prefer arrays to lists, as we have done thus far.
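
A short sketch of how lists and arrays behave differently is given below. It imports NumPy explicitly, which is an assumption made so that the example also runs outside a %pylab session, and the variable names are purely illustrative.

```python
import numpy as np  # explicit import so the sketch runs without %pylab

values_list = [1.0, 2.0, 3.0]          # a plain python list
values_array = np.array(values_list)   # the same values copied into an array

# Multiplying a list by an integer repeats it;
# multiplying an array scales every element.
doubled_list = values_list * 2
doubled_array = values_array * 2

# A list can grow in place; the array keeps the size it was created with.
values_list.append(4.0)

print(doubled_list)    # [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
print(doubled_array)   # [2. 4. 6.]
print(len(values_list), values_array.size)   # 4 3
```

Inside a %pylab session the same calls are available without the 'np.' prefix, as in the cells above.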

#### Exercise 1
In the following code cell, assign the series of numbers of the following two columns to two variables, 't' and 'v'.
Plot a graph of the values of v vs. the values of t.

|t[s] |  
| --  |
| 10  |
| 15  |
| 25  |
| 40  |

|v[m/s]|
| --   |
| 12.5 |
| 16.5 |
| 20   |
| 23.5 |

### Importing small data sets from the Tracker software
#### Exercise 2
Download the video from [this link](https://anaconda.org/explorephysics/data-1/1/download/BallTossOut.mov) to your computer and open it using the Tracker software (this video is one of the sample videos included with the Tracker software).
Follow these steps to create a series of data points tracking the ball location in the first four frames of its motion, after leaving the thrower's hand:
* Set the scale of the video by choosing 'Track -> New -> Calibration Tools -> Calibration Stick'.
Drag the edges of the blue line to overlap the edges of the lighted vertical bar of the video. Note that the ticks on the lighted bar are spaced at 10[cm] intervals. Adjust the length of the blue line to be 1 meter long. Update the field 'calibration stick A length' that appears above the video window to a value of 1, ensuring displacement data is in meters.
* Create a new tracked object by clicking the 'Create' button.
* Find the time in the video from which you want to track the movement of the ball (the first frame at which the ball does not touch the hand of the thrower) by using the slider bar and arrows below the video.
* Mark the location of the object by pressing 'shift' (changing the shape of the mouse pointer) and then clicking on the center of the ball.
* Every time you click, the Tracker software will mark the location of the object and will automatically display the next frame of the video, allowing you to mark the location of the object in the new frame. Proceed to mark the location of the object up to the time point at which you wish to finish. For this exercise, 4 frames will suffice, but for the next exercises you will need to mark the location of the object in all the frames up to the time where it leaves the recorded area.
* Save the file as it will be needed in the next exercises.
* Note that in the lower right corner of the Tracker window there is now a table showing the values of the times and locations in the x and y axes of the ball.
* Import the first four time values to an array named 't', the first four location values on the x axis to an array named 'x', and the first four y values to an array named 'y'.

It is convenient to measure times with respect to the first time measurement.
To do so, we create a new array, 'tc' and assign to it the time differences between each time measurement and the first time measurement:

In [None]:
tc=t-t[0]
print(tc)

#### Exercise 3
In the next cell we plot the location of the ball in the x axis versus the time.
Given that the velocity of the ball in the x axis is constant, we expect the points to lie on a straight line of the form:
$$ x(t)=x_0+v_xt$$
In the code we assign the expected location of the ball, given the parameters x0 and vx, to the array xtheory.
Guess or calculate the parameters x0 and vx and assign them.
Execute the code and look at the plot of the straight line on the graph.
Check whether the line approximately matches the ball location.
Modify the values of x0 and vx to get a good match between the line and the points denoting the real locations of the ball.

In [None]:
x0=?
vx=?

xtheory=x0+vx*tc

plot(tc,x,'o')
plot(tc,xtheory)

Repeat the same process for the locations of the ball in the y axis in the next code cell.
Assume that during this time period the velocity is constant.
Can you get a good match between the line and the points in this case as well?

In [None]:
y0=?
vy0=?
ytheory=y0+vy0*tc

plot(tc,y,'o')
plot(tc,ytheory)
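
If you prefer to calculate starting values rather than guess them, one possible approach is to take the first measured position as the initial position and the average velocity between the first and last points as the velocity. The sketch below uses hypothetical numbers, not values from the video, and the names ending in '_demo' and '_guess' are illustrative.

```python
import numpy as np

# Hypothetical measurements, NOT taken from the video.
tc_demo = np.array([0.0, 0.1, 0.2, 0.3])      # time since the first frame [s]
x_demo = np.array([0.50, 0.62, 0.74, 0.86])   # position [m]

# The position at tc = 0 is a natural first value for x0.
x0_guess = x_demo[0]

# Average velocity between the first and last points as a first value for vx.
vx_guess = (x_demo[-1] - x_demo[0]) / (tc_demo[-1] - tc_demo[0])

# The straight line predicted by these two parameters.
xtheory_demo = x0_guess + vx_guess * tc_demo
print(x0_guess, vx_guess)
```

These values are only a starting point. They can still be adjusted by eye as described in the exercise.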

Clearly, using copy-and-paste to import data is neither scalable nor efficient when a table has many columns or many rows.
To efficiently import such data sets we can use python's data analysis library, 'pandas'.

### Importing data sets from files
To use the pandas library we must first load (or import) it.
Run the next code cell to do so.

In [None]:
from pandas import *

The pandas library contains commands to load .csv (comma-separated values) files and Microsoft Excel files (or files created by any of its free alternatives).
To create a .csv file using the Tracker program, follow these steps:
* If you closed the Tracker program at the end of the previous exercise, start it again now and open the file you saved.
* Make sure that the location points you marked appear in the table in the lower right corner of the program window.
* In the menu select 'File->Export->Data file'.
* Make sure the 'Delimiter' field is set to 'Comma'.
* Make sure the 'Cells' field is set to 'All Cells'.
* Click 'Save As' and save the file to the folder in which you keep your jupyter notebook files.

A file generated in this way is available at [this link](https://anaconda.org/explorephysics/data-1/1/download/ball.csv).
If you did not generate your own data file, download this file and save it to the folder in which you keep your jupyter notebook files.
We named the generated file 'ball.csv'.
You can open this file using any text-editing program, such as notepad, to see its content.
The next code cell loads the file to a variable named 'balldata' using the command 'read_csv'.
The parameter 'header=1' tells pandas that the column names are found in the second line of the file (line number 1, since the first line is referred to as line number 0) and that the line before it should be ignored. The Tracker program stores in the first line the name of the object that was tracked (in our case, 'mass_A').

A complete description of the available options to the read_csv command can be found [here](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html).

After loading the data, the code prints the variable the data was assigned to.
Execute the code now.

In [None]:
balldata=read_csv('ball.csv',header=1)
print(balldata)

As you can see, the read_csv command returns a new kind of object, a table (called 'DataFrame' in pandas).
To assign a column of a table to an array we can run the following code:

In [None]:
t=array(balldata['t'])
print(t)

In the code, the selection of the column from the DataFrame 'balldata' that will be assigned to the variable 't' is done using the syntax ['t'].
Using this syntax one can select any of the columns of balldata.
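
The loading and column-selection steps can also be tried without a file on disk. The sketch below reads a small piece of text held in memory; its content is hypothetical and only imitates the layout described above (object name on line 0, column names on line 1). It imports pandas and NumPy under the usual 'pd' and 'np' names, which is an assumption made so that it runs on its own.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical file content: line 0 holds the object name,
# line 1 holds the column names, the rest are data rows.
sample_text = """mass_A
t,x,y
0.0,0.10,1.00
0.1,0.22,1.35
0.2,0.34,1.60
"""

# header=1 makes pandas take the column names from line 1 and skip line 0.
demo_data = pd.read_csv(io.StringIO(sample_text), header=1)

# Select one column by name and convert it to an array.
t_demo = np.array(demo_data['t'])

print(list(demo_data.columns))   # ['t', 'x', 'y']
print(t_demo)
```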

In order to see the list of all column names in a table, we can print the 'columns' field of the table, as is demonstrated in the following code cell.

In [None]:
print(balldata.columns)

We will again measure time with respect to the first time measurement:

In [None]:
tc=t-t[0]

In the following code cell we assign the values of the column 'x' in balldata to a variable named 'x'.
We then plot a graph of points indicating the location of the ball on the x axis of motion versus time.
Execute the code cell now.

In [None]:
x=array(balldata['x'])
plot(tc,x,'.')

#### Exercise 4
Use the values you found in Exercise 3 for the parameters x0 and vx in the equation of the line in the following code cell.
Does the good fit we got for the first 4 locations also hold for the later locations?

In [None]:
xtheory=x0+vx*tc
plot(tc,x,'o')
plot(tc,xtheory)
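
Besides judging the match by eye, one possible way to put a number on it is to look at the differences between the measured and predicted positions. This is not part of the lesson's method, only a sketch; the data and parameter values below are hypothetical.

```python
import numpy as np

# Hypothetical data: times [s] and measured positions [m].
tc_demo = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
x_demo = np.array([0.50, 0.61, 0.75, 0.86, 0.97])

# Hypothetical model parameters.
x0_demo, vx_demo = 0.50, 1.2
xtheory_demo = x0_demo + vx_demo * tc_demo

# Measured minus predicted position, point by point.
residuals = x_demo - xtheory_demo

# Root-mean-square residual: a single number summarising the mismatch.
rms = np.sqrt(np.mean(residuals**2))

print(residuals)
print(rms)
```

Residuals that stay small and scatter around zero suggest the line describes the data well, whereas residuals that drift steadily in one direction suggest the parameters need adjusting.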

#### Exercise 5
Assign the locations of the ball in the y axis, stored in the 'y' column of balldata, to a variable named 'y'.
Plot a graph of the location of the ball in the y axis versus time.

#### Exercise 6
Noting that the expected motion of the ball along the y axis should follow the laws of motion with constant acceleration, 
use the values of y0 and vy0 that you calculated based on the first 4 locations of the ball in the y axis and calculate the expected location of the ball during its motion.
Compare the prediction to the measured location using the following code cell.

In [None]:
a=-9.8
ytheory=y0+vy0*tc+a*tc**2/2

plot(tc,y,'.')
plot(tc,ytheory)

Now, try to update the values of y0 and vy0 to get a better match using the following code cell.
Does the acceleration match the one expected for a body in free fall on Earth?

In [None]:
y0=?
vy0=?
a=-9.8
ytheory=y0+vy0*tc+a*tc**2/2

plot(tc,y,'.')
plot(tc,ytheory)
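
As a cross-check on the values found by trial and error, the acceleration can also be estimated by fitting a second-degree polynomial. This swaps in a different technique from the manual matching used in this lesson. The sketch below uses NumPy's polyfit on synthetic data generated from known parameters, not on the video data.

```python
import numpy as np

# Synthetic data generated from known parameters (NOT from the video):
# y0 = 1.0 m, vy0 = 3.0 m/s, a = -9.8 m/s^2.
tc_demo = np.linspace(0.0, 0.5, 11)
y_demo = 1.0 + 3.0 * tc_demo - 0.5 * 9.8 * tc_demo**2

# polyfit returns the coefficients from the highest power down:
# y = c2*t**2 + c1*t + c0
c2, c1, c0 = np.polyfit(tc_demo, y_demo, 2)

a_fit = 2 * c2     # the t**2 coefficient equals a/2
vy0_fit = c1
y0_fit = c0
print(a_fit, vy0_fit, y0_fit)
```

With real measurements the fitted acceleration will differ somewhat from -9.8 because of marking errors and the accuracy of the scale calibration.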

#### Exercise 7 - bonus task
Plot a graph of the location of the ball on the y axis versus its location on the x axis.
Try to fit a parabola to the trajectory of the ball.
What coefficients did you get?
What is their physical meaning?
Are they consistent with the results of the previous exercises?

### Importing data from Excel files
There are two ways to import tables from Excel files.

1. Export the table from Excel to a .csv file and use the same method we used above.
In this case, set the 'header' parameter so that it tells the read_csv command which line of the file contains the column headers.
When saving the file in Excel, make sure that the fields are separated by commas and not by another character.
You can view the resulting file with notepad to verify that its content is as expected.
2. The pandas library contains a function named read_excel that reads Excel files directly.
This command takes the file name and can read a specified part of the file (a particular sheet, a particular range of rows, etc.).
A complete description of the available options to this command can be found [here](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html).
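
The separator check matters because read_csv assumes commas by default. The sketch below shows what can happen when a file uses another character, and how the 'sep' parameter of read_csv handles it. The file content is hypothetical and is read from memory rather than from disk.

```python
import io
import pandas as pd

# Hypothetical content of a file saved with semicolons between the fields.
semicolon_text = """t;x
0.1;1.1
0.2;2.3
"""

# With the default separator (a comma) each line is read as a single field.
wrong = pd.read_csv(io.StringIO(semicolon_text))

# Passing sep=';' tells pandas which character separates the fields.
right = pd.read_csv(io.StringIO(semicolon_text), sep=';')

print(list(wrong.columns))   # ['t;x']
print(list(right.columns))   # ['t', 'x']
```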

### Lesson summary
* A small number of data values can be imported by cutting and pasting text.
* With this method it is important to add the commas between the values by hand and to surround the whole series of values with square brackets, [].
The array command tells python to treat the list of values as an array.
* To import a larger amount of data, use the pandas library.
The library is loaded with the command 'from pandas import *'.
* The pandas library contains the command read_csv for loading data from a text file.
If the text file was created by the Tracker program, specify header=1 in the command in order to skip the first line of the file.
* The read_csv command returns a new kind of variable, a table.
* The names of the columns of a table named abc are stored in its 'columns' field.
They can be printed with the command print(abc.columns).
* To create an array named efg from the values in the column named foo of the table abc, run the command efg=array(abc['foo']).
* A physical model can be fitted to experimental data by hand, through trial and error, by plotting the experimental data against the model and adjusting the parameters of the model until the two overlap.

### Common errors and the error messages they cause
Each of the following cells contains an error. Run each cell and look at the output produced for that error.

Omitting commas between the values of a list or an array

In [None]:
x=[
    10,
    20
    30
]

Omitting the square brackets of a list and trying to create an array from it

In [None]:
x=array(
1,2,3
)

Accessing a file that does not exist

In [None]:
x=read_csv('abc.csv')

Accessing a column that does not exist in the table

In [None]:
abc=read_csv('ball.csv',header=1)
x=abc['foo']

#### Congratulations! You have completed this lesson.