# Book 3: How to concatenate data frames

Here we will go over how to use ```pandas``` to read several ```.csv``` files and then **concatenate** them into a single table. This is useful for loading the results of different conditions or biological replicates, and for **tagging** each condition so we can later compare results and run statistical tests.

## Load the data
Let us use what we learned before to load the ```.csv``` table into a data frame.

```
import pandas as pd

df1 = pd.read_csv('../data/Results_01.csv')

df1.head()
```

We can add a new column to the data frame that records which student performed the analysis.

```
df1['Student'] = '01'
# Now I use "sample" to get 10 random examples from the table
df1.sample(10)
```

Now we can read a second dataset and tag it with student "02".

```
df2 = pd.read_csv('../data/Results_02.csv')
df2['Student'] = '02'
df2.head()
```
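With more than two files, the read-and-tag step can be wrapped in a loop. The sketch below is one possible approach, not part of the original analysis: it writes two tiny stand-in files to a temporary folder so that it runs on its own (the `Area` values are made up), and it assumes the student number can be taken from the part of the file name after the underscore.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in data so the sketch runs on its own; the values are made up.
data_dir = Path(tempfile.mkdtemp())
pd.DataFrame({"Area": [10.0, 12.5]}).to_csv(data_dir / "Results_01.csv", index=False)
pd.DataFrame({"Area": [11.0, 13.5]}).to_csv(data_dir / "Results_02.csv", index=False)

tables = []
for path in sorted(data_dir.glob("Results_*.csv")):
    table = pd.read_csv(path)
    # Assumption: the tag is the part of the file name after the underscore,
    # e.g. "Results_01.csv" -> "01".
    table["Student"] = path.stem.split("_")[1]
    tables.append(table)

# Show which tag each table received
print([t["Student"].iloc[0] for t in tables])
```

Taking the tag from the file name keeps it as text, so the leading zero in "01" is preserved.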

Let's see whether there are small differences between the two students by plotting histograms of their measured areas.

```
val = df1["Area"].mean()
print(f"The mean value of Area for Student 1 is: {val:.3f}.")
df1.hist(column='Area')

val = df2["Area"].mean()
print(f"The mean value of Area for Student 2 is: {val:.3f}.")
df2.hist(column='Area')
```

Instead of keeping track of separate data frames, it is easier to put these tables together. This operation is called concatenation.

```
# concatenate
df = pd.concat([df1, df2])
df.sample(10)
```
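One detail worth knowing: `pd.concat` keeps each table's original row labels, so the combined table can contain repeated index values. Passing `ignore_index=True` renumbers the rows from 0. A minimal sketch with made-up numbers (the small tables `a` and `b` are hypothetical stand-ins for `df1` and `df2`):

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2; the values are made up.
a = pd.DataFrame({"Area": [10.0, 12.5], "Student": "01"})
b = pd.DataFrame({"Area": [11.0, 13.5], "Student": "02"})

# Default behaviour: each table keeps its own row labels
kept = pd.concat([a, b])

# With ignore_index=True the rows are renumbered consecutively
renumbered = pd.concat([a, b], ignore_index=True)

print(kept.index.tolist())
print(renumbered.index.tolist())
```

Whether repeated labels matter depends on what comes next; they mostly become a problem when rows are later selected by label with `.loc`.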

Now let us draw a basic boxplot to see whether the results of the two students look different from one another. A boxplot only gives a visual impression; deciding whether a difference is statistically significant requires a proper test. Here we benefit from the **Student** tag: we can ask the boxplot to group the results by this column using the ```by=``` parameter.

```
df.boxplot(column="Area", by="Student")
```
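The same **Student** tag can also be used to get the numbers behind the boxplot. The sketch below uses `groupby` on a small made-up table (the name `demo` and its values are hypothetical) to compute the count, mean and median of `Area` for each student.

```python
import pandas as pd

# Hypothetical stand-in for the concatenated table; the values are made up.
demo = pd.DataFrame({
    "Area": [10.0, 12.0, 11.0, 15.0],
    "Student": ["01", "01", "02", "02"],
})

# One row per student, one column per summary statistic
summary = demo.groupby("Student")["Area"].agg(["count", "mean", "median"])
print(summary)
```

On the real data, `demo` would simply be replaced by the concatenated `df`.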

# How to save data frames into new .csv files

Now that we have concatenated all our tables into a single, more practical one, we can save it as a new file.

```
# writing the table
df.to_csv('../data/Results_total.csv')

```
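Two things can come as a surprise when the saved file is read back. By default `to_csv` also writes the row index as an extra unnamed column, and `read_csv` interprets a tag such as `01` as the number 1. The sketch below shows one way around both, using an in-memory buffer instead of a real file and a made-up table (`demo` is a hypothetical stand-in for `df`):

```python
import io

import pandas as pd

# Hypothetical stand-in for the concatenated table; the values are made up.
demo = pd.DataFrame({"Area": [10.0, 11.0], "Student": ["01", "02"]})

# index=False leaves the row labels out of the file
buffer = io.StringIO()
demo.to_csv(buffer, index=False)

# dtype keeps the Student tag as text, so "01" does not become 1
buffer.seek(0)
restored = pd.read_csv(buffer, dtype={"Student": str})
print(restored["Student"].tolist())
```

With a real file, the buffer would be replaced by a path in both calls.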

```
# save the combined dataframe
```