# BACKGROUND INFORMATION

I have retrieved the Locus Germplasm Phenotype file from The Arabidopsis Information Resource (TAIR, one of the primary _Arabidopsis_ databases globally) (https://www.arabidopsis.org/download/index-auto.jsp?dir=/download_files/Polymorphisms_and_Phenotypes).  The file contains information about the phenotypes associated with certain _Arabidopsis_ mutations.

The file is in the current folder, named: [Locus_Germplasm_Phenotype_20130122.txt](./Locus_Germplasm_Phenotype_20130122.txt)

The file format is "tab-separated values".  This means that each data field is separated by a "tab" character.  

**There is one line at the beginning of the file - the "header" - that describes the type of data in each column.**  

The first column is the "Arabidopsis Genome Initiative" (AGI) Locus Code.  Locus Codes have a predictable structure:

* The first character is "A" or "a"
* The second character is "T" or "t"
* The third character is the chromosome number (between 1 and 5)
* The fourth character is "G" or "g"
* The remaining characters are a set of 5 digits (between 0 and 9)

**For Example:  At5G22934**

The final column in the dataset is the PubMed ID for the published paper describing this phenotype.  PubMed IDs are all digits (0-9) but the length of the ID is unpredictable.
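
The two structures described above can be written as extended regular expressions. The following is a minimal sketch, not part of the exam itself: the variable names `agi_pattern` and `pmid_pattern` and the helper functions `is_agi_code` and `is_pubmed_id` are hypothetical names chosen for illustration.

```shell
# Hypothetical helper names; the patterns follow the structure described above.
agi_pattern='^[Aa][Tt][1-5][Gg][0-9]{5}$'   # A/a, T/t, chromosome 1-5, G/g, five digits
pmid_pattern='^[0-9]+$'                     # digits only, any length of at least 1

is_agi_code() {
    # Succeeds (exit status 0) when the argument is a well-formed AGI locus code.
    printf '%s\n' "$1" | grep -Eq "$agi_pattern"
}

is_pubmed_id() {
    # Succeeds when the argument consists only of digits.
    printf '%s\n' "$1" | grep -Eq "$pmid_pattern"
}

is_agi_code "At5G22934" && echo "At5G22934 looks like a valid locus code"
is_agi_code "AT6G22934" || echo "AT6G22934 is rejected (chromosome must be 1 to 5)"
is_pubmed_id "17217459" && echo "17217459 looks like a PubMed ID"
```

The anchors `^` and `$` make the pattern match the whole string, so a code with too few or too many digits is rejected rather than partially matched.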


## Problem 1


* Copy the Locus_Germplasm_Phenotype_20130122.txt file into your copy of the Exams Git
* Create a new Jupyter Notebook in your Exams folder called "Exam_1_Answers" using the 'bash' kernel 
* Do the rest of this exam inside the "Exam_1_Answers" Notebook

## Problem 2

Create a directory listing command that shows:

* the ownership of the file
* its file size, in Megabytes (i.e. human readable)

then:

* in words (in a Markdown box), describe the _permissions_ on that file (read/write/execute) for users, groups, and "anyone"


In [1]:
ls -lh Locus_Germplasm_Phenotype_20130122.txt

-rw-rw-r-- 1 osboxes osboxes 1.2M Sep  5 16:26 Locus_Germplasm_Phenotype_20130122.txt


## Explanation

### Commands

The ls command lists directory contents.

It has multiple options, including:

* -l: use the long listing format, which gives detailed information about the file, such as its ownership. In our case, both the owner and the group are osboxes.
* -h, --human-readable: together with -l, print sizes in human-readable format. In our case, 1.2M.

### Permissions of the file

* r: read permission
* w: write permission
* x: execute permission
* -: the permission in that position is not granted

These read, write and execute permissions are defined for: 

* user: the user that owns the file
* group: users in the file's group
* other: every other user

The meaning of the output -rw-rw-r-- is therefore that the owner and the group have read and write permissions, while all other users have only read permission. The leading - indicates that this is a regular file, and nobody has execute permission.
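
To make the three permission triplets easier to see, the mode string can be cut into its parts. This is a small sketch under stated assumptions: it uses a throwaway file created with `mktemp` (a stand-in for the data file, so nothing real is modified), and the variable names `demo_file` and `mode` are hypothetical.

```shell
# Create a throwaway file (a stand-in for the data file) and give it the same
# mode as in the listing above: rw- for user, rw- for group, r-- for other.
demo_file=$(mktemp)
chmod 664 "$demo_file"

# Keep only the first 10 characters of the long listing:
# the file type followed by the three permission triplets.
mode=$(ls -l "$demo_file" | cut -c1-10)

echo "type : $(printf '%s' "$mode" | cut -c1)"
echo "user : $(printf '%s' "$mode" | cut -c2-4)"
echo "group: $(printf '%s' "$mode" | cut -c5-7)"
echo "other: $(printf '%s' "$mode" | cut -c8-10)"

rm -f "$demo_file"
```

The octal mode 664 used here corresponds to rw- (6), rw- (6) and r-- (4), which is the same permission set shown for the exam file.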

## Problem 3

* Create a command that outputs only the "header" line of Locus_Germplasm_Phenotype_20130122.txt






In [2]:
head -1 Locus_Germplasm_Phenotype_20130122.txt

Locus_name)	Germplasm_name	phenotype	pubmed_id


## Explanation

### Command 

The head command outputs the first part of a file or files. By default, it prints the first 10 lines of each FILE to standard output. With more than one FILE, it precedes each set of output with a header identifying the file name. If no FILE is specified, or when FILE is specified as a dash ("-"), head reads from standard input.

It has different options, including:

* -n, --lines=N: print the first N lines instead of the first 10. The -1 used above is a short form of -n 1.

In our case we only want to print the first line, which corresponds to the header of our file.
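
As an illustration, the same first line can be extracted in several equivalent ways. The sketch below builds a tiny tab-separated sample with made-up rows (not taken from the TAIR file) so that it runs on its own; the variable names are hypothetical.

```shell
# Build a tiny tab-separated sample (made-up rows, not taken from the TAIR file).
sample=$(mktemp)
printf 'locus\tgermplasm\tphenotype\tpubmed_id\n' > "$sample"
printf 'AT1G00001\tdemo-1\tshort roots\t11111111\n' >> "$sample"
printf 'AT1G00002\tdemo-2\tpale leaves\t22222222\n' >> "$sample"

header_head=$(head -n 1 "$sample")     # explicit spelling of "head -1"
header_sed=$(sed -n '1p' "$sample")    # suppress default output, print only line 1
header_awk=$(awk 'NR == 1' "$sample")  # print the record whose number is 1

printf '%s\n' "$header_head"
rm -f "$sample"
```

All three variables hold the same header line; head is the most direct choice when only the first lines are needed.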

## Problem 4

* Create a command that outputs the total number of lines in Locus_Germplasm_Phenotype_20130122.txt



In [3]:
wc -l Locus_Germplasm_Phenotype_20130122.txt

7216 Locus_Germplasm_Phenotype_20130122.txt


## Explanation 

### Command 

The wc command reports the newline, word, byte and character counts of the files given as arguments. 
It has different options, including: 

* -l: prints the number of lines in a file. 

Therefore, the file has 7216 lines.
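
When the count is needed as a plain number (for example, to compare two files), the file name in the output of wc gets in the way. A minimal sketch, assuming a small temporary sample file and hypothetical variable names:

```shell
sample=$(mktemp)
printf 'header\nrow1\nrow2\n' > "$sample"

# Redirecting the file into wc suppresses the file name in the output.
line_count=$(wc -l < "$sample")
# Some wc implementations pad the number with spaces; arithmetic expansion strips them.
line_count=$((line_count))
echo "lines: $line_count"

# wc -l counts newline characters, so a final line that lacks a
# trailing newline is not included in the count.
printf 'header\nrow1\nrow2' > "$sample"
unterminated_count=$(( $(wc -l < "$sample") ))
echo "lines without final newline: $unterminated_count"

rm -f "$sample"
```

The second half shows why the reported number can be one lower than expected when a file does not end with a newline.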


## Problem 5

* Create a command that writes ONLY the data lines (i.e. excludes the header!) to a new file called "Data_Only.csv"
* prove that your output file has the expected number of lines




In [4]:
sed '1d' Locus_Germplasm_Phenotype_20130122.txt > Data_Only.csv

## Explanation

### Command 

The sed command is used to make changes to file content. It can be used to delete lines from a file. 

* The d command [sed 'Nd' file], where N is the number of a line, deletes that line from the output. The input file itself is not modified.

Therefore, '1d' removes the first line of the file, excluding the header as asked. Now, the file should have 7215 lines.

In [5]:
wc -l Data_Only.csv


7215 Data_Only.csv
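
The proof can also be made explicit by letting the shell compare the two counts. The sketch below is one possible way to do it, not the only one: it works on a small temporary sample, uses `tail -n +2` as an alternative to `sed '1d'`, and all variable names are hypothetical.

```shell
sample=$(mktemp)
data_only=$(mktemp)
printf 'header\nrow1\nrow2\nrow3\n' > "$sample"

# tail -n +2 starts printing at line 2, which drops the header just like sed '1d'.
tail -n +2 "$sample" > "$data_only"

total=$(( $(wc -l < "$sample") ))
data=$(( $(wc -l < "$data_only") ))

# The data file should be exactly one line shorter than the original.
if [ "$data" -eq $((total - 1)) ]; then
    check="OK"
else
    check="MISMATCH"
fi
echo "$check: $total lines in total, $data data lines"

rm -f "$sample" "$data_only"
```

Applied to the exam file, the same comparison would check that 7215 equals 7216 minus 1.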


## Problem 6

* Create a command that shows all of the lines that have a phenotype including the word "root"



In [3]:

grep -i root Data_Only.csv 

# The word "root" is not expected to appear in the other columns of the document,
# so it is not necessary to restrict grep to the 'phenotype' column

AT1G01550	bps1-2	Growth of bps1-2 mutants on CPTA-supplemented medium resulted in partial rescue of both leaf and root defects. bps1-2 mutants grown on control growth medium have small radialized leaves with very little vascular tissue, and very short misshapen knotted-looking roots. By contrast, bps1-2 mutants grown on CPTA-supplemented medium produced larger flattened leaves that contained primary and secondary veins, and smooth elongated roots.	17217459
AT1G01550	bps1-2	Mutant roots were abnormal; primary and lateral roots were short, root hairs formed close to the root apex, and root defects were most severe when plants were grown at low temperature.	15458645
AT1G01550	bps1-2	Under high light conditions (approximately 200 &mu;E.m-2.sec-1), CPTA-treated seedlings were completely photobleached and mutants showed rescue of both leaf and roo

AT1G08260	tilted1-4	Twenty-five percent of embryos show abnormal cell divisions during embryogenesis-large cells- decreased rate of cell division. Division of the hypohyseal cell abnormal leading to abnormal placement of the root pole. Homozygotes show delayed flowering, abnormal floraly phyllotaxy and abnormal ovules.	16278345
AT1G08370	dcp1-2	The homozygous progeny is seedling lethal, showed arrested postembryonic development including cotyledon expansion, development of vascular networks, root elongation, and shoot development.	17485080
AT1G08430	Atalm1-KO	On Al3+ medium, the mutant has significantly shorter roots than the Col wild type. In the absence of aluminum stress, the roots of mutant and wild type grew similarly.	16740662
AT1G08430	Atalm1-KO	The mutant exhibits very low levels of root  growth and malate release in the presence of  Al (0.2 nmol plant-1 24 h-1).
AT1G09100	rpt5b	Insensitive to 

AT1G19220	CS24617	no obvious auxin-related growth phenotype, but roots show mild auxin resistance	15659631
AT1G19220	CS24618	no obvious auxin-related growth phenotype, but roots show mild auxin resistance	15659631
AT1G19220	CS24625	The phenotype of the double mutant is most obvious at its seedling stage, with its most prominent phenotype being severely impaired  lateral root formation. Its primary roots fail to produce lateral roots in 2-week-old seedlings. However, double mutant seedlings start to generate several lateral roots after ~2 weeks of growth, and their morphological appearance is normal.	15659631
AT1G19220	CS24625	nph4-1 arf19 double mutant;  agravitropic response in both hypocotyls and roots; impaired phototropic response in hypocotyls; impaired lateral root formation; small plant size; small and epinastic rosette leaves; reduced auxin sen

AT1G23320	tar1-1	This mutant does not have any obvious morphological defects and it responds normally to ACC and IAA in hypocotyl and root elongation assays.	18394997
AT1G23320	wei8-1 tar1-1 tar2-1	These triple mutants do not make a primary root, they have an extremely reduced hypocotyl, and they lack discernible vasculature in their cotyledons. These mutants have a higher propensity to develop a single cotyledon than wild-type embryos.	18394997
AT1G26870	SALK_025663	Homozygotes have a reduced number of columella root cap (COL) and lateral root cap (LRC) cell layers. Meristem length, meristem cell number and root length are comparable to WT. Periclinal division rates in the COL and LRC stem cells are reduced.	19081078
AT1G27320	ahk2-1cu ahk3-1cu	Root elongation unaffected.	15155880
AT1G27320	ahk2-1cu ahk3-1cu ahk4-1cu	Aborted vascular system containing few protoxylem cells in the prima

AT1G31880	brx-2	lateral root formation in the mutant is insensitive to exogenous cytokinin; altered auxin response in lateral root primordium in the presence of exogenous cytokinin.	19037657
AT1G31880	brx-3	enhanced response to ABA-mediated inhibition of root growth	19201913
AT1G31880	brxS	The roots of brxS seedlings are as short, both when grown in the light or in darkness. In contrast to the root system, the shoot system morphology and flowering time of brxS plants resembles the Sav-0 shoot system, which was used as the control. Cell elongation and cell production rate in the root meristematic and elongation zone were decreased in brxS seedlings, contributing approximately one-third and two-thirds, respectively, to the overall difference in root length as compared with Sav-0 seedlings.	15031265
AT1G32450	SALK_005099 HOMOZYGOUS	Defective in xylem transport of nitrate 

AT1G53700	wag1-2	Morphologically similar to wild type. When seedlings were grown on vertical plates, root exhibited slight wavy pattern. Overall phenotype identical to wag1-1 mutant allele.	16460509
AT1G53700	wag1-2/wag2-1	Morphologically similar to double mutant wag1-1/wag2-1. When seedlings were grown on vertical plates, root exhibited pronounced wavy pattern. Gravitropism was not affected in mutant plants. Double mutant plants were more resistant to inhibition of root curling by auxin transport inhibitor NPA than wild type plants.	16460509
AT1G53940	SALK_025414C	has increased number of lateral roots, hypocotyls have impaired gravitropic curvature	19146828
AT1G53940	SALK_109449C	has increased number of lateral roots, hypocotyls have impaired gravitropic curvature	19146828
AT1G54180	SALK_017909	No visible root phenotype.	16514016
AT1G54960	anp2 anp3	Delayed growth,short roo

AT1G62990	knat7	No visible root phenotype.	16463096
AT1G63440	hma5-1	Accumulation of Cu in both roots and shoots of mutant when grown on medium containing 10 &#956;M Cu.	16367966
AT1G63440	hma5-1	Seedlings germination is totally arrested (both aerial parts and roots) when grown on 50 microM Cu.	16367966
AT1G63440	hma5-1	Seedlings grown on 30 &#956;M Cu display yellow cotyledons, with completely arrested root growth.	16367966
AT1G63440	hma5-2	Accumulation of Cu in both roots and shoots of mutant when grown on medium containing 10 &#956;M Cu.	16367966
AT1G63440	hma5-2	Seedlings germination is totally arrested (both aerial parts and roots) when grown on 50 microM Cu.	16367966
AT1G63440	hma5-2	Seedlings grown on 30 &#956;M Cu display yellow cotyledons, with completely arrested root growth.	16367966
AT1G64440	CS2257	Decrease root length (54% of that of wild

AT1G70560	wei8-1	The hypocotyl of this mutant responds normally to ACC when grown in the dark, but, its roots are moderately less sensitive to ACC than wild type seedlings grown under the same conditions. This difference in sensitivity can be eliminated when the mutants are treated with ACC and low levels of IAA. wei8-1 and wild type hypocotyls and roots respond similarly to IAA in the absence of ACC. The roots of wei8-1 mutant seedlings do not display proper gravitropism. High-temperature mediated hypocotyl elongation is reduced by ~25% in these mutants compared to wild-type seedlings.	18394997
AT1G70560	wei8-1 tar1-1 tar2-1	These triple mutants do not make a primary root, they have an extremely reduced hypocotyl, and they lack discernible vasculature in their cotyledons. These mutants have a higher propensity to develop a single cotyledon than wild-type embryos.	18394997
AT1G70560	wei8-1 tar2-1	The root

AT1G80100	ahp6-2	increased number of vascular cell files with intervening procambial and phloem cell files; protoxylem differentiation occurred sporadically along the root	16400151
AT1G80340	ga3ox1-3 ga3ox2-1	Severe defect in root length.	16460513
AT1G80340	ga3ox2-1	Root length similar to that of wildtype.	16460513
AT2G01830	CRE1:wol-1	Wildtype roots.	15053761
AT2G01830	CRE1:wol-2	Wildtype roots.	15053761
AT2G01830	CS6563	Reduced sensitivity to cytokinin in root growth assay (exogenous application of cytokinins inhibits wildtype root elongation).	
AT2G01830	ahk2-1cu ahk3-1cu ahk4-1cu	Aborted vascular system containing few protoxylem cells in the primary root. Normal adventitious root-vascular system.	16357038
AT2G01830	ahk2-1cu ahk3-1cu ahk4-1cu	Adventitious roots are normal.	16357038
AT2G01830	ahk2-1cu ahk3-1cu ahk4-1

AT2G03680	CS6547	Defective in directional cell elongation processes; abnormal cortical microtubule function; exhibits right-handed helical growth in roots and etiolated hypocotyls; epidermal cell files of roots are twisted to form right-handed helices; on vertically oriented hard agar plates, roots grow to the right when viewed from above the agar plates; this skewed root growth is driven by the friction between agar surface and helical epidermal cell files; phenotype is enhanced under the conditions that accelerate cell elongation, under such conditions, epidermal cells undergo isotropic cell expansion, resulting in spherically shaped cells protruding from the organ surface.	15084720
AT2G03680	spr1-6	Mutant plants exhibit altered patterns of root and organ growth as a result of defective anisotropic cell expansion. Roots, etiolated hypocotyls, and leaf petioles exhibit right-handed ax

AT2G24790	col3	Mutant plants have longer hypocotyls in red  light and in short days. Unlike constans, the col3 mutant flowers early and shows a reduced number of lateral branches in  short days. The mutant also exhibits reduced formation of lateral roots. The col3 mutation partially suppresses the cop1 and  deetiolated1 (det1) mutations in the dark	16339850
AT2G25180	CS6978	Mutant root meristems are larger than those of wild type already at 2 dpg, but they stopped growing after reaching a fixed number of cells at 5 dpg.	17363254
AT2G25180	ahk3-3 arr12-2	The root-meristem size of the double mutant was indistinguishable from that of the ahk3 mutant.	17363254
AT2G25180	arr1-3 arr10-5 arr12-1	severe primary root abnormalities; premature termination of primary root growth; substantially reduced sensitivity to cytokinin in the hypocotyl growth response assay; smaller rosette diameter; altered chlorophyll and

AT2G33880	stip-1	When grown in the absence of sugars plants exhibit growth arrest after germination. Seedling lethal. Shoots exhibit reduced sensitivity to cytokinins but roots appear normal.	20110319
AT2G33880	stip-2	Hypomorphic allele identified as an intragenic supressor of stip-d activation tagged allele. Homozyous plants show about 80% seedling lethality. Other phenotypes include hyponastic cotyledon and reduced root size. Expression of WUS is reduced and the domain of STM (a marker of meristems)reduced.Heterozygotes have 25% abnormal embryos with smaller meristems. Phenotype can be rescued with the addition of sucrose in the growth media.	15753038
AT2G33880	stip-d ahk2-2 ahk3-3 cre1-12	Small SAM and limited root growth.	20110319
AT2G35600	SALK_038885	No visible root phenotype.	16514016
AT2G36120	dot1-1	dot1-1 mutants display an open-class leaf and cotyledon venation patterning defect with low penetrance. Post-gen

AT2G40890	cyp98A3	background Wassilewskija hypocotyls with shortened length and increased diameter, roots had a swollen aspect with increased initiation from the crown and showed reduced growth and gravitropism   in soil plant maintained the dwarfed phenotype with a rosette never exceeding 1 to 1.5 cm in diameter, developed a bushy miniature rosette of round leaves, growth of cyp98A3 plants was arrested latest at the onset of primary stem development (2 weeks later than wild-type), showed darker leaf coloration , severe alteration of lignin content and composition - mainly H units (95% to 3.5% in wild-type), ectopic lignification in roots
AT2G40950	zip17	zip17 mutants exhibit a greater inhibition of primary root elongation in response to NaCl than wild type seedlings. The salt-sensitive phenotype co-segregates with the zip17 T-DNA and can be rescued by a 35S:AtbZIP17 construct. Several salt-responsive genes, such as ATHB-7, that showed

AT2G47000	mdr4-2	Although mutant seedlings display normal acropetal auxin transport in the root, basipetal auxin transport is reduced by about 50%. Root waving proceeds normally in the mutant seedlings, but gravitropic curvature is enhanced, occurring more quickly and to a greater degree than in wild type seedlings.	17557805
AT2G47000	mdr4-2	mdr4-2 mutants display normal root architecture and have wild type rates of lateral root elongation.	17557807
AT2G47000	pgp4-1	The root lengths of 10-d pgp4 seedlings were 30% shorter than those of the wild type, and the number of lateral roots was also significantly reduced under moderate light levels (100 to 120 mmolm2s1) using sucrose concentrations of 0.5 to 1%. Under high light or on sucrose concentrations >1.5%, the root lengths and number of lateral roots observed were greater than or equal to those of the w

AT3G08040	CS6585	defective in iron translocation; chlorotic plants; expression of iron deficiency responses under conditions of iron sufficiency; overaccumulation of iron and other metals (plants accumulate approximately two-fold excess iron, four-fold excess manganesse and two-fold excess zinc in their shoots); detached roots are capable of repressing iron uptake responses when cultured under iron-sufficient conditions; when shoots are appropriately supplied with iron, they regreen and the plants are capable of correctly regulating root iron uptake responses; isolated protoplasts have lower iron levels than those from wild type plants; plants accumulate abnormal levels of ferric iron in their root vasculature, FRD3 is expressed in the pericycle and other vascular cylinder cells in the mature portion of the root; lacks trichomes on stems and leaves.	12172022
AT3G08710	trx h9	Roots and leaves of  the ho

AT3G18780	act2-1	Mutants have abnormal root hairs that are stunted, bulging or may be branched. The orientation of microtubules is abnormal.	19304937
AT3G18780	act2-1/act7-4	Plants are dwarf and reach only 15% of the height of wild type. Roots lack root hairs- the initiate but do not develop further than a bulge. Leaves are small with reduced surface area and fewer lobes. Leaf trichomes are abnormal and contain fewer branches. Also defects in flower morphology, reduced fertility, abnormal inflorescence structure and abnormal silique development.	19304937
AT3G18780	act2-1/act8-2	Lacks root hairs.Root hairs initiate but do not develop further.	19304937
AT3G19050	pok1-1/pok2-1	At seedling stage pok1/pok2 double mutants differ from wild-type by having smaller cotyledons as well as shorter, wider roots and hypocotyls. Adult plants exhibit a dwarfed stature, but all organs are present, altho

AT3G27920	aba2-2	Mutant showed a similar sensitivity to 25 mM trehalose as wild type (inhibition of root growth).	17031512
AT3G27920	sos3-1 siz1-1	Decrease of macronutrients concentration in the MS salt formulation to 1/20X results in a substantial reduction in primary root growth.	15894620
AT3G27920	sos3-1 siz1-1	Mutant grown in greenhouse or growth chamber exhibits reduced shoot and root biomass relative to the wild-type.	15894620
AT3G27920	sos3-1 siz1-1	Substantially more pronounced prototypical Pi starvation root architecture responses than sos3-1 or wild-type seedlings. On Pi-limited medium, mutant seedlings exhibit an inhibition of primary root growth, an increase of lateral root development and length, and higher root/shoot fresh weight ratio than wild-type.  Enhanced resistance to Pseudomonas syringae DC3000.  Smaller in stature.  Impaired drought tolerance.	15

AT3G53020	stv1-1	Defects in apical-basal gynoecium patterning (similar to ett and mp mutants). Mutant plant gynoecia had shorter ovaries and longer gynophores than the wild type, but the total length of the gynoecium is normal before pollination. After pollination, the gynoecium of failed to elongate, resulting in shorter siliques which contained arrested seeds, resulting in lower fertility. The stv1-1 ovules had shorter integuments than wild-type ones, and in some extreme cases, the gametophyte was protruding from the integuments. Mutant plants had defects in vascular system and embryo organization, which are also associated with auxin signaling. Most cotyledons in mutant plants had abnormal vascular patterns that were asymmetric and/or disconnected. Cotyledons were occasionally fused or single, indicating that stv1-1 is associated with defects in embryo patterning. Growth was retarded in both aerial and underground parts of stv1-1 mutants, resulting in plants that were smaller overal

AT3G63010	gid1a-1 gid1b-1	Only slight differences between mutant and wild type plants with regard to rosette radius and root length.	17194763
AT3G63010	gid1a-1 gid1b-1 gid1c-1	Dramatic reduction in rosette radius and root length compared to wild type (87 and 74% reduction, respectively).	17194763
AT3G63010	gid1b-1	Only slight differences between mutant and wild type plants with regard to rosette radius and root length.	17194763
AT3G63010	gid1b-1 gid1c-1	Only slight differences between mutant and wild type plants with regard to rosette radius and root length.	17194763
AT3G63110	atipt1 atipt 3 atipt 5 atipt7	External application of <i>trans</i>-zeatin partially rescues the growth of aerial parts of the mutant, and reduced its lateral root elongation.	17062755
AT3G63110	atipt1 atipt 3 atipt 5 atipt7	Reduced cambial activity and reduced secondary growth in both shoots and roots.	19074290
A

AT4G16760	acx1-3	Plants display decreased sensitivity to the inhibitory effect of indole-3-butyric acid (IBA) on root elongation, while remaining sensitive to inhibitory concentrations of indole-3-acetic acid.  They maintain their ability to initiate lateral roots in response to IBA.	15743450
AT4G17615	cbl1 cbl9	growth of the cbl1  cbl9 double mutant was significantly more inhibited than the  growth of wild-type roots under low-K+ conditions. Double mutants lost  water significantly more slowly than the ecotype hybrid  plants. Hypersensitive to ABA treatmnet.
AT4G18750	dot4-1	dot4-1 mutants have an aberrant midgap leaf and cotyledon venation pattern and their roots are slightly shorter than wild type roots.	18643975
AT4G18780	CS18	With the exception of the collapsed xylem phenotype, both the organization of the vascular tissue and the organization of the entire stem of the three irx mutants are identic

AT4G29130	CS6383	At the flowering stage, the mutant has a smaller root system, tiny leaves with delayed senescence, shorter petioles and inflorescences, and a reduced number of flowers and siliques. Trichomes are normal and flower size and shape, and produced fertile seeds (albeit in reduced number).	12690200
AT4G29130	CS6383	Defect in elongation of hypocotyl, roots, and cotyledons in dim light (15 &mu;mol/m2/s).	12690200
AT4G29130	CS6383	Insensitive to high-glucose repression of cotyledon expansion, chlorophyll accumulation, true-leaf development and root elongation. The response is specific to glucose but not to osmotic changes.	17081979
AT4G29170	Atmnd1-delta1	Mutant plants had normal vegetative development, but flowers failed to produce pollen and seeds; no viable pollen was found in the anthers. In developing pollen mother cells, meiotic progression was interrupted at anaphase I, where chromosomes are randomly distributed and some

AT5G01180	atptr5-1	The ptr5-1 mutant grows normally on soil and on AM medium. Its pollen tubes germinate and elongate normally on germination media. The tubes can germinate normally in the presence of 100uM alanyl-ethionine (Ala-Eth), a toxic dipeptide, but they elongate more than wild type pollen tubes. Their roots can also grow normally in the presence of Ala-Eth. ptr5-1 mutants have a similar nitrogen content to wild type plants. ptr5-1 mutants and wild type plants have a similar dry weight when both are grown with Pro-Ala or Ala-Ala dipeptides as the nitrogen source.	18753286
AT5G01180	atptr5-2	The ptr5-2 mutant grows normally on soil and on AM medium. Its pollen tubes germinate and elongate normally on germination media. The tubes can germinate normally in the presence of 100uM alanyl-ethionine (Ala-Eth), a toxic dipeptide, but they elongate more than wild type pollen tubes. Their roots can also grow normally in the presence of Ala-Eth. ptr5-2 mut

AT5G03730	ctr1-4	Mutation had dramatic effects on the morphology of adult plants. The rosette leaves of mutant plants were much smaller than wild-type leaves. The mutants bolted - l-2 weeks later, the early flowers were infertile, the root system was much  less extensive, and the inflorescence was much smaller than those of the wild type.  The gynoecium of the mutant elongated significantly earlier relative to the rest of the developing flower, often protruding  out of the unopened buds.	8431946
AT5G03730	ctr1-5	Mutation had dramatic effects on the morphology of adult plants. The rosette leaves of mutant plants were much smaller than wild-type leaves. The mutants bolted - l-2 weeks later, the early flowers were infertile, the root system was much  less extensive, and the inflorescence was much smaller than those of the wild type.  The gynoecium of the mutant elongated significantly earlier relative to the rest of the developing flower, often protruding

AT5G11260	hy5-KS50	Roots of wild-type plants turn green when grown under light conditions.  The roots of wildtype plants were green, whereas those of the mutants remained white after having been  cultured for 30 days under light. The chlorophyll content in roots of the wild-type plant (Ws) was 1.1 &#956;g of chlorophyll/gram of fresh roots. Chlorophyll was not detected in roots of hy5-1 or hy5-Ks50.	9367981
AT5G13290	sol2-1	WT root phenotype.  34% of terminal flowers have extra carpels in the fourth whorl.
AT5G13570	dcp2-1	Null mutants of DCP1, DCP2, and VCS accumulate capped mRNAs with a reduced degradation rate. The homozygous progeny of these mutants also share a similar lethal phenotype at the seedling cotyledon stage, with disorganized veins, swollen root hairs, and altered epidermal cell morphology.	17158604
AT5G13570	dcp2-1	The homozygous progeny is seedling let

AT5G20490	xi-k / xi-2	The overall growth of the xi-k / xi-2 double mutant is normal. In the leaf epidermis, Golgi stacks, peroxisomes, and mitochondria move more slowly in these mutant cells than in wild-type cells. Root hair length is reduced to ~20% of the wild type root hair length in these double mutants, but their root hair density is very similar to the wild type density.	19060218
AT5G20490	xik-3	The xik-3 mutant has normal root growth, but its root hairs are only 77% of the length of wild type root hairs, and they have a hooked shape when grown on media with sugar. The stem trichomes are crooked and wavy and often have abnormally elongated stalks. The trichomes on the leaves also have size and shape irregularities.  But, the total number of trichomes is the same for mutants and wild-type plants.	17458634
AT5G20730	CS24625	The phenotype of the double mutant is most obvious at its

AT5G24310	amiRNA_ABIL3	Trichomes are irregularly expanded and distorted. Branch  lengths are significantly affected with all  three branches being significantly shorter than the respective branches of wild type trichomes.  Increased distance between first and second branch point as compared to WT.  Increased diameter at base of trichome.  Actn filaments in developing trichomes are generally more bundled  and more randomly distributed.  Increased root length.
AT5G24520	urm23	urm23 mutant leaves produce almost normal trichomes with slight clustering phenotype. Neither root hair patterning, anthocyanin (flavonoid) biosynthesis, nor seed coat mucilage production are affected in urm23, distinguishing this allele from all other known ttg1 alleles.	19234066
AT5G25220	SALK_136464	No visible root phenotype.	16463096
AT5G25350	ebf1-1 ebf2-1	The mutant shows dwarfed growth, supernumerary epinastically curled leaves, early senescence, and abnormal

AT5G39510	sgr4-1	Inflorescence stems and hypocotyls of mutant plants showed abnormal gravitropic response, while root gravitropism was normal.	9210330
AT5G39610	nac2-1	Not significantly affected in lateral root development.	16359384
AT5G42890	Atscp2-1	Lower germination frequency and time, restored by addition of 1% sucrose. Smaller cotyledon rosettes and  shorter hypocotyls than wild-type and the root elongation  was also retarded. In the dark, root elongation was inhibited  on media lacking sucrose
AT5G43900	SALK_055785C	There are no obvious defects in the aerial tissues of xi-2 mutant plants grown under normal conditions. The root length of the mutant is also normal, but, it has shorter roots hairs than wild-type plants.	18178669
AT5G43900	xi-2 / xi-b	The overall growth of the xi-2 / xi-b double mutant is normal. In the leaf epidermis, Golgi stacks and peroxisomes move more slowly in

AT5G48150	pat1-1	No aberrant root phenotype and no resistance to paclobutrazol (1-100 &mu;M).	10817761
AT5G48160	obe1-1 obe2-1	Premature termination and loss of the shoot apical meristem. Failure to establish or maintain the root apical meristem.  Aberrant embryonic development.	18403411
AT5G48160	obe1-1 obe2-2	absence of roots and defective development of the vasculature.	19392692
AT5G48300	aps1-SALK_040155	In short days plants have delayed growth and were late flowering.In long day conditions, plants also had delayed growth but flowering time was unaffected. Lacks ADP-Glc PPase activity in leaves and no starch can be found in leaves or roots.	18614708
AT5G49190	GABI_377G03	The amounts of glucose, fructose, sucrose, cellulose (roots) and starch in the mutant were not statistically significantly different  from those of the equivalent wild-type lines grown under the same conditions at the same time.	17

AT5G54690	irx8-3	Collapsed xylem vessels in roots and stems, suggesting a decrease in xylem wall strength in the homozygous mutants.	17237350
AT5G54690	irx8-3	Where in wild-type roots xylem files form a closely packed array of cell  files, leading to the regular occurrence of  tripartite cell corners between xylem cell files, the central cylinder of homozygous mutant roots was disorganized and  the xylem files did not form a closely packed array. As a  consequence, tripartite xylem cell  corners could not be detected in the mutant.	17237350
AT5G54690	irx8-4	Collapsed xylem vessels in roots and stems, suggesting a decrease in xylem wall strength in the homozygous mutants.	17237350
AT5G54690	irx8-4	Where in wild-type roots xylem files form a closely packed array of cell  files, leading to the regular occurrence of  tripartite cell corners between xylem cell files, the central cylinder of homozygous mutan

AT5G60920	CS8541	When grown on media containing 0.5% sucrose, the mutant had reduced root length but no apparent radial cell expansion.	7743935
AT5G60920	cob-4	Dark grown hypocotyls have reduced length compared to wild type. Root cells appear swollen and have abnormal radial expansion. Epidermal cells bulge. Cotyledon and hypocotyls are smaller and thicker than wild type. Cellulose microfibrils are randomly oriented.	15849274
AT5G61070	SALK_006938	Increased root hair density.	16176989
AT5G62165	agl42-4	No root anatomy phenotype	15937229
AT5G62500	eb1b-1	Plants grown on agar plates have roots that skew towards the left occasionally forming clockwise loops with twisted epidermal cell files. Also has reduced/delayed response to gravity.	18281505
AT5G62940	hca2	Plants are dwarfed. In the hca2 mutant inflorescence stems, petioles,and main veins of leaves, the ordered patterning of vascular bundles is replac

LPI1	lpi1	Insensitive to the primary root growth inhibition caused by low phosphorous.The aerial parts are phenotypically similar to the wild type, except that these plants flowered 3-5 days earlier than wild type plants. Have normal root elongation and cell division.	16443695
LPI2	lpi2	Insensitive to the primary root growth inhibition caused by low phosphorous.The aerial parts are phenotypically similar to the wild type, except that these plants flowered 3-5 days earlier than wild type plants. Have normal root elongation and cell division.	16443695
LPI3	lpi3	Insensitive to the primary root growth inhibition caused by low phosphorous.The aerial parts are phenotypically similar to the wild type, except that these plants flowered 3-5 days earlier than wild type plants. Have normal root elongation and cell division.	16443695
LPI4	lpi4	Insensitive to the primary root growt

In [7]:
echo "Number of lines that contain the string 'root' (case-sensitive; also matches 'roots'):"
grep -c root Data_Only.csv

Number of lines that contain the string 'root' (case-sensitive; also matches 'roots'):
830


In [8]:
echo "Number of lines that contain 'root' as a whole word (case-sensitive; 'roots' does not match):"
grep -c -w root Data_Only.csv

Number of lines that contain 'root' as a whole word (case-sensitive; 'roots' does not match):
592


In [9]:
echo "Number of lines that contain 'root' in any letter case ('root', 'Root', 'ROOT', 'roots', 'ROOTS'):"
grep -c -i root Data_Only.csv

Number of lines that contain 'root' in any letter case ('root', 'Root', 'ROOT', 'roots', 'ROOTS'):
887


## Explanation

### Command 

The grep command searches for a particular pattern of characters, and displays all the lines that contain that pattern. 

It has multiple options: 

* -c : Prints only a count of the lines that match the pattern.
* -h : Displays the matched lines, but does not display the filenames.
* -i : Ignores case when matching.
* -l : Displays only the names of the files that contain a match.
* -n : Displays the matched lines and their line numbers.
* -v : Prints all the lines that do not match the pattern.
* -e exp : Specifies a pattern. Can be used multiple times.
* -f file : Takes patterns from a file, one per line.
* -E : Treats the pattern as an extended regular expression (ERE).
* -w : Matches whole words only.
* -o : Prints only the matched parts of a matching line, with each such part on a separate output line.

* -A n : Prints the matched line and n lines after it.
* -B n : Prints the matched line and n lines before it.
* -C n : Prints the matched line and n lines before and after it.



The problem statement does not specify whether the search should be case-insensitive (-i, which matches every spelling of the word: "root", "ROOT", "Root") or not.
Therefore, I ran additional commands to compare the outputs of the case-sensitive, whole-word, and case-insensitive searches.
Since we want to find every gene that is biologically involved in the roots of a plant in some way, the most appropriate command is the case-insensitive search.
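
To make the difference between the three searches concrete, here is a minimal sketch on a tiny made-up sample. The four rows and the names `sample`, `count_substring`, `count_word` and `count_nocase` are hypothetical and are not taken from the TAIR file.

```bash
# Build a small tab-separated sample file (hypothetical rows).
sample=$(mktemp)
printf 'AT1G00001\tmutA\tShort root hairs.\t111\n'        >  "$sample"
printf 'AT2G00002\tmutB\tRoot cells are swollen.\t222\n'  >> "$sample"
printf 'AT3G00003\tmutC\tNo starch in roots.\t333\n'      >> "$sample"
printf 'AT4G00004\tmutD\tLate flowering.\t444\n'          >> "$sample"

# Lines containing "root" anywhere, case-sensitive ("roots" matches, "Root" does not).
count_substring=$(grep -c root "$sample")
# Lines containing "root" as a whole word, case-sensitive ("roots" does not match).
count_word=$(grep -c -w root "$sample")
# Lines containing "root" in any letter case.
count_nocase=$(grep -c -i root "$sample")

echo "substring: $count_substring, whole word: $count_word, case-insensitive: $count_nocase"
rm -f "$sample"
```

On this sample the case-insensitive search is the only one that finds all three root-related rows, which is why it was chosen for the following problems.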


## Problem 7

* Create a command that writes the AGI Locus Code for every line that has a phenotype including the word "root" to a file called: **Root-associated-Loci.txt**



In [4]:
grep -i root Data_Only.csv | awk '{print $1}' > Root-associated-Loci.txt

In [11]:
#check 
grep -i root Data_Only.csv | wc -l
wc -l Root-associated-Loci.txt

857
857 Root-associated-Loci.txt


## Explanation 

### Command 

Awk is a command that reads its input line by line and splits each line into fields (by default on whitespace). With $n you can print the n-th field (column) of every line, so $1 is the locus code.
In this case, I piped the output of my previous grep command into awk to create the file, and afterwards I ran a check to confirm that the file has the same number of lines as the grep output.

Another way to obtain the same result is to use regular expressions, since the locus code is known to have a predictable structure. The -o option prints only the matched locus code instead of the whole line, and -i also accepts lower-case letters:

 ```
 grep -i root Data_Only.csv | grep -E -i -o "^AT[1-5]G[0-9]{5}"
 
 ```

However, you should first check that the first column follows this structure in every row, to avoid missing loci. For that reason, I found it quicker to use my awk command instead, which also returns the name of the locus even if it does not match the predictable structure.
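
The check mentioned above could look roughly like the following sketch. It lists the first-column values of root-related rows that do not follow the AGI structure. The sample rows are made up, apart from the `LPI1` locus name and its PubMed ID, which appear in the grep output earlier in this notebook. The variable names are hypothetical.

```bash
# Small tab-separated sample: two AGI-style codes and one non-standard locus name.
sample=$(mktemp)
printf 'AT1G00001\tmutA\tIncreased root length.\t111\n'                         >  "$sample"
printf 'LPI1\tlpi1\tInsensitive to the primary root growth inhibition.\t16443695\n' >> "$sample"
printf 'At5g00002\tmutB\tShort roots.\t222\n'                                  >> "$sample"

# cut -f1 takes the first tab-separated column;
# grep -v keeps only the values that do NOT match the full AGI pattern.
nonstandard=$(grep -i root "$sample" | cut -f1 | grep -E -i -v '^AT[1-5]G[0-9]{5}$')

echo "Non-standard locus names: $nonstandard"
rm -f "$sample"
```

If this prints anything, a regular-expression-only extraction would silently drop those loci, while the awk approach keeps them.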





## Problem 8

* Create a command that writes the PubMed ID for every line that has a phenotype including the word "root" to a file called: **Root-associated-Publications.txt**




In [12]:
grep -i root Locus_Germplasm_Phenotype_20130122.txt | awk '{print $(NF)}' > Root-associated-Publications.txt

In [13]:
#check 
grep -i root Data_Only.csv | wc -l
wc -l Root-associated-Publications.txt

857
857 Root-associated-Publications.txt


## Explanation

I used the same approach as in Problem 7.

When I reviewed the data I saw that some loci do not have a proper PubMed ID. For those lines, awk prints the last word of the phenotype instead. A regular expression for the PubMed ID ("[0-9]+$", one or more digits at the end of the line) solves the problem and gives a clean file containing only PubMed IDs, but it is important to understand that some of them will be missing because they are also missing in the original file.

In [9]:
#Solving the problem: keep only the digits at the end of each matching line
grep -i root Data_Only.csv | grep -E -o "[0-9]+$" > Root-associated-Publications.txt
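
As an alternative sketch, awk can be told that the fields are separated by tabs, so that the last field is the whole last column rather than the last word. The sample rows and the variable names `naive` and `strict` are hypothetical. The second row imitates a line with no PubMed ID.

```bash
# Small tab-separated sample; the second row has no PubMed ID column.
sample=$(mktemp)
printf 'AT1G00001\tmutA\tShort root hairs.\t111\n'                              >  "$sample"
printf 'AT2G00002\tmutB\tRoot elongation inhibited on media lacking sucrose\n'  >> "$sample"
printf 'AT3G00003\tmutC\tNo starch in roots.\t333\n'                            >> "$sample"

# Default whitespace splitting: the last "field" of the second row is the word "sucrose".
naive=$(grep -i root "$sample" | awk '{print $NF}')

# Tab splitting plus a digits-only test: rows without a PubMed ID are skipped.
strict=$(grep -i root "$sample" | awk -F'\t' '$NF ~ /^[0-9]+$/ {print $NF}')

echo "naive:";  echo "$naive"
echo "strict:"; echo "$strict"
rm -f "$sample"
```

Either variant leaves fewer lines in the output file than grep matched, because rows without a PubMed ID contribute nothing.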

## Problem 9

* _**Control experiment**_:  You would hypothesize that genes associated with roots **should be found on all chromosomes.**  Find a way (one or more commands) to test this hypothesis.  In this dataset, is the hypothesis true? 




In [54]:
echo 'The genes that are associated with roots are found on the following chromosomes:'
grep -E -i -o "^AT[1-5]" Root-associated-Loci.txt | tr -d "AaTt" | sort | uniq

The genes that are associated with roots are found on the following chromosomes:
1
2
3
4
5


## Explanation

The chromosome number is the third character of the AGI locus code, right after "AT". Arabidopsis has only 5 chromosomes and all five of them appear in the output, so in this dataset the hypothesis is true.

### Command 

I used two new commands: 

* tr -d to delete the characters "A" and "T", which are no longer needed
* uniq to remove the duplicates (it only collapses adjacent repeated lines, so the input must be sorted)

In [1]:
grep -E -i -o "^AT[1-5]" Root-associated-Loci.txt | tr -d "AaTt" | sort | uniq > Chromosomes_presents.txt
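
A slightly more informative variant of this control counts how many root-associated loci fall on each chromosome. This is only a sketch on a made-up list of locus codes. The file contents and the names `loci`, `per_chromosome` and `n_chromosomes` are hypothetical.

```bash
# Hypothetical list of locus codes, unsorted, mixed case, with one non-AGI name.
loci=$(mktemp)
printf 'AT3G00003\nAT1G00001\nAt5g00004\nLPI1\nAT1G00002\nAT3G00005\n' > "$loci"

# Keep well-formed AGI codes, take the third character (the chromosome number),
# sort so that equal values are adjacent, then count each value with uniq -c.
per_chromosome=$(grep -E -i '^AT[1-5]G[0-9]{5}$' "$loci" | cut -c3 | sort | uniq -c)

# Number of distinct chromosomes that carry at least one locus.
n_chromosomes=$(grep -E -i '^AT[1-5]G[0-9]{5}$' "$loci" | cut -c3 | sort -u | wc -l | tr -d ' ')

echo "$per_chromosome"
echo "Distinct chromosomes: $n_chromosomes"
rm -f "$loci"
```

Sorting before uniq matters here: on an unsorted list, uniq alone would report the same chromosome several times, and a case-sensitive pattern would miss codes written as "At5g...".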

## Problem 10

* If your control experiment shows genes on every chromosome, then you can skip this question!  (you answered Problem 9 correctly!)

*  If your control experiment shows genes only on one or two chromosomes, then you have to explain why... what could the problem be?  (I told you specifically to be careful about this problem!)




## Problem 11

* 'git commit' and 'git push' your answers to your GitHub, then give me your GitHub username before you leave the class. I will clone your repositories and grade your answers.