
## Some basic jargon: string
Exercise adapted from https://pythonforbiologists.com by Dr. Martin Jones.

We would like to introduce a piece of jargon: the word *string*. A string is what we call a piece of text in a computer program (it just means a string of characters). We will use the word *string* when we're talking about computer code, and the word *sequence* when we're discussing biological sequences like DNA and protein.

---

## Printing a message to the screen

The first thing we're going to learn is how to print a message to the screen. When we talk about printing text inside a computer program, we are not talking about producing a document on a printer. The word "print" is used for any occasion when our program outputs some text – in this case, the output is displayed under the code cell.

Here's a line of Python code that will cause a friendly message to be printed:

In [1]:
print("Hello world")

Hello world


This line of code is called a *statement*.

`print()` is the name of a *function*. The function tells Python, in general terms, what we want to do – in this case, we want to print some text. When we use a function, its name is always followed by parentheses.

The bits of text inside the parentheses are called the *arguments* to the function. In this case, we just have one argument (when you have more than one argument, arguments are separated by commas). 

The arguments tell Python what we want to do more specifically – in this case, the argument tells Python exactly what it is we want to print: the message "Hello world".
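As a small illustration of the point about commas, here is a sketch that passes two arguments to `print()`. By default, `print()` puts a single space between the arguments it is given, so both statements below display the same text:

```python
# one argument: a single string
print("Hello world")

# two arguments, separated by a comma;
# print() displays them on one line with a space in between
print("Hello", "world")
```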

If you're opening this notebook for the first time, then the output from the code should already be displayed underneath, but if it's missing, just click somewhere inside the code cell and press the *run* button.

These code examples are editable, so you can change them to print something else, run it again, and see how the output changes. 

In normal writing, we only surround a bit of text in quotes when we want to show that it is being spoken. In Python, however, strings are **always** surrounded by quotes. That is how Python is able to tell the difference between the instructions (like the function name) and the data (the thing we want to print). We can use either single or double quotes for strings – Python will happily accept either. The following two statements behave exactly the same:

In [2]:
print("Hello world")
print('Hello world')

Hello world
Hello world


You'll notice that the output above doesn't contain quotes – they are part of the code, not part of the string itself. If we **do** want to include quotes in the output, the easiest thing to do is use the other type of quotes for surrounding the string:

In [2]:
print("They said, 'Hello world'")
print('She said, "Hello world"')

They said, 'Hello world'
She said, "Hello world"


Be careful when writing and reading code that involves quotes – you have to make sure that the quotes at the beginning and end of the string match up, otherwise you'll get an error.

---

## Use comments to annotate your code

Occasionally, we want to write some text in a program that is for humans to read, rather than for the computer to execute. We call this type of line a *comment*. To include a comment in your source code, start the line with a hash symbol, `#` (this symbol has many names – you might know it as number sign, pound sign, octothorpe, sharp (from musical notation), cross, or pig-pen).

In [5]:
# this is a comment, it will be ignored by the computer
print("Comments are very useful!")

Comments are very useful!


Comments are a very useful way to document your code, for a number of reasons:

- You can put the explanation of what a particular bit of code does right next to the code itself. This makes it much easier to find the documentation for a line of code that is in the middle of a large program, without having to search through a separate document.
- Because the comments are part of the source code, they can never get mixed up or separated. In other words, if you are looking at the source code for a particular program, then you automatically have the documentation as well. In contrast, if you keep the documentation in a separate file, it can easily become separated from the code.
- Having the comments right next to the code acts as a reminder to update the documentation whenever you change the code. The only thing worse than undocumented code is code with old documentation that is no longer accurate!

When you start writing your own code, you will be amazed at how quickly you forget the purpose of a particular section or statement.

Comments can help by giving you hints about the purpose of the code, meaning that you spend less time trying to understand your old code, thus speeding up your progress. A side benefit is that writing a comment for a bit of code reinforces your understanding at the time you are doing it. 

In [6]:
# print a friendly greeting
print("Hello world")

Hello world


You'll see this technique used a lot in the code examples in this course, and I encourage you to use it for your own code as well.
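
A comment doesn't have to take up a whole line. As a small sketch (the messages here are made up for illustration), Python also ignores everything from a hash symbol to the end of the line when the hash follows some code, while a hash inside a quoted string is simply part of the string:

```python
print("Hello world")  # this comment follows some code on the same line

# a hash symbol inside quotes is part of the string, not a comment
print("Sample #1")
```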

---

## Error messages and debugging

It is important to know that **computer programs almost never work correctly the first time**. Programming languages are not like natural languages – they have a very strict set of rules, and if you break any of them, the computer will not attempt to guess what you intended, but instead will stop running and present you with an error message. You're going to be seeing a lot of these error messages in your programming career.

### Forgetting quotes

Here's one possible error we can make when printing a line of output – we can forget to include the quotes:

In [8]:
print(Hello world)

SyntaxError: invalid syntax (<ipython-input-8-50b4ae29d403>, line 1)

This is easily done, so let's take a look at the output we'll get when we try to run it. Just like normal output, when we're working in a Jupyter notebook the error message appears underneath the code. Unlike normal output, error messages are coloured to make it easier to see the different parts. 

Here's a version of the message with labels so we can look at each piece of information in turn:

```
  File "error.py", line 1❶
    print(Hello world) 
                    ^❷
SyntaxError❸: invalid syntax 
```


Looking at the output, we see that the error occurs on the first line of the file❶. Python's best guess at the location of the error is just before the closing parenthesis❷. Depending on the type of error, this can be wrong by quite a bit, so don't rely on it too much!

The type of error is a `SyntaxError`❸, which means that Python can't understand the code – it breaks the rules in some way (in this case, the rule that strings must be surrounded by quotation marks). We'll see different types of errors later in the course.

### Spelling mistakes

What happens if we misspell the name of the function?

In [9]:
prin("Hello world")

NameError: name 'prin' is not defined

We get a different type of error – a `NameError` – and the error message is a bit more helpful:

```
Traceback (most recent call last): 
  File "error.py", line 1, in <module> 
    prin("Hello world")❶ 
NameError: name 'prin'❷ is not defined 
```

This time, Python doesn't try to show us where on the line the error occurred; it just shows us the whole line❶. The error message tells us which word Python doesn't understand❷, so in this case, it's quite easy to fix.

---

### Splitting a statement over two lines

What if we want to print some output that spans multiple lines? For example, we want to print the word "Hello" on one line and then the word "World" on the next line – like this:

```
Hello
World
```

We might try putting a new line in the middle of our string like this:

In [10]:
print("Hello
World")

SyntaxError: EOL while scanning string literal (<ipython-input-10-c9fefab393b1>, line 1)

but that won't work and we'll get the following error message:

```
  File "error.py", line 1 
    print("Hello ❶
                ^ 
SyntaxError: EOL while scanning string literal❷
```

Python finds the error when it gets to the end of the first line of code❶. The error message❷ is a bit more cryptic than the others. *EOL* stands for End Of Line, and *string literal* means a string in quotes. So to put this error message in plain English: "I started reading a string in quotes, and I got to the end of the line before I came to the closing quotation mark". (Newer versions of Python word this error differently, for example `unterminated string literal`, but the meaning is the same.)

If splitting the line up doesn't work, then how do we get the output we want?

## Storing strings in variables

OK, we've been playing around with the `print()` function for a while; let's introduce something new. We can take a string and assign a name to it using an equals sign – we call this a variable:

In [3]:
# store a short DNA sequence in the variable my_dna
my_dna = "ATTTTGAGCA"

The variable `my_dna` now points to the string `"ATTTTGAGCA"`. We call this assigning a variable, and once we've done it, we can use the variable name instead of the string itself – for example, we can use it in a `print()` statement:

In [4]:
# print_variable.py

# store a short DNA sequence in the variable my_dna
my_dna = "ATTTTGAGCA"

# now print the DNA sequence
print(my_dna)

ATTTTGAGCA


Notice that when we use the variable in a `print()` statement, we don't need any quotation marks – the quotes are part of the string, so they are already "built in" to the variable `my_dna`. Also notice that this example includes blank lines to separate the different bits and make it easier to read. We are allowed to put as many blank lines as we like in our Python programs – the computer will ignore them.

A common error is to include quotes around a variable name:

In [5]:
my_dna = "ATTTTGAGCA"
print("my_dna")

my_dna


but as you can see, if we do this then Python prints the name of the variable rather than its contents.

We can change the value of a variable as many times as we like once we've created it:

In [7]:
my_dna = "ATTTTGAGCA"
print(my_dna)

# change the value of my_dna
my_dna = "TGGCCGGCCA" 
print(my_dna)

ATTTTGAGCA
TGGCCGGCCA


Hopefully you can see that the two identical `print()` lines give different outputs, because we've changed the value of `my_dna` in the middle.

Here's a very important point that trips many beginners up: variable names are **arbitrary** – that means that we can pick whatever we like to be the name of a variable. So our code above would work in exactly the same way if we picked a different variable name:

In [8]:
banana = "ATTTTGAGCA"
print(banana)

banana = "TGGCCGGCCA" 
print(banana)

ATTTTGAGCA
TGGCCGGCCA


What makes a good variable name? Generally, it's a good idea to use a variable name that gives us a clue as to what the variable refers to. In this example, `my_dna` is a good variable name, because it tells us that the content of the variable is a DNA sequence. Conversely, `banana` is a bad variable name, because it doesn't really tell us anything about the value that's stored.  

It's also important to remember that variable names are case-sensitive, so `my_dna`, `MY_DNA`, `My_DNA` and `My_Dna` are all different variables. Technically this means that you could use all four of those names in a Python program to store different values, but please don't do this – it is very easy to become confused when you use very similar variable names. 
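
To make the point about case sensitivity concrete, here is a minimal sketch using two of those names. It is shown only as a demonstration of what Python allows, not as a recommendation:

```python
# two different variables whose names differ only in case
my_dna = "ATTTTGAGCA"
MY_DNA = "TGGCCGGCCA"

print(my_dna)  # prints ATTTTGAGCA
print(MY_DNA)  # prints TGGCCGGCCA
```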

---

## Tools for manipulating strings

Now that we know how to store and print strings, we can take a look at a few of the tools Python gives us for working with them.

### Concatenation

We can concatenate (stick together) two strings using the `+` symbol. This symbol will join together the string on the left with the string on the right:

In [9]:
# print_concatenated.py

my_dna = "CCTT" + "GGAA"
print(my_dna)

CCTTGGAA


In the above example, the things being concatenated were strings, but we can also use variables that point to strings:

In [10]:
upstream = "TTT"
my_dna = upstream + "ACGC"
print(my_dna)

TTTACGC


We can even join multiple strings together in one go:

In [11]:
upstream = "TTT"
downstream = "ACGC"
my_dna = upstream + "CAGC" + downstream
print(my_dna)

TTTCAGCACGC


It's important to realize that the result of concatenating two strings together is itself a string. So it's perfectly OK to use a concatenation inside a `print()` statement:

In [22]:
print("Hello" + " " + "world")

Hello world


### Finding the length of a string

Another useful built-in tool in Python is the `len()` function (`len` is short for length). Just like the `print()` function in our examples so far, the `len()` function takes a single argument (take a quick look back at when we were discussing the `print()` function for a reminder about what arguments are), which here is a string. However, the behaviour of `len()` is quite different to that of `print()`. Instead of outputting text to the screen, `len()` produces a value that can be stored – we call this the **return value**.

Let's see what happens when we write a line of code that uses `len()` to calculate the length of a string:

In [13]:
len("ATTC")

4

Because a Jupyter notebook always shows us the value of the last thing that was calculated in a code cell, we can see the answer - the string is four characters long. But if we actually want to do something useful with that value, we need to store it:

In [14]:
dna_length = len("ATTC")
print(dna_length)

4


This example looks very similar, but the difference is that now we have stored the result of the `len()` function, and later in the program we will be able to use it. 

There's another interesting thing about the `len()` function: the result (or return value) is not a string, it's a number. This is a very important idea: **Python treats strings and numbers differently**.
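
If you are ever unsure what kind of value you are dealing with, Python's built-in `type()` function will tell you. `type()` is not used elsewhere in this notebook, and the variable names below are our own, so treat this as an optional sketch:

```python
sequence = "ATTC"
sequence_length = len(sequence)

print(type(sequence))         # <class 'str'>
print(type(sequence_length))  # <class 'int'>

# because the length is a number, we can do arithmetic with it
print(sequence_length + 1)    # 5
```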

### Working with both numbers and strings

Consider this short program which calculates the length of a DNA sequence and then prints a message telling us the length:

In [25]:
# store the DNA sequence in a variable
my_dna = "ATGCGAGT"

# calculate the length of the sequence and store it in a variable
dna_length = len(my_dna)

# print a message telling us the DNA sequence length
print("The length of the DNA sequence is " + dna_length)

TypeError: must be str, not int

When we try to run this program, we get the following error:

```
    print("The length of the DNA sequence is " + dna_length) 
TypeError: must be str, not int❶
```

The error message❶ is short but informative: "must be str, not int". The exact wording depends on the Python version (recent versions say `can only concatenate str (not "int") to str`), but the meaning is the same: Python is complaining that it doesn't know how to concatenate a string (which it calls `str` for short) and a number (which it calls `int` – short for integer). Strings and numbers are examples of **types** – different kinds of information that can exist inside a program.

Happily, Python has a built-in solution – a function called `str()` which turns a number into a string so that we can join it to other strings. Here's how we can modify our program to use it:

In [26]:
# print_dna_length.py

my_dna = "ATGCGAGT"
dna_length = len(my_dna)

print("The length of the DNA sequence is " + str(dna_length))

The length of the DNA sequence is 8


The only thing we have changed is that we've replaced `dna_length` with `str(dna_length)` inside the `print()` statement. Notice that because we're using one function (`str()`) inside another function (`print()`), our statement now ends with two closing parentheses.
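
Because the result of the concatenation is itself a string, we can also store the finished message in a variable before printing it. The sketch below does that, and also shows an alternative that relies on `print()` accepting several comma-separated arguments: `print()` converts each argument to text itself, so no `str()` call is needed. The variable name `message` is our own choice.

```python
my_dna = "ATGCGAGT"
dna_length = len(my_dna)

# build the whole message first, converting the number with str()
message = "The length of the DNA sequence is " + str(dna_length)
print(message)

# alternative: give print() two arguments and let it do the conversion
print("The length of the DNA sequence is", dna_length)
```

Both `print()` statements display `The length of the DNA sequence is 8`.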

Let's take a moment to refresh our memory of all the new terms we've learned by writing out what we need to know about the `str()` function: 

`str()` is a function which takes one argument (whose type is number), and returns a value (whose type is string) representing that number. 

Sometimes we need to go the other way – we have a string that we need to turn into a number. The function for doing this is called `int()`, which is short for *integer*. It takes a string as its argument and returns a number:

In [27]:
3 + int('4')

7

In that little example, I'm taking advantage of the fact that Jupyter will display the result of the last calculation, so we can see the answer without having to print it. Note that this trick only works in interactive environments like Jupyter notebooks. If you run the same code from a separate file, nothing is displayed unless you use `print()`.
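
The difference between the two types matters when we use `+`. In this small sketch (the variable names are made up for illustration), the same two values are first joined as strings and then added as numbers:

```python
first = "3"
second = "4"

# with strings, + means concatenation
joined = first + second
print(joined)  # 34

# after converting with int(), + means addition
total = int(first) + int(second)
print(total)   # 7
```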



## Recap

We've learned about some general features of the Python programming language, like:

- the difference between functions, statements and arguments
- the importance of comments and how to use them
- how to use Python's error messages to fix bugs in our programs
- how to store values in variables
- the way that types work, and the importance of understanding them

And we've encountered some tools that are specifically for working with strings:

- concatenation
- different types of quotes
- finding the length of a string


## Motto for the course
In the words of Voltaire: "Le mieux est l'ennemi du bien", or "Perfect is the enemy of the good". That is, if your code works, then it is good. It doesn't need to be perfect.