A Gentle Introduction to SQL Using SQLite
SQL or Structured Query language is the language used to communicate with relational databases. What are relational databases? Well, most of the popular database systems you may know, such as MS Access, MySQL or SQLite, are all relational. That is, they all use a relational model, which, it turns out, can be described much like a spreadsheet:
- Data are organized into tables (relations) that represent a collection of similar objects (e.g. contributors).
- The columns of the table represent the attributes that members of the collection share (last name, home address, amount of contribution).
- Each row in the table represents an individual member of the collection (one contributor).
- And the values in the row represent the attributes of that individual (Smith, 1228 Laurel St., $250).
Much of the power of a relational database lies in the ability to query these relations, both within a table (give me all contributors who donated at least $500 and who live in Wyoming) and among tables (from the contributors, judges and litigants tables, give me all contributors who donated at least $1000 to Judge Crawford and who also had legal cases over which Judge Crawford presided). SQL is the powerful and rather minimalist language we use to ask such questions of our data in a relational database. How minimalist is SQL? The basic vocabulary for querying data comes down to a few main verbs:
SELECT INSERT UPDATE DELETE
I imagine you can guess what each of those verbs does, even if you’ve never written a database query.
To create and change the structure of tables in the database, there are a few other verbs to use:
CREATE DROP ALTER
Those are the keywords that perform almost everything you need to do. The language also includes a number of modifiers that help specify the action of the verbs, but the core list comes down to a couple dozen words. These basic keywords are common across pretty much all relational databases. A specific database management system (Access, MySQL or SQLite) may add its own extensions to the common keywords, but the lion’s share of the work is done with this handful of words, and they’re basically the same across database applications.
By combining these simple keywords, you can create remarkably complex and specific queries. And the basic syntax still reads fairly clearly:
SELECT last_name FROM contributors WHERE state = 'WY';
The SQL query above reads pretty much like the English sentence for the same request:
Select the last name from the contributors table where the contributor's state is WY.
If you’re using a graphical interface such as a datagrid, that interface is simply constructing queries like these behind the scenes. So, why not take command of your queries and write them yourself?
A couple of things off the bat:
- SQL keywords are not case-sensitive. So capitalizing SELECT in the statement above is optional. Using all caps for keywords is considered good form, though, because it helps distinguish keywords from table names or other non-keywords.
- The statement ends with a semi-colon. This is the standard way of ending a statement in SQL. Some systems enforce this convention.
So, let’s dive in. For this tutorial, we will be using SQLite, a free and open source database manager that’s lightweight and portable.
If you have the Firefox browser, you have SQLite. Firefox uses SQLite to manage its own data, so when you install Firefox, it brings SQLite along with it. (A growing number of software packages are using SQLite in this way.) So, if you don’t have Firefox installed, install it now
Install the SQLite Manager Firefox Add-on
To create our own databases using Firefox’s SQLite installation, we’ll use the free SQLite Manager add-on.
Go to this page and click “Add to Firefox”.
Click “Install Now.” (You may need to upgrade Firefox if you’re using a version older than 3.5.)
In the Tools menu, you should find the SQLite Manager:
Creating the First Database
Mousing over each of the icons at the top of the SQLite Manager tool will show what the icon does.
To create a database, simply click on the icon for “New Database”
Name the database “test” and save it anywhere you like (the desktop will work, or your documents folder). This single file will contain
the entire database you create.
Creating a Table
Click the “Create Table” icon (), and you’ll get a form allowing you to create a new table.
To create a table, we have to define the attributes or columns that make it up. For each column, we define the datatype of the data it will contain. Define the columns as follows:
- Name the table “contributors.”
- The id field will be a unique identifier for each contributor (and therefore will be the “Primary Key” for the row). It will be an integer that will automatically increment with each row we add (so each new row will have a new id), and it cannot be null or empty (because we need it as the unique identifier).
- The next three columns will all contain text strings of undetermined lengths (last names, for example, come in all kinds of lengths), so we’ll use the VARCHAR or “variable character” datatype.
- For the state and zip columns, if we know that the number of characters will always be the same (for example, 2 for state postal abbreviation), we could use the CHAR character type, rather than VARCHAR. The pros and cons of choosing between the two vary among database management systems. SQLite doesn’t really distinguish between the two, so we’ll just use VARCHAR to keep things simple.
By the way, It may seem strange that the zip column uses a VARCHAR datatype, but remember that some zips start with a 0 (00501 is in NY). So, we want to treat this column as a string of text, rather than as a number (which would be 501).
Click OK, and you will see a dialogue box with the full SQL statement that the manager will execute on your database:
CREATE TABLE "main"."contributors" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL , "last_name" VARCHAR, "first_name" VARCHAR, "city" VARCHAR, "state" VARCHAR, "zip" VARCHAR, "amount" INTEGER)
The syntax should be fairly clear, since it just reflects the choices we made in the form. It’s creating a table in the “main” database (not in the separate “temporary” space) called “contributors” with the fields and data types we’ve defined. Click OK to create the table.
You should now see a “contributors” table in the Tables list on the left panel of the manager. Clicking the icon (arrow or + sign) beside the listing for “contributors” will show you the column list for the table.
Now that we have a table in the database, we can start inserting data. This task is accomplished with (oddly enough) an INSERT statement.
You’ll see that the “Enter SQL” box is already pre-populated with a SELECT statement. Since we don’t have anything in the table yet, this won’t really help. So, delete the “SELECT * from tablename” statement, and replace it with the following:
INSERT INTO contributors (last_name, first_name, city, state, zip, amount) VALUES ('Buffet', 'Warren', 'Omaha', 'Nebraska', '68101', 1500);
This is a little more obscure than the CREATE or SELECT syntax, but it’s still fairly clear. To insert a row in the table, we execute the INSERT INTO statement with a table name, a list of columns to populate, and the VALUES for each of those columns. The order of the columns in the column list must match the order of values in the values list.
It’s very important to surround text values with single quotation marks. Two things to note:
- The quotation marks indicate to SQL that this is a literal string (the word ‘Buffet’), rather than a column name or other special usage.
- SQL uses single quotation marks around text strings. Some database systems will also accept double quotes, but some will throw an error.
- The commas between values are placed outside of the quote marks, not inside.
Notice that we didn’t insert a value for id. Because we set that column to AUTOINCREMENT, SQLite will populate the id with the next integer in the sequence. So, we don’t need to worry about choosing unique ids; SQLite takes care of it.
Finally, we didn’t include dollar signs or commas in the “amount” column. We created the “amount” column as an integer, so we should only insert integers there. (Different database management systems will react differently if you try to insert non-numeric characters in an integer column; it’s best to avoid doing so.)
If you haven’t done so already, click the “Run SQL” button. The “Last Error” field should show “not an error.” Success! You’ve added data.
Just so we’ll have some data to play with, let’s execute a couple of other
INSERT statements. Go back to the “Execute SQL” tab and paste in these lines:
INSERT INTO contributors (last_name, first_name, city, state, zip, amount) VALUES ('Winfrey', 'Oprah', 'Chicago', 'IL', '60601', 500); INSERT INTO contributors (last_name, first_name, city, state, zip, amount) VALUES ('Chambers', 'Anne Cox', 'Atlanta', 'GA', '30301', 200); INSERT INTO contributors (last_name, first_name, city, state, zip, amount) VALUES ('Cathy', 'S. Truett', 'Atlanta', 'GA', '30301', 1200);
You can paste these three lines into the ‘Enter SQL’ text box together. The semi-colons indicate the end of each statement.
Click “Run SQL”
Again, you can view the results in the “Browse and Search” tab.
Now that we have a small data set to use, let’s start querying it. In the “Execute SQL” tab, type the following in the “Enter SQL” text box:
SELECT * FROM contributors;
And click the “Run SQL” button.
You should see a nice grid display of all contributors you’ve added. The * character is a common wildcard. In this SELECT, it is used to indicate all columns. So, we have selected all columns from all rows in the contributors table.
To define which columns of data you want to return, simply provide a comma-separated list of column names to SELECT:
SELECT city, state FROM contributors;
Clicking “Run SQL” should give you a two-column table of cities and states.
Using DISTINCT to get a distinct set
The SELECT query above gives us a list of cities and states, but it includes duplicate rows for Atlanta, GA. Adding DISTINCT to the query eliminates the duplicates:
SELECT DISTINCT city, state FROM contributors;
Now you should have only three rows in your results, showing the distinct values for city and state in the table.
Notice what happens if you add the
last_name field to the DISTINCT query:
SELECT DISTINCT last_name, city, state FROM contributors;
We’re back to four rows. There are four distinct combinations of last_name, city and state in the table, so that’s what we get from DISTINCT.
The WHERE Clause
The WHERE clause provides the scalpel for your SQL operations. A well-crafted WHERE clause can let you take exactly the slice of the data you want. It sets the conditions for the SELECT, and the query will return only those rows that match the conditions.
Say, for example, we only wanted to see contributors from Georgia:
SELECT * from contributors WHERE state='GA';
(Remember the single quotes around the string “GA”.)
And you can test for more than equality in the WHERE clause. This query finds only the contributors who have donated more than $1200:
SELECT * from contributors WHERE amount > 1200;
Of course, donors who have given exactly $1200 won’t be included in the results. To include them, use the >= operator:
SELECT * from contributors WHERE amount >= 1200;
Here are some other operators you can use:
|>=||Greater than or equal|
|<=||Less than or equal|
(*Many database systems also use != for “Not equal.”)
Combining conditions with AND and OR
You can combine conditions using AND and OR. For example, let’s find all contributors from Georgia who have given more than $1000:
SELECT * FROM contributors WHERE state = 'GA' AND amount > 1000;
Now let’s find all contributors who either live in Georgia or who have given more than $1000:
SELECT * FROM contributors WHERE state = 'GA' OR amount > 1000;
And now let’s try to get the big spenders from Chicago and Georgia:
SELECT * FROM contributors WHERE city = 'Chicago' OR state = 'GA' AND amount > 1000;
Hmm . . . Oprah is in the list, but she only donated $500. What gives?
The problem here is that the AND operator has a higher precedence than the OR operator, which means it gets evaluated first. So, in effect, our query really looks like this:
SELECT * FROM contributors WHERE city = 'Chicago' OR (state = 'GA' AND amount > 1000);
Which selects all contributors from Chicago and only those contributors from Georgia who have also donated more than $1000.
We can use parentheses to clarify the original query and actually get the high rollers we wanted:
SELECT * FROM contributors WHERE (city = 'Chicago' OR state = 'GA') AND amount > 1000;
Parentheses are often helpful when you need to disambiguate a query. Tehnically, you’re changing the order of evaluation here, but you’re also just making the intention of your statement clear.
Sorting results with ORDER BY
To order your result set by the values in a particular column, use ORDER BY:
SELECT last_name, amount FROM contributors WHERE amount > 200 ORDER BY amount;
Only the rows matching the WHERE clase are returned (i.e. only those with an amount exceeding $200).
The default direction for ORDER BY is ascending; results are ordered from smallest amount to greatest.
To specify the direction of the ORDER BY, use the
SELECT last_name, amount FROM contributors WHERE amount > 200 ORDER BY amount DESC;
You can also order the results by more than one column. Rows with the same value for the first column of the ORDER BY are further ordered by the additional column(s):
SELECT last_name, state, amount FROM contributors ORDER BY state, amount DESC;
Here we get the list of contributors ordered by state and then ordered by the amount of their contribution. This is one quick way to see who has contributed the most in each state.
Changing values with UPDATE
Now we have some basic skills for creating tables, inserting data into the table and querying the data we’ve inserted. But what about changing the values in existing rows? To change the value of existing rows, we use the
One thing that just looks wrong with our data set is that value “Nebraska” in the state column:
SELECT state FROM contributors;
That should be the postal abbreviation, like the other rows. To change that value, we need to use
SET the value for a column. But we want to make sure we don’t blow away the state values in our other columns.
If we just used
UPDATE contributors SET state = 'NE'; //DON'T DO THIS!!, we would end up replacing the
state value in every row with “NE”. Not exactly what we want.
So, we have to define a WHERE clause to determine which rows will be changed by the
UPDATE contributors SET state = 'NE' WHERE state = 'Nebraska';
Ok, let’s see how the state list looks:
SELECT DISTINCT state FROM contributors;
Now that’s more like it.
The remaining keyword from the original list is
DELETE, which unsurprisingly deletes rows from the table. As when using
UPDATE, it’s important to specify a
WHERE clause with
DELETE without a
WHERE clause will blow away your precious data and can seriously ruin your day.
Before executing a
UPDATE, it’s always a good idea to run a
SELECT with the same
WHERE clause, just to see which rows your changes will affect.
So, let’s get rid of one of our rows. How about deleting Warren Buffet?
WHERE clause, we could match on any column or combination of columns, but if we know the
PRIMARY KEY value of the row, that’s our safest bet. Because it’s a unique identifier, we can be certain we’re not accidentally deleting other rows. First let’s make sure we have the row we want:
SELECT * FROM contributors WHERE id = 1;
Looks like the one we want, so let’s delete it:
DELETE FROM contributors WHERE id = 1;
Notice that we don’t need to specify columns or use
DELETE, since we’re deleting the entire row.
Now the row should be gone:
SELECT * FROM contributors;
Importing data from a file
Finally let’s move on to a topic that takes us a bit outside the basic SQL syntax. One common task we all face in data management is importing a data set into the database. Often, we receive a file in some other format such as MS Excel, CSV (comma-separated values) or tab-delimited and we want to get those values into a database table in order to run SQL queries on them.
Each database management system handles importing values from a file a bit differently. The SQLite Manager provides a nice interface for doing data imports from text files (you’ll have to export an Excel file as a CSV before importing it).
First, let’s grab a plain text file full of contributors from the FEC database. Go to https://github.com/tthibo/SQL-Tutorial/raw/master/tutorial_files/contributors.txt and save the file as “contributors.txt” somewhere you can find it (your desktop is a good place).
Notice that this file is pipe-delimited (the columns are separated by the | character). I find this delimiter easy to use because it’s so unlikely to appear within a value in the import data. But using comma or tab characters to separate the values will work as well. Also notice that there is an id column here with no values in it. SQLite will populate that attribute for each row using auto-increment, much as it did for our earlier
The Import Wizard should appear, which you can use to define your import.
- Using the Select File button, browse for the contributors.txt file you just saved.
- Check the “First row contains column names” check box.
- Select Pipe(|) for “Fields separated by”
For everything else, you can keep the default selections.
Click OK, and you’ll get a couple of confirmation pop-ups, one asking if you’re sure you want to import, and another asking if you want to import
100 rows. Click OK for both.
And now you have 103 rows of data to play with. The FEC data is dirty: there are missing fields, first names include middle names, there are weird values for some columns. Play around with it using the SQL you know, and see what you can find out. If you get surprising results from a query or are wondering how to do something, add a comment to the blog post.
We’ll pick up with some more advanced queries and functions in Part II.
A Gentle Introduction to SQL Using SQLite by Troy Thibodeaux is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.