# Chapter 17: Keeping Time, Scheduling Tasks, and Launching Programs
Running programs while you’re sitting at your computer is fine, but it’s also useful to have programs run without your direct supervision. Your computer’s clock can schedule programs to run code at some specified time and date or at regular intervals. For example, your program could scrape a website every hour to check for changes or do a CPU-intensive task at 4 AM while you sleep. Python’s time and datetime modules provide these functions.

You can also write programs that launch other programs on a schedule by using the subprocess and threading modules. Often, the fastest way to program is to take advantage of applications that other people have already written.

## The time Module

Your computer’s system clock is set to a specific date, time, and time zone. The built-in time module allows your Python programs to read the system clock for the current time. The time.time() and time.sleep() functions are the most useful in the time module.

### The time.time() Function

The Unix epoch is a time reference commonly used in programming: 12 AM on January 1, 1970, Coordinated Universal Time (UTC). The time.time() function returns the number of seconds since that moment as a float value. (Recall that a float is just a number with a decimal point.) This number is called an epoch timestamp. For example, enter the following into the interactive shell:

In [1]:
import time
epochTimeStamp = time.time()
epochTimeStamp

1737413966.3996685

The following code is not in the book, but I wanted to try converting an epoch timestamp back to calendar time using the datetime module.

In [3]:
import datetime

# Example Unix timestamp
epochTimeStamp = 1543813875.3518236

# Convert to datetime object
datetime_obj = datetime.datetime.fromtimestamp(epochTimeStamp)

# Print the formatted date and time
print(datetime_obj.strftime('%Y-%m-%d %H:%M:%S'))


2018-12-03 00:11:15


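The timestamp 1543813875.3518236 is the book's example: it comes from calling time.time() on December 2, 2018, at 9:11 PM Pacific Standard Time. (The conversion above prints 2018-12-03 00:11:15 because fromtimestamp() converts to the local time zone of the machine that ran it, which here is three hours ahead of Pacific time.) The return value of time.time() is how many seconds have passed between the Unix epoch and the moment time.time() was called.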

Epoch timestamps can be used to profile code, that is, measure how long a piece of code takes to run. If you call time.time() at the beginning of the code block you want to measure and again at the end, you can subtract the first timestamp from the second to find the elapsed time between those two calls. For example, open a new file editor tab and enter the following program:

In [40]:
import time, sys

# Recent Python versions limit int-to-str conversion to 4300 digits by default. Raise the limit so len(str(prod)) works.
sys.set_int_max_str_digits(1000000)


def calcProd(num):
    # Return the product of the integers from 1 up to (but not including) num.
    product = 1
    for i in range(1, num):
        product = product * i

    return product

startTime = time.time()
prod = calcProd(100000)
endTime = time.time()
print('The result is %s digits long.' % (len(str(prod))))
print('Took %s seconds to calculate.' % (endTime - startTime))

The result is 456569 digits long.
Took 2.4745500087738037 seconds to calculate.
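As an aside that is not from the book: the time module also provides time.perf_counter(), a clock intended for measuring short durations. Its return value is only meaningful as the difference between two calls, not as an epoch timestamp. The following is a minimal sketch of the same timing technique; timeCall() is a hypothetical helper introduced only for illustration.

```python
import time

def calcProd(num):
    # Return the product of the integers from 1 up to (but not including) num.
    product = 1
    for i in range(1, num):
        product = product * i
    return product

def timeCall(func, *args):
    # Hypothetical helper: run func(*args) and return (result, elapsed seconds).
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

prod, seconds = timeCall(calcProd, 1000)
print('Took %s seconds to calculate.' % round(seconds, 4))
```

The subtraction pattern is the same as with time.time(); only the clock being read is different.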


NOTE

Another way to profile your code is to use the cProfile.run() function, which provides a much more informative level of detail than the simple time.time() technique. The cProfile.run() function is explained at https://docs.python.org/3/library/profile.html.
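A rough sketch of what profiling with cProfile can look like is shown below. It uses the cProfile.Profile class together with the pstats module (both in the standard library) rather than cProfile.run(), so that the report can be captured in a string. The small input value and the variable names are choices made for this example only.

```python
import cProfile
import io
import pstats

def calcProd(num):
    # Return the product of the integers from 1 up to (but not including) num.
    product = 1
    for i in range(1, num):
        product = product * i
    return product

profiler = cProfile.Profile()
profiler.enable()            # start collecting call statistics
result = calcProd(1000)
profiler.disable()           # stop collecting

# Write the five most expensive entries (by cumulative time) into a string.
reportBuffer = io.StringIO()
stats = pstats.Stats(profiler, stream=reportBuffer)
stats.sort_stats('cumulative').print_stats(5)
report = reportBuffer.getvalue()
print(report)
```

The report lists each function that was called, how many times it was called, and how much time was spent in it, which is more detail than a single elapsed-time figure.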


The return value from time.time() is useful, but not human-readable. The time.ctime() function returns a string description of the current time. You can also optionally pass the number of seconds since the Unix epoch, as returned by time.time(), to get a string value of that time. Enter the following into the interactive shell:

In [41]:
import time
time.ctime()

'Mon Jan 20 18:29:15 2025'

In [42]:
import time
thisMoment = time.time()
time.ctime(thisMoment)

'Mon Jan 20 18:29:47 2025'

### The time.sleep() Function

If you need to pause your program for a while, call the time.sleep() function and pass it the number of seconds you want your program to stay paused. Enter the following into the interactive shell:

In [46]:
import time
for i in range(3):
    print('Tick')
    time.sleep(1)
    print('Tock')
    time.sleep(1)
time.sleep(5) # blocks for 5 seconds until the program releases the console.    

Tick
Tock
Tick
Tock
Tick
Tock


### Rounding Numbers

When working with times, you’ll often encounter float values with many digits after the decimal. To make these values easier to work with, you can shorten them with Python’s built-in round() function, which rounds a float to the precision you specify. Just pass in the number you want to round, plus an optional second argument representing how many digits after the decimal point you want to round it to. If you omit the second argument, round() rounds your number to the nearest whole integer. Enter the following into the interactive shell:

In [47]:
import time
now = time.time()
now

1737416039.5278842

In [48]:
round(now, 2)

1737416039.53

In [49]:
round(now, 4)

1737416039.5279

In [50]:
round(now) # rounds to the nearest integer

1737416040
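Two details of round() are worth knowing, although the book does not cover them here. When a value is exactly halfway between two integers, Python 3 rounds to the nearest even integer rather than always rounding up, and a negative second argument rounds to the left of the decimal point. A short sketch:

```python
# Ties go to the nearest even integer, not always upward.
print(round(0.5))   # 0
print(round(1.5))   # 2
print(round(2.5))   # 2

# A negative second argument rounds to tens, hundreds, and so on.
print(round(1737416039.5278842, -2))   # 1737416000.0
```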

## Project: Super Stopwatch

Say you want to track how much time you spend on boring tasks you haven’t automated yet. You don’t have a physical stopwatch, and it’s surprisingly difficult to find a free stopwatch app for your laptop or smartphone that isn’t covered in ads and doesn’t send a copy of your browser history to marketers. (It says it can do this in the license agreement you agreed to. You did read the license agreement, didn’t you?) You can write a simple stopwatch program yourself in Python.

At a high level, here’s what your program will do:

- Track the amount of time elapsed between presses of the ENTER key, with each key press starting a new “lap” on the timer.
- Print the lap number, total time, and lap time.

This means your code will need to do the following:

- Find the current time by calling time.time() and store it as a timestamp at the start of the program, as well as at the start of each lap.
- Keep a lap counter and increment it every time the user presses ENTER.
- Calculate the elapsed time by subtracting timestamps.
- Handle the KeyboardInterrupt exception so the user can press CTRL-C to quit.

Open a new file editor tab and save it as stopwatch.py.

### Step 1: Set Up the Program to Track Times

The stopwatch program will need to use the current time, so you’ll want to import the time module. Your program should also print some brief instructions to the user before calling input(), so the timer can begin after the user presses ENTER. Then the code will start tracking lap times.

Enter the following code into the file editor, writing a TODO comment as a placeholder for the rest of the code:

In [None]:
#! python3
# stopwatch.py - A simple stopwatch program.

import time

# Display the program's instructions.
print('Press ENTER to begin. Afterward, press ENTER to "click" the stopwatch. Press Ctrl-C to quit.')
input()                    # press Enter to begin
print('Started.')
startTime = time.time()    # get the first lap's start time
lastTime = startTime
lapNum = 1

# TODO: Start tracking the lap times.

### Step 2: Track and Print Lap Times

Now let’s write the code to start each new lap, calculate how long the previous lap took, and calculate the total time elapsed since starting the stopwatch. We’ll display the lap time and total time and increase the lap count for each new lap. Add the following code to your program:

In [1]:
#! python3
# stopwatch.py - A simple stopwatch program.

import time

# Display the program's instructions.
print('Press ENTER to begin. Afterward, press ENTER to "click" the stopwatch. Press Ctrl-C to quit.')
input()                    # press Enter to begin
print('Started.')
startTime = time.time()    # get the first lap's start time
lastTime = startTime
lapNum = 1

# Start tracking the lap times.
try:
    while True: # create an infinite loop that calls input and waits for user to press enter to end lap
        input()
        lapTime = round(time.time() - lastTime, 2)
        totalTime = round(time.time() - startTime, 2)
        print('Lap #%s: Total Time: %s, Lap Time: %s\n' % (lapNum, totalTime, lapTime), end='')
        lapNum += 1
        lastTime = time.time() # reset the last lap time
except KeyboardInterrupt:
    # Handle the Ctrl-C exception to keep its error message from displaying.
    print('\n Done.')

Press ENTER to begin. Afterward, press ENTER to "click" the stopwatch. Press Ctrl-C to quit.
Started.
Lap #1: Total Time: 2.93, Lap Time: 2.93
Lap #2: Total Time: 4.54, Lap Time: 1.61
Lap #3: Total Time: 8.11, Lap Time: 3.57
Lap #4: Total Time: 9.9, Lap Time: 1.79
Lap #5: Total Time: 11.53, Lap Time: 1.63
Lap #6: Total Time: 12.79, Lap Time: 1.26
Lap #7: Total Time: 13.92, Lap Time: 1.13


If the user presses CTRL-C to stop the stopwatch, the KeyboardInterrupt exception will be raised, and the program will crash if its execution is not inside a try statement. To prevent crashing, we wrap this part of the program in a try statement. We handle the exception in the except clause, so when CTRL-C is pressed and the exception is raised, the program execution moves to the except clause to print Done instead of the KeyboardInterrupt error message. Until this happens, the execution is inside an infinite while loop that calls input() and waits until the user presses ENTER to end a lap. When a lap ends, we calculate how long the lap took by subtracting the start time of the lap, lastTime, from the current time, time.time(). We calculate the total time elapsed by subtracting the overall start time of the stopwatch, startTime, from the current time.

Since the results of these time calculations will have many digits after the decimal point (such as 4.766272783279419), we use the round() function to round the float value to two digits in both the lapTime and totalTime calculations.

Next, we print the lap number, total time elapsed, and the lap time. Since the user pressing ENTER for the input() call will print a newline to the screen, pass end='' to the print() function to avoid double-spacing the output. After printing the lap information, we get ready for the next lap by adding 1 to the count lapNum and setting lastTime to the current time, which is the start time of the next lap.
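The lap arithmetic can also be tried without waiting on keyboard input. The sketch below moves the two subtractions into a function and feeds it made-up timestamps in place of time.time() readings; lapReport() and the sample numbers are hypothetical and exist only for this illustration.

```python
def lapReport(lapNum, startTime, lastTime, now):
    # Hypothetical helper: build one line of stopwatch output from timestamps.
    lapTime = round(now - lastTime, 2)       # time since the previous lap ended
    totalTime = round(now - startTime, 2)    # time since the stopwatch started
    return 'Lap #%s: Total Time: %s, Lap Time: %s' % (lapNum, totalTime, lapTime)

# Made-up timestamps (in seconds) standing in for time.time() readings.
print(lapReport(1, 100.0, 100.0, 102.93))   # Lap #1: Total Time: 2.93, Lap Time: 2.93
print(lapReport(2, 100.0, 102.93, 104.54))  # Lap #2: Total Time: 4.54, Lap Time: 1.61
```

Separating the calculation from the input() loop like this is one way to check the arithmetic in isolation.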

### Ideas for Similar Programs

Time tracking opens up several possibilities for your programs. Although you can download apps to do some of these things, the benefit of writing programs yourself is that they will be free and not bloated with ads and useless features. You could write similar programs to do the following:

- Create a simple timesheet app that records when you type a person’s name and uses the current time to clock them in or out.
- Add a feature to your program to display the elapsed time since a process started, such as a download that uses the requests module. (See Chapter 12.)
- Intermittently check how long a program has been running and offer the user a chance to cancel tasks that are taking too long.


## The datetime Module

The time module is useful for getting a Unix epoch timestamp to work with. But if you want to display a date in a more convenient format, or do arithmetic with dates (for example, figuring out what date was 205 days ago or what date is 123 days from now), you should use the datetime module.

The datetime module has its own datetime data type. datetime values represent a specific moment in time. Enter the following into the interactive shell:

In [1]:
import datetime
datetime.datetime.now() # returns a datetime object for a current date and time

datetime.datetime(2025, 2, 1, 10, 23, 3, 190649)

In [2]:
dt = datetime.datetime(2019, 10, 21, 16, 29, 0) # create a datetime object for specified time
dt.year, dt.month, dt.day

(2019, 10, 21)

In [3]:
dt.hour, dt.minute, dt.second

(16, 29, 0)

Calling datetime.datetime.now() returns a datetime object for the current date and time, according to your computer’s clock. This object includes the year, month, day, hour, minute, second, and microsecond of the current moment. You can also retrieve a datetime object for a specific moment by calling datetime.datetime() and passing it integers representing the year, month, day, hour, minute, and second of the moment you want. These integers will be stored in the datetime object’s year, month, day, hour, minute, and second attributes.

A Unix epoch timestamp can be converted to a datetime object with the datetime.datetime.fromtimestamp() function. The date and time of the datetime object will be converted for the local time zone. Enter the following into the interactive shell:

In [4]:
import datetime, time
datetime.datetime.fromtimestamp(1000000) # 1 million seconds after the Unix epoch

datetime.datetime(1970, 1, 12, 8, 46, 40)

In [5]:
datetime.datetime.fromtimestamp(time.time()) # pass the current Unix time to datetime

datetime.datetime(2025, 2, 1, 10, 36, 24, 470117)

Calling datetime.datetime.fromtimestamp() and passing it 1000000 returns a datetime object for the moment 1,000,000 seconds after the Unix epoch. Passing time.time(), the Unix epoch timestamp for the current moment, returns a datetime object for the current moment. So the expressions datetime.datetime.now() and datetime.datetime.fromtimestamp(time.time()) do the same thing; they both give you a datetime object for the present moment.
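Because fromtimestamp() converts to the local time zone by default, the hour in the output above depends on the machine that runs the code. If you want a result that is the same everywhere, one option (not covered in the book at this point) is to pass the optional tz argument. A small sketch, using the standard library's datetime.timezone.utc:

```python
import datetime

# Ask for the result in UTC instead of the machine's local time zone.
utcMoment = datetime.datetime.fromtimestamp(1000000, tz=datetime.timezone.utc)
print(utcMoment)               # 1970-01-12 13:46:40+00:00

# timestamp() converts a datetime object back to an epoch timestamp.
print(utcMoment.timestamp())   # 1000000.0
```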

You can compare datetime objects with each other using comparison operators to find out which one precedes the other. The later datetime object is the “greater” value. Enter the following into the interactive shell:

In [6]:
halloween2019 = datetime.datetime(2019, 10, 31, 0, 0, 0)
newyears2020 = datetime.datetime(2020, 1, 1, 0, 0, 0)
oct31_2019 = datetime.datetime(2019, 10, 31, 0, 0, 0)
halloween2019 == oct31_2019

True

In [7]:
halloween2019 > newyears2020

False

In [8]:
newyears2020 > halloween2019

True

In [9]:
newyears2020 != halloween2019

True

Make a datetime object for the first moment (midnight) of October 31, 2019, and store it in halloween2019. Make a datetime object for the first moment of January 1, 2020, and store it in newyears2020. Then make another object for midnight on October 31, 2019, and store it in oct31_2019. Comparing halloween2019 and oct31_2019 shows that they’re equal. Comparing newyears2020 and halloween2019 shows that newyears2020 is greater (later) than halloween2019.

### The timedelta Data Type

The datetime module also provides a timedelta data type, which represents a duration of time rather than a moment in time. Enter the following into the interactive shell:

In [10]:
delta = datetime.timedelta(days=11, hours=10, minutes=9, seconds=8) # create a timedelta object
# will return 11 days and the number of seconds that equal 10 hours, 9 minutes, and 8 seconds
delta.days, delta.seconds, delta.microseconds

(11, 36548, 0)

In [12]:
delta.total_seconds() # total time delta expressed as seconds

986948.0

In [13]:
str(delta)

'11 days, 10:09:08'

To create a timedelta object, use the datetime.timedelta() function. The datetime.timedelta() function takes keyword arguments weeks, days, hours, minutes, seconds, milliseconds, and microseconds. There is no month or year keyword argument, because “a month” or “a year” is a variable amount of time depending on the particular month or year. A timedelta object has the total duration represented in days, seconds, and microseconds. These numbers are stored in the days, seconds, and microseconds attributes, respectively. The total_seconds() method will return the duration in number of seconds alone. Passing a timedelta object to str() will return a nicely formatted, human-readable string representation of the object.

In this example, we pass keyword arguments to datetime.timedelta() to specify a duration of 11 days, 10 hours, 9 minutes, and 8 seconds, and store the returned timedelta object in delta. This timedelta object’s days attribute stores 11, and its seconds attribute stores 36548 (10 hours, 9 minutes, and 8 seconds, expressed in seconds). Calling total_seconds() tells us that 11 days, 10 hours, 9 minutes, and 8 seconds is 986,948 seconds. Finally, passing the timedelta object to str() returns a string that plainly describes the duration.
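The numbers in this example can be checked by hand. The following sketch redoes the arithmetic and also shows how keyword arguments such as weeks and milliseconds are folded into the days, seconds, and microseconds attributes; the variable names secondsPart and mixed are introduced just for this illustration.

```python
import datetime

delta = datetime.timedelta(days=11, hours=10, minutes=9, seconds=8)

# 10 hours, 9 minutes, and 8 seconds expressed in seconds.
secondsPart = 10 * 60 * 60 + 9 * 60 + 8
print(secondsPart)                         # 36548

# Add 11 full days of 86,400 seconds each.
print(11 * 24 * 60 * 60 + secondsPart)     # 986948

# Other units are normalized into days, seconds, and microseconds.
mixed = datetime.timedelta(weeks=1, milliseconds=1500)
print(mixed.days, mixed.seconds, mixed.microseconds)   # 7 1 500000
```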

The arithmetic operators can be used to perform date arithmetic on datetime values. For example, to calculate the date 1,000 days from now, enter the following into the interactive shell:

In [14]:
dt = datetime.datetime.now()
dt

datetime.datetime(2025, 2, 1, 10, 55, 30, 497389)

In [15]:
thousandDays = datetime.timedelta(days=1000)
dt + thousandDays

datetime.datetime(2027, 10, 29, 10, 55, 30, 497389)

First, make a datetime object for the current moment and store it in dt. Then make a timedelta object for a duration of 1,000 days and store it in thousandDays. Add dt and thousandDays together to get a datetime object for the date 1,000 days from now. Python will do the date arithmetic to figure out that 1,000 days after February 1, 2025, will be October 29, 2027. This is useful because when you calculate 1,000 days from a given date, you have to remember how many days are in each month and factor in leap years and other tricky details. The datetime module handles all of this for you.

timedelta objects can be added or subtracted with datetime objects or other timedelta objects using the + and - operators. A timedelta object can be multiplied or divided by integer or float values with the * and / operators. Enter the following into the interactive shell:

In [17]:
oct21st = datetime.datetime(2019, 10, 21, 16, 29, 0)
aboutThirtyYears = datetime.timedelta(days=365 * 30)
oct21st

datetime.datetime(2019, 10, 21, 16, 29)

In [18]:
oct21st - aboutThirtyYears

datetime.datetime(1989, 10, 28, 16, 29)

In [19]:
oct21st - (2 * aboutThirtyYears)

datetime.datetime(1959, 11, 5, 16, 29)

Here we make a datetime object for October 21, 2019, and a timedelta object for a duration of about 30 years (we’re assuming 365 days for each of those years). Subtracting aboutThirtyYears from oct21st gives us a datetime object for the date about 30 years before October 21, 2019. The result is October 28, 1989, a week off from the exact anniversary, because the 365-day approximation ignores leap days. Subtracting 2 * aboutThirtyYears from oct21st returns a datetime object for the date about 60 years before October 21, 2019.
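Arithmetic also works in the other direction: subtracting one datetime object from another produces a timedelta object. The sketch below reuses the Halloween and New Year's dates from earlier in this chapter; the variable name gap is introduced only for this example.

```python
import datetime

halloween2019 = datetime.datetime(2019, 10, 31, 0, 0, 0)
newyears2020 = datetime.datetime(2020, 1, 1, 0, 0, 0)

# datetime minus datetime gives a timedelta.
gap = newyears2020 - halloween2019
print(gap)         # 62 days, 0:00:00
print(gap.days)    # 62

# timedelta values can be scaled by integers or floats.
print(gap / 2)     # 31 days, 0:00:00
print(gap * 1.5)   # 93 days, 0:00:00
```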

### Pausing Until a Specific Date

The time.sleep() function lets you pause a program for a certain number of seconds. By using a while loop, you can pause your programs until a specific date. For example, the following code will continue to loop until Halloween 2016 (that date has already passed, so substitute a future date to see the loop actually wait):

In [20]:
import datetime
import time
halloween2016 = datetime.datetime(2016, 10, 31, 0, 0, 0)
while datetime.datetime.now() < halloween2016:
    time.sleep(1)

The time.sleep(1) call will pause your Python program so that the computer doesn’t waste CPU processing cycles simply checking the time over and over. Rather, the while loop will just check the condition once per second and continue with the rest of the program after Halloween 2016 (or whenever you program it to stop).
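A variation on this loop is to compute how many seconds remain and sleep for that long, instead of waking up every second. The sketch below is one possible way to do this, not the book's method; sleepUntil() and its maxNap parameter are hypothetical names. It still rechecks the clock after each nap, so a change to the system clock is noticed within maxNap seconds.

```python
import datetime
import time

def sleepUntil(targetTime, maxNap=60):
    # Hypothetical helper: block until targetTime (a local datetime object).
    # Sleep in naps of at most maxNap seconds and recheck the clock each time.
    while True:
        remaining = (targetTime - datetime.datetime.now()).total_seconds()
        if remaining <= 0:
            return
        time.sleep(min(remaining, maxNap))

# Demonstration with a target a fraction of a second in the future.
wakeTime = datetime.datetime.now() + datetime.timedelta(seconds=0.2)
sleepUntil(wakeTime)
print(datetime.datetime.now() >= wakeTime)   # True
```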

### Converting datetime Objects into Strings

Epoch timestamps and datetime objects aren’t very friendly to the human eye. Use the strftime() method to display a datetime object as a string. (The f in the name of the strftime() method stands for format.)

The strftime() method uses directives similar to Python’s string formatting. Table 17-1 has a full list of strftime() directives.

Table 17-1: strftime() Directives

| strftime() directive | Meaning |
| --- | --- |
| %Y | Year with century, as in '2014' |
| %y | Year without century, '00' to '99' (parsed by strptime() as 1969 to 2068) |
| %m | Month as a decimal number, '01' to '12' |
| %B | Full month name, as in 'November' |
| %b | Abbreviated month name, as in 'Nov' |
| %d | Day of the month, '01' to '31' |
| %j | Day of the year, '001' to '366' |
| %w | Day of the week, '0' (Sunday) to '6' (Saturday) |
| %A | Full weekday name, as in 'Monday' |
| %a | Abbreviated weekday name, as in 'Mon' |
| %H | Hour (24-hour clock), '00' to '23' |
| %I | Hour (12-hour clock), '01' to '12' |
| %M | Minute, '00' to '59' |
| %S | Second, '00' to '59' |
| %p | 'AM' or 'PM' |
| %% | Literal '%' character |

Pass strftime() a custom format string containing formatting directives (along with any desired slashes, colons, and so on), and strftime() will return the datetime object’s information as a formatted string. Enter the following into the interactive shell:

In [22]:
oct21st = datetime.datetime(2019, 10, 21, 16, 29, 0)
oct21st.strftime('%Y/%m/%d %H:%M:%S')

'2019/10/21 16:29:00'

In [23]:
oct21st.strftime('%I:%M %p')

'04:29 PM'

In [24]:
oct21st.strftime('%B of %y')

'October of 19'

Here we have a datetime object for October 21, 2019, at 4:29 PM, stored in oct21st. Passing strftime() the custom format string '%Y/%m/%d %H:%M:%S' returns a string containing 2019, 10, and 21 separated by slashes and 16, 29, and 00 separated by colons. Passing '%I:%M %p' returns '04:29 PM', and passing '%B of %y' returns 'October of 19'. Note that strftime() is called on the datetime object itself (oct21st.strftime()), so the call doesn’t begin with datetime.datetime.
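A few more directives from the table can be tried on the same date. This is only an illustrative sketch; the weekday and month names produced by %A and %B depend on the locale settings of your system, so the commented value for that line assumes an English locale.

```python
import datetime

oct21st = datetime.datetime(2019, 10, 21, 16, 29, 0)

print(oct21st.strftime('%Y-%m-%d'))    # 2019-10-21
print(oct21st.strftime('%j'))          # 294 (day of the year)
print(oct21st.strftime('%w'))          # 1 (Monday, counting from Sunday as 0)
print(oct21st.strftime('%A, %B %d'))   # Monday, October 21 (in an English locale)
print(oct21st.strftime('100%%'))       # 100%
```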

### Converting Strings into datetime Objects

If you have a string of date information, such as '2019/10/21 16:29:00' or 'October 21, 2019', and need to convert it to a datetime object, use the datetime.datetime.strptime() function. The strptime() function is the inverse of the strftime() method. A custom format string using the same directives as strftime() must be passed so that strptime() knows how to parse and understand the string. (The p in the name of the strptime() function stands for parse.)

Enter the following into the interactive shell:

In [25]:
datetime.datetime.strptime('October 21, 2019', '%B %d, %Y')

datetime.datetime(2019, 10, 21, 0, 0)

In [26]:
datetime.datetime.strptime('2019/10/21 16:29:00', '%Y/%m/%d %H:%M:%S')

datetime.datetime(2019, 10, 21, 16, 29)

In [27]:
datetime.datetime.strptime("October of '19", "%B of '%y")

datetime.datetime(2019, 10, 1, 0, 0)

In [28]:
datetime.datetime.strptime("November of '63", "%B of '%y")

datetime.datetime(2063, 11, 1, 0, 0)

To get a datetime object from the string 'October 21, 2019', pass that string as the first argument to strptime() and the custom format string that corresponds to 'October 21, 2019' as the second argument. The string with the date information must match the custom format string exactly, or Python will raise a ValueError exception.
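If the text you are parsing might not match the format, you can catch the ValueError rather than letting the program crash. The following is a minimal sketch, assuming you want to try a short list of formats in turn; parseDate() and its default list of formats are hypothetical and not part of the datetime module.

```python
import datetime

def parseDate(text, formats=('%Y/%m/%d %H:%M:%S', '%B %d, %Y')):
    # Hypothetical helper: try each format in turn; return None if none match.
    for fmt in formats:
        try:
            return datetime.datetime.strptime(text, fmt)
        except ValueError:
            continue    # this format didn't match, so try the next one
    return None

print(parseDate('October 21, 2019'))      # 2019-10-21 00:00:00
print(parseDate('2019/10/21 16:29:00'))   # 2019-10-21 16:29:00
print(parseDate('21st of October'))       # None
```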

## Multithreading
To introduce the concept of multithreading, let’s look at an example situation. Say you want to schedule some code to run after a delay or at a specific time. You could add code like the following at the start of your program:

import time, datetime

startTime = datetime.datetime(2029, 10, 31, 0, 0, 0)
while datetime.datetime.now() < startTime:
    time.sleep(1)

print('Program now starting on Halloween 2029')

This code designates a start time of October 31, 2029, and keeps calling time.sleep(1) until the start time arrives. Your program cannot do anything while waiting for the loop of time.sleep() calls to finish; it just sits around until Halloween 2029. This is because Python programs by default have a single thread of execution.

To understand what a thread of execution is, remember the Chapter 2 discussion of flow control, when you imagined the execution of a program as placing your finger on a line of code in your program and moving to the next line or wherever it was sent by a flow control statement. A single-threaded program has only one finger. But a multithreaded program has multiple fingers. Each finger still moves to the next line of code as defined by the flow control statements, but the fingers can be at different places in the program, executing different lines of code at the same time. (All of the programs in this book so far have been single threaded.)

Rather than having all of your code wait until the time.sleep() function finishes, you can execute the delayed or scheduled code in a separate thread using Python’s threading module. The separate thread will pause for the time.sleep() calls. Meanwhile, your program can do other work in the original thread.


To make a separate thread, you first need to make a Thread object by calling the threading.Thread() function. Enter the following code in a new file and save it as threadDemo.py:

In [1]:
import threading, time
print('Start of program. ')
def takeANap(): # define a function that you want to use in the new thread
    time.sleep(5)
    print('Wake up!')

threadObj = threading.Thread(target=takeANap) # create a thread object and pass it a keyword argument target=takeANap
threadObj.start() # create the new thread and start executing the target function
print('End of program.')

Start of program. 
End of program.


Wake up!


First, we define a function, takeANap(), that we want to use in a new thread. To create a Thread object, we call threading.Thread() and pass it the keyword argument target=takeANap. This means the function we want to call in the new thread is takeANap(). Notice that the keyword argument is target=takeANap, not target=takeANap(). This is because you want to pass the takeANap() function itself as the argument, not call takeANap() and pass its return value.

After we store the Thread object created by threading.Thread() in threadObj, we call threadObj.start() to create the new thread and start executing the target function in the new thread.

This can be a bit confusing. If print('End of program.') is the last line of the program, you might think that it should be the last thing printed. The reason Wake up! comes after it is that when threadObj.start() is called, the target function for threadObj is run in a new thread of execution. Think of it as a second finger appearing at the start of the takeANap() function. The main thread continues to print('End of program.'). Meanwhile, the new thread, which has been executing the time.sleep(5) call, pauses for 5 seconds. After it wakes from its 5-second nap, it prints 'Wake up!' and then returns from the takeANap() function. Chronologically, 'Wake up!' is the last thing printed by the program.

Normally a program terminates when the last line of code in the file has run (or the sys.exit() function is called). But threadDemo.py has two threads. The first is the original thread that began at the start of the program and ends after print('End of program.'). The second thread is created when threadObj.start() is called, begins at the start of the takeANap() function, and ends after takeANap() returns.

A Python program will not terminate until all its threads have terminated. When you ran threadDemo.py, even though the original thread had terminated, the second thread was still executing the time.sleep(5) call.
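If you want the original thread to wait for the second thread before it continues, the Thread object's join() method does that. The sketch below is a variation on threadDemo.py with a shorter nap; the extra 'Main thread is waiting...' message is added just for this illustration.

```python
import threading, time

def takeANap():
    time.sleep(0.5)
    print('Wake up!')

threadObj = threading.Thread(target=takeANap)
threadObj.start()
print('Main thread is waiting...')
threadObj.join()               # block here until takeANap() returns
print('End of program.')
print(threadObj.is_alive())    # False
```

With the join() call in place, 'Wake up!' is printed before 'End of program.' because the main thread does not move past join() until the second thread has finished.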

### Passing Arguments to the Thread’s Target Function

If the target function you want to run in the new thread takes arguments, you can pass the target function’s arguments to threading.Thread(). For example, say you wanted to run this print() call in its own thread:

>>> print('Cats', 'Dogs', 'Frogs', sep=' & ')
Cats & Dogs & Frogs

This print() call has three regular arguments, 'Cats', 'Dogs', and 'Frogs', and one keyword argument, sep=' & '. The regular arguments can be passed as a list to the args keyword argument in threading.Thread(). The keyword argument can be specified as a dictionary to the kwargs keyword argument in threading.Thread().

Enter the following into the interactive shell:

In [2]:
import threading
# arguments need to get passed explicitly to the function inside of threading.Thread using keyword arguments
threadObj = threading.Thread(target=print, args=['Cats', 'Dogs', 'Frogs'],
kwargs={'sep': ' & '})
threadObj.start()

Cats & Dogs & Frogs


To make sure the arguments 'Cats', 'Dogs', and 'Frogs' get passed to print() in the new thread, we pass args=['Cats', 'Dogs', 'Frogs'] to threading.Thread(). To make sure the keyword argument sep=' & ' gets passed to print() in the new thread, we pass kwargs={'sep': ' & '} to threading.Thread().

The threadObj.start() call will create a new thread to call the print() function, and it will pass 'Cats', 'Dogs', and 'Frogs' as arguments and ' & ' for the sep keyword argument.

This is an incorrect way to create the new thread that calls print():

threadObj = threading.Thread(target=print('Cats', 'Dogs', 'Frogs', sep=' & '))

What this ends up doing is calling the print() function and passing its return value (print()’s return value is always None) as the target keyword argument. It doesn’t pass the print() function itself. When passing arguments to a function in a new thread, use the threading.Thread() function’s args and kwargs keyword arguments.
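One way to confirm that args and kwargs reach the target function is to have print() write into a string buffer and inspect it after the thread finishes. This sketch relies on print()'s standard file keyword argument and on io.StringIO from the standard library; the variable name outputBuffer is chosen only for this example.

```python
import io
import threading

outputBuffer = io.StringIO()
threadObj = threading.Thread(
    target=print,                                  # the function itself, not a call
    args=('Cats', 'Dogs', 'Frogs'),                # a tuple works as well as a list
    kwargs={'sep': ' & ', 'file': outputBuffer},
)
threadObj.start()
threadObj.join()    # wait for the thread so the buffer is complete
print(repr(outputBuffer.getvalue()))   # 'Cats & Dogs & Frogs\n'
```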

### Concurrency Issues

You can easily create several new threads and have them all running at the same time. But multiple threads can also cause problems called concurrency issues. These issues happen when threads read and write variables at the same time, causing the threads to trip over each other. Concurrency issues can be hard to reproduce consistently, making them hard to debug.

Multithreaded programming is its own wide subject and beyond the scope of this book. What you have to keep in mind is this: to avoid concurrency issues, never let multiple threads read or write the same variables. When you create a new Thread object, make sure its target function uses only local variables in that function. This will avoid hard-to-debug concurrency issues in your programs.
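For cases where threads really do have to share a value, the threading module provides a Lock class. This goes beyond the advice above and is shown only as a rough sketch of the idea, with hypothetical names (totals, totalsLock, addMany): each thread must acquire the lock before updating the shared counter, so the updates cannot interleave.

```python
import threading

totals = {'count': 0}            # shared between all of the threads
totalsLock = threading.Lock()

def addMany(times):
    for _ in range(times):
        with totalsLock:         # only one thread at a time may run this block
            totals['count'] += 1

threads = [threading.Thread(target=addMany, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                     # wait for every thread to finish

print(totals['count'])           # 40000
```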

NOTE

A beginner’s tutorial on multithreaded programming is available at https://nostarch.com/automatestuff2/.


## Project: Multithreaded XKCD Downloader

In Chapter 12, you wrote a program that downloaded all of the XKCD comic strips from the XKCD website. This was a single-threaded program: it downloaded one comic at a time. Much of the program’s running time was spent establishing the network connection to begin the download and writing the downloaded images to the hard drive. If you have a broadband internet connection, your single-threaded program wasn’t fully utilizing the available bandwidth.

A multithreaded program that has some threads downloading comics while others are establishing connections and writing the comic image files to disk uses your internet connection more efficiently and downloads the collection of comics more quickly. Open a new file editor tab and save it as threadedDownloadXkcd.py. You will modify this program to add multithreading. The completely modified source code is available to download from https://nostarch.com/automatestuff2/.

### Step 1: Modify the Program to Use a Function

This program will mostly be the same downloading code from Chapter 12, so I’ll skip the explanation for the requests and Beautiful Soup code. The main changes you need to make are importing the threading module and making a downloadXkcd() function, which takes starting and ending comic numbers as parameters.

For example, calling downloadXkcd(140, 280) would loop over the downloading code to download the comics at https://xkcd.com/140/, https://xkcd.com/141/, https://xkcd.com/142/, and so on, up to https://xkcd.com/279/. Each thread that you create will call downloadXkcd() and pass a different range of comics to download.
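To see how a collection of comics might be divided among threads, the sketch below computes the start and end numbers that could be passed to downloadXkcd(). The function comicRanges() and the numbers used are hypothetical; the end value of each pair is exclusive, to match the range(startComic, endComic) convention described above.

```python
def comicRanges(firstComic, lastComic, perThread):
    # Hypothetical helper: split comics firstComic..lastComic (inclusive)
    # into (start, end) pairs, where end is exclusive like range().
    ranges = []
    for start in range(firstComic, lastComic + 1, perThread):
        end = min(start + perThread, lastComic + 1)
        ranges.append((start, end))
    return ranges

print(comicRanges(1, 35, 10))
# [(1, 11), (11, 21), (21, 31), (31, 36)]
```

Each pair covers a different block of comic numbers, so no comic is downloaded twice and none is skipped.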

Add the following code to your threadedDownloadXkcd.py program:

In [None]:
#! python3
# threadedDownloadXkcd.py - Downloads XKCD comics using multiple threads.

import requests, os, bs4, threading
os.makedirs('xkcd_threaded', exist_ok=True)     # store the comics in ./xkcd_threaded
def downloadXkcd(startComic, endComic): # define the function to download the comics
    for urlNumber in range(startComic, endComic): # loop through defined range of comic numbers
        # Download the page.
        print('Downloading page https://xkcd.com/%s...' % (urlNumber))
        res = requests.get('https://xkcd.com/%s' % (urlNumber))
        res.raise_for_status()
        soup = bs4.BeautifulSoup(res.text, 'html.parser')
        # Find the URL of the comic image.
        comicElem = soup.select('#comic img')
        if comicElem == []:
            print('Could not find comic image.')
        else: 
            comicUrl = comicElem[0].get('src')
            # Download the image.
            print('Downloading image %s...' % (comicUrl))
            res = requests.get('https:' + comicUrl)
            res.raise_for_status()
            # Save the image to ./xkcd_threaded (the folder created above).
            imageFile = open(os.path.join('xkcd_threaded', os.path.basename(comicUrl)), 'wb')
            for chunk in res.iter_content(100000):
                imageFile.write(chunk)
            imageFile.close()
# TODO: Create and start the Thread objects.
# TODO: Wait for all threads to end.

After importing the modules we need, we make a directory to store comics in and start defining downloadXkcd(). We loop through all the numbers in the specified range and download each page. We use Beautiful Soup to look through the HTML of each page and find the comic image. If no comic image is found on a page, we print a message. Otherwise, we get the URL of the image and download the image. Finally, we save the image to the directory we created.
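
The image URL handling is easy to try on its own without touching the network. The sketch below assumes the src attribute holds a protocol-relative URL like the ones in the program's output (for example //imgs.xkcd.com/comics/penny_arcade.jpg); the names fullUrl, filename, and savePath are only for illustration.

```python
import os

# A protocol-relative URL, as it appears in the <img> element's src attribute.
comicUrl = '//imgs.xkcd.com/comics/penny_arcade.jpg'

# Prefix the scheme so requests.get() receives a complete URL.
fullUrl = 'https:' + comicUrl

# Keep only the last path component to use as the local filename.
filename = os.path.basename(comicUrl)

# Build the save path inside the folder the program creates.
savePath = os.path.join('xkcd_threaded', filename)

print(fullUrl)   # https://imgs.xkcd.com/comics/penny_arcade.jpg
print(filename)  # penny_arcade.jpg
print(savePath)  # separator depends on your operating system
```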

### Step 2: Create and Start Threads

Now that we’ve defined downloadXkcd(), we’ll create the multiple threads that each call downloadXkcd() to download different ranges of comics from the XKCD website. Add the following code to threadedDownloadXkcd.py after the downloadXkcd() function definition:

In [None]:
#! python3
# threadedDownloadXkcd.py - Downloads XKCD comics using multiple threads.

import requests, os, bs4, threading
os.makedirs('xkcd_threaded', exist_ok=True)     # store the comics in ./xkcd_threaded
def downloadXkcd(startComic, endComic): # define the function to download the comics
    for urlNumber in range(startComic, endComic): # loop through defined range of comic numbers
        # Download the page.
        print('Downloading page https://xkcd.com/%s...' % (urlNumber))
        res = requests.get('https://xkcd.com/%s' % (urlNumber))
        res.raise_for_status()
        soup = bs4.BeautifulSoup(res.text, 'html.parser')
        # Find the URL of the comic image.
        comicElem = soup.select('#comic img')
        if comicElem == []:
            print('Could not find comic image.')
        else: 
            comicUrl = comicElem[0].get('src')
            # Download the image.
            print('Downloading image %s...' % (comicUrl))
            res = requests.get('https:' + comicUrl)
            res.raise_for_status()
            # Save the image to ./xkcd_threaded (the folder created above).
            imageFile = open(os.path.join('xkcd_threaded', os.path.basename(comicUrl)), 'wb')
            for chunk in res.iter_content(100000):
                imageFile.write(chunk)
            imageFile.close()
# Create and start the Thread objects.
downloadThreads = []             # a list to keep track of all the Thread objects
for i in range(0, 140, 10):    # loops 14 times, creates 14 threads
    start = i
    end = i + 10   # range() excludes the end value, so this thread stops at comic i + 9
    if start == 0:
        start = 1 # there is no comic 0, so set it to 1
    downloadThread = threading.Thread(target=downloadXkcd, args=(start, end))
    downloadThreads.append(downloadThread)   # append the Thread object, not the list itself
    downloadThread.start()

# TODO: Wait for all threads to end.

First we make an empty list, downloadThreads; the list will help us keep track of the many Thread objects we’ll create. Then we start our for loop. Each time through the loop, we create a Thread object with threading.Thread(), append the Thread object to the list, and call start() to start running downloadXkcd() in the new thread. Since the for loop steps the i variable from 0 up to (but not including) 140 in steps of 10, i will be set to 0 on the first iteration, 10 on the second iteration, 20 on the third, and so on. Since we pass args=(start, end) to threading.Thread(), the two arguments passed to downloadXkcd() will be 1 and 10 on the first iteration, 10 and 20 on the second iteration, 20 and 30 on the third, and so on. Because range() stops just before its end value, the first thread downloads comics 1 through 9, the second downloads 10 through 19, and so on, with no comic skipped between threads.

As the Thread object’s start() method is called and the new thread begins to run the code inside downloadXkcd(), the main thread will continue to the next iteration of the for loop and create the next thread.
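
To see that start() returns right away instead of waiting for the thread's work, here is a small sketch with no downloading in it. The worker() function, the release event, and the log list are all hypothetical names for this illustration; the worker is held back with a threading.Event so the order of events is predictable.

```python
import threading

release = threading.Event()   # lets the main thread decide when the worker may proceed
log = []

def worker(start, end):
    release.wait()            # pause here until the main thread calls release.set()
    log.append((start, end))

t = threading.Thread(target=worker, args=(1, 10))
t.start()                     # returns immediately; worker() is now waiting in its own thread

log.append('main continued')  # the main thread keeps going while the worker waits
release.set()                 # now allow the worker to finish
t.join()                      # wait for it, so the log is complete before printing

print(log)  # ['main continued', (1, 10)]
```

The main thread's entry lands in the log first, even though the worker thread was started before it.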

### Step 3: Wait for All Threads to End

The main thread moves on as normal while the other threads we create download comics. But say there’s some code you don’t want to run in the main thread until all the threads have completed. Calling a Thread object’s join() method will block until that thread has finished. By using a for loop to iterate over all the Thread objects in the downloadThreads list, the main thread can call the join() method on each of the other threads. Add the following to the bottom of your program:

In [None]:
#! python3
# multidownloadXkcd.py - Downloads XKCD comics using multiple threads.

import requests, os, bs4, threading
os.makedirs('xkcd_threaded', exist_ok=True)    # store comics in ./xkcd_threaded

def downloadXkcd(startComic, endComic):
    for urlNumber in range(startComic, endComic):
        # Download the page.
        print('Downloading page https://xkcd.com/%s...' % (urlNumber))
        res = requests.get('https://xkcd.com/%s' % (urlNumber))
        res.raise_for_status()

        soup = bs4.BeautifulSoup(res.text, 'html.parser')

        # Find the URL of the comic image.
        comicElem = soup.select('#comic img')
        if comicElem == []:
            print('Could not find comic image.')
        else:
            comicUrl = comicElem[0].get('src')
            # Download the image.
            print('Downloading image %s...' % (comicUrl))
            res = requests.get('https:' + comicUrl)
            res.raise_for_status()

            # Save the image to ./xkcd_threaded.
            imageFile = open(os.path.join('xkcd_threaded', os.path.basename(comicUrl)), 'wb')
            for chunk in res.iter_content(100000):
                imageFile.write(chunk)
            imageFile.close()

# Create and start the Thread objects.
downloadThreads = []             # a list of all the Thread objects
for i in range(0, 140, 10):    # loops 14 times, creates 14 threads
    start = i
    end = i + 10   # range() stops before end; i + 9 would skip comics 9, 19, 29, ...
    if start == 0:
        start = 1 # There is no comic 0, so set it to 1.
    downloadThread = threading.Thread(target=downloadXkcd, args=(start, end))
    downloadThreads.append(downloadThread)
    downloadThread.start()

# Wait for all threads to end.
for downloadThread in downloadThreads:
    downloadThread.join()
print('Done.')

Downloading page https://xkcd.com/1...Downloading page https://xkcd.com/10...

Downloading page https://xkcd.com/20...
Downloading page https://xkcd.com/30...
Downloading page https://xkcd.com/40...
Downloading page https://xkcd.com/50...
Downloading page https://xkcd.com/60...
Downloading page https://xkcd.com/70...
Downloading page https://xkcd.com/80...
Downloading page https://xkcd.com/90...
Downloading page https://xkcd.com/100...
Downloading page https://xkcd.com/110...
Downloading page https://xkcd.com/120...
Downloading page https://xkcd.com/130...
Downloading image //imgs.xkcd.com/comics/penny_arcade.jpg...
Downloading image //imgs.xkcd.com/comics/super_bowl.jpg...
Downloading image //imgs.xkcd.com/comics/family_circus.jpg...
Downloading image //imgs.xkcd.com/comics/julia_stiles.jpg...
Downloading image //imgs.xkcd.com/comics/clark_gable.jpg...
Downloading image //imgs.xkcd.com/comics/other_car.jpg...
Downloading image //imgs.xkcd.com/comics/barrel_cropped_(1).jpg...
Downloadi
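
The create, start, and join pattern can be checked without the network by swapping in a stand-in for the download function. This is only a sketch: fakeDownload() is a hypothetical replacement that records comic numbers instead of fetching them, and the lock is an extra precaution around the shared list.

```python
import threading

downloaded = []            # comic numbers "downloaded" by all threads
lock = threading.Lock()    # guards the shared list

def fakeDownload(startComic, endComic):
    # Hypothetical stand-in for downloadXkcd(): record the numbers instead of fetching pages.
    for urlNumber in range(startComic, endComic):
        with lock:
            downloaded.append(urlNumber)

downloadThreads = []
for i in range(0, 140, 10):
    start = max(i, 1)      # there is no comic 0
    end = i + 10           # exclusive end, so this thread stops at i + 9
    downloadThread = threading.Thread(target=fakeDownload, args=(start, end))
    downloadThreads.append(downloadThread)
    downloadThread.start()

# join() blocks until each thread has finished, so the list is complete afterward.
for downloadThread in downloadThreads:
    downloadThread.join()

print(len(downloaded))                            # 139
print(sorted(downloaded) == list(range(1, 140)))  # True
```

The order of the entries in downloaded can differ from run to run, which is why the check sorts the list first.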

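Debugging and adding exception handling: this version wraps each comic's download in a try/except block, so a failed request is reported without ending the thread, and it extends the range to cover 3,050 comics.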

In [12]:
#! python3
# multidownloadXkcd.py - Downloads XKCD comics using multiple threads.

import requests, os, bs4, threading

os.makedirs('xkcd_threaded', exist_ok=True)    # store comics in ./xkcd_threaded

def downloadXkcd(startComic, endComic):
    for urlNumber in range(startComic, endComic):
        try:
            # Download the page.
            print('Downloading page https://xkcd.com/%s...' % (urlNumber))
            res = requests.get('https://xkcd.com/%s' % (urlNumber))
            res.raise_for_status()

            soup = bs4.BeautifulSoup(res.text, 'html.parser')

            # Find the URL of the comic image.
            comicElem = soup.select('#comic img')
            if comicElem == []:
                print('Could not find comic image.')
            else:
                comicUrl = 'https:' + comicElem[0].get('src')
                # Download the image.
                print('Downloading image %s...' % (comicUrl))
                res = requests.get(comicUrl)
                res.raise_for_status()

                # Save the image to ./xkcd_threaded.
                imageFile = open(os.path.join('xkcd_threaded', os.path.basename(comicUrl)), 'wb')
                for chunk in res.iter_content(100000):
                    imageFile.write(chunk)
                imageFile.close()
        except Exception as e:
            print('There was a problem: %s' % (e))

# Create and start the Thread objects.
downloadThreads = []             # a list of all the Thread objects
for i in range(0, 3050, 10):    # loops 305 times, creates 305 threads
    start = i
    end = i + 10
    if start == 0:
        start = 1 # There is no comic 0, so set it to 1.
    downloadThread = threading.Thread(target=downloadXkcd, args=(start, end))
    downloadThreads.append(downloadThread)
    downloadThread.start()

# Wait for all threads to end.
for downloadThread in downloadThreads:
    downloadThread.join()
print('Done.')


Downloading page https://xkcd.com/1...
Downloading page https://xkcd.com/10...
Downloading page https://xkcd.com/20...
Downloading page https://xkcd.com/30...
Downloading page https://xkcd.com/40...
Downloading page https://xkcd.com/50...
Downloading page https://xkcd.com/60...
Downloading page https://xkcd.com/70...
Downloading page https://xkcd.com/80...
Downloading page https://xkcd.com/90...
Downloading page https://xkcd.com/100...
Downloading page https://xkcd.com/110...
Downloading page https://xkcd.com/120...
Downloading page https://xkcd.com/130...
Downloading page https://xkcd.com/140...
Downloading page https://xkcd.com/150...
Downloading page https://xkcd.com/160...
Downloading page https://xkcd.com/170...
Downloading page https://xkcd.com/180...
Downloading page https://xkcd.com/190...
Downloading page https://xkcd.com/200...
Downloading page https://xkcd.com/210...
Downloading page https://xkcd.com/220...
Downloading page https://xkcd.com/230...
Downloading page https://xk
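
The effect of putting try/except inside the for loop can be seen with a small simulation. This is a sketch rather than the real download code: fakeDownload() is a hypothetical stand-in that pretends comic 404 cannot be fetched, and the saved and failed lists exist only for this illustration.

```python
def fakeDownload(urlNumber):
    # Hypothetical stand-in for the real download; we pretend comic 404 fails.
    if urlNumber == 404:
        raise ValueError('404 Client Error: Not Found')
    return 'comic %s' % (urlNumber)

saved = []
failed = []
for urlNumber in range(402, 407):
    try:
        saved.append(fakeDownload(urlNumber))
    except Exception as e:
        # Report the problem and move on to the next comic.
        print('There was a problem: %s' % (e))
        failed.append(urlNumber)

print(saved)   # ['comic 402', 'comic 403', 'comic 405', 'comic 406']
print(failed)  # [404]
```

Because the exception is caught inside the loop body, the comics after the failing one are still processed. If the try statement wrapped the whole loop instead, the first failure would end that thread's remaining downloads.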

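Making the number of threads explicit: instead of starting one thread for every 10 comics, this version sets numThreads directly and divides totalComics evenly among the threads.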

In [16]:
#! python3
# multidownloadXkcd.py - Downloads XKCD comics using multiple threads.

import requests, os, bs4, threading

os.makedirs('xkcd_threaded', exist_ok=True)    # store comics in ./xkcd_threaded

def downloadXkcd(startComic, endComic):
    for urlNumber in range(startComic, endComic):
        try:
            # Download the page.
            print('Downloading page https://xkcd.com/%s...' % (urlNumber))
            res = requests.get('https://xkcd.com/%s' % (urlNumber))
            res.raise_for_status()

            soup = bs4.BeautifulSoup(res.text, 'html.parser')

            # Find the URL of the comic image.
            comicElem = soup.select('#comic img')
            if comicElem == []:
                print('Could not find comic image.')
            else:
                comicUrl = 'https:' + comicElem[0].get('src')
                # Download the image.
                print('Downloading image %s...' % (comicUrl))
                res = requests.get(comicUrl)
                res.raise_for_status()

                # Save the image to ./xkcd_threaded.
                imageFile = open(os.path.join('xkcd_threaded', os.path.basename(comicUrl)), 'wb')
                for chunk in res.iter_content(100000):
                    imageFile.write(chunk)
                imageFile.close()
        except Exception as e:
            print('There was a problem: %s' % (e))

# Total number of comics to download
totalComics = 3050

# Number of threads to use
numThreads = 10

# Comics per thread
comicsPerThread = totalComics // numThreads

# Create and start the Thread objects.
downloadThreads = []             # a list of all the Thread objects

for i in range(numThreads):
    start = i * comicsPerThread + 1
    end = (i + 1) * comicsPerThread + 1
    downloadThread = threading.Thread(target=downloadXkcd, args=(start, end))
    downloadThreads.append(downloadThread)
    downloadThread.start()

# Wait for all threads to end.
for downloadThread in downloadThreads:
    downloadThread.join()
print('Done.')


Downloading page https://xkcd.com/1...Downloading page https://xkcd.com/306...

Downloading page https://xkcd.com/611...
Downloading page https://xkcd.com/916...
Downloading page https://xkcd.com/1221...
Downloading page https://xkcd.com/1526...
Downloading page https://xkcd.com/1831...
Downloading page https://xkcd.com/2136...
Downloading page https://xkcd.com/2441...
Downloading page https://xkcd.com/2746...
Downloading image https://imgs.xkcd.com/comics/orphaned_projects.png...
Downloading image https://imgs.xkcd.com/comics/here_to_help.png...
Downloading image https://imgs.xkcd.com/comics/placebo_blocker.png...
Downloading image https://imgs.xkcd.com/comics/launch_window.png...
Downloading page https://xkcd.com/1832...
Downloading page https://xkcd.com/307...
Downloading page https://xkcd.com/2747...
Downloading image https://imgs.xkcd.com/comics/photo_library_management.png...
Downloading image https://imgs.xkcd.com/comics/disaster_voyeurism.png...
Downloading image https://imgs.x
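
One thing to watch with totalComics // numThreads is that integer division drops any remainder, so the split only covers every comic when the total divides evenly (as 3050 and 10 do). A possible way to handle other totals is sketched below; splitRanges() is a hypothetical helper, not part of the book's program, and it rounds the per-thread count up and clips the last range.

```python
def splitRanges(totalComics, numThreads):
    """Return (start, end) pairs that together cover comics 1..totalComics.

    Each end is exclusive, matching how downloadXkcd() uses range().
    """
    perThread = -(-totalComics // numThreads)   # ceiling division
    ranges = []
    for i in range(numThreads):
        start = i * perThread + 1
        end = min((i + 1) * perThread, totalComics) + 1
        if start < end:                         # skip threads that would have nothing to do
            ranges.append((start, end))
    return ranges

print(splitRanges(3050, 10)[:2])  # [(1, 306), (306, 611)]
print(splitRanges(25, 4))         # [(1, 8), (8, 15), (15, 22), (22, 26)]
```

Each pair could then be passed as args=(start, end) when creating a Thread object, in the same way as in the loop above.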