Reading Optophone Kit
This is a repository for the Optophone Kit, part of the Maker Lab's Kits for Cultural History series. The optophone was an aid for blind readers that converted text to sound in the 20th century, beginning in the 1910s and extending until at least the 1960s. To read more about the process of remaking the optophone, see our blog posts at maker.uvic.ca.
These files are part of research conducted by Tiffany Chan, Katherine Goertz, Danielle Morgan, Victoria Murawski, and Jentery Sayers. Thanks to Robert Baker (Blind Veterans UK), Mara Mills (New York University), and Matthew Rubery (Queen Mary University of London) for their support and feedback.
These instructions detail the workflow for converting an image into plaintext and then into a stream of optophonic sounds. Currently, the repo contains three scripts written in Python. As the Kit develops, some of the scripts may change or be combined (see the change log).
- OCRscript.py - takes an image from the PiCamera, runs it through OCR (Optical Character Recognition), and saves the recognized text to results.txt.
- toneGen.py - creates and saves optophonic sounds for later playback.
- optoscript.py - takes a string of characters as its input and plays the corresponding sounds.
To run these scripts, you will need to download and install Python. See the Python website for instructions on how to do this. Note that the optophone project uses Python version 2.7. The scripts and dependencies should also work with Python 3, but they may need slight changes.
There are also several dependencies (Python modules or packages) that must be installed before the scripts can work. These include:
- PiCamera - for taking a picture with the Raspberry Pi camera.
- OpenCV - a computer vision and image processing library. This project uses version 3.0.0. See installation instructions.
- Pillow/PIL - Python Imaging Library, also for working with images.
- Tesseract - a free OCR program.
- pyTesser - a Python wrapper for Tesseract (basically, how Python talks to Tesseract).
- pyGame - for playing sounds with Python.
For the optophone project, these were all installed on a Raspberry Pi. Except for OCRscript.py, all the scripts should work on a laptop or personal computer. You can also modify OCRscript.py (see OCRscript.py for more) to take an arbitrary image as its input instead of an image from a Raspberry Pi camera.
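As a rough illustration of reading an arbitrary image instead of a PiCamera capture, the sketch below separates the OCR call from the step that writes results.txt. It is not taken from OCRscript.py: the function names save_ocr_text and pytesser_ocr are hypothetical, and the import line for pyTesser assumes the module exposes image_to_string under the name pytesser, which may differ on your installation.

```python
def save_ocr_text(image_path, ocr_func, out_path="results.txt"):
    """Run ocr_func on image_path and write the recognized text to out_path."""
    text = ocr_func(image_path)
    with open(out_path, "w") as handle:
        handle.write(text)
    return text


def pytesser_ocr(image_path):
    """Hypothetical OCR helper built on Pillow and pyTesser."""
    # Imports live inside the function so this file still loads on machines
    # where Pillow or pyTesser is not installed.
    from PIL import Image
    from pytesser import image_to_string  # assumed import path
    return image_to_string(Image.open(image_path))


# Example usage (requires Pillow, Tesseract, and pyTesser):
# save_ocr_text("my_page.png", pytesser_ocr)
```

Passing the OCR function as an argument makes it possible to swap in a different wrapper, or a stand-in function for testing, without changing the code that writes the output file.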
Set up everything you need. Set up all the hardware (the Raspberry Pi, PiCamera) and any other peripherals (e.g. monitor, mouse, keyboard) you might need. Download and install any dependencies. Download all the Python scripts.
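Before running the scripts, it can help to confirm which dependencies Python can actually import. The following is a minimal sketch, not part of the Kit. The import names in MODULES are assumptions (a package's import name can differ from its project name), and check_modules is a hypothetical helper.

```python
import importlib

# Assumed import names for the dependencies listed above.
MODULES = ["picamera", "cv2", "PIL", "pygame", "pytesser"]


def check_modules(names):
    """Return a dict mapping each module name to True if it imports cleanly."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except Exception:
            # Usually ImportError; some hardware modules raise other errors
            # when imported on a machine without the hardware.
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in sorted(check_modules(MODULES).items()):
        print("%-10s %s" % (name, "found" if ok else "MISSING"))
```

Note that this only checks Python modules. Tesseract itself is a separate program and must be installed in addition to its Python wrapper.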
Create and save tones for the optophone to play. Currently, optoscript.py only plays the tones for lowercase a, b, and c. You can download the sound files in the tones folder to use them for playback, or modify and run toneGen.py to generate different tones (note that you will probably have to change the dictionary in optoscript.py to match). To make things easier, you can keep the tones in the same directory (folder) as your scripts. Otherwise, make sure the script can find the correct file path to your sound files.
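To give a sense of what generating and saving a tone can involve, here is a minimal sketch that writes one or more summed sine waves to a WAV file using only the Python standard library. It is not the method used in toneGen.py. The function name write_tone, the default duration, and the frequencies in the usage example are all assumptions chosen for illustration.

```python
import math
import struct
import wave


def write_tone(path, frequencies, duration=0.25, sample_rate=44100):
    """Write a mono 16-bit WAV file containing the sum of the given sine waves.

    Returns the number of samples written.
    """
    n_samples = int(duration * sample_rate)
    # Scale so that the summed waves stay inside the 16-bit range.
    amplitude = 32767.0 * 0.8 / max(len(frequencies), 1)
    frames = bytearray()
    for i in range(n_samples):
        t = float(i) / sample_rate
        value = sum(math.sin(2 * math.pi * f * t) for f in frequencies)
        frames += struct.pack("<h", int(amplitude * value))
    out = wave.open(path, "wb")
    try:
        out.setnchannels(1)   # mono
        out.setsampwidth(2)   # 2 bytes = 16 bits per sample
        out.setframerate(sample_rate)
        out.writeframes(bytes(frames))
    finally:
        out.close()
    return n_samples


# Example usage with hypothetical frequencies (in Hz):
# write_tone("a.wav", [392.0, 523.25])
```

Saving each character's sound under a predictable file name makes it simpler to keep the sound files and the tone dictionary in step.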
Take a picture of the print material and turn it into plaintext. Position the camera to take a picture of the text. Ideally, you will want a brightly lit image in which the text takes up as much of the frame as possible (i.e. little to no background). This will make the image easier for the computer to read. OCRscript.py and Tesseract (the OCR program) will also optimize the image as best they can. Modify OCRscript.py as necessary (see Notes below) and run it. OCRscript.py will convert the image into a plaintext file named results.txt.
Run optoscript.py to read the results.txt file (make sure results.txt is in the same folder/directory as optoscript.py) and play the associated sounds.
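The playback step can be pictured roughly as follows: read the text, look up a sound file for each character, and play the files in order. This is a sketch under stated assumptions rather than the contents of optoscript.py. The file names in TONE_FILES and the helper names load_text, tones_for_text, and play_tones are hypothetical, and characters without an entry are silently skipped here, which may not match the real script.

```python
# Hypothetical mapping from characters to sound files.
TONE_FILES = {"a": "a.wav", "b": "b.wav", "c": "c.wav"}


def load_text(path="results.txt"):
    """Read the OCR output produced in the previous step."""
    with open(path) as handle:
        return handle.read()


def tones_for_text(text, tone_files):
    """Return the sound files for each character that has an entry."""
    return [tone_files[ch] for ch in text if ch in tone_files]


def play_tones(paths):
    """Play each sound file in turn, waiting for it to finish."""
    import pygame  # imported here so the lookup code works without pygame
    pygame.mixer.init()
    for path in paths:
        sound = pygame.mixer.Sound(path)
        sound.play()
        # get_length() is in seconds; wait() takes milliseconds.
        pygame.time.wait(int(sound.get_length() * 1000))


# Example usage (requires pygame and the sound files):
# play_tones(tones_for_text(load_text(), TONE_FILES))
```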
The current scripts were made for testing small samples of code. To use them to express your own text as tones, you may have to modify them. For example, the dictionary of tones, as it is recorded in optoscript.py, only contains three entries, for lowercase a, b, and c. You would have to change the dictionary to include the tones for other characters before you could play them.
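One way to extend the dictionary is to build it programmatically and then check which characters in your text still lack a tone. The sketch below assumes one sound file per character, named after that character (for example d.wav). That naming scheme and the helper names build_tone_dictionary and missing_characters are assumptions, not the structure used in optoscript.py.

```python
import string


def build_tone_dictionary(characters=string.ascii_lowercase, extension=".wav"):
    """Map each character to a sound file named after it, e.g. 'd' -> 'd.wav'."""
    return dict((ch, ch + extension) for ch in characters)


def missing_characters(text, tone_dictionary):
    """Return the non-whitespace characters in text that have no tone yet."""
    missing = set(
        ch for ch in text if not ch.isspace() and ch not in tone_dictionary
    )
    return sorted(missing)


# Example usage:
# tones = build_tone_dictionary()
# print(missing_characters("Hello, world", tones))
```

Running the check on results.txt before playback shows which tones still need to be generated, such as uppercase letters or punctuation.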
Other places where code may be modified are noted in the scripts themselves.
This is version 1.0 of the repository. This version contains snippets of code, and a video and animated GIF demonstrating how the optophone may have scanned type.
This repository is licensed CC BY-NC 4.0.