Trevor Prinn edited this page Nov 30, 2016 · 18 revisions

This is a small program to help with the rather tedious task of getting pages from pdfs of the real books that you have bought into the MobileSheets Android app.

It combines two functions:

  1. Extracting the pages from the pdf as picture files.
  2. Displaying each song in turn and allowing the Artist, Title and set of pages making up the song to be entered, then moving them into a folder structure suitable for Batch Import into the MobileSheets Companion app.

An installer for the current release is available at
http://tprinn.co.uk/RealBookExtractor/Real%20Book%20Extractor-1.4.0.msi

Contents

  1. Extracting Pages
  2. Identifying Songs
    1. Undo
    2. Unwanted Pages
    3. The Artist Combo Box
    4. Duplicating Pages
    5. Negatives
    6. Filenames
    7. Taking a Break
  3. Importing into MobileSheets
    1. Options
    2. Metadata Settings

Extracting pages

To extract the pages from a pdf, go to File/Extract Images... and select the pdf in the dialogue. When you press OK, the images on all the pages will be extracted, in order, into a new folder on the desktop with the same name as the pdf (that folder must not already exist). Once it has finished, you could move the folder somewhere else if you want, but I find it's easiest to just leave it there until you have finished sorting out the songs.
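The folder rule above can be sketched in Python. This is an illustration, not the program's own code: the `desktop` argument stands in for the real desktop path (often `Path.home() / "Desktop"` on Windows, but it can differ, for example when the desktop is redirected to OneDrive), and the `page0001.png` naming is a hypothetical scheme. Zero-padding the page number is one simple way to keep the extracted files in order.

```python
from pathlib import Path


def prepare_output_folder(pdf_path: str, desktop: Path) -> Path:
    """Create <desktop>/<pdf name without extension> for the extracted images."""
    target = desktop / Path(pdf_path).stem
    # exist_ok=False makes mkdir raise FileExistsError if the folder already exists,
    # matching the rule that the output folder must be new.
    target.mkdir(exist_ok=False)
    return target


def image_name(index: int, ext: str = ".png", width: int = 4) -> str:
    """Hypothetical naming: zero-padded numbers sort correctly as plain text."""
    return f"page{index:0{width}d}{ext}"
```

For example, `prepare_output_folder("Real Book Vol 1.pdf", desktop)` creates `Real Book Vol 1` on the desktop, and a second call with the same pdf fails instead of mixing two extracts together.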

When the extract starts, an Explorer window opens on the output folder. This is really just so you can see how the extract is going.

When the extract finishes, the program automatically returns to the main form and loads the folder and the first song, ready for identification.

Occasionally, there may be images in the pdf that can't be extracted. These tend to be either bad scans or photographs/covers etc. that aren't required. The program will report any errors in the list box on the progress form and skip the page or image with the problem. If there are any errors it will offer to copy all of the details of them to the clipboard at the end of the extract. You can cancel the extract at any time before it finishes.
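The skip-and-report behaviour described above can be sketched as a loop that records each failure and carries on, checking a cancel flag between pages. `extract_all` and `extract_page` are hypothetical names: `extract_page` stands in for whatever actually pulls the image out of the pdf, and this is not the program's own code.

```python
import threading
from typing import Callable, Iterable


def extract_all(pages: Iterable[int],
                extract_page: Callable[[int], None],
                cancel: threading.Event) -> list[str]:
    """Run extract_page on every page, skipping failures instead of stopping.

    Returns one message per failed page.
    """
    errors: list[str] = []
    for page in pages:
        if cancel.is_set():  # the user pressed Cancel: stop before the next page
            break
        try:
            extract_page(page)
        except Exception as exc:  # e.g. a bad scan or an unsupported image format
            errors.append(f"page {page}: {exc}")
    return errors
```

Joining the result with `"\n".join(errors)` gives a single block of text suitable for copying to the clipboard at the end.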

This program uses a couple of open-source libraries to extract the images from pdfs. These haven't been written to cope with all of the possible formats that images could be stored in within pdfs, and because they are written by volunteers there's no guarantee they ever will.

If you do find that this program can't extract the images, you can use another program to handle that part of the process. Make sure you end up with a folder containing just the image files (nothing else - although it will ignore any pdfs it comes across when processing the images) and that they are in the order that you want to process them, in particular that songs that go over several pages display in order in Explorer. You can then browse to the folder using the ... button at the top of the main page, and press Load to set up the first song for identification.
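By default Explorer orders names containing numbers by their numeric value (`page2` before `page10`), whereas a plain text sort puts `page10` first. If you prepare the images with another tool, this Python sketch approximates that "natural" ordering so you can check a folder before loading it. It is an approximation, not Explorer's exact algorithm.

```python
import re


def natural_key(name: str) -> list:
    """Split 'page10.png' into ['page', 10, '.png'] so digit runs compare as numbers."""
    parts = re.split(r"(\d+)", name.lower())
    # re.split with a capturing group puts the digit runs at the odd positions,
    # so text always lines up with text and numbers with numbers when comparing.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]
```

`sorted(names, key=natural_key)` then lists files roughly as Explorer shows them. Naming the files with zero-padded numbers (`page0002`, `page0010`) avoids the difference altogether.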

MobileSheets Companion has a built-in function for extracting the pages from pdfs, which you will find in its File menu as "Convert PDF to Images". This is more sophisticated than the function built into the Real Book Extractor and can definitely handle pdfs that this program can't. You don't need to connect an Android device to use the MobileSheets Companion converter.

Identifying Songs

For each song, you need to identify 3 things:

  1. The Artist,
  2. The Title,
  3. The pages that make up the song.

To help with this, the program displays 2 pages side by side.

The left hand page is the first page of the current song (strictly, it's the first image remaining in the folder: as songs are processed, their images are moved into subfolders by artist). You would normally type the Artist and Title from that page into the boxes above.

You use the right hand page to define the end of the song. When you start to process a new song, this will show the second image in the folder. To define the end of the current song, you need to get the start of the next song displayed in this box before you press Save. If the current song is one page, then that's fine. Otherwise you need to press either the >> button or the Page Down key until the start of the next song is displayed. If you go too far, press the << button or the Page Up key to return.

Once you have entered the Artist and Title, and defined the length of the song, press the Save button. This will create a new subfolder for the artist, if there isn't one already, and move the current song's image files into it while renaming them with the title you entered. If there is more than one page, it also numbers them (the MobileSheets Batch Import will recognise them from that as one song and take the number off when it creates the title - although it will be confused by a title like '26-2'). Unless you have pressed other buttons, the Save button should be the default, and so you only need to press Return. You don't usually need to use the mouse, or press buttons at all, while identifying songs, unless there is something unusual.
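What Save does can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's code: `save_song` is a hypothetical name, `end` is the position of the image shown in the right hand pane (the first page of the next song), and the exact numbering format for multi-page songs isn't documented here, so `Title-1.png`, `Title-2.png` is an assumed scheme.

```python
import shutil
from pathlib import Path


def save_song(book: Path, images: list[Path], end: int,
              artist: str, title: str) -> list[tuple[Path, Path]]:
    """Move images[:end] (the current song) into book/artist, renamed after title.

    Returns the (old, new) path of every page moved.
    """
    pages = images[:end]
    artist_dir = book / artist
    # Hypothetical naming: "Title.png" for one page, "Title-1.png", "Title-2.png"... for more.
    if len(pages) == 1:
        names = [title]
    else:
        names = [f"{title}-{n}" for n in range(1, len(pages) + 1)]
    dests = [artist_dir / f"{name}{src.suffix}" for name, src in zip(names, pages)]
    # Check every destination before moving anything, so a clash can't leave a half-saved song.
    if any(d.exists() for d in dests):
        raise FileExistsError(f"{artist} already has a song called {title}")
    artist_dir.mkdir(exist_ok=True)  # only actually created for the artist's first song
    moves = []
    for src, dest in zip(pages, dests):
        shutil.move(str(src), str(dest))
        moves.append((src, dest))
    return moves
```

Returning the list of moves is a design choice that makes a single-level Undo straightforward: reversing the last save only needs those pairs.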

Undo

Occasionally you will forget to check the right hand pane, and after the save you will find yourself looking at the middle of the last song in the left hand pane. If (or rather when) this happens, just select Edit/Undo and the last save will be reversed, allowing you to re-enter it. You can also use this if you enter any other details wrong. Only the most recent save is kept, so you can't undo more than one save.
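A single-level undo like this can be sketched by remembering only the file moves made by the most recent save. `LastSave` is a hypothetical class for illustration; it assumes a save is recorded as a list of (original path, new path) pairs, which is not necessarily how the program stores it.

```python
import shutil
from pathlib import Path


class LastSave:
    """Remembers only the most recent save, so only that one can be undone."""

    def __init__(self) -> None:
        self.moves: list[tuple[Path, Path]] = []

    def record(self, moves: list[tuple[Path, Path]]) -> None:
        self.moves = list(moves)  # replaces, rather than adds to, any earlier record

    def undo(self) -> bool:
        if not self.moves:
            return False  # nothing to undo
        for src, dest in reversed(self.moves):
            shutil.move(str(dest), str(src))  # put each page back under its old name
        artist_dir = self.moves[0][1].parent
        if not any(artist_dir.iterdir()):
            # The folder is empty again, so that save must have created it: remove it.
            artist_dir.rmdir()
        self.moves = []
        return True
```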

Unwanted pages

The pdf may include pages you don't want in MobileSheets, such as covers, photos and indexes. When you get to one of these, press the Delete button. The program will delete the image in the left hand pane, and move on to the next one. Be careful, because there is no Undo for this.

The Artist Combo Box

The artist box is actually a modified combo box. As each artist's subfolder is created, the artist is also added to this box. Normally, however, you'll find you never need to pull the box down to pick an artist, because the box automatically selects the next artist that matches the text you have started typing. For example, if you have already entered songs for John Coltrane and John McLaughlin, you can just type John and then press the down arrow to save yourself typing McLaughlin again.

If you try editing in this box, other than just typing in and using backspace, you may find it doesn't behave quite as you expect. This combo box control was originally designed (many years ago in .NET 1.1) to only allow selection of entries that were in the list. I put a quick hack in for the version in this program to allow new items to be entered in it, but I haven't bothered modifying it to handle all the editing keys correctly. I haven't found this a problem in using the program.

The list of Artists in the box is just the current list of subfolders for the current book. The program doesn't keep any master list, so if you use it for another book you won't find the artists from the last one.
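The two behaviours described above (the list is just the current book's subfolders, and typing selects a matching artist) can be sketched in Python. The function names are hypothetical, and the case-insensitive prefix match is an assumption about how the box compares text.

```python
from pathlib import Path


def artists_in(book: Path) -> list[str]:
    """The artist list: the names of the subfolders of the current book folder."""
    return sorted(p.name for p in book.iterdir() if p.is_dir())


def matches(artists: list[str], typed: str) -> list[str]:
    """Artists whose names start with the typed text (case-insensitive, an assumption).

    The box would show the first match; the down arrow would step to the next.
    """
    t = typed.casefold()
    return [a for a in artists if a.casefold().startswith(t)]
```

Because nothing is stored outside the book folder, pointing `artists_in` at a different book gives a different list, which is why artists from a previous book don't appear.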

Duplicating pages

You may come across a few pages that have more than one song on them. You could split these up using some other software. If you can't be bothered with that, the Duplicate button will just make another copy of the current page, so you can enter it twice with a different artist and title each time.
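A duplicate has to sort next to the original, so that it is the very next image processed. This Python sketch shows one way to do that; `duplicate_page` and the `-dup` suffix are hypothetical, not the program's actual naming.

```python
import shutil
from pathlib import Path


def duplicate_page(page: Path) -> Path:
    """Copy page to '<stem>-dup<suffix>' (then -dup2, -dup3...) in the same folder.

    The copies sort right next to the original and before the following page,
    e.g. page0012-dup.png, page0012.png, page0013.png.
    """
    copy = page.with_name(f"{page.stem}-dup{page.suffix}")
    n = 1
    while copy.exists():  # never overwrite an earlier duplicate
        n += 1
        copy = page.with_name(f"{page.stem}-dup{n}{page.suffix}")
    shutil.copy2(page, copy)
    return copy
```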

Negatives

Occasionally, you may see a page that is a negative - white text on a black background. When pages are extracted from a pdf they sometimes don't have any palette information, and the program can get confused about which pixels are black and which are white. It tries to work it out by looking at the corners, on the assumption they will be mostly white, but sometimes this doesn't work. The Negative button will reverse the palette on the left hand page, so that negative images become positive. This button is only available if the page is Black and White (not Greyscale or Colour).
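The corner check and the palette swap can be illustrated with a toy two-colour image. This sketch is an assumption about the general idea, not the program's code: pixels are stored as 0/1 indexes into a two-entry palette, colours are plain strings rather than RGB values, and the 2x2 corner patches and majority rule are illustrative choices.

```python
Pixel = int  # 0 or 1: an index into a two-entry palette


def corners_mostly_black(pixels: list[list[Pixel]], palette: tuple[str, str],
                         size: int = 2) -> bool:
    """Decode a size x size patch in each corner and report whether black wins."""
    h, w = len(pixels), len(pixels[0])
    black = white = 0
    for top in (0, h - size):
        for left in (0, w - size):
            for y in range(top, top + size):
                for x in range(left, left + size):
                    if palette[pixels[y][x]] == "black":
                        black += 1
                    else:
                        white += 1
    return black > white


def invert_palette(palette: tuple[str, str]) -> tuple[str, str]:
    """Swap the two palette entries; the pixel data itself is left untouched."""
    return (palette[1], palette[0])
```

If a page with no palette information is decoded with the wrong guess, its corners come out mostly black; swapping the palette turns the negative back into a positive without rewriting any pixels. That only works when there are exactly two colours, which fits the button being limited to Black and White pages.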

Filenames

The artist names and titles you enter are used as file and folder names, which restricts the characters you can use in them. The program will prevent you from entering any characters that are invalid in filenames. The most annoying of these is ?, but you also can't enter / or a number of other characters.

You also can't enter different versions of the same song with the same artist and title, within the same book. I think MobileSheets could probably cope with this, but obviously this program can't. Either give them different titles, or just enter them all as one song.
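Both restrictions can be sketched in Python. The character set below is the one Windows rejects in file and folder names; whether the program filters exactly this set is an assumption. `already_saved` is a hypothetical helper that also assumes multi-page songs are named `Title-1`, `Title-2` and so on, and it ignores case because Windows filenames are case-insensitive.

```python
import re
from pathlib import Path

# Characters Windows does not allow in file or folder names.
INVALID = set('\\/:*?"<>|')


def clean_name(text: str) -> str:
    """Remove characters (including control characters) that can't go in a filename."""
    return "".join(c for c in text if c not in INVALID and ord(c) >= 32)


def already_saved(book: Path, artist: str, title: str) -> bool:
    """True if book/artist already holds a song with this title, in any format."""
    folder = book / artist
    if not folder.is_dir():
        return False
    # Matches "Title" or "Title-<n>"; "Blue-Train" does not count as a clash with "Blue".
    pattern = re.compile(re.escape(title) + r"(-\d+)?", re.IGNORECASE)
    return any(pattern.fullmatch(p.stem) for p in folder.iterdir())
```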

Taking a break

Don't worry about shutting down the program when you have had enough for the moment. Next time you start it, it will load up the folder as before ready to carry on.
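Picking up where you left off needs very little state: saved songs have already been moved into artist subfolders, so the images still at the top level of the folder are the work remaining, and presumably only the folder path needs remembering. This Python sketch assumes a hypothetical JSON settings file; where and how the program really stores its settings isn't documented here.

```python
import json
from pathlib import Path
from typing import Optional


def save_last_folder(settings: Path, folder: Path) -> None:
    """Remember which book folder was open."""
    settings.write_text(json.dumps({"last_folder": str(folder)}))


def load_last_folder(settings: Path) -> Optional[Path]:
    """Folder to reopen at start-up, or None on first run or if it has been moved."""
    if not settings.exists():
        return None
    value = json.loads(settings.read_text()).get("last_folder")
    if not value:
        return None
    folder = Path(value)
    return folder if folder.is_dir() else None


def next_page(folder: Path) -> Optional[Path]:
    """First unsaved image: top-level files only (plain name order), skipping pdfs."""
    files = sorted(p for p in folder.iterdir()
                   if p.is_file() and p.suffix.lower() != ".pdf")
    return files[0] if files else None
```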

Importing into MobileSheets

Once you reach the end of the book, when the left hand box is blank and you can't enter anything in the artist and title boxes, you are ready to do a batch import into the MobileSheets Companion App. Follow the instructions for the Companion App and make sure on the Batch Import dialogue that:

Options

  1. The directory is the folder that this program created on the desktop.
  2. Scan All subdirectories is selected.
  3. Use subdirectories for metadata is selected.
  4. Scan for PDFs is not selected
    (although it shouldn't matter if it is, as there shouldn't be any).
  5. Scan for Image Files is selected.
  6. Delete Original Files After Import is not selected
    (although I haven't been able to work out what this does - selecting it does not delete anything on the PC, as far as I can see).
  7. Set Create Subdirectory Per Song, Avoid Duplicate Songs and the Import Location as you want them.

Metadata Settings

  1. Title is set to Guess Title From Filename.
  2. Source and Key are whatever you want.
  3. Artist is Unknown.
  4. Album/Book is the name of the book.
  5. Any other fields are as you want.
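Before importing, it can help to preview what the batch import ought to see. This Python sketch walks the book folder and groups images by artist (the subfolder name, as with "Use subdirectories for metadata") and by a guessed title. The rule of dropping a trailing "-<number>" is an assumption about how the page numbers are written and removed, not MobileSheets' documented behaviour, but it shows why a title like '26-2' gets cut down to '26'.

```python
import re
from collections import defaultdict
from pathlib import Path

IMAGE_TYPES = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tif", ".tiff"}


def preview_import(book: Path) -> dict[tuple[str, str], int]:
    """Map (artist, guessed title) -> number of page images found."""
    songs: dict[tuple[str, str], int] = defaultdict(int)
    for path in book.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_TYPES:
            title = re.sub(r"-\d+$", "", path.stem)  # assumed page-number suffix
            songs[(path.parent.name, title)] += 1
    return dict(songs)
```

Any images still sitting at the top level would be grouped under the book folder's own name, which is a quick sign that a page was skipped.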