A little shell script to download a pdf file from a scribd document. This script isn't perfect, but it's enough for me.
Switch branches/tags
Nothing to show
Clone or download

README.org

Scribd-downloader

/!\ /!\ /!\ IMPORTANT /!\ /!\ /!\ THIS REPOSITORY IS NOT SUPPORTED ANYMORE (AND DOES NOT WORK ANYMORE). PLEASE TRY MY NEW VERSION HERE https://github.com/tobiasBora/scribd-downloader-3. IT’S NOT PERFECT I GUESS, BUT WORKS PRETTY NICELY.

A shell script to download a pdf file from a scribd document. This script can download too big files (such as 150 pages), forbidden pages (marked with “You are not reading a free preview”)… If you have a problem with a file just tell me at tobias.bora -@- gmail -.- com.

To use it, make sure you have installed phantomjs, pdftk and ImageMagick with

sudo apt-get install phantomjs imagemagick pdftk

You should check that phantomjs version is greater than 1.6 because an important function came. (If it’s not possible to install a newer version, see below how to correct that)

After, just run :

$ ./scribd_download.sh <your url>

If you have any problem and you want to try more options, run with no option

$ ./scribd_download.sh

and you’ll get a list of all options.

For time benchmark, for big files you can count around 120 pages for 5mn of execution. If you don’t want to use pdftk (it can be longer with it) go in the script file uncomment the line

# pdf_convert_mode="convert"

like this

pdf_convert_mode="convert"

Example :

$ ./scribd_download.sh http://fr.scribd.com/doc/63942746/chopin-nocturne-n-20-partition

The Scribd structure often changes, so if you have any problem, please contact me at tobias.bora -@- gmail -.- com, or let a message in the “Issues” section.

FAQ

I can’t install a recent version of PhantomJs newer than 1.6. Can I use an older version ?

Yes you can. To do that, please edit the file scribd_dowload.sh and modify the line

zoom_precision=2

like this

zoom_precision=1

Note that you may lose some precision. It will be easier to deal with this case in a futur version.

Is it possible to see the pages created in real time when the document is long to download ?

Just open the hidden folder .tmp in the current folder. You can see every pages in png format and check that there is no problem.

To Do

  • Auto-dectect PhantomJs version and correct zoom precision
  • Deal with documents with different page size
  • More flexible command line options (resolution, pages…)
  • Deal with documents with only one big obfuscated “page”
  • Add a graphical interface
  • Port it in a multi-OS language