PDFRead
PDFRead is a tool for converting PDF and DJVU documents for reading on eBook devices. It renders each page as an image, enhances that image, and then collates the images into a device-specific format.
Features
- support for PDF and DJVU documents
- works on Windows, Linux and OS X
- supports creating images for any ebook reader device
- out-of-the-box profiles for the REB 1100, eBookwise 1150, REB 1200 and Sony PRS-500
- fast and very accurate autocropping
- image dilation for thicker text, enabled by default
- automatic splitting of a page into multiple screens in landscape mode
- can generate images that fit the screen size exactly in portrait mode
- rotation of image for devices that don't support landscape mode
- option for reducing the number of colors to reduce image size
- output formats supported: currently html, rb, lrf, imp1, and imp2.
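The autocropping and dilation features above can be illustrated on a page held as a plain grid of 8-bit grayscale values (0 = black, 255 = white). This is a minimal sketch under that assumption, not PDFRead's own code: the function names, the darkness threshold of 200 and the margin are hypothetical, and it does not try to reproduce how --crop-percent is interpreted. Dilation here is a min filter, which grows dark strokes on a white background (morphological dilation of the text, i.e. grayscale erosion of the background).

```python
# Hypothetical helpers: a page is a list of rows of 0-255 gray values.

def autocrop_box(pixels, threshold=200, margin=2):
    """Bounding box (left, top, right, bottom) of the pixels darker than
    `threshold`, padded by `margin`; right and bottom are exclusive.
    Returns None for a blank page."""
    height, width = len(pixels), len(pixels[0])
    ink_rows = [y for y, row in enumerate(pixels)
                if any(v < threshold for v in row)]
    if not ink_rows:
        return None
    ink_cols = [x for x in range(width)
                if any(row[x] < threshold for row in pixels)]
    return (max(ink_cols[0] - margin, 0),
            max(ink_rows[0] - margin, 0),
            min(ink_cols[-1] + 1 + margin, width),
            min(ink_rows[-1] + 1 + margin, height))


def crop(pixels, box):
    """Cut the page down to the given (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]


def dilate_dark(pixels, radius=1):
    """Thicken dark text: each output pixel takes the darkest value in the
    (2*radius+1) x (2*radius+1) window around it (a min filter)."""
    height, width = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        ys = range(max(y - radius, 0), min(y + radius + 1, height))
        out_row = []
        for x in range(width):
            xs = range(max(x - radius, 0), min(x + radius + 1, width))
            out_row.append(min(pixels[yy][xx] for yy in ys for xx in xs))
        out.append(out_row)
    return out
```

For example, a 6x5 white page with a single black pixel crops to a 1x1 box with margin=0, and dilate_dark turns that pixel into a 3x3 black square.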
Installation Instructions
Windows
- Download and install the PDFRead installer.
- Right-click a PDF and choose PDFRead. Alternatively, use the Start menu
(Start -> Programs -> PDFRead).
- If you enter an output eBook file name, the generated file is saved under that name.
If you leave it empty, the file is created in the temp directory.
- To customize a profile, check the "customize" box and specify parameters.
- To uninstall, remove it from Add/Remove Programs.
Linux
This assumes you have Ubuntu. If you don't, then install the equivalent
packages for your distribution.
- Open up a terminal window, and install the following packages:
sudo apt-get install pdftk python-imaging pngnq gs-common djvulibre-bin libtiff-tools unpaper optipng
- Download the PDFRead source code and extract it somewhere.
- The extracted directory contains a command line program; run it via
python pdfread.py <options> pdf-file
- See the Command Line Options section below for the available options.
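Before the first run it can help to check that the external programs installed above are on your PATH. This minimal sketch assumes the executable names pdftk, pngnq, gs (Ghostscript), ddjvu (from djvulibre-bin), unpaper and optipng; which binaries PDFRead actually calls is an assumption, and python-imaging and libtiff-tools are not checked. It needs Python 3 for shutil.which.

```python
import shutil

# Assumed executable names for the packages installed above (hypothetical
# mapping; PDFRead's actual external calls may differ).
EXPECTED_TOOLS = {
    "pdftk": "pdftk",
    "pngnq": "pngnq",
    "gs": "Ghostscript",
    "ddjvu": "djvulibre-bin",
    "unpaper": "unpaper",
    "optipng": "optipng",
}


def missing_tools(tools=EXPECTED_TOOLS):
    """Return the executable names that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    for name in missing:
        print("missing: %s (package: %s)" % (name, EXPECTED_TOOLS[name]))
    if not missing:
        print("all expected tools found")
```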
Mac OS X
(based on sammykrupa's instructions)
- Install the Apple developer tools.
- Install the latest Python version. The Mac OS X installer puts a "MacPython 2.5" folder in your Applications folder. Go into that folder and double-click the "Update Shell Profile.command" file.
- Install pdftk and pngnq 0.41.
- Install Fink, and then install the necessary packages by typing this in Terminal:
sudo apt-get install ghostscript ghostscript-fonts libtiff-bin djvulibre optipng
- Download the latest Python Imaging Library source, extract it and install it via
sudo python setup.py install
- If you want to convert DJVU documents, make sure djvulibre is installed, either from Fink (as above) or built yourself.
- Download the PDFRead source code and extract it somewhere.
- The extracted directory contains a command line program; run it via
python pdfread.py <options> pdf-file
- See the Command Line Options section below for the available options.
Device Support
Rocket eBook / REB 1100
- Use the reb1100 profile, which will always create images in landscape mode. You may have to switch the device to landscape mode.
- If you are on Linux or OS X, you may need to compile and install rbmake.
- Typical command line usage:
pdfread -p reb1100 <pdf-file>
eBookwise 1150 / ETI-2 / REB 1150
- Use the eb1150 profile, which will always create images in landscape mode.
You will have to hold the device sideways.
Tip: you may want to enable "reverse paging" in the settings so that you can
advance to the next page using the left button (normally the bottom one).
- If you are on Windows, please install eBook Publisher. Please uninstall and reinstall it if you already have it installed, as there may be problems if you have installed GEB Librarian after it.
- If you are on Linux or OS X, an IMP file will not be created. You will
have to create it yourself by copying the output directory to a Windows machine
(real or virtual) and then creating an IMP with "ebook.html" in that
directory as the source. I recommend running Windows in VMware or an equivalent tool.
- Typical command line usage:
pdfread -p eb1150 <pdf-file>
REB 1200 / ETI-1 / Gemstar 2150
- There are two profiles you can use: reb1200 and reb1200-p.
- The reb1200 profile rotates the PDF and creates it in landscape mode, which should
result in much better legibility at the expense of splitting up the page.
You will have to hold the device sideways.
- The reb1200-p profile creates images in portrait mode, but these may not look great due to the lower resolution of the 1200.
- If you are on Windows, please install eBook Publisher. Please uninstall and reinstall it if you already have it installed, as there may be problems if you have installed GEB Librarian after it.
- If you are on Linux or OS X, an IMP file will not be created. You will
have to create it yourself by copying the output directory to a Windows machine
(real or virtual) and then creating an IMP with "ebook.html" in that
directory as the source. I recommend running Windows in VMware or an equivalent tool.
- Typical command line usage:
pdfread -p reb1200 <pdf-file>
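The landscape profiles (reb1200, prs500-l) turn one page into several screens. A minimal sketch of that splitting step, assuming the page image has already been rotated and scaled so its width fills the screen: strip_bounds cuts the image height into strips no taller than the screen, repeating `overlap` pixels between neighbours (as the --overlap option describes), aligns the last strip with the bottom of the page, and leaves pages that already fit unsplit (as in the v5 changelog entry). rotate_right shows a 90-degree clockwise rotation. Both names and the exact strategy are hypothetical, not PDFRead's implementation.

```python
def rotate_right(pixels):
    """Rotate a grid of pixel rows 90 degrees clockwise."""
    return [list(row) for row in zip(*pixels[::-1])]


def strip_bounds(image_height, strip_height, overlap=0):
    """Return (top, bottom) row ranges, bottom exclusive, covering the image
    with strips at most `strip_height` tall that overlap by `overlap` rows."""
    if not 0 <= overlap < strip_height:
        raise ValueError("overlap must be >= 0 and smaller than strip_height")
    if image_height <= strip_height:
        return [(0, image_height)]  # page already fits: do not split it
    step = strip_height - overlap
    bounds = []
    top = 0
    while top + strip_height < image_height:
        bounds.append((top, top + strip_height))
        top += step
    # Align the final strip with the bottom edge so it stays full height.
    bounds.append((image_height - strip_height, image_height))
    return bounds
```

For example, strip_bounds(1000, 400, overlap=20) gives [(0, 400), (380, 780), (600, 1000)]; the last strip overlaps more than 20 rows so that it can stay full height.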
Sony Reader PRS-500 / Librie
- There are two profiles you can use:
  - The prs500 profile creates a typical LRF in portrait mode.
  - The prs500-l profile rotates the PDF and creates it in landscape mode, which should
    result in much better legibility at the expense of splitting up the page.
- Typical command line usage:
pdfread -p prs500 <pdf-file>
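The Windows GUI shows the command line it runs so that it can be reused for batch execution (see the 1.6 changelog). A minimal batch wrapper in that spirit, assuming pdfread.py sits in the current directory and that `python` on your PATH is an interpreter PDFRead runs under (it dates from the Python 2 era, so this may differ from the interpreter running the wrapper). It uses only the -p, -o and -t options documented below and leaves the extension off -o, relying on the 1.7 behaviour of appending it automatically. build_command and convert_all are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path


def build_command(pdf, profile, out_dir, python="python", script="pdfread.py"):
    """Build one pdfread invocation; the title defaults to the file name."""
    pdf = Path(pdf)
    output = Path(out_dir) / pdf.stem  # assumes PDFRead 1.7+ adds the extension
    return [python, script, "-p", profile, "-o", str(output),
            "-t", pdf.stem, str(pdf)]


def convert_all(src_dir, profile, out_dir, **kwargs):
    """Convert every *.pdf in src_dir, stopping at the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        cmd = build_command(pdf, profile, out_dir, **kwargs)
        print(" ".join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    if len(sys.argv) == 4:
        convert_all(*sys.argv[1:4])
    else:
        print("usage: batch.py <pdf-dir> <profile> <output-dir>")
```

For example, running it with the arguments `books prs500 out` would call pdfread once per PDF in books/ with the prs500 profile.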
Command Line Options
Usage: pdfread [options] input-document
Options:
-h, --help show this help message and exit
-p PROFILE one of: reb1200, prs500-l, eb1150, reb1200-p, prs500,
reb1100
-o OUTPUT the output filename
-t TITLE generated ebook title (default: "Unknown")
-a AUTHOR generated ebook author (default: "Unknown")
-c CATEGORY generated ebook category (default: "General")
-f FORMAT one of: imp2, imp1, html, rb, lrf
-i FORMAT one of: pdf, imglist, djvu, tiff
-m MODE one of: portrait, landscape-half, landscape
-u ARGS command line arguments for unpaper
-d DIR the temporary directory where images are generated
--first-page=PAGE first page to convert
--last-page=PAGE last page to convert
--optimize optimize generated PNG images
--crop-percent=N% whitespace cropping percentage (default: 2.0%)
--edge-level=L edge enhancement level from 1-9 (default: 5)
--dpi=DPI the DPI at which to perform dilation (default: 300)
--colors=N downsample the output image to N grayscale colors
--mono downsample the output image to monochrome
--rotate=DIRECTION one of: none, right, left
--count=N consider that the document has N pages
--hres=HRES the maximum usable horizontal resolution
--vres=VRES the maximum usable vertical resolution
--overlap=OVERLAP screen overlap between pages (in pixels)
--no-crop disable the cropping stage
--no-dilate disable the dilation stage
--no-enhance disable the edge enhancement stage
--no-toc disable the generation of Table of Contents
--list-profiles show the various profiles and their settings
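The --colors and --mono options reduce the number of gray levels to shrink the output. Since 1.7 PDFRead does this with pngnq (see the changelog); the sketch below swaps in plain uniform quantization on a grid of 0-255 gray values, only to show what reducing to N levels means. quantize_gray is a hypothetical name, and --mono is modelled as N = 2.

```python
def quantize_gray(pixels, n_colors):
    """Map each 0-255 gray value to the nearest of n_colors evenly spaced
    levels (0 and 255 are always among them)."""
    if not 2 <= n_colors <= 256:
        raise ValueError("n_colors must be between 2 and 256")
    step = 255 / (n_colors - 1)  # distance between adjacent output levels
    return [[int(round(round(v / step) * step)) for v in row] for row in pixels]
```

For example, quantize_gray([[0, 100, 128, 255]], 2) returns [[0, 0, 255, 255]] (the --mono case), and with 4 colors the available levels are 0, 85, 170 and 255.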
Changelog
[2007-04-27] 1.7
- add a "landscape-half" mode which splits a page into two even halves (gdxf's suggestion)
- if the output document does not have the proper file extension, then append it automatically.
- remove ImageMagick and use pngnq for color reduction.
- fix problems when the PDF has an incorrect TOC that refers to an invalid page. Also added the --no-toc option to disable TOC generation.
[2007-04-25] 1.6
- revamped the Windows GUI: added tooltips and a preview feature, and show the command line options when executed (useful for batch execution).
- add support for TIFF and a list of page images for input.
- add unpaper support for image cleanup.
- add extremely aggressive whitespace detection, even in the middle of the page text.
- added an edge-enhancement filter, similar to rbmake and RasterFarian.
- allow all processing stages to be selectively disabled.
- allow a page range to be specified for conversion.
- tweak the prs-500 profile to rotate right instead of left (thanks gdxf)
- add an optional step to optimize generated PNG images via OptiPNG.
- removed the dependency on xpdf.
- removed the autocontrast and ghostscript cropping features (no longer useful).
- fix problem where the IMP file was not created if the latest eBook Publisher was not installed.
- complete overhaul of the code for better maintainability.
[2007-04-06] v5
- add DJVU input support
- allow specifying custom options in the Windows GUI
- OS X support fixes (based on sammykrupa's input)
- tweaks in the EB-1150 and REB-1200 profiles
- do not split page if the generated image width/height is less than the device parameters. This caused too many blank pages to be created.
PDFRead was created by Ashish Kulkarni with lots of help and suggestions from Falstaff, alex_d, sammykrupa, gdxf and many others from the MobileRead forums. It is licensed under the MIT license.