Combine JPEG images into a single PDF

I no longer have a scanner, but my mobile phone takes pictures of good enough quality to be a decent alternative. The remaining problem is how to combine several pictures into one file when they all come from a single document.

The following script has been concocted as a Windows alternative to the [fabulous recipe from Stack Overflow]( to combine multiple JPEG images into a single PDF, using just Ghostscript.

The script assumes that you have installed MikTeX for your usual LaTeX processing, and that it has been installed using Scoop. The MikTeX distribution comes with a preinstalled copy of Ghostscript, under the executable name mgs.exe. The script wraps this program in a command-line tool which I call jpeg2pdf.cmd.

The command-line tool takes as arguments the name of the output PDF file followed by one to seven image file names, each of which may include wildcards, as in

jpeg2pdf.cmd multipage-file.pdf images-00*.jpg

Here goes the full script:

@echo off

rem Where we installed Scoop
set scoop=%HOMEDRIVE%%HOMEPATH%\scoop

rem Where Ghostscript is found in the MikTeX distribution
set tool=%scoop%\apps\latex\current\texmfs\install\ghostscript\base\

rem Create a temporary file with the postscript commands to combine
rem the images into a single pdf
set script=%TEMP%\
if exist %script% del %script%
rem Each remaining argument (%2 to %8) is a file name or a wildcard pattern
if x%2==x goto run
for /f "tokens=*" %%g in ('dir /b %2') do echo (%%g)^ viewJPEG^ showpage >> "%script%"
if x%3==x goto run
for /f "tokens=*" %%g in ('dir /b %3') do echo (%%g)^ viewJPEG^ showpage >> "%script%"
if x%4==x goto run
for /f "tokens=*" %%g in ('dir /b %4') do echo (%%g)^ viewJPEG^ showpage >> "%script%"
if x%5==x goto run
for /f "tokens=*" %%g in ('dir /b %5') do echo (%%g)^ viewJPEG^ showpage >> "%script%"
if x%6==x goto run
for /f "tokens=*" %%g in ('dir /b %6') do echo (%%g)^ viewJPEG^ showpage >> "%script%"
if x%7==x goto run
for /f "tokens=*" %%g in ('dir /b %7') do echo (%%g)^ viewJPEG^ showpage >> "%script%"
if x%8==x goto run
for /f "tokens=*" %%g in ('dir /b %8') do echo (%%g)^ viewJPEG^ showpage >> "%script%"

:run
echo Combining images with Postscript:
type "%script%"
rem The actual job is done here
%scoop%\apps\latex\current\texmfs\install\miktex\bin\x64\mgs.exe -q -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -o %1 %tool% "%script%"
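
To make it easier to see what ends up in the temporary file, here is a rough POSIX shell sketch of the same loop. It is only an illustration and not part of the script above: the function name `make_viewjpeg_script` is made up, and the escaping step is an extra precaution that the batch version does not take. The `viewJPEG` procedure itself comes from the `viewjpeg.ps` file distributed with Ghostscript.

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): print one
# "(file) viewJPEG showpage" line per image, in the order given.
make_viewjpeg_script() {
    for image in "$@"; do
        # Escape backslashes and parentheses, which are special
        # inside a PostScript string literal.
        escaped=$(printf '%s\n' "$image" | sed 's/[\\()]/\\&/g')
        printf '(%s) viewJPEG showpage\n' "$escaped"
    done
}
```

On Linux or Mac OS X the shell expands `images-00*.jpg` by itself, so something like `make_viewjpeg_script images-00*.jpg > pages.ps` (with `pages.ps` an arbitrary file name) should produce the same kind of lines that the batch loop appends to `%script%`.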

I am writing a largish set of lecture notes, which is now around 120 pages long. The problem with this document is that it is full of attractive images that are needed to make the content clearer and more approachable, and these images are not optimized in size: they come from papers, simulations, drawings, etc., each produced by different means, and the final size of an image on the page is often much smaller than what its resolution allows for.

So the result is a large file of 16 MB, which I will call notes.pdf. I need to send a few chapters of this file to my colleagues and students, and when I cut the file it is still too large, 12-13 MB, because neither Adobe nor OS X is able to really trim the size.

If you run Linux or Mac OS X, here is what you can do:

gs -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -sOutputFile=small-notes.pdf notes.pdf

This reduces the size quite significantly by (i) downsampling the figures to the resolution that is needed for the given device (screen), (ii) eliminating fonts that are already standard but which pdfLaTeX embeds anyway, and (iii) compressing the output. In my case, the file went down from 34 MB to 13 MB.
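
Besides `/screen`, Ghostscript documents a few other values for `-dPDFSETTINGS` (`/ebook`, `/printer`, `/prepress` and `/default`), which keep more image resolution at the cost of a larger file. A minimal sketch for trying them out follows; the function name `shrink_pdf` and the `GS` variable are assumptions of this example, not something Ghostscript provides.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of the original post.
# Usage: shrink_pdf PRESET INPUT.pdf OUTPUT.pdf
# PRESET is one of Ghostscript's documented -dPDFSETTINGS values.
# GS (an assumption of this sketch) names the executable, default "gs".
shrink_pdf() {
    preset=$1 input=$2 output=$3
    case $preset in
        screen|ebook|printer|prepress|default) ;;
        *) echo "unknown preset: $preset" >&2; return 1 ;;
    esac
    "${GS:-gs}" -sDEVICE=pdfwrite -dBATCH -dNOPAUSE \
        -dCompatibilityLevel=1.4 "-dPDFSETTINGS=/$preset" \
        "-sOutputFile=$output" "$input"
}
```

For instance, `shrink_pdf ebook notes.pdf small-notes.pdf` would run the same command as above with a less aggressive preset, and comparing `ls -l notes.pdf small-notes.pdf` afterwards shows how much was gained.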

If I only need a range of pages, I tell Ghostscript which ones with -dFirstPage and -dLastPage:

gs -dFirstPage=1 -dLastPage=73 -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dBATCH -dNOPAUSE -dPDFSETTINGS=/screen -sOutputFile=small-notes.pdf notes.pdf
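
Since I tend to repeat this for different chapters, the command can be wrapped in a small shell function. This is just a sketch under my own assumptions: the name `extract_pages`, the `GS` variable and the range check are hypothetical additions, while the Ghostscript options are exactly the ones shown above.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of the original post.
# Usage: extract_pages FIRST LAST INPUT.pdf OUTPUT.pdf
# GS (an assumption of this sketch) names the executable, default "gs".
extract_pages() {
    first=$1 last=$2 input=$3 output=$4
    # Reject empty or non-numeric page numbers.
    for n in "$first" "$last"; do
        case $n in
            ''|*[!0-9]*) echo "page numbers must be integers" >&2; return 1 ;;
        esac
    done
    # Reject ranges that start before page 1 or run backwards.
    if [ "$first" -lt 1 ] || [ "$first" -gt "$last" ]; then
        echo "invalid page range: $first-$last" >&2
        return 1
    fi
    "${GS:-gs}" "-dFirstPage=$first" "-dLastPage=$last" -sDEVICE=pdfwrite \
        -dCompatibilityLevel=1.4 -dBATCH -dNOPAUSE -dPDFSETTINGS=/screen \
        "-sOutputFile=$output" "$input"
}
```

With this, `extract_pages 1 73 notes.pdf small-notes.pdf` is equivalent to the command above.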

If you run Windows and have MikTeX installed, the same commands work once you replace gs with mgs.