Wednesday, March 3, 2010

Ghostscript saves the day

We got a request from one of our users last week: because of government regulations, we had to print out over 700 PDF documents that we have stored in Alfresco. She wanted to know if there was any way to do that without opening each PDF and printing it individually. I didn't think Alfresco had any way of doing that, but I had a couple of ideas.

I got our primary Alfresco dev to write a query and put the docs in an Alfresco folder, which I mounted on my Fedora laptop. I cheated and used GNOME's "Connect to..." functionality, but you can mount it manually using CIFS as well. I noticed that the case of the .pdf extension wasn't consistent across the files.
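Since the extensions were in mixed case, a plain `*.pdf` match would miss some files. Here is a minimal sketch for checking how many PDFs are really in the mounted folder, assuming a `find` that supports `-iname` and `-maxdepth` (GNU and BSD both do). `count_pdfs` is a made-up helper name, not something Alfresco or Ghostscript provides:

```shell
# count_pdfs [DIR]: count regular files directly inside DIR (default: .)
# whose names end in .pdf, in any mix of upper and lower case.
# Caveat: a filename containing a newline would be counted more than once.
count_pdfs() {
    # -iname matches case-insensitively; -maxdepth 1 stays in this folder
    find "${1:-.}" -maxdepth 1 -type f -iname '*.pdf' | wc -l | tr -d '[:space:]'
}
```

Running `count_pdfs /path/to/mounted/folder` (the path is a placeholder) should print the same number your query returned. If it doesn't, something went missing on the way.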

I had used Ghostscript to merge PDFs before, so I planned to use it to merge all 730 PDFs into one file and let the user do whatever she needed with it. I thought Ghostscript had an option to append files one at a time, so I figured out the bash needed to iterate over the files (which had spaces in their names) and pass each filename to gs:


## This makes the for loop split on newlines only, so filenames with spaces stay whole
IFS=$(echo -en "\n\b")
## $2 is the output file; note that every gs run rewrites it from scratch
for File in `ls | grep -i '\.pdf$'`
do
    gs -sDEVICE=pdfwrite -sOutputFile="$2" -dNOPAUSE -dBATCH -f "$File"
done


But Ghostscript doesn't have an append option (at least not one that I could find), and I didn't want to have to concatenate all the filenames together and pass that to Ghostscript.
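The loop can't work because each gs run writes its output file from scratch, so only the last input would survive. For the record, collecting the names doesn't require gluing them into one string. Here is a rough sketch that builds the argument list with `set --` and calls gs once. `merge_pdfs` is a hypothetical helper name, and the gs flags are the same ones used in this post:

```shell
# merge_pdfs OUT [DIR]: merge every PDF in DIR (default: .) into OUT.
# The body is a subshell ( ... ), so "set --" and the variables stay local.
merge_pdfs() (
    out=$1
    dir=${2:-.}
    set --                          # start with an empty argument list
    for f in "$dir"/*; do
        case $f in
            *.[pP][dD][fF])         # extension in any mix of case
                if [ -f "$f" ]; then set -- "$@" "$f"; fi ;;
        esac
    done
    if [ "$#" -eq 0 ]; then
        echo "no PDF files in $dir" >&2
        exit 1                      # leaves the subshell only
    fi
    # One gs run, so every input lands in the same output file
    gs -sDEVICE=pdfwrite -sOutputFile="$out" -dNOPAUSE -dBATCH -f "$@"
)
```

Each filename stays its own argument, so spaces are harmless and there's no IFS juggling.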

Here's the command I figured out to make it work:

gs -sDEVICE=pdfwrite -sOutputFile=/tmp/merged.pdf -dNOPAUSE -dBATCH -f *[pP][dD][fF]
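This works because the shell expands the glob into one argument per file before gs ever starts, so spaces in the names aren't a problem, and the bracket pairs match the extension in either case. The shell also sorts the matches, which decides the page order of the merged file. Here is a small sketch for previewing what will be merged before committing to it; `preview_merge` is a made-up name for illustration:

```shell
# preview_merge [DIR]: print, one per line and in glob order, the names
# that *[pP][dD][fF] would hand to gs when run from DIR (default: .).
# The body is a subshell ( ... ), so the cd does not affect the caller.
preview_merge() (
    cd "${1:-.}" || exit 1
    for f in *[pP][dD][fF]; do
        # An unmatched glob stays as the literal pattern, so check it exists
        if [ -e "$f" ]; then printf '%s\n' "$f"; fi
    done
)
```

Piping the output through `wc -l` gives a quick sanity check against the number of documents you expected.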

That created a 1000+ page PDF that the user was happy to receive and promptly printed. Interestingly enough, the poor printer she sent it to actually died trying to print the 1000 pages... fail.

Thanks, Ghostscript!
