
Posted: 2007-04-04 15:59:06
Tags: technical   perl   programming  
PDF to ASCII Text Conversion Utility
Every once in a while you come across a piece of software that is nothing short of a magic wand. I was recently handed a task that required me to convert PDF files to ASCII text. As always, it began with random googling and hopping from one page to the next, while a few hours (and in some cases a few days) go by in search of that elusive crucial component.

To add salt to the wound, you run into commercial software (as always, with an insane licensing fee) far more often than its open-source counterparts. On top of that, I wanted a non-Windoze, non-desktop solution, which naturally meant a POSIX-based one. Behold XPDF, with many thanks to the fine folks at foolabs.com. You've got to love that name, so elemental to programming yet so cute! Anyhow, this package has been very popular; I just hadn't come across it because I had never needed it until now.

Anyhoo, download, compile and install the source, and the pdftotext utility that ships with XPDF will have you off to PDF->ASCII nirvana in minutes.

Also, if you're interested in using Perl in conjunction with XPDF, here's some sample code you might find useful:

#!/usr/bin/perl

# PDF - ASCII Conversion using XPDF
# XPDF Available at : http://www.foolabs.com/xpdf/download.html

use strict;
use warnings;

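my $parser = "/usr/local/bin/pdftotext";   # pdftotext ships with XPDF
my $infile = "test.pdf";
my $text   = '';

# Read pdftotext's output through a pipe. The list form of open skips the
# shell, so file names containing spaces or quotes need no escaping.
#   -raw  keep text in content-stream order
#   -q    don't print any messages or errors
#   -     write the text to STDOUT instead of test.txt
open(my $pdf, '-|', $parser, '-raw', '-q', $infile, '-')
    or die "error opening pdf \"$infile\": $!\n";
$text .= $_ while <$pdf>;
# On a pipe, close() waits for pdftotext and puts its exit status in $?.
close($pdf)
    or die "pdftotext failed on \"$infile\" (exit status " . ($? >> 8) . ")\n";

print "Conversion Results\n\n";
print "$text\n";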
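If you have a whole batch of PDFs to convert, the same call can be tucked into a subroutine. Below is a minimal sketch, not something that ships with XPDF: the helper name pdf_to_text is my own, and it assumes pdftotext was installed to /usr/local/bin as in the example above. It checks pdftotext's exit status, so a bad file fails loudly instead of quietly printing nothing.

```perl
#!/usr/bin/perl

# Batch PDF -> text conversion with XPDF's pdftotext (a sketch).
use strict;
use warnings;

# pdf_to_text is a hypothetical helper name, not part of XPDF.
# Returns the text of $infile; dies if pdftotext can't be run or fails.
# $parser defaults to /usr/local/bin/pdftotext, an assumed install path.
sub pdf_to_text {
    my ($infile, $parser) = @_;
    $parser = '/usr/local/bin/pdftotext' unless defined $parser;

    # Same options as above; list-form open avoids the shell entirely.
    open(my $pdf, '-|', $parser, '-raw', '-q', $infile, '-')
        or die "cannot run $parser: $!\n";
    my $text = '';
    $text .= $_ while <$pdf>;

    # A non-zero exit (unreadable PDF, missing file, ...) ends up in $?.
    close($pdf)
        or die "$parser failed on \"$infile\" (exit status " . ($? >> 8) . ")\n";
    return $text;
}

# Convert every PDF named on the command line.
for my $file (@ARGV) {
    print "==> $file <==\n", pdf_to_text($file), "\n";
}
```

Save it as, say, pdf2txt.pl and run perl pdf2txt.pl a.pdf b.pdf; each file's text is printed under a "==> file <==" header, and the script stops at the first file pdftotext can't handle.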