pdftosrc
extract source file or stream from PDF file
Synopsis
pdftosrc PDF-file [stream-object-number]
examples
When using pdftex, a source file can be embedded into a PDF-file
with pdftex primitives, as illustrated by the following example:
\immediate\pdfobj
stream attr {/Type /SourceFile /SourceName (myfile.zip)}
file{myfile.zip}
\pdfcatalog{/SourceObject \the\pdflastobj\space 0 R}
Then this zip file can be extracted from the PDF-file by
calling pdftosrc PDF-file.
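The extraction call can also be scripted. The following is a minimal Python sketch, not part of pdftosrc itself: the helper names build_command and run_pdftosrc are hypothetical, and it assumes that the pdftosrc executable is on the PATH.

```python
import shutil
import subprocess


def build_command(pdf_file, objnum=None):
    """Build the argument list for a pdftosrc call (hypothetical helper).

    With objnum=None only the PDF file is passed, which asks pdftosrc
    for the embedded source file.
    """
    cmd = ["pdftosrc", pdf_file]
    if objnum is not None:
        cmd.append(str(objnum))
    return cmd


def run_pdftosrc(pdf_file, objnum=None):
    """Run pdftosrc and return the CompletedProcess (hypothetical helper).

    Assumption: pdftosrc is installed and reachable via PATH.
    """
    if shutil.which("pdftosrc") is None:
        raise FileNotFoundError("pdftosrc not found on PATH")
    # Messages are captured from stderr; the exit code ends up in .returncode.
    return subprocess.run(
        build_command(pdf_file, objnum), capture_output=True, text=True
    )
```

Calling run_pdftosrc("mydoc.pdf") would correspond to the command line pdftosrc mydoc.pdf, where mydoc.pdf is a placeholder file name.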
description
If only PDF-file is given as argument, pdftosrc extracts the
embedded source file from the first stream object with
/Type /SourceFile found within the PDF-file and writes it to a
file with the name /SourceName as defined in that PDF stream
object (see the examples section).
If both PDF-file and stream-object-number are given as
arguments, and stream-object-number is positive, pdftosrc
extracts and uncompresses the PDF stream of the object given by
its stream-object-number from the PDF-file and writes it to a
file named PDF-file.stream-object-number, with the ending .pdf
or .PDF stripped from the original PDF-file name.
A special case is related to XRef object streams, which are part
of the PDF standard from PDF-1.5 onward: if stream-object-number
equals -1, then pdftosrc decompresses the XRef stream from the
PDF file and writes it in human-readable PDF cross-reference
table format to a file named PDF-file.xref (these XRef streams
cannot be extracted just by giving their object number).
In any case an existing file with the output file name will be
overwritten.
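Because existing files are overwritten, it can help to know the output name before running the program. The naming rules above can be sketched in Python as follows. The function name predicted_output_name is hypothetical, and the sketch only follows the wording of this description (stripping .pdf or .PDF); the behaviour for 0 or for negative numbers other than -1 is not documented here, so the sketch rejects them as an assumption.

```python
def predicted_output_name(pdf_file, objnum):
    """Predict the output file name for a stream or XRef extraction.

    Follows the description: a trailing .pdf or .PDF is stripped, then
    either .<objnum> (positive number) or .xref (-1) is appended.
    """
    base = pdf_file
    for ext in (".pdf", ".PDF"):
        if base.endswith(ext):
            base = base[: -len(ext)]
            break
    if objnum == -1:
        return base + ".xref"
    if objnum > 0:
        return f"{base}.{objnum}"
    # Assumption: other values are not covered by the description.
    raise ValueError("stream-object-number must be positive or -1")
```

A caller could test the predicted name with os.path.exists before invoking pdftosrc. The name of an embedded source file cannot be predicted this way, since it comes from /SourceName inside the PDF file.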
options
None.
copyright
Copyright (c) 1996-2006 Han The Thanh,
<thanh[:at:]pdftex[:dot:]org>
This file is part of pdfTeX.
pdfTeX is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
pdfTeX is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with pdfTeX; if not, write to the Free Software Foundation,
Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
diagnostics
On success the exit code of pdftosrc is 0, otherwise 1.
All messages go to stderr. At program invocation, pdftosrc
prints the current version number of the program xpdf, on
which pdftosrc is based:
pdftosrc version 3.01
When pdftosrc has written the output file successfully, it
issues one of the following messages:
Source file extracted to source-file-name
or
Stream object extracted to PDF-file.stream-object-number
or
Cross-reference table extracted to PDF-file.xref
When the object given by the stream-object-number does not
contain a stream, pdftosrc issues the following error
message:
Not a Stream object
When the PDF-file can’t be opened, the error message is:
Error: Couldn't open file 'PDF-file'.
When pdftosrc encounters an invalid PDF file, the error
message (several lines) is:
Error: May not be a PDF file (continuing anyway)
(more lines)
Invalid PDF file
There are also more error messages from pdftosrc for
various kinds of broken PDF files.
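A script that wraps pdftosrc can use the exit code and the stderr text described above to find out what happened. The following Python sketch is only an illustration: the name parse_result and the returned labels are hypothetical, and it assumes that each success message appears on a single line of stderr.

```python
# Success message prefixes as listed in this section.
SUCCESS_PREFIXES = {
    "source": "Source file extracted to ",
    "stream": "Stream object extracted to ",
    "xref": "Cross-reference table extracted to ",
}


def parse_result(exit_code, stderr_text):
    """Classify a pdftosrc run from its exit code and stderr text.

    Returns (kind, detail): on success, kind is "source", "stream" or
    "xref" and detail is the output file name; on failure, kind is
    "error" and detail is the last non-empty stderr line.
    """
    lines = [ln.strip() for ln in stderr_text.splitlines() if ln.strip()]
    if exit_code == 0:
        for ln in lines:
            for kind, prefix in SUCCESS_PREFIXES.items():
                if ln.startswith(prefix):
                    return kind, ln[len(prefix):]
        # Exit code 0 but no known message: left undecided.
        return "unknown", None
    return "error", (lines[-1] if lines else None)
```

The version line printed at invocation is skipped automatically, since it matches none of the prefixes.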
environment
None.
files
Just the executable pdftosrc.
notes
An embedded source file will be written out unchanged, i.e. it
will not be uncompressed in this process.
Only the stream of the object will be written, i.e. not the
dictionary of that object.
Knowing which stream-object-number to query requires
information about the PDF file that has to be gained elsewhere,
e.g. by looking into the PDF file with an editor.
The stream extraction capabilities of pdftosrc (e.g.
regarding understood PDF versions and filter types) follow the
capabilities of the underlying xpdf program version.
Currently the generation number of the stream object is not
supported. The default value 0 (zero) is taken.
The wording stream-object-number has nothing to do with
the 'object streams' that became part of the PDF standard with
PDF-1.5 (described, for example, in the Adobe PDF Reference,
5th edition, version 1.6).
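Regarding the note on finding a suitable stream-object-number: instead of reading the PDF file in an editor, candidate numbers can be listed with a small script. The following Python sketch is a rough heuristic, not a PDF parser. The name list_stream_objects is hypothetical; the sketch scans the raw bytes for "N G obj ... endobj" blocks containing a stream keyword, so it can be fooled by binary stream data and does not see objects stored inside compressed object streams.

```python
import re

# "N G obj ... endobj", matched non-greedily over the raw file bytes.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b", re.DOTALL)
# The "stream" keyword followed by an end-of-line marker; the word
# boundary keeps "endstream" from matching.
STREAM_RE = re.compile(rb"\bstream\r?\n")


def list_stream_objects(pdf_bytes):
    """Return (object number, generation, is_source_file) per stream object.

    is_source_file is True when the text before the stream keyword
    mentions /SourceFile.
    """
    results = []
    for m in OBJ_RE.finditer(pdf_bytes):
        objnum, gen, body = int(m.group(1)), int(m.group(2)), m.group(3)
        s = STREAM_RE.search(body)
        if s is None:
            continue  # object without a stream
        dictionary = body[: s.start()]
        results.append((objnum, gen, b"/SourceFile" in dictionary))
    return results
```

For a file read with open(name, "rb").read(), each reported object number with generation 0 is a candidate argument for pdftosrc. Entries with a non-zero generation number fall under the limitation noted above.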
bugs
Only the first embedded source file found is extracted, not
all of them.
Email bug reports to pdftex[:at:]tug[:dot:]org.
see also
xpdf, pdfimages, pdftotext, pdftex
authors
pdftosrc was written by Han The Thanh, using xpdf functionality
from Derek Noonburg.
Man page written by Hartmut Henkel.