qpdf: PDF Transformation Software

author — Sat, 25 Feb 2023 17:16:05 +0000

qpdf is both a free command-line program and an open source C++ library for structural, content-preserving transformations of PDF files.
qpdf has been designed with very few external dependencies and is intentionally very lightweight.

It was created in 2005 by Jay Berkenbilt.

One of the main features is the capability to merge and split PDF files by selecting pages from one or more input files.
It can also perform a variety of transformations such as linearization (also known as web optimization or fast web viewing), encryption, and decryption of PDF files.

qpdf Online Documentation

qpdf Local Documentation: /usr/share/doc/qpdf/qpdf-manual.html

Portable Document Format

Adobe, led by co-founder Dr. John Warnock, created PDF in 1992 as an easy, reliable way to present and exchange documents regardless of the software, hardware, or operating system being used.
Today it is one of the most trusted file formats in the world and can be viewed on practically any operating system.

PDF was standardized as ISO 32000 in 2008 as an open standard.
The PDF format is now maintained by the International Organization for Standardization (ISO).
The ISO 32000-2:2020 edition was published in December 2020 and does not include any proprietary technologies.

The PDF specification also provides for encryption (in which case a password is needed to view or edit the contents), digital signatures (to provide secure authentication), file attachments, and metadata.
PDF 2.0 defines 256-bit AES encryption as the standard for encrypted files.

The standard security provided by PDF consists of two different passwords:

– user password, which encrypts the file and prevents opening

– owner password, which specifies operations that should be restricted even when the document is decrypted, which can include modifying, printing, or copying text and graphics out of the document, or adding or modifying text notes.

The user password encrypts the file, while the owner password does not; it relies instead on client software to respect the content restrictions.
An owner password can easily be removed by software.
Thus, the use restrictions that an author places on a PDF document are not secure and cannot be assured once the file is distributed.

Metadata includes information about the document and its content, such as the author’s name, document title, description, creation/modification dates, application used to create the file, keywords, copyright information, etc.

Install qpdf (Debian)

# apt install qpdf

Usage

--linearize
Create linearized (web-optimized) output file.
Linearized files are formatted in a way that allows compliant readers to begin displaying a PDF file before it is fully downloaded.
Ordinarily, the entire file must be present before it can be rendered because important cross-reference information typically appears at the end of the file.

$ qpdf --linearize infile.pdf outfile.pdf

Merge PDF files with pages selection

qpdf allows you to use the --pages option to select pages from one or more input files.

$ qpdf primary_input_file.pdf --pages . [--password=password] [page-range] [ ... ] -- outputfile.pdf

Within [ ... ] you may repeat the following: inputfile_N.pdf [--password=password] [page-range]

The special input file '.' can be used as an alias for the primary input file.
Multiple input files may be specified, and you can select specific pages from each of them.
For each input file from which pages should be extracted, specify the filename, a password (if needed) to open the file, and a page range.
Note that '--' terminates parsing of page selection flags.

--password=password specifies the password for accessing an encrypted file.
It is only needed for password-protected files.

The page range may be omitted. In this case, all pages are included.

Document-level information (metadata, outline, etc.) is taken from the primary input file (in the above example, primary_input_file.pdf) and is preserved in outputfile.pdf.
You can use --empty in place of the primary input file to start from an empty file (without any metadata, outline, etc.) and just merge selected pages from input files.

In most cases you will use the following syntax:

$ qpdf --empty --pages inputfile_1.pdf [page-range] inputfile_2.pdf [page-range] inputfile_3.pdf [page-range] [ ... ] -- outputfile.pdf
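
When the list of input files comes from a script rather than from the keyboard, the same command line can be assembled programmatically. The following is a minimal Python sketch, not part of qpdf: PageSource and build_merge_command are hypothetical names invented for this illustration. It only builds the argument list described above and does not run qpdf itself.

```python
from typing import List, NamedTuple, Optional


class PageSource(NamedTuple):
    """One input file for --pages (hypothetical helper type, not part of qpdf)."""
    path: str
    page_range: Optional[str] = None  # None means "all pages"
    password: Optional[str] = None    # only for password-protected inputs


def build_merge_command(sources: List[PageSource], output: str) -> List[str]:
    """Return the argument list for: qpdf --empty --pages ... -- output."""
    cmd = ["qpdf", "--empty", "--pages"]
    for src in sources:
        cmd.append(src.path)
        if src.password is not None:
            cmd.append(f"--password={src.password}")
        # "1-z" selects every page; writing it out avoids any ambiguity
        # between an omitted page range and the next file name.
        cmd.append(src.page_range or "1-z")
    cmd.append("--")  # terminates the page selection arguments
    cmd.append(output)
    return cmd


cmd = build_merge_command(
    [PageSource("a.pdf", "1-5"), PageSource("b.pdf"), PageSource("c.pdf", "z-1", "secret")],
    "merged.pdf",
)
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True). Passing a list instead of a single string avoids shell quoting problems with file names that contain spaces.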

The page-range is a set of numbers separated by commas, ranges of numbers separated by dashes, or combinations of those.
The character 'z' represents the last page.
A number preceded by an 'r' indicates to count from the end, so r3-r1 would be the last three pages of the document.
Pages can be specified in any order (selection of any pages).
Ranges can be specified in any order (ascending or descending): a high number followed by a low number causes the pages to appear in reverse.
Numbers may be repeated in a page range.
A page range may be optionally appended with :even or :odd to indicate only the even or odd pages in the given range.
Note that even and odd refer to the positions within the specified range, not whether the original page number is even or odd.

Example page ranges:

1,3,5-9,15-12
Pages 1, 3, 5, 6, 7, 8, 9, 15, 14, 13, and 12 in that order

z-1
All pages in the document in reverse

r3-r1
The last three pages of the document

r1-r3
The last three pages of the document in reverse order

1-20:even
Even pages from 2 to 20

5,7-9,12:odd
Pages 5, 8 and 12, which are the pages in odd positions from among the original range (pages 5, 7, 8, 9, and 12)
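
To make the rules above concrete, here is a small Python model of the page-range syntax. It is a sketch written for this article, not qpdf's own parser: expand_page_range is a hypothetical name, and the function covers only the syntax described here (numbers, 'z', 'rN', dashes, commas, and the :even / :odd suffix).

```python
from typing import List


def expand_page_range(spec: str, num_pages: int) -> List[int]:
    """Expand a qpdf-style page range into a list of 1-based page numbers."""
    # An optional ':even' or ':odd' suffix applies to the whole range.
    parity = None
    if ":" in spec:
        spec, parity = spec.rsplit(":", 1)
        if parity not in ("even", "odd"):
            raise ValueError(f"unknown modifier: {parity}")

    def resolve(token: str) -> int:
        if token == "z":                       # last page
            page = num_pages
        elif token.startswith("r"):            # count from the end, r1 = last page
            page = num_pages + 1 - int(token[1:])
        else:
            page = int(token)
        if not 1 <= page <= num_pages:
            raise ValueError(f"page {token} is out of range")
        return page

    pages: List[int] = []
    for part in spec.split(","):
        if "-" in part:
            first, last = (resolve(t) for t in part.split("-", 1))
            step = 1 if first <= last else -1  # high-low means reverse order
            pages.extend(range(first, last + step, step))
        else:
            pages.append(resolve(part))

    # even/odd refer to positions in the expanded list, not to page numbers.
    if parity == "odd":
        pages = pages[0::2]
    elif parity == "even":
        pages = pages[1::2]
    return pages


print(expand_page_range("1,3,5-9,15-12", 20))
print(expand_page_range("5,7-9,12:odd", 20))
```

Running the function on the examples listed above reproduces the page lists given there, which is a quick way to check a range before passing it to qpdf.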

For example, to extract pages 1 through 5 from infile.pdf while preserving all metadata associated with that file in outfile.pdf:
$ qpdf infile.pdf --pages . 1-5 -- outfile.pdf

If you want pages 1 through 5 from infile.pdf without any metadata, use
$ qpdf --empty --pages infile.pdf 1-5 -- outfile.pdf

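Merge all .pdf files in the current directory (the shell expands *.pdf in sorted order; make sure outfile.pdf does not already exist there, or it will be matched as an input too)
$ qpdf --empty --pages *.pdf -- outfile.pdf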

Split a PDF into separate PDF files

--split-pages[=n]
Write each group of n pages to a separate output file.
If n is not specified, create single pages.

Output file names are generated as follows:
If the string %d appears in the output file name, it is replaced with a range of zero-padded page numbers starting from 1.
Otherwise, if the output file name ends in .pdf (case insensitive), a zero-padded page range, preceded by a dash, is inserted before the file extension.
Otherwise, the file name is appended with a zero-padded page range preceded by a dash.

Zero padding is added to all page numbers in file names so that all the numbers are the same length, which causes the output filenames to sort lexically in numerical order.

Page ranges are a single number in the case of single-page groups or two numbers separated by a dash otherwise.

Here are some examples. In each of them, infile.pdf has 20 pages.

Output files are 01-outfile through 20-outfile with no extension
$ qpdf --split-pages infile.pdf %d-outfile

Output files are outfile-01.pdf through outfile-20.pdf
$ qpdf --split-pages infile.pdf outfile.pdf

Output files are outfile-01-04.pdf, outfile-05-08.pdf, outfile-09-12.pdf, outfile-13-16.pdf, outfile-17-20.pdf
$ qpdf --split-pages=4 infile.pdf outfile.pdf

Output files are outfile.notpdf-01 through outfile.notpdf-20.
The extension .notpdf is not treated in any special way regarding the placement of the number.
$ qpdf --split-pages infile.pdf outfile.notpdf
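
The naming rules and examples above can be modelled in a few lines of Python, which is handy when a script needs to know in advance which files --split-pages will create. This is a sketch under stated assumptions, not qpdf's implementation: split_output_names is a hypothetical name, only the first %d is replaced, and a short final group (when the page count is not a multiple of n) is assumed to still be written as two numbers.

```python
from typing import List


def split_output_names(outfile: str, num_pages: int, group_size: int = 1) -> List[str]:
    """Predict the file names produced by --split-pages[=group_size]."""
    width = len(str(num_pages))  # zero-pad every number to the same length
    names = []
    for first in range(1, num_pages + 1, group_size):
        last = min(first + group_size - 1, num_pages)
        label = f"{first:0{width}d}"
        if group_size > 1:
            # Assumption: groups are always labelled first-last when n > 1.
            label += f"-{last:0{width}d}"
        if "%d" in outfile:
            name = outfile.replace("%d", label, 1)
        elif outfile.lower().endswith(".pdf"):
            name = f"{outfile[:-4]}-{label}{outfile[-4:]}"
        else:
            name = f"{outfile}-{label}"
        names.append(name)
    return names


print(split_output_names("outfile.pdf", 20, 4))
print(split_output_names("%d-outfile", 20)[:2])
```

The first call reproduces the five names from the --split-pages=4 example; the second shows the first two names generated from the %d pattern.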

Note that metadata, outlines, and other document-level features of the original PDF file are not preserved.
For each output file, this option creates an empty PDF and copies the corresponding page (or group of n pages) into it.
If you require the document-level data, you will have to run qpdf with the --pages option once for each page.
Using --split-pages is much faster if you don’t require the document-level data.
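
If the document-level data is required, the "one qpdf run per page" approach can be scripted. The Python sketch below only generates the commands; per_page_commands and the output naming scheme are hypothetical choices made for this illustration. It assumes the page count is already known, for example from the output of qpdf --show-npages infile.pdf.

```python
from typing import List


def per_page_commands(infile: str, num_pages: int, stem: str) -> List[List[str]]:
    """Build one qpdf command per page, each using infile as the primary input."""
    width = len(str(num_pages))  # zero-pad so the names sort in page order
    commands = []
    for page in range(1, num_pages + 1):
        outfile = f"{stem}-{page:0{width}d}.pdf"
        # '.' refers back to the primary input file, so its metadata,
        # outline, etc. are carried over into every output file.
        commands.append(["qpdf", infile, "--pages", ".", str(page), "--", outfile])
    return commands


for cmd in per_page_commands("infile.pdf", 3, "outfile"):
    print(" ".join(cmd))
```

Each generated list can be executed with subprocess.run(cmd, check=True). Expect this to be noticeably slower than --split-pages, since qpdf reads the input file once per page.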

If you don’t want to split out every page, use a page range to select only the pages you want to extract.
The page range specifies which pages or ranges to take, but each extracted page is still stored in its own PDF file.