This guide shows you, from start to finish, how to:
* Install and configure tools for scrubbing metadata
* Use those tools to scrub metadata
* Follow some best practices for handling images and PDFs that you intend to publish
[TOC]
# Installation
## Windows
Before you begin, you will need to install LibreOffice, ExifTool, and QPDF.
While other document editors might be suitable, we recommend LibreOffice because it is open source, and we **strongly suggest avoiding Microsoft Office**.
### Prerequisites
* Ensure you have administrative privileges for installations and PATH edits.
* Winget must be available (included in Windows 10/11 via the App Installer; update it from the Microsoft Store if needed). If `winget` is not recognized in Command Prompt, install it manually from Microsoft's official GitHub repository: https://github.com/microsoft/winget-cli.
### LibreOffice
To install LibreOffice on Windows, download it from their official download link below and run the installer: https://www.libreoffice.org/download/download-libreoffice/
### Exiftool
To install `exiftool` on Windows, first open a Command Prompt window (press Win+R, type `cmd.exe`, and press Enter), then paste in the following command and press Enter. Follow the on-screen instructions for the installer:
```
winget install --id OliverBetz.ExifTool -e
```
### QPDF
Install `qpdf` by entering the following command into the Command Prompt:
```
winget install --id QPDF.QPDF -e
```
Run the following to check the version that was installed. Take note of it:
```
winget list --id QPDF.QPDF
```
Open `C:\Program Files` and look for a folder named `qpdf 12.2.0`; yours may have a different version number, matching the one you noted above. Copy the path to its `bin` folder, which will be something like `C:\Program Files\qpdf 12.2.0\bin`.
Search for "Edit the system environment variables" in the Windows search bar and open it. Click the "Environment Variables" button. Select "Path" under "User variables", then click "Edit". In the window that opens, click "New" and paste the path to the qpdf `bin` folder. Click OK to close each window, then open a new Command Prompt window, since prompts that are already open do not pick up the updated PATH.
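Once the PATH has been edited, it can help to confirm that both tools resolve before moving on. The following is a minimal sketch, assuming Python 3 is installed (the guide itself does not require it); the `find_tools` and `report` helpers are hypothetical names introduced only for this illustration. It works the same way on Windows and Linux.

```python
import shutil


def find_tools(names=("exiftool", "qpdf")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def report(found):
    """Return one human-readable status line per tool."""
    return [
        f"{name}: {path}" if path else f"{name}: NOT FOUND on PATH"
        for name, path in found.items()
    ]


# Print the status of both tools used in this guide.
for line in report(find_tools()):
    print(line)
```

If `qpdf` is reported as not found, re-check the `bin` folder path added above and run the check again from a newly opened prompt.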
## Linux
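Install `exiftool` and `qpdf` through your package manager. The ExifTool package name varies by distribution (for example, `libimage-exiftool-perl` on Debian and Ubuntu, or `perl-Image-ExifTool` on Fedora).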
# Scrubbing the Data
## Scrub Images
**NOTE**: It is important to scrub images **before** embedding them into your document. Scrubbing a PDF does not scrub any images inside of it.
**NOTE**: You should take backups of any images you perform this process on. Consider copying them into a new folder and then working on those copies.
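The backup step can also be scripted. This is a minimal sketch, assuming Python 3 is available; `make_working_copy` and the folder names in the usage note are hypothetical and not part of the guide's tooling.

```python
import shutil
from pathlib import Path


def make_working_copy(source, working):
    """Copy the whole image folder to a new location and return the new path.

    shutil.copytree raises FileExistsError if `working` already exists,
    so an earlier working copy is never silently overwritten.
    """
    source, working = Path(source), Path(working)
    shutil.copytree(source, working)
    return working
```

For example, `make_working_copy("originals", "images")` would leave `originals` untouched and give you an `images` folder to scrub.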
These instructions focus on JPEGs but can be adapted for other formats, such as PNG or TIFF. Use `-ext png` for PNGs, or specify several extensions with `-ext jpg -ext png`. ExifTool matches the extension literally (ignoring case), so add `-ext jpeg` if any of your files end in `.jpeg`. Scrub only the images you will embed; if your document includes vector graphics or other embedded objects (e.g., from LibreOffice Draw), they must be scrubbed separately. Make sure that all of the images have been scrubbed before you proceed to creating your PDF.
Open a Command Prompt window and change directory to the new image folder (you did read the notes at the top of this section, right?). Use the command below with the correct file path; the `/d` switch also changes drives if the folder is not on the current drive:
```
cd /d C:\[path to folder]\images
```
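Use `exiftool` to scrub all JPEG images that will be inserted into the document. The `-all=` option deletes every writable tag, `-r` also processes subfolders, and `-overwrite_original` overwrites each file in place instead of keeping an `_original` backup copy. Be sure to use the correct file path: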
```
exiftool -overwrite_original -all= -r -ext jpg C:\[path to folder]\images
```
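When several image formats are involved, the same command can be assembled programmatically. The sketch below assumes Python 3 and that `exiftool` is on the PATH; `build_image_scrub_command` and `scrub_images` are hypothetical helper names, and the only ExifTool options used are the ones shown in the command above.

```python
import subprocess


def build_image_scrub_command(folder, extensions=("jpg",)):
    """Build the exiftool argument list, with one -ext pair per extension."""
    command = ["exiftool", "-overwrite_original", "-all=", "-r"]
    for ext in extensions:
        command += ["-ext", ext]
    command.append(str(folder))
    return command


def scrub_images(folder, extensions=("jpg",)):
    """Run exiftool on the folder.

    check=True raises CalledProcessError if exiftool exits with a
    non-zero status.
    """
    subprocess.run(build_image_scrub_command(folder, extensions), check=True)
```

Calling `scrub_images("images", ("jpg", "jpeg", "png"))` would cover the three extensions in one pass. Passing the arguments as a list avoids quoting problems with folder names that contain spaces.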
## Create Your PDF
We recommend using **LibreOffice** to draft your documentation, as we cannot fully trust Microsoft Office not to embed data into the resulting PDF in ways we have not yet discovered.
In LibreOffice, export to PDF through the "File > Export As > Export as PDF" menu. For other word processing software, such as MS Word (which we do not recommend), save the document as a PDF; do not "print to PDF".
## Scrub Your PDF
Scrub all PDF metadata with the command below. Be sure to use the correct file path:
```
exiftool -overwrite_original -all= C:\[path to folder]\input.pdf
```
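Then, after scrubbing, linearize the PDF with QPDF. ExifTool's PDF edits are reversible: it appends an incremental update and leaves the original metadata in the file, where it can be recovered. Rewriting the file with QPDF discards that leftover data. Run the command from the folder that contains `input.pdf`, or use full paths: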
```
qpdf --linearize input.pdf output.pdf
```
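The two PDF steps must run in this order, and QPDF needs an output file that differs from its input. A minimal sketch of chaining them, assuming Python 3 with both tools on the PATH; `build_pdf_scrub_commands` and `scrub_pdf` are hypothetical helper names:

```python
import subprocess
from pathlib import Path


def build_pdf_scrub_commands(input_pdf, output_pdf):
    """Return the exiftool and qpdf argument lists, in the order they must run."""
    if Path(input_pdf) == Path(output_pdf):
        raise ValueError("output_pdf must differ from input_pdf")
    return [
        ["exiftool", "-overwrite_original", "-all=", str(input_pdf)],
        ["qpdf", "--linearize", str(input_pdf), str(output_pdf)],
    ]


def scrub_pdf(input_pdf, output_pdf):
    """Run both steps, stopping at the first command that fails.

    check=True raises on any non-zero exit status. qpdf uses exit code 3
    for "completed with warnings", which this sketch also treats as a failure.
    """
    for command in build_pdf_scrub_commands(input_pdf, output_pdf):
        subprocess.run(command, check=True)
```

Treating QPDF warnings as failures is a deliberately cautious choice here: it forces you to look at the warning before publishing the output file.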
To verify that your PDF has been scrubbed, run:
```
exiftool -a -G1 -s output.pdf
```
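The output should contain only file-level and structural tags, such as the file name, file size, PDF version, and page count. No user-specific information, such as an author, creator, producer, title, or creation date, should remain.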
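Reading the verification output by eye is easy to get wrong, so the check can be partly automated. The sketch below swaps the `-s` text output for ExifTool's JSON output (`-j`), which is easier to parse. It assumes Python 3 and `exiftool` on the PATH. The allow-list of groups and PDF tags is an assumption made for this illustration, not an official definition of "clean", and `find_suspect_tags` and `read_metadata` are hypothetical names.

```python
import json
import subprocess

# Assumption: tags in these groups describe the file itself, not its author.
ALLOWED_GROUPS = {"ExifTool", "System", "File"}
# Assumption: structural PDF tags that are expected to remain after scrubbing.
ALLOWED_PDF_TAGS = {"PDFVersion", "Linearized", "PageCount"}


def find_suspect_tags(metadata):
    """Return the keys of one exiftool JSON record that fall outside the allow-list."""
    suspects = []
    for key in metadata:
        group, _, tag = key.partition(":")
        if not tag:
            continue  # ungrouped keys such as "SourceFile"
        if group in ALLOWED_GROUPS:
            continue
        if group == "PDF" and tag in ALLOWED_PDF_TAGS:
            continue
        suspects.append(key)
    return suspects


def read_metadata(pdf_path):
    """Run exiftool with JSON output and return the record for the single file."""
    result = subprocess.run(
        ["exiftool", "-j", "-a", "-G1", str(pdf_path)],
        capture_output=True, encoding="utf-8", check=True,
    )
    return json.loads(result.stdout)[0]
```

A call such as `find_suspect_tags(read_metadata("output.pdf"))` returning an empty list suggests that nothing outside the allow-list remains. Any keys it does return should be reviewed by hand rather than assumed to be harmful.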
You are now the proud author of a PDF document whose metadata has been scrubbed. We know that this process is cumbersome, and we are working to have a cleaner solution available to the community soon.