Automate paper digitization with Paperless-ngx and a cheap Brother scanner
Since the COVID-19 pandemic and the inflation that followed, scanners have become almost prohibitively expensive. It is even worse if you are looking for a scanner with SANE support. Brother is one of the few manufacturers that still actively provides drivers for Linux.
With Paperless-ngx and a bash script you can clean up the paper pile at home. You don't need the most expensive model; the ADS-1200 is perfectly adequate and is available second-hand for just under 100 euros.
Preparations
In addition to the Brother scanner you need:
- A Docker-enabled host (VM or device) on which you can install Paperless-ngx. Note: There are no drivers for armhf, so unfortunately a Raspberry Pi will not work.
- A USB to Micro-USB Type B cable (length depending on where you want to place the scanner)
Install sane-utils if it is not already installed:
apt install sane-utils
Go to the official Brother website and download the latest driver.
If you are using a virtualization platform like Proxmox, you need to install the driver on it as well.
Connect the scanner and turn it on. Test if the installation worked:
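A quick check, assuming sane-utils is installed, is to ask SANE which devices it can see with scanimage -L. The helper name has_brother_device below is made up for this sketch; it only looks for the word "brother" in the device list.

```shell
# Succeeds if the text on stdin mentions a Brother device (case-insensitive).
# has_brother_device is a hypothetical helper, not part of sane-utils.
has_brother_device() {
    grep -qi 'brother'
}

# scanimage -L lists every device SANE can detect.
if command -v scanimage >/dev/null 2>&1; then
    if scanimage -L 2>/dev/null | has_brother_device; then
        echo "Brother scanner detected"
    else
        echo "No Brother scanner found; check cable, power and driver"
    fi
else
    echo "scanimage not found; install sane-utils first"
fi
```

If no Brother device shows up, the driver installation or the USB connection is the first place to look.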
Install docker and docker-compose.
Install Paperless-ngx
Paperless-ngx stands out from other document management tools through its machine learning component.
Compared to Papermerge, for example, it saves a lot of manual work. The automatic tagging becomes remarkably accurate over time.
Create a new user (called "paper" in this guide) with the necessary permissions and a new directory paperless-ngx in /opt.
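The step above could look roughly like this. It is only a sketch: the user name "paper" and the path /opt/paperless-ngx come from this guide, while the helper setup_paperless_dir and the choice to add the user to the docker group are assumptions about what "necessary permissions" means on your host.

```shell
# setup_paperless_dir is a hypothetical helper.
#   $1 = user name, $2 = directory to create
setup_paperless_dir() {
    user="$1"
    dir="$2"
    # Create the account only if it does not exist yet.
    id "$user" >/dev/null 2>&1 || useradd --create-home "$user" || return 1
    # Create the directory and hand it over to that user.
    mkdir -p "$dir" && chown "$user" "$dir"
}

# Run as root:
#   setup_paperless_dir paper /opt/paperless-ngx
#   usermod -aG docker paper   # assumption: lets the user talk to the Docker daemon
```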
Go to /opt/paperless-ngx
cd /opt/paperless-ngx
The installation is done via docker compose.
Paperless-ngx offers several docker compose variants. It is best to use the one with PostgreSQL as the database. With Tika and Gotenberg, Paperless-ngx can also parse and convert Office documents (such as .doc, .xlsx and .odt); whether you want to use them is up to you.
It is important that you either set the path for the "consume" volume manually (e.g. to a separate partition) or remember the default, since this will be your scan folder.
Paperless-ngx watches this directory for new files to process.
The previously created user "paper" needs read and write permissions for this directory.
On my setup the entry is:
Install via docker-compose up -d.
You should be able to access the Web GUI via port 8000.
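The containers can take a moment to come up. As a small sketch, assuming curl is installed and the default port 8000 is used, you can poll the web interface until it answers. wait_for_paperless is a made-up helper name.

```shell
# wait_for_paperless is a hypothetical helper: it polls a URL with curl until
# it answers or the attempts are used up.
#   $1 = URL, $2 = number of attempts (default 30), $3 = delay in seconds (default 2)
wait_for_paperless() {
    url="$1"
    tries="${2:-30}"
    delay="${3:-2}"
    while [ "$tries" -gt 0 ]; do
        # -f: treat HTTP errors as failure, -s: quiet, -o: discard the body
        if curl -fs -o /dev/null --max-time 5 "$url"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Example (run on the Docker host):
#   wait_for_paperless http://localhost:8000 && echo "Paperless-ngx is up"
```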
Create a user. The configuration is relatively intuitive.
In the admin interface, the first tag you create should be an inbox tag, e.g. named "please check".
All scanned documents are processed automatically, but during the first weeks you should still check everything manually. The inbox tag lets you filter for new documents.
Automate your paperwork
It would be nice to use the buttons on the scanner. Unfortunately, this does not work out of the box:
Brother scanners are not supported by scanbuttond/scanbd. But Brother itself provides a remedy with the tool brscan-skey.
Download the tool from the Brother website and install it:
dpkg -i brscan-skey-0.3.1-2.amd64.deb
Before you can start the tool, you need to make a few adjustments.
The default Brother scripts are a bit inconvenient and export to TIFF format. We use our own script and ImageMagick to create PDFs.
apt install imagemagick
Set up the policies in ImageMagick's policy.xml so that PDFs can be written:
<policy domain="coder" rights="read|write" pattern="{GIF,JPEG,PNG,WEBP}" />
<policy domain="coder" rights="read|write" pattern="PDF" />
Download the shell script scan.sh from GitHub.
Change the line starting with Output to your Paperless-ngx consume path.
This script scans the document in batched duplex mode, removes blank pages and saves the output as a PDF in the consume path for Paperless-ngx.
Feel free to adjust it to your needs.
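To give an idea of what such a script does, here is a minimal sketch. It is not the scan.sh mentioned above. The consume path, the blank-page threshold and all function names are assumptions, and the device and duplex options are left out because they are backend-specific (scanimage -A lists what your scanner offers).

```shell
# Hypothetical sketch of a scan-to-PDF pipeline, NOT the scan.sh from GitHub.
OUTPUT="${OUTPUT:-/opt/paperless-ngx/consume}"   # assumed consume path
BLANK_THRESHOLD="${BLANK_THRESHOLD:-0.005}"      # assumed: under 0.5% dark pixels counts as blank

# Succeeds if the dark-pixel ratio ($1) is below the threshold ($2).
is_blank() {
    awk -v r="$1" -v t="$2" 'BEGIN { exit !(r < t) }'
}

# Prints the fraction of dark pixels (0..1) of an image, using ImageMagick.
dark_ratio() {
    convert "$1" -colorspace Gray -threshold 50% -format '%[fx:1-mean]' info:
}

scan_to_pdf() {
    workdir="$(mktemp -d)" || return 1
    # Batch mode scans until the feeder is empty; scanimage then exits
    # non-zero, which is expected here.
    scanimage --format=tiff --batch="$workdir/page-%03d.tif" || true
    # Collect all non-blank pages as positional parameters.
    set --
    for page in "$workdir"/page-*.tif; do
        [ -e "$page" ] || continue
        is_blank "$(dark_ratio "$page")" "$BLANK_THRESHOLD" || set -- "$@" "$page"
    done
    # Merge the remaining pages into one PDF in the consume folder.
    if [ "$#" -gt 0 ]; then
        convert "$@" "$OUTPUT/scan-$(date +%Y%m%d-%H%M%S).pdf"
    fi
    rm -rf "$workdir"
}

# Example: scan_to_pdf
```

The threshold is a matter of taste: thin paper with print shining through from the other side may need a higher value.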
Change to /opt/brother/scanner/brscan-skey and edit brscan-skey.config
Now start brscan-skey in daemon mode:
/usr/bin/brscan-skey --bg-daemon=brscan
Place a document in the scanner and press the scan button. If everything goes well, the document should appear in Paperless-ngx within a few minutes.
If not, you can terminate the daemon with the command
Check for possible config errors with the command
Unfortunately, the scan tool does not start automatically at boot.
We work around this with a systemd service file.
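A service file for this could look roughly like the one written by the sketch below. The unit name brscan.service and the ExecStart line are taken from this guide. Everything else is an assumption, in particular Type=forking, which fits a tool that puts itself into the background. write_brscan_unit is a made-up helper name.

```shell
# write_brscan_unit is a hypothetical helper that writes a unit file to the
# path given as $1.
write_brscan_unit() {
    cat > "$1" <<'EOF'
[Unit]
Description=Brother scan key tool (brscan-skey)
After=network.target

[Service]
# Assumption: the daemon mode forks into the background.
Type=forking
ExecStart=/usr/bin/brscan-skey --bg-daemon=brscan

[Install]
WantedBy=multi-user.target
EOF
}

# Run as root, then reload systemd so it picks up the new unit:
#   write_brscan_unit /etc/systemd/system/brscan.service
#   systemctl daemon-reload
```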
Activate it with the command:
systemctl enable --now brscan.service
A few security concerns
Brother uses UDP port 54925 by default and opens it on all interfaces.
Depending on the model and setup, it makes sense to block this port with a firewall.
The Brother configuration file also lets you set a password (see above).
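As a sketch of the firewall part, assuming iptables is in use and nothing on the network needs to reach that port (for example because the scanner is attached via USB), the rule could be built like this. block_brscan_port is a made-up helper; by default it only prints the command, and it applies the rule only when APPLY=1 is set and it runs as root.

```shell
BRSCAN_PORT=54925   # UDP port named above

# Hypothetical helper: prints the iptables command (default) or applies it
# when APPLY=1 is set.
block_brscan_port() {
    rule="INPUT -p udp --dport $BRSCAN_PORT -j DROP"
    if [ "${APPLY:-0}" = "1" ]; then
        # $rule is left unquoted on purpose so it splits into arguments.
        iptables -A $rule
    else
        echo "iptables -A $rule"
    fi
}

block_brscan_port
```

Rules added this way do not survive a reboot unless you persist them with your distribution's usual mechanism.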