Converting PDF files to HTML — 14 Dec, 2016
It's a shame that we still don't have a decent PDF reader on Linux. I needed one that let me copy and paste without messing up all the formatting (the Firefox PDF reader didn't work well), and I didn't want to install a PDF reader with lots of dependencies. I like things simple; that's why I don't use a desktop environment, just i3-wm, simple applications, and scripts (lots of scripts).
I tried XPDF; it's a decent PDF reader, but its copy and paste is a little weird.
I was tired of PDF readers and needed a solution. That's when I found poppler, a PDF renderer that lets you convert PDF files to other formats.
You can install the poppler package on Arch Linux with:
pacman -S poppler
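Before running anything that depends on poppler, it can help to confirm that its pdftohtml tool is actually on your PATH. The sketch below is only an illustration: have_cmd is a hypothetical helper name, not something poppler or the script provides.

```shell
# Hypothetical helper (not part of the original script): succeeds only if
# the named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# pdftohtml is the poppler tool the conversion script relies on.
if have_cmd pdftohtml; then
    echo "pdftohtml found"
else
    echo "pdftohtml not found; install poppler first" >&2
fi
```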
I've made a simple script that uses poppler to convert the file to HTML, saves it in
/tmp/<filename> and then opens it in your default browser.
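#!/bin/bash
# convert_pdftohtml.sh
# Copyright (C) 2016 Bruno Jesus (aka strang3quark) <firstname.lastname@example.org>
#
# Distributed under terms of the MIT license.
#

# Path to the PDF, and its file name (used to name the output directory)
PDFPATH="$1"
PDFFILE=$(basename "$PDFPATH")

# With --format, pass -s so pdftohtml generates a single document
if [ "$2" == "--format" ]; then
    FORMAT="-s"
    HTMLFILE="index-html.html"
else
    FORMAT=""
    HTMLFILE="index.html"
fi

# -p keeps mkdir from failing when the directory already exists
mkdir -p "/tmp/$PDFFILE"
# $FORMAT stays unquoted so that an empty value expands to nothing
pdftohtml -p $FORMAT "$PDFPATH" "/tmp/$PDFFILE/index.html"
$BROWSER "/tmp/$PDFFILE/$HTMLFILE"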
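The script trusts that its first argument is a PDF that exists. A small guard could catch a missing or mistyped path before pdftohtml runs. This is a minimal sketch and my own addition, not the author's code; is_pdf_arg is a hypothetical name, and checking the .pdf extension is only a rough heuristic.

```shell
# Hypothetical guard (an addition, not the author's code): returns 0 only
# when the argument names an existing, readable file ending in .pdf.
is_pdf_arg() {
    case "$1" in
        *.pdf|*.PDF) [ -f "$1" ] && [ -r "$1" ] ;;
        *) return 1 ;;
    esac
}

# Possible use near the top of the script:
#   is_pdf_arg "$1" || { echo "usage: $0 file.pdf [--format]" >&2; exit 1; }
```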
The usage is very simple:
convert_pdftohtml.sh myfile.pdf - this will remove all the weird formatting
convert_pdftohtml.sh myfile.pdf --format - this will keep all the formatting
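To make it clearer where the output ends up, here is a small sketch that mirrors the path logic of the script above without calling pdftohtml. html_path_for is a hypothetical helper written for illustration; the file names it prints are simply the ones the script itself uses.

```shell
# Hypothetical helper: prints the HTML file the script would open for a
# given PDF path and optional --format flag. The body runs in a subshell
# so its variables do not leak into the caller.
html_path_for() (
    name=$(basename "$1")
    if [ "$2" = "--format" ]; then
        page="index-html.html"
    else
        page="index.html"
    fi
    printf '/tmp/%s/%s\n' "$name" "$page"
)

html_path_for myfile.pdf                 # prints /tmp/myfile.pdf/index.html
html_path_for docs/myfile.pdf --format   # prints /tmp/myfile.pdf/index-html.html
```

Note that the directory is named after the full file name, extension included, so two PDFs with the same name in different folders will share one output directory.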
If you have any alternatives or suggestions, please contact me.