Converting massive e-book collections with Calibre and GNU Parallel
Calibre deals well with massive e-book collections, but it does not offer full-text search (as far as I know). If you are on a Mac, there is a decent built-in search engine for your personal files, called Spotlight. However, Spotlight is not able to search the contents of EPUB or MOBI e-book files. This is why I decided to convert my collection to plain text.
Calibre can bulk-convert many books in parallel, but at least on my system it quite often froze after a couple of minutes. Calibre also lets you convert books from the command line with the ebook-convert command. Combined with a simple find and GNU Parallel, it can be used to convert massive collections with a one-liner:
find "$your_library_location" -type f -iname "*.mobi" | parallel --timeout 120 --progress "/Applications/calibre.app/Contents/console.app/Contents/MacOS/ebook-convert {} {.}.txt"
Parallel will try to saturate all the available CPUs on your system and can even scale out to other machines with a little more tweaking. The --timeout option kills any spawned sub-process that takes longer than the given number of seconds, which is necessary because the conversion sometimes seems to hang for no apparent reason. The --progress option gives you an idea of how many jobs have been completed so far.
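Because --timeout kills conversions that hang, a run can leave some books unconverted. One way to retry only those is to list the .mobi files that have no .txt next to them yet. This is a minimal sketch, not part of the one-liner above: pending_books is a hypothetical helper name, and it assumes that file names contain no newlines and that a killed conversion leaves no partial .txt file behind, which I have not verified.

```shell
# Hypothetical helper: list .mobi files under directory $1 that do not
# yet have a .txt file with the same base name next to them.
# Assumes file names contain no newline characters.
pending_books() {
    find "$1" -type f -iname '*.mobi' | while IFS= read -r book; do
        # ${book%.*} strips the extension, so book.mobi maps to book.txt
        [ -e "${book%.*}.txt" ] || printf '%s\n' "$book"
    done
}
```

Its output can be piped into the same parallel invocation in place of the plain find, for example: pending_books "$your_library_location" | parallel --timeout 120 --progress "/Applications/calibre.app/Contents/console.app/Contents/MacOS/ebook-convert {} {.}.txt"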
Parallel can be installed via the excellent Homebrew project by running brew install parallel. The location of the ebook-convert command may vary on other systems. If I remember correctly, Ubuntu puts it on the PATH when you install Calibre via APT, so you can omit the path there.
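Since the location differs between systems, a small lookup can keep the one-liner portable. The following is only a sketch under assumptions: find_ebook_convert is a hypothetical helper name, and it checks nothing more than the PATH and the macOS bundle path used above.

```shell
# Hypothetical helper: print the path of ebook-convert, or return
# non-zero if it cannot be found in either of the two places checked.
find_ebook_convert() {
    # Prefer whatever is on PATH (e.g. a package-manager install on Linux).
    if command -v ebook-convert >/dev/null 2>&1; then
        command -v ebook-convert
        return 0
    fi
    # Fall back to the macOS app bundle location from the one-liner above.
    mac_path="/Applications/calibre.app/Contents/console.app/Contents/MacOS/ebook-convert"
    if [ -x "$mac_path" ]; then
        printf '%s\n' "$mac_path"
        return 0
    fi
    return 1
}
```

The result can be stored once, for instance with EBOOK_CONVERT=$(find_ebook_convert), and then used in place of the hard-coded path in the parallel command.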
For me this works well enough and seems a little more robust than Calibre's own job-scheduling mechanism. Hope this helps someone else.