Project Gutenberg

Project Gutenberg is a project that gathers public domain books in many languages; its web site is http://www.gutenberg.org. The purpose of this project is to build a sustainable way to create a ZIM file that provides the Project Gutenberg ebooks in a manner similar to gutenberg.org.

Goals

A script (Python/Perl/Node.js) able to quickly create a ZIM file with all books in all languages.
The data should be scraped from www.gutenberg.org.
The texts should be available in HTML and EPUB.
The ZIM should provide a simple filtering/search solution to find content (by author, language, title, ...).

One way to achieve it

Retrieve the list of books published by Project Gutenberg in XML/RDF format.
Parse the XML/RDF and store the data in a structured form (in memory or in a local DB).
Download the necessary HTML and EPUB data from gutenberg.org, based on the XML/RDF catalog, into a target directory.
Create the necessary templates for the index web pages (for the search/filter feature, a client-side JavaScript solution should be tried).
Fill the HTML templates with the data from the XML/RDF and write the index pages to the target directory.
Run zimwriterfs on the target directory to create the corresponding ZIM file.
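
The parsing and templating steps above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the sample record is hand-written and simplified, its element names and namespaces are assumed to follow the Project Gutenberg catalog and should be checked against a real catalog file, and the names NS, SAMPLE_RDF, parse_book, INDEX_TEMPLATE and render_index are hypothetical helpers, not part of any existing tool. Downloading the files and running zimwriterfs are left out.

```python
import html
import xml.etree.ElementTree as ET
from string import Template

# Namespace URIs assumed to match the Gutenberg RDF catalog; verify on real data.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}

# Hand-written, simplified record used only for illustration.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/1342">
    <dcterms:title>Pride and Prejudice</dcterms:title>
    <dcterms:creator>
      <pgterms:agent>
        <pgterms:name>Austen, Jane</pgterms:name>
      </pgterms:agent>
    </dcterms:creator>
    <dcterms:language>
      <rdf:Description>
        <rdf:value>en</rdf:value>
      </rdf:Description>
    </dcterms:language>
  </pgterms:ebook>
</rdf:RDF>
"""


def parse_book(rdf_text):
    """Turn one RDF record into a plain dict (the 'structured form')."""
    root = ET.fromstring(rdf_text)
    ebook = root.find("pgterms:ebook", NS)
    about = ebook.get("{%s}about" % NS["rdf"])  # e.g. "ebooks/1342"
    return {
        "id": int(about.rsplit("/", 1)[-1]),
        "title": ebook.findtext("dcterms:title", default="", namespaces=NS),
        "authors": [
            n.text
            for n in ebook.findall("dcterms:creator/pgterms:agent/pgterms:name", NS)
        ],
        "languages": [
            v.text
            for v in ebook.findall("dcterms:language/rdf:Description/rdf:value", NS)
        ],
    }


INDEX_TEMPLATE = Template(
    "<!DOCTYPE html>\n"
    "<html><head><meta charset=\"utf-8\"><title>$title</title></head>\n"
    "<body><ul>\n$rows\n</ul></body></html>\n"
)


def render_index(books, title="Project Gutenberg"):
    """Fill the index template with one escaped list item per book."""
    rows = "\n".join(
        '<li data-lang="%s">%s (%s)</li>'
        % (
            html.escape(",".join(b["languages"])),
            html.escape(b["title"]),
            html.escape("; ".join(b["authors"])),
        )
        for b in books
    )
    return INDEX_TEMPLATE.substitute(title=html.escape(title), rows=rows)


if __name__ == "__main__":
    print(render_index([parse_book(SAMPLE_RDF)]))
```

Keeping each book as a plain dict means the same data can later be written out as JSON for a client-side JavaScript filter, or stored in a local DB, without changing the parsing code.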
