Project Gutenberg
Project Gutenberg is a project that collects public domain books in many languages; its website is http://www.gutenberg.org. The purpose of this project is to create a sustainable solution for building a ZIM file that provides the Project Gutenberg ebooks in a manner similar to gutenberg.org.
Goals
- A script (Python, Perl or Node.js) able to quickly create a ZIM file containing all books in all languages.
- The data should be scraped from www.gutenberg.org.
- The texts should be available in HTML and EPUB.
- The ZIM file should provide a simple filtering/search solution to find content (by author, language, title, ...).
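The filtering goal boils down to matching book records against a few optional criteria. This page suggests trying a client-side JavaScript solution inside the ZIM, so the Python sketch below only illustrates the matching logic and the idea of generating a JSON index at build time that such a script could load. The record fields (`title`, `author`, `lang`), the sample records, and the functions `filter_books` and `build_index_json` are hypothetical.

```python
import json

# Illustrative records (hypothetical fields); a real build would derive them
# from the parsed XML/RDF catalog.
BOOKS = [
    {"title": "Pride and Prejudice", "author": "Austen, Jane", "lang": "en"},
    {"title": "Emma", "author": "Austen, Jane", "lang": "en"},
    {"title": "Faust", "author": "Goethe, Johann Wolfgang von", "lang": "de"},
]


def filter_books(books, author=None, lang=None, title=None):
    """Return the books matching every given criterion.

    Language must match exactly; author and title are case-insensitive
    substring matches. Criteria left as None are ignored.
    """
    def matches(book):
        if lang is not None and book["lang"] != lang:
            return False
        if author is not None and author.casefold() not in book["author"].casefold():
            return False
        if title is not None and title.casefold() not in book["title"].casefold():
            return False
        return True

    return [book for book in books if matches(book)]


def build_index_json(books):
    """Serialize the records, sorted by title, as a JSON index for a client-side filter."""
    ordered = sorted(books, key=lambda book: book["title"].casefold())
    return json.dumps(ordered, ensure_ascii=False)


if __name__ == "__main__":
    print([b["title"] for b in filter_books(BOOKS, author="austen")])
    # prints ['Pride and Prejudice', 'Emma']
    print([b["title"] for b in filter_books(BOOKS, lang="de")])
    # prints ['Faust']
```

Writing the index as one JSON file keeps the index pages static: the filtering then runs entirely in the reader's browser, which suits an offline ZIM file.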
One way to achieve it
- Retrieve the list of books published by Project Gutenberg in XML/RDF format
- Parse the XML/RDF and store the data in a structured way (in memory or in a local DB)
- Download the necessary HTML and EPUB files from gutenberg.org, based on the XML/RDF catalog, into a target directory
- Create the templates for the index web pages (for the search/filter feature, a client-side JavaScript solution should be tried)
- Fill the HTML templates with the data from the XML/RDF catalog and write the index pages to the target directory
- Run zimwriterfs to create the ZIM file corresponding to the target directory
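The first two steps (parsing the XML/RDF catalog and storing it in a local DB) can be sketched with the Python standard library alone. This is a minimal sketch, not a tested scraper: the namespace URIs and the element layout (`pgterms:ebook`, `dcterms:title`, `dcterms:creator/pgterms:agent/pgterms:name`, `dcterms:language/rdf:Description/rdf:value`) are assumptions about the Gutenberg RDF files and should be checked against a real download. The `Book` class, the functions `parse_rdf` and `store_books`, and the SQLite schema are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Namespace URIs assumed from Project Gutenberg's RDF files;
# verify them against a freshly downloaded catalog.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}

# Trimmed, illustrative record in the assumed layout (not a verbatim catalog file).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/1342">
    <dcterms:title>Pride and Prejudice</dcterms:title>
    <dcterms:creator>
      <pgterms:agent rdf:about="2009/agents/68">
        <pgterms:name>Austen, Jane</pgterms:name>
      </pgterms:agent>
    </dcterms:creator>
    <dcterms:language>
      <rdf:Description><rdf:value>en</rdf:value></rdf:Description>
    </dcterms:language>
  </pgterms:ebook>
</rdf:RDF>"""


@dataclass
class Book:
    book_id: str
    title: str
    authors: list[str] = field(default_factory=list)
    languages: list[str] = field(default_factory=list)


def parse_rdf(xml_text: str) -> list[Book]:
    """Extract id, title, authors and language codes from every pgterms:ebook element."""
    root = ET.fromstring(xml_text)
    books = []
    for ebook in root.findall("pgterms:ebook", NS):
        # rdf:about looks like "ebooks/1342"; keep only the last path segment.
        about = ebook.get(f"{{{NS['rdf']}}}about", "")
        title = ebook.findtext("dcterms:title", default="", namespaces=NS).strip()
        authors = [
            name.text.strip()
            for name in ebook.findall("dcterms:creator/pgterms:agent/pgterms:name", NS)
            if name.text
        ]
        languages = [
            value.text.strip()
            for value in ebook.findall("dcterms:language/rdf:Description/rdf:value", NS)
            if value.text
        ]
        books.append(Book(about.rsplit("/", 1)[-1], title, authors, languages))
    return books


def store_books(conn: sqlite3.Connection, books: list[Book]) -> None:
    """Persist parsed books in a local SQLite table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books "
        "(id TEXT PRIMARY KEY, title TEXT, authors TEXT, languages TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO books VALUES (?, ?, ?, ?)",
        # Authors are joined with "; " because names already contain commas.
        [(b.book_id, b.title, "; ".join(b.authors), ",".join(b.languages)) for b in books],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_books(conn, parse_rdf(SAMPLE))
    print(conn.execute("SELECT * FROM books").fetchall())
    # prints [('1342', 'Pride and Prejudice', 'Austen, Jane', 'en')]
```

For a full build, each RDF file's text would be passed to `parse_rdf` and the results accumulated in a file-backed SQLite database instead of `:memory:`, so the download and templating steps can query it without re-parsing the catalog.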