Project Gutenberg

From Kiwix
Revision as of 16:39, 2 January 2014 by Kelson (talk | contribs) (Created page with "The '''Project Gutenberg''' is a project gathering public domain books in different language, its web site is http://www.gutenberg.org. The purpose of this project is to creat...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

The Project Gutenberg is a project gathering public domain books in different language, its web site is http://www.gutenberg.org. The purpose of this project is to create a sustainable solution to create a ZIM file providing the Gutenberg project ebooks in the similar manner like gutenberg.org

Goals

  • A script (python/perl/nodejs) able to create quickly a ZIM file with all books in all languages.
  • The data should be scraped from www.gutemberg.org.
  • The texts should be available in HTML and EPUB.
  • The ZIM should provide a simple filtering/search solution to find content (by author, language, title, ....)

One way to achieve it

  1. Retrieve the list of books is published by the Gutenberg project in XML/RDF format
  2. Parse the XML/RDF and put the data in a structured manner (memory or local DB)
  3. Download the necessary HTML+EPUB data from Gutemberg.org based on the XML/RDF Catalog in a target directory
  4. Create the necessary templates of the index web pages (For the search/filter feature, a javascript client side solution should be tried)
  5. Fill the HTML templates with the data from the XML/RDF and write the index pages in a target directory
  6. Run zimwriterfs to create the corresponding ZIM file of your target directory