Difference between revisions of "Project Gutenberg"

Line 3:
== Goals ==
* A script (Python/Perl/Node.js) able to quickly create a ZIM file with all books in all languages.
* The data should be scraped from www.gutenberg.org.
* The texts should be available in HTML and EPUB.
* The ZIM should provide a simple filtering/search solution to find content (by author, language, title, ...).
Line 10:
# Retrieve the list of books published by the Gutenberg project in [http://www.gutenberg.org/cache/epub/feeds/rdf-files.tar.bz2 XML/RDF format]
# Parse the XML/RDF and store the data in a structured form (in memory or in a local DB)
# Download the necessary HTML+EPUB data from Gutenberg.org into a target directory, based on the XML/RDF catalog
# Create the necessary templates for the index web pages (for the search/filter feature, a client-side JavaScript solution should be tried)
# Fill the HTML templates with the data from the XML/RDF and write the index pages in a target directory
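
The parsing and template-filling steps above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the sample record is hand-written and simplified, the element names and namespace URIs are modeled on the Gutenberg RDF catalog and should be checked against the real files, and the helper names (<code>parse_book</code>, <code>render_index</code>) and the <code>1342.html</code> file naming are hypothetical. The download step is left out so that the sketch runs without network access.

```python
import html
import xml.etree.ElementTree as ET
from string import Template

# Namespace URIs as assumed for the Gutenberg RDF catalog; verify against real files.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}

# Hand-written, simplified sample record (not copied from the real catalog).
SAMPLE_RDF = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/1342">
    <dcterms:title>Pride and Prejudice</dcterms:title>
    <dcterms:creator>
      <pgterms:agent>
        <pgterms:name>Austen, Jane</pgterms:name>
      </pgterms:agent>
    </dcterms:creator>
    <dcterms:language>
      <rdf:Description>
        <rdf:value>en</rdf:value>
      </rdf:Description>
    </dcterms:language>
  </pgterms:ebook>
</rdf:RDF>
"""


def parse_book(rdf_text):
    """Extract id, title, authors and languages from one RDF record."""
    root = ET.fromstring(rdf_text)
    ebook = root.find("pgterms:ebook", NS)
    about = ebook.get("{%s}about" % NS["rdf"])  # e.g. "ebooks/1342"
    return {
        "id": int(about.rsplit("/", 1)[-1]),
        "title": ebook.findtext("dcterms:title", default="", namespaces=NS),
        "authors": [
            n.text
            for n in ebook.findall("dcterms:creator/pgterms:agent/pgterms:name", NS)
        ],
        "languages": [
            v.text
            for v in ebook.findall("dcterms:language/rdf:Description/rdf:value", NS)
        ],
    }


# One list item per book; data-lang is a hook for a client-side filter.
ROW = Template('<li data-lang="$lang"><a href="$id.html">$title</a> ($author)</li>')


def render_index(books):
    """Fill the row template for every book and wrap the rows in a list."""
    rows = [
        ROW.substitute(
            id=b["id"],
            title=html.escape(b["title"]),
            author=html.escape(", ".join(b["authors"])),
            lang=html.escape(",".join(b["languages"])),
        )
        for b in books
    ]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


if __name__ == "__main__":
    print(render_index([parse_book(SAMPLE_RDF)]))
```

For the full catalog, the same parsing function would be applied to each record in the archive and the results kept in a list or a local DB before rendering.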