== Goals ==
* A script (Python/Perl/Node.js) able to quickly create a ZIM file with all books in all languages.
* The data should be scraped from www.gutenberg.org.
* The texts should be available in HTML and EPUB.
* The ZIM should provide a simple filtering/search solution to find content (by author, language, title, etc.).
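For the filtering/search goal, here is a minimal Python sketch of the matching logic. It assumes each book is a dict with hypothetical <code>title</code>, <code>authors</code> and <code>language</code> keys. In the ZIM itself the filtering would run client-side in JavaScript; the hypothetical <code>to_json_index</code> helper only illustrates one way the script could export the records for such a client-side filter.

```python
import json
from typing import Iterable, List, Optional


def filter_books(books: Iterable[dict], author: Optional[str] = None,
                 language: Optional[str] = None,
                 title: Optional[str] = None) -> List[dict]:
    """Return the books matching every criterion that is not None."""

    def contains(haystack: str, needle: Optional[str]) -> bool:
        # Case-insensitive substring match; a missing criterion always matches.
        return needle is None or needle.casefold() in haystack.casefold()

    return [
        b for b in books
        if contains(b["title"], title)
        and (author is None or any(contains(a, author) for a in b["authors"]))
        # Language codes (e.g. "en") are compared exactly.
        and (language is None or b["language"] == language)
    ]


def to_json_index(books: Iterable[dict]) -> str:
    """Serialize the records compactly so a client-side script could filter them."""
    return json.dumps(list(books), ensure_ascii=False, separators=(",", ":"))
```

For example, <code>filter_books(books, author="carroll", language="en")</code> keeps only English books with an author name containing "carroll". Exact matching on language codes keeps the language filter predictable, while substring matching suits free-text author and title queries.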
# Retrieve the list of books, which the Gutenberg project publishes in [http://www.gutenberg.org/cache/epub/feeds/rdf-files.tar.bz2 XML/RDF format]
# Parse the XML/RDF and store the data in a structured form (in memory or in a local DB)
# Download the necessary HTML and EPUB files from Gutenberg.org into a target directory, based on the XML/RDF catalog
# Create the templates for the index web pages (for the search/filter feature, a client-side JavaScript solution should be tried)
# Fill the HTML templates with the data from the XML/RDF and write the index pages to a target directory
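The first two steps (reading the XML/RDF catalog and storing it in a structured form) could be sketched in Python as follows. This is a minimal sketch, not the project's implementation. It assumes the tarball holds one <code>.rdf</code> file per book using the <code>rdf</code>, <code>dcterms</code> and <code>pgterms</code> namespaces listed in <code>NS</code>, with the element layout shown in the paths below. The helper names (<code>parse_rdf</code>, <code>iter_catalog</code>, <code>pick</code>, <code>store</code>) and the table layout are hypothetical. SQLite covers both storage options: <code>":memory:"</code> keeps the data in memory, while a file path gives a local DB.

```python
import sqlite3
import tarfile
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Optional

# Assumed namespaces of the Gutenberg XML/RDF catalog.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]


@dataclass
class Book:
    book_id: int
    title: str
    language: str
    authors: List[str] = field(default_factory=list)
    files: Dict[str, str] = field(default_factory=dict)  # MIME type -> URL


def parse_rdf(xml_bytes: bytes) -> Optional[Book]:
    """Extract the fields needed for the index pages from one .rdf file."""
    root = ET.fromstring(xml_bytes)
    ebook = root.find("pgterms:ebook", NS)
    if ebook is None:
        return None
    # rdf:about is assumed to look like "ebooks/11"; keep the numeric id.
    book_id = int(ebook.get(RDF_ABOUT, "").rsplit("/", 1)[-1])
    title = ebook.findtext("dcterms:title", default="", namespaces=NS).strip()
    language = ebook.findtext(
        "dcterms:language/rdf:Description/rdf:value", default="", namespaces=NS
    ).strip()
    authors = [
        (name.text or "").strip()
        for name in ebook.findall("dcterms:creator/pgterms:agent/pgterms:name", NS)
    ]
    files: Dict[str, str] = {}
    for f in ebook.findall("dcterms:hasFormat/pgterms:file", NS):
        for mime in f.findall("dcterms:format/rdf:Description/rdf:value", NS):
            # Keep the first URL seen for each MIME type.
            files.setdefault((mime.text or "").strip(), f.get(RDF_ABOUT, ""))
    return Book(book_id, title, language, authors, files)


def iter_catalog(tar_path: str) -> Iterator[Book]:
    """Yield one Book per .rdf member of the downloaded rdf-files.tar.bz2."""
    with tarfile.open(tar_path, "r:bz2") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(".rdf"):
                book = parse_rdf(tar.extractfile(member).read())
                if book is not None:
                    yield book


def pick(files: Dict[str, str], mime_prefix: str) -> Optional[str]:
    """First URL whose MIME type starts with mime_prefix (handles '; charset=...')."""
    return next((url for mime, url in files.items() if mime.startswith(mime_prefix)), None)


def store(books: Iterable[Book], db_path: str = ":memory:") -> sqlite3.Connection:
    """Store the catalog in SQLite: ':memory:' for RAM, a file path for a local DB."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books (id INTEGER PRIMARY KEY, title TEXT,"
        " language TEXT, authors TEXT, epub_url TEXT, html_url TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO books VALUES (?, ?, ?, ?, ?, ?)",
        (
            (b.book_id, b.title, b.language, "; ".join(b.authors),
             pick(b.files, "application/epub+zip"), pick(b.files, "text/html"))
            for b in books
        ),
    )
    conn.commit()
    return conn
```

Once the tarball is downloaded, <code>conn = store(iter_catalog("rdf-files.tar.bz2"), "catalog.db")</code> would build the local DB. The <code>epub_url</code> and <code>html_url</code> columns could then drive the download step, and a query such as <code>SELECT title, authors FROM books WHERE language = 'fr'</code> could feed the index templates.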