Difference between revisions of "Tools/en"

1,853 bytes added ,  14 years ago
Translation of the Generation section
(Began translating the tools page, for the benefit of non-Francophones)
 
(Translation of the Generation section)
Line 4: Line 4:


Kiwix is primarily designed as a tool to publish copies of Wikipedia, but every effort is made to ensure that it is also useful for:
* release of [http://www.wikimedia.org/ other Wikimedia Foundation projects]
* release of other content created on the MediaWiki platform.


Because the heart of Kiwix is the Gecko HTML rendering engine, the objective of the Kiwix tools is to produce:


* first, a coherent set of static HTML files and the resources they need: stylesheets, JavaScript code, images, etc.
Line 18: Line 18:


ZIM is an open, standard format created and maintained by the [http://www.openzim.org openZIM project], of which Kiwix is a founding member.  ZIM is itself based on an older format, Zeno. Zeno was created by the Berlin publishing house [http://www.digitale-bibliothek.de Directmedia] and was used for [http://www.amazon.de/Wikipedia-2007-2008-Kompakt-DVD-ROM/dp/3866400187/ref=sr_1_1?ie=UTF8&s=software&qid=1232812631&sr=8-1 the German Wikipedia released on CD-ROM].  The Zeno format was later abandoned, but we wanted to continue its development. The future will tell whether this initiative succeeds, but the goal is to establish a standard and thus simplify the problem of storing dumps for everyone. In any case, it is ''already'' the best ''free'' solution.
==Generating ZIM Files From Wikis==
The question of how to generate a dump is not a simple one.  For several reasons, Kiwix has so far concentrated on generating dumps that offer a selection of a given wiki's content, even though publishing complete Wikipedia dumps remains a clear objective.  The Kiwix tools are designed to assist with selecting entries, replicating content from the online site to a local mirror, and then converting the mirror into a ZIM file.
But this is not the only way to generate a dump: in theory, it can be done in several ways. Here is a short, non-exhaustive list of approaches:
* If you want to produce a complete dump, you can:
** obtain a ready-made HTML dump provided by the wiki administrators, such as the one [http://static.wikipedia.org/ provided here by the Wikimedia Foundation].
** set up a local mirror of the wiki by loading the data (the content of the other wiki) into the database, and then generate an HTML dump yourself. Such data is available for the Wikimedia Foundation projects [http://download.wikimedia.org/backup-index.html here].  In the case of a selection rather than a complete dump, you can also retrieve the data dynamically from the site (since the wiki's content is openly accessible).
** generate an HTML dump directly (by retrieving the HTML pages) using software such as Vacuum on the website (be careful not to overload the remote website with excessive traffic, though!).
* If you want a partial dump, you must first make a selection of items; once you have the items you want, the same process applies as for a complete dump.
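The "retrieve the pages without abusing the remote site" approach can be sketched as follows. This is only an illustration, not the actual Kiwix tooling: the names <code>mirror_pages</code> and <code>fake_fetch</code>, the one-second default delay, and the idea of passing the fetch function as a parameter are all assumptions made for the example. A real run would replace <code>fake_fetch</code> with a function that performs an HTTP request.

```python
import time
from typing import Callable, Dict, Iterable


def mirror_pages(
    titles: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch each selected page once, pausing between requests.

    `fetch` is a hypothetical callable that returns the HTML of one page.
    `delay_seconds` is an assumed throttle value, not a recommended setting.
    """
    pages: Dict[str, str] = {}
    for title in titles:
        if title in pages:
            continue  # the selection may list a page twice; fetch it only once
        if pages:
            sleep(delay_seconds)  # wait before every request after the first
        pages[title] = fetch(title)
    return pages


def fake_fetch(title: str) -> str:
    # Stand-in for a real HTTP request; returns a trivial HTML page.
    return f"<html><body><h1>{title}</h1></body></html>"


if __name__ == "__main__":
    selection = ["Kiwix", "ZIM", "Kiwix"]
    mirror = mirror_pages(selection, fake_fetch, delay_seconds=0.0)
    print(sorted(mirror))
```

Because the selection is just a list of titles, the same function covers both cases described above: a partial dump passes a short list, and a complete dump passes every title.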
There are certain constraints that should be taken into account.  Here are the most important ones:
* the hardware resources (equipment, computing power) of the server
* your own hardware resources
* the storage space you have for the final result
* how to make the selection, if one is needed.
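The storage constraint can be checked before starting a long run. The sketch below is a rough estimate only: the function names <code>directory_size</code> and <code>fits_in</code> and the <code>safety_factor</code> parameter are hypothetical, and the size of the local mirror is used as a stand-in for the size of the final result, which may differ in practice.

```python
import os
import shutil


def directory_size(path: str) -> int:
    """Return the total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):  # do not count symlink targets
                total += os.path.getsize(file_path)
    return total


def fits_in(mirror_dir: str, target_dir: str, safety_factor: float = 1.0) -> bool:
    """Check whether the mirror, scaled by an assumed factor, fits on the target disk."""
    needed = directory_size(mirror_dir) * safety_factor
    free = shutil.disk_usage(target_dir).free
    return needed <= free
```

A call such as <code>fits_in("mirror", ".", safety_factor=1.5)</code> would leave some headroom; the paths and the factor here are placeholders to adjust for your own setup.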