{{Translations}}
''These tools are deprecated. Although they may still work under certain conditions, we do not provide support for them.''

The Kiwix tools are a set of scripts (mostly in Perl) that help create content usable by Kiwix. The current development code can be found at:

<nowiki>svn co http://kiwix.svn.sourceforge.net/svnroot/kiwix/tools/ kiwix-tools</nowiki>
Kiwix is primarily designed as a tool to publish copies of Wikipedia, but every effort is made to ensure it is also useful for:
* the storage space you have for the final result
* how to make the selection if necessary.
==Prerequisites==
You'll need a number of Perl modules to run these scripts. Here is the list of modules one tester ([[User:Ijon]]) had to install given a plain Perl 5.10 installation on Ubuntu Linux; your mileage may vary. Install them using CPAN (perl -MCPAN -e shell), CPANPLUS (cpanp(1)), or your distribution's Perl packaging mechanism.
* Array::PrintCols
* Getargs::Long
* HTML::Parser
* HTML::Tagset
* LWP
* Log::Agent
* Log::Log4perl
* Term::Query
* URI
* XML::DOM
* XML::NamespaceSupport
* XML::Parser
* XML::Parser::PerlSAX
* XML::RegExp
* XML::SAX
* XML::SAX::Expat
* XML::Simple
That tester managed to install all of them by installing just the following subset and allowing automatic installation of dependencies:
* XML::Simple
* XML::DOM
* Term::Query
* Array::PrintCols
* Log::Log4perl
* Getargs::Long
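Before installing anything, you can quickly check which of these modules are already present. The snippet below is a small sketch, not part of the Kiwix tools themselves; it relies only on perl(1) being installed:

```shell
# Check whether a required Perl module can be loaded.
# "perl -MModule -e1" exits non-zero when the module is missing.
check_perl_module() {
    if perl -M"$1" -e1 2>/dev/null; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

# Modules from the subset above; extend with the full list as needed.
for mod in XML::Simple XML::DOM Term::Query Array::PrintCols Log::Log4perl Getargs::Long; do
    check_perl_module "$mod"
done
```

Any module reported as missing can then be installed via CPAN or via the Debian/Ubuntu packages listed in the next section.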
=== Debian/Ubuntu dependencies ===
<pre>sudo apt-get install liblog-log4perl-perl libdata-dumper-simple-perl libxml-simple-perl \
libxml-libxml-perl libarray-printcols-perl libgetargs-long-perl \
liburi-perl libhtml-linkextractor-perl \
libhtml-parser-perl libdbd-pg-perl</pre>
==Usage==
Here is a list of the available scripts (many of them are specific to Mediawiki):
===Mediawiki Maintenance===
* {{ScriptTool|backupMediawikiInstall.pl}} creates a tgz archive of a complete existing Mediawiki installation (code + resources + database).
* {{ScriptTool|installMediawiki.pl}} brings up an instance of Mediawiki from source code without human intervention, simulating the manual Mediawiki installation process.
* {{ScriptTool|resetMediawikiDatabase.pl}} empties a local instance of Mediawiki of all its pages.
===Mirroring Tools===
* {{ScriptTool|buildHistoryFile.pl}} given a list of articles and an online Mediawiki site, obtains the complete history of each page on the list.
** {{ScriptTool|extractContributorsFromHistoryFile.pl}} extracts a list of authors from the histories obtained by the buildHistoryFile.pl script.
* {{ScriptTool|buildContributorsHtmlPages.pl}} given a template and a list of authors, builds a custom set of HTML pages listing all of the authors.
* {{ScriptTool|checkMediawikiPageCompleteness.pl}} checks whether the local copies of pages from an online Mediawiki site are complete, i.e. that none of their dependencies (template files, multimedia resources, etc.) are missing.
* {{ScriptTool|checkPageExistence.pl}} given a list of page titles and an online Mediawiki site, checks whether those pages exist on it. This is handy, for example, to see which pages have been replicated.
* {{ScriptTool|checkRedirects.pl}} checks for pages redirecting to non-existent pages (i.e. broken redirects). Eventually, it should also detect pages redirecting to each other.
* {{ScriptTool|listAllImages.pl}} lists all images of an online Mediawiki site.
* {{ScriptTool|listAllPages.pl}} lists all pages of an online Mediawiki site.
* {{ScriptTool|listCategoryEntries.pl}} lists the pages belonging to a category, recursively.
* {{ScriptTool|listRedirects.pl}} lists the page redirects of an online Mediawiki site.
* {{ScriptTool|mirrorMediawikiCode.pl}} downloads the exact Mediawiki version used by an online Mediawiki site; this includes both the Mediawiki code and the Mediawiki extensions.
* {{ScriptTool|mirrorMediawikiInterwikis.pl}} configures a local Mediawiki site with interwikis (cross-language links) identical to those of an online Mediawiki site.
* {{ScriptTool|mirrorMediawikiPages.pl}} copies a set of pages and their dependencies (template and multimedia resources) from an online Mediawiki site to a local Mediawiki site.
* {{ScriptTool|modifyMediawikiEntry.pl}} deletes or replaces a list of pages on an online Mediawiki site.
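Taken together, the mirroring tools support a typical replication workflow. The sketch below is only a dry run that prints the intended order of operations; it deliberately shows no command-line options, since those should be taken from each script's source before running anything:

```shell
# Typical order in which the mirroring scripts would be run when
# replicating an online Mediawiki site locally. This only prints
# the steps; it does not execute any of the scripts.
for step in \
    listAllPages.pl \
    mirrorMediawikiCode.pl \
    mirrorMediawikiInterwikis.pl \
    mirrorMediawikiPages.pl \
    checkMediawikiPageCompleteness.pl \
    checkRedirects.pl
do
    echo "step: $step"
done
```

The idea is: enumerate the pages, mirror the matching Mediawiki code and interwiki configuration, copy the pages themselves, then verify completeness and redirects.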
===Dumping Tools===
* [http://kiwix.svn.sourceforge.net/viewvc/kiwix/dumping_tools/scripts/checkEmptyFilesInHtmlDirectory.pl?view=log checkEmptyFilesInHtmlDirectory.pl] checks whether a directory and its subdirectories contain empty files.
* [http://kiwix.svn.sourceforge.net/viewvc/kiwix/dumping_tools/scripts/dumpHtml.pl?view=log dumpHtml.pl] given a local Mediawiki site, makes an all-static copy of its pages, i.e. creates a directory with all the needed HTML.
* [http://kiwix.svn.sourceforge.net/viewvc/kiwix/dumping_tools/scripts/launchTntreader.pl?view=log launchTntreader.pl] easily launches the tntreader program.
* [http://kiwix.svn.sourceforge.net/viewvc/kiwix/dumping_tools/scripts/optimizeContents.pl?view=log optimizeContents.pl] optimizes a directory of HTML pages and resources. This script calls the following external tools: [http://tidy.sourceforge.net/ HTML Tidy] for HTML files and the [http://sourceforge.net/projects/littleutils/ Little utils] for images.
===ZIM Generation===
* [http://kiwix.svn.sourceforge.net/viewvc/kiwix/dumping_tools/scripts/buildZimFileFromDirectory.pl?view=log buildZimFileFromDirectory.pl] creates a ZIM file from a directory tree containing static HTML and other content files.
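The dumping and ZIM-generation scripts combine into a simple pipeline: dump static HTML, sanity-check and optimize it, then pack it into a ZIM file. The sketch below is a dry run that only echoes the commands; passing the directory as a single positional argument is an assumption, so check each script's usage before running it for real:

```shell
# Dry-run sketch of the HTML-to-ZIM pipeline. The "run" wrapper only
# echoes the command; replace it with direct invocation to execute.
# The single-directory-argument convention is an assumption.
run() { echo "would run: $*"; }

HTML_DIR="./static_html"    # hypothetical output directory of dumpHtml.pl

run checkEmptyFilesInHtmlDirectory.pl "$HTML_DIR"   # catch truncated/empty files
run optimizeContents.pl "$HTML_DIR"                 # tidy HTML, shrink images
run buildZimFileFromDirectory.pl "$HTML_DIR"        # pack everything into a .zim
```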
== Virtual machine ==
We have prepared a virtual machine to help people make ZIM files from their HTML files. Download it [http://download.kiwix.org/dev/ZIMmakerVMv3.ova here]. The Unix login/password is root/kiwix; the PostgreSQL login/password is postgres/kiwix. To build your ZIM file, go to root/dumping_tools/scripts and use buildZimFileFromDirectory.pl.
== See also ==
* [[:File:Pediapress zim creation approach with mediawiki collection extension.jpg]]
* [[:File:Kiwix zim creation approach.jpg]]
[[Category:Developer's Guide]]