Difference between revisions of "Mediawiki DumpHTML extension improvement"

Line 28: Line 28:
== Workpackages ==
== Workpackages ==


=== 1 - Revamping and fixing bugs ===
=== 1 - phpzim creation ===
 
This workpackage covers the creation of a ZIM [devzone.zend.com/303/extension-writing-part-i-introduction-to-php-and-zend/ PHP extension] called ''[[phpzim]]''. phpzim is an extension that allows PHP developers to read and write ZIM files. Like [https://github.com/pediapress/pyzim pyzim], the Python extension for handling ZIM files, it is based on the [http://www.openzim.org/Zimlib zimlib]. phpzim is essential to:
* speed up ZIM creation (avoiding the need for a PostgreSQL database and the [http://www.openzim.org/Zimwriter zimwriter] binary)
* integrate ZIM generation directly into DumpHTML
* let the many CMSs written in PHP generate ZIM files as well
 
Deliverables:
* Create a tgz of the zimlib containing only what phpzim needs
* Write the C++ code of the phpzim PHP extension, using the GNU tools for compilation
* phpzim should offer an easy API to read/write ZIM files with all the necessary options
* The phpzim code should be developed online in the openZIM Subversion repository and also be available as a directly compilable tgz
* Code usage should be documented, and the documentation should be generated automatically using doxygen or a similar tool
* Rewrite and improve [http://kiwix.svn.sourceforge.net/viewvc/kiwix/dumping_tools/scripts/buildZimFileFromDirectory.pl?view=log buildZimFileFromDirectory.pl] in PHP (working directly with the zimlib)
 
Costs:
* ~ 4000 euros
 
{{rellink|More details: [[phpzim]]}}
 
=== 2 - Revamping and fixing bugs ===


The worst point is that the DumpHTML extension is not properly maintained and, over time, [https://bugzilla.wikimedia.org/buglist.cgi?query_format=advanced&list_id=2671&component=DumpHTML&resolution=---&product=MediaWiki%20extensions many issues have been discovered]. Currently, the extension is not really usable without fixing/tweaking the MediaWiki code.
The worst point is that the DumpHTML extension is not properly maintained and, over time, [https://bugzilla.wikimedia.org/buglist.cgi?query_format=advanced&list_id=2671&component=DumpHTML&resolution=---&product=MediaWiki%20extensions many issues have been discovered]. Currently, the extension is not really usable without fixing/tweaking the MediaWiki code.
Line 42: Line 62:
* ~ 4000 euros
* ~ 4000 euros


=== 2 - phpzim creation and integration in DumpHTML extension ===
=== 3 - phpzim Integration ===


phpzim would be a new [http://pecl.php.net/ PHP extension] that allows creating/writing and reading ZIM files directly in PHP. It would be a binding of the [http://www.openzim.org/Zimlib zimlib], like [https://code.launchpad.net/zim/pyzim pyzim] in Python. Once this library is done, we will be able to create ZIM files directly from DumpHTML.
phpzim would be a new [http://pecl.php.net/ PHP extension] that allows creating/writing and reading ZIM files directly in PHP. It would be a binding of the [http://www.openzim.org/Zimlib zimlib], like [https://code.launchpad.net/zim/pyzim pyzim] in Python. Once this library is done, we will be able to create ZIM files directly from DumpHTML.
Line 51: Line 71:


Deliverables:
Deliverables:
* phpzim (40 hours)
* updated dumpHTML (40 hours)
* updated dumpHTML (20 hours)


Costs:
Costs:
* ~ 2500 euros
* ~ 1500 euros


=== 3 - Integrating Collection and DumpHTML extensions and new features ===
=== 4 - Integrating Collection and DumpHTML extensions and new features ===


By integrating the DumpHTML and Collection extensions, we want to give everyone the ability to easily create small ZIMs from the Wikipedia user interface, with the following advantages:
By integrating the DumpHTML and Collection extensions, we want to give everyone the ability to easily create small ZIMs from the Wikipedia user interface, with the following advantages:
