Difference between revisions of "Mediawiki DumpHTML extension improvement"

no edit summary
Line 1: Line 1:
The '''DumpHTML extension improvement''' is a project that needs to be funded as soon as possible in order to provide an efficient, simple to use and simple to deploy way for people to take their MediaWiki-based wiki offline.
== Context ==
The ZIM format was chosen by the main actors around MediaWiki to provide an offline version of their content. It was designed to deal efficiently with huge amounts of data. This format is complementary to [https://secure.wikimedia.org/wikipedia/en/wiki/EPUB EPUB], which is better suited to small content.
However, we still lack tools to build such files, and only a few people have the necessary know-how to do it.
We have currently
The [http://www.mediawiki.org/wiki/Extension:DumpHTML MediaWiki DumpHTML extension] is the best solution for exporting dynamically generated HTML pages as a set of static HTML and media files. Compared with, for example, a website mirroring tool or an external rendering solution, this extension is better and has more potential for producing a good set of HTML pages from a MediaWiki. This is especially true if you deal with a large amount of content, and it is in fact the solution used for the big ZIM files [[Template:ZIMdumps|you can already download from the Kiwix website]].


Line 6: Line 15:
* Does not generate ZIM files


== Challenges ==
Consequently, almost nobody uses it to generate ZIM files right now because it is too complicated, even though many people want to do so and contact the Kiwix development team for help making a ZIM of their own content. External projects are not the only ones that would benefit from such a development: we would also gain a lot in efficiency, and this would be the first mandatory step towards preparing ZIM files automatically.


== Workpackages ==
=== Workpackage1: Revamping and fixing bugs ===


The worst point is that the DumpHTML extension is not properly maintained and, over time, [https://bugzilla.wikimedia.org/buglist.cgi?query_format=advanced&list_id=2671&component=DumpHTML&resolution=---&product=MediaWiki%20extensions many issues have been discovered]. Currently, the extension is not really usable without fixing or tweaking the MediaWiki code.
Line 22: Line 33:
* ~ 4000 euros


=== Workpackage2: phpzim creation and integration in DumpHTML extension ===


phpzim would be a new PHP module for creating, writing and reading ZIM files directly in PHP. It would be a binding of zimlib, like pyzim in Python. Once this library is done, we will be able to create ZIM files directly from DumpHTML.
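
To illustrate the kind of input such a writer would need, here is a minimal sketch in Python (not PHP, and not using zimlib or pyzim). It only walks a directory of static files, such as a DumpHTML output folder, and lists each file as a candidate ZIM entry. The function name <code>collect_entries</code>, the entry fields and the MIME table are assumptions made for this example; they are not part of DumpHTML, zimlib or the proposed phpzim.

```python
from pathlib import Path

# Hypothetical, deliberately small MIME table; a real tool would cover more types.
MIME_BY_SUFFIX = {
    ".html": "text/html",
    ".css": "text/css",
    ".js": "application/javascript",
    ".png": "image/png",
    ".jpg": "image/jpeg",
}


def collect_entries(dump_root):
    """Walk a static dump directory and describe each file as a candidate entry.

    Returns a list of dicts (url, mime, size) sorted by url. The entry layout
    is an assumption for illustration, not the ZIM on-disk format.
    """
    root = Path(dump_root)
    entries = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue  # skip directories
        entries.append({
            # URL of the entry relative to the dump root, with forward slashes
            "url": path.relative_to(root).as_posix(),
            # Unknown suffixes fall back to a generic binary type
            "mime": MIME_BY_SUFFIX.get(path.suffix.lower(), "application/octet-stream"),
            "size": path.stat().st_size,
        })
    entries.sort(key=lambda entry: entry["url"])
    return entries
```

A writer binding would then consume such a list and add each file to the ZIM archive; that step is left out here because the phpzim interface is not yet defined.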
Line 37: Line 48:
* ~ 2500 euros


=== Workpackage3: Integrating Collection and DumpHTML extensions and new features ===


By integrating the DumpHTML and Collection extensions, we want to give everyone the ability to easily create small ZIM files from the Wikipedia user interface, with the following advantages:
Line 55: Line 66:
Costs:
* ~ 3500 euros
== Realisation ==
The realisation of the whole project would take 4 months and would be supervised by Kelson (creator and lead developer of Kiwix) and a member of WMFR. Payment for each workpackage would be made after validation by both supervisors.
If you want to know more:
* Kiwix presentation document (in French): http://www.kiwix.org/images/6/6f/Kiwix_presentation_fr.pdf
* Kiwix official Web site: http://www.kiwix.org
