Difference between revisions of "Mediawiki DumpHTML extension improvement"

no edit summary
Line 1: Line 1:
The '''Mediawiki DumpHTML extension improvement''' is an effort that needs a grant as soon as possible in order to provide an efficient, simple to use (and to deploy) way for people who want to take their Mediawiki-based wiki offline.
The '''Mediawiki DumpHTML extension improvement''' is an effort that needs a grant in order to provide an efficient and handy solution for making ZIM files from Mediawiki. With the solution we propose to develop, everyone from the Mediawiki administrator to the ordinary user will be able to generate high-quality ZIM files, small or big.


== Context ==
== Context ==
The ZIM format was chosen by the top actors around Mediawiki to provide an offline-usable version of their content. The ZIM format was designed to deal efficiently with huge amounts of data. This format is complementary to [https://secure.wikimedia.org/wikipedia/en/wiki/EPUB EPUB], which is intended more for small content.
The [http://www.openzim.org ZIM] format was chosen by the top actors around Mediawiki to provide offline-usable versions of their content. [http://www.kiwix.org Kiwix] is the reference reader supporting it, but ZIM is an open format, and there are also [http://www.openzim.org/ZIM_Readers other readers].


However, we still suffer from a lack of tools to build such files, and only a few people have the necessary know-how and tools to do it:
The ZIM format was designed to deal efficiently with huge amounts of data, which means that you can handle millions of pictures and texts extremely quickly, even on a small device like a smartphone. This format is complementary to [https://secure.wikimedia.org/wikipedia/en/wiki/EPUB EPUB], which is intended more for small content and does not scale.
* Kiwix, which uses a hacked version of the Mediawiki DumpHTML extension and is currently the only project generating (and more or less able to generate) big ZIM files from WMF projects. [[Template:ZIMdumps|You may already download such ZIM files from the Kiwix Web site]].
* The Mediawiki Collection extension developed by Pediapress, which is user friendly on Wikipedia but really complicated to install on a separate instance, slow, and not able at all to deal with huge amounts of data. In addition, its technical approach means that the content cannot be tuned efficiently for offline usage.


Our year-long experience has shown us that the [http://www.mediawiki.org/wiki/Extension:DumpHTML Mediawiki DumpHTML extension] (at least the approach) is the best solution for exporting Mediawiki's dynamically generated HTML pages to a set of static HTML/media files. This extension is better at producing a good set of HTML pages from a Mediawiki, and has more potential, than for example a Web site mirroring tool or an external rendering solution.
Unfortunately, the development of the format is held back by a lack of software. We suffer from a lack of tools to build ZIM files, and currently only a few people have the necessary know-how and software solutions to do it:
* Kiwix, which uses a solution based on a hacked version of the [http://www.mediawiki.org/wiki/Extension:DumpHTML Mediawiki DumpHTML extension], with additional custom scripts. This is currently the only project generating (and more or less able to generate) big ZIM files from WMF projects. [[Template:ZIMdumps|You may download these ZIM files here]]. This solution is currently not usable at all by people outside the project.
* The Mediawiki Collection extension developed by Pediapress (which is deployed on Wikipedia) is user friendly but suffers from many issues: (1) it is really complicated to install on a separate instance, (2) it is slow and not able at all to deal with huge amounts of data, (3) its rendering quality is far from that of the online version, and (4) its technical approach means that the content rendering cannot be tuned for offline usage.


Unfortunately, there are a few pain points:
Our years of experience have shown us that the [http://www.mediawiki.org/wiki/Extension:DumpHTML Mediawiki DumpHTML extension] (at least the approach) is the best solution for exporting Mediawiki's dynamically generated HTML to an offline-usable format. Unfortunately, there are a few pain points:
* Not maintained, bugs are not fixed, new features are not implemented
* Not maintained, bugs are not fixed, new features are not implemented
* Only available to Mediawiki system administrators
* Only available to Mediawiki system administrators
Line 16: Line 16:


== Challenges ==
== Challenges ==
Consequently, almost nobody uses it right now to generate ZIM files; it is too complicated. This is the case even though a lot of people want to do so and contact the Kiwix dev team for help making a ZIM of their own content. Not only would external projects benefit from such a development; we would also gain a lot in efficiency, and this would be the first necessary step toward preparing ZIM files automatically.
Consequently, almost nobody uses it right now to generate ZIM files; it is too complicated and buggy. This is the case even though a lot of people want to do so and contact the Kiwix dev team for help making a ZIM of their own content. Not only would external projects benefit from such a development; we would also gain a lot in efficiency, and this would be the first necessary step toward preparing ZIM files automatically.


== Workpackages ==
== Workpackages ==
