Talk:Mediawiki DumpHTML extension improvement

From Kiwix
Revision as of 17:17, 31 August 2011 by Kelson

I highly appreciate your efforts to make automatic zim dumps happen. Two comments regarding mobile support:

  • Equations

As you mention in your proposal, it is important for mobile usage to have zim versions without images (otherwise the zim files are too large for most potential users).

Removing all images is fine in most cases. However, it has the drawback that it also removes mathematical equations, because these are rendered as images.

This issue should be considered when generating the zim files. One variant is to include the ALT text of equations (or of all images, if that is easier). That is much better than nothing, but rather cryptic. Another variant is to include images for equations only. If the size overhead is high, it may even make sense to generate both variants. (Users could then download a version with all images, one with equation images only, or one without images but with ALT text for at least the equations.)
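A minimal Python sketch of the three variants, written as a post-processing filter over a rendered article's HTML, not as DumpHTML's PHP code. It assumes that equation images carry `class="tex"` (as in texvc-style math output) and an `alt` attribute holding the TeX source; `filter_images`, the `mode` values and the `tex-alt` span are hypothetical names. The regex matching only handles simple, double-quoted attributes; a real filter would use an HTML parser.

```python
import re

# Matches a whole <img ...> tag; adequate only for simple, well-formed markup.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# Matches name="value" pairs (double-quoted attributes only).
ATTR = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')

MODES = ("all", "equations", "none")


def attrs_of(tag: str) -> dict[str, str]:
    return {name.lower(): value for name, value in ATTR.findall(tag)}


def is_equation(tag: str) -> bool:
    # Assumption: equation images are marked with class="tex".
    return "tex" in attrs_of(tag).get("class", "").split()


def filter_images(page_html: str, mode: str) -> str:
    """Keep or drop images according to `mode`.

    all       -> keep every image
    equations -> keep equation images, drop all others
    none      -> drop every image, but keep the equations' ALT text
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    def replace(match: re.Match) -> str:
        tag = match.group(0)
        if mode == "all":
            return tag
        if is_equation(tag):
            if mode == "equations":
                return tag
            # The alt value is already HTML-escaped, so it can be reused as text.
            return f'<span class="tex-alt">{attrs_of(tag).get("alt", "")}</span>'
        return ""  # ordinary image: dropped in both reduced modes

    return IMG_TAG.sub(replace, page_html)
```

Running every article through each of the three modes would yield the three downloads described above.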

I never consider TeX-rendered equations as "images"; they should and will always be there. This is easy to achieve with the DumpHTML extension, because equations and images are handled differently in the PHP code. I also do not think this would increase the ZIM file size much in the end. I think we do not have any issue here. Kelson 19:17, 31 August 2011 (CEST)

Independently, it definitely also makes sense (not only for the mobile use case) to additionally provide zim files with only a small selection of images. I am aware that this is pretty complex to do, but it may be worth experimenting with whether simple logic such as "all equation images, the first N images for the 10,000 most popular articles (or the first image of every article)" gives good enough results. However, this "zim file with a selection of images" feature should not delay the implementation of the automatic dumps; it could be added later.
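The simple selection rule quoted above can be sketched in Python as follows. This is an illustration under assumptions, not an existing tool: the `Article` record, the use of page views as the popularity measure, and the default `first_n=3` are hypothetical, and each article's image list is assumed to be in page order.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    page_views: int                 # hypothetical popularity measure
    images: list[str]               # non-equation images, in page order
    equation_images: list[str] = field(default_factory=list)


def select_images(articles, top_k=10_000, first_n=3, first_image_for_rest=True):
    """Return the set of image names to include in a reduced zim file."""
    ranked = sorted(articles, key=lambda a: a.page_views, reverse=True)
    selected: set[str] = set()
    for rank, article in enumerate(ranked):
        selected.update(article.equation_images)  # equations are always kept
        if rank < top_k:
            # One of the top_k most popular articles: keep its first N images.
            selected.update(article.images[:first_n])
        elif first_image_for_rest:
            # Any other article: keep only its first image.
            selected.update(article.images[:1])
    return selected
```

Setting `first_image_for_rest=False` gives the first rule on its own (equations plus the first N images of the top articles), while `top_k=0` gives the alternative (equations plus the first image of every article).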

Fully agree with you (but only for images, not equations). An algorithm to identify "important" pictures and sort them would be great. If you have time to work on that, that would be really great. I'm also ready to help you test the results with test ZIM files. IMO this work should also be funded by a grant. Kelson 19:17, 31 August 2011 (CEST)
  • Split

For a good user experience on mobile phones, it would be nice to offer zim files split into 2GB chunks on the Wikimedia servers (in a zip archive or as separate downloads; ideally both).

The FAT32 file system limits files to 4GB, but some phones have a 2GB limit.
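A minimal Python sketch of server-side splitting into chunks that fit under these limits. Assumptions: "2GB" is read as 2 × 10^9 bytes, which also stays below 2^31 − 1; the parts are named by appending `aa`, `ab`, ... to the file name, like the Unix `split` tool, and whether a given reader accepts that naming is not established here; `split_file` and `join_files` are hypothetical helpers. FAT32's per-file ceiling is 4 GiB minus one byte.

```python
import itertools
import shutil
import string
from pathlib import Path

FAT32_MAX_FILE_SIZE = 2**32 - 1  # FAT32 cannot store a file of 4 GiB or more
CHUNK_SIZE = 2_000_000_000       # "2GB" read as 2 * 10**9 bytes (assumption)


def part_names(src: Path):
    """Yield wiki.zimaa, wiki.zimab, ... (split-style naming, assumed)."""
    for a, b in itertools.product(string.ascii_lowercase, repeat=2):
        yield src.with_name(src.name + a + b)


def split_file(path, chunk_size=CHUNK_SIZE, buf_size=16 * 1024 * 1024):
    """Split `path` into consecutive parts of at most `chunk_size` bytes."""
    if chunk_size > FAT32_MAX_FILE_SIZE:
        raise ValueError("chunk_size exceeds the FAT32 per-file limit")
    src = Path(path)
    names = part_names(src)
    parts = []
    with src.open("rb") as f:
        while True:
            # Read before creating a part so no empty trailing part is written.
            data = f.read(min(buf_size, chunk_size))
            if not data:
                break
            part = next(names)
            with part.open("wb") as out:
                written = 0
                while data:
                    out.write(data)
                    written += len(data)
                    if written >= chunk_size:
                        break
                    data = f.read(min(buf_size, chunk_size - written))
            parts.append(part)
    return parts


def join_files(parts, dest):
    """Concatenate the parts back into a single file (e.g. on a PC)."""
    with Path(dest).open("wb") as out:
        for part in parts:
            with Path(part).open("rb") as f:
                shutil.copyfileobj(f, out)
```

Streaming through a fixed-size buffer keeps memory use constant even for multi-gigabyte dumps. Joining only makes sense where the target file system allows large files, since the joined file would exceed the FAT32 limit again.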

It's true that the mobile apps could include download functionality that splits the file during download, but this has some drawbacks:

1. The user may not be able, or may not want, to download such large files on the mobile phone.

2. Such a feature is fairly complex to support in an app (in particular on multiple platforms). Apps may therefore simply not support it and leave it up to the user to download the files. If no split files are available that are simple to install (connect the phone via USB, then extract the archive or copy the separate files to it), many potential users won't be able to use the zim file. For this scenario separate downloads are better, because they can also be downloaded directly on the phone (without app support), whereas the zip file normally could not be downloaded there, since it is larger than 2 (or 4) GB.

--Cip 18:59, 31 August 2011 (CEST)