Talk:Wikipedia

== Creating the files with images ==
How are the files which include images created? I can see making the smaller ones with the export tool, but the 40GB one with all the images seems harder. I found [http://permalink.gmane.org/gmane.org.wikimedia.offline/521 an old thread] listing some of the tools used, but the default way to use those tools seems to hit Wikipedia's servers fairly hard, at least if everything is working correctly and all the downloads succeed. Is there a way to do this from the [http://dumps.wikimedia.org/enwiki/20140304/ official dumps], which should be both faster and less likely to overload the Wikipedia servers? I think I could do it by hosting a MediaWiki instance with Parsoid myself and setting the script's <code>parsoidUrl</code> to "http://localhost:8000/localhost/" and <code>hostUrl</code> to "http://localhost/". Would this actually work? Are there any subtle issues to be aware of? Is there a better solution at the moment? --[[User:DanielH|DanielH]] ([[User talk:DanielH|talk]]) 09:07, 8 March 2014 (CET)
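(For reference, a minimal sketch of the setup proposed above. The variable names <code>parsoidUrl</code> and <code>hostUrl</code> come from the question; exactly where they live in the mwoffliner script, and what other settings the script needs, are assumptions to check against the actual script.)
<pre>
// Minimal sketch, not the actual mwoffliner source: the two settings from the
// question, pointed at a local MediaWiki + Parsoid pair instead of Wikimedia's
// servers. The "localhost" segment after the port is assumed to be the wiki
// prefix configured in the local Parsoid service.
var parsoidUrl = "http://localhost:8000/localhost/"; // local Parsoid service
var hostUrl    = "http://localhost/";                // local MediaWiki holding the imported XML dump
</pre>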
:That email tells you how it is done; more information is [http://openzim.org/wiki/Build_your_ZIM_file available here]. There is no easy way to create ZIM files from the XML/Wikicode, but what you propose will certainly work. The idea is not that everybody creates ZIM files of Wikipedia with millions of entries, but that we create them and others use them. That said, we really do want to provide all the Wikipedia ZIM files with at least a new version per month. If you need a ZIM file which is not already available, open a feature request. [[User:Kelson|Kelson]] ([[User talk:Kelson|talk]]) 10:15, 8 March 2014 (CET)
:: The email says what tools were used, but not how. As you now know, I've already read the page you linked, which doesn't even mention mwoffliner (this should probably be fixed by somebody who understands mwoffliner). The documentation for mwoffliner helps, but the tool seems designed to be run against the online servers rather than the dumps, which is wasteful and slow; that's why I'm trying to figure out how to combine it with the dumps. I don't actually want to do Wikipedia myself; I want several other English Wikimedia projects ''and the corresponding pictures'' (if I didn't want the images I'd just use the dumps, a local MediaWiki install, Wiki2html and zimwriterfs as mentioned on the page you linked). Is it still OK to open a feature request for English non-Wikipedia Wikimedia projects? And even if you did provide all these files, I'd still want to know ''how'' you generated them. --[[User:DanielH|DanielH]] ([[User talk:DanielH|talk]]) 05:27, 9 March 2014 (CET)
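(Likewise, a hypothetical sketch of the dump-based pipeline mentioned in passing above, driven from Node.js. It assumes the pages have already been rendered to static HTML with the images downloaded alongside them; every <code>zimwriterfs</code> option, name and path shown is an assumption to verify against <code>zimwriterfs --help</code> on your install.)
<pre>
// Hypothetical sketch: pack a directory of rendered pages plus images into a
// ZIM file by invoking zimwriterfs from Node.js. All values are made up for
// illustration.
var execFile = require('child_process').execFile;

execFile('zimwriterfs', [
  '--welcome=index.html',                          // entry page, relative to the HTML directory
  '--favicon=favicon.png',
  '--language=eng',
  '--title=Wikisource (en)',                       // hypothetical non-Wikipedia project
  '--description=Offline Wikisource with images',
  '--creator=Wikisource contributors',
  '--publisher=Kiwix',
  '/srv/wikisource-html/',                         // hypothetical directory of rendered pages + images
  'wikisource_en_all.zim'                          // output ZIM file
], function (err, stdout, stderr) {
  if (err) { throw err; }
  console.log(stdout);
});
</pre>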
