Zimfarm
This page is a draft
The goal of this project is to create a distributed Zimfarm: a software solution that would allow ZIM files (of any type) to be built on many computers.
This solution would include:
- A master, which would run a daemon
- Many slaves, which would fetch tasks from the master
The master would run a custom scheduler/tasker, probably based on Celery, which would know about:
- The allowed slaves (for security reasons, we cannot allow just any computer to work as a slave)
- The task pipeline (todo/doing/done)
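A minimal sketch of what the scheduler has to track, written in plain Python rather than Celery so the logic stays visible. It assumes a hypothetical `Scheduler` class with `request_task` and `complete_task` methods (none of these names come from an existing codebase): only slaves on the allow-list may pull work, and a task can only move forward through todo, doing and done.

```python
from enum import Enum


class TaskState(Enum):
    """The three pipeline stages named above."""
    TODO = "todo"
    DOING = "doing"
    DONE = "done"


# Allowed forward transitions; any other move is rejected.
TRANSITIONS = {
    TaskState.TODO: {TaskState.DOING},
    TaskState.DOING: {TaskState.DONE},
    TaskState.DONE: set(),
}


class Scheduler:
    """Hypothetical master-side scheduler tracking allowed slaves and task states."""

    def __init__(self, allowed_slaves):
        self.allowed_slaves = set(allowed_slaves)
        self.tasks = {}        # task id -> TaskState
        self.assignments = {}  # task id -> name of the slave working on it

    def add_task(self, task_id):
        self.tasks[task_id] = TaskState.TODO

    def _move(self, task_id, new_state):
        current = self.tasks[task_id]
        if new_state not in TRANSITIONS[current]:
            raise ValueError(f"cannot move {task_id} from {current.value} to {new_state.value}")
        self.tasks[task_id] = new_state

    def request_task(self, slave):
        """Hand the first pending task to an allowed slave, or return None if nothing is pending."""
        if slave not in self.allowed_slaves:
            raise PermissionError(f"{slave} is not an allowed slave")
        for task_id, state in self.tasks.items():
            if state is TaskState.TODO:
                self._move(task_id, TaskState.DOING)
                self.assignments[task_id] = slave
                return task_id
        return None

    def complete_task(self, slave, task_id):
        """Only the slave that took the task may mark it as done."""
        if self.assignments.get(task_id) != slave:
            raise PermissionError(f"{task_id} is not assigned to {slave}")
        self._move(task_id, TaskState.DONE)
```

In a Celery-based version, the allow-list check and the state transitions would still live on the master; Celery would only replace the transport between master and slaves.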
Each task would have many properties:
- Estimated duration
- Estimated memory use
- Estimated bandwidth use
- Estimated mass storage use
- Type of scraper (mwoffliner, TED, ...)
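One way to represent such a task is a small immutable record. The field names and units below (megabytes for memory, gigabytes for bandwidth and storage) are assumptions made for illustration; the draft does not fix them.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Task:
    """One ZIM build task with the estimates listed above (field names and units are assumptions)."""
    scraper: str                    # e.g. "mwoffliner" or "TED"
    estimated_duration: timedelta
    estimated_memory_mb: int        # peak RAM
    estimated_bandwidth_gb: float   # data downloaded while scraping
    estimated_storage_gb: float     # disk needed for intermediate files and the final ZIM

    def __post_init__(self):
        # Reject obviously invalid estimates as soon as the task is created.
        if self.estimated_duration <= timedelta(0):
            raise ValueError("estimated_duration must be positive")
        for name in ("estimated_memory_mb", "estimated_bandwidth_gb", "estimated_storage_gb"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must not be negative")
```

For example, `Task("mwoffliner", timedelta(hours=6), 4096, 20.0, 50.0)` describes a six-hour mwoffliner run needing 4 GB of RAM.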
The scheduler should provide a clear way to be filled automatically (via an API?), as we need to do things periodically. Maybe the scheduler could have a system to clone tasks periodically (for example, every month). In any case, we will deal with thousands of ZIM files to build, so it should be easy to add tasks programmatically.
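The periodic cloning idea could look like the sketch below. It assumes a hypothetical `RecurringTask` template with a period expressed in months and a `clone_due_tasks` function that the scheduler would call regularly (for example once a day); each due template produces one new task, and any periods missed while the scheduler was down are skipped rather than replayed.

```python
import calendar
import copy
from dataclasses import dataclass, field
from datetime import date


def add_months(day: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of the target month."""
    month_index = day.month - 1 + months
    year = day.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


@dataclass
class RecurringTask:
    """Hypothetical template that the scheduler clones periodically."""
    name: str
    params: dict
    every_months: int = 1
    next_run: date = field(default_factory=date.today)


def clone_due_tasks(templates, today):
    """Return one new task (a plain dict) per due template and advance its next_run."""
    new_tasks = []
    for template in templates:
        if template.next_run <= today:
            # Deep-copy the parameters so editing a clone never alters the template.
            new_tasks.append({
                "name": template.name,
                "params": copy.deepcopy(template.params),
                "scheduled_for": template.next_run,
            })
            # Skip periods missed while the scheduler was down: one clone per call.
            while template.next_run <= today:
                template.next_run = add_months(template.next_run, template.every_months)
    return new_tasks
```

Because of the clamping in `add_months`, a template scheduled on the 31st drifts to an earlier day after a short month; storing the original day of the month separately would avoid that.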
The scheduler also needs some monitoring so we can follow what is going on. Here, the more the better.
As written before, clients first need to register with the service (they would probably be added manually by a system operator), so an authentication system needs to be implemented to prevent any hijacking. Furthermore, not every client will be able to execute every kind of task because of hardware or software limitations, so each client would have a profile. I suggest that we also provide a Docker image, which would then be easy to configure.
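A minimal sketch of registration, authentication and profile matching, assuming each client receives a shared-secret token from the operator at registration time. `WorkerProfile`, `Registry` and `can_run` are hypothetical names, and a plain token is only one option; a real deployment might prefer hashed tokens or TLS client certificates.

```python
import hmac
from dataclasses import dataclass


@dataclass
class WorkerProfile:
    """Hypothetical profile recorded by a system operator when registering a client."""
    name: str
    token: str              # shared secret handed out at registration
    memory_mb: int
    storage_gb: float
    bandwidth_gb: float
    scrapers: frozenset     # scraper types this client can run, e.g. {"mwoffliner"}


class Registry:
    def __init__(self):
        self._workers = {}

    def register(self, profile):
        """Manual registration: only registered clients can ever authenticate."""
        self._workers[profile.name] = profile

    def authenticate(self, name, token):
        """Return the profile if the token matches, else None (constant-time comparison)."""
        profile = self._workers.get(name)
        if profile is None:
            return None
        if not hmac.compare_digest(profile.token.encode(), token.encode()):
            return None
        return profile


def can_run(profile, scraper, memory_mb, storage_gb, bandwidth_gb):
    """True if the profile covers the task's scraper type and estimated resources."""
    return (
        scraper in profile.scrapers
        and memory_mb <= profile.memory_mb
        and storage_gb <= profile.storage_gb
        and bandwidth_gb <= profile.bandwidth_gb
    )
```

With this split, the master first authenticates the client, then offers it only tasks for which `can_run` holds. A Docker image could ship the client with its token and profile supplied as configuration.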