Athens 2023
Latest revision as of 08:13, 23 May 2024
This page summarizes the plans for the Kiwix Hackathon 2023 in Athens (not to be confused with Hackathon 2023 Paris).
Date & Venue
From Thursday 18 May (evening) to Friday 26 May (morning) in Athens (we have rented a flat).
Logistics
DO NOT FORGET TO BRING AN EXTENSION CORD (and an adapter if you are not joining from mainland Europe).
FYI Greece uses the same C, E and F sockets as the rest of Europe.
Goals
The main goal of the hackathon is to focus on Zimit and its whole software stack: Browsertrix, warc2zim, python-libzim, ...
We want to prepare the next big iteration of Zimit, considering that the current version is the result of the first iteration of 2020-21.
We need to (not necessarily in this order):
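- Assess current situation
  - Present Webrecorder/Kiwix current activities and projects (struck out: Ilya not available)
  - Present current software stack and how its parts interact
  - List and identify the weaknesses (at least those not clearly identified already) in the current architecture/software
  - Assess list of weaknesses reported by Jaifroid (https://github.com/openzim/warc2zim/issues/86)
  - Go over the crawler's CLI params to understand how/when to use them (docker run --rm -it ghcr.io/openzim/zimit:dev crawl --help)
  - Status of Success status code on failure (struck out) (https://github.com/webrecorder/browsertrix-crawler/issues/207)
  - Status of Disable browser updates (struck out: fixed in Zimit, but not yet upstream in Browsertrix) (https://github.com/webrecorder/browsertrix-crawler/issues/246)
  - Status of SSLError (struck out) (https://github.com/webrecorder/browsertrix-crawler/issues/159)
  - First access to warc2zim file doesn't correctly catch external links (https://github.com/openzim/warc2zim/issues/109)
- Agree on future features/architecture
  - How communicate to a user the boundaries of a ZIM? (https://github.com/openzim/warc2zim/issues/65)
  - Should we still use Service workers? (https://github.com/openzim/zimit/issues/126)
  - What kind of size optimisation should we run? (struck out: WONTFIX) (https://github.com/openzim/warc2zim/issues/72)
  - Assess pseudo namespaces (https://github.com/openzim/warc2zim/issues/104)
  - Should we accept invalid HTTPs? (https://github.com/openzim/zimit/issues/166)
- Fix current bugs and weaknesses
  - Incorrect relative URLs on top-level landing pages (https://github.com/openzim/zimit/issues/155)
  - Out of scope homepage redirect (https://github.com/openzim/zimit/issues/138)
  - See how to simplify/improve Wabac ZIM related part
  - Can't clear options (https://github.com/openzim/zimit-frontend/issues/35)
- Implement new features
  - Implement --insecure command line argument (https://github.com/openzim/zimit/issues/166)
  - Add support for Linux/Arm64 (https://github.com/openzim/python-libzim/issues/164)
  - Introduce E2E automated testing (https://github.com/openzim/warc2zim/issues/101)
- Work on agreed implementation in readers that currently don't support Zimit/WARC
  - Kiwix JS (https://github.com/kiwix/kiwix-js/issues/644)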
Achievements
Jaifroid:
- Increased understanding of the warc2zim implementation and the underlying Replay software greatly, thanks to the help of MGautier, Kelson, and discussions with the others
- Began work on integrating a standard implementation (based on the current Zimit and warc2zim versions) into Kiwix JS using wombat.js and wabac.js (Service Worker)
  - Although this is not yet functional, I achieved loading of the landing page into Kiwix JS, but not yet transformation of static links via the Service Worker
  - I successfully integrated the Kiwix JS Service Worker and wabac.js into a single Service Worker, with the Fetch routed first to wabac.js, and handed off to Kiwix JS SW when needing to extract assets from the ZIM
  - I successfully managed to load wombat.js into the iframe document, but the configuration is not yet correct
  - Work so far is in https://github.com/kiwix/kiwix-js/pull/1010
  - EDIT 4/12/2023 This goal is now achieved and is in preview release in the Browser Extension offline-first PWA v3.11.5+
- Worked on ironing out several issues with my non-SW-based implementation in KJSWL, using knowledge gleaned at the Hackathon
  - Greatly increased fidelity of rendering of Zimit-based archives, including a lot of dynamic content
  - As a good test, mesquartierschinois now loads flawlessly in the KJSWL implementation fully offline, with dynamic loading of the entries as the user scrolls. YouTube videos work and stream offline, but Vimeo is not implemented (it is a separate fuzzy transformation)
  - Many other dynamic ZIMs are now working very well
  - A severe issue with the app attempting to load assets as main pages has been resolved
  - Loading is pretty fast at least on a desktop PC, but it also runs on iOS acceptably (only in Safari). Android is slow but useable, especially once a site's assets are cached via Cache API. N.B. On Android, it is not possible to use Firefox, because Firefox for Android unfortunately has a bug which attempts to load the ZIM archive into memory or internal storage, which fails for large archives. Chrome / Edge or Samsung Internet work fine (the fastest is Samsung Internet due to its optimized file reading speed).
  - An implementation with the many changes can be tested at https://kiwix.github.io/kiwix-js-windows/dist/
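The combined Service Worker mentioned in the list above (the fetch goes to the replay layer first, which hands off to the app's own layer when it needs an asset from the ZIM) can be pictured roughly as follows. This is only a synchronous sketch of the routing idea: makeZimLayer, makeReplayLayer, the scope URL and the entry paths are all hypothetical names invented for illustration, not the wabac.js or Kiwix JS APIs, and a real Service Worker handles fetches asynchronously.

```typescript
// Hypothetical types: none of this is the wabac.js or Kiwix JS API.
type ZimReader = (path: string) => string | undefined;

interface FetchResult {
  status: number;
  body: string;
}

// Stand-in for the app's own Service Worker logic: it extracts raw entries
// from the ZIM (modelled here as a plain object keyed by entry path).
function makeZimLayer(entries: Record<string, string>): ZimReader {
  return (path) =>
    Object.prototype.hasOwnProperty.call(entries, path) ? entries[path] : undefined;
}

// Stand-in for the replay layer: it sees every fetch first and asks the
// ZIM layer for the raw asset. A real replay layer would also rewrite the
// content before answering; that step is left out of this sketch.
function makeReplayLayer(readFromZim: ZimReader, scope: string) {
  return function handleFetch(url: string): FetchResult {
    if (url.indexOf(scope) !== 0) {
      return { status: 404, body: "" }; // request is outside the archive scope
    }
    const raw = readFromZim(url.slice(scope.length)); // hand-off to the ZIM layer
    if (raw === undefined) {
      return { status: 404, body: "" };
    }
    return { status: 200, body: raw };
  };
}

// Usage with made-up data.
const demoZim = makeZimLayer({ "example.org/index.html": "<h1>Hello</h1>" });
const demoFetch = makeReplayLayer(demoZim, "https://app.invalid/zim/");
console.log(demoFetch("https://app.invalid/zim/example.org/index.html").status);
```

The point of the layering is that only one fetch handler is registered, while the knowledge of how to read the archive stays in the ZIM layer.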
Matthieu:
Succeeded in creating a POC of warc2zim that produces ZIM files with static rewriting, which therefore do not need a Service Worker.
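To give an idea of what static rewriting means here, the sketch below rewrites absolute URLs in href and src attributes into relative paths once, at conversion time, so that no Service Worker has to intercept requests at reading time. It is an illustration under assumptions, not the POC's code: warc2zim itself is written in Python, and the entry-path scheme (host plus path, scheme dropped) and the function names are hypothetical.

```typescript
// Hypothetical mapping from an absolute URL to an entry path inside the ZIM:
// the scheme is dropped, host and path are kept.
function entryPath(url: string): string {
  return url.replace(/^https?:\/\//, "");
}

// Relative path leading from the entry `from` to the entry `to`.
function relativePath(from: string, to: string): string {
  const fromDirs = from.split("/").slice(0, -1); // directories of the current page
  const toParts = to.split("/");
  let i = 0;
  // Skip the directory prefix that both entries share.
  while (i < fromDirs.length && i < toParts.length - 1 && fromDirs[i] === toParts[i]) {
    i++;
  }
  const up = fromDirs.slice(i).map(() => "..");
  return up.concat(toParts.slice(i)).join("/");
}

// Rewrite absolute http(s) URLs found in href/src attributes of one page.
// Links that are already relative are left untouched.
function rewriteHtml(html: string, pageUrl: string): string {
  const from = entryPath(pageUrl);
  return html.replace(
    /(\s)(href|src)="(https?:\/\/[^"]+)"/g,
    (_m: string, lead: string, attr: string, url: string) =>
      `${lead}${attr}="${relativePath(from, entryPath(url))}"`
  );
}

// Usage with made-up URLs.
console.log(
  rewriteHtml('<img src="https://cdn.example.net/img/x.png">', "https://example.org/docs/a.html")
);
```

A real converter would also have to cover CSS, JavaScript and other attributes; the sketch only shows the principle for two HTML attributes.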
Agenda
From Friday to Sunday there is the Wikimedia Hackathon, for which at least Matthieu and Kelson have registered.
After that we will all gather to focus on Zimit.
Attendees
- Kiwix
  - Reg (remote)
  - Kelson
  - MGautier
  - Jaifroid
- Webrecorder
  - Ilya (maybe)
Budget
- Hosting: CHF 1'874.05
- F&B: CHF 737.24
- Travel: CHF 1'469.05
- Total: CHF 4'104.1