Athens 2023

From Kiwix

Latest revision as of 08:13, 23 May 2024

This page summarizes the plans for the Kiwix Hackathon 2023 in Athens (not to be confused with Hackathon 2023 Paris).

Date & Venue

From Thursday 18 May (evening) to Friday 26 May (morning) in Athens (we have rented a flat).

Logistics

DO NOT FORGET TO BRING AN EXTENSION CORD (and an adapter if you are not joining from mainland Europe).

FYI: Greece uses the same C, E and F sockets as the rest of Europe.

Goals

The main goal of the hackathon is to focus on Zimit and its entire software stack: Browsertrix, warc2zim, python-libzim, ...

We want to prepare the next big iteration of Zimit, the current version being the result of the first iteration in 2020-21.

We need to (not necessarily in this order):

  • Assess the current situation
    • [crossed out] Present Webrecorder/Kiwix current activities and projects: Ilya not available
    • Present the current software stack and how its components interact
    • List and identify the weaknesses (at least those not already clearly identified) in the current architecture/software
    • Assess the list of weaknesses reported by Jaifroid (https://github.com/openzim/warc2zim/issues/86)
    • Go over the crawler's CLI params to understand how/when to use them (docker run --rm -it ghcr.io/openzim/zimit:dev crawl --help)
    • [crossed out] Status of Success status code on failure (https://github.com/webrecorder/browsertrix-crawler/issues/207)
    • [crossed out] Status of Disable browser updates (https://github.com/webrecorder/browsertrix-crawler/issues/246): fixed in Zimit, but not yet upstream in Browsertrix
    • [crossed out] Status of SSLError (https://github.com/webrecorder/browsertrix-crawler/issues/159)
    • First access to warc2zim file doesn't correctly catch external links (https://github.com/openzim/warc2zim/issues/109)
  • Agree on future features/architecture
    • Get WACZ presented and decide whether we should use it (https://github.com/openzim/warc2zim/issues/81)
    • How to communicate the boundaries of a ZIM to a user? (https://github.com/openzim/warc2zim/issues/65)
    • Should we still use Service Workers? (https://github.com/openzim/zimit/issues/126)
    • [crossed out] What kind of size optimisation should we run? (https://github.com/openzim/warc2zim/issues/72): WONTFIX
    • Assess pseudo namespaces (https://github.com/openzim/warc2zim/issues/104)
    • Should we accept invalid HTTPS? (https://github.com/openzim/zimit/issues/166)
  • Fix current bugs and weaknesses
    • Incorrect relative URLs on top-level landing pages (https://github.com/openzim/zimit/issues/155)
    • Out of scope homepage redirect (https://github.com/openzim/zimit/issues/138)
    • See how to simplify/improve the Wabac ZIM-related part
    • Can't clear options (https://github.com/openzim/zimit-frontend/issues/35)
  • Implement new features
    • Better SW non-supported page (https://github.com/openzim/warc2zim/issues/98)
    • Implement --insecure command line argument (https://github.com/openzim/zimit/issues/166)
    • Add support for Linux/Arm64 (https://github.com/openzim/python-libzim/issues/164)
    • Introduce E2E automated testing (https://github.com/openzim/warc2zim/issues/101)
  • Work on the agreed implementation in readers that don't yet support Zimit/WARC
    • Kiwix JS (https://github.com/kiwix/kiwix-js/issues/644)

Achievements

Jaifroid:

  • Greatly increased my understanding of the warc2zim implementation and the underlying Replay software, thanks to the help of MGautier and Kelson, and to discussions with the others
  • Began work on integrating a standard implementation (based on the current Zimit and warc2zim versions) into Kiwix JS using wombat.js and wabac.js (Service Worker)
    • This is not yet functional: I managed to load the landing page into Kiwix JS, but static links are not yet transformed via the Service Worker
    • I integrated the Kiwix JS Service Worker and wabac.js into a single Service Worker: fetch requests are routed first to wabac.js and handed off to the Kiwix JS SW when assets need to be extracted from the ZIM
    • I managed to load wombat.js into the iframe document, but its configuration is not yet correct
    • Work so far is in https://github.com/kiwix/kiwix-js/pull/1010
    • EDIT 4/12/2023: This goal has now been achieved and is in preview release in the Browser Extension offline-first PWA v3.11.5+
  • Worked on ironing out several issues with my non-SW-based implementation in KJSWL, using knowledge gleaned at the Hackathon
    • Greatly increased fidelity of rendering of Zimit-based archives, including a lot of dynamic content
    • As a good test, mesquartierschinois now loads flawlessly in the KJSWL implementation fully offline, with dynamic loading of the entries as the user scrolls. YouTube videos work and stream offline, but Vimeo is not implemented (it is a separate fuzzy transformation)
    • Many other dynamic ZIMs are now working very well
    • A severe issue with the app attempting to load assets as main pages has been resolved
    • Loading is pretty fast, at least on a desktop PC, and it also runs acceptably on iOS (Safari only). Android is slow but usable, especially once a site's assets are cached via the Cache API. N.B. On Android, Firefox cannot be used, because Firefox for Android has a bug that makes it attempt to load the ZIM archive into memory or internal storage, which fails for large archives. Chrome / Edge and Samsung Internet work fine (Samsung Internet is the fastest, due to its optimized file reading speed).
    • An implementation with the many changes can be tested at https://kiwix.github.io/kiwix-js-windows/dist/
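
The combined Service Worker mentioned above, with fetch requests going first to wabac.js and handed off to Kiwix JS when an asset has to be read from the ZIM, can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the code in the Kiwix JS PR: `ReplayLayer`, `ZimReader`, `toEntryPath` and `handleFetch` are hypothetical names (they are not wabac.js or Kiwix JS APIs), the URL-to-entry mapping (host plus path) is assumed, and plain strings stand in for real `Request`/`Response` objects so the routing logic can run outside a browser.

```typescript
// Sketch of one fetch pipeline: replay layer first, raw ZIM entry as fallback.
// All names here are hypothetical stand-ins, NOT the real wabac.js / Kiwix JS APIs.

/** Reads one ZIM entry as text, or returns null if the entry does not exist. */
type ZimReader = (entryPath: string) => Promise<string | null>;

/** The role wabac.js plays: answer a request itself, or decline with null. */
interface ReplayLayer {
  handle(url: string, readZim: ZimReader): Promise<string | null>;
}

/** Simplified stand-in for a fetch Response. */
interface SimpleResponse {
  status: number;
  body: string;
  servedBy: "replay" | "reader" | "none";
}

/** Hypothetical URL-to-entry mapping: drop the scheme, query and fragment. */
function toEntryPath(url: string): string {
  return url.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "").split(/[?#]/)[0];
}

/** Route to the replay layer first, then serve the raw ZIM entry, else 404. */
async function handleFetch(
  url: string,
  replay: ReplayLayer,
  readZim: ZimReader,
): Promise<SimpleResponse> {
  const replayed = await replay.handle(url, readZim);
  if (replayed !== null) {
    return { status: 200, body: replayed, servedBy: "replay" };
  }
  const raw = await readZim(toEntryPath(url));
  if (raw !== null) {
    return { status: 200, body: raw, servedBy: "reader" };
  }
  return { status: 404, body: "", servedBy: "none" };
}

// Toy setup: an in-memory "ZIM" and a replay layer that only handles HTML pages.
const entries = new Map<string, string>([
  ["example.com/index.html", "<p>Hello</p>"],
  ["example.com/app.js", "console.log(1);"],
]);
const readZim: ZimReader = async (path) => entries.get(path) ?? null;
const htmlReplay: ReplayLayer = {
  async handle(url, read) {
    const path = toEntryPath(url);
    if (!path.endsWith(".html")) return null; // decline non-HTML requests
    const raw = await read(path);
    return raw === null ? null : "<!-- replayed -->" + raw;
  },
};

// In a real Service Worker, handleFetch would be called from a "fetch" event
// listener via event.respondWith(...), converting SimpleResponse to a Response.
```

In this sketch the replay layer never opens the archive itself: it receives the reader as a callback, so the ZIM-specific code stays in one place and the replay code stays archive-agnostic.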


Matthieu:

Succeeded in creating a POC of warc2zim that produces ZIM files with static rewriting, so that no Service Worker is needed.
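
Static rewriting moves URL rewriting from read time (in a Service Worker) to ZIM creation time, so links already point at other entries of the archive when a reader opens a page. The TypeScript sketch below only illustrates that idea and is not warc2zim's implementation: it assumes a hypothetical entry naming scheme (host plus path), rewrites only absolute `href`/`src` attributes with a regular expression instead of a real HTML parser, drops query strings, keeps fragments, and leaves URLs that are not in the archive untouched. `urlToEntry`, `relativePath` and `rewriteHtml` are illustrative names.

```typescript
/** Hypothetical mapping from an absolute URL to a ZIM entry path (host + path). */
function urlToEntry(url: string): string {
  return url.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "").split(/[?#]/)[0];
}

/** Relative path from entry `from` to entry `to`, both "/"-separated. */
function relativePath(from: string, to: string): string {
  const fromDirs = from.split("/").slice(0, -1); // directories of the current entry
  const toParts = to.split("/");
  let common = 0;
  // Count shared leading directories (never consuming the target's file name).
  while (
    common < fromDirs.length &&
    common < toParts.length - 1 &&
    fromDirs[common] === toParts[common]
  ) {
    common++;
  }
  return "../".repeat(fromDirs.length - common) + toParts.slice(common).join("/");
}

/**
 * Rewrites absolute href/src URLs that resolve to entries in the archive into
 * relative links, so the reader can follow them without a Service Worker.
 */
function rewriteHtml(html: string, currentEntry: string, inArchive: Set<string>): string {
  return html.replace(
    /\b(href|src)="(https?:\/\/[^"]+)"/g,
    (match: string, attr: string, url: string) => {
      const target = urlToEntry(url);
      if (!inArchive.has(target)) return match; // out of scope: keep the live URL
      const hashAt = url.indexOf("#");
      const fragment = hashAt >= 0 ? url.slice(hashAt) : "";
      return `${attr}="${relativePath(currentEntry, target)}${fragment}"`;
    },
  );
}

const page =
  '<a href="https://example.com/a/about.html#team">About</a>' +
  '<img src="https://example.com/img/x.png"><a href="https://other.org/">Ext</a>';
console.log(
  rewriteHtml(page, "example.com/a/index.html", new Set(["example.com/a/about.html", "example.com/img/x.png"])),
);
// <a href="about.html#team">About</a><img src="../img/x.png"><a href="https://other.org/">Ext</a>
```

Rewriting to relative paths keeps the ZIM independent of any particular reader URL scheme; a real tool would also have to handle URLs built dynamically by JavaScript, which static rewriting alone cannot see.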

Agenda

From Friday to Sunday there is the Wikimedia Hackathon, for which at least Matthieu and Kelson have registered.

After that, we will all gather to focus on Zimit.

Attendees

Kiwix
  • Reg (remote)
  • Kelson
  • MGautier
  • Jaifroid
Webrecorder
  • Ilya (maybe)

Budget

  • Hosting: CHF 1'874.05
  • F&B: CHF 737.24
  • Travel: CHF 1'469.05
Total: CHF 4'104.10
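
As a quick arithmetic check on the budget: the page source breaks the travel line down as 276.50 + 389.15 + 280 + 75 + 448.40 (the first separator appears as a dot in the source and is read here as a plus, an assumption that makes the sum match CHF 1'469.05 exactly). Adding the three listed items gives CHF 4'080.34, which is CHF 23.76 less than the stated total; the page does not say what the difference covers. The sketch sums in integer centimes to avoid floating-point rounding; `sum` and `formatChf` are illustrative helpers.

```typescript
// Budget figures from this page, in CHF centimes (integers avoid float rounding).
// Travel breakdown taken from a comment in the page source (assumed "+" separators).
const travelItems = [27650, 38915, 28000, 7500, 44840];
const hosting = 187405;
const foodAndBeverage = 73724;
const statedTotal = 410410;

/** Sums a list of integer amounts. */
function sum(values: number[]): number {
  return values.reduce((acc, v) => acc + v, 0);
}

/** Formats non-negative centimes as "CHF francs.centimes" (no thousands separator). */
function formatChf(centimes: number): string {
  const francs = Math.floor(centimes / 100);
  const rest = String(centimes % 100).padStart(2, "0");
  return `CHF ${francs}.${rest}`;
}

const travel = sum(travelItems);
const itemised = hosting + foodAndBeverage + travel;

console.log(formatChf(travel));                 // CHF 1469.05
console.log(formatChf(itemised));               // CHF 4080.34
console.log(formatChf(statedTotal - itemised)); // CHF 23.76
```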