Difference between revisions of "TED"

Revision as of 20:17, 1 March 2014

1,304 bytes added , 10 years ago

→‎Agenda

RashiqAhmad

21

edits

@@ Line 21: / Line 21: @@
 It would be best to scrape this site and add the metadata (category, playlist, etc.) ourselves later on.
-== Libraries ==
+== Ideas to achieve it ==
-=== Networking ===
-The networking library we are going to use will be [http://requests.readthedocs.org/en/latest/ requests].<br>
-Requests is pretty easy to use and straightforward.
-=== Scraping ===
-The scraping library, that we are going to use will be [http://www.crummy.com/software/BeautifulSoup/ Beautifulsoup4 ].<br>
-You can easily go through all nodes of an HTML document with it. HTML elements can be either selected by CSS selectors or by regular expressions.
-=== Downloading Videos ===
-Downloading videos from TED is easy and straightforward. An example of an URL to a video can be found [http://download.ted.com/talks/YannDallAglio_2012X-light.mp4?apikey=489b859150fc58263f17110eeb44ed5fba4a3b22 here]
-The subtitles of videos are harder to get. They are all available on [http://www.amara.org/en/teams/ted/videos/ here] in multiple formats. We will use the caption format SRT. <br>
-=== Building HTML sites out of the scraped content  ===
-We want to 'export' our scraped data to html, so we can run the zim tool on it and create compressed zim files off it. <br>
-Out of all the possibilities [http://jinja.pocoo.org/docs/ Jinja2] seems to be the best library for that.
-== One way to achieve it ==
 # Retrieve the list of TED(x) presentations, with their metadata, into a local database
 ## A whole list of the available TED talks is available [http://www.ted.com/talks/quick-list here] (official) or [http://goo.gl/lx9Ro here] (unofficial)
@@ Line 52: / Line 33: @@
 # Fill the HTML templates with the data from the XML/RDF and write the index pages in a target directory
 # Run zimwriterfs to create the corresponding ZIM file of your target directory
+== Agenda ==
+* First three days (17.-19.02.2014):
+** Planning how this project can be realized {{done}}
+** Creation of a concept, including a conceptual ZIM file that demonstrates the very basics of this project {{done}}
+* Rest of the first week (20. - 23.02.2014):
+** Collection of all the data {{done}}
+*** Writing the scraper that scrapes TED.com {{done}}
+*** Writing the scraper that scrapes the TED translation page on www.amara.org {{done}}
+*** Writing the HTML templates {{done}}
+*** Writing a Python script that dumps the scraped data into the HTML pages, creating static content {{done}}
+* First three days of the second week (24. - 26.02.2014):
+** Implementing the local database that manages all the content {{done}}
+** Implementing the search engine in JavaScript that allows the user to search through all of the content {{done}}
+** Finally: creating the first prototype ZIM files
+* Rest of the second week (27.02. - 02.03.2014):
+** Improving everything
+** Fixing possible bugs
+** Possible other things:
+*** Implement a way to play HTML5 videos on the Android version of Kiwix (the bug can be found [http://sourceforge.net/p/kiwix/bugs/465/ here])
+== Implementation ==
+==== Networking ====
+We will use [http://requests.readthedocs.org/en/latest/ requests] as the networking library. It is easy to use and straightforward.
+==== Scraping ====
+We will use [http://www.crummy.com/software/BeautifulSoup/ BeautifulSoup4] as the scraping library. It makes it easy to walk through all nodes of an HTML document. HTML elements can be selected either by CSS selectors or by regular expressions.
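The two selection styles mentioned above can be sketched as follows. This is only an illustration: the sample HTML, the <code>talk-link</code> class, and both function names are made up for the example and do not reflect the real markup of TED.com.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup, NOT the real structure of TED.com.
SAMPLE_HTML = """
<div class="talks">
  <a class="talk-link" href="/talks/example_talk_one">Example talk one</a>
  <a class="talk-link" href="/talks/example_talk_two">Example talk two</a>
  <a href="/about">About</a>
</div>
"""


def talk_links_by_css(html):
    """Select talk links with a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a["href"])
            for a in soup.select("a.talk-link")]


def talk_links_by_regex(html):
    """Select talk links by matching the href attribute against a regex."""
    soup = BeautifulSoup(html, "html.parser")
    pattern = re.compile(r"^/talks/")
    return [(a.get_text(strip=True), a["href"])
            for a in soup.find_all("a", href=pattern)]
```

Both functions return the same list of (title, href) pairs for the sample markup. Which style is more robust depends on how stable the class names and URL patterns of the scraped site turn out to be.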
+==== Downloading Videos ====
+Downloading videos from TED is straightforward. An example of a video URL can be found [http://download.ted.com/talks/YannDallAglio_2012X-light.mp4?apikey=489b859150fc58263f17110eeb44ed5fba4a3b22 here].
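A minimal sketch of how such a download could look, assuming the video is streamed to disk in chunks. The helper names <code>video_filename</code> and <code>download_video</code> are hypothetical, and the <code>session</code> argument is assumed to behave like a <code>requests.Session</code>.

```python
import os
from urllib.parse import urlsplit


def video_filename(url):
    """Return the file name part of a video URL, ignoring the query string."""
    return os.path.basename(urlsplit(url).path)


def download_video(session, url, target_dir, chunk_size=64 * 1024):
    """Stream `url` into `target_dir` and return the path that was written.

    `session` is assumed to behave like requests.Session: its get() returns
    a response offering raise_for_status() and iter_content().
    """
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, video_filename(url))
    response = session.get(url, stream=True, timeout=30)
    response.raise_for_status()  # fail early on HTTP errors
    with open(path, "wb") as handle:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                handle.write(chunk)
    return path
```

With requests installed, a call would look like <code>download_video(requests.Session(), url, "videos")</code>. Streaming avoids holding a whole video file in memory.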
+==== Downloading subtitles ====
+The subtitles of the videos are harder to get. They are all available [http://www.amara.org/en/teams/ted/videos/ here] in multiple formats. We will use the WebVTT caption format.
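If a subtitle is only at hand as SRT, it can be converted to WebVTT with little effort, since the two formats are close. The following is a minimal sketch under that assumption. The function name <code>srt_to_webvtt</code> is hypothetical, and the sketch handles only the basics: the <code>WEBVTT</code> header and the decimal separator in timing lines.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,200".
_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})(.*)$"
)


def srt_to_webvtt(srt_text):
    """Convert SRT subtitle text to WebVTT text.

    Only the header and the millisecond separator are changed; the numeric
    cue identifiers are kept, which WebVTT allows.
    """
    # Drop a possible byte order mark and normalize line endings.
    text = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    lines = ["WEBVTT", ""]
    for line in text.split("\n"):
        match = _TIMING.match(line.strip())
        if match:
            start, start_ms, end, end_ms, rest = match.groups()
            line = f"{start}.{start_ms} --> {end}.{end_ms}{rest}"
        lines.append(line)
    return "\n".join(lines) + "\n"
```

Commas inside the subtitle text itself are left untouched, because only lines that match the timing pattern are rewritten.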
+==== Building HTML sites out of the scraped content  ====
+We want to 'export' our scraped data to HTML, so that we can run the ZIM tool on it and create compressed ZIM files from it. Of all the options, [http://jinja.pocoo.org/docs/ Jinja2] seems to be the best library for that.
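A minimal sketch of rendering one talk page with Jinja2. The template, the field names (<code>title</code>, <code>speaker</code>, <code>video</code>), and the function <code>render_talk</code> are assumptions made for illustration. In the real project the templates would more likely live in files than in a dictionary.

```python
from jinja2 import DictLoader, Environment, select_autoescape

# Hypothetical in-memory template; the field names are illustrative only.
TEMPLATES = {
    "talk.html": (
        "<html><body>\n"
        "<h1>{{ talk.title }}</h1>\n"
        "<p>{{ talk.speaker }}</p>\n"
        '<video controls src="{{ talk.video }}"></video>\n'
        "</body></html>\n"
    )
}


def render_talk(talk):
    """Render a dict describing one talk into a static HTML page."""
    env = Environment(
        loader=DictLoader(TEMPLATES),
        # Escape scraped text so that it cannot break the generated markup.
        autoescape=select_autoescape(["html"]),
    )
    return env.get_template("talk.html").render(talk=talk)
```

The returned string can be written to the target directory as one static page per talk. Autoescaping is enabled here because scraped titles may contain characters such as <code>&</code> or <code><</code>.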
+==== JavaScript client-side filter/search solution ====
+...
+==== Templating solution to create pages ====
