From 7fedafc1aa45c2491e895b20532f1c6d6ef56bde Mon Sep 17 00:00:00 2001
From: Priit Laes
Date: Wed, 9 Jun 2010 17:54:02 +0300
Subject: Added gsoc reports and proposal

---
 README                  | 124 ------------------------------------------------
 docs/gsoc/01-report.txt |  61 ++++++++++++++++++++++++
 docs/gsoc/proposal.txt  | 124 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 185 insertions(+), 124 deletions(-)
 delete mode 100644 README
 create mode 100644 docs/gsoc/01-report.txt
 create mode 100644 docs/gsoc/proposal.txt

diff --git a/README b/README
deleted file mode 100644
index 47a5d75..0000000
--- a/README
+++ /dev/null
@@ -1,124 +0,0 @@
-Priit Laes
-April 08, 2010
-draft-project-grumpy-gsoc-3
-
-Project Grumpy
-==============
-
-Project Objective
-+++++++++++++++++
-There are many moments in every package maintainers life when one wishes that
-one or another thing would be done automatically for him/her:
- * Check which packages have identified common QA issues.
- * Generate a stabilization list for the selection of packages.
- * Get notified of packages that have new upstream versions.
- * Get notifications of packages that can be stabilized if following the
-   30-day guideline.
-
-Many such automated or semi-automated applications/scripts do exist, but they
-are currently dispersed across the Internet in various different locations,
-with typically no good connection between packages and the maintainer looking
-for the information. These applications include tinderbox rindex/dindex
-reports, gentoo-bumpchecker, manual repoman/pcheck runs, and so on.
-
-Project "Grumpy" is intended as a Gentoo Linux project to aggregate
-the functionality of all these tools into one centralized application.
-
-Abstract
-++++++++
-Project Grumpy is a set of applications for gathering, indexing and interacting
-with various ebuild- and developer-related metadata.
-
-Grumpy Component Overview (aka deliverables)
-++++++++++++++++++++++++++++++++++++++++++++
-This section gives an overview about the components and technologies that are
-going to be used for this project.
-
-Grumpy Application Backend
---------------------------
-Grumpy Application backend is the core of the Grumpy Application. Backend
-handles data storage and indexing and consists of following components:
- * Database storage for ebuild metadata
- * Tools for gathering and managing metadata
- * Portage indexer
- * Upstream information checks (version bumps, issues, etc)
- * User-interface tools (Web interface, commandline utilities)
-
-Database
-~~~~~~~~
-The aim is to use document-oriented storage system (MongoDB) that allows easy
-storing and retrieval of metadata in JSON-like data schema. MongoDB stands in a
-gap between key-value stores (Memcached) and traditional RDBMS (PostgreSQL,
-MySQL) systems. MongoDB also has facilities for advanced data aggregation
-options like Map/reduce, replication and fail-over support and auto-sharding,
-if ever needed.
-
-Tools for metadata management
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-There are basically three types of tools:
- * Firstly, the tools that deal with low-level operations like keeping portage
-   contents in synchronization with database. For this part it is not yet clear
-   whether it is possible to use already existing software (Portage API or
-   pkgcore) or should it be implemented from scratch.
-
- * Secondly the tools that are used to query outside information for ebuild
-   related information (upstream version bumps, bugzilla status, tinderbox
-   results). Implementing tools for this part also requires working together
-   with various parties in order to make sure we always get the up-to-date
-   data in a format that can be easily understood by our tools.
-
- * Thirdly, utilities that allow users (Gentoo developers) to maintain various
-   kinds of information they are interested in. For this purpose there are
-   mainly two types of utilities in mind: Web application providing both
-   HTML-based interface and JSON API. The latter can be used also for
-   various command-line utilities.
-
-Timeline and Development Plan
-+++++++++++++++++++++++++++++
-It is quite clear that the most crucial part in this project is the data
-storage and portage indexer. When it is clear that contents in the database
-can be kept in synchronization with Portage (this also includes package
-moves, slotting changes) then works on other parts like upstream indexers
-and web application can be started. Therefore I propose following tentative
-timeline:
-
-24. May: Official start of project
- * Implement portage synchronization with database
- * Implement 30-day stabilization checker
- * Implement upstream version checker for GNOME project
-12. July: Mid-term evaluation submitting starts
- * Inquiries on whether it's possible to use LDAP authentication for web app
-16. July: Deadline for mid-term student evaluations
- * First sketches for JSON-API via web application
- * Few simple commandline utils for developers to manage packages of interest
-9. August: Start of 'pencils down'
-20. August: Final evaluation deadline
-
-Biography
-+++++++++
-I am an undergraduate student of theoretical physics in University of Tartu and
-my main research interest is cosmology and the nature of gravity.
-
-My leisure time mostly consists of working on various open-source or freelance
-projects, reading (either about physics or science-fiction) and spending time
-with friends.
-
-I have also held various positions in the past, including system administrator,
-embedded software developer and web application developer.
-
-FUN: Origin of Grumpy's name
-++++++++++++++++++++++++++++
-This is an excerpt from #gentoo-desktop channel on June 11. 2009::
-
-    21:58 < leio> ok, I need a good codename for this maintainer website
-                  thing where you would be able to look things up, like what to
-                  bump, etc. Go!
-    21:58 < plaes> grumpy ? :)
-    21:58 < plaes> grumpy.gentoo.org ? :)
-    21:59 < scarabeus> glocate
-    21:59 < scarabeus> ;]
-    21:59 < EvaSDK> grumpy++
-    22:00 < EvaSDK> that would be awesome
-    22:00 < scarabeus> but i agree with grumpy too
-    22:00 < scarabeus> :}
-    22:00 < scarabeus> sounds cool :]
diff --git a/docs/gsoc/01-report.txt b/docs/gsoc/01-report.txt
new file mode 100644
index 0000000..6ab40c9
--- /dev/null
+++ b/docs/gsoc/01-report.txt
@@ -0,0 +1,61 @@
+This is a weekly progress report no. 1 for Project Grumpy.
+
+As this is the first publicly visible announcement, I am also going to
+give a short overview of the project itself.
+
+The aim of this project is to create a database containing various
+developer-related metadata about packages in the Gentoo portage tree. The
+metadata we are going to store can be used for different purposes, for
+example upstream version checks and notifications to developers who are
+interested in a particular package. Eventually the project will also
+provide a nice web interface and an API for accessing this data.
+The project's semi-official IRC channel is #gentoo-grumpy on Freenode.
+Just step in and say "Hi!" :)
+
+Last week's progress report
+===========================
+
+My first week went a bit slowly because I had some "unfinished business"
+to wrap up, and also because of two exams (which went fine).
+
+The core issue I wrestled with this week was how to keep portage contents
+and database contents in sync, i.e. when an ebuild is modified, removed or
+added, how to make sure that the database contents match the portage tree.
+
+The solution I came up with is to use a simple daemon that logs changes
+to the portage tree and updates the database when appropriate, meaning
+that we shouldn't push updates while the tree itself is being updated, as
+that might be unsafe (e.g. a package rename). So it currently seems that
+the daemon also has to initiate the rsync process and push the updates into
+the database after rsync has finished successfully. (You can already see
+how all kinds of weird corner cases start popping up :P )
+
+My current approach to logging is to use the inotify framework present in
+the Linux kernel since 2.6.13 (sorry BSD users, but this is Gentoo Linux
+after all) with the help of pyinotify [1]. So far there's only one
+drawback to using inotify: by default the kernel allows only 8192 inotify
+watches per user (and the portage tree contains a lot of directories, each
+needing its own watch), so in order to use this approach one has to raise
+the number of watches using the /proc/sys/fs/inotify/max_user_watches
+tunable. 81920 has worked fine on my machine so far ;)
+
+There was also a secondary approach, suggested by my mentor Leio, to parse
+rsync log files, but I am a bit reluctant about this idea.
+
+Anyway, I'll leave this idea simmering for a while, and unless someone
+comes up with a better one (yes, I have also thought about scanning the
+whole portage tree every few hours), I'm going to implement the daemon.
+
+Plans for current week
+======================
+
+As I currently consider the core issue solved, the next issue to solve
+is how to take an ebuild, extract information about it and store it in
+the database. (Hint: pkgcore)
+
+I'm not going to take on bigger tasks because I still have one quite hard
+exam (thermodynamics and statistical physics) on the 4th of June. If I
+pass it, it will be the last one.
+
+PS. Sorry, no blog yet. I was using Zine, but it broke after I updated
+my system to SQLAlchemy-0.6.
diff --git a/docs/gsoc/proposal.txt b/docs/gsoc/proposal.txt
new file mode 100644
index 0000000..47a5d75
--- /dev/null
+++ b/docs/gsoc/proposal.txt
@@ -0,0 +1,124 @@
+Priit Laes
+April 08, 2010
+draft-project-grumpy-gsoc-3
+
+Project Grumpy
+==============
+
+Project Objective
++++++++++++++++++
+There are many moments in every package maintainers life when one wishes that
+one or another thing would be done automatically for him/her:
+ * Check which packages have identified common QA issues.
+ * Generate a stabilization list for the selection of packages.
+ * Get notified of packages that have new upstream versions.
+ * Get notifications of packages that can be stabilized if following the
+   30-day guideline.
+
+Many such automated or semi-automated applications/scripts do exist, but they
+are currently dispersed across the Internet in various different locations,
+with typically no good connection between packages and the maintainer looking
+for the information. These applications include tinderbox rindex/dindex
+reports, gentoo-bumpchecker, manual repoman/pcheck runs, and so on.
+
+Project "Grumpy" is intended as a Gentoo Linux project to aggregate
+the functionality of all these tools into one centralized application.
+
+Abstract
+++++++++
+Project Grumpy is a set of applications for gathering, indexing and interacting
+with various ebuild- and developer-related metadata.
+
+Grumpy Component Overview (aka deliverables)
+++++++++++++++++++++++++++++++++++++++++++++
+This section gives an overview about the components and technologies that are
+going to be used for this project.
+
+Grumpy Application Backend
+--------------------------
+Grumpy Application backend is the core of the Grumpy Application. Backend
+handles data storage and indexing and consists of following components:
+ * Database storage for ebuild metadata
+ * Tools for gathering and managing metadata
+ * Portage indexer
+ * Upstream information checks (version bumps, issues, etc)
+ * User-interface tools (Web interface, commandline utilities)
+
+Database
+~~~~~~~~
+The aim is to use document-oriented storage system (MongoDB) that allows easy
+storing and retrieval of metadata in JSON-like data schema. MongoDB stands in a
+gap between key-value stores (Memcached) and traditional RDBMS (PostgreSQL,
+MySQL) systems. MongoDB also has facilities for advanced data aggregation
+options like Map/reduce, replication and fail-over support and auto-sharding,
+if ever needed.
+
+Tools for metadata management
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+There are basically three types of tools:
+ * Firstly, the tools that deal with low-level operations like keeping portage
+   contents in synchronization with database. For this part it is not yet clear
+   whether it is possible to use already existing software (Portage API or
+   pkgcore) or should it be implemented from scratch.
+
+ * Secondly the tools that are used to query outside information for ebuild
+   related information (upstream version bumps, bugzilla status, tinderbox
+   results). Implementing tools for this part also requires working together
+   with various parties in order to make sure we always get the up-to-date
+   data in a format that can be easily understood by our tools.
+
+ * Thirdly, utilities that allow users (Gentoo developers) to maintain various
+   kinds of information they are interested in. For this purpose there are
+   mainly two types of utilities in mind: Web application providing both
+   HTML-based interface and JSON API. The latter can be used also for
+   various command-line utilities.
+
+Timeline and Development Plan
++++++++++++++++++++++++++++++
+It is quite clear that the most crucial part in this project is the data
+storage and portage indexer. When it is clear that contents in the database
+can be kept in synchronization with Portage (this also includes package
+moves, slotting changes) then works on other parts like upstream indexers
+and web application can be started. Therefore I propose following tentative
+timeline:
+
+24. May: Official start of project
+ * Implement portage synchronization with database
+ * Implement 30-day stabilization checker
+ * Implement upstream version checker for GNOME project
+12. July: Mid-term evaluation submitting starts
+ * Inquiries on whether it's possible to use LDAP authentication for web app
+16. July: Deadline for mid-term student evaluations
+ * First sketches for JSON-API via web application
+ * Few simple commandline utils for developers to manage packages of interest
+9. August: Start of 'pencils down'
+20. August: Final evaluation deadline
+
+Biography
++++++++++
+I am an undergraduate student of theoretical physics in University of Tartu and
+my main research interest is cosmology and the nature of gravity.
+
+My leisure time mostly consists of working on various open-source or freelance
+projects, reading (either about physics or science-fiction) and spending time
+with friends.
+
+I have also held various positions in the past, including system administrator,
+embedded software developer and web application developer.
+
+FUN: Origin of Grumpy's name
+++++++++++++++++++++++++++++
+This is an excerpt from #gentoo-desktop channel on June 11. 2009::
+
+    21:58 < leio> ok, I need a good codename for this maintainer website
+                  thing where you would be able to look things up, like what to
+                  bump, etc. Go!
+    21:58 < plaes> grumpy ? :)
+    21:58 < plaes> grumpy.gentoo.org ? :)
+    21:59 < scarabeus> glocate
+    21:59 < scarabeus> ;]
+    21:59 < EvaSDK> grumpy++
+    22:00 < EvaSDK> that would be awesome
+    22:00 < scarabeus> but i agree with grumpy too
+    22:00 < scarabeus> :}
+    22:00 < scarabeus> sounds cool :]
-- 
cgit v1.2.3-65-gdbad
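
The inotify watch limit described in 01-report.txt can be estimated before the daemon is started. Below is a minimal sketch in Python (the language of the pyinotify and pkgcore tools the report mentions) that uses only the standard library, not pyinotify itself. The helper names `read_watch_limit`, `count_directories` and `watch_budget` are hypothetical and are not part of Grumpy or pyinotify. Because inotify watches are not recursive, the sketch counts one watch per directory and compares that number with the value of `/proc/sys/fs/inotify/max_user_watches`, falling back to the 8192 default mentioned in the report when the tunable cannot be read.

```python
import os

# Fallback used when the tunable cannot be read; 8192 is the historical
# kernel default mentioned in the report.
DEFAULT_MAX_USER_WATCHES = 8192
WATCH_LIMIT_PATH = "/proc/sys/fs/inotify/max_user_watches"


def read_watch_limit(path=WATCH_LIMIT_PATH, default=DEFAULT_MAX_USER_WATCHES):
    """Return the per-user inotify watch limit, or `default` if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return default


def count_directories(root):
    """Count directories under `root`, including `root` itself.

    inotify watches are not recursive, so watching a whole tree needs one
    watch per directory. Returns 0 if `root` does not exist, because
    os.walk silently yields nothing in that case.
    """
    return sum(1 for _ in os.walk(root))


def watch_budget(root, limit=None):
    """Return (watches_needed, limit, fits) for watching `root` recursively."""
    if limit is None:
        limit = read_watch_limit()
    needed = count_directories(root)
    return needed, limit, needed <= limit
```

For example, `watch_budget("/usr/portage")`, assuming the tree lives at that path, returns a `(needed, limit, fits)` tuple. The limit is shared by all processes running under the same user, so `fits` is only an optimistic check. As root, the limit can be raised to the value the report used with `sysctl -w fs.inotify.max_user_watches=81920`.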