From 7fedafc1aa45c2491e895b20532f1c6d6ef56bde Mon Sep 17 00:00:00 2001
From: Priit Laes <plaes@plaes.org>
Date: Wed, 9 Jun 2010 17:54:02 +0300
Subject: Added gsoc reports and proposal

---
 README                  | 124 ------------------------------------------------
 docs/gsoc/01-report.txt |  61 ++++++++++++++++++++++++
 docs/gsoc/proposal.txt  | 124 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 185 insertions(+), 124 deletions(-)
 delete mode 100644 README
 create mode 100644 docs/gsoc/01-report.txt
 create mode 100644 docs/gsoc/proposal.txt

diff --git a/README b/README
deleted file mode 100644
index 47a5d75..0000000
--- a/README
+++ /dev/null
@@ -1,124 +0,0 @@
-Priit Laes
-April 08, 2010
-draft-project-grumpy-gsoc-3
-
-Project Grumpy
-==============
-
-Project Objective
-+++++++++++++++++
-There are many moments in every package maintainers life when one wishes that 
-one or another thing would be done automatically for him/her:
-  * Check which packages have identified common QA issues.
-  * Generate a stabilization list for the selection of packages.
-  * Get notified of packages that have new upstream versions.
-  * Get notifications of packages that can be stabilized if following the
-    30-day guideline.
-
-Many such automated or semi-automated applications/scripts do exist, but they
-are currently dispersed across the Internet in various different locations,
-with typically no good connection between packages and the maintainer looking
-for the information. These applications include tinderbox rindex/dindex
-reports, gentoo-bumpchecker, manual repoman/pcheck runs, and so on.
-
-Project "Grumpy" is intended as a Gentoo Linux project to aggregate
-the functionality of all these tools into one centralized application.
-
-Abstract
-++++++++
-Project Grumpy is a set of applications for gathering, indexing and interacting
-with various ebuild- and developer-related metadata.
-
-Grumpy Component Overview (aka deliverables)
-++++++++++++++++++++++++++++++++++++++++++++
-This section gives an overview about the components and technologies that are
-going to be used for this project.
-
-Grumpy Application Backend
---------------------------
-Grumpy Application backend is the core of the Grumpy Application. Backend
-handles data storage and indexing and consists of following components:
-  * Database storage for ebuild metadata
-  * Tools for gathering and managing metadata
-  * Portage indexer
-  * Upstream information checks (version bumps, issues, etc)
-  * User-interface tools (Web interface, commandline utilities)
-
-Database
-~~~~~~~~
-The aim is to use document-oriented storage system (MongoDB) that allows easy
-storing and retrieval of metadata in JSON-like data schema. MongoDB stands in a
-gap between key-value stores (Memcached) and traditional RDBMS (PostgreSQL,
-MySQL) systems. MongoDB also has facilities for advanced data aggregation
-options like Map/reduce, replication and fail-over support and auto-sharding,
-if ever needed.
-
-Tools for metadata management
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-There are basically three types of tools:
-  * Firstly, the tools that deal with low-level operations like keeping portage
-    contents in synchronization with database. For this part it is not yet clear
-    whether it is possible to use already existing software (Portage API or
-    pkgcore) or should it be implemented from scratch.
-
-  * Secondly the tools that are used to query outside information for ebuild
-    related information (upstream version bumps, bugzilla status, tinderbox
-    results). Implementing tools for this part also requires working together
-    with various parties in order to make sure we always get the up-to-date
-    data in a format that can be easily understood by our tools.
-
-  * Thirdly, utilities that allow users (Gentoo developers) to maintain various
-    kinds of information they are interested in. For this purpose there are
-    mainly two types of utilities in mind: Web application providing both
-    HTML-based interface and JSON API. The latter can be used also for
-    various command-line utilities.
-
-Timeline and Development Plan
-+++++++++++++++++++++++++++++
-It is quite clear that the most crucial part in this project is the data
-storage and portage indexer. When it is clear that contents in the database
-can be kept in synchronization with Portage (this also includes package
-moves, slotting changes) then works on other parts like upstream indexers
-and web application can be started. Therefore I propose following tentative
-timeline:
-
-24. May: Official start of project
-  * Implement portage synchronization with database
-  * Implement 30-day stabilization checker
-  * Implement upstream version checker for GNOME project
-12. July: Mid-term evaluation submitting starts
-  * Inquiries on whether it's possible to use LDAP authentication for web app
-16. July: Deadline for mid-term student evaluations
-  * First sketches for JSON-API via web application
-  * Few simple commandline utils for developers to manage packages of interest
-9. August: Start of 'pencils down'
-20. August: Final evaluation deadline
-
-Biography
-+++++++++
-I am an undergraduate student of theoretical physics in University of Tartu and
-my main research interest is cosmology and the nature of gravity.
-
-My leisure time mostly consists of working on various open-source or freelance
-projects, reading (either about physics or science-fiction) and spending time
-with friends.
-
-I have also held various positions in the past, including system administrator,
-embedded software developer and web application developer.
-
-FUN: Origin of Grumpy's name
-++++++++++++++++++++++++++++
-This is an excerpt from #gentoo-desktop channel on June 11. 2009::
-
-  21:58 < leio> ok, I need a good codename for this maintainer website
-                thing where you would be able to look things up, like what to
-                bump, etc. Go!
-  21:58 < plaes> grumpy ? :)
-  21:58 < plaes> grumpy.gentoo.org ? :)
-  21:59 < scarabeus> glocate
-  21:59 < scarabeus> ;]
-  21:59 < EvaSDK> grumpy++
-  22:00 < EvaSDK> that would be awesome
-  22:00 < scarabeus> but i agree with grumpy too
-  22:00 < scarabeus> :}
-  22:00 < scarabeus> sounds cool :]
diff --git a/docs/gsoc/01-report.txt b/docs/gsoc/01-report.txt
new file mode 100644
index 0000000..6ab40c9
--- /dev/null
+++ b/docs/gsoc/01-report.txt
@@ -0,0 +1,61 @@
+This is a weekly progress report no. 1 for Project Grumpy.
+
+As this is the first publicly visible announcement, I am also going to
+give a short overview of the project itself.
+
+The aim of this project is to create a database containing various
+developer-related metadata about packages in the Gentoo Portage tree. The
+stored metadata can serve several purposes, for example upstream version
+checks and notifications to developers who are interested in a particular
+package. Eventually the project will also provide a web interface and an
+API for accessing this data.
+The project's semi-official IRC channel is #gentoo-grumpy on the Freenode
+network. Just step in and say "Hi!" :)
+
+Last week's progress report
+===========================
+
+My first week went a bit slowly because I had some "unfinished business"
+to take care of, and also because of two exams (which went fine).
+
+The core issue I wrestled with this week was how to keep the Portage tree
+and the database in sync, i.e. when an ebuild is modified, removed or added,
+how to make sure that the database contents match the Portage contents.
+
+The solution I came up with is a simple daemon that logs changes to the
+Portage tree and updates the database when it is appropriate. Appropriate
+here means that logged changes shouldn't be applied while the tree itself is
+being updated, as that might be unsafe (e.g. during a package rename). So
+currently it seems that the daemon also has to initiate the rsync process and
+push the updates into the database after rsync has finished successfully.
+(You can already see how all kinds of weird corner cases start popping up :P)
+
+My current approach to logging uses the inotify framework, present in the
+Linux kernel since 2.6.13 (sorry BSD users, but this is Gentoo Linux after
+all), with the help of pyinotify [1]. So far there's only one drawback to
+using inotify: by default the kernel allows only 8192 watches per user, and
+inotify watches are not recursive, so every directory needs its own watch
+(and Portage contains a lot of directories). In order to use this approach
+one has to raise the limit using the /proc/sys/fs/inotify/max_user_watches
+tunable. A value of 81920 has worked fine on my machine so far ;)
+
+There was also a secondary approach, suggested by my mentor Leio, to parse
+rsync log files, but I am a bit reluctant about this idea.
+
+Anyway, I'll leave this idea simmering here for a while, and unless someone
+comes up with a better one (yes, I have also thought about scanning the whole
+Portage tree every x hours), I'm going to implement the daemon.
+
+Plans for the current week
+==========================
+
+As I currently consider the core issue solved, the next issue to tackle is
+how to take an ebuild, extract information about it and store it in the
+database. (Hint: pkgcore)
+
+I'm not going to take on bigger tasks because I still have one quite hard
+exam (thermodynamics and statistical physics) on the 4th of June. If I pass
+it, it will be my last one.
+
+P.S. Sorry, no blog yet. I was using Zine, but it broke after I updated
+my system to SQLAlchemy-0.6.
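The deferred-update scheme described in the report above (record changes while rsync runs, push them to the database only after rsync has succeeded) could be sketched in Python roughly as follows. This is a hypothetical illustration, not Grumpy's actual code: the `ChangeLog` class, its event names, the coalescing rules and the `apply` callback are all assumptions made here, and a real daemon would call `record()` from its pyinotify event handler instead of by hand. The two helpers relate to the watch limit: inotify watches are not recursive, so a tree needs one watch per directory, and that number can be compared with `/proc/sys/fs/inotify/max_user_watches`.

```python
import os

# Hypothetical sketch of the sync daemon's bookkeeping: file events seen
# while rsync updates the tree are only recorded, and are handed to the
# database after rsync has finished successfully.


class ChangeLog:
    """Coalesces per-path events so each ebuild is applied at most once."""

    def __init__(self):
        self._pending = {}  # path -> "added" | "modified" | "removed"

    def record(self, path, event):
        previous = self._pending.get(path)
        if previous == "added" and event == "removed":
            # Created and deleted during the same sync: nothing to store.
            del self._pending[path]
        elif previous == "added" and event == "modified":
            # Still a new ebuild from the database's point of view.
            pass
        elif previous == "removed" and event == "added":
            # Deleted and recreated: the database sees a modification.
            self._pending[path] = "modified"
        else:
            self._pending[path] = event

    def flush(self, sync_ok, apply):
        """Pass pending changes to apply(path, event) if the sync succeeded.

        After a failed sync nothing is applied and the pending changes are
        kept, so they can be applied after the next successful sync.
        """
        if not sync_ok:
            return []
        changes = sorted(self._pending.items())
        for path, event in changes:
            apply(path, event)
        self._pending.clear()
        return changes


def count_directories(root):
    """inotify watches are not recursive: one watch per directory is needed."""
    return sum(1 for _ in os.walk(root))


def read_max_user_watches(path="/proc/sys/fs/inotify/max_user_watches"):
    """Read the per-user inotify watch limit (Linux only)."""
    with open(path) as f:
        return int(f.read())
```

A daemon built this way could compare `count_directories()` for the Portage tree with `read_max_user_watches()` at startup and ask the administrator to raise `fs.inotify.max_user_watches` (for example with sysctl) when the tree does not fit. Keying the log by path means an ebuild touched several times during one rsync run results in a single database update.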
diff --git a/docs/gsoc/proposal.txt b/docs/gsoc/proposal.txt
new file mode 100644
index 0000000..47a5d75
--- /dev/null
+++ b/docs/gsoc/proposal.txt
@@ -0,0 +1,124 @@
+Priit Laes
+April 08, 2010
+draft-project-grumpy-gsoc-3
+
+Project Grumpy
+==============
+
+Project Objective
++++++++++++++++++
+There are many moments in every package maintainer's life when one wishes
+that one thing or another would be done automatically:
+  * Check which packages have identified common QA issues.
+  * Generate a stabilization list for a selection of packages.
+  * Get notified of packages that have new upstream versions.
+  * Get notified of packages that can be stabilized under the
+    30-day guideline.
+
+Many such automated or semi-automated applications/scripts do exist, but they
+are currently scattered across various locations on the Internet, typically
+with no good link between the packages and the maintainer looking for the
+information. These applications include tinderbox rindex/dindex reports,
+gentoo-bumpchecker, manual repoman/pcheck runs, and so on.
+
+Project "Grumpy" is intended as a Gentoo Linux project to aggregate
+the functionality of all these tools into one centralized application.
+
+Abstract
+++++++++
+Project Grumpy is a set of applications for gathering, indexing and interacting
+with various ebuild- and developer-related metadata.
+
+Grumpy Component Overview (aka deliverables)
+++++++++++++++++++++++++++++++++++++++++++++
+This section gives an overview of the components and technologies that are
+going to be used in this project.
+
+Grumpy Application Backend
+--------------------------
+The Grumpy application backend is the core of Grumpy. The backend handles
+data storage and indexing and consists of the following components:
+  * Database storage for ebuild metadata
+  * Tools for gathering and managing metadata
+  * Portage indexer
+  * Upstream information checks (version bumps, issues, etc.)
+  * User-interface tools (Web interface, commandline utilities)
+
+Database
+~~~~~~~~
+The aim is to use a document-oriented storage system (MongoDB) that allows
+easy storage and retrieval of metadata in a JSON-like schema. MongoDB fills
+the gap between key-value stores (Memcached) and traditional relational
+databases (PostgreSQL, MySQL). MongoDB also provides advanced data
+aggregation via Map/Reduce, as well as replication, fail-over support and
+auto-sharding, if these are ever needed.
+
+Tools for metadata management
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+There are basically three types of tools:
+  * Firstly, tools that deal with low-level operations like keeping the
+    Portage tree in sync with the database. For this part it is not yet clear
+    whether existing software (the Portage API or pkgcore) can be used, or
+    whether it should be implemented from scratch.
+
+  * Secondly, tools that query outside sources for ebuild-related
+    information (upstream version bumps, Bugzilla status, tinderbox
+    results). Implementing these tools also requires working together with
+    various parties to make sure we always get up-to-date data in a format
+    that our tools can easily understand.
+
+  * Thirdly, utilities that allow users (Gentoo developers) to maintain the
+    kinds of information they are interested in. Two types of utilities are
+    planned for this: a web application providing both an HTML-based
+    interface and a JSON API, and various command-line utilities built on
+    top of that JSON API.
+
+Timeline and Development Plan
++++++++++++++++++++++++++++++
+It is quite clear that the most crucial parts of this project are the data
+storage and the Portage indexer. Only when we know that the database
+contents can be kept in sync with Portage (this also includes package
+moves and slotting changes) can work on other parts, like the upstream
+indexers and the web application, start. Therefore I propose the
+following tentative timeline:
+
+24. May: Official start of the project
+  * Implement Portage synchronization with the database
+  * Implement the 30-day stabilization checker
+  * Implement an upstream version checker for the GNOME project
+12. July: Mid-term evaluation submissions open
+  * Inquire whether it's possible to use LDAP authentication for the web app
+16. July: Deadline for mid-term student evaluations
+  * First sketches of the JSON API via the web application
+  * A few simple command-line utils for developers to manage packages of interest
+9. August: Start of 'pencils down'
+20. August: Final evaluation deadline
+
+Biography
++++++++++
+I am an undergraduate student of theoretical physics at the University of
+Tartu, and my main research interests are cosmology and the nature of gravity.
+
+My leisure time mostly consists of working on various open-source or freelance
+projects, reading (about physics or science fiction) and spending time
+with friends.
+
+I have also held various positions in the past, including system administrator,
+embedded software developer and web application developer.
+
+FUN: Origin of Grumpy's name
+++++++++++++++++++++++++++++
+This is an excerpt from the #gentoo-desktop channel on June 11, 2009::
+
+  21:58 < leio> ok, I need a good codename for this maintainer website
+                thing where you would be able to look things up, like what to
+                bump, etc. Go!
+  21:58 < plaes> grumpy ? :)
+  21:58 < plaes> grumpy.gentoo.org ? :)
+  21:59 < scarabeus> glocate
+  21:59 < scarabeus> ;]
+  21:59 < EvaSDK> grumpy++
+  22:00 < EvaSDK> that would be awesome
+  22:00 < scarabeus> but i agree with grumpy too
+  22:00 < scarabeus> :}
+  22:00 < scarabeus> sounds cool :]
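The 30-day stabilization checker from the timeline in the proposal above could start out as a plain date comparison. What follows is a minimal sketch in Python. It assumes each package version is available as a record holding the date it was added to the tree with ~arch keywords and the set of arches it is already stable on. The `Version` type and its field names are hypothetical and not Grumpy's schema. The sketch further assumes that every record is already keyworded on the arch being checked. It encodes only the 30-day rule and ignores other stabilization criteria such as open bugs.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Minimum time a version should spend in ~arch before stabilization,
# per the 30-day guideline mentioned in the proposal.
STABILIZATION_DELAY = timedelta(days=30)


@dataclass
class Version:
    cpv: str              # category/package-version, e.g. "app-misc/foo-1.2"
    keyworded_on: date    # hypothetical field: date added with ~arch keywords
    stable_arches: set = field(default_factory=set)


def stabilization_candidates(versions, arch, today):
    """Return the cpvs that have spent at least 30 days in ~arch and are
    not yet stable on `arch`."""
    return [
        v.cpv
        for v in versions
        if arch not in v.stable_arches
        and today - v.keyworded_on >= STABILIZATION_DELAY
    ]
```

For example, a version added on 2010-05-01 would be reported for amd64 when checked on 2010-06-09 (39 days later), while one added on 2010-06-01 would not, and a version already stable on amd64 is skipped.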
-- 
cgit v1.2.3-65-gdbad