Priit Laes
April 08, 2010
draft-project-grumpy-gsoc-3

Project Grumpy
==============

Project Objective
+++++++++++++++++
There are many moments in every package maintainer's life when one wishes that
some task could be done automatically:
  * Check which packages have known common QA issues.
  * Generate a stabilization list for a selection of packages.
  * Get notified of packages that have new upstream versions.
  * Get notified of packages that can be stabilized under the
    30-day guideline.

Many such automated or semi-automated applications and scripts exist, but they
are currently scattered across the Internet, typically with no good connection
between the packages and the maintainer looking for the information. These
applications include tinderbox rindex/dindex reports, gentoo-bumpchecker,
manual repoman/pcheck runs, and so on.

Project "Grumpy" is intended as a Gentoo Linux project to aggregate
the functionality of all these tools into one centralized application.

Abstract
++++++++
Project Grumpy is a set of applications for gathering, indexing and interacting
with various ebuild- and developer-related metadata.

Grumpy Component Overview (aka deliverables)
++++++++++++++++++++++++++++++++++++++++++++
This section gives an overview of the components and technologies that will be
used for this project.

Grumpy Application Backend
--------------------------
The Grumpy Application backend is the core of the Grumpy Application. It
handles data storage and indexing and consists of the following components:
  * Database storage for ebuild metadata
  * Tools for gathering and managing metadata
  * Portage indexer
  * Upstream information checks (version bumps, issues, etc.)
  * User-interface tools (web interface, command-line utilities)

Database
~~~~~~~~
The aim is to use a document-oriented storage system (MongoDB) that allows easy
storage and retrieval of metadata in a JSON-like data schema. MongoDB sits
between key-value stores (Memcached) and traditional RDBMSs (PostgreSQL,
MySQL). It also offers data aggregation via map/reduce, replication with
fail-over support, and auto-sharding, should they ever be needed.
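
To illustrate what "JSON-like" means here, the following is a minimal sketch of
how one package might be represented as a single document. The function name,
every field name, and the sample keywords are assumptions made for illustration
only; the actual schema is not decided in this proposal. The sketch uses plain
Python dictionaries and does not talk to a database.

```python
import json


def make_package_document(category, name, versions, maintainers):
    """Build one JSON-serializable document describing a single package.

    All field names are hypothetical; they only illustrate the nested,
    schema-free layout that a document store allows.
    """
    return {
        "_id": "%s/%s" % (category, name),
        "category": category,
        "name": name,
        "maintainers": list(maintainers),
        # Each version is an embedded sub-document, so a single lookup
        # returns everything known about the package.
        "versions": [
            {"version": version, "keywords": sorted(keywords)}
            for version, keywords in sorted(versions.items())
        ],
    }


# Made-up sample data, not taken from the real tree.
doc = make_package_document(
    "dev-libs", "glib",
    versions={"2.22.5": ["amd64", "x86"], "2.24.0": ["~amd64", "~x86"]},
    maintainers=["gnome@gentoo.org"],
)
print(json.dumps(doc, indent=2))
```

Because the document is an ordinary dictionary of lists and strings, it could
in principle be handed to a document store as-is, and the same structure could
be returned unchanged by a JSON API.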

Tools for metadata management
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There are three types of tools:
  * Low-level tools that keep the database in synchronization with the
    contents of the Portage tree. It is not yet clear whether existing
    software (the Portage API or pkgcore) can be used for this, or whether it
    has to be implemented from scratch.

  * Tools that query outside sources for ebuild-related information
    (upstream version bumps, Bugzilla status, tinderbox results). Implementing
    these also requires cooperating with various parties to make sure we
    always get up-to-date data in a format our tools can easily parse.

  * Utilities that allow users (Gentoo developers) to maintain the various
    kinds of information they are interested in. Two kinds of utilities are
    planned: a web application providing both an HTML-based interface and a
    JSON API, and command-line utilities built on top of that JSON API.
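
The synchronization step from the first item can be pictured as a comparison of
two snapshots. The sketch below assumes that both the tree and the database can
be reduced to a mapping from package version to some checksum; the function
name `plan_sync`, the mapping layout, and the sample entries are hypothetical
and do not use the Portage API or pkgcore.

```python
def plan_sync(tree, indexed):
    """Compare two {package_version: checksum} mappings.

    `tree` describes what is currently in the Portage tree, `indexed`
    what the database already holds. Returns the changes needed to
    bring the database up to date.
    """
    tree_keys = set(tree)
    indexed_keys = set(indexed)
    return {
        # Present in the tree but unknown to the database.
        "add": sorted(tree_keys - indexed_keys),
        # Known to both, but the content differs.
        "update": sorted(
            key for key in tree_keys & indexed_keys
            if tree[key] != indexed[key]
        ),
        # Still in the database but gone from the tree.
        "remove": sorted(indexed_keys - tree_keys),
    }


# Made-up sample snapshots.
tree = {"dev-libs/foo-1.1": "aaa", "dev-libs/bar-2.0": "bbb"}
indexed = {"dev-libs/foo-1.1": "old", "dev-libs/baz-0.9": "ccc"}
print(plan_sync(tree, indexed))
```

In this naive form a package move shows up as one removal plus one addition, so
a real indexer would need extra handling to carry developer-maintained data
across moves and slotting changes.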

Timeline and Development Plan
+++++++++++++++++++++++++++++
The most crucial parts of this project are the data storage and the Portage
indexer. Once it is clear that the database contents can be kept in
synchronization with Portage (including package moves and slotting changes),
work on other parts such as the upstream indexers and the web application can
begin. I therefore propose the following tentative timeline:

24. May: Official start of project
  * Implement portage synchronization with database
  * Implement 30-day stabilization checker
  * Implement upstream version checker for GNOME project
12. July: Mid-term evaluation submission opens
  * Inquire whether LDAP authentication can be used for the web application
16. July: Deadline for mid-term student evaluations
  * First sketches of the JSON API provided by the web application
  * A few simple command-line utilities for developers to manage packages of
    interest
9. August: Start of 'pencils down'
20. August: Final evaluation deadline
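
The 30-day stabilization checker listed in the timeline could, in its simplest
form, look like the sketch below. It assumes each version record carries the
date it entered the tree and a flag saying whether it is already stable; the
record layout, the function name, and the sample packages are all hypothetical.
It only checks the age of a version and ignores other stabilization criteria
such as open bugs.

```python
from datetime import date, timedelta

# Minimum time a version should spend in testing before stabilization.
STABILIZATION_PERIOD = timedelta(days=30)


def stabilization_candidates(versions, today):
    """Return versions that are not yet stable and at least 30 days old."""
    return [
        v["cpv"]
        for v in versions
        if not v["stable"] and today - v["added"] >= STABILIZATION_PERIOD
    ]


# Made-up sample records.
versions = [
    {"cpv": "dev-libs/foo-1.0", "added": date(2010, 1, 10), "stable": True},
    {"cpv": "dev-libs/foo-1.1", "added": date(2010, 3, 1), "stable": False},
    {"cpv": "dev-libs/foo-1.2", "added": date(2010, 4, 1), "stable": False},
]
print(stabilization_candidates(versions, today=date(2010, 4, 8)))
# prints ['dev-libs/foo-1.1']
```

Passing `today` explicitly rather than reading the system clock keeps the check
easy to test and lets the same function answer "what becomes eligible next
week?".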

Biography
+++++++++
I am an undergraduate student of theoretical physics at the University of
Tartu, and my main research interests are cosmology and the nature of gravity.

My leisure time mostly consists of working on various open-source or freelance
projects, reading (either physics or science fiction) and spending time
with friends.

I have also held various positions in the past, including system administrator,
embedded software developer and web application developer.

FUN: Origin of Grumpy's name
++++++++++++++++++++++++++++
This is an excerpt from the #gentoo-desktop channel on June 11, 2009::

  21:58 < leio> ok, I need a good codename for this maintainer website
                thing where you would be able to look things up, like what to
                bump, etc. Go!
  21:58 < plaes> grumpy ? :)
  21:58 < plaes> grumpy.gentoo.org ? :)
  21:59 < scarabeus> glocate
  21:59 < scarabeus> ;]
  21:59 < EvaSDK> grumpy++
  22:00 < EvaSDK> that would be awesome
  22:00 < scarabeus> but i agree with grumpy too
  22:00 < scarabeus> :}
  22:00 < scarabeus> sounds cool :]