Diffstat (limited to 'docs/gsoc')
2 files changed, 185 insertions, 0 deletions
diff --git a/docs/gsoc/01-report.txt b/docs/gsoc/01-report.txt
new file mode 100644
index 0000000..6ab40c9
--- /dev/null
+++ b/docs/gsoc/01-report.txt
@@ -0,0 +1,61 @@
+This is weekly progress report no. 1 for Project Grumpy.
+As this is the first publicly visible announcement, I am also going to
+give a short overview of the project itself.
+The aim of this project is to create a database containing various
+developer-related metadata about packages in the Gentoo Portage tree. The
+metadata we are going to store can serve many purposes, for example
+upstream version checks and notifications for developers who are
+interested in a particular package. Eventually the project will also
+provide a nice web interface and an API for accessing this data.
+The project's semi-official IRC channel is #gentoo-grumpy on the Freenode
+network. Just step in and say "Hi!" :)
+Last week's progress report
+My first week went a bit slowly because I had some "unfinished business"
+to wrap up, and also because of two exams (which went fine).
+The core issue I wrestled with this week was how to keep the Portage tree
+and the database contents in sync, i.e. when an ebuild is added, modified
+or removed, how to make sure that the database contents still correspond
+to the Portage contents.
+The solution I came up with is a simple daemon that logs changes to the
+Portage tree and updates the database contents when it is appropriate.
+Appropriate here means that we shouldn't push updates into the database
+while the tree itself is being updated, as the tree may be in an
+inconsistent state (e.g. in the middle of a package rename). So it
+currently seems that the daemon also has to initiate the rsync process
+itself and push the updates into the database only after rsync has
+finished successfully. (You can already see how all kinds of weird corner
+cases start popping up :P )
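The "log now, apply after a successful rsync" rule above can be sketched as a small in-memory buffer. This is a minimal illustration, not Grumpy's actual daemon: the `PendingChanges` class, its action names and the `apply` callback (standing in for the real database update) are all hypothetical. Events on the same ebuild are collapsed so that, for example, an ebuild that is added and removed again during one sync never reaches the database.

```python
from typing import Callable, Dict


class PendingChanges:
    """Hypothetical buffer of Portage tree changes seen while rsync is running."""

    def __init__(self) -> None:
        # Maps an ebuild path to its net action: "added", "modified" or "removed".
        self._changes: Dict[str, str] = {}

    def record(self, path: str, action: str) -> None:
        previous = self._changes.get(path)
        if previous == "added" and action == "removed":
            # Created and deleted within one sync: the database never saw it.
            del self._changes[path]
        elif previous == "added" and action == "modified":
            # Still a brand-new ebuild from the database's point of view.
            pass
        elif previous == "removed" and action == "added":
            # Deleted and re-created within one sync: the record already exists.
            self._changes[path] = "modified"
        else:
            self._changes[path] = action

    def commit(self, sync_ok: bool, apply: Callable[[str, str], None]) -> int:
        """Apply buffered changes only after a successful sync; return how many were applied."""
        if not sync_ok:
            # Keep the buffer: the tree may be half-updated (e.g. mid package rename).
            return 0
        for path, action in sorted(self._changes.items()):
            apply(path, action)
        applied = len(self._changes)
        self._changes.clear()
        return applied
```

In a daemon, `commit` would be called with the outcome of the rsync run (for example `returncode == 0` from `subprocess`), so a failed or interrupted sync leaves the buffered events in place for the next attempt.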
+My current approach to logging uses the inotify framework, present in the
+Linux kernel since 2.6.13 (sorry BSD users, but this is Gentoo Linux after
+all), with the help of pyinotify [1]. So far there is only one drawback to
+using inotify: watches are not recursive, so every directory needs its own
+watch, and by default the kernel allows only 8192 watches per user (while
+Portage contains a lot of directories). So in order to use this approach,
+one has to raise the limit via the /proc/sys/fs/inotify/max_user_watches
+tunable. A value of 81920 has worked fine on my machine so far ;)
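Since each directory costs one watch, a quick way to see how high max_user_watches has to go is to count the directories in the tree. The sketch below uses only the Python standard library (it does not use pyinotify); the function names and the `/usr/portage` location are assumptions for illustration. Keep in mind that the limit is shared by all of the user's inotify instances, so some headroom is needed.

```python
import os
from typing import Optional

LIMIT_FILE = "/proc/sys/fs/inotify/max_user_watches"


def count_directories(root: str) -> int:
    """Number of inotify watches needed to cover root and all its subdirectories."""
    # inotify is not recursive, so every directory (including root) needs a watch.
    return sum(1 for _ in os.walk(root))


def current_watch_limit(limit_file: str = LIMIT_FILE) -> Optional[int]:
    """Read the per-user watch limit, or None if the tunable is not available."""
    try:
        with open(limit_file) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    tree = "/usr/portage"  # assumed tree location; adjust for your system
    needed = count_directories(tree)
    limit = current_watch_limit()
    print(f"directories to watch: {needed}, max_user_watches: {limit}")
    if limit is not None and needed > limit:
        print("limit too low: raise fs.inotify.max_user_watches (e.g. with sysctl)")
```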
+My mentor Leio also suggested an alternative approach, parsing the rsync
+log files, but I am a bit reluctant about this idea.
+Anyway, I'll leave this idea simmering for a while, and unless someone
+comes up with a better one (yes, I have also thought about scanning the
+whole Portage tree every x hours), I'm going to implement the daemon.
+Plans for current week
+As I now consider the core issue solved, the next problem to solve is
+how to take an ebuild, extract information from it and store it in the
+database. (Hint: pkgcore)
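Before any metadata can be stored, an ebuild's path has to be split into category, package name and version. The following is a simplified, regex-based sketch of that first step; `parse_ebuild_path` and `EbuildId` are hypothetical names, and the version pattern covers only common forms rather than the full version rules that pkgcore implements. The actual metadata (dependencies, keywords and so on) still has to come from sourcing the ebuild or reading the metadata cache, which is what pkgcore would be used for.

```python
import re
from typing import NamedTuple, Optional

# Simplified version pattern: 1.2.3, 1.2b, 2.0_rc1, 1.0_p20100401-r2, ...
_VERSION = r"\d+(?:\.\d+)*[a-z]?(?:_(?:alpha|beta|pre|rc|p)\d*)*(?:-r\d+)?"
# category/package/package-version.ebuild; (?P=pn) requires the file name to repeat the package name.
_EBUILD = re.compile(
    r"^(?P<category>[^/]+)/(?P<pn>[^/]+)/(?P=pn)-(?P<pv>" + _VERSION + r")\.ebuild$"
)


class EbuildId(NamedTuple):
    category: str
    package: str
    version: str


def parse_ebuild_path(relpath: str) -> Optional[EbuildId]:
    """Split a tree-relative ebuild path into its parts, or return None if it is not one."""
    m = _EBUILD.match(relpath)
    if m is None:
        return None
    return EbuildId(m.group("category"), m.group("pn"), m.group("pv"))
```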
+I'm not going to take on bigger tasks, because I still have one quite hard
+exam (thermodynamics and statistical physics) on the 4th of June. If I
+pass it, it will be the last one.
+PS. Sorry, no blog yet. I was using Zine, but it broke after I updated
+my system to SQLAlchemy-0.6.
diff --git a/docs/gsoc/proposal.txt b/docs/gsoc/proposal.txt
new file mode 100644
index 0000000..47a5d75
--- /dev/null
+++ b/docs/gsoc/proposal.txt
@@ -0,0 +1,124 @@
+Priit Laes
+April 08, 2010
+Project Grumpy
+Project Objective
+There are many moments in every package maintainer's life when one wishes
+that one thing or another would be done automatically for him/her:
+ * Check which packages have been flagged with common QA issues.
+ * Generate a stabilization list for a selection of packages.
+ * Get notified of packages that have new upstream versions.
+ * Get notified of packages that can be stabilized under the
+ 30-day guideline.
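The time part of the 30-day guideline is simple date arithmetic. This is a minimal sketch, assuming the date each version entered the tree is already known (for example, stored in the database); the function names are hypothetical, and the other conditions a stabilization request has to meet are deliberately left out.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

STABILIZATION_DELAY = timedelta(days=30)  # the 30-day guideline


def stabilization_date(added: date) -> date:
    """Earliest date on which a version added on `added` passes the 30-day rule."""
    return added + STABILIZATION_DELAY


def can_stabilize(added: date, today: Optional[date] = None) -> bool:
    """True once the version has been in the tree for at least 30 days."""
    if today is None:
        today = date.today()
    return today >= stabilization_date(added)


def stabilization_candidates(added_dates: Dict[str, date], today: date) -> List[str]:
    """Hypothetical helper: sorted package versions whose 30 days have passed."""
    return sorted(cpv for cpv, added in added_dates.items() if can_stabilize(added, today))
```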
+Many such automated or semi-automated applications and scripts already
+exist, but they are currently scattered across various locations on the
+Internet, typically with no good connection between the packages and the
+maintainer looking for the information. These applications include
+tinderbox rindex/dindex reports, gentoo-bumpchecker, manual repoman/pcheck
+runs, and so on.
+Project "Grumpy" is intended as a Gentoo Linux project to aggregate
+the functionality of all these tools into one centralized application.
+Project Grumpy is a set of applications for gathering, indexing and interacting
+with various ebuild- and developer-related metadata.
+Grumpy Component Overview (aka deliverables)
+This section gives an overview of the components and technologies that
+are going to be used for this project.
+Grumpy Application Backend
+The Grumpy application backend is the core of the Grumpy application. The
+backend handles data storage and indexing, and consists of the following
+components:
+ * Database storage for ebuild metadata
+ * Tools for gathering and managing metadata
+ * Portage indexer
+ * Upstream information checks (version bumps, issues, etc)
+ * User-interface tools (Web interface, commandline utilities)
+The aim is to use a document-oriented storage system (MongoDB) that makes
+it easy to store and retrieve metadata in a JSON-like data schema. MongoDB
+fills the gap between key-value stores (Memcached) and traditional RDBMS
+systems (PostgreSQL, MySQL). MongoDB also offers advanced data aggregation
+via Map/Reduce, as well as replication with fail-over support and
+auto-sharding, if those are ever needed.
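To make the "JSON-like data schema" more concrete, here is what one document per ebuild could look like. The schema and field names are purely hypothetical, and the aggregation below is a plain-Python map/reduce-style count, not MongoDB's own Map/Reduce API; it only illustrates the kind of query that becomes easy when each document carries its own nested data.

```python
import json
from typing import Any, Dict, List


def ebuild_document(category: str, package: str, version: str,
                    keywords: List[str], maintainers: List[str]) -> Dict[str, Any]:
    """Build one JSON-like document per ebuild (hypothetical schema)."""
    return {
        "_id": f"{category}/{package}-{version}",  # category/package-version as the key
        "category": category,
        "package": package,
        "version": version,
        "keywords": keywords,
        "maintainers": maintainers,
        # Nested data lives inside the document instead of in separate tables.
        "upstream": {"latest_version": None, "last_checked": None},
    }


def ebuilds_per_maintainer(docs: List[Dict[str, Any]]) -> Dict[str, int]:
    """Map/reduce-style count in plain Python: emit (maintainer, 1), then sum per key."""
    counts: Dict[str, int] = {}
    for doc in docs:
        for maintainer in doc["maintainers"]:  # map: one pair per maintainer
            counts[maintainer] = counts.get(maintainer, 0) + 1  # reduce: sum
    return counts


if __name__ == "__main__":
    doc = ebuild_document("dev-lang", "python", "2.6.5-r2",
                          ["~amd64", "x86"], ["python@gentoo.org"])
    print(json.dumps(doc, indent=2, sort_keys=True))
```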
+Tools for metadata management
+There are basically three types of tools:
+ * Firstly, tools that deal with low-level operations such as keeping the
+ Portage contents in sync with the database. For this part it is not yet
+ clear whether existing software (the Portage API or pkgcore) can be
+ used, or whether it has to be implemented from scratch.
+ * Secondly, tools that query outside sources for ebuild-related
+ information (upstream version bumps, Bugzilla status, tinderbox
+ results). Implementing these tools also requires working together with
+ various parties in order to make sure we always get up-to-date data in a
+ format that our tools can easily understand.
+ * Thirdly, utilities that allow users (Gentoo developers) to maintain the
+ various kinds of information they are interested in. For this purpose
+ two kinds of interfaces are mainly in mind, both provided by a web
+ application: an HTML-based interface and a JSON API. The latter can also
+ be used by various command-line utilities.
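One of the outside checks, detecting a newer upstream release, mostly comes down to comparing versions numerically. The sketch assumes plain dotted release numbers such as GNOME's (e.g. 2.30.1) and uses hypothetical function names; the optional filter relies on GNOME's convention that odd minor numbers mark development releases. Comparing the strings directly would be wrong, since "2.9" sorts after "2.30" as text.

```python
from typing import List, Optional, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted release number such as '2.30.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def is_gnome_stable(version: str) -> bool:
    """GNOME convention: an even minor number marks a stable release series."""
    parts = version_key(version)
    return len(parts) < 2 or parts[1] % 2 == 0


def newer_upstream(packaged: str, upstream_versions: List[str],
                   stable_only: bool = False) -> Optional[str]:
    """Return the newest upstream version if it beats the packaged one, else None."""
    candidates = [v for v in upstream_versions if not stable_only or is_gnome_stable(v)]
    if not candidates:
        return None
    latest = max(candidates, key=version_key)
    return latest if version_key(latest) > version_key(packaged) else None
```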
+Timeline and Development Plan
+It is quite clear that the most crucial parts of this project are the
+data storage and the Portage indexer. Once it is clear that the database
+contents can be kept in sync with Portage (including package moves and
+slotting changes), work on other parts, like the upstream indexers and
+the web application, can start. Therefore I propose the following
+tentative timeline:
+24. May: Official start of project
+ * Implement portage synchronization with database
+ * Implement 30-day stabilization checker
+ * Implement upstream version checker for GNOME project
+12. July: Mid-term evaluation submissions start
+ * Inquire whether it's possible to use LDAP authentication for the web app
+16. July: Deadline for mid-term student evaluations
+ * First sketches of the JSON API via the web application
+ * A few simple command-line utilities for developers to manage packages of interest
+9. August: Start of 'pencils down'
+20. August: Final evaluation deadline
+I am an undergraduate student of theoretical physics at the University of
+Tartu, and my main research interests are cosmology and the nature of
+gravity.
+My leisure time mostly consists of working on various open-source or freelance
+projects, reading (either about physics or science-fiction) and spending time
+with friends.
+I have also held various positions in the past, including system administrator,
+embedded software developer and web application developer.
+FUN: Origin of Grumpy's name
+This is an excerpt from the #gentoo-desktop channel on June 11, 2009::
+ 21:58 < leio> ok, I need a good codename for this maintainer website
+ thing where you would be able to look things up, like what to
+ bump, etc. Go!
+ 21:58 < plaes> grumpy ? :)
+ 21:58 < plaes> grumpy.gentoo.org ? :)
+ 21:59 < scarabeus> glocate
+ 21:59 < scarabeus> ;]
+ 21:59 < EvaSDK> grumpy++
+ 22:00 < EvaSDK> that would be awesome
+ 22:00 < scarabeus> but i agree with grumpy too
+ 22:00 < scarabeus> :}
+ 22:00 < scarabeus> sounds cool :]