This is weekly progress report no. 1 for Project Grumpy.
As this is the first publicly visible announcement, I am also going to
give a short overview of the project itself.
The aim of this project is to create a database containing various
developer-related metadata about packages in the Gentoo portage tree. The
metadata we are going to store can serve different purposes; examples
include upstream version checks and notifying developers who are
interested in a particular package. Eventually the project will also
provide a nice web interface and an API to access this data.
The project's semi-official IRC channel is #gentoo-grumpy on the Freenode
network. Just step in and say "Hi!" :)
Last week's progress report
My first week went a bit slowly because I had some "unfinished business"
to take care of, and also because of two exams (which went fine).
The core issue I wrestled with this week was how to keep the portage
contents and the database contents in sync, i.e. when an ebuild is
modified, removed or added, how to make sure that the database contents
still correspond to the portage contents.
The solution I came up with is a simple daemon that logs changes to the
portage tree and applies them to the database when it is appropriate.
"Appropriate" here means that we shouldn't push changes into the database
while the tree is still being updated, as that might be unsafe (e.g. a
package rename may be only half done at that point). So currently it seems
that the daemon also has to initiate the rsync process itself and push the
updates into the database only after rsync has finished successfully. (You
can already see how all kinds of weird corner cases start popping up :P )
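A minimal sketch of the "wait until rsync is done" idea follows. It assumes the watcher only records which paths were touched and that the daemon inspects the tree once the sync is over. `PendingChanges`, `record` and `flush` are hypothetical names and are not part of Grumpy. Checking whether each path still exists after the sync is one way to sidestep event-ordering corner cases, such as a file that is modified and then removed within the same run.

```python
import os


class PendingChanges:
    """Hypothetical helper: remember which paths were touched while rsync
    runs and decide what to do with them only after rsync is done."""

    def __init__(self):
        self._dirty = set()

    def record(self, path):
        # Called from the file-watching callback for any create, modify
        # or delete event; only the path is kept, not the event type.
        self._dirty.add(path)

    def flush(self, rsync_succeeded):
        """Return (to_upsert, to_delete) as sorted lists of paths.

        If rsync failed, the tree may be half-updated, so nothing is
        returned and the paths stay pending until a later successful run.
        """
        if not rsync_succeeded:
            return [], []
        # Look at the tree as it is *after* the sync: whatever still exists
        # gets (re)inserted, whatever is gone gets deleted.
        to_upsert = sorted(p for p in self._dirty if os.path.exists(p))
        to_delete = sorted(p for p in self._dirty if not os.path.exists(p))
        self._dirty.clear()
        return to_upsert, to_delete
```

With this design, a package rename simply shows up as the old ebuild paths in `to_delete` and the new ones in `to_upsert`. Both lists are produced together, so the database is never updated halfway through a sync.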
My current approach to logging uses the inotify framework, present in the
Linux kernel since 2.6.13 (sorry BSD users, but this is Gentoo Linux after
all), with the help of pyinotify. So far there's only one drawback to using
inotify: by default the kernel allows only 8192 watches per user, and
watching a tree needs one watch per directory (portage contains a lot of
directories). So in order to use this approach one has to bump the number
of watches using the /proc/sys/fs/inotify/max_user_watches tunable. 81920
has worked fine on my machine so far ;)
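Before starting the watcher, it may help to check whether the limit is high enough. The sketch below counts the directories that a recursive watch would need, assuming one watch per directory. It then compares that count with the value in /proc/sys/fs/inotify/max_user_watches. The function names are hypothetical. The tree location (e.g. /usr/portage) is also an assumption and is passed in by the caller. Watches already held by other processes of the same user are not taken into account.

```python
import os


def count_directories(root):
    """Count root plus all of its subdirectories (symlinks not followed);
    a recursive inotify watch needs roughly one watch per directory."""
    total = 0
    for _dirpath, _dirnames, _filenames in os.walk(root):
        total += 1
    return total


def read_max_user_watches(path="/proc/sys/fs/inotify/max_user_watches"):
    """Return the current per-user inotify watch limit, or None if it
    cannot be read (e.g. not running on Linux)."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def enough_watches(root, limit):
    """True if the tree under root fits within the given watch limit."""
    return limit is not None and count_directories(root) <= limit
```

If `enough_watches("/usr/portage", read_max_user_watches())` comes back false, the limit can be raised as root. One way is `sysctl -w fs.inotify.max_user_watches=81920`.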
My mentor Leio also suggested a secondary approach, parsing the rsync log
files, but I am a bit reluctant about this idea. Anyway, I'll leave it
simmering here for a while, and unless someone comes up with a better idea
(yes, I have also thought about scanning the whole portage tree every x
hours), I'm going to implement the daemon.
Plans for current week
As I currently consider the core issue solved, the next issue I have to
solve is how to take an ebuild, extract information about it and store it
in the database. (Hint: pkgcore)
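Before any real metadata is extracted, each changed ebuild needs a database key. The sketch below derives that key from the standard `<category>/<package>/<package>-<version>.ebuild` layout. It deliberately does not use pkgcore. Reading DESCRIPTION, HOMEPAGE and similar variables would be pkgcore's job, and its API is not shown here. The `ebuilds` table schema and all function names are hypothetical. An in-memory sqlite3 database merely stands in for whatever backend Grumpy ends up using.

```python
import os
import sqlite3


def parse_ebuild_path(path):
    """Split '.../<category>/<package>/<package>-<version>.ebuild'."""
    parts = os.path.normpath(path).split(os.sep)
    if len(parts) < 3 or not parts[-1].endswith(".ebuild"):
        raise ValueError("not an ebuild path: %r" % path)
    category, package, filename = parts[-3], parts[-2], parts[-1]
    stem = filename[: -len(".ebuild")]
    prefix = package + "-"
    # The file name must start with the package directory's name and
    # carry a non-empty version after it.
    if not stem.startswith(prefix) or len(stem) == len(prefix):
        raise ValueError("file name does not match package: %r" % path)
    return category, package, stem[len(prefix):]


def open_db(filename=":memory:"):
    """Open a database with a hypothetical one-table schema."""
    conn = sqlite3.connect(filename)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ebuilds ("
        " category TEXT NOT NULL, package TEXT NOT NULL,"
        " version TEXT NOT NULL, path TEXT NOT NULL,"
        " PRIMARY KEY (category, package, version))"
    )
    return conn


def store_ebuild(conn, path):
    """Insert the ebuild, or refresh its row if it is already known."""
    category, package, version = parse_ebuild_path(path)
    conn.execute(
        "INSERT OR REPLACE INTO ebuilds (category, package, version, path)"
        " VALUES (?, ?, ?, ?)",
        (category, package, version, path),
    )


def delete_ebuild(conn, path):
    """Remove the row for an ebuild that disappeared from the tree."""
    category, package, version = parse_ebuild_path(path)
    conn.execute(
        "DELETE FROM ebuilds WHERE category = ? AND package = ? AND version = ?",
        (category, package, version),
    )
```

Package names containing hyphens (e.g. `foo-bar/foo-bar-1.0.ebuild`) are handled because the version is cut off after the known package directory name rather than at the first hyphen.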
I'm not going to take on bigger tasks, because I still have one quite hard
exam (thermodynamics and statistical physics) on the 4th of June. If I pass
it, it will be the last one.
PS. Sorry, no blog yet. I was using Zine, but it broke after I updated
my system to SQLAlchemy-0.6.