This is weekly progress report no. 1 for Project Grumpy. As this is the first publicly visible announcement, I am also going to give a short overview of the project itself.

The aim of this project is to create a database containing various developer-related metadata about packages in the Gentoo portage tree. The metadata we are going to store can be used for different purposes; some examples include upstream version checks and sending notifications to developers who are interested in a particular package. Eventually the project should also provide a nice web and API interface to access this data.

The project's semi-official IRC channel is #gentoo-grumpy on the Freenode network. Just step in and say "Hi!" :)

Last week's progress report
===========================

My first week went a bit slowly due to some "unfinished business" that I needed to finish, and also because of two exams (which went fine).

The core issue I wrestled with during this week was how to keep portage contents and database contents in sync - i.e. when an ebuild is modified, removed or added, how to make sure that the database contents correspond to the portage contents.

The solution that I came up with is to use a simple daemon that logs changes to the portage tree and modifies the database contents when it's appropriate. "Appropriate" here means that we shouldn't apply updates while the tree itself is being updated, as that might be unsafe (e.g. a package rename). So currently it seems that the daemon also has to initiate the rsync process and push the updates into the database after rsync has finished successfully. (You can already see how all kinds of weird corner cases start popping up :P )

My current approach to logging is to use the inotify framework, present in the Linux kernel since 2.6.13 (sorry BSD users, but this is Gentoo Linux after all), with the help of pyinotify [1].
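
To make the "log now, push later" idea a bit more concrete, here is a minimal sketch of how the daemon could buffer changes and only hand them to the database once rsync has finished successfully. This is not Grumpy's actual code: the `ChangeBuffer` class, its method names and the `apply` callback (standing in for the database writer) are all hypothetical, and the inotify wiring is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# (kind, path), e.g. ("modified", "app-misc/foo/foo-1.0.ebuild")
Change = Tuple[str, str]


@dataclass
class ChangeBuffer:
    """Buffers tree changes and applies them only after a successful sync."""

    # Hypothetical database writer; receives the buffered changes in order.
    apply: Callable[[List[Change]], None]
    pending: List[Change] = field(default_factory=list)
    syncing: bool = False

    def start_sync(self) -> None:
        # Called right before the daemon launches rsync.
        self.syncing = True

    def record(self, kind: str, path: str) -> None:
        # Called for every change seen in the tree; nothing touches the
        # database yet, the change is only remembered.
        self.pending.append((kind, path))

    def finish_sync(self, success: bool) -> int:
        # Called when rsync exits. On failure the changes stay buffered,
        # so they are applied after the next successful sync instead.
        self.syncing = False
        if not success:
            return 0
        changes, self.pending = self.pending, []
        self.apply(changes)
        return len(changes)
```

In a real daemon, `record()` would be called from the inotify event handler and `finish_sync()` with rsync's exit status. Keeping the buffer after a failed sync is just one possible way to handle that corner case.
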
So far there's only one drawback to using inotify - by default the kernel has a limit of 8192 watches allowed per user (and portage contains a lot of directories), so in order to use this approach one has to bump the number of watches using the /proc/sys/fs/inotify/max_user_watches tunable. 81920 has worked fine so far on my machine ;)

There was also a secondary approach suggested by my mentor Leio, to parse rsync log files, but I am a bit reluctant about this idea.

Anyway, I'll leave this idea simmering here for a while and unless someone comes up with a better one (yes, I have also thought about scanning the whole portage tree every x hours), I'm going to implement the daemon.

Plans for current week
======================

As I currently consider the core issue solved, the next issue I have to solve is how to take an ebuild, extract information about it and store it in the database. (Hint: pkgcore)

I'm not going to take on bigger tasks because I still have one quite hard exam (thermodynamics and statistical physics) on the 4th of June. And if I pass, it is the last one.

PS. Sorry, no blog yet. I was using Zine, but it broke after I updated my system to SQLAlchemy-0.6.