Diffstat (limited to 'docs/gsoc/01-report.txt')
1 files changed, 61 insertions, 0 deletions
diff --git a/docs/gsoc/01-report.txt b/docs/gsoc/01-report.txt
new file mode 100644
@@ -0,0 +1,61 @@
+This is weekly progress report no. 1 for Project Grumpy.
+As this is the first publicly visible announcement, I am also going to
+give a short overview of the project itself.
+The aim of this project is to create a database containing various
+developer-related metadata about packages in the Gentoo Portage tree. The
+metadata that we are going to store can be used for different purposes;
+some examples include upstream version checks and sending notifications to
+developers who are interested in a given package. Eventually the project
+should also provide a nice web and API interface to access this data.
+The project's semi-official IRC channel is #gentoo-grumpy on the Freenode
+network. Just step in and say "Hi!" :)
+Last week's progress report
+My first week went a bit slowly due to having some "unfinished business"
+that I needed to finish, and also because of two exams (which went fine).
+The core issue I wrestled with this week was how to keep the Portage tree
+and the database in sync, i.e. when an ebuild is modified, removed or added,
+how to make sure that the database contents correspond to the tree contents.
+The solution that I came up with is to use a simple daemon that logs changes
+to the Portage tree and modifies the database contents when appropriate.
+Appropriate here means that we shouldn't apply updates while the tree is
+still being updated, as that might be unsafe (e.g. a package rename). So
+currently it seems that the daemon also has to initiate the rsync process
+and push the updates into the database after rsync has finished
+successfully. (You can already see how all kinds of weird corner cases
+start popping up :P )
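+The "collect during the sync, apply after it succeeds" idea could be
+sketched roughly as below. This is only an illustration under assumptions:
+the names ChangeBuffer, sync_and_update and apply_change are hypothetical,
+and in a real daemon the events would be recorded by the watcher while the
+sync command is running.
```python
import subprocess
from collections import OrderedDict


class ChangeBuffer:
    """Collects change events and replays them once the sync has finished."""

    def __init__(self):
        # Maps path -> most recent event seen for that path.
        self._pending = OrderedDict()

    def record(self, path, event):
        # A later event for the same path replaces the earlier one.
        self._pending.pop(path, None)
        self._pending[path] = event

    def flush(self, apply_change):
        # Hand every pending change to the callback, then forget it.
        changes = list(self._pending.items())
        self._pending.clear()
        for path, event in changes:
            apply_change(path, event)
        return len(changes)


def sync_and_update(sync_command, buffer, apply_change, run=subprocess.call):
    """Run the sync; push buffered changes only if it exited successfully."""
    exit_code = run(sync_command)
    if exit_code != 0:
        # Keep the pending changes so a later successful sync can apply them.
        return 0
    return buffer.flush(apply_change)
```
+Keeping only the last event per path is a simplification; a rename shows up
+as a delete plus a create and would still need special handling.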
+My current approach to logging is to use the inotify framework, present in
+the Linux kernel since 2.6.13 (sorry BSD users, but this is Gentoo Linux
+after all), with the help of pyinotify. So far there's only one drawback
+to using inotify: by default the kernel allows only 8192 watches per user,
+and the Portage tree contains a lot more directories than that. So in
+order to use this approach one has to bump the number of watches using the
+/proc/sys/fs/inotify/max_user_watches tunable. 81920 has worked fine so
+far on my machine ;)
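+A quick way to check whether the limit is high enough is to count the
+directories in the tree and compare that with the tunable. The sketch below
+assumes one watch per directory; the function names are made up for this
+example.
```python
import os


def count_directories(root):
    """Count root and every directory below it (one inotify watch each)."""
    return sum(1 for _dirpath, _dirnames, _filenames in os.walk(root))


def read_watch_limit(path="/proc/sys/fs/inotify/max_user_watches"):
    """Read the current inotify watch limit from procfs (Linux only)."""
    with open(path) as handle:
        return int(handle.read().strip())


def watches_sufficient(root, limit):
    """Return True if watching every directory under root fits in limit."""
    return count_directories(root) <= limit
```
+For example, watches_sufficient("/usr/portage", read_watch_limit()) would
+tell whether the default is enough, assuming the tree lives in /usr/portage.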
+There was also a second approach, suggested by my mentor Leio: parsing the
+rsync log files. I am a bit reluctant about this idea, though.
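+For comparison, the log-parsing approach might look something like the
+sketch below. It assumes the lines come from rsync's --itemize-changes
+output (an update-type character, a file-type character and attribute
+flags, followed by the path, with deletions shown as "*deleting"); the
+exact number of flag characters differs between rsync versions, so only
+the first three are inspected. parse_itemized_line is a hypothetical name.
```python
def parse_itemized_line(line):
    """Map one itemized rsync line to (action, path), or None if irrelevant."""
    line = line.rstrip("\n")
    if not line.strip():
        return None
    flags, _, path = line.partition(" ")
    path = path.lstrip(" ")
    if not path:
        return None
    if flags == "*deleting":
        return ("deleted", path)
    if len(flags) < 3 or flags[0] not in "<>ch.":
        return None
    if flags[1] != "f":
        # Only regular files are of interest here, not directories or links.
        return None
    if flags[0] == ".":
        # The file content was not transferred (attribute change at most).
        return None
    if flags[2] == "+":
        # A run of '+' in the attribute columns marks a newly created item.
        return ("added", path)
    return ("modified", path)
```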
+Anyway, I'll leave this idea simmering here for a while, and unless someone
+comes up with a better one (yes, I have also thought about scanning the
+whole Portage tree every x hours), I'm going to implement the daemon.
+Plans for current week
+As I currently consider the core issue solved, the next issue I have to solve
+is how to take an ebuild, extract information about it and store it in the
+database. (Hint: pkgcore)
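+To give a feel for the "extract and store" step, here is a small sketch
+that does not use pkgcore at all: it only splits a tree-relative ebuild
+path into category, package and version and stores that in SQLite. The
+version pattern follows the general Gentoo version syntax as far as I know
+it; the table layout and the function names are assumptions made for this
+example, and real metadata (dependencies, keywords, ...) would need a
+proper ebuild parser.
```python
import re
import sqlite3

# <name>-<version>.ebuild, where the version looks like 1.2.3b_rc1-r2.
_EBUILD_RE = re.compile(
    r"^(?P<name>.+)-"
    r"(?P<version>\d+(\.\d+)*[a-z]?(_(alpha|beta|pre|rc|p)\d*)*(-r\d+)?)"
    r"\.ebuild$"
)


def split_ebuild_path(path):
    """Split 'category/package/package-version.ebuild' into its parts."""
    parts = path.split("/")
    if len(parts) != 3:
        return None
    category, package_dir, filename = parts
    match = _EBUILD_RE.match(filename)
    if match is None or match.group("name") != package_dir:
        return None
    return category, match.group("name"), match.group("version")


def store_ebuild(connection, path):
    """Insert the ebuild into the (hypothetical) ebuilds table if it parses."""
    info = split_ebuild_path(path)
    if info is None:
        return False
    connection.execute(
        "CREATE TABLE IF NOT EXISTS ebuilds ("
        "category TEXT, package TEXT, version TEXT, "
        "UNIQUE (category, package, version))"
    )
    connection.execute("INSERT OR IGNORE INTO ebuilds VALUES (?, ?, ?)", info)
    return True
```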
+I'm not going to take on bigger tasks because I still have one quite hard
+exam (thermodynamics and statistical physics) on the 4th of June. And if I
+pass, it is the last one.
+PS. Sorry, no blog yet. I was using Zine, but it broke after I updated
+my system to SQLAlchemy-0.6.