This is progress report #3 for Project Grumpy.

Since report #2 there has been a big change of focus in development: we decided to drop our beloved and also greatly hated NoSQL approach (MongoDB) and go forward with a regular RDBMS instead, which in our case is good old PostgreSQL. Although there were some compelling arguments for MongoDB (ease of use being my favorite), the biggest nail in its coffin was the lack of "support" for it from Gentoo's infra team. For them it was just another application they would have to take care of, and around the interwebs there are lots of 'MongoDB ate my data' reports on how error-prone MongoDB actually is (although the data volumes in most of these cases were so high that I cannot really imagine Grumpy running into these problems). But I can really understand their concerns. Besides, if you take a look at the list of commits in MongoDB's official development repository [1], you can see why people are a bit concerned ;)

[1] http://github.com/mongodb/mongo/commits

Therefore we switched over to PostgreSQL, using SQLAlchemy as a glue layer between the database and the application. SQLAlchemy is a blessing because with its object-relational mapper you do not actually have to write any SQL (just take a peek at the 'grumpy_sync' utility).

Progress so far
===============

So far I have implemented a portage -> database sync utility that keeps the database in sync with portage content. Although it seems to handle most of the various portage quirks (like package moves via 'profiles/updates'), it might still run into issues in some corner cases, and error recovery is minimal: it is currently designed to crash with RuntimeError when it detects something out of the ordinary.

Of course, the data model is far from complete: there is no proper handling of keywords, and I do not even store ebuild depends, rdepends and licenses in the database, mainly because I currently don't have any use cases for these.
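
To illustrate the fail-fast handling of package moves mentioned above, here is a minimal sketch. It is not the actual 'grumpy_sync' code: it assumes that a move is recorded as a line of the form 'move <old> <new>' (the way portage's update files express it), and it keeps packages in a plain dict instead of the database. The function names 'parse_move' and 'apply_moves' and the package names are made up for illustration.

```python
def parse_move(line):
    """Parse one 'move <old> <new>' entry and return (old, new).

    Returns None for blank lines and for other commands (for example
    'slotmove'), which this sketch simply ignores.
    """
    fields = line.split()
    if not fields or fields[0] != "move":
        return None
    if len(fields) != 3:
        # Fail fast on malformed input instead of guessing what was meant.
        raise RuntimeError("malformed move entry: %r" % line)
    return fields[1], fields[2]


def apply_moves(packages, lines):
    """Rename keys of 'packages' (a dict keyed by 'category/name') in place."""
    for line in lines:
        move = parse_move(line)
        if move is None:
            continue
        old, new = move
        if old not in packages:
            # Package was never imported (or was already moved): nothing to do.
            continue
        if new in packages:
            # Something out of the ordinary: refuse to overwrite existing data.
            raise RuntimeError("move target already exists: %s" % new)
        packages[new] = packages.pop(old)
    return packages


# Hypothetical example data, standing in for rows in the database.
packages = {"dev-util/foo": ["1.0"], "app-misc/bar": ["2.1"]}
apply_moves(packages, ["move dev-util/foo dev-vcs/foo"])
```

Raising RuntimeError here mirrors the "crash when something looks odd" design: a broken import stops immediately rather than silently writing questionable data.
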

The syncer can be found under the 'utils' directory in the project directory.

Future plans
============

As the model and controller are ready, the next stop is to write a rudimentary web app for browsing portage contents, so people can finally see that I actually haven't slacked all this time.. :)

Also, during the portage import I noticed some really simple QA issues like invalid herd names in 'metadata.xml'. The plan is to write a 'herdcheck' plugin and implement database storage for these QA issues. And as I cannot let just anyone write to the database, I need to implement an API to let plugins interact with the app. Having an API means that I can start integrating with other QA tools out there, mainly tinderbox.

And finally, testing. I currently have a simple doctesting and auditing (via PyFlakes) framework in place, but general unit testing is still missing.

As you can see, I'm lagging a bit behind my proposed timeline: I still haven't actually started looking at how to create the 30-day stabilisation and upstream version checkers, but hopefully I can pick up speed, because I can now say that I have passed the biggest hurdle.. :)

And I have also dropped my 'secret agenda' of documenting my experience with NoSQL databases as a series of articles written during this project...

Project info
============

The Git repository of Grumpy is available from [2].

[2] http://git.overlays.gentoo.org/gitweb/?p=proj/grumpy.git;a=summary

The project's semi-official IRC channel is #gentoo-grumpy on the Freenode network. If you run into trouble when testing out this project, just ping me with a message.

PS. Bonus points for those who noticed that I dropped 'weekly' ;)