Diffstat (limited to 'docs')
-rw-r--r-- docs/gsoc/02-report.txt 121
1 files changed, 121 insertions, 0 deletions
diff --git a/docs/gsoc/02-report.txt b/docs/gsoc/02-report.txt
new file mode 100644
index 0000000..debf139
--- /dev/null
+++ b/docs/gsoc/02-report.txt
@@ -0,0 +1,121 @@
+This is the second weekly progress report for Project Grumpy.
+
+As reported previously, I am building a system to index portage packages
+and related metadata to make package maintainership a bit easier for
+developers.
+
+First, a few words about the document metadata storage. For this project, the
+plan is to use a document-oriented and schema-free database (MongoDB) instead
+of a regular relational database system (like SQLite or PostgreSQL).
+
+This also means that we can use a single document collection containing the
+whole ebuild tree, where each document corresponds to one "category/package".
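The one-document-per-package layout can be made concrete with a minimal Python sketch. It groups "category/package-version" strings into such documents, using the field names from the schema below.

This is not Grumpy's syncer. `split_cpv`, `group_into_documents` and `CPV_RE` are hypothetical names. The version regex is a simplified assumption that does not cover every Portage version rule; a real syncer would more likely rely on pkgcore for this.

```python
import re

# Simplified "category/package-version" pattern (an assumption, not the full
# Portage version grammar): numeric components, an optional letter, optional
# _alpha/_beta/_pre/_rc/_p suffixes and an optional -rN revision.
CPV_RE = re.compile(
    r"^(?P<cat>[^/]+)/(?P<pkg>.+?)-"
    r"(?P<ver>\d+(?:\.\d+)*[a-z]?(?:_(?:alpha|beta|pre|rc|p)\d*)*(?:-r\d+)?)$"
)


def split_cpv(cpv):
    """Split 'category/package-version' into (category, package, version)."""
    match = CPV_RE.match(cpv)
    if match is None:
        raise ValueError("not a category/package-version string: %r" % cpv)
    return match.group("cat"), match.group("pkg"), match.group("ver")


def group_into_documents(cpvs):
    """Build one document per 'category/package' holding all its versions."""
    docs = {}
    for cpv in cpvs:
        cat, pkg, version = split_cpv(cpv)
        key = "%s/%s" % (cat, pkg)
        # Create the package document on first sight, then append versions.
        doc = docs.setdefault(
            key, {"_id": key, "cat": cat, "pkg": pkg, "ebuilds": []})
        doc["ebuilds"].append({"version": version})
    return list(docs.values())
```

For instance, `dev-python/python-dateutil-1.5` and `dev-python/python-dateutil-2.0` end up as two entries in the `ebuilds` list of a single `dev-python/python-dateutil` document.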
+
+Each document in the collection is just a JSON-formatted dictionary with the
+following structure (beware: this is a work in progress, so some things are
+still missing)::
+
+ {
+     # "category/package" (primary index, unique)
+     '_id' : string,
+
+     # Version of the schema, used internally (just in case)
+     'schema_ver' : integer,
+
+     # Package category
+     'cat' : string,
+
+     # Package name
+     'pkg' : string,
+
+     ## Data from metadata.xml
+     # List of herds maintaining this package
+     'herds' : [ string, ... ],
+     # Long description of the package
+     'ldesc' : string,
+     # List of maintainers (by email address)
+     'maintainers' : [ string, ... ],
+
+     ## Data from the ebuilds themselves (but should be general)
+     # Description
+     'desc' : string,
+     # Upstream URL(s) (FIXME: Do we need a list here?)
+     'homepage' : string,
+
+     # Array of all the package versions and their specific info
+     'ebuilds' : [
+         {
+             # Package version (from category/package-version)
+             'version' : string,
+
+             # EAPI version
+             'eapi' : integer,
+             # List of USE flags supported by this ebuild
+             'iuse' : [ string, ... ],
+             # Package keywords ("x86", "~amd64", ...)
+             'keywords' : [ string, ... ],
+             # Licenses
+             'licence' : [ string, ... ],
+             # Package slot
+             'slot' : string,
+
+             # Need to figure out proper structure for these, so we can also
+             # map out USE flags ;)
+             'depend' : TODO,
+             'rdepend' : TODO,
+         },
+         ...
+     ]
+ }
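Because the database is schema-free, nothing stops a malformed document from being stored. A syncer might therefore want to check documents itself before writing them.

The Python sketch below shows one hypothetical way to do that. `schema_errors`, `PACKAGE_FIELDS` and `EBUILD_FIELDS` are illustrations, not part of Grumpy. They treat every listed field as required and leave out the still-undecided `depend`/`rdepend` fields.

```python
# Expected types for each field (hypothetical mapping of the work-in-progress
# schema; the undecided 'depend'/'rdepend' fields are deliberately left out).
PACKAGE_FIELDS = {
    "_id": str, "schema_ver": int, "cat": str, "pkg": str,
    "herds": list, "ldesc": str, "maintainers": list,
    "desc": str, "homepage": str, "ebuilds": list,
}
EBUILD_FIELDS = {
    "version": str, "eapi": int, "iuse": list,
    "keywords": list, "licence": list, "slot": str,
}


def schema_errors(doc):
    """Return a list of problems found in one package document."""
    errors = []
    for name, expected in PACKAGE_FIELDS.items():
        if not isinstance(doc.get(name), expected):
            errors.append("%s: expected %s" % (name, expected.__name__))
    # The primary key should be derived from the category and package name.
    if doc.get("_id") != "%s/%s" % (doc.get("cat"), doc.get("pkg")):
        errors.append("_id: expected 'category/package'")
    ebuilds = doc.get("ebuilds")
    for i, ebuild in enumerate(ebuilds if isinstance(ebuilds, list) else []):
        if not isinstance(ebuild, dict):
            errors.append("ebuilds[%d]: expected dict" % i)
            continue
        for name, expected in EBUILD_FIELDS.items():
            if not isinstance(ebuild.get(name), expected):
                errors.append("ebuilds[%d].%s: expected %s"
                              % (i, name, expected.__name__))
    return errors
```

For example, a first ebuild entry that stores `eapi` as the string `"2"` instead of an integer would be reported as `ebuilds[0].eapi: expected int`. The `schema_ver` field would let such checks change as the structure evolves.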
+
+So how about querying the data? That's easy (note that the examples below use
+the MongoDB shell). For example, what if a developer wants to know which
+packages he is supposedly maintaining::
+
+ > db.ebuilds.find({'maintainers' : '...@gentoo.org' })
+ {... document data ...} # (Too much info :) )
+ > db.ebuilds.find({'maintainers' : '...@gentoo.org' }).count()
+ 7
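The maintainer query works because of how MongoDB treats arrays. When a query gives a single value for a field that holds an array, MongoDB matches documents where any element of the array equals that value.

The pure-Python sketch below imitates only that equality rule on an in-memory list of documents. `find` and `_matches` are hypothetical helpers for illustration, not the MongoDB or pymongo API.

```python
def _matches(stored, wanted):
    """Equality test with MongoDB-style array handling (simplified)."""
    if isinstance(stored, list):
        # An array field matches if any element equals the wanted value,
        # or if the whole array equals it.
        return wanted in stored or stored == wanted
    return stored == wanted


def find(collection, query):
    """Yield documents in which every field of `query` matches."""
    for doc in collection:
        if all(_matches(doc.get(field), wanted)
               for field, wanted in query.items()):
            yield doc
```

Counting the results, e.g. `sum(1 for _ in find(docs, query))`, plays the role of `.count()` in the shell session above. As in MongoDB, all fields given in one query must match. Dotted paths such as `'ebuilds.eapi'` are not handled by this sketch.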
+
+And the results come back fast. I mean really fast.
+
+OK, how about checking how many packages under 'dev-python' are using a
+specific EAPI version::
+
+ > db.ebuilds.find({'cat' : 'dev-python', 'ebuilds.eapi' : 0}).count()
+ 202
+ > db.ebuilds.find({'cat' : 'dev-python', 'ebuilds.eapi' : 1}).count()
+ 3
+ > db.ebuilds.find({'cat' : 'dev-python', 'ebuilds.eapi' : 2}).count()
+ 255
+ > db.ebuilds.find({'cat' : 'dev-python', 'ebuilds.eapi' : 3}).count()
+ 125
+ > db.ebuilds.find({'cat' : 'dev-python', 'ebuilds.eapi' : 4}).count()
+ 0
+ > db.ebuilds.find({'cat' : 'dev-python' }).count()
+ 504
+ > 202+3+255+125 - 504
+ 81
+
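+Ahem... looks like we have a "design issue" with our document structure: a
+package whose ebuilds use more than one EAPI matches several of these queries,
+so the per-EAPI counts add up to 81 more than the 504 packages in the category.
+So, back to the drawing board.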
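The extra 81 can be traced by counting the same thing two ways over documents that follow the schema above, as in the Python sketch below.

- `packages_per_eapi` mimics the `find({'ebuilds.eapi' : N}).count()` queries, so a package with ebuilds in several EAPIs is counted once per EAPI.
- `ebuilds_per_eapi` tallies individual ebuilds, whose counts always add up to the total number of ebuilds.

All three functions are hypothetical helpers, not part of Grumpy. They assume the documents have already been filtered to one category.

```python
from collections import Counter


def packages_per_eapi(docs):
    """Count packages with at least one ebuild in each EAPI.

    Mirrors find({'ebuilds.eapi' : N}).count() over already-filtered
    documents: a package whose ebuilds use several EAPIs is counted
    once for every one of them.
    """
    counts = Counter()
    for doc in docs:
        for eapi in {ebuild["eapi"] for ebuild in doc["ebuilds"]}:
            counts[eapi] += 1
    return counts


def ebuilds_per_eapi(docs):
    """Count individual ebuilds per EAPI; these counts sum to the ebuild total."""
    return Counter(ebuild["eapi"] for doc in docs for ebuild in doc["ebuilds"])


def mixed_eapi_packages(docs):
    """Return the ids of packages whose ebuilds use more than one EAPI."""
    return [doc["_id"] for doc in docs
            if len({ebuild["eapi"] for ebuild in doc["ebuilds"]}) > 1]
```

For example, a package with one EAPI 0 ebuild and one EAPI 2 ebuild contributes to two per-EAPI package counts while being only one package. In MongoDB itself, a per-ebuild tally can be written with the aggregation pipeline: `$unwind` the `ebuilds` array, then `$group` on `ebuilds.eapi`.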
+
+Last week's progress report
+===========================
+
+Progress last week was a bit slow: I mostly experimented with the document
+structure and played a bit with pkgcore's internals. Although the portage
+contents are now in the database, the document structure itself is far from
+ideal (as the EAPI count example above shows).
+
+I have committed some of what I have been working on to Grumpy's repository,
+so if you are interested, check it out at [1].
+
+[1] http://git.overlays.gentoo.org/gitweb/?p=proj/grumpy.git;a=summary
+
+A word of warning: the portage->mongodb syncer is slow. Really slow: it takes
+about 3 hours (or even more) on my laptop to fully scan the contents of portage
+and store the data in the database.
+
+Plans for current week
+======================
+
+1) Speed up the portage syncer
+2) Improve document structure