Binary Host Servers for Gentoo GRP ================================== $Id: binary_hosts_draft.txt,v 1.3 2005/05/01 01:00:19 carpaski Exp $ --------------------------------------- BINHOST support enables a machine to utilize standard FTP or HTTP protocols over a network to determine dependency, download, and install binary packages. Binaries, as presented on a GRP cd, or as created in the normal operation of portage, contain all the information required by portage to install and maintain packages with the added benefit that provided packages are known to be good, working sets. GRP provides a fast installation method for those downloading the GRP packages CDs along side the LiveCDs and stages for an installation. The benefit of a Gentoo BINHOST server (mirror) would extend the benefits of GRP to end users and removes the requirement of a packages CD download. The PR benefit of this would be the contradiction of the common stereotype of Gentoo's quickness to install. Depending upon the connection, one could install a basic system in under 15 minutes, given that you understood the process already. Adding huge sets of packages such as kde and gnone would be minimal additions on top of those 15 minutes. The obvious objections to such a mirror system for GRP would be the abuse of the system by end users. The abuse occurs as follows: A user, having an installed system, decides to install a package that would take considerable amounts of time to install. The user then concludes that binaries, available from Gentoo's BINHOST system will provide the "quick installation" that they can then update as they have time. The fallacy here is a non-complete installation of reversion in the systems complete dependency tree. Downgrading major libraries based upon highly mismatched stable systems causes high time-cost damage to repair. This is the identical abuse that one sees with GRP distribution on CDs. The fear here is an increase in the occurances of damaged systems and problem reports. The resolution to such reports should simply be 'INVALID'. The documentation would make pointed statements on the applicibility of GRP BINHOSTS and indicate the one-time-use system (in the basic scenario). The feasability of disabling GRP and source installations is discussed in problem discussion 1. Advanced scenarios of this BINHOST system would be the availability of slow-changing systems, such as Server, or Stable/Security GRP. The packages can be maintained on the mirror system, updated as required by security or periodic advancement. The same system of limitation could be enabled for Server/Security GRPs as for the general end-user GRP for quick installation. Minimally, GRP can now be used as a security-minded binary platform, a slow-moving target. For Vendors and some users, Gentoo can now provide it's benefits to a wider market at NO COST TO THE ACTIVE TREE. Resource utilization is fairly minimal. GRP BINHOSTs could be managed from the main rotation of Gentoo-controlled mirrors. This provides the sanist and most reliable system for distribution. As a side effect of GRP providing the binaries already on CDs, the Gentoo-controlled mirrors can utilize almost no additional space by loopback mounting the Packages CDs in a particular and seperate directory structure based on system specs. The bandwidth utilization of BINHOST GRP is completely unknown at this time, but it should form an absolute peak at the current distribution of LiveCDs across the mirrors. The suggested upper bound should be 50% of a LiveCD at equal distribution. BINHOSTs do not require the entirity of the package to extract the needed dependency information, thus the entire LiveCD is an excessive download in comparison. ===================================================================== Operating with BINHOST: ----------------------- The normal packages directory created and filled in normal portage operation provides binaries suitable for distribution via BINHOST systems. The actual files in the 'All/' subdirectory of the packages directory matches the format expected of a binhost mirror. Catalyst or standard 'emerge -b' operations will fill the directory as needed. Portage binhost support uses a short term (1 day) cache. At the initial connection to a remote binhost, portage lists and fetches all the metadata from packages is an intelligent manner by skimming the data from the hosted package using FTP or HTTP range support. This data can be fouled as updates occur to packages on the host. The solution to this problem is the use of 'metadata.idx' files on the mirrors along side the package tarballs. When connecting to a mirror the first time, portage creates two copies of the metadata. One is a permanent record that is maintained in a general "pickle" for all hosts you connect to. The other file is located at /var/cache/edb/metadata.idx.most_recent and is created when portage first connects and fetches the metadata from the host. This "most_recent" file can be exploited to speedup and remove the data corruption as it's always downloaded at least once. Given that every update to the files is accompanied by a newly generated index, you prevent users from having bad metadata when a package is updated. Metadata indexes can be stamped (HIGHLY RECOMMENDED) and compressed with gzip. These two in combination makes binhosts extremely fast and mostly immune to race/time-based metadata corruption. Process: 1. Update binaries. 2. rm /var/cache/edb/*metadata* 3. PORTAGE_BINHOST="http://localhost/path" emerge -egp world 4. cp /var/cache/edb/metadata.idx.most_recent ./metadata.idx 5. mv metadata.idx metadata.idx-`date +%s` 6. gzip -9 metadata.idx-`date +%s` 7. mv metadata.idx-`date +%s`.gz /path/to/All/ 8. Sync mirrors. Perferably atomically. ===================================================================== Problem Discussion 1: --------------------- The ability to prevent downgrading through GRP should be possible with minimal side effects for end users of the explicit intent of the basic GRP scenario. The process would require forging RDEPEND statements of binary packages to require a specific version of a package that is not generally available and blocking all others. The package selected would be a base system package and should be created with a unique full version and then inserted forged into every package.