Gentoo Wiki

TIP speed up portage with cdb

This article is part of the Tips & Tricks series.

Introduction

This tip shows how to drastically speed up portage after syncing and when calculating dependencies. The method stores portage's metadata in a database rather than in a collection of flat files. It requires the python-cdb module; the original post is in the Gentoo Forums.

There is also a beta ebuild in bugzilla, but it is better to follow the steps laid out here by hand.

A similar speed-up with sqlite is described in TIP speed up portage with sqlite.

What is cdb?

cdb is a fast, reliable, simple package for creating and reading constant databases. The features of its database structure are described on the cdb home page: http://cr.yp.to/cdb.html
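
As a rough illustration of the "constant database" idea, the sketch below builds a complete file under a temporary name and then renames it over the old one, so readers only ever see a finished database. This mirrors the way the modules further down call cdbmake with a ".tmp" file name. It is written in Python 3, uses pickle as a stand-in storage format (it is not the real cdb file format), and the function names build_constant_db and read_constant_db are made up for this example.

```python
import os
import pickle
import tempfile


def build_constant_db(path, records):
    """Write every record to a temporary file, then rename it over `path`.

    The database is never modified in place: an update means building a
    whole new file and swapping it in.
    """
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as handle:
        pickle.dump(dict(records), handle, pickle.HIGHEST_PROTOCOL)
    # os.replace swaps the new file in as a single step on POSIX systems.
    os.replace(tmp_path, path)


def read_constant_db(path):
    """Load the finished database; readers never write to it."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

A caller would rebuild the file whenever the data changes, for example build_constant_db("/tmp/sys-apps.cdb", {"key": "value"}), and read it back with read_constant_db.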

Warning

The method this tip uses to plug a new module into portage is fine: the portage devs added it explicitly for purposes like this. That said, using a third-party plugin with portage means you have to deal with its bugs. For example, if you unmerge python-cdb, you need to remember to remove the custom /etc/portage/modules setting.

Beyond that, an upcoming backport of the cache code from the >=2.1 line of portage to 2.0.x will break this module. This should happen around portage 2.0.54. However, the instructions below describe how to repair the problem.

Also, portage-2.1_pre4 will not work with this module as is. See http://forums.gentoo.org/viewtopic-t-261580-postdays-0-postorder-asc-start-175.html#3068457 for details.

Beware: this method does not currently work with the portage-2.2 line.

Getting what you need

Not much is needed for this. Just a text editor, and the cdb module. So emerge it!

emerge dev-python/python-cdb
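
To confirm that the module is actually importable after the emerge (and, later, to notice when it has been unmerged while /etc/portage/modules still points at it), a small check like the following may help. It is a sketch in Python 3 syntax; the helper name module_available is made up here. On the Python 2 interpreter that portage used at the time, running python -c "import cdb" gives the same answer.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "cdb" is the module name that the code in this article imports.
    print("cdb importable:", module_available("cdb"))
```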

Setting up portage

Portage will require us to create/edit two files for this to work:

Create our new module

We first need to create a new file, telling portage how to work with cdb.

So with portage <2.1, create the new file /usr/lib/portage/pym/portage_db_cdb.py and put the following in it:

File: /usr/lib/portage/pym/portage_db_cdb.py
# Copyright 2004, 2005 Tobias Bell <tobias.bell@web.de>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA

import portage_db_template
import os
import os.path
import cPickle
import cdb


class _data(object):

    def __init__(self, path, category, uid, gid):
        self.path = path
        self.category = category
        self.uid = uid
        self.gid = gid
        self.addList = {}
        self.delList = []
        self.modified = False
        self.cdbName = os.path.normpath(os.path.join(
            self.path, self.category) + ".cdb")
        self.cdbObject = None

    def __del__(self):
        if self.modified:
            self.realSync()

        self.closeCDB()

    def realSync(self):
        if self.modified:
            self.modified = False
            newDB = cdb.cdbmake(self.cdbName, self.cdbName + ".tmp")
           
            for key, value in iter(self.cdbObject.each, None):
                if key in self.delList:
                    if key in self.addList:
                        newDB.add(key, cPickle.dumps(self.addList[key], cPickle.HIGHEST_PROTOCOL))
                        del self.addList[key]
                elif key in self.addList:                   
                    newDB.add(key, cPickle.dumps(self.addList[key], cPickle.HIGHEST_PROTOCOL))
                    del self.addList[key]
                else:
                    newDB.add(key, value)
               

            self.closeCDB()

            for key, value in self.addList.iteritems():
                newDB.add(key, cPickle.dumps(value, cPickle.HIGHEST_PROTOCOL))
           
            newDB.finish()
            del newDB
           
            self.addList = {}
            self.delList = []

            self.openCDB()

    def openCDB(self):
        prevmask = os.umask(0)
       
        if not os.path.exists(self.path):
            os.makedirs(self.path, 02775)
            os.chown(self.path, self.uid, self.gid)
           
        if not os.path.isfile(self.cdbName):
            maker = cdb.cdbmake(self.cdbName, self.cdbName + ".tmp")
            maker.finish()
            del maker
            os.chown(self.cdbName, self.uid, self.gid)
            os.chmod(self.cdbName, 0664)

        os.umask(prevmask)
           
        self.cdbObject = cdb.init(self.cdbName)

    def closeCDB(self):
        if self.cdbObject:
            self.cdbObject = None


class _dummyData:
    cdbName = ""

    def realSync():
        pass
    realSync = staticmethod(realSync)


_cacheSize = 4
_cache = [_dummyData()] * _cacheSize


class database(portage_db_template.database):   

    def module_init(self):
        self.data = _data(self.path, self.category, self.uid, self.gid)

        for other in _cache:
            if other.cdbName == self.data.cdbName:
                self.data = other
                break
        else:
            self.data.openCDB()
            _cache.insert(0, self.data)           
            _cache.pop().realSync()
           
    def has_key(self, key):
        self.check_key(key)
        retVal = 0

        if self.data.cdbObject.get(key) is not None:
            retVal = 1

        if self.data.modified:
            if key in self.data.delList:
                retVal = 0
            if key in self.data.addList:
                retVal = 1
           
        return retVal

    def keys(self):
        myKeys = self.data.cdbObject.keys()

        if self.data.modified:
            for k in self.data.delList:
                myKeys.remove(k)
            for k in self.data.addList.iterkeys():
                if k not in myKeys:
                    myKeys.append(k)
                   
        return myKeys

    def get_values(self, key):
        values = None
       
        if self.has_key(key):
            if key in self.data.addList:
                values = self.data.addList[key]
            else:
                values = cPickle.loads(self.data.cdbObject.get(key))

        return values
   
    def set_values(self, key, val):
        self.check_key(key)
        self.data.modified = True
        self.data.addList[key] = val

    def del_key(self, key):
        retVal = 0
       
        if self.has_key(key):
            self.data.modified = True
            retVal = 1
            if key in self.data.addList:
                del self.data.addList[key]
            else:
                self.data.delList.append(key)

        return retVal
                   
    def sync(self):
        pass
   
    def close(self):
        pass


if __name__ == "__main__":
    import portage
    uid = os.getuid()
    gid = os.getgid()
    portage_db_template.test_database(database,"/tmp", "sys-apps", portage.auxdbkeys, uid, gid)
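
Because a cdb file cannot be edited in place, realSync above rebuilds the whole file: it walks the existing records, substitutes pending additions, skips pending deletions, and finally appends additions for keys that were not in the file yet. The sketch below restates that merge on plain Python data so it can be tried without python-cdb. It is Python 3, and the function name merge_pending is made up for this illustration; it is not part of the module.

```python
def merge_pending(existing, add_list, del_list):
    """Return the records a rebuilt database would contain.

    existing -- iterable of (key, value) pairs already in the file
    add_list -- dict of pending additions/overwrites
    del_list -- list of keys pending deletion
    """
    pending = dict(add_list)  # copy, so the caller's dict is untouched
    rebuilt = []
    for key, value in existing:
        if key in pending:
            # A pending value wins, even if the key was deleted earlier
            # and then set again.
            rebuilt.append((key, pending.pop(key)))
        elif key in del_list:
            continue  # deleted and not re-added: drop the record
        else:
            rebuilt.append((key, value))
    # Whatever is left in `pending` is new to the database.
    rebuilt.extend(pending.items())
    return rebuilt
```

For example, with existing records a, b and c, a pending deletion of a and b, and pending values for a and d, the rebuilt database holds the new a, the old c, and d.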

With portage 2.1, create the file /usr/lib/portage/pym/cache/cdb.py instead:

File: /usr/lib/portage/pym/cache/cdb.py
# Copyright: 2005 Gentoo Foundation
# Author(s): Brian Harring (ferringb@gentoo.org)
# License: GPL2
# $Id: anydbm.py 1911 2005-08-25 03:44:21Z ferringb $


cdb_module = __import__("cdb")
try:
	import cPickle as pickle
except ImportError:
	import pickle
import copy
import os
import fs_template
from template import reconstruct_eclasses
import cache_errors


class database(fs_template.FsBased):

	autocommits = True
	cleanse_keys = True
	serialize_eclasses = False

	def __init__(self, *args, **config):
		super(database,self).__init__(*args, **config)

		self._db_path = os.path.join(self.location, fs_template.gen_label(self.location, self.label)+".cdb")
		self.__db = None
		try:
			self.__db = cdb_module.init(self._db_path)

		except cdb_module.error:
			try:
				self._ensure_dirs()
				self._ensure_dirs(self._db_path)
				self._ensure_access(self._db_path)
			except (OSError, IOError), e:
				raise cache_errors.InitializationError(self.__class__, e)

			try:
				cm = cdb_module.cdbmake(self._db_path, self._db_path+".tmp")
				cm.finish()
				self._ensure_access(self._db_path)
				self.__db = cdb_module.init(self._db_path)
			except cdb_module.error, e:
				raise cache_errors.InitializationError(self.__class__, e)
		self._adds = {}
		self._dels = {}


	def iteritems(self):
		self.commit()
		return iter(self.__db.each, None)


	def _getitem(self, cpv):
		if cpv in self._adds:
			d = copy.deepcopy(self._adds[cpv])
		else:
			d = pickle.loads(self.__db[cpv])
		return d


	def _setitem(self, cpv, values):
		if cpv in self._dels:
			del self._dels[cpv]
		self._adds[cpv] = values


	def _delitem(self, cpv):
		if cpv in self._adds:
			del self._adds[cpv]
		self._dels[cpv] = True


	def commit(self):
		if not self._adds and not self._dels:
			return
		cm = cdb_module.cdbmake(self._db_path, self._db_path+str(os.getpid()))
		for (key, value) in iter(self.__db.each, None):
			if key in self._dels:
				del self._dels[key]
				continue
			if key in self._adds:
				cm.add(key, pickle.dumps(self._adds.pop(key), pickle.HIGHEST_PROTOCOL))
			else:
				cm.add(key, value)
		for (key, value) in self._adds.iteritems():
			cm.add(key, pickle.dumps(value, pickle.HIGHEST_PROTOCOL))
		cm.finish()
		self._ensure_access(self._db_path)
		self.__db = cdb_module.init(self._db_path)
		self._adds = {}
		self._dels = {}


	def iterkeys(self):
		self.commit()
		return iter(self.__db.keys())


	def has_key(self, cpv):
		return cpv not in self._dels and (cpv in self._adds or cpv in self.__db)


	def __del__(self):
		if getattr(self, "__db", None):
			self.commit()
			self.__db.finish()

You can also get it here. See also the original forum thread here. I hope jstubbs doesn't mind it being put here. ;-)

It would be nice if the modules were shipped with portage itself, where it is not as easy for anyone to tamper with the link or the code.
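
One detail of the 2.1 module may be worth checking before relying on its __del__ method. Inside a class body, Python rewrites attribute names that start with two underscores ("name mangling"), so self.__db in a class called database is really stored as _database__db. A string passed to getattr is not rewritten, so getattr(self, "__db", None) looks for a different name. The sketch below demonstrates the language rule with a made-up class Demo; it says nothing about python-cdb itself. If the same rule applies to the listing above, the commit in __del__ would be skipped.

```python
class Demo(object):
    """Hypothetical class used only to show name mangling."""

    def __init__(self):
        # Stored on the instance as "_Demo__db", not "__db".
        self.__db = "handle"

    def lookup_plain(self):
        # The string "__db" is not mangled, so this finds nothing.
        return getattr(self, "__db", None)

    def lookup_mangled(self):
        # Spelling out the mangled name finds the attribute.
        return getattr(self, "_Demo__db", None)
```

Calling Demo().lookup_plain() yields None, while Demo().lookup_mangled() yields the stored value.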

Tell portage to use it

After setting that up, we need to tell portage to use our new database. So create the file /etc/portage/modules (if it's not already there) and put the following in it:

File: /etc/portage/modules
portdbapi.auxdbmodule = portage_db_cdb.database
eclass_cache.dbmodule = portage_db_cdb.database

In the portage 2.1 release, you'll need to add the following lines instead:

File: /etc/portage/modules
portdbapi.auxdbmodule = cache.cdb.database
eclass_cache.dbmodule = cache.cdb.database
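
Both variants of the file are simple "key = value" lines. As a sketch of how one might check which cache modules are currently configured (for instance before unmerging python-cdb), the following Python 3 snippet parses that layout. The function names parse_modules_file and cdb_settings are made up for this example, and the parser assumes nothing beyond the "key = value" lines shown above.

```python
def parse_modules_file(text):
    """Turn "key = value" lines into a dict, ignoring blank lines."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def cdb_settings(settings):
    """Return the keys whose module path mentions cdb, sorted by name."""
    return sorted(key for key, value in settings.items() if "cdb" in value)


if __name__ == "__main__":
    with open("/etc/portage/modules") as handle:
        print(cdb_settings(parse_modules_file(handle.read())))
```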

Tell eix to use it

If we use eix (in version < 0.5.4), we will need to inform eix about our new database. So create the file /etc/eixrc (if it's not already there) and put the following in it:

File: /etc/eixrc
PORTDIR_CACHE_METHOD='cdb'

Warning: This no longer works with newer versions of eix (0.5.4 and later). Just leave the default method (metadata), because it is independent of the cache used by portage.

Final Steps

Now all we have to do is regenerate the portage cache:

emerge --metadata

Last Thoughts

Why NO cdb?

by Chaosite at
http://forums.gentoo.org/viewtopic-t-261580-postdays-0-postorder-asc-start-50.html

So, I went to #gentoo-portage, and carpaski was nice enough to explain to me the issues with this module:
13:23 < chaosite> What do you guys think about this CDB-portage interface ... ?
13:23 < chaosite> http://forums.gentoo.org/viewtopic-t-261580-postdays-0-postorder-asc-start-0.html
13:24 <@carpaski> Constant DB ?
13:25 < chaosite> Dunno what it stands for...
13:25 < chaosite> It does have a webpage here: http://cr.yp.to/cdb.html
13:27 <@carpaski> cdb modules have been around a while...
13:27 <@carpaski> Someone hacked it in once...
13:27 <@carpaski> Didn't like that hack.
13:28 <@carpaski> First I've seen of an actual module for it though.
13:28 <@carpaski> Anything that relies on an external app that uses C calls is subject to Segfaults.
13:28 <@carpaski> This is why we don't use anydbm by default.
13:29 <@carpaski> flat and cpickle are python... So unless python explodes, portage works.
13:29 <@carpaski> Saves headaches.
13:29 < chaosite> So, basically, you're saying that this adds more runtime dependencies to portage, which might fail?
13:30 <@carpaski> Potentially, yep.
13:30 <@carpaski> If it works for you, great...
13:30 <@carpaski> But I wouldn't have any expectation of it being a default.
13:30 < chaosite> Alright.
13:31 <@carpaski> The major kicker is that segfaults outside of python segfault the entire process.
13:31 <@carpaski> Otherwise it wouldn't be a problem.
13:31 <@carpaski> Seeing this is really disturbing:
13:31 <@carpaski> emerge -e world
13:31 <@carpaski> Segmentation Fault
13:32 < chaosite> Yeah, that won't be any fun...
13:33 < chaosite> Mind if I post this to the forums?
13:33 <@carpaski> Have at it.
13:33 < chaosite> Thanks :)
13:33 <@carpaski> The module can circulate, no problem with that... it could even get included at some point.
13:33 <@carpaski> It just has pretty much no chance of being default.

This goes to show how the Gentoo Devs have differing opinions.
So there :)
EDIT:
http://bugs.gentoo.org/show_bug.cgi?id=83371

(Note that chaosite is Matan Peled, who is me - Chaosite 16:55, 8 May 2006 (UTC)... Apparently I've become a major proponent of cdb without noticing it :)


This behavior has led to many arguments in the Gentoo community.

Performance Comparison

For a frame of reference, here is a recent (May 2007) performance comparison on a current Gentoo installation with 498 packages installed:

time emerge --metadata (standard config):

   real    11m51.729s
   user    0m11.390s
   sys     0m5.820s

time emerge --metadata (cdb config):

   real    5m45.889s
   user    0m9.110s
   sys     0m3.010s

Roughly a 2x improvement in wall-clock (real) time. This comparison was run on an AMD x64 4400+ nForce4 with 1G and XFS on an md RAID 5 volume across four 500G SATA drives.
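
The ratios behind that summary can be worked out directly from the two timing listings above. The snippet below (Python 3, with a made-up helper to_seconds) converts the time output to seconds and divides the standard run by the cdb run for each column.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style value such as "11m51.729s" to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError("unexpected time format: %r" % stamp)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Values copied from the two measurements in this section.
standard = {"real": "11m51.729s", "user": "0m11.390s", "sys": "0m5.820s"}
with_cdb = {"real": "5m45.889s", "user": "0m9.110s", "sys": "0m3.010s"}

speedup = {
    column: to_seconds(standard[column]) / to_seconds(with_cdb[column])
    for column in standard
}

for column in ("real", "user", "sys"):
    print("%s: %.2fx" % (column, speedup[column]))
```

This gives about 2.06x for real time, 1.25x for user time and 1.93x for sys time.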

Retrieved from "http://www.gentoo-wiki.info/TIP_speed_up_portage_with_cdb"

Last modified: Tue, 30 Sep 2008 10:17:00 +0000