Gentoo Wiki

TIP_Find_binary_package_duplicates

This article is part of the Tips & Tricks series.


Overview

emerge can build binary packages for every package you install (see the "buildpkg" value of the FEATURES variable in /etc/make.conf, or emerge's -b and -B options). The built packages are normally stored in /usr/portage/packages. If you provide your compiled packages to other Gentoo machines (e.g., in your LAN or WAN) or are short on disk space, you may soon wonder how to clean up your packages directory, because package files are only ever added there, never deleted. Deleting packages by age is one option, but ideally you only want to keep the newest binary package for each package.
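For comparison, the age-based approach could look like the following minimal Python 3 sketch. The function name packages_older_than is hypothetical, and it assumes the real package files live in the All subdirectory of the packages directory (as both scripts on this page do). It only lists files and deletes nothing.

```python
import time
from pathlib import Path


def packages_older_than(pkgdir, days, now=None):
    """Names of the .tbz2 files in pkgdir/All not modified for more than `days` days.

    Hypothetical helper illustrating age-based cleanup; it only lists files.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds per day
    all_dir = Path(pkgdir) / "All"
    return sorted(p.name for p in all_dir.glob("*.tbz2") if p.stat().st_mtime < cutoff)
```

For example, packages_older_than("/usr/portage/packages", 90) lists everything untouched for three months. Such a list can easily include the only binary package left for a slot, which is why the scripts below compare versions and slots instead of file ages.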

Provided here is a fast Perl script that checks for binary packages that can be deleted. The script is slot-aware: it only treats packages as duplicates if they share the same slot. The packages need not be installed on (i.e., merged into) the system (see emerge's -B option); the script only needs to look at /usr/portage/packages and /usr/portage/metadata/cache to find duplicates.
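The slot-aware grouping can be sketched in Python 3 as follows. This is a minimal sketch, not the script itself; it assumes the old flat metadata cache format that the Perl script relies on (the SLOT value on the third line of /usr/portage/metadata/cache/category/package-version), and the names read_slot and group_by_slot are hypothetical. Every group with more than one entry is a candidate set of duplicates.

```python
import re
from collections import defaultdict
from pathlib import Path

# "category/name-version" -> "category/name"; greedy like the Perl script,
# so it splits at the last "-<digit>" (e.g. font-adobe-100dpi-1.0.0).
BASENAME = re.compile(r"^(.*/.+)-\d")


def read_slot(cachedir, cpv):
    """SLOT of cpv from the flat metadata cache (third line), or None."""
    try:
        with open(Path(cachedir) / cpv) as cache:
            lines = cache.read().splitlines()
    except OSError:
        return None
    return lines[2].strip() if len(lines) >= 3 else None


def group_by_slot(pkgdir, cachedir):
    """Map (category/name, slot) -> cpvs; also return cpvs without metadata."""
    groups = defaultdict(list)
    unknown = []
    for tbz2 in sorted(Path(pkgdir).glob("*/*.tbz2")):
        if tbz2.parent.name == "All":  # real files; category dirs hold symlinks
            continue
        cpv = f"{tbz2.parent.name}/{tbz2.stem}"
        match = BASENAME.match(cpv)
        slot = read_slot(cachedir, cpv)
        if match is None or slot is None:
            unknown.append(cpv)  # e.g. no longer in the portage tree
        else:
            groups[(match.group(1), slot)].append(cpv)
    return groups, unknown
```

Note that the entries of each group are in plain filename order here; putting them in version order is a separate step.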

Before deleting a binary package, the script asks for permission to do so.
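A confirmed deletion has to remove two files: the symlink in the category directory and the real file in All. The Python 3 sketch below mirrors what the Perl script's delete_pkg does; the function name and the ask parameter (which defaults to input and exists only so the prompt can be replaced, e.g. in tests) are hypothetical, and Path.unlink(missing_ok=True) needs Python 3.8 or later.

```python
from pathlib import Path


def delete_with_confirmation(pkgdir, cpv, ask=input):
    """Ask before removing pkgdir/<cpv>.tbz2 and pkgdir/All/<pf>.tbz2.

    As in the Perl script, only "y" or "Y" confirms (surrounding whitespace
    is ignored here). Returns True if the files were removed.
    """
    answer = ask(f"Delete package {cpv}? ")
    if answer.strip().lower() != "y":
        return False
    pf = cpv.split("/", 1)[1]  # package-version without the category
    for target in (Path(pkgdir) / f"{cpv}.tbz2", Path(pkgdir) / "All" / f"{pf}.tbz2"):
        target.unlink(missing_ok=True)  # like Perl's unlink, ignore missing files
    return True
```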

The current version of the script is version-aware: it properly determines the ordering of package versions and can delete all packages in the same slot except the newest one (or the newest n packages, set via $KEEP_NEWEST). Note that "newest package" means the newest binary package available in /usr/portage/packages, which is not necessarily the newest version available in portage.
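The version ordering can be sketched in Python 3 as a sort key. This is a minimal sketch that follows the Perl script's pkg_cmp (numeric components, then the letter, then the suffix order _alpha < _beta < _pre < _rc < none < _p, then the suffix number, then the -r revision, with a missing number counting as -1). It works on bare version strings, and the names version_key and to_delete are hypothetical. Portage's own comparison rules cover more cases than this.

```python
import re

# Suffix order used by the Perl script's %state table
STATE_ORDER = {"_alpha": 0, "_beta": 1, "_pre": 2, "_rc": 3, "": 4, "_p": 5}

VERSION = re.compile(
    r"^(?P<nums>\d+(?:\.\d+)*)(?P<letter>[a-z]?)"
    r"(?:(?P<state>_alpha|_beta|_pre|_rc|_p)(?P<statenum>\d*))?"
    r"(?:-r(?P<rev>\d+))?$"
)


def version_key(version):
    """Sort key for a version string such as "1.2b_rc3-r1"."""
    m = VERSION.match(version)
    if m is None:
        raise ValueError(f"unparsable version: {version!r}")
    nums = tuple(int(n) for n in m["nums"].split("."))  # (1, 10) > (1, 9)
    state = STATE_ORDER[m["state"] or ""]
    statenum = int(m["statenum"]) if m["statenum"] else -1
    rev = int(m["rev"]) if m["rev"] else -1
    return (nums, m["letter"], state, statenum, rev)


def to_delete(versions, keep_newest=1):
    """All but the newest keep_newest versions (all from the same slot)."""
    ordered = sorted(versions, key=version_key)
    return ordered[:-keep_newest] if keep_newest > 0 else ordered
```

For example, to_delete(["2.0", "1.10", "1.9"]) returns ["1.9", "1.10"], which a plain string sort would have gotten wrong.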

Just run this script once in a while to delete unneeded packages.

Note: much of this functionality is now also available in Gentoolkit via the eclean command (eclean packages).

The script

File: find_pkg_dups.pl
#!/usr/bin/perl
# find_pkg_dups.pl
#
# Version: 0.3 (19-Oct-2005)
#  Author: Thomas Schuerger (thomas@schuerger.com)
#
# Finds and (after confirmation) deletes all unneeded binary package duplicates in the
# packages directory.
#
# A package is considered an unneeded duplicate if it has the same slot as another
# package with the same package basename and if it is not the newest one.
#
# It uses a slot cache to speed up slot determination.
#
# Comments and suggestions are welcome!
#
# History:
#
# 0.3: Added usage of a slot cache file.
# 0.2: Added proper package version comparison.

use strict;

use vars qw(%state $PKGDIR $CACHEDIR $SLOTCACHEFILE $KEEP_NEWEST %pkgcount %pkglist %slot);

$PKGDIR = "/usr/portage/packages";
$CACHEDIR = "/usr/portage/metadata/cache";

# where to store the slot cache file
$SLOTCACHEFILE = "/tmp/find_pkg_dups.cache";

# number of newest versions per pkg to keep (1 = only newest, 2 = the two newest, etc.)
$KEEP_NEWEST = 1;

# order of package states
%state = ("_alpha" => 0,"_beta" => 1,"_pre" => 2,"_rc" => 3,"" => 4,"_p" => 5);

chdir "$PKGDIR" or die "$PKGDIR not found";

my($cache_modified) = 0;

read_slot_cache();

# get all available binary packages, including category and version
# (excludes "All" directory), sorted by package name and version
my(@pkgs) = sort pkg_cmp map {/^(.*)\.tbz2$/;$1} glob "[a-z]*/*.tbz2";

my($pkg);

foreach $pkg (@pkgs)
{
  # extract the basename (is there a better way?)
  $pkg =~ /^(.*\/.+)\-\d/;
  my($pkgbase) = $1;

  my($slot) = get_slot($pkg);

  if($slot == -1)
  {
    # package version is not in the portage tree any longer
    print "Unknown package $pkg (not in portage tree)\n";
    delete_pkg($pkg);
  }
  else
  {
    $pkgcount{$pkgbase}{$slot}++;
    push(@{$pkglist{$pkgbase}{$slot}},$pkg);
  }
}

# write current slot cache
write_slot_cache() if($cache_modified);

# output duplicates

foreach $pkg (sort keys %pkgcount)
{
  my($slot);
  foreach $slot (sort keys %{$pkgcount{$pkg}})
  {
    if($pkgcount{$pkg}{$slot} > 1)
    {
      print "Packages for $pkg (slot $slot): ".join(", ",@{$pkglist{$pkg}{$slot}})."\n";

      # delete all but the newest $KEEP_NEWEST packages
      delete_pkg(@{$pkglist{$pkg}{$slot}}[0..$#{$pkglist{$pkg}{$slot}}-$KEEP_NEWEST]);
    }
  }
}

# write current slot cache again
write_slot_cache() if($cache_modified);

# Returns the slot number for the given package (with category and version)
# Slot 0 means "unslotted package", -1 indicates an error.
# Uses a slot cache to avoid opening files.

sub get_slot
{
  my($pkg) = $_[0];
  my($slot) = $slot{$pkg};

  if(defined $slot)   # check slot cache
  {
    # check if metadata file still exists
    return $slot if(-f "$CACHEDIR/$pkg");

    # doesn't exist, remove entry from cache

    delete $slot{$pkg};
    $cache_modified = 1;
    return -1;
  }

  # open the metadata file
  open(FILE,"$CACHEDIR/$pkg") or return -1;

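  # the third line contains the SLOT value (usually, but not always, a number)

  <FILE>;
  <FILE>;
  $slot = <FILE>;
  close FILE;
  return -1 unless defined $slot;   # truncated metadata file
  chomp $slot;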

  $slot{$pkg} = $slot;
  $cache_modified = 1;

  print "Got slot for $pkg\n";

  return $slot;
}

# Deletes the package after asking for permission to do so

sub delete_pkg
{
  foreach my $pkg (@_)
  {
    print "Delete package $pkg? ";
    my $answer = <STDIN>;

    if(defined $answer && $answer =~ /^y$/i)
    {
      # delete softlink
      print "Deleting $PKGDIR/$pkg.tbz2\n";
      unlink("$PKGDIR/$pkg.tbz2");

      # delete package
      $pkg =~ /\/(.*)$/;
      print "Deleting $PKGDIR/All/$1.tbz2\n";
      unlink("$PKGDIR/All/$1.tbz2");

      # remove entry from slot cache
      delete $slot{$pkg};
      $cache_modified = 1;
    }
  }
}

# Compares two package names with versions (usable for sorting)
# Such names are of the form "category/pkg-ver{suf{#}}{-r#}"
# (see ebuild HOWTO)

sub pkg_cmp
{
  my($r);

  $a =~ /^(.*)-(\d+(?:\.\d+)*)([a-z])?(?:(_alpha|_beta|_pre|_rc|_p)(\d+)?)?(?:-r(\d+))?$/;
  my($apkg,$aver,$aversuf,$astate,$astatenum,$arevnum) = ($1,$2,$3,$4,$5,$6);
  $b =~ /^(.*)-(\d+(?:\.\d+)*)([a-z])?(?:(_alpha|_beta|_pre|_rc|_p)(\d+)?)?(?:-r(\d+))?$/;
  my($bpkg,$bver,$bversuf,$bstate,$bstatenum,$brevnum) = ($1,$2,$3,$4,$5,$6);

  # compare package name

  $r = $apkg cmp $bpkg;
  return($r) if($r != 0);

  # compare version list

  my(@aver) = split(/\./,$aver);
  my(@bver) = split(/\./,$bver);

  my($i);
  my($c) = ($#aver <= $#bver ? $#aver : $#bver);

  for($i=0;$i<=$c;$i++)
  {
    $r = $aver[$i] <=> $bver[$i];
    return($r) if($r != 0);
  }

  $r = $#aver <=> $#bver;
  return($r) if($r != 0);

  # compare version letter (may be undefined)

  $r = $aversuf cmp $bversuf;
  return($r) if($r != 0);

  # compare states (_alpha, _beta, etc.), may be undefined

  $r = $state{$astate} <=> $state{$bstate};
  return($r) if($r != 0);

  # compare state number (may be undefined)

  $r = (!defined $astatenum ? -1 : $astatenum) <=> (!defined $bstatenum ? -1 : $bstatenum);
  return($r) if($r != 0);

  # compare revision (may be undefined)

  $r = (!defined $arevnum ? -1 : $arevnum) <=> (!defined $brevnum ? -1 : $brevnum);
  return $r;
}

sub read_slot_cache
{
  open(FILE,"<$SLOTCACHEFILE") or return;   # no cache file yet (first run)

  while(<FILE>)
  {
    if(/^(.*?) (.*)$/)
    {
      $slot{$1} = $2;
    }
  }

  close FILE;
}

sub write_slot_cache
{
  my($i);

  open(FILE,">$SLOTCACHEFILE") or do { warn("Couldn't write slot cache file $SLOTCACHEFILE: $!\n"); return; };

  foreach $i (sort keys %slot)
  {
    print FILE "$i $slot{$i}\n";
  }

  close FILE;

  $cache_modified = 0;
}

Running the script

Run the script with "perl find_pkg_dups.pl", or make it executable (chmod +x find_pkg_dups.pl; the file must then start with a #!/usr/bin/perl line) and run "./find_pkg_dups.pl".

If you want all duplicate packages to be removed without being asked, run "yes | perl find_pkg_dups.pl". Note that this also confirms the deletion of packages that are no longer in the portage tree.

Use at your own risk!


A second script

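This script is very similar to the one above (I read the above's source code while writing this one), with a few differences: it is written in Python, it does not use a slot cache, and it expects the packages directory to contain one subdirectory per architecture, each with its own All and category directories. (This might be the case if you have a central package server for a network.) Thus, the assumed structure is: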

 /usr/portage/packages/
   athlonXP/
     All
     app-admin
     ...
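Walking this layout adds one directory level compared to the Perl script. A minimal Python 3 sketch, assuming exactly the structure shown above (the function name iter_arch_packages is hypothetical); it skips anything that is not a directory, so stray files in the packages directory do not break the walk.

```python
from pathlib import Path


def iter_arch_packages(pkgdir):
    """Yield (arch, category, filename) for each .tbz2 in pkgdir/<arch>/<category>/.

    Each arch directory's All/ subdirectory holds the real files and is skipped.
    """
    root = Path(pkgdir)
    for arch_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for cat_dir in sorted(p for p in arch_dir.iterdir() if p.is_dir()):
            if cat_dir.name == "All":
                continue
            for tbz2 in sorted(cat_dir.glob("*.tbz2")):
                yield arch_dir.name, cat_dir.name, tbz2.name
```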
File: cleanpkg.py
#!/usr/bin/env python
# -*- coding: iso-8859-1 -*-

"""
  Cleans old packages from /usr/portage/packages
  
  :Author:
    Henning Hasemann (henning at hasemail dot de)

  Known Bugs
  ----------
  * This script doesn't recognize dates encoded in version numbers,
    i.e. they will be handled like ordinary version numbers, for example:
    foo-20010523 > foo-1.0 (since the number 20010523 is greater than 1)
    Comparing date-versioned packages with other date-versioned packages is
    no problem though.
  * I didn't test this script very much up to now, so use at your own risk.

  History
  -------

  ======= ========== =======================================
  Version Date       Changes / Author
  ======= ========== =======================================
      0.1 2006-02-28 Created (henning at hasemail dot de)
  ======= ========== =======================================
"""

__version__ = "0.1"
__docformat__ = "restructuredtext"

import os, os.path as path, re

# This will tell you a lot of uninteresting information
# when turned on
verbose = False

pkgdir = "/usr/portage/packages"
cachedir = "/usr/portage/metadata/cache"

splitfilename = re.compile(
  r"^(?P<name>.+)-"
  r"(?P<version>[0-9]+[0-9.]*)(?P<appendix>[a-z]*)"
  r"(?P<state>(?:_alpha|_beta|_pre|_rc|_p)?)(?P<stateinfo>[0-9]*)"
  r"((-r(?P<release>[0-9]+))?)\.tbz2")

packages = {}  

statemap = {
  "_alpha": 1,
  "_beta": 2,
  "_pre": 3,
  "_rc": 4,

  "": 10,
  "_p": 11,  # patch level: newer than the plain release (foo-1.0_p1 > foo-1.0)
}


def get_slot(filename):
  # The slot is in line 3 in $cachedir/$category/$filename
  try:
    cachefile = open(filename)
  except IOError:
    return None
  
  cachefile.readline()
  cachefile.readline()
  slot = cachefile.readline()
  cachefile.close()
  return slot.strip()

def delete(arch, cat, fn):
  yes = ("y", "yes", "j", "ja", "Y", "Yes", "J", "Ja")
  no = ("n", "no", "nein", "N", "No", "Nein")

  print "Do you want to delete %s/%s/%s?" % (arch, cat, fn),
  ans = ""
  while not (ans in yes or ans in no):
    ans = raw_input()
  
  if ans in yes:
    print "[*] Deleting %s/%s/%s" % (arch, cat, fn)
    # Remove symlink
    os.remove(path.join(pkgdir, arch, cat, fn))
    # Remove real file
    os.remove(path.join(pkgdir, arch, "All", fn))

if __name__ == "__main__":
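  for arch in os.listdir(pkgdir):
    archdir = path.join(pkgdir, arch)
    if not path.isdir(archdir):
      continue  # skip stray files in the packages directory
    for category in os.listdir(archdir):
      if category != "All" and path.isdir(path.join(archdir, category)):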
        
        categorydir = path.join(archdir, category)
        for filename in os.listdir(categorydir):
          m = re.search(splitfilename, filename)
          if m is None:
            print "Attention! Couldn't parse package name '%s'!" % filename
          else:
            slot = get_slot(path.join(cachedir, category, path.splitext(filename)[0]))
            g = m.groupdict()
            packet = g["name"]
            
            # Make version a tuple for fast comparison later
            version = (
              [int(part) for part in g["version"].split(".")],
              g["appendix"],
              statemap.get(g["state"], 0),
              int(g["stateinfo"] or 0),  # numeric, so that _rc10 > _rc9
              int(g["release"] or 0)
            )
            pkg = (arch, category, packet)

            if verbose:
              print filename
              print arch, category, packet, version, "==>", slot

            if slot is None:
              print "Package %s/%s/%s not found in the portage tree." % (
                  arch, category, filename)
              delete(arch, category, filename)
            else:
              if not packages.has_key(pkg):
                packages[pkg] = {}
              if not packages[pkg].has_key(slot):
                packages[pkg][slot] = []

              # version is for quick comparison/sorting of versions later,
              # filename for easy finding of the file
              packages[pkg][slot].append( (version, filename) )

  # Now look for old packages, and remove them
  #
  # packages now looks like this:
  # {
  #   ("desktop-AthlonXP", "dev-lang", "cpp"): {
  #     "0": [ ([1, 2], "a", 10, 0, 6), ... ],
  #   }
  #   ...
  # }
  for (arch, cat, packet), slotinfo in packages.iteritems():
    for slot, versions in slotinfo.items():
      if len(versions) > 1:
        vsorted = sorted(versions)
        for v, filename in vsorted[:-1]:
          print ("\nBinary package %s/%s/%s is obsolete.\n"
            "(%s seems to be newer).") % (arch, cat, filename, vsorted[-1][1])
          delete(arch, cat, filename)
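To see what cleanpkg.py extracts from a filename, its splitfilename pattern can be tried on its own under Python 3. The pattern below is copied from the script; the helper names parse and numeric_version and the sample filenames are made up for illustration. The last check reproduces the known bug from the docstring: a date-style version becomes a single large number and therefore compares greater than an ordinary version.

```python
import re

# Same pattern as splitfilename in cleanpkg.py
splitfilename = re.compile(
    r"^(?P<name>.+)-"
    r"(?P<version>[0-9]+[0-9.]*)(?P<appendix>[a-z]*)"
    r"(?P<state>(?:_alpha|_beta|_pre|_rc|_p)?)(?P<stateinfo>[0-9]*)"
    r"((-r(?P<release>[0-9]+))?)\.tbz2")


def parse(filename):
    """Named groups for a binary package filename, or None if it doesn't match."""
    m = splitfilename.search(filename)
    return None if m is None else m.groupdict()


def numeric_version(filename):
    """The dotted version part as a list of ints, as cleanpkg.py compares it."""
    return [int(part) for part in parse(filename)["version"].split(".")]
```

For instance, parse("gcc-3.4.4-r1.tbz2") yields name "gcc", version "3.4.4" and release "1", while numeric_version("foo-20010523.tbz2") compares greater than numeric_version("foo-1.0.tbz2").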

Retrieved from "http://www.gentoo-wiki.info/TIP_Find_binary_package_duplicates"

Last modified: Wed, 26 Dec 2007 15:15:00 +0000