Gentoo Wiki

HOWTO_prefetch_files_on_boot

This article is part of the HOWTO series.





Summary

Getting your system to prefetch files on boot is pretty simple. Prefetching means that your computer tries to load every file it will need during boot as early as possible, ahead of when the files are actually needed. The files are read into RAM, into the kernel's buffers (the page cache). This speeds up booting because the system doesn't have to wait as long for files: getting them from RAM is much faster than reading them from disk.
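Here is a minimal Python sketch that makes the idea concrete. It is not how readahead-list itself is implemented. It assumes Linux and Python 3.3 or later, where os.posix_fadvise() with POSIX_FADV_WILLNEED asks the kernel to start reading a file into the page cache. The function name prefetch is made up for this example.

```python
import os


def prefetch(paths):
    """Hint the kernel to read each file into the page cache.

    Returns the number of files for which the hint was issued.
    Missing or unreadable files are skipped silently.
    """
    hinted = 0
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue  # file does not exist or is not readable
        try:
            # offset 0 and length 0 mean "the whole file"
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
            hinted += 1
        except OSError:
            pass  # e.g. the file type does not support the hint
        finally:
            os.close(fd)
    return hinted
```

You could feed it a list such as prefetch(open("/etc/readahead-list/runlevel-boot").read().split()). It only issues hints and generally does not wait for the reads to finish, so the rest of the boot can carry on in parallel.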

Emerge It

First, do a pretend emerge to see what else may be installed.

# emerge --pretend readahead-list

Okay, now you know which other packages will be pulled in. If that looks fine, emerge it:

# emerge readahead-list

Get it working

It needs to be added to the boot runlevel. You're getting used to this, right? ;)

# rc-update add readahead-list-early boot
# rc-update add readahead-list boot

You can be done now

Okay, now your system will likely boot faster. If you need more than that, read on.

Or, ubertweak it

The lists of files shipped with the ebuild may not exactly match everyone's system. That is okay: if other files are needed, they will still be read when needed. But what about files that are read ahead but never needed? They aren't helping; they are hurting performance. So, don't be weak. You know that you want all the performance possible, so you want to customize the file lists to load exactly the needed files, right? No more, no less.
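Before trimming anything, it helps to know which listed files were actually used. The following Python sketch compares a readahead list with a list of files opened during boot, for example a trace produced with one of the methods later in this article. It assumes both files are plain text with one path per line. The function names are hypothetical.

```python
def read_paths(path):
    """Read a list file: one path per line, blank lines skipped (assumed format)."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def split_readahead_list(listed, opened):
    """Compare a readahead list with the files actually opened at boot.

    Returns three sorted lists:
      used    -- listed files that were opened
      unused  -- listed files that were never opened (candidates to drop)
      missing -- opened files that are not listed (candidates to add)
    """
    listed_set = set(listed)
    opened_set = set(opened)
    used = sorted(listed_set & opened_set)
    unused = sorted(listed_set - opened_set)
    missing = sorted(opened_set - listed_set)
    return used, unused, missing
```

The "unused" entries are the ones that cost read time without any benefit.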

Check out this simple script which is offered to get you started:

File: /sbin/utweak-readahead-list.sh
#!/bin/bash
# Copyright 2005, Mick Reed <ykill@110mail.com>
# /sbin/utweak-readahead-list.sh
# ubertweaks the readahead list of initscripts to those that are actually used.
# Okay, it isn't really an ubertweak, that would be more detailed.
# However, it does improve the performance on a few files.
# and it gives you a good starting point, so you can take it to the
# next level yourself.  Hey, and put your results out on the inter-net,
# and share them with us!
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA


for RUNLEVELS_TO_DO in  "/etc/readahead-list/runlevel-default default" \
                        "/etc/readahead-list/runlevel-boot boot"
do

  # Split RUNLEVELS_TO_DO into positional parameters:
  # $1 = readahead list file, $2 = runlevel name
  set -- $RUNLEVELS_TO_DO

  # Clean out old buggers (a leftover temporary file)
  rm -f "$1.tmp"

  # Keep every line except the old /etc/init.d entries
  grep -v /etc/init.d "$1" > "$1.tmp"

  # Put it back where it goes
  mv -f "$1.tmp" "$1"

  # Append one /etc/init.d/<script> line per initscript in this runlevel
  ls -1 "/etc/runlevels/$2/" | sed 's|^|/etc/init.d/|' >> "$1"

done

Explanation

This script looks in the boot and default runlevel directories. It gets the actual list of initscripts needed for each runlevel on your system as it is right now. Then, it updates the lists used by the readahead-list package to reflect what is actually needed. Note that this doesn't touch the exec_sbin_rc list, which has more files that are needed to boot the system. This is your mission, should you be geek enough to accept it.
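For readers who prefer Python, the same idea can be sketched as follows. It mirrors the shell script: drop every line mentioning /etc/init.d, then append one /etc/init.d/<name> entry per file in /etc/runlevels/<runlevel>/. The function name retweak and its runlevels_dir parameter are hypothetical. Writing to a temporary file and renaming it corresponds to the script's mv -f step.

```python
import os


def retweak(list_file, runlevel, runlevels_dir="/etc/runlevels"):
    """Replace the /etc/init.d entries of a readahead list with the
    initscripts currently in the given runlevel; return the new lines."""
    with open(list_file) as fh:
        # like grep -v /etc/init.d: keep lines that do not mention it
        kept = [line.rstrip("\n") for line in fh if "/etc/init.d" not in line]
    # like ls -1 /etc/runlevels/<runlevel>/
    scripts = sorted(os.listdir(os.path.join(runlevels_dir, runlevel)))
    lines = kept + ["/etc/init.d/" + name for name in scripts]
    tmp = list_file + ".tmp"
    with open(tmp, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    os.replace(tmp, list_file)  # atomic rename, like mv -f
    return lines
```

For example, retweak("/etc/readahead-list/runlevel-boot", "boot") updates the boot list in place.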

Thoughts

I bet you will actually see better boot times by using the readahead-list package, especially if you have some personal files that will benefit from readahead. So, to take this to the next level: write something that autogenerates the readahead lists for the rest of the system. Run this script after every rc-update or, for the smart and lazy, cron it daily or so.

--Petlab 07:51, 19 November 2005 (GMT)

Autogenerating the list of files

Warning: This was not thoroughly tested; use at your own risk
Warning: This may, under a strange set of circumstances, b0rk your install. I don't see how it would happen, but I disclaim all liability

On the Gentoo forums, someone (I can't remember who) suggested that LD_PRELOAD could be used to intercept every file open and log it. I implemented a sample tool to do just that; the skeleton is from [1].

So, create the folder /usr/local/src/opentrace:

mkdir /usr/local/src/opentrace
cd /usr/local/src/opentrace

and create opentrace.c:

File: /usr/local/src/opentrace/opentrace.c
/* relevant includes; _GNU_SOURCE is needed on Linux for RTLD_NEXT
 * but doesn't break the build on FreeBSD
 */

#define _GNU_SOURCE
#include <dlfcn.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* mark init() to be executed on library load
 * and onExit() to be executed on library unload
 */
static void init (void) __attribute__ ((constructor));
static void onExit (void) __attribute__ ((destructor));

/* pointer to the real open() call */
static int (*old_open)(const char *, int, ...) = NULL;

/* descriptor of the trace log (-1 if it could not be opened) */
static int oFD = -1;

/* any initialization code here (reading config files, whatever)
 */
static void init (void)
{
        const char *oFile = "/tmp/open_trace.log";

        /* find the 'real' open(); we need it to do the actual work */
        old_open = (int (*)(const char *, int, ...)) dlsym(RTLD_NEXT, "open");
        if (old_open == NULL)
        {
                fprintf(stderr, "dlsym: %s\n", dlerror());
                exit(1);
        }

        oFD = old_open (oFile, O_CREAT|O_WRONLY|O_APPEND, 0622);
}

static void onExit (void)
{
        if (oFD >= 0)
                close (oFD);
}

/* grab the open() library call
 * hijacking a function is as easy as creating one with the
 * same name, since preloaded libraries are searched before libc
 */
int open (const char *file, int oflag, ...)
{
        mode_t mode = 0;

        /* the third (mode) argument is only passed when O_CREAT is set */
        if (oflag & O_CREAT)
        {
                va_list ap;
                va_start (ap, oflag);
                mode = (mode_t) va_arg (ap, int);
                va_end (ap);
        }

        /* log the file name, one per line */
        if (oFD >= 0)
        {
                write (oFD, file, strlen(file));
                write (oFD, "\n", 1);
        }

        /* call the real open function */
        return old_open (file, oflag, mode);
}

Then compile and install it with:

gcc -fpic -nodefaultlibs -shared opentrace.c -ldl -o opentrace.so
su -c 'cp -a opentrace.so /opentrace.so'

Now, you need to tell Gentoo to preload this. Earlier I suggested an alternative init script, but that didn't seem to work.

So, the new method is to use /etc/ld.so.preload.

Basically, at a shell prompt, type

echo "/opentrace.so" >>/etc/ld.so.preload

Then, you need to forcibly reboot, following the instructions below. This doesn't mean hitting "reset" or pulling the power; instead, you need to get the system halted without opening any more files.

Forcible reboot

Now, there are multiple ways to do this; the best way is using "the magic SysRq key" (kernel option CONFIG_MAGIC_SYSRQ, under "Kernel hacking").

SysRq (preferred)

Type

<Left Alt>+<SysRq>+S E I U B

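(Meaning: while holding down the LEFT Alt key and the SysRq key (usually shared with "Print Screen"), type s, e, i, u, b one after another. S syncs the disks, E sends SIGTERM to all processes except init, I sends SIGKILL to all processes except init, U remounts all filesystems read-only, and B reboots.) Your computer will reboot in a jiffy. Continue at the section "Continue here" below.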

Kill init (less good, but still safe; may add a few files to the log)

Type:

sync; sync; sync
mount -o remount,ro /
<likewise for any other mounted filesystems>
kill -9 1

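Killing init causes a kernel panic; at this point, hit reset. (Note that current Linux kernels make init ignore SIGKILL sent from within its own PID namespace, so kill -9 1 may simply do nothing. If so, use the SysRq method instead.)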

Continue here

Now, take a deep breath and reboot.

Let it boot to the point at which you want preloading to stop (e.g. a full system boot and login). Then, you need to make it stop logging.

So, start up a SMALL text editor (like vi or ed; no matter how evangelical you are about Emacs, this is one place where it will really throw things off, because every file the editor opens gets logged) and remove the line you added earlier from /etc/ld.so.preload. Now, reboot normally.

Now, for the third boot... (thank heavens this will be faster soon!) select your standard boot option, but hit 'e' in the GRUB menu to edit it. On the "kernel" line, add "init=/bin/bash" at the end, then boot the edited entry.

Now, this boot will be FAST... but don't get excited; it's loading the kernel, bash, and NOTHING else. So, you need / to be mounted read-write:

mount -o remount,rw /

The list of ALL files opened at boot is in /tmp/open_trace.log.

Now, you need to filter out all of the nonexistent files (and the duplicates). This script will do the trick:

File: listExisting.sh
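#!/bin/sh
# Print each line of standard input that names an existing file,
# then sort the result and drop duplicates.

while IFS= read -r x; do
        [ -e "$x" ] && echo "$x"
done | sort -u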

Make it executable (chmod +x listExisting.sh) and run it as

./listExisting.sh </tmp/open_trace.log >/tmp/open_trace.log.clean

Now, the file /tmp/open_trace.log.clean can be used as your prefetch file.
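The filtering step can also be sketched in Python, which makes it easy to extend. This hypothetical clean_trace() function does what listExisting.sh does: it keeps paths that exist, sorted and de-duplicated. The optional skip_prefixes parameter is an addition of this example. You can use it to drop pseudo-filesystem entries such as /proc or /dev if you don't want them in the list.

```python
import os


def clean_trace(lines, skip_prefixes=(), exists=os.path.exists):
    """Return the sorted, de-duplicated paths from an open() trace that
    still exist, like listExisting.sh.

    skip_prefixes -- optional path prefixes to drop (an addition of this sketch)
    exists        -- existence check, replaceable for testing
    """
    paths = {line.strip() for line in lines}
    return sorted(
        p for p in paths
        if p and not p.startswith(tuple(skip_prefixes)) and exists(p)
    )
```

Typical use would be to open /tmp/open_trace.log and pass the file object, for example clean_trace(fh, skip_prefixes=("/proc/", "/sys/", "/dev/")).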

Unfortunately, it is everything loaded at boot, which may or may not be a good thing. Fixing that is an exercise left to the reader :-)

Alternative auto generating using a small daemon

The above solution for autogenerating a suitable list of files didn't work well for me: it returned mostly files located in /proc or /dev. So I wrote a little daemon in Python which logs all opened files using the inotify interface provided by the kernel. You can configure which directories are logged, and the daemon can be started by an init script.

Preparations

In order to run the daemon, you need a kernel with inotify support, Python, and the pyinotify module (dev-python/pyinotify).

Readahead-watcher

Save this source code to a file named readahead-watcher.py:

File: readahead-watcher.py
#!/usr/bin/env python3

"""
Date: 2007-10-14
Author: Stephan Birkl (sbp a-t extio d0t de)

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; version 2 of the License.
 
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.
"""

import os
import signal
import stat
import sys

import pyinotify

WATCH_BASEPATHS = ["/bin",
                   "/lib",
                   "/sbin",
                   "/usr/kde",
                   "/usr/qt",
                   "/usr/local/bin",
                   "/usr/bin",
                   "/usr/lib",
                   "/usr/share",
                   "/usr/X11R6",
                   "/usr/sbin"]

WATCHED_EVENTS  = pyinotify.IN_OPEN | pyinotify.IN_ACCESS
WORKDIR         = "/tmp"
VERIFY_FILENAME = "readahead-watcher-verify-file"
OUTFILE         = "readahead-watcher-output"
LOGFILE         = "readahead-watcher-log"
CHECKREADY      = False

class ProcessEventHandler(pyinotify.ProcessEvent):
  # my_init() is called by pyinotify.ProcessEvent.__init__()
  def my_init(self):
    self.ready = False
    self.watchedFiles = {}   # path -> number of open/access events
    self.lastFile = ""

  def process_IN_ACCESS(self, event):
    self.__process_event(event)

  def process_IN_OPEN(self, event):
    self.__process_event(event)

  def __process_event(self, event):
    if not CHECKREADY or self.ready:
      filename = os.path.join(str(event.path), str(event.name))
      # consecutive events for the same file are counted once
      if filename != self.lastFile:
        self.lastFile = filename
        try:
          fileMode = os.stat(filename).st_mode
        except OSError:
          return  # file vanished or is unreadable; ignore it
        if stat.S_ISREG(fileMode):
          self.watchedFiles[filename] = self.watchedFiles.get(filename, 0) + 1

    elif str(event.name) == VERIFY_FILENAME:
      print("File watching is now ready.")
      self.ready = True

class Main(object):
  def __init__(self):
    print("Scanning directory structure. This can take up to a couple of minutes...")
    sys.stdout.flush()

    self.WatchMngr = pyinotify.WatchManager()
    if CHECKREADY:
      self.WatchMngr.add_watch(WORKDIR, WATCHED_EVENTS, rec=True)
    for basepath in WATCH_BASEPATHS:
      self.WatchMngr.add_watch(basepath, WATCHED_EVENTS, rec=True)

    self.EvtHandler = ProcessEventHandler()
    self.Notifier = pyinotify.Notifier(self.WatchMngr, self.EvtHandler)

  def Run(self):
    if CHECKREADY:
      self.__TouchVerifyFile()

    print("File watching is now active.")

    self.__SetDaemonMode()

    # make sure SIGINT (kill -2) raises KeyboardInterrupt, even if we
    # were started with SIGINT ignored
    signal.signal(signal.SIGINT, signal.default_int_handler)

    # process events until SIGINT arrives
    while True:
      try:
        self.Notifier.process_events()
        if self.Notifier.check_events():
          self.Notifier.read_events()
      except KeyboardInterrupt:
        self.Notifier.stop()
        break

    self.OutputResult()

  def OutputResult(self):
    # Sort output list by the count of accesses (least-accessed first)
    counts = self.EvtHandler.watchedFiles
    outList = sorted(counts, key=counts.get)

    with open(os.path.join(WORKDIR, OUTFILE), "w") as f:
      f.write("\n".join(outList) + "\n")

  def __TouchVerifyFile(self):
    with open(os.path.join(WORKDIR, VERIFY_FILENAME), "w") as f:
      f.write("ok")

  def __SetDaemonMode(self):
    sys.stdout = sys.stderr = open(os.path.join(WORKDIR, LOGFILE), "w")

    # classic double fork: detach from the init script and the terminal
    pid = os.fork()
    if pid > 0:
      sys.exit(0)

    os.chdir("/")
    os.setsid()
    os.umask(0)

    pid = os.fork()
    if pid > 0:
      sys.exit(0)

if __name__ == "__main__":
  MainObj = Main()
  MainObj.Run()

In order to be able to run it, you have to set the correct access rights:

chmod a+x readahead-watcher.py

As you can see from the WATCH_BASEPATHS list at the top of the file, only certain paths are watched for file accesses (this works well for me, but probably not for you), so feel free to change it to fit your system. The specified paths are watched recursively: if you specify "/usr/share", "/usr/share/X11/" is watched as well.
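inotify watches individual directories. A recursive watch therefore needs one watch per directory below each base path, and the kernel limits how many watches a user may hold (see /proc/sys/fs/inotify/max_user_watches). The following sketch, with hypothetical function names, roughly estimates how many watches your WATCH_BASEPATHS will need, so you can compare that number with the limit. It assumes os.walk() sees the same directories pyinotify does, which may not hold exactly for symlinked directories.

```python
import os


def count_watches(basepaths):
    """Estimate the inotify watches a recursive watch needs:
    one per directory, including each base path itself."""
    total = 0
    for base in basepaths:
        if not os.path.isdir(base):
            continue  # missing paths (e.g. /usr/kde) need no watch
        for _dirpath, _dirnames, _filenames in os.walk(base):
            total += 1
    return total


def max_user_watches(path="/proc/sys/fs/inotify/max_user_watches"):
    """Read the per-user inotify watch limit, or None if unavailable."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None
```

If count_watches(WATCH_BASEPATHS) comes close to max_user_watches(), either narrow the list or raise the limit with the fs.inotify.max_user_watches sysctl.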

In order to get the daemon to do its job, it has to be started at boot time. There are several ways to achieve this; we use an init script:

File: /etc/init.d/readahead-watcher
#!/sbin/runscript

depend() {
        need localmount
}

start() {
        ebegin "Starting readahead-watcher"
        <path-to-readahead-watcher>
        eend $?
}

Replace the placeholder "path-to-readahead-watcher" with the full path to the daemon script - e.g. /home/sb/readahead-watcher.py

Don't forget to make it executable (chmod +x /etc/init.d/readahead-watcher) after creating it!

Add this script to the boot runlevel:

rc-update add readahead-watcher boot

Result

Now reboot your machine. On the next boot the daemon will start and log all opened files.

Once your system has started up, stop the logging. To make the daemon save the list of opened files, send it the SIGINT signal:

  ps ax | grep readahead-watcher
  kill -2 <PID>
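If you would rather not pick the PID out of the ps output by hand, a Python sketch like the following can do it. It finds matching processes by scanning /proc/<pid>/cmdline and sends them SIGINT, which is what kill -2 does. The function names are hypothetical. The pattern matches any process whose command line contains it (an editor with readahead-watcher.py open, for instance), so check the PIDs before sending the signal.

```python
import os
import signal


def find_pids(pattern, proc_root="/proc"):
    """Return PIDs whose command line contains pattern (excluding ourselves)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit() or int(entry) == os.getpid():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                # arguments are separated by NUL bytes
                cmdline = fh.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # process exited while we were looking
        if pattern in cmdline:
            pids.append(int(entry))
    return sorted(pids)


def stop_watcher(pattern="readahead-watcher"):
    """Send SIGINT (same as kill -2) to every matching process."""
    pids = find_pids(pattern)
    for pid in pids:
        os.kill(pid, signal.SIGINT)
    return pids
```

Unlike ps ax | grep readahead-watcher, this does not also match the search command itself, because the sketch skips its own PID.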

The list of all opened files is now in /tmp/readahead-watcher-output. It is sorted by number of accesses, least-accessed first!

Finally copy the file /tmp/readahead-watcher-output to /etc/readahead-list/runlevel-default.

If you are happy with the list, don't forget to deactivate readahead-watcher (rc-update del readahead-watcher boot), since the list of files opened at startup won't change very often.

Troubleshooting

If all files in your watch paths show up in the result, whether or not they were actually accessed, try setting CHECKREADY to True in the daemon.

If you have any questions or feedback, feel free to contact me: sbp a-t extio d0t de

Retrieved from "http://www.gentoo-wiki.info/HOWTO_prefetch_files_on_boot"

Last modified: Mon, 25 Feb 2008 14:50:00 +0000