Gentoo Wiki

HOWTO_Troubleshoot_Diskless_Install

If you followed all the instructions in HOWTO Gentoo Diskless Install carefully and the diskless boot still doesn't work, this page is meant to help. It is still far from complete, and since I am in the same situation as you, this is a case of the blind leading the blind.

THIS PAGE NEEDS TO BE WRITTEN/EXTENDED. syslog and tcpdump alone are not enough to diagnose many real-world problems.


Check components individually

Before attempting a network boot, which is a very complex process, make sure that each part it relies on works on its own.

DHCP: Boot the client into a locally installed OS and run a DHCP client there, to test both the DHCP server and the client's network connection. Make sure the lease comes from your Gentoo DHCP server, not from your router.

NFS: Mount the root filesystem on a normal client and try to read files from it.
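The two checks above could look roughly like the following sketch. It assumes the ISC dhclient and the NFS client utilities (showmount, mount) are installed on the test machine; the interface name, server address, export path and mount point are placeholders that do not come from this page.

```shell
# Sketch only: eth0, 192.168.1.20 and /diskless/client are placeholder values.

# Request a lease in the foreground with verbose output, so you can see
# which server answers. Stop it with Ctrl-C.
check_dhcp() {
    iface=${1:-eth0}
    dhclient -d -v "$iface"
}

# List the server's exports, mount the root filesystem read-only,
# try to read from it, then unmount again.
check_nfs() {
    server=$1
    export_path=$2
    mnt=${3:-/mnt/nfstest}
    showmount -e "$server" || return 1
    mkdir -p "$mnt" || return 1
    mount -t nfs -o ro "$server:$export_path" "$mnt" || return 1
    ls -l "$mnt/etc"
    status=$?
    umount "$mnt"
    return $status
}
```

Run as root, for example `check_dhcp eth0` and `check_nfs 192.168.1.20 /diskless/client`.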

System log

The system log /var/log/messages on the server should give some information about the stages the boot process went through: for example, DHCP clients (identified by their ethernet MAC address) asking for IP addresses, which addresses were assigned, which files were requested through TFTP, and so on. This at least helps to narrow down the problem by showing which steps finished successfully.

You may have to enable this information in the daemons.
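To cut the log down to the relevant daemons, a small filter like this sketch may help. The function name and the list of daemon names in the pattern are assumptions; adjust them to what actually runs on your server.

```shell
# Print only the log lines written by daemons involved in a network boot.
# The daemon names in the pattern are assumptions, not a complete list.
boot_log() {
    logfile=${1:-/var/log/messages}
    grep -E 'dnsmasq|dhcpd|in\.tftpd|mountd' "$logfile"
}
```

Call `boot_log` right after a failed boot attempt and read the output from the bottom up to see the last stage that was reached.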

Iptables

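If you use iptables with strict rules, add something like the following so that TFTP works. Note that unblocking port 69 alone is not sufficient: only the initial request goes to port 69, and the server then continues the transfer from a different, dynamically chosen port.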

iptables -A INPUT -s 10.1.1.0/255.255.255.0 -p udp -j ACCEPT

Stuck at grub> prompt?

This might be because pxegrub is trying to load /boot/grub/menu.lst for some reason. You can hack around this by creating a symbolic link:

ln -s client/boot/grub/grub.conf boot

Adapt this to your own "tftp tree".
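To see whether the path pxegrub asks for actually resolves inside your tftp tree, something like this sketch can be used on the server. The function name and the default /diskless root are placeholders.

```shell
# Report whether boot/grub/menu.lst under the TFTP root resolves to a
# readable regular file (symbolic links are followed).
# /diskless is a placeholder for your own TFTP root.
check_grub_menu() {
    tftproot=${1:-/diskless}
    menu="$tftproot/boot/grub/menu.lst"
    if [ -f "$menu" ] && [ -r "$menu" ]; then
        echo "ok: $menu"
    else
        echo "missing: $menu"
        return 1
    fi
}
```

If the TFTP daemon runs chrooted into that directory, prefer relative link targets, since an absolute target would be resolved inside the chroot and may point somewhere else than it does in this check.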

dnsmasq

By default, dnsmasq's DHCP server already writes some informative messages to the system log. For example:

sigma dnsmasq[10257]: DHCPDISCOVER(eth0) 00:18:39:5c:2d:ea
sigma dnsmasq[10257]: DHCPOFFER(eth0) 192.168.1.233 00:18:39:5c:2d:ea
sigma dnsmasq[10257]: DHCPDISCOVER(eth0) 00:18:39:5c:2d:ea
sigma dnsmasq[10257]: DHCPOFFER(eth0) 192.168.1.233 00:18:39:5c:2d:ea
sigma dnsmasq[10257]: DHCPREQUEST(eth0) 192.168.1.233 00:18:39:5c:2d:ea
sigma dnsmasq[10257]: DHCPACK(eth0) 192.168.1.233 00:18:39:5c:2d:ea eta

The "log-queries" option in /etc/dnsmasq.conf applies only to the DNS server, so it probably won't help here and will only fill up the log.
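The DHCPACK lines also tell you which address a given client ended up with. A minimal sketch for pulling that out of the log, assuming the line layout shown above (DHCPACK(interface), then IP address, then MAC address); the function name is made up for this example.

```shell
# Print the most recent address that dnsmasq acknowledged for a given MAC.
# Assumes log lines of the form "... DHCPACK(iface) IP MAC [hostname]".
last_lease() {
    mac=$1
    logfile=${2:-/var/log/messages}
    awk -v mac="$mac" '
        {
            # The DHCPACK field can sit at any position, depending on
            # the syslog prefix, so scan all fields.
            for (i = 1; i < NF; i++)
                if ($i ~ /^DHCPACK\(/ && $(i + 2) == mac)
                    ip = $(i + 1)
        }
        END { if (ip != "") print ip }
    ' "$logfile"
}
```

For example, `last_lease 00:18:39:5c:2d:ea` prints the last address handed to that network card, or nothing if no DHCPACK was logged for it.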

DHCPd

to be written

tftp-hpa

(same for netkit-tftp)

In /etc/conf.d/in-tftpd, add -vvvvvv to the INTFTPD_OPTS parameter, e.g.

INTFTPD_OPTS="-u ${INTFTPD_USER} -s ${INTFTPD_PATH} -vvvvvv"

This gives output like the following:

sigma in.tftpd[16638]: RRQ from 192.168.1.233 filename pxegrub
sigma in.tftpd[16638]: tftp: client does not accept options
sigma in.tftpd[16639]: RRQ from 192.168.1.233 filename pxegrub
...

which is not very much, but at least shows which files were requested.

Note that the "client does not accept options" message seems to be only a harmless warning, not a sign of an error.
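With verbose logging on, the requests can be summarised per client and file. This is a sketch that assumes the "RRQ from ... filename ..." format shown above; the function name is an invention for this example.

```shell
# Count how often each file was requested over TFTP, per client.
# Assumes log lines containing "RRQ from <ip> filename <name>".
tftp_requests() {
    logfile=${1:-/var/log/messages}
    sed -n 's/.*RRQ from \([^ ]*\) filename \([^ ]*\).*/\1 \2/p' "$logfile" |
        sort | uniq -c
}
```

A file that is requested again and again, with nothing requested after it, is a hint that the boot stalls at that point.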

tcpdump

If the above doesn't give you any clues, you can capture the traffic on the wire using tcpdump. First, note the IP address of the server and, using the system log described above, the dynamically assigned IP address of the client (the DHCP server usually hands the same address to the same ethernet card / MAC address, so just use the address from the last attempt). Then you can do something like (as root):

# tcpdump host 192.168.1.20 and host 192.168.1.233 -s 0 -w attempt1.dump

where 192.168.1.20 is the server, 192.168.1.233 is the client, -s 0 records whole packets instead of truncating them, and -w writes everything to a file instead of stdout. You can then look at the dump using

$ tcpdump -r attempt1.dump

(which should give the same output as live stdout would have) or

$ less -f attempt1.dump

to look at the whole packet contents in raw form. This is very long, because it includes all the files transferred, including the pxegrub binary, kernel and configuration files. Many of the error message strings you will see are simply part of the transferred binaries and did not actually occur in your case, so take care to tell them apart.
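Because the full dump is so long, it can help to replay it one protocol at a time. The sketch below wraps tcpdump's read mode with a filter expression; the function names are made up for this example, and the port numbers are the usual defaults (67/68 for DHCP, 69 for TFTP requests, 2049 for NFS), which your setup may not follow.

```shell
# Map a protocol name to a tcpdump filter expression.
# The ports are the usual defaults and may differ on your setup.
proto_filter() {
    case $1 in
        dhcp) echo 'udp port 67 or udp port 68' ;;
        tftp) echo 'udp port 69' ;;
        nfs)  echo 'port 2049' ;;
        *)    echo "unknown protocol: $1" >&2; return 1 ;;
    esac
}

# Replay a capture file, limited to one protocol, without name resolution.
show_proto() {
    dumpfile=$1
    filter=$(proto_filter "$2") || return 1
    tcpdump -n -r "$dumpfile" "$filter"
}
```

For example, `show_proto attempt1.dump tftp` lists only the TFTP requests. Keep in mind that TFTP data packets use other ports and will not match, and that a capture restricted to the two host addresses may have missed the broadcast part of the DHCP exchange.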

Try pxelinux and a stock netboot kernel

TODO: rewrite this section to be less anecdotal

In my case, pxegrub froze on start, so I used pxelinux instead. I added a boot prompt to the config so that I could see that pxelinux itself worked properly before it proceeded to load the Linux kernel.

That kernel then froze, however. So I took the Debian installer kernel and initrd made for netboot and configured them like this:

prompt 1
default 0
timeout 300
label Debian installer
  kernel debian/linux
  append vga=normal initrd=debian/initrd.gz ramdisk_size=10934 root=/dev/rd/0 devfs=mount,dall rw  --

This loaded and worked properly, so the problem is my self-compiled kernel.
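If a self-compiled kernel is the suspect, one thing worth ruling out is a missing built-in option for mounting the root filesystem over NFS. The sketch below is an assumption about where to look, not a diagnosis of the freeze described here: it checks a kernel .config for four options that a kernel without initrd normally needs compiled in (=y, not as modules). The function name and default path are placeholders, and the network card driver must be built in as well, which this does not check.

```shell
# Check a kernel .config for options an NFS-root kernel usually needs
# built in. The option list is a starting point, not a complete one.
check_nfsroot_config() {
    config=${1:-/usr/src/linux/.config}
    status=0
    for opt in CONFIG_IP_PNP CONFIG_IP_PNP_DHCP CONFIG_NFS_FS CONFIG_ROOT_NFS; do
        if grep -q "^$opt=y\$" "$config"; then
            echo "ok       $opt"
        else
            echo "missing  $opt"
            status=1
        fi
    done
    return $status
}
```

Run `check_nfsroot_config /usr/src/linux/.config` and rebuild the kernel if any line is reported as missing.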

Retrieved from "http://www.gentoo-wiki.info/HOWTO_Troubleshoot_Diskless_Install"

Last modified: Fri, 05 Sep 2008 06:22:00 +0000 Hits: 6,649