Viewing 17 posts - 1 through 17 (of 17 total)
  • Linux help, diskless workstation (PXE, dracut, kernel knowledge needed)
  • aracer
    Free Member

    I suppose I might as well try on here first. Trying to get a diskless workstation working with Fedora 21 – ie booting via PXE with the root disk mounted over NFS from a server. Have been trying and failing for ages.

    Well it does work if I use the initrd.img downloaded from http://www.mirrorservice.org/sites/dl.fedoraproject.org/pub/fedora/linux/releases/21/Server/i386/os/images/pxeboot/ but that is hugely bloated, takes a long time to copy to the client and takes ages to run. Not only that, but it is also built for an older kernel and won’t work with an updated installation. I want to use my own smaller ramdisk image created using dracut (or any similar tool).

    If I build my own, it hangs when trying to mount the nfs share. The irritating thing is that it works fine under Centos 6.6 (equivalent to ~Fedora 17 or 18). With Centos 6.6 I just run dracut with no options to generate the ramdisk. If I do that with F21 I get errors, which I get rid of by running
    dracut -m "nfs network base" --add-drivers nfsv4
    which adds dracut modules and kernel drivers, but it still hangs and doesn’t boot.

    Anybody done something similar and have any ideas? Anybody even know what I’m talking about? I think the issue is that I need to add more kernel drivers, but I don’t know which ones (I’ve already tried adding all possible dracut modules, but don’t know where to get a similar list of kernel drivers).
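On the question of where to get a list of kernel drivers: a minimal sketch, assuming the drivers live as .ko files (possibly compressed) under /lib/modules/<kernel version>. The function name is made up for illustration. On the dracut side, `dracut --list-modules` should print the available dracut modules, and `lsinitrd` shows what actually ended up in a generated image.

```shell
# List kernel driver names found under a module tree.
# $1 = directory to search, e.g. /lib/modules/<kernel version>/kernel/fs/nfs
# (list_kernel_drivers is a hypothetical helper name, not a standard tool)
list_kernel_drivers() {
    find "$1" -type f -name '*.ko*' 2>/dev/null \
        | sed -e 's|.*/||' -e 's|\.ko.*$||' \
        | sort -u
}

# Example, not run here because the path depends on the installed kernel:
# list_kernel_drivers "/lib/modules/$(uname -r)/kernel/fs/nfs"
```

Any name it prints can be passed to --add-drivers in the same way as nfsv4 above.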

    p.s. not tried with Centos 7 as that’s 64 bit only and running on old hardware with performance issues, could try an older version of Fedora, but don’t think I’ll learn much more

    brassneck
    Full Member

    Been ages!

    My ill informed guesses would be:

    NFS share options (server side)
    Network card driver issue
    Video card (if booting to X or have a splash screen; unlikely at this point, I’d have thought).

    Think there is a log generated though? Can you get to the rd.shell or does it just hang on you?

    aracer
    Free Member

    Thanks for the reply. Have checked the nfs share options – can mount fine from a different machine, and as I said it works if I use the downloaded (bloated, old) ramdisk, or using Centos 6.6. Not a video issue, one of my test installations is a minimal text only install. It boots to rd shell, but the log generated doesn’t give any clues at all.

    Network card driver is a good call though – I didn’t think it was as it reports initialising it, but haven’t explicitly included a driver so will have another look.

    brassneck
    Full Member

    If you can swap it out on the wkstn, try something really vanilla like an Intel Pro/1000 if you have such a thing knocking about; it might be quicker than tracking down drivers.

    Just wondered if there were some screwy differences in default mount options between the distros, but if it mounts without hassle elsewhere it’s unlikely.

    aracer
    Free Member

    It is an Intel E1000! Am testing on a VM and that’s the default (have tried on real hardware, which strangely didn’t need the nfsv4 kernel driver, but otherwise got stuck at the same place).

    oldnpastit
    Full Member

    The usual things I hit when doing very similar things are:

    1. nfs version woes, so one side is doing v3 and the other only talks v4 or vice-versa. Check what the server supports and force it on the kernel command line with nfsvers=4 or nfsvers=3.

    2. not running the locking daemon (which I think you only need for v3). Either read the man page to work out how to fix it properly (which I’ve never done, so would be interested in the answer) or add nolock to the kernel command line NFS options.

    3. client is using an “insecure” port (port number 1024 or higher). Add insecure to your /etc/exports options. It’s not really insecure, it’s just a booby trap for the unwary.

    Also, check the logs on the server if at all possible, and try running tcpdump on the server (it could be something lower level that prevents your ethernet from working at all, for example).
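For points 1 and 2, a rough sketch of what forcing the version and nolock might look like, assuming dracut's root=nfs:<server>:<path>:<options> form on the kernel command line. The helper name and the server address are invented for illustration; the path is one of the exports quoted later in the thread.

```shell
# Build a dracut-style NFS root argument for the kernel command line.
# $1 = nfs or nfs4, $2 = server IP, $3 = exported path, $4 = optional mount options
# (nfsroot_arg is a hypothetical helper, it only formats a string)
nfsroot_arg() {
    printf 'root=%s:%s:%s%s\n' "$1" "$2" "$3" "${4:+:$4}"
}

# 192.168.1.10 is a made-up server address
nfsroot_arg nfs 192.168.1.10 /mnt/nfsroot nfsvers=3,nolock
```

The resulting string goes on the append line of the PXE config next to the existing options. To watch the traffic on the server while the client boots, something like `tcpdump -n port 2049` is a reasonable starting point.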

    brassneck
    Full Member

    It is an Intel E1000! Am testing on a VM and that’s the default (have tried on real hardware, which strangely didn’t need the nfsv4 kernel driver, but otherwise got stuck at the same place).

    Hmm, that should be OK then. Shocking performance but at least work 🙂

    Worth trying to trace NFS activity on the server, just to rule it out if nothing else. See the trace debugging section of http://wiki.linux-nfs.org/wiki/index.php/General_troubleshooting_recommendations (not tried it, but it looks like an idea).

    IA
    Full Member

    If you have a setup that works, but wrong linux version – have you booted the working one and made a note of the modules it loads? Or even just taken a copy of the logs from a successful boot to compare against?
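One way to do that comparison, as a sketch: save the output of lsmod from the working boot and from the rd shell of the failing one, then list the modules that only the working boot loaded. The function name and the file names are hypothetical.

```shell
# Print modules present in the first lsmod capture but absent from the second.
# $1 = lsmod output saved from the working boot
# $2 = lsmod output saved from the failing boot (must contain at least the header line)
missing_modules() {
    awk 'FNR == 1 { next }                 # skip the "Module Size Used by" header
         NR == FNR { seen[$1] = 1; next }  # first file read: the failing capture
         !($1 in seen) { print $1 }' "$2" "$1" | sort
}

# Example with hypothetical file names:
# missing_modules lsmod-working.txt lsmod-failing.txt
```

Anything it prints is a candidate for --add-drivers when regenerating the ramdisk.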

    aracer
    Free Member

    Thanks for all the help – surprised how much useful info I’ve got here! A few things to try:

    – should all be nfsv4 as I’m running F21 for the server and the client, but I’ll double check.
    – fairly sure I’ve tried noting the kernel modules loaded with a working version, but that may have been before I got the right dracut modules loaded (without which it falls over earlier with F21), so will try again

    dobo
    Free Member

    i dont know fedora but some of the config is different for nfs v3 and v4
    i would suggest not using nfsv4, my media player for instance simply will not connect to a nfsv4 share whilst another linux client will just fine. it also doesnt like udp; tcp is more tolerant of issues.

    aracer
    Free Member

    I don’t think I have the option with F21 – is likely to cause more problems if I try, given I get an earlier error if I just include the nfs rather than the nfsv4 kernel module (the nfsv4 module includes nfs as a dependency I think, though maybe I should make that explicit).

    Maybe Centos 6.6 is using nfsv3 which might explain the observed difference? I just can’t find any details of anybody doing this in F21 (or F20, though some people asking why they can’t get that working), when it clearly used to be extremely straightforward.

    I’ll see what I can do with the nfs config as that might be the key based on what you’re suggesting.

    dobo
    Free Member

    any luck aracer?

    on the server, what does ‘rpcinfo -p’ show? you should get something like

    100003 2 tcp 2049 nfs
    100003 3 tcp 2049 nfs
    100003 4 tcp 2049 nfs

    this confirms that you have an nfs server capable of v2, v3 and v4 over tcp on port 2049.

    a bit simplistic but if that looks good then it could just be a config error in your /etc/exports on the server.
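The same check can be scripted. A small sketch, assuming the rpcinfo -p columns are program, version, protocol, port and service as in the lines above, and that 100003 is the nfs program number shown there. The function name is made up.

```shell
# Read `rpcinfo -p` output on stdin and print the NFS versions offered over TCP,
# space separated, in the order they are listed.
# (nfs_tcp_versions is a hypothetical helper name)
nfs_tcp_versions() {
    awk '$1 == 100003 && $3 == "tcp" && !seen[$2]++ {
             out = out (out == "" ? "" : " ") $2
         }
         END { print out }'
}

# On the server:
# rpcinfo -p | nfs_tcp_versions
```

An empty result would mean no NFS service is registered over TCP at all.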

    aracer
    Free Member

    I get


    100003 3 tcp 2049 nfs
    100003 4 tcp 2049 nfs

    So v3 and v4 but not v2 presumably – though that shouldn’t be an issue?

    oldnpastit
    Full Member

    What’s in /etc/exports ?

    Is there anything in /var/log/whatever?

    EDIT: Unfortunately, Fedora has now switched to systemd which means all of your logs are conveniently packaged in a non-human-readable format 🙁 Hopefully someone on here knows how to make sense of the shiny new binary format.

    aracer
    Free Member

    /etc/exports


    /var/lib/tftpboot/fedora21/root *(rw,sync,no_root_squash,no_all_squash)
    /mnt/nfsroot *(rw,sync,no_root_squash,no_all_squash)

    (options from an article on PXE booting with Centos 6)
    one of the shares is a minimal chroot, the other the disk for a standalone workstation installation (dead easy doing that sort of stuff with VMs) – oh and yes I have edited /etc/fstab in the latter

    Nothing relevant I can see in /var/log on the server, though there’s stuff under /var/lib/nfs which I’ll monitor next time I try – should point out that I’ve not done much with this since starting the thread, so have a few things lined up to look at.

    dobo
    Free Member

    not having v2 support on server shouldnt be an issue as long as the client supports v3 or v4.
    exports look ok, time to check logs and your diskless booting setup i think.
    i’m tempted to try some diskless booting

    aracer
    Free Member

    Just in case anybody is interested (and because I’ve found various reports of this problem but no solution, and find it amusing if the only solution on the net is on STW 😉 ), I’ve found the problem and got it working.

    Nothing at all to do with the nfs server or nfs mounts, that was working fine. The problem was that it was trying and failing to mount some stuff when still running the init ramdisk before switching to the mounted nfs image – I’d seen this in the logs before (and probably should have shared them here, somebody would have spotted it instantly) but not paid any attention.

    Solved by editing /etc/fstab to remove all entries on the machine where I was generating the ramdisk and using the --fstab option (which forces it to use the now empty fstab to select mounts). This works, but seems like a real bodge. I’m sure there must be a better way to do it without having to edit /etc/fstab, but I’ve not yet found it.
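The workaround as described can be wrapped so the original fstab comes back afterwards. This is only a sketch of the same bodge, not a better way: the helper name and the output image name are invented, and the dracut options are the ones quoted earlier in the thread plus --fstab.

```shell
# Run a command while the given fstab is temporarily empty, then restore it.
# $1 = path to the fstab; remaining arguments = command to run.
# (with_empty_fstab is a hypothetical helper; it sets the shell variables
#  fstab, backup and status, and does not restore if the shell is killed mid-run)
with_empty_fstab() {
    fstab=$1; shift
    backup=$(mktemp) || return 1
    cp -p "$fstab" "$backup" || return 1
    : > "$fstab"                      # truncate in place, keeping owner and mode
    "$@"
    status=$?
    cat "$backup" > "$fstab" && rm -f "$backup"
    return "$status"
}

# On the machine generating the ramdisk, as root (initrd-nfs.img is a made-up name):
# with_empty_fstab /etc/fstab dracut --fstab -m "nfs network base" --add-drivers nfsv4 initrd-nfs.img
```

The function returns the exit status of the wrapped command, so a failed dracut run is still visible to the caller.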

    Anyway thanks for all the help, which did at least get me looking at it properly when I was feeling really frustrated by it.

