Viewing 25 posts - 1 through 25 (of 25 total)
  • Waaaay OT: Talk to me about cluster computing
  • bristolbiker
    Free Member

    I know there are some seriously clever IT bods on here who might be able to assist me with the basics, as the Microsoft website has left me feeling more confused.

    The basics: we have 3x Windows 7/8-core/32GB RAM/1TB HD machines that are used as desktops during the day. We would like to run these as a cluster for number-crunching overnight, plus scavenging the other Windows 7 desktops when they are available. Is Windows HPC Server 2008 the way to go? Are there other 3rd-party software tools to do this? Has anyone set up something similar who can advise on a cost-effective way to do it, provide indicative software costs, or describe how much hassle it is to implement?

    May be best done off the forum, so hands-up anyone who can help in the first instance. Ta!

    <awaits tumbleweed to come rolling by…..>

    jca
    Full Member

    You need a condor.

    Good for cycle-scavenging, supports MPI and checkpointing (so if someone comes along and starts using a machine you are running a job on, it gets moved to another machine without needing to restart…). Not too hard to set up (erm…I’ve only used it for Unix machines, so can’t really say how the Windows side works). And it’s free…
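
    To make the checkpointing idea concrete, here is a minimal Python sketch of a job that saves its progress so it can resume instead of restarting. This is not Condor’s own mechanism, just the general pattern; `run_job`, the `stop_after` parameter and the `checkpoint.json` file name are hypothetical names made up for the illustration.

```python
import json
import os


def run_job(total_steps, checkpoint_path="checkpoint.json", stop_after=None):
    """Sum the integers 0..total_steps-1, saving progress after every step.

    stop_after simulates the machine being reclaimed part-way through.
    Returns the final sum, or None if the run was interrupted.
    """
    # Resume from a previous checkpoint if one exists, otherwise start fresh.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    else:
        state = {"next_step": 0, "total": 0}

    steps_done_this_run = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and steps_done_this_run >= stop_after:
            return None  # interrupted; the checkpoint file holds our progress
        state["total"] += state["next_step"]
        state["next_step"] += 1
        steps_done_this_run += 1
        # Persist progress so another run (or another machine) can carry on.
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)

    # Finished, so the checkpoint is no longer needed.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return state["total"]
```

    Calling `run_job` a second time with the same checkpoint path picks up where the interrupted run stopped. A real scheduler would also have to move the checkpoint file to whichever machine takes over.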

    FuzzyWuzzy
    Full Member

    Sounds more like you need some grid computing enabling software. I’ve not dabbled in it myself, but I imagine anything decent out there is designed for a server OS, not Windows 7.

    Cougar
    Full Member

    Yeah.

    Interesting question, but the only clustering I’ve dealt with in anger is Enterprise-class application clustering for things like Exchange and SQL.

    HPC might well do what you’re asking, for a price of course. I have it in the back of my head that HPC requires a server OS for its Compute nodes (so W7 wouldn’t cut it), but I could be wrong. I’d check if I were you.

    The good thing is that HPC is available as a six-month free trial version, so you can find out first hand if it’s right for you.

    bristolbiker
    Free Member

    Thanks for the input.

    I have checked – as far as I can ascertain, you only need HPC Server on the master node and an HPC extension license for W7 on the compute nodes (this is my reading of the Microsoft bumf) – this is the most basic way to do it AFAIK. The minimum spec HPC Server license (Enterprise edition, to handle between 5 and 64 sockets per node) is going to be between £500 and £1000. The W7 node HPC extensions are about £100 a throw – so for the three machines, £1500 should do it in software. If we don’t cluster the existing hardware and just buy a bigger single machine, a) the existing hardware is “wasted” and b) the cost of new hardware to equal the capability of the three machines is going to be many thousands of pounds. I accept the equation isn’t as simple as that, as performance scaling with our existing infrastructure isn’t going to be (anywhere near) linear, but the clustering requirement is potentially intermittent as well.
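
    As a quick check of the arithmetic in that estimate, here is a small Python sketch using only the figures quoted in the post. It assumes (this is not confirmed anywhere in the thread) that all three machines each need a W7 node extension on top of the single server license; `cluster_software_cost` is a made-up helper name.

```python
def cluster_software_cost(server_licence, node_licence, nodes):
    """Total software cost: one server licence plus one extension per node."""
    return server_licence + node_licence * nodes


# Figures quoted in the post: server licence 500 to 1000, about 100 per node.
low = cluster_software_cost(server_licence=500, node_licence=100, nodes=3)
high = cluster_software_cost(server_licence=1000, node_licence=100, nodes=3)

print(f"Estimated software cost: GBP {low} to GBP {high}")
```

    Under those assumptions the total lands between £800 and £1300, so the £1500 figure leaves a little headroom.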

    I have seen the trial of HPC Server, but don’t know if this will cover some HPC extension licenses for the W7 nodes to actually try driving and managing a cluster (though a few hundred quid could be found to try it). Maybe this is a way to dip a toe in the water, setting up the cluster as virtual machines on the existing hardware….

    I see there are other options, like Platform LSF, but determining the costs and practicalities for a like-for-like comparison is tricky.

    EDIT: – ….and Condor does indeed look interesting…..

    Cougar
    Full Member

    Cool.

    Do me a favour and keep us posted? I’d be interested to hear your findings, at any rate.

    brassneck
    Full Member

    HPC on a budget = Beowulf
    Clicky

    You might be better off spending the licence costs on older hardware to dedicate – old DL360 G4 Proliants would be a decent choice and cheap enough to buy a rackload – if your process can be massively parallel (like SETI or folding for example) you might be better off with ‘lots’ rather than ‘fast’ .
    I’m not involved in our HPC work but Beowulf is what they use AFAIK .. though a recent global deal with MS may change this. The trick is also having developers who can push it to do what you need doing.. as an infrastructure nerd this is bad mojo to me. It’s typically the owners/developers who look after the platform too, as in academia.
    In general your application and budget will decide the platform per normal.

    toys19
    Free Member

    There might be an issue with your analysis software, in that they won’t let it run on your cluster without another special (expensive) licence (I think you are doing the same kind of stuff as me). I tried this and have so far failed, although I’ve been told unofficially that it can be bypassed..

    bristolbiker
    Free Member

    Again, thanks.

    Toys – It will be primarily Abaqus runs. I have an info request in with their support desk to tell me the implications of what I am proposing with regards to tokens etc. I have laid out the cluster we propose and have asked for software recommendations, as well as issues specific to Abaqus. I know they cluster their training PCs overnight for large diagnostic jobs – they were offering 3 free runs on the cluster to evaluate job management/speed etc., as clustering is something they are pushing. What code are you thinking of?

    Beyond that, another part of the company develops bespoke engineering software and the clients that use it are asking for development to include clustered architectures, hence a cluster may be developed and implemented for other purposes….. even if I don’t get my greasy mitts on it for a while!

    xiphon
    Free Member

    Get a bucket load of older Dells or HPs, and run it on them.

    Got a (relatively) high powered server at home – DL385 / dual 2.2 Opteron / 24GB RAM / 200GB SCSI for about £150! It’s my virtual platform.

    Else…. have you considered dual booting the PCs?

    Server OS on one disk, desktop OS on the other?

    molgrips
    Free Member

    Depends what calculations you are running.

    Some problems can be split up into chunks and the boundaries communicated – simple networking code could solve that.
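
    A rough Python sketch of what “split into chunks and communicate the boundaries” can look like, using a toy 1D averaging step. Everything here is illustrative: `smooth`, `smooth_whole` and `smooth_in_two_chunks` are made-up names, the two chunks stand in for two machines, and passing a boundary value as an argument stands in for sending it over the network.

```python
def smooth(values, left, right):
    """One averaging step; left/right are the neighbouring boundary values."""
    padded = [left] + values + [right]
    return [
        (padded[i - 1] + padded[i] + padded[i + 1]) / 3
        for i in range(1, len(padded) - 1)
    ]


def smooth_whole(values):
    """Reference: process the whole problem in one go."""
    return smooth(values, values[0], values[-1])


def smooth_in_two_chunks(values):
    """Process two halves separately, exchanging only the shared boundary."""
    mid = len(values) // 2
    a, b = values[:mid], values[mid:]
    # The only data that has to cross between the chunks is one value each way:
    # chunk a needs b[0], and chunk b needs a[-1].
    new_a = smooth(a, a[0], b[0])
    new_b = smooth(b, a[-1], b[-1])
    return new_a + new_b
```

    For any list of two or more values, both routes give the same answer, e.g. `smooth_in_two_chunks(v) == smooth_whole(v)`. How often boundaries have to be exchanged, relative to the work done per chunk, is what decides whether the network becomes the bottleneck.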

    bristolbiker
    Free Member

    Xiphon – the hardware isn’t an issue….. it’s the software options to cluster the hardware that I’m interested in.

    Else…. have you considered dual booting the PCs?

    Server OS on one disk, desktop OS on the other?

    Yes – this is how I believe Simulia/Abaqus do it. I like the simplicity of the scavenging with a W7 HPC extension, but accept it may get more complicated than that.

    Some problems can be split up into chunks and the boundaries communicated – simple networking code could solve that.

    I BELIEVE load balancing in the first instance, with network parallelisation coming along in future releases of Abaqus, will make this more trouble than it’s worth, but noted, and it may be appropriate for our in-house coding development.

    bristolbiker
    Free Member

    Xiphon/brassneck – we may have been talking at cross purposes. I see Beowulf is for Linux, and I think Xiphon meant dual-booting Linux/W7. Noted, but I (and, I’m sure, the purse holders here) would like to stay in a Windows universe. Linux is an option, but this would be a much harder sell into the business.

    xiphon
    Free Member

    I know Beowulf is for Linux 🙂

    I was thinking more like Windows Server / Windows 7 dual booting, if Win7 client can’t run the software?

    bristolbiker
    Free Member

    Yeah, ok – HPC Server/W7 dual boot is on the table if I’ve got the wrong end of the stick about the W7 HPC extension….. but if it comes to that the costs will be (relatively) huge for the size and capability of the cluster. More reading/info required……..

    xiphon
    Free Member

    Sounds an interesting project, so I’d like to know the progress of it too…

    bristolbiker
    Free Member

    It is interesting, but…. it’s part of a larger IT infrastructure review for our company of 15. I’m the only one interested in attempting clustering (for the moment – though that will no doubt change when management sees the bill for new hardware as the alternative, compared to using the machines we have!) and the only reason for the questions at the moment is to see if any issues relating to clustering in the future affect our immediate decisions about hardware and software. It looks strongly like we could build a cluster as and when it’s needed (within reason), without impacting the rest of the business to any great extent….. hence, with that information in hand, it may be put on the back burner for a while.

    Thanks for all the thoughts though – it’s been genuinely useful. If we take it further I’ll report back.

    TheBrick
    Free Member

    I’ve been running code on one for the last 6 months. I don’t administer it, but it uses Sun Grid Engine; previously it used PBS. SGE is a bit nicer than PBS from a user point of view, or maybe I’ve just become more accustomed to HPC.

    I did my masters at Imperial and they had Windows boxes that rebooted overnight into Linux and ran clustered jobs. Apparently this can be easy to set up with http://clusterknoppix.sw.be/.

    Where I am now there is also a Condor cluster working on Windows machines; I looked into using that too. From what I understand, an advantage of it is that cluster jobs may use free CPU cycles. This could be good, as it would mean that when someone is on their lunch break their PC could be number crunching. I think it essentially just puts cluster jobs at nice -19 or whatever the Windows equivalent is.
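
    For anyone unfamiliar with “nice”, here is a minimal Python sketch of a process asking to be run at low priority. It only illustrates the general idea and says nothing about how Condor itself schedules jobs. `lower_priority` is a made-up name; `os.nice` is only available on Unix-like systems, so on Windows (which uses priority classes instead) the function does nothing and returns None.

```python
import os


def lower_priority(increment=19):
    """Ask the OS to run this process at lower priority, where supported.

    Returns the new niceness on Unix-like systems, or None where os.nice
    is unavailable (for example on Windows).
    """
    if hasattr(os, "nice"):
        # A higher niceness means the process yields more readily to others.
        return os.nice(increment)
    return None
```

    A number-crunching script could call `lower_priority()` once at start-up, so that interactive work on the same desktop is served first.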

    The important thing about HPC is working out what type of problem you have.

    Is it one that requires high bandwidth? Think InfiniBand, think $$, or maybe 10G Ethernet might be OK. Is the calculation itself parallelisable? E.g. a program that regularly inverts large matrices is parallelisable but may require high bandwidth to stitch the results back together again and pass them on to the next job. How much devel time is needed to rewrite the program so it is parallel? Is the program already multi-threaded? If so then something like ClusterKnoppix would be a good quick solution, as the specially patched kernel can deal with all of the management without any extra development.
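
    One common back-of-envelope way to put numbers on “is it parallelisable?” is Amdahl’s law, sketched below in Python. This is a standard textbook estimate rather than anything measured in this thread; `amdahl_speedup` is an illustrative name, the parallel fraction is something you would have to measure for your own code, and the formula ignores communication cost entirely, so treat it as an upper bound.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only part of a job can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical example: 24 cores (3 machines x 8 cores, as in the first post)
# and a job assumed to be 90% parallelisable.
print(round(amdahl_speedup(0.9, 24), 1))
```

    With those assumed numbers the best case is roughly a 7x speedup rather than 24x, which is one reason scaling is rarely anywhere near linear even before network bandwidth comes into it.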

    Alternatively, can the job be split at the start and run as separate jobs, with the results added together at the end? Is it a lot of serial jobs with different parameter values? This is another easy-to-cluster job.
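
    The “split at the start, add together at the end” pattern can be sketched in a few lines of Python. All names here (`split_range`, `partial_sum_of_squares`, `sum_of_squares`) are invented for the illustration, and the thread pool is only a stand-in for separate machines: in CPython, threads will not speed up CPU-bound work, so this shows the shape of the pattern rather than its performance. On a real cluster each chunk would be a separate job handed to the scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, chunks):
    """Split range(n) into `chunks` contiguous (start, stop) pieces."""
    step, extra = divmod(n, chunks)
    bounds = []
    start = 0
    for i in range(chunks):
        # Spread any remainder over the first `extra` chunks.
        stop = start + step + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def partial_sum_of_squares(bounds):
    """The independent piece of work done by one 'node'."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def sum_of_squares(n, chunks=3):
    """Farm the chunks out, then add the partial results together."""
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        partials = list(pool.map(partial_sum_of_squares, split_range(n, chunks)))
    return sum(partials)
```

    Because the chunks never need to talk to each other, this kind of job needs almost no bandwidth and no rewriting of the core calculation, which is what makes it easy to cluster.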

    brassneck
    Full Member

    I know Beowulf is for Linux

    Likewise 🙂

    I’m suggesting that your licence costs could be used to purchase hardware to dedicate to the task 24×7, and it might be a better use of the available funds than messing about with some desktops. Depending on the task, lots of older processors may do the job as well as a few modern ones.

    In my experience the purse holders are interested in whatever ticks all the boxes required for the least outlay – and they usually prefer less boxes than more money 🙂

    The Brick has covered what you need to think about before you decide on a platform .. for an example closer to my line of work we run Oracle RAC on Linux because experience has shown it gives us the best uptime and performance – but if a customer comes to me and says it has to be on Windows for support of their app (for whatever reason) we’ll run it on Windows.

    Another, less fun option of course, is to outsource your cluster requirements to someone who already has the infrastructure, might be better if it’s occasional demand.

    Be interesting to know what comes of it though.

    buzz-lightyear
    Free Member

    I always wanted to build a Beowulf out of interest, but will probably never get around to it. Good thread.

    bristolbiker
    Free Member

    News from Simulia – the latest version of Abaqus supports clusters directly using internal parallelisation of the code and the free MS HPC MPI pack, which gets installed by default when Abaqus is installed on a 64-bit system. I ‘just’ have to turn on and set the MPI parameters for our machines at the master node and it should rock-and-roll (subject to all the usual infrastructure limitations and software frustrations).

    That gets me up and running for next to no effort – the others in the software development team can go and poke it! 😉

    Thanks all!

    Cougar
    Full Member

    Ooof.

    Keep us posted?

    toys19
    Free Member

    Happy days. Wish I could afford Abaqus.

    bristolbiker
    Free Member

    Wish I could afford Abaqus

    This is now the ‘biggy’. The cost of extra tokens to make use of, say, another 8 CPUs is going to be pretty big, even compared to the software costs of going the HPC Server route….. hence I imagine this is why a basic clustering capability has been implemented directly, as no-one would bother if they had to buy the tokens AND additional software to run the cluster. Toys – what are you using?

    When I have a minute (which won’t be anytime soon) I’ll give it a go, just spreading the tokens we have between two machines to check it works.

    toys19
    Free Member

    My company owns a licence of Algor multiphysics (stopped paying subs in Jan 2009) and I try to use Femap NX Nastran (I can get access from a company who I rent a desk off every now and then when Algor won’t or can’t solve particular probs). They are as good as each other in their own ways.. It’s all about usability now, as in how fast can I set up and run a job.

    Algor doesn’t have the token system: you own a license and you can do what you want on a single machine (including running the software in multiple instances at the same time). It runs well on multi-processor machines, so I have a dual-processor quad-core machine (that’s 8 cores) with 32GB of RAM and solid-state drives in striped RAID, so it’s pretty pokey..

    To do cluster solving they make you have an extra licence, which is about 50% of the purchase price, and since Algor got taken over by Autodesk they have lost my custom.. But I have been told that you can get Windows to take the threads and run them across clusters without the software knowing; that is what I would love to understand/execute..

