Shared computers at Teknikringen 33 (ETK, EPS, E2C)

Purpose

For many years in the ETK division, we have run shared computation servers to allow people to run jobs with faster processors and more memory than their own computers have.

For several reasons, this advantage is now getting smaller. It's now really cheap to get large amounts of memory and fast multi-core processors, and most programs still don't take advantage of more than one or a few processor cores at once. A user who needs a lot of computation could therefore do better just to order a good computer, which not long ago would have been unreasonably expensive.

However, we still see some advantages for users, in having: somewhere to leave long jobs running even if one only has a laptop, more memory and processor cores than a typical personal computer, and a ready-made installation of many licensed technical programs. So we carry on running these computers.

Speed, Parallel Computing, Supercomputers, etc.

Users generally want their problems solved, and solved quickly. That's why they consider our servers: either they haven't enough memory of their own, or they just want to speed things up.

As mentioned above, modern desktop and laptop computers are remarkably fast and cheap, so we can't easily provide computers that are much better. Some types of problem, however, can efficiently be spread over tens or even thousands of processors. At KTH we have a `parallel computing centre', PDC, offering many multicore processors tightly connected together.

BUT: NOTE! For most of the users at ETS a `parallel' facility is useless. That's because most of us run proprietary programs that can use only one processor core at a time. Some programs now have multithreading, but this typically applies only to certain parts of the algorithm, and often gives a total speedup of only some 50%, or at best a very few times, even when using several cores; it may give little or no speedup beyond 4 or 8 threads. So a typical modern desktop processor already has enough cores for optimum speed of most proprietary software packages.

In fact, a big parallel computer will generally use a large number of not-quite-top-of-range processors, to get good value for money (the very fastest processors are a lot more expensive than slightly slower ones). For running the above types of single-threaded or few-threaded program quickly, one would do better to get a top-range processor with a few cores.

Serious users of parallel computing are likely to have their own code in C or Fortran, designed to take advantage of parallel processing. Not all types of calculation can even be parallelised efficiently. Note also that if one uses proprietary software, the licensing restrictions may well constrain it to a few threads, or to only one or a few running instances in total.

The main case in which we can do lots of parallel work is when we want to run lots of independent calculations: for example, running a particular script/program with hundreds of different input parameters. This can be scripted so as to use all cores of all of our computers. Of course, it is most easily done if the programs can be run purely in a text terminal, not through a graphical interface.
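
As a minimal sketch, assuming a hypothetical command-line program mysim that takes one input parameter and can run purely in a terminal, one could start many independent runs from a shell like this:

    # run mysim for parameters 1..100, keeping at most 6 runs going at once
    # (mysim, the parameter range and the file names are only examples)
    for p in $(seq 1 100); do
        ./mysim "$p" > "result_$p.txt" 2>&1 &
        while [ "$(jobs -r | wc -l)" -ge 6 ]; do
            sleep 10    # wait for a free core before starting the next run
        done
    done
    wait    # let the last background runs finish

The same idea can be spread over several computers, by starting such a script on each of them (e.g. over ssh), each with its own range of parameters.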

Note, also, that several widely-used proprietary programs have their own methods of running independent stages or parameter values on multiple cores or computers. Matlab has a parallel computing toolbox. Comsol allows parametric studies and can use multiple computers in parallel for some problems.

Status

Situation: 2012-06.

There is currently just one `official' shared computer. It's called simlin2 (you might need to call it simlin2.ets.kth.se when you connect). The operating system is Scientific Linux 6.2. It's a simple machine: a 6-core AMD Phenom at 3.2 GHz, with 16 GB RAM. Some modern laptops could be faster for some single-threaded tasks! The main advantages are being able to leave it running for a long time, or making use of more cores.

There's also diagsim. This is older: it has 8 Xeon cores at 2.33 GHz, also with 16 GB RAM. Its system is an OLD installation of Gentoo Linux. It only supports a couple of simultaneous "nx" connections, but of course any number of ssh, scp, vnc, etc. The range of proprietary technical programs is the same as on the other computers, and there are many other programs besides. This computer is likely to be retired (removed) soon.

For the near future: after this summer, depending on money and demand, we'll add a simlin1, with more RAM (64 GB?) and just a few cores (4?), but the fastest ones available. If there are users wanting to run many jobs at once, we'll add a simlin3 with the same specification as simlin2.

A common desire for shared computers comes from users who want to leave long jobs running somewhere, but only have a laptop. We notice that people are more and more unfamiliar with anything other than an ms-windows graphical interface, and some programs are only produced for that operating system. Putting these points together, we see there may be an advantage in providing some ms-windows based shared computers, so that people can do simple jobs. This is an option for the computer that might be simlin3 (or simwin1). On the other hand, it's no bad idea for people to learn other systems!

Software

So far, we run only Linux. As mentioned above, we could consider a multi-user ms-windows system if there's demand. (Or, we could run these in virtualisation.)

The new standard, started in 2012, is to run Scientific Linux. This is basically RedHat Enterprise Linux, compiled by major laboratories (Fermilab, CERN) for their own use and for others. We're replacing our Gentoo systems with this, to take advantage of the long support-time during which we can automatically update all our systems.

We make a quite thorough installation of the OS: many libraries, applications, compilers, etc. Then we add a few more Free packages from other sources (Scilab, Octave). The KDE and Gnome desktops are both installed, along with some lighter, simpler ones. [Unfortunately KDE is only at version 4.3.x; the latest is 4.8, which is really beginning to feel polished and good (finally, like an improved KDE3 in many ways). I can't answer for Gnome.]

We also add, under the /pkg directory, many proprietary programs to which KTH has access. These include Matlab, Comsol, Maple, Mathematica, Labview, GAMS and others. Of course, as expected when one uses proprietary software, we sometimes hit problems with licences, expiry, licence-servers being down, etc.

Connecting

The main way to connect is by SSH (secure shell). This is available from all internet addresses. Include the domain .ets.kth.se after the computer's name, e.g. simlin2.ets.kth.se. The three main `types' of SSH access available on our computers are: terminal access (an interactive shell on the server), file-copy access (scp or sftp, for moving files to and from the server), and graphical desktop access through NX, which runs over SSH.
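
For example, from a terminal on a unix-like system (replace username with your own account name):

    ssh username@simlin2.ets.kth.se    # log in to an interactive shell on simlin2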

Accessing files

On the servers you have a private directory (your "home directory") with a name like /home/username. This is stored on a fileserver (penguin). You therefore see the same files in your home directory from all servers. The home directory on penguin has nightly backups.

You can use SCP (see "File-copy access", above) from any internet address, to copy files to and from the server.
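
For example (the file and directory names here are only placeholders):

    scp results.dat username@simlin2.ets.kth.se:     # copy a file to your home directory on the server
    scp username@simlin2.ets.kth.se:results.dat .    # copy it back, into the current directory
    scp -r mydir username@simlin2.ets.kth.se:        # copy a whole directory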

Within the network in our building, you can also access your home directory directly from ms-windows by mapping a drive to \\penguin\username. It will probably appear as "username on penguin". The username is the same as for the servers; the password might not be. From ms-windows you probably need to give the username as EKC.KTH.SE\username. The advantage of mapping the home directory as a drive is that you can work transparently with files from your own computer and the servers, instead of copying back and forth.

Local files. For large temporary files (not backed up), or files needing very quick access, you can use the directory /local on any server. These directories are not shared between the servers: you'll only see such files on the one server where they are stored. Please make your own subdirectory of /local, to avoid an ugly spread of files! Please remove junk, and don't rely on the local disks for important things, as they aren't backed up.
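
For example, on the server:

    mkdir /local/$USER    # make your own subdirectory, named after your username
    cd /local/$USER       # keep large temporary files here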

Running programs

Graphical programs that are provided within Scientific Linux are likely to be visible through the desktop menus. Most of the proprietary programs, such as Matlab, are not visible in any menu. That's just because they don't provide desktop-files and icons to make this easy, and there's little point in our adding them. The way to start programs, in general, is to give a command in a shell.

For our purposes, doing simple things in a graphical environment, the names "shell", "terminal", "command-line", "command-prompt", and "console", can all be assumed to mean similar things: a window that displays simple text and lets you type commands and see the answers.

In the main available desktops, there should be a black icon in the taskbar, supposed to look like a computer screen. Otherwise, you can try to find something in the menu, or use "run command" (commonly Alt-F2) and run one of the common terminals such as konsole or xterm.

When you've got your shell prompt, you can try a few simple commands. Some simple examples (take one at a time, and press Enter after typing it) are: whoami, date, ls, ls -l, who.
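
For example, typed one at a time at the prompt:

    whoami    # print your username
    date      # print the current date and time
    ls        # list the files in the current directory
    ls -l     # the same, with sizes, dates and permissions
    who       # show who else is logged in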

To start a program, start to type its name. Pressing the Tab key, after writing a few letters of the name, will try to "autocomplete" the command. In many cases there are several versions available: you'll then see a list of them all, e.g. matlab-7.12 matlab-7.13 etc. If you just type the name, e.g. matlab, you'll get whatever we consider the latest good version. You'll find that many programs can be started with options that change their behaviour. For example, matlab -nodesktop -nosplash will start a matlab session within your shell, without starting the matlab desktop interface (nor the startup splash-screen). Comsol has options to choose the number of processors to use, set which libraries to use for computation, and more.
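
For example, to leave a long Matlab calculation running on the server even after you log out (myscript.m is just a placeholder for your own script), one possibility is:

    # run the script without any graphical interface, collecting output in a log file;
    # nohup and & let it keep running in the background after you log out
    nohup matlab -nodesktop -nosplash < myscript.m > myscript.log 2>&1 &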

Other sorts of access

There are other ways of using the servers remotely. There are even several ways of getting a remote graphical interface: NX is not the only one (although it's what we recommend if you are new to this).

One is plain X-tunnelling. If you connect from a unixish system, just connect by ssh with the options -X and -Y (e.g. ssh -XY user@host), and it should automatically tunnel to your own computer any graphical interface that you start: so if you then type the command xeyes, you'll see the traditional old eyes following your mouse ... more usefully, you could run bigger things this way. It's simple and quick, but not good on a slow connection, and the programs can't keep running if your connection goes down.
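
Putting this together (username is a placeholder for your own account name):

    ssh -XY username@simlin2.ets.kth.se    # log in with X forwarding
    xeyes &                                # a small graphical test program, displayed on your own screen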

Another is vnc. Using the command vncserver on the server, you start a program that runs a desktop and makes it available for connections from the vncviewer on your computer. Within our network you can connect directly to any vnc server that you run on the servers. From outside, the ports will be blocked and you'd have to use an ssh tunnel (the unix vncviewers will do this for you if you just include the option -via username@host).
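
A minimal example, assuming a unix vncviewer on your own computer (the display number :1 is whatever vncserver reports when it starts, and the geometry is just a suggestion):

    vncserver -geometry 1280x1024                             # on the server: start a desktop; note the display, e.g. :1
    vncviewer -via username@simlin2.ets.kth.se localhost:1    # on your own computer: connect through an ssh tunnel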

You might not need any graphical interface on the server. For example, Comsol has a "server" mode where it can be started from a shell and listens on some port for network connections from a Comsol client program. The client, on your computer, can then be the graphical interface to the computation part on a server.  

Keyboard layouts (particularly in NX)

The initial keyboard layout is taken from your own computer's display. If this is not sent/recognised, the server's default (Swedish: "se") is used.

To change the setting in the remote session (running on simlinX) you can use the command setxkbmap (set X keyboard map) from a shell, telling it the desired two-letter code, e.g. "se" (Swedish), "fr", "de", "gb", "us", etc. Thus: setxkbmap us for English[US] layout. Or, in KDE or Gnome desktops you can set up a little icon to switch between layouts. For KDE it's under the KDE menu, Settings, System Settings, Regional and Language, keyboard layout.

More?

See the Shared Computers index page for more details, and for links to the older pages about our computers.


Page started: 2012-06-28
Last change: 2012-07-11
Contact: Nathaniel Taylor @ee.kth.se