Since 1970, unix systems have evolved their set of commands, some taken from other systems of the time, and some new. In spite of moderate variations in syntax, there are many commands that can be expected to exist and to function in much the same way on any of many unix systems (freebsd, openbsd, netbsd, linux, solaris, hp-ux, aix etc.). Indeed, the IEEE "POSIX" (Portable Operating System Interface) standard defines quite a few, clearly having tried to standardise unix practice. In a linux based system, and in many others, the GNU utilities are used for the common commands.
Familiarity with this shell language and the basic system commands, as well as some more job-specific commands, can be very useful if you are going to do anything that might need to be automated. As well as the following sections, you could try searching on google or similar, with such terms as `unix command' or `linux command' (i.e. GNU command) for a `cheat-sheet' or tutorial.
To get more information about a command, there are various things you can do:
The `man' (manual) system is the traditional unix way. Most commands and many configuration files and C functions have their own pages. Use the space-bar to move down, or Ctrl-D and Ctrl-U for down and up half pages --- the page up and page down keys also work on most systems. Just press q to exit the page. Searching can be done by a slash followed by a basic regular expression (so just /string to search for string). The manual files are divided into sections, e.g. 1 is user-commands, 2 is system calls (C functions), 3 is standard C library functions, 5 is configuration file syntax, etc., so `man 3 realpath' is used to look up the C function called `realpath' rather than the user command of the same name. Not all commands provide manual pages. See the recipes below for fun ways to format manual pages for pretty printing.
The `info' system is the GNU one, so it provides pages for all the core commands. It uses links (denoted by stars) to split the information by menus. If you type "info" by itself, the top level will be displayed, allowing you to see all the GNU commands and what they do. Note that there exist many other non-GNU commands that won't show up here. The GNU commands have manual pages too, but these sometimes claim to be only a subset of the info content. The virtue of info shows most clearly when dealing with complex programs, for example the shell `bash'. Its documentation comes out at more than 110 pages when printed on A4 in 10pt text, so having it all as one long manual page is not very helpful for quickly finding the right part (though the pattern search is often helpful)!
Some programs thrown into the system may not have any manual or info pages. There may be text or html documentation installed under /usr/share/doc/<programname-version>. Often, a command provides some help information if called with a command-line argument of `-h' or `--help', e.g. `acroread -h' or `ls --help'.
A program (command, GUI thing, ...) can be started with an array of arguments; when a command is run in the shell, the shell splits the command line on white-space, and passes each part of it as an argument to the command whose name is first on the line. Some arguments are some general sort of data, e.g. a filename; others are options that affect the behaviour of the command or state what the following argument denotes. Options may be traditional single letters, usually prefixed with a hyphen to distinguish them from other arguments. Several option-letters may be used together. Long forms of options use full words, usually prefixed with a double hyphen (the GNU way). A few commands have only long options, and use just a single hyphen. Examples: `ls -l -r /tmp' is the same as `ls -lr /tmp', and `ls -Q /usr' is the same as `ls --quote-name /usr'.
Descriptions of command usage, e.g. in the manual pages, traditionally use [] around optional fields, <> around obligatory fields, and | to denote alternatives. For example, command [-a|-b] <arg1> <arg2> [arg3] ... suggests that either of the -a and -b options can be given, then two required arguments, then any (integer, non-negative) number of additional arguments.
See the man or info pages for the details of each command.
Bear in mind that there are many commands available; pressing Tab a few times in quick succession on an empty command line will eventually prompt a question of whether the 4000 or so commands on the system should be listed on the screen! Of course, some of these are GUI programs not suitable for scripting, although it may sometimes be useful to run a GUI program in a loop. Tab causes autocompletion of commands and filename arguments. One press completes if there's no ambiguity, and two quick presses cause a list of possible matches to be displayed.
The following may not be immediately useful, but they should give a good introduction to some shell syntax, command and option existence, and general idiom.
The basics of shell are that the command line is split
into sub-parts separated by the |, &, ||, && and ; symbols, then these
parts are split into space separated command and arguments. | connects
the preceding command's `standard output' stream to the `standard input'
of the following command. ; just runs the commands separately. &
runs the preceding command in the background and goes straight on to the
following one. && runs the following command iff the preceding
command returned a successful status code when it finished. || runs
the following command iff the preceding one returned an unsuccessful code
when it finished.
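As a sketch, the operators above can be seen in action with a few harmless standard commands (`true' and `false' do nothing but return success and failure status codes respectively):

```shell
false ; echo "ran anyway"        # ; runs the next command regardless of status
true && echo "after success"     # && runs only because `true' succeeded
false || echo "after failure"    # || runs only because `false' failed
sleep 2 & echo "not waiting"     # & backgrounds the sleep; echo runs at once
echo one two three | wc -w       # | feeds echo's stdout to wc's stdin; prints 3
```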
The redirection operators > and < redirect a
command's standard output or input (resp.) to or from files. Many commands
will read either from standard input or a file if one is named as an
argument. Some want files to be named, but will accept a hyphen as meaning
that the file for reading/writing is standard input/output.
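A small sketch of redirection (the filename here is invented for the demonstration):

```shell
printf 'banana\napple\n' > fruit.txt   # > sends standard output into a file
sort < fruit.txt                       # < feeds the file to sort's standard input
printf 'banana\napple\n' | sort -      # - names standard input explicitly
rm fruit.txt                           # tidy up the demonstration file
```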
Shell variables
are accessed by prefixing the variable name with a dollar symbol, else
the name would be treated as text.
Quoting "..." prevents spaces breaking
the line into arguments, but variables are still replaced by their
contents. Quoting '...' prevents any special interpretation of the
quoted string.
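The difference between the quoting styles can be seen with printf, which repeats its format for each argument in turn (the variable name is just an example):

```shell
name="two words"
printf '[%s]\n' $name     # unquoted: splits into two arguments: [two] [words]
printf '[%s]\n' "$name"   # double quotes: one argument, expanded: [two words]
printf '[%s]\n' '$name'   # single quotes: completely literal: [$name]
```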
Backtick substitution (command substitution) runs the
command in the backticks then replaces the whole backtick structure
with the standard output of that command; $(command) is equivalent
to `command` and nests more cleanly. (Note that $((...)), with double
parentheses, is arithmetic expansion, not command substitution.)
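For example (the directory path is chosen purely for illustration):

```shell
host=`hostname`              # backtick form of command substitution
host=$(hostname)             # $(...) form: same result, nests cleanly
echo "running on $host"
basename $(dirname /usr/local/bin)   # inner command runs first; prints: local
```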
Escaping with a backslash makes the following character
be treated as literal, so \$name prevents the dollar making name be
interpreted as a variable, or \\$name causes the second backslash to
be treated as literal and therefore the $name is interpreted as
a variable...
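A quick sketch of those escaping rules (the variable name is invented):

```shell
name=value
echo $name        # prints: value
echo \$name       # backslash protects the dollar; prints: $name
echo \\$name      # first backslash protects the second; prints: \value
```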
Filename `globs' are primitive pattern matching codes
for `wildcarding' filenames; e.g. *.jpg is anything ending in .jpg,
name.??? is `name.' followed by three of any character, file[0-5].txt
is any of file0.txt through to file5.txt, and f[^a1].jpg is f followed
by any one character that is not a or 1, then by `.jpg' (the ^ symbol
logically negates the [...] matching; the portable spelling is [!a1]).
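These can be tried safely in a scratch directory (the filenames below are invented for the demonstration):

```shell
dir=$(mktemp -d)     # make a throw-away directory
cd "$dir"
touch file0.txt file5.txt fa.jpg fb.jpg photo.jpg
echo *.jpg           # fa.jpg fb.jpg photo.jpg
echo file[0-5].txt   # file0.txt file5.txt
echo f[!a1].jpg      # fb.jpg
```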
Show a command's built-in help, but show it one screenful at a time
with the pager program `less':
psbind --help | less
Run an executable file called `simulation' in the current directory;
when it finishes, mail all its standard output to name@place with a subject line
containing the hostname. In the second example, make standard error also
go into the pipe instead of to the screen, and log to a file as well as mailing.
./simulation | mail -s "`hostname` finished" name@place
./simulation 2>&1 | tee sim.log | mail -s "`hostname` finished" name@place
Find all files below /local/run whose names end with _results.txt; extract all
lines starting with `V=' (permitting spaces around the `=' or between the start
of the line and V), then sort these lines numerically and display to the screen
(because there's nothing to catch the output of sort).
The * in the input filename pattern needs to be escaped so the `find' will see it;
if there's a file matching this pattern in the current directory, the shell would
otherwise expand the pattern to the matching name[s] before calling find, if the
pattern were not protected by escaping or quoting. -exec runs the given command
on each found file, replacing "{}" with the name. The final \; is a ; for the
find command (not the shell) to see, hence the escaping of it. `sort' is
given -t= so it uses = as the field separator (rather than the default
transition from non-blank to blank), and -k2 so the numeric sort is done
on the second field (the value after the =).
The second line does the
same thing; it quotes the whole *_results.txt pattern, and uses xargs to put each
line of standard input (a filename from find) as an argument to grep.
find /local/run -name \*_results.txt -exec grep '^ *V *= *' "{}" \; | sort -t= -k2 -n
find /local/run -name '*_results.txt' | xargs grep '^ *V *= *' | sort -t= -k2 -n
Convert a manual page (for the `ls' command) to plain text.
man ls | col -b >outfile.txt
Convert a manual page (for the `ps' command) to a postscript file (first line)
or pdf file (second line), or print it directly (third line).
man -t ps >file.ps
man -t ps | ps2pdf - >file.pdf
man -t ps | lpr
Convert all files with names ending in `.bmp' (assumed to be images) to
half-sized images with the same base-name but of name and type .jpg.
This is thanks to `convert', a component of the handy
ImageMagick suite.
for f in *.bmp ; do convert -resize 50% "$f" "`basename "$f" .bmp`".jpg ; done
For each file (including directories and links) in the current directory,
rename it so the name has all non-letter characters replaced with underscores,
and multiple consecutive underscores are reduced to a single one. Instruct
`mv' to ask before overwriting, and to print to screen every operation.
for f in * ; do mv -iv "$f" "`echo "$f" | sed -e 's/[^a-zA-Z]/_/g' -e 's/__*/_/g'`"; done
Convert a pdf file to a postscript file, by Ghostscript or by Adobe's
acroread program:
pdf2ps -sPAPERSIZE=a4 infile.pdf outfile.ps
acroread -toPostScript -level3 -size a4 -pairs infile.pdf outfile.ps
Use psbind to convert postscript to 4 `pages per sheet' (4-up);
psbind scales the pages to fit as large as possible, which makes
4-up quite readable usually! A4 paper is explicitly requested for output.
psbind -N -n4 -pa4 infile.ps >outfile.ps
Use mpage to do 4-up conversion (without special scaling) and put a
duplex (2-sided) instruction in the file.
mpage -4 -t infile.ps >outfile.ps
Use psbind for scaled 2-up and mpage for duplex, then send to printer `bromberg'.
NOTE: having several programs modify the postscript seems to reduce the
probability of dodgy files actually printing... by `dodgy' is meant
the antithesis of text-only output from LaTeX, e.g. scans, word-processor
output, etc. But it's usually ok.
psbind -N -n2 -pa4 infile.ps | mpage -o -t -1 -k | lpr -Pbromberg
Find any line in the local user-list that contains the
string :0:, then show just the fifth colon-separated field;
in the first case this goes just to a file, and in the
second case it is teed to the file and to screen. The second
example shows grep reading directly from the file rather than
by cat. The third is the same as the second (in the case of
grep), but done by redirection rather than naming the file to grep.
cat /etc/passwd | grep ':0:' | cut -d \: -f 5 > users
grep ':0:' /etc/passwd | cut -d \: -f 5 | tee users
grep ':0:' </etc/passwd | cut -d \: -f 5 | tee users
Set a variable with a base filename. Dump the audio part of an avi video
file of this name into a wav file of the same base name. Normalise the volume
of the wav file. Encode it into 128kbps ogg-vorbis, and iff this doesn't
report an error then remove the wav file. Note a backslash at the very end
of the oggenc line, to escape the newline; the last two commands are logically
part of the same line, else the && wouldn't work as intended.
file=vidfilename
mplayer "$file.avi" -ao pcm:waveheader:file="$file.wav" -vc dummy -vo null
normalize "$file.wav"
oggenc -b 128 -o "$file.ogg" "$file.wav" \
&& rm "$file.wav"
List running processes in standard format; omit the first line (the headers);
sort in ascending order, numerically,
on space-separated field number 4 (%mem); take the last 5 lines (the five
biggest memory users); display just the %mem, user and command fields...
ps aux | tail -n +2 | sort -n -k 4 | tail -n 5 |
awk '{ printf "%s\t%s\t%s\n", $4, $1, $11 }'
Know Your Unix
System Administrator has some rather fun examples of this sort of line: worth a look!
This was a handy way of reading 256*256*2+27 bytes of data from a file and throwing
this away (redirect into /dev/null), then reading the next 40 lots of 10 bytes and
displaying each group as a row of 10 characters on the screen, prefixed with the
group-number 1-40. Note use of the pipe into the parentheses (sub-shell) so that
after the first dd has thrown away 256*256*2+27 bytes the next command in the
sub-shell will read the next bytes. Shell arithmetic is done in the $((...)) construct.
`seq 1 40` generates the numbers from 1 to 40. dd does lots of things with copying
data from one place to another.
cat nt_tst_2.dat | ( dd bs=1 count=$((256*256*2+27)) >/dev/null ; for n in `seq 1 40` ; do printf "%3d: \"" $n ; dd bs=10 count=1 2>/dev/null ; echo "\"" ; done )
And so on ...
To add your own programs so they can be accessed directly by name,
put them in your home directory in a subdirectory called "bin".
Any scripting language can have executable files by putting
`#!INTERPRETERNAME' at the top of the file and setting the file
executable by `chmod +x filename'. For a GNU shell script, put
`#!/bin/bash' at the top; /bin/sh is the default for an executable
text file with no such special first line.
On this system, the users' bin directories, ~/bin, are at the
beginning of the PATH variable (the list of directories searched for
a program name), so users' commands will override system ones.
Page started: 2007-11-xx
Last change: 2007-11-14