Since 1970, unix systems have evolved their set of commands, some taken from other systems of the time, and some new. In spite of moderate variations in syntax, there are many commands that can be expected to exist and to function in much the same way on any of many unix systems (freebsd, openbsd, netbsd, linux, solaris, hp-ux, aix etc.). Indeed, the IEEE "POSIX" (Portable Operating System Interface) standard defines quite a few, clearly having tried to standardise unix practice. In a linux based system, and in many others, the GNU utilities are used for the common commands.
Familiarity with this shell language and the basic system commands, as well as some more job-specific commands, can be very useful if you are going to do anything that might need to be automated. As well as the following sections, you could try searching on google or similar, with such terms as `unix command' or `linux command' (i.e. GNU command) for a `cheat-sheet' or tutorial.
To get more information about a command, there are various things you can do:
The `man' (manual) system is the traditional unix way. Most commands and many configuration files and C functions have their own pages. Use the space-bar to move down, or Ctrl-D and Ctrl-U for down and up half pages --- the page up and page down keys also work on most systems. Just press q to exit the page. Searching can be done by a slash followed by a basic regular expression (so just /string to search for string). The manual files are divided into sections, e.g. 1 is user-commands, 2 is system calls (C functions), 3 is standard C library functions, 5 is configuration file syntax, etc., so `man 3 realpath' is used to look up the C function called `realpath' rather than the user command of the same name. Not all commands provide manual pages. See the recipes below for fun ways to format manual pages for pretty printing.
The `info' system is the GNU one, so it provides pages for all the core commands. It uses links (denoted by stars) to split the information by menus. If you type "info" by itself, the top level will be displayed, allowing you to see all the GNU commands and what they do. Note that there exist many other non-GNU commands that won't show up here. The GNU commands have manual pages too, but these sometimes claim to be only a subset of the info content. The virtue of info shows most clearly when dealing with complex programs, for example the shell `bash'. Its documentation comes out at more than 110 pages when printed on A4 in 10pt text, so having it all as one long manual page is not very helpful for quickly finding the right part (though the pattern search is often helpful)!
Some programs thrown into the system may not have any manual or info pages. There may be text or html documentation installed under /usr/share/doc/<programname-version>. Often, a command provides some help information if called with a command-line argument of `-h' or `--help', e.g. `acroread -h' or `ls --help'.
A program (command, GUI thing, ...) can be started with an array of arguments; when a command is run in the shell, the shell splits the command line on white-space, and passes each part of it as an argument to the command whose name is first on the line. Some arguments are some general sort of data, e.g. a filename; others are options that affect the behaviour of the command or state what the following argument denotes. Options may be traditional single letters, usually prefixed with a hyphen to distinguish them from other arguments. Several option-letters may be used together. Long forms of options use full words, usually prefixed with a double hyphen (the GNU way). A few commands have only long options, and use just a single hyphen. Examples: `ls -l -r /tmp' is the same as `ls -lr /tmp', and `ls -Q /usr' is the same as `ls --quote-name /usr'.
Descriptions of command usage, e.g. in the manual pages, traditionally use [] around optional fields, <> around obligatory fields, and | to denote alternatives. For example, command [-a|-b] <arg1> <arg2> [arg3] ... suggests that either of the -a and -b options can be given, then two required arguments, then any (integer, non-negative) number of additional arguments.
See the man or info pages for the details of each command.
Bear in mind that there are many commands available; pressing Tab a few times in quick succession on an empty command line will eventually prompt a question of whether the 4000 or so commands on the system should be listed on the screen! Of course, some of these are GUI programs not suitable for scripting, although it may sometimes be useful to run a GUI program in a loop. Tab causes autocompletion of commands and filename arguments. One press completes if there's no ambiguity, and two quick presses cause a list of possible matches to be displayed.
The following may not be immediately useful, but they should give a good introduction to some shell syntax, command and option existence, and general idiom.
The basics of shell are that the command line is split
into sub-parts separated by the |, &, ||, && and ; symbols, then these
parts are split into space separated command and arguments. | connects
the preceding command's `standard output' stream to the `standard input'
of the following command. ; just runs the commands separately. &
runs the preceding command in the background and goes straight on to the
following one. && runs the following command iff the preceding
command returned a successful status code when it finished. || runs
the following command iff the preceding one returned an unsuccessful code
when it finished.
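As a sketch, the operators above can be seen in action with a few harmless standard commands (`true' and `false' do nothing but return success and failure status codes respectively):

```shell
false ; echo "ran anyway"        # ; runs the next command regardless of status
true && echo "after success"     # && runs only because `true' succeeded
false || echo "after failure"    # || runs only because `false' failed
sleep 2 & echo "not waiting"     # & backgrounds the sleep; echo runs at once
echo one two three | wc -w       # | feeds echo's stdout to wc's stdin; prints 3
```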
The redirection operators > and < redirect a
command's standard output or input (resp.) to or from files. Many commands
will read either from standard input or a file if one is named as an
argument. Some want files to be named, but will accept a hyphen as meaning
that the file for reading/writing is standard input/output.
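A small sketch of redirection (the filename here is invented for the demonstration):

```shell
printf 'banana\napple\n' > fruit.txt   # > sends standard output into a file
sort < fruit.txt                       # < feeds the file to sort's standard input
printf 'banana\napple\n' | sort -      # - names standard input explicitly
rm fruit.txt                           # tidy up the demonstration file
```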
Shell variables
are accessed by prefixing the variable name with a dollar symbol, else
the name would be treated as text.
Quoting "..." prevents spaces breaking
the line into arguments, but variables are still replaced by their
contents. Quoting '...' prevents any special interpretation of the
quoted string.
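The difference between the quoting styles can be seen with printf, which repeats its format for each argument in turn (the variable name is just an example):

```shell
name="two words"
printf '[%s]\n' $name     # unquoted: splits into two arguments: [two] [words]
printf '[%s]\n' "$name"   # double quotes: one argument, expanded: [two words]
printf '[%s]\n' '$name'   # single quotes: completely literal: [$name]
```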
Backtick substitution (command substitution) runs the
command in the backticks then replaces the whole backtick structure
with the standard output of that command; $(command) is equivalent
to `command` and nests more cleanly. (Note that $((...)), with double
parentheses, is arithmetic expansion, not command substitution.)
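For example (the directory path is chosen purely for illustration):

```shell
host=`hostname`              # backtick form of command substitution
host=$(hostname)             # $(...) form: same result, nests cleanly
echo "running on $host"
basename $(dirname /usr/local/bin)   # inner command runs first; prints: local
```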
Escaping with a backslash makes the following character
be treated as literal, so \$name prevents the dollar making name be
interpreted as a variable, or \\$name causes the second backslash to
be treated as literal and therefore the $name is interpreted as
a variable...
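A quick sketch of those escaping rules (the variable name is invented):

```shell
name=value
echo $name        # prints: value
echo \$name       # backslash protects the dollar; prints: $name
echo \\$name      # first backslash protects the second; prints: \value
```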
Filename `globs' are primitive pattern matching codes
for `wildcarding' filenames; e.g. *.jpg is anything ending in .jpg,
name.??? is `name.' followed by three of any character, file[0-5].txt
is any of file0.txt through to file5.txt, and f[^a1].jpg is f followed
by any one character that is not a or 1, then by `.jpg' (the ^ symbol
logically negates the [...] matching; the portable spelling is [!a1]).
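These can be tried safely in a scratch directory (the filenames below are invented for the demonstration):

```shell
dir=$(mktemp -d)     # make a throw-away directory
cd "$dir"
touch file0.txt file5.txt fa.jpg fb.jpg photo.jpg
echo *.jpg           # fa.jpg fb.jpg photo.jpg
echo file[0-5].txt   # file0.txt file5.txt
echo f[!a1].jpg      # fb.jpg
```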
Show a command's built-in help, but show it one screenful at a time
with the pager program `less':
psbind --help | less
Run an executable file called `simulation' in the current directory;
when it finishes, mail all its standard output to name@place with a subject line
containing the hostname. In the second example, make standard error also
go into the pipe instead of to the screen, and log to a file as well as mailing.
./simulation | mail -s "`hostname` finished" name@place
./simulation 2>&1 | tee sim.log | mail -s "`hostname` finished" name@place
Find all files below /local/run whose names end with _results.txt; extract all
lines starting with `V=' (permitting spaces around the `=' or between the start
of the line and V), then sort these lines numerically and display to the screen
(because there's nothing to catch the output of sort).
The * in the input filename pattern needs to be escaped so the `find' will see it;
if there's a file matching this pattern in the current directory, the shell would
otherwise expand the pattern to the matching name[s] before calling find, if the
pattern were not protected by escaping or quoting. -exec runs the given command
on each found file, replacing "{}" with the name. The final \; is a ; for the
find command (not the shell) to see, hence the escaping of it. `sort' is
given -t= so it uses = as the field separator (rather than the default
transition from non-blank to blank), and -k2 so the numeric sort is done
on the second field (the value after the =).
The second line does the
same thing; it quotes the whole *_results.txt pattern, and uses xargs to put each
line of standard input (a filename from find) as an argument to grep.
find /local/run -name \*_results.txt -exec grep '^ *V *= *' "{}" \; | sort -t= -k2 -n
find /local/run -name '*_results.txt' | xargs grep '^ *V *= *' | sort -t= -k2 -n
Convert a manual page (for the `ls' command) to plain text.
man ls | col -b >outfile.txt
Convert a manual page (for the `ps' command) to a postscript file (first line)
or pdf file (second line), or print it directly (third line).
man -t ps >file.ps
man -t ps | ps2pdf - >file.pdf
man -t ps | lpr
Convert all files with names ending in `.bmp' (assumed to be images) to
half-sized images with the same base-name but of name and type .jpg.
This is thanks to `convert', a component of the handy
ImageMagick suite.
for f in *.bmp ; do convert -resize 50% "$f" "`basename "$f" .bmp`".jpg ; done
For each file (including directories and links) in the current directory,
rename it so the name has all non-letter characters replaced with underscores,
and multiple consecutive underscores are reduced to a single one. Instruct
`mv' to ask before overwriting, and to print to screen every operation.
for f in * ; do mv -iv "$f" "`echo "$f" | sed -e 's/[^a-zA-Z]/_/g' -e 's/__*/_/g'`"; done
Convert a pdf file to a postscript file, by Ghostscript or by Adobe's
acroread program:
pdf2ps -sPAPERSIZE=a4 infile.pdf outfile.ps
acroread -toPostScript -level3 -size a4 -pairs infile.pdf outfile.ps
Use psbind to convert postscript to 4 `pages per sheet' (4-up);
psbind scales the pages to fit as large as possible, which makes
4-up quite readable usually! A4 paper is explicitly requested for output.
psbind -N -n4 -pa4 infile.ps >outfile.ps
Use mpage to do 4-up conversion (without special scaling) and put a
duplex (2-sided) instruction in the file.
mpage -4 -t infile.ps >outfile.ps
Use psbind for scaled 2-up and mpage for duplex, then send to printer `bromberg'.
NOTE: having several programs modify the postscript seems to reduce the
probability of dodgy files actually printing... by `dodgy' is meant
the antithesis of text-only output from LaTeX, e.g. scans, word-processor
output, etc. But it's usually ok.
psbind -N -n2 -pa4 infile.ps | mpage -o -t -1 -k | lpr -Pbromberg
Find any line in the local user-list that contains the
string :0:, then show just the fifth colon-separated field;
in the first case this goes just to a file, and in the
second case it is teed to the file and to screen. The second
example shows grep reading directly from the file rather than
by cat. The third is the same as the second (in the case of
grep), but done by redirection rather than naming the file to grep.
cat /etc/passwd | grep ':0:' | cut -d \: -f 5 > users
grep ':0:' /etc/passwd | cut -d \: -f 5 | tee users
grep ':0:' </etc/passwd | cut -d \: -f 5 | tee users
Set a variable with a base filename. Dump the audio part of an avi video
file of this name into a wav file of the same base name. Normalise the volume
of the wav file. Encode it into 128kbps ogg-vorbis, and iff this doesn't
report an error then remove the wav file. Note a backslash at the very end
of the oggenc line, to escape the newline; the last two commands are logically
part of the same line, else the && wouldn't work as intended.
file=vidfilename
mplayer "$file.avi" -ao pcm:waveheader:file="$file.wav" -vc dummy -vo null
normalize "$file.wav"
oggenc -b 128 -o "$file.ogg" "$file.wav" \
&& rm "$file.wav"
List running processes in standard format; omit the first line (the headers);
sort in ascending order, numerically,
on space-separated field number 4 (%mem); take the last 5 lines (the five
biggest memory users); display just the %mem, user and command fields...
ps aux | tail -n +2 | sort -n -k 4 | tail -n 5 |
awk '{ printf "%s\t%s\t%s\n", $4, $1, $11 }'
Know Your Unix
System Administrator has some rather fun examples of this sort of line: worth a look!
This was a handy way of reading 256*256*2+27 bytes of data from a file and throwing
this away (redirect into /dev/null), then reading the next 40 lots of 10 bytes and
displaying each group as a row of 10 characters on the screen, prefixed with the
group-number 1-40. Note use of the pipe into the parentheses (sub-shell) so that
after the first dd has thrown away 256*256*2+27 bytes the next command in the
sub-shell will read the next bytes. Shell arithmetic is done in the $((...)) construct.
`seq 1 40` generates the numbers from 1 to 40. dd does lots of things with copying
data from one place to another.
cat nt_tst_2.dat | ( dd bs=1 count=$((256*256*2+27)) >/dev/null ; for n in `seq 1 40` ; do printf "%3d: \"" $n ; dd bs=10 count=1 2>/dev/null ; echo "\"" ; done )
And so on ...
To add your own programs so they can be accessed directly by name,
put them in your home directory in a subdirectory called "bin".
Any scripting language can have executable files by putting
`#!INTERPRETERNAME' at the top of the file and setting the file
executable by `chmod +x filename'. For a GNU shell script, put
`#!/bin/bash' at the top; /bin/sh is the default for an executable
text file with no such special first line.
On this system, the users' bin directories, ~/bin, are at the
beginning of the PATH variable (the list of directories searched for
a program name), so users' commands will override system ones.
Page started: 2007-11-xx
Last change: 2007-11-14