What are Scripting Languages?
There's no hard distinction between the `compiled' and `scripting' languages themselves: any formal representation of commands ought to be able to be compiled into an executable file (`machine code' to run directly on the processor) or to be interpreted by a further program every time it is run.
There are, however, some strong relations between particular languages, their nature (types of variables and functions, memory management etc.), and the available compilers/interpreters, that allow a strong division into several categories of languages. Shell scripts really are interpreted, line by line, by the shell program; Perl and Python scripts are also generally written then run directly in an interpreter, but the interpreter first goes through the whole program and converts it to a more efficient executable form; C, C++ and Fortran (`compiled languages') are written, compiled into an executable of machine code, then this executable is run directly by the user. Several numerical computation environments, e.g. Matlab, Octave and Scilab (see the technical programs page) count as a form of scripting (interpreted) language.
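As a small illustration of the `whole program first' behaviour described above, the following sketch uses Python's built-in compile() and exec(). The function name try_run and the sample sources are made up for this example; it shows only how the standard Python interpreter behaves, not Perl or the shell.

```python
GOOD_SOURCE = 'print("first line runs")\nprint("second line runs")\n'

# Same program, but the last line has an unclosed bracket.
BAD_SOURCE = 'print("first line runs")\nprint("second line runs"\n'


def try_run(source):
    """Convert the whole source to a code object first, then execute it."""
    try:
        # compile() reads the entire program before any of it is run.
        code = compile(source, "<example>", "exec")
    except SyntaxError:
        # The error on the last line stops the first line from running too.
        return "rejected before running anything"
    exec(code, {})
    return "ran to completion"


if __name__ == "__main__":
    print(try_run(GOOD_SOURCE))
    print(try_run(BAD_SOURCE))
```

A shell script with a similar mistake on its last line would typically have executed the earlier lines before noticing the problem.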
Generally, a compiled language has better performance (speed, memory) than an interpreted language. For many purposes other than long simulations, this is of little importance; interpreted languages generally have easier, higher-level functions, easier debugging, and take care of memory management automatically. The time thereby saved by the initial author, and by all subsequent people who try to modify the program, can dwarf the possible runtime disadvantage. Even for demanding tasks, a compiled language might be used for a few important parts of the work, with a scripting language to bind these parts together with less critical code.
The standard, old Unix shell (command interpreter) is the `Bourne shell'; this is the command `/bin/sh'. The Korn shell `ksh' (early 1980s) and then the GNU `bash' (Bourne again shell) are popular modern shells, compatible with the Bourne shell language but with some additions.
The obvious purpose of the command interpreter is to allow the user to start programs; the Bourne shell also allows connection of programs' inputs and outputs (pipelines), redirection of keyboard/screen input/output to be from/to files instead, and provides programming functions such as loops, conditionals and arithmetic (only integers in sh and bash). The syntax, with spaces used to separate commands and arguments (hence the need to quote filenames that contain spaces) is designed for easy interactive use for simple commands; when dealing with scripts that must tolerate bizarrely named files, the quoting can require a little thought...
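The effect of spaces and quoting can be tried out without touching any real files. The sketch below uses Python's standard shlex module, which follows rules similar (not identical) to those of the Bourne shell; the filenames and the two helper names are invented for the example.

```python
import shlex


def split_like_shell(command_line):
    """Split a command line into words roughly as a POSIX shell would."""
    return shlex.split(command_line)


def quote_for_shell(filename):
    """Return the filename quoted so that a shell sees it as one word."""
    return shlex.quote(filename)


if __name__ == "__main__":
    # Unquoted: the space splits the filename into two arguments.
    print(split_like_shell("cat my report.txt"))
    # Quoted: the filename stays together as one argument.
    print(split_like_shell("cat 'my report.txt'"))
    # Building a safe command line for an awkwardly named file.
    print("cat " + quote_for_shell("my report.txt"))
```

In a script, the same care amounts to writing "$file" rather than $file wherever a variable may hold such a name.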
Putting a list of shell commands into a file, then running this file in the shell, makes a script. This could be run in the Bourne shell by the command `sh scriptname'. Or, to use the script as though it were a normal program, put the line `#!/bin/sh' (or `#!/bin/bash' if only using systems with GNU commands and wanting the extra bash functionality) at the top of the script, set it as executable by `chmod +x scriptname', then run it as a program (either `./scriptname' from the current directory, or put it in a directory on the search path, e.g. `~/bin/', then run just the command name `scriptname'). Many useful commands used in shell are not `builtins' (like `for' loops or `echo'), but are further programs, e.g. grep, sed, fold, cut etc. Partly for this reason, it is quite hard to write a really portable shell script of any complexity, that will run on a wide variety of ages and types of Unix system; there is a high risk of certain commands differing in the availability of advanced features. Since all our systems use the GNU shell and basic commands (and of similar versions too) this should be no problem to us.
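The steps above can be sketched from Python, assuming a Unix-like system with `sh' on the search path. The script contents, the file name `showargs' and the helper names are all made up for the illustration; the permission change is only roughly what `chmod +x' does.

```python
import os
import stat
import subprocess
import tempfile

# A tiny Bourne shell script: print each argument on its own line.
SCRIPT = """#!/bin/sh
for arg in "$@"; do
    echo "arg: $arg"
done
"""


def write_script(path, body=SCRIPT):
    """Write the script to a file and mark it executable (like chmod +x)."""
    with open(path, "w") as f:
        f.write(body)
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


def run_with_sh(path, *args):
    """Run the script as `sh scriptname args...' and return its output."""
    result = subprocess.run(
        ["sh", path, *args], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = write_script(os.path.join(tmp, "showargs"))
        print(run_with_sh(script, "a b", "c"), end="")
```

Because the file starts with `#!/bin/sh' and has been made executable, it could equally be started directly by its path, as described above.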
A knowledge of shell basics, both for interactive commands and for scripting, can be very useful for almost any work: renaming multiple files, running a batch of simulations locally or remotely (with ssh), generating report templates for lab results, resizing pictures, system administration, etc. The command-line recipes examples page is a form of quick introduction. There are plenty of tutorials available on the web, e.g. linked from Wikipedia's Bourne Shell page or by a web search. For a more rigorous, less example-based introduction, try the (long) manual page `man bash', or the (multi-page) GNU info page `info bash'. A similarly rigorous treatment is given in a standard on the web.
The Wikipedia Comparison of computer shells lists a remarkably large number of shells' features.
The Unix (in our case, GNU) shell is good at tasks of joining programs together to do a job. Its disadvantages show in various ways, including:
Many other interpreted languages exist, some very famous and widely used, that avoid these problems and allow a wealth of powerful high-level programming constructs to simplify data processing, searching, report-generation, network programming, binding technical programs and libraries together, and so on. These generally have many available add-on libraries. A few major, general purpose languages are listed below. Specific ones, such as those for algebra, are on the technical programs page.
Perl has been around for some 20 years. It could be considered as a distillation of features of many Unix languages, and others too; there are a lot of shell and C constructs. In accordance with the principle `TIMTOWTDI' (there is more than one way to do it), Perl syntax permits many different styles to be used. Perl was initially made for fast searching and report generation on system logfiles (to avoid the slowness and other limitations of shell, as described above), but it is very wide in its scope and has even been used for calculations in simulations. It is widely used for the `CGI' scripts that run on webservers to generate dynamic pages. There are thousands of modules for such tasks as interpreting different file-formats from other programs, creating and editing pdf files, and even for dealing with complex numbers and other maths. `Object oriented' style is supported. This introduction to Perl is one of many.
Python is a little younger than Perl, and is more based on conventional programming languages (rather than scripting languages). A stark contrast is that rather than encouraging as many programming styles as can be managed without ambiguity, it aims to leave as little flexibility as possible in how code is written, even down to the indentation! The touted advantage is that collaboration and changing of old code are easier. It is very much a `glue language' for tying together many others. SciPy (Scientific Python) and NumPy are libraries that facilitate the use of Python programs directly in scientific computation. Matplotlib provides a plotting library with Matlab-like syntax. Learning Python looks a promising place to start, but examples on the SciPy site are perhaps more directly relevant to us.
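The point about indentation can be seen in a few lines. In this sketch (the function names are invented for the example) the two functions differ only in how far one `return' line is indented, and a loop body that is not indented at all is refused outright.

```python
def total_all(values):
    total = 0
    for v in values:
        total += v
    return total          # level with `for': runs after the loop ends


def total_first_only(values):
    total = 0
    for v in values:
        total += v
        return total      # inside the loop body: returns after one item
    return total


def is_rejected(source):
    """Report whether Python refuses the source for its indentation."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError:
        return True
    return False


if __name__ == "__main__":
    print(total_all([1, 2, 3]))
    print(total_first_only([1, 2, 3]))
    print(is_rejected("for i in range(3):\nprint(i)\n"))
```

In most other languages the layout would be a matter of taste and the block structure would come from braces or keywords.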
Tcl and Ruby are just two others among the many. Python and Perl are likely to be of more interest to the technical user, as are some of the interpreted or compiled languages covered under technical programs.
No even vaguely Unix-oriented work would be complete without some mention of regular expressions! Also known as REs or regexps, they are a sort of mini-language for describing patterns in text. They are incredibly useful in all sorts of ways: for rearranging formats of text, extracting particular elements, and searching.
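The three uses just named (rearranging, extracting, searching) might look as follows with Python's standard `re' module. This is only a sketch: the date pattern, the sample text and the function names are chosen for illustration.

```python
import re

# Three groups: year, month, day, as in 2008-01-30.
DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def to_day_first(text):
    """Rearrange: turn every YYYY-MM-DD in the text into DD/MM/YYYY."""
    return DATE.sub(r"\3/\2/\1", text)


def extract_years(text):
    """Extract: collect just the year of every date found in the text."""
    return [m.group(1) for m in DATE.finditer(text)]


def lines_matching(pattern, lines):
    """Search: keep the lines that contain a match, rather like grep."""
    return [line for line in lines if re.search(pattern, line)]


if __name__ == "__main__":
    print(to_day_first("Last change: 2008-01-30"))
    print(extract_years("started 2007-11-05, changed 2008-01-30"))
    print(lines_matching(r"^err", ["error 1", "ok", "err 2"]))
```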
What is generally meant nowadays by a regular expression is an `extended' or even a `Perl-compatible' regexp (PCRE). A PCRE is a superset of all the functionality of the others -- bear in mind that the early REs were in the 1960s, and the art, need, and computing resources are very different now!
Perl (of course), Python, many other programs, and even Matlab (taken from Perl) can parse Perl-compatible regular expressions. The GNU utilities such as the commands `grep' and `sed' use basic regexps by default (for compatibility) but have options to enable extended regexps. Other programs such as the pager `less' (used to display the output of manual pages, etc. to the screen) and the editor `vi' can search for basic regexps if one types a slash `/' followed by the regexp, or can perform substitutions with a command starting `:%s'.
A Python site gives an introduction to regular expressions -- again, one of many! There are plenty of introductions on the web; regexps are popular with website programmers, to allow pages to tell the user that `your name must have at least two letters', or `email addresses must be in the form ....'.
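Checks of the kind just quoted could be sketched as below, again with Python's `re' module. Both patterns are assumptions made for the example: the name rule counts only unaccented Latin letters, and the email pattern is deliberately loose (something, an `@', something containing a dot) rather than a full description of valid addresses.

```python
import re

# At least two letters somewhere in the name.
NAME_RE = re.compile(r"[A-Za-z].*[A-Za-z]")

# Loose shape check only: local-part @ domain with at least one dot.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def name_ok(name):
    """True if the name contains at least two letters."""
    return NAME_RE.search(name) is not None


def email_ok(address):
    """True if the whole address has the rough form user@host.domain."""
    return EMAIL_RE.fullmatch(address) is not None


if __name__ == "__main__":
    for name in ["Al", "A", "42"]:
        print(name, name_ok(name))
    for address in ["user@example.org", "user@example", "no at sign"]:
        print(address, email_ok(address))
```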
Page started: 2007-11-xx
Last change: 2008-01-30