Lab image compression

Images from general-purpose digital cameras come out as already-compressed jpeg images, though not very tightly compressed: they could be made rather smaller without loss of information, but processing power on the camera is too limited to waste on cleverer compression. Even so, such images are several or many times smaller than the uncompressed version: for a modern camera the uncompressed size would typically be around (2400*1800) pixels * 3 bytes ~= 13MB, rather than the 0.5--2.0MB that the output files usually are.

It seems our microscope's camera avoids any compression at all, perhaps because jpeg (common for photos) is in general lossy, so the pixels are not perfectly preserved. This camera stores 2080x1540 pixel images in the `tiff' format, without using even the lossless compression methods available within tiff: each image is therefore around 9MB, even if it shows just a blank background. One possible reason for avoiding compression is that saving and opening might take longer (depending on CPU speed versus disk speed). Another is that different programs support different subsets of tiff features, so compressed files might be less widely readable.

If we are to do a lot of lab work, including sequences of about 50 photos taken repeatedly during aging, this ~(1/100)GB per picture will produce about 0.5GB per sequence. Compare this to about 8GB for our networked home directories, 160GB for typical local disks at ETK, or 500GB for quite large internal or external hard disks.

Such a lot of data is no big problem in itself: 500GB disks cost about 1 SEK/GB just now, and bigger ones are getting cheaper too. But there is some loss of convenience when one's data is spread between several disks and when backup copies take a long time to make. Lossless compression is one option, which `costs' only processor time. Alternatively, if the images can be lossily compressed without large loss of quality, it might be worth storing compressed images for regular analysis, keeping `original' backups on a few extra disks or on regularly written DVDs.

Jargon:  Lossless compression (typical for text, documents, lab data, programs) reproduces the original perfectly, reducing filesize by representing the information more efficiently (taking advantage of redundancy). Lossy compression (typical for images/audio/video) additionally makes approximations, accepting `nearly correct' values and exploiting the limits of human perception. The treeing images have a background that matters little if it isn't reproduced perfectly, but an acceptable compression algorithm should presumably keep the edges between tree and background very similar to the original.

Results of compression: final size / CPU time

Below is an investigation of the compressibility of a sample microscope image of an electrical tree. The image is saved in tiff format with several of the inbuilt compression methods; the uncompressed tiff is then compressed with common data-compression programs at several levels of speed/size trade-off; finally, the tiff is converted to png and to varied-quality [lossy] jpeg images.

Format and quality    size [kiB]   time [s]
[original]                  9404          0
tiff_none                   9404       0.02
tiff_packbits               9476       0.11
tiff_lzw                    8248       0.32
tiff_zip                    6432       1.01
tiff_jpeg                    172       0.24
bzip2_qual9                 4856       3.08
bzip2_qual4                 5056       2.71
bzip2_qual1                 5500       2.52
gzip_qual9                  6380       0.95
gzip_qual4                  6392       0.92
gzip_qual1                  6500       0.74
png                         3744       5.94
jpeg_qual100                3744       1.12
jpeg_qual90                  620       0.58
jpeg_qual80                  180       0.38
jpeg_qual70                  112       0.33
jpeg_qual60                   76       0.32
jpeg_qual50                   56       0.31
jpeg_qual40                   44       0.31
jpeg_qual30                   36       0.31
jpeg_qual20                   32       0.34
jpeg_qual10                   24       0.30

Conclusions:

The compression methods built into tiff gain little here: packbits even enlarges the file slightly, and the best lossless method, `zip', still leaves about two thirds of the original size (tiff_jpeg is simply lossy jpeg inside a tiff wrapper). The general-purpose compressors do no better: bzip2 roughly halves the size but takes a few seconds, while gzip is quicker but weaker. The best lossless result is png, at about 40% of the original size. Lossy jpeg is in another class entirely: at quality 80--90 the file shrinks by a factor of 15--50, to a few hundred kiB.

Suggestion: to enable saving all produced images easily on a sensibly sized disk, jpeg at about 85% quality seems quite acceptable. Perhaps the originals are still wanted, either as the `important' subset or as all of them on regular DVDs or a few removable hard-disks ... not my business to decide!
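
For a single image this is just one ImageMagick command (a sketch, using the sample image a.tif from the tests below):

convert -quality 85 a.tif a_jpeg_qual85.jpg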

Cropping

Much of the image lies outside the tree area and is not of interest, so it can be cropped away, allowing a further reduction of size. If automated, this is very easy, e.g. for the example picture,

convert -quality 85 -crop 1000x1050+820+250 a.tif a_crop_jpeg_qual85.jpg
gives a 100kiB filesize for a_crop_jpeg_qual85.jpg, rather than 250kiB without the cropping. The 1000x1050+820+250 argument selects a 1000x1050 (width x height) rectangle starting at offset (820,250). The command can of course be run in a loop over a whole sequence, or several sequences with similar crop borders, as sketched below.
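
A minimal sketch of such a loop, assuming the sequence's tiff files match the hypothetical pattern seq_*.tif (adjust to the real filenames):

for f in seq_*.tif ; do
  convert -quality 85 -crop 1000x1050+820+250 $f ${f%.tif}_crop_jpeg_qual85.jpg
done

Each output keeps its input's base name, so seq_01.tif becomes seq_01_crop_jpeg_qual85.jpg.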

Other automated tricks

Command-oriented image manipulation allows some other tricks. I'm using mainly ImageMagick commands wrapped up in Unix shell, but one can try such things with ImageMagick in a DOS shell, or called from Matlab/Octave, or with the Matlab `image-processing toolbox', or from the GIMP, etc.

Montages (multiple images)

To put a sequence of images together `in space', the `montage' command from ImageMagick can be used:

montage  -mode concatenate  -tile 4x5  tmp_*.jpg  montage_eg.jpg
which gives a single image, montage_eg.jpg, with all 20 input images tiled. (The example output has also been reduced to a sensible size; its inputs are transformations of the original image, made with amusing options to the `convert' command such as `-charcoal 10'.)
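
The reduction to a sensible size can itself be done on the command line; a sketch using ImageMagick's `-resize' option (the 25% factor is just an example):

convert -resize 25% montage_eg.jpg montage_eg_small.jpg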

Videos (multiple images)

To put a sequence of images together `in time', the `mencoder' command from mplayer can be used to assemble jpeg images into a video at an arbitrary framerate:

mencoder "mf://tmp_*.jpg" -mf fps=2 -o video_eg.avi -ovc lavc -lavcopts vcodec=msmpeg4v2:vbitrate=80
where the `fps=' option sets the frames per second, `vbitrate=' sets the bitrate, and `tmp_*.jpg' defines the pattern to match in the input filenames. The output file is video_eg.avi. (The input files to this example were obtained by running the `convert -charcoal N' command on the photo, with N increasing from 1 to 24.)
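
A sketch of the sort of loop that generated those inputs (the zero-padding from `seq -w' keeps the tmp_*.jpg glob in numeric order; the filenames are illustrative):

for N in $(seq -w 1 24) ; do
  convert -charcoal $N a.tif tmp_$N.jpg
done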

Technical details: doing the compressions, collecting results

Compression methods possible within the TIFF format: the jpeg method is lossy, the others apparently not. (The `tiffcp' command is part of libtiff.)

for t in tiff_{none,packbits,lzw,zip,jpeg} ; do echo $t
  time tiffcp -c ${t//tiff_/} a.tif a_$t.tiff
done
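
To check that the non-jpeg methods really preserve the image data, libtiff's `tiffcmp' tool (assuming it is installed alongside tiffcp) can compare two of the outputs; silence means the image data are identical:

tiffcmp a_tiff_none.tiff a_tiff_lzw.tiff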

Direct (lossless) compression by common compression programs.

for f in {bzip2,gzip}_qual{1,4,9} ; do echo $f ; set ${f//_qual/ } ; 
   time cat a.tif | $1 -$2 >a_$f.`echo $f | grep -q bzip && echo bz2 || echo gz`
done
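
A quick check that the round trip really is lossless: decompress to stdout and compare byte-for-byte with the original (cmp prints nothing if the files are identical):

bzip2 -dc a_bzip2_qual9.bz2 | cmp - a.tif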

Conversion into jpeg at varied quality. (The `convert' command is part of ImageMagick).

for q in 10 20 30 40 50 60 70 80 90 100 ; do echo -n "jpeg_qual$q";
    time convert -quality $q a.tif a_jpeg_qual${q}.jpg
done
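
The loss can also be quantified: ImageMagick's `compare' command reports image-difference metrics such as PSNR. A sketch, comparing the quality-80 version against the original (`null:' discards the difference image):

compare -metric PSNR a.tif a_jpeg_qual80.jpg null: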

Conversion to png.

time convert a.tif a.png


Page started: 2009-03-02
Last change: 2011-08-03