Lab image compression

Images from general-purpose digital cameras come out as already-compressed jpeg images, though not very tightly compressed: they could be made rather smaller without loss of information, but processing power on the camera is too limited to waste on cleverer compression. Even so, such images are several or many times smaller than the uncompressed version: for a modern camera the uncompressed size would typically be around (2400*1800) pixels * 3 bytes ~= 13MB, rather than the 0.5--2.0MB that the output files usually are.

It seems our microscope's camera avoids any compression at all, perhaps because jpeg (common for photos) is in general lossy, so the pixels are not perfectly preserved. This camera stores 2080x1540 pixel images in the `tiff' format, without using even the lossless compression methods available within tiff: each image is therefore around 9MB, even if it shows just a blank background. One possible reason for avoiding compression is that saving and opening might take longer (depending on CPU speed versus disk speed). Another is that different programs support different subsets of tiff features, so compressed files might be less widely readable.

If we are to do a lot of lab work, including sequences of about 50 photos taken repeatedly during aging, this ~(1/100)GB per picture will produce about 0.5GB per sequence. Compare this to about 8GB for our networked home directories, 160GB for typical local disks at ETK, or 500GB for quite large internal or external hard disks.

Such a lot of data is no big problem in itself: 500GB disks cost about 1 SEK/GB just now, and bigger ones are getting cheaper too. But there is some loss of convenience when one's data is spread between several disks and when backup copies take a long time to make. Lossless compression is one option, which `costs' only processor time. Alternatively, if the images can be lossily compressed without large loss of quality, it might be worth storing compressed images for regular analysis, keeping `original' backups on a few extra disks or on regularly written DVDs.

Jargon:  Lossless compression (typical for text, documents, lab data, programs) reproduces the original perfectly, reducing filesize by representing the information more efficiently (taking advantage of redundancy). Lossy compression (typical for images/audio/video) additionally makes approximations, accepting `nearly correct' values and exploiting the limits of human perception. The treeing images have a background that matters little if it isn't reproduced perfectly, but an acceptable compression algorithm should presumably keep the edges between tree and background very similar to the original.

Results of compression: final size / CPU time

Below is an investigation of the compressibility of a sample microscope image of an electrical tree. The image is saved in tiff format with several of the inbuilt compression methods; the uncompressed tiff is then compressed with common data-compression programs at several levels of speed/size trade-off; finally, the tiff is converted to png and to varied-quality [lossy] jpeg images.

Format and quality    size [kiB]   time [s]
[original]                  9404          0
tiff_none                   9404       0.02
tiff_packbits               9476       0.11
tiff_lzw                    8248       0.32
tiff_zip                    6432       1.01
tiff_jpeg                    172       0.24
bzip2_qual9                 4856       3.08
bzip2_qual4                 5056       2.71
bzip2_qual1                 5500       2.52
gzip_qual9                  6380       0.95
gzip_qual4                  6392       0.92
gzip_qual1                  6500       0.74
png                         3744       5.94
jpeg_qual100                3744       1.12
jpeg_qual90                  620       0.58
jpeg_qual80                  180       0.38
jpeg_qual70                  112       0.33
jpeg_qual60                   76       0.32
jpeg_qual50                   56       0.31
jpeg_qual40                   44       0.31
jpeg_qual30                   36       0.31
jpeg_qual20                   32       0.34
jpeg_qual10                   24       0.30

Conclusions:

The compression methods built into tiff gain little here: packbits even enlarges the file slightly, and the best lossless method, `zip', still leaves about two thirds of the original size (tiff_jpeg is simply lossy jpeg inside a tiff wrapper). The general-purpose compressors do no better: bzip2 roughly halves the size but takes a few seconds, while gzip is quicker but weaker. The best lossless result is png, at about 40% of the original size. Lossy jpeg is in another class entirely: at quality 80--90 the file shrinks by a factor of 15--50, to a few hundred kiB.

Suggestion: to enable saving all produced images easily on a sensibly sized disk, jpeg at about 85% quality seems quite acceptable. Perhaps the originals are still wanted, either as the `important' subset or as all of them on regular DVDs or a few removable hard-disks ... not my business to decide!
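
For a single image this is just one ImageMagick command (a sketch, using the sample image a.tif from the tests below):

convert -quality 85 a.tif a_jpeg_qual85.jpg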

Cropping

Much of the image lies outside the tree area and is not of interest, so it can be cropped away, allowing a further reduction of size. If automated, this is very easy, e.g. for the example picture,

convert -quality 85 -crop 1000x1050+820+250 a.tif a_crop_jpeg_qual85.jpg
gives a 100kiB filesize for a_crop_jpeg_qual85.jpg, rather than 250kiB without the cropping. The 1000x1050+820+250 argument selects a 1000x1050 (width x height) rectangle starting at offset (820,250). The command can of course be run in a loop over a whole sequence, or several sequences with similar crop borders, as sketched below.
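
A minimal sketch of such a loop, assuming the sequence's tiff files match the hypothetical pattern seq_*.tif (adjust to the real filenames):

for f in seq_*.tif ; do
  convert -quality 85 -crop 1000x1050+820+250 $f ${f%.tif}_crop_jpeg_qual85.jpg
done

Each output keeps its input's base name, so seq_01.tif becomes seq_01_crop_jpeg_qual85.jpg.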

Other automated tricks

Command-oriented image manipulation allows some other tricks. I'm using mainly ImageMagick commands wrapped up in Unix shell, but one can try such things with ImageMagick in a DOS shell, or called from Matlab/Octave, or with the Matlab `image-processing toolbox', or from the GIMP, etc.

Montages (multiple images)

To put a sequence of images together `in space', the `montage' command from ImageMagick can be used:

montage  -mode concatenate  -tile 4x5  tmp_*.jpg  montage_eg.jpg
which gives a single image, montage_eg.jpg, with all 20 input images tiled. (The example output has also been reduced to a sensible size; its inputs are transformations of the original image, made with amusing options to the `convert' command such as `-charcoal 10'.)
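
The reduction to a sensible size can itself be done on the command line; a sketch using ImageMagick's `-resize' option (the 25% factor is just an example):

convert -resize 25% montage_eg.jpg montage_eg_small.jpg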

Videos (multiple images)

To put a sequence of images together `in time', the `mencoder' command from mplayer can be used to assemble jpeg images into a video at an arbitrary framerate:

mencoder "mf://tmp_*.jpg" -mf fps=2 -o video_eg.avi -ovc lavc -lavcopts vcodec=msmpeg4v2:vbitrate=80
where the `fps=' option sets the frames per second, `vbitrate=' sets the bitrate, and `tmp_*.jpg' defines the pattern to match in the input filenames. The output file is video_eg.avi. (The input files to this example were obtained by running the `convert -charcoal N' command on the photo, with N increasing from 1 to 24.)
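
A sketch of the sort of loop that generated those inputs (the zero-padding from `seq -w' keeps the tmp_*.jpg glob in numeric order; the filenames are illustrative):

for N in $(seq -w 1 24) ; do
  convert -charcoal $N a.tif tmp_$N.jpg
done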

Technical details: doing the compressions, collecting results

Compression methods possible within the TIFF format: the jpeg method is lossy, the others apparently not. (The `tiffcp' command is part of libtiff.)

for t in tiff_{none,packbits,lzw,zip,jpeg} ; do echo $t
  time tiffcp -c ${t//tiff_/} a.tif a_$t.tiff
done
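
To check that the non-jpeg methods really preserve the image data, libtiff's `tiffcmp' tool (assuming it is installed alongside tiffcp) can compare two of the outputs; silence means the image data are identical:

tiffcmp a_tiff_none.tiff a_tiff_lzw.tiff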

Direct (lossless) compression by common compression programs.

for f in {bzip2,gzip}_qual{1,4,9} ; do echo $f ; set ${f//_qual/ } ; 
   time cat a.tif | $1 -$2 >a_$f.`echo $f | grep -q bzip && echo bz2 || echo gz`
done
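
A quick check that the round trip really is lossless: decompress to stdout and compare byte-for-byte with the original (cmp prints nothing if the files are identical):

bzip2 -dc a_bzip2_qual9.bz2 | cmp - a.tif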

Conversion into jpeg at varied quality. (The `convert' command is part of ImageMagick).

for q in 10 20 30 40 50 60 70 80 90 100 ; do echo -n "jpeg_qual$q";
    time convert -quality $q a.tif a_jpeg_qual${q}.jpg
done
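
The loss can also be quantified: ImageMagick's `compare' command reports image-difference metrics such as PSNR. A sketch, comparing the quality-80 version against the original (`null:' discards the difference image):

compare -metric PSNR a.tif a_jpeg_qual80.jpg null: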

Conversion to png.

time convert a.tif a.png


Page started: 2009-03-02
Last change: 2011-08-03