Hints/tips/tricks that may help someone

GRUB setup problem with ext3 root

grub's "Error 2: Bad file or directory type" had been annoying me for a few months, as of the end of 2008. I often copy a whole linux-based (usually Gentoo) system to another computer, on which the receiving partition has a filesystem made by the mkfs of some Gentoo or Knoppix boot-cd; I then chroot into the copied system from the boot-cd to run whatever version of grub is in that copied system, to install it to the MBR. The above error has come up sometimes, only with ext3 roots. I sometimes just switched to jfs to avoid the problem, but now that I've looked around a bit on the web I see that recent mkfs.ext3 has a larger default inode size (256B) than before, which upsets grub. One quick solution: use

mkfs.ext3 -I 128 /dev/DISKNAME 
to keep to the old default inode size. I'd hope that another solution is to get a recent grub and perhaps some other updated libraries, but this one is quicker for me to do...
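To check whether an existing filesystem is affected, the inode size can be read back with tune2fs (the device name below is only an example):

tune2fs -l /dev/sda3 | grep 'Inode size'
# 256 here is the newer default that upsets old grub; 128 is the old size it understands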

No keyboard or mouse in recent (2009) Xorg (1.5.*)

config/hal: couldn't initialise context: (null) ((null))

Recent Xorg prefers `hal' (the hardware abstraction layer) for access to input devices. Any of the following should help: compile Xorg without hal support, make sure that hald is present and running correctly so that the keyboard/mouse are made available, or add the following to the xorg.conf file (add to the existing "ServerFlags" section if there is one).

Section "ServerFlags"
        Option "AutoAddDevices" "false"
        Option "AllowEmptyInput" "false"
EndSection

vmplayer start-up error due to DISPLAY (?)

On a (Gentoo) vmware-player installation, the following error:

$ vmplayer /space/vms/
/opt/vmware/player/lib/vmware/bin/vmware-modconfig-console: unrecognized option '--icon=vmware-player'
Must use a valid mode.  Use one of:
        --get-kernel-headers
        --get-gcc
        --validate-kernel-headers
	[etc. etc.]
was at first silenced by removing the relevant checks in the vmplayer executable script, since one of those checks was what printed the error; but the underlying failure persisted. Checking the DISPLAY environment variable showed it to be empty (I run proprietary programs [often with bad manners about where they write their files] as another user from my main session). Setting DISPLAY correctly removed the problem. No useful site was turned up by a web search on this problem.
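A minimal sketch of the work-around, assuming the main session is display :0 and the separate account is called vmuser (both names are just examples):

xhost +si:localuser:vmuser                        # let the other local user onto the X server
su - vmuser -c 'DISPLAY=:0 vmplayer /space/vms/'  # run with the display set explicitly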

Xorg with `nv' driver absolutely refuses to use full resolution

Note: in general, an Xorg resolution problem, especially if apparently happening `for no reason' (no config changes, updates, different monitor etc) may be due to a poor connection of the monitor plug, preventing the monitor from sending the information about its capabilities into the graphics card (see Xorg logs as a check of this). A more specific problem follows:

Not using default mode "1600x1200" (exceeds panel dimensions).
This was rather puzzling until searching for it as a known bug. The cap at 1280x1024 seems a common result of the `nv' driver (the open-source driver for nvidia cards) being used with a monitor on the card's digital output. By connecting the monitor to the analogue output instead, the 1600x1200 asked for in xorg.conf became available. It seems the driver gets the wrong idea about what the monitor claims about itself when on the DVI connection.

Why we like Gentoo and its kin

 * ERROR: app-office/libreoffice-3.4.5.2 failed (pretend phase):
 *   Build requirements not met!
 *
 * Call stack:
 *                    ebuild.sh, line  75:  Called pkg_pretend
 *   libreoffice-3.4.5.2.ebuild, line 211:  Called check-reqs_pkg_pretend
 *            check-reqs.eclass, line 105:  Called check-reqs_pkg_setup
 *            check-reqs.eclass, line  96:  Called check-reqs_output
 *            check-reqs.eclass, line 237:  Called die
 * The specific snippet of code:
 *              [[ ${EBUILD_PHASE} == "pretend" && -z ${I_KNOW_WHAT_I_AM_DOING} ]] && \
 *                      die "Build requirements not met!"
 *

i.e., how good that it makes clear that one can easily override the (failed) pre-checks by setting I_KNOW_WHAT_I_AM_DOING. Compare this to proprietary software, or even to some other Free-software installers, that would simply exit without offering any option if they believed there to be a problem.
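For example (assuming the failed disk-space or memory check really is safe to ignore in one's case), the override is just an environment variable on the emerge command; any non-empty value will do:

I_KNOW_WHAT_I_AM_DOING=1 emerge libreoffice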

Very slow mkdir on large ext3 fs

(Quick answer: use ext4, but it's still not at all good.)

On changing a set of 4 320GB disks in (linux md) raid5, to 3 1TB disks, giving about a 1.8TiB ext3 filesystem, a problem was immediately apparent: sometimes a mkdir would take an annoyingly long time, even into tens of seconds! The disks were new, and so was the filesystem on them, which had even been given options ( -E stride=16,stripe-width=32 ) supposedly optimal for the raid array. All the other parts of the computer were as before. This was tried with several kernels, from 2.6.29 through to 2.6.32, with and without gentoo patches.
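For reference, those stride/stripe-width figures follow from the array geometry; the sketch below assumes the old md default 64KiB chunk size and 4KiB ext3 blocks (both assumptions; check with mdadm --detail and tune2fs -l), and /dev/md0 is only an example device:

# stride       = chunk size / block size         = 64KiB / 4KiB = 16
# stripe-width = stride * number of data disks   = 16 * 2       = 32   (3-disk raid5)
mkfs.ext3 -E stride=16,stripe-width=32 /dev/md0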

Primitive tests suggested the large size to be important, and some confirmation of this, and of others having the same problem of long delays, was found on the web. The number of options (commit=1 or commit=60, writeback, directory hashing, simultaneous long reads/writes, raid stripe sizes, etc.) and the tediously time-consuming rebuilding and testing of filesystems precluded serious study -- only half a day was spent. Judging from Documentation/filesystems/ext4.txt in the kernel source, much of the point of ext4 is making things work better with big filesystems. This was tried, and worked much better. I'd not been keeping up to date: it must be years since I first glanced at ext4 and thought I'd wait a while for the end of `experimental'.

Of course, one might ask why not switch from ext3 to reiser or xfs or jfs, none of which had the same problem. The reason is that the data is important. Experience with all of these filesystems has been that the effect of disk corruption, power failure or kernel problems is worse than with the `native' ext[234] filesystems: XFS had its trick of replacing file contents with zeros, reiserfs would be hopelessly lost a few years ago if trying to mount without basic /dev files (like null) at least in a gentoo system [ext3 wasn't], reiser has more easily than ext3 lost all track of data in the event of read-failures on a disk ... can't think of any dirt on jfs, I confess, but I haven't much experience with it. I see that opinions differ on the relative data-protecting merits of different filesystems, but in the absence of a detailed study I go by my experience rather than hearsay. (I'd love to hear of such a study, by the way...)

Remove virus (worm) from ms-windows system

It's distressing how often I see people struggling to remove one of the traditional viruses (in the generic sense: `worm' is more correct for the case considered here) from their ms-windows computers, and meeting the usual trouble: scan (for minutes on end); find; ask for `remove it'; be told it's ok; reboot, and find it's all back again.

This is of course the way the authors of malicious programs want it: the running program ensures that any attempt at removing its copy on disk is met with immediate reinstatement, and attempts at changing registry settings or antivirus installations may also be blocked. What is surprising is how many so-called antivirus programs waste so much time pretending to help when they often don't actually help at all. It's appalling how much time gets wasted by users, who really shouldn't continue down that path after the first failure.

Often, the removal is very easy, as long as the disk can be accessed from another system that isn't running the virus. Look up the virus (see the name in results from a scan by an antivirus program, or search the web on the problems or error-messages being encountered). Find what its critical files are; sometimes it's as simple as a single additional file (a true virus would be nastier, in having modified an existing program; depending on the importance of that program you may still care to remove it).

Start the computer from a 'live cd' (or usb key, or take out the hard-disk and put it as an extra disk in another system). The other system (live cd etc) must have read-write support for whatever filesystem the virus is on. One good choice is a linux-based knoppix cd (Free); it might have a point-and-click way to mount the necessary disk-partition, but otherwise something like mkdir /d ; mount -t ntfs-3g -o rw /dev/sda1 /d will do the job (sda1 is assumed to be the windows partition; perhaps it will actually be sda2 or hda1 etc -- look using fdisk -l /dev/sda). A good choice for people who like ms-windows and nothing else might be bartpe (pebuilder), which requires fiddling around to make a bootable windows cd, which will then give an ms-windows interface.

Then simply remove those critical files, unmount the partition, and reboot. I've had great success with this in the last few years, helping people in minutes who'd wasted whole days fiddling with the usual tools. An absurd situation, but that's how it is!
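As a concrete sketch of the whole operation from a live cd (the partition and the file name are purely illustrative; they depend on which worm it is and where windows lives):

mkdir /d
mount -t ntfs-3g -o rw /dev/sda1 /d     # the windows partition; check with fdisk -l /dev/sda
rm /d/WINDOWS/system32/nastyworm.exe    # hypothetical file named in the worm's description
umount /d
reboot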

PSCAD, invalid directory (compilation failure)

Trying to run PSCAD/EMTDC, with the EGCS Fortran compiler, on an m$ windows system of vista or 7 vintage, this error arises when using the "example" models: the models live under "c:/program files/..." and so are protected from user modification. Apparently the compiler is asked to dump its output there too, which it can't when run as a normal user. The solution is to copy the examples to one's own directory, or to ask for pscad to be fixed.
Much as I love to knock m$, this choice is the right one: it's absurd that a normal user could ever change things under the installation roots: on multiuser machines it's likely to cause real problems, and even on single-user machines it promotes careless mixing of user-modified and system files, making backups much harder. Pity this wasn't the case from ages back, as has long been typical in unixes.

Hitachi Deskstars starting to flag

From about 2003 to 2009 I formed the strong impression, from some tens of disks, that Hitachi was no bad choice. During that time, almost every hard-worked Maxtor died or at least got very slow with plenty of read errors reported by `smart'; two Samsungs in a row broke completely and suddenly on power loss to a computer (the second was a replacement under warranty); nothing too bad was seen from Seagate or WD, but as the Hitachis had been so good (outnumbering the Maxtors yet without a single problem) they eventually won favour.

Last summer, 2009, things started to change: a 500GB Hitachi failed within months of being new (very annoying, as it was in a remote machine, though at least the failure was fairly slow); a 1TB Hitachi in an array of three became, within a few months, slow enough to slow the whole array acutely; and its replacement was already showing oodles of Raw_Read_Errors in the time taken to build it into the array, although the other two in the array, and two slightly older ones in another computer, had zero such errors after several months. One should always allow for chance with such small numbers of disks, but it's not so small a number as to be worth persisting when there are other options: it begins to seem that at least a certain age of Hitachi disks has rather a `bathtub' curve of reliability, such that some decline rapidly and possibly fail even when very new, while those that don't may well live long.

For the latest array, WD 'Green' 1TB disks have been used. After building and rebuilding an array a few times (testing things) there's no evidence of slowing and no reported errors: that's better than the bad ones of the Hitachis.

`Advanced format' 4096B (4KB)-block disks

Some recent (2009/2010) disks with 4KB physical blocks suffer from the kernel not properly reporting the blocksize, while fdisk either believes this or has the `normal' 512B size hardwired: if partitions then start part-way through a physical block, rather than at the start of one, performance can be noticeably reduced -- the web has plenty of reports. By using the `u' command in fdisk (to work in units of 512B sectors) before choosing boundaries, then choosing all partition starting points to be multiples of 8 sectors (8 x 512B = 4KB), the problem is avoided. Some tens of percent difference in ext3 write speed were seen in my case, between a first sector of 63 (wrong) and 64 (right).
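A minimal sketch of the check and fix (the device name is only an example):

fdisk -lu /dev/sdb      # -u lists partition starts in sectors; each start should be divisible by 8
fdisk -u /dev/sdb       # interactively, create partitions starting at e.g. sector 64 or 2048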

`Permission Denied' for NFS v3 client, although uid and mode correct.

Having been forced (proprietary drivers in the lab) to go back from current 2.6.3x kernels to pre-2.6.20 (in fact, 2.6.16), my NFS v3 mount of working files started behaving as read-only, although the client reported (by ls and friends) that the files were owned by the current user and were rw for that user. Nothing at the server end (RHEL5.2) helped, and in any case that server was serving lots of other, more modern, clients without trouble. Trying NFS v4 didn't help, nor did turning on all related services, changing the username-to-uid mapping to match the server (only the uid matched before), or making sure that only the one line in the server's exports applied to this export (i.e. that there wasn't also a broader line exporting the whole /srv partition to all, but read-only). Going back to an old nfs-utils might have been worthwhile, since the current one had been compiled with the rest of the system for a new kernel. A work-around of using CIFS (samba with extensions) led to some annoyances with file-times and speed.

In the end, NFS v2 was used, successfully: nfsvers=2 in the options for the mount in /etc/fstab was all that was needed. For this use, editing a few smallish text files, the NFS v2 limitations aren't a problem. Perhaps the following is the reason for the success: v2 sticks to a simple client-side check of the mode bits, rather than the newer mechanism that apparently messes up.

``Version 2 clients interpret a file's mode bits themselves to determine whether a user has access to a file.
Version 3 clients can use a new operation (called ACCESS) to ask the server to decide access rights.''
(http://nfs.sourceforge.net/)
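For reference, the working fstab line looked something like this (server name, paths and the other options are illustrative; nfsvers=2 is the important part):

server:/srv/work   /mnt/work   nfs   nfsvers=2,rw,hard,intr   0 0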

Xorg eating memory?

Since September 2009, several copies of a newly compiled amd64 Gentoo system (at home, at family home, at work) have had Xorg using more and more memory, until it needed to be killed, having grown to several GB, after a few days of intensive work. I'd never seen such behaviour before, having been used to work computers running Gentoo for a whole year without my ever even logging out!

I suspected Xorg itself, or particular drivers: but nv or nvidia, Xorg 1.6 or 1.7, all did the same. And it was strange that no other user noticed it: it couldn't /just/ be that I open and close more things than they do. It only happened on the main 64-bit systems, not on 32-bit laptops or lab computers.

The very useful program found from a webpage about memory-hungry X is

xrestop
(X resources top: a moving list, in the manner of top, htop, iotop, apachetop). Running xrestop immediately pointed out that Kompose was the problem. Kompose is a sort of desktop switcher that tries to remember what each desktop looked like. I'm the only one here who uses it ... and actually I never use it, I just think it's rather nice to have. So, a problem simply if slowly solved: don't use it. Another suggestion from a website was to disable Xinerama, by adding Option "Xinerama" "false" in the ServerFlags section of xorg.conf; that wasn't necessary in my case.

Getting TeXmacs to include Axiom in its available `sessions'

The TeXmacs wysiwyg `mathematical word-processor' allows one to insert a session with an external program, where the commands are remembered and the responses are neatly formatted. This is a good way to view the normally ascii-based output of Axiom, making long expressions much easier to read. TeXmacs will include Axiom in the list iff it finds the command AXIOMsys available on its path. Sometimes only a link named axiom is available from an on-path directory (e.g. gentoo of 2009), in which case a further link, or adding the axiom bin/ directory to the path, will get TeXmacs to offer Axiom sessions at its next start.
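A sketch of the two work-arounds, assuming the on-path `axiom' link really does point into the axiom bin/ directory:

# find the real axiom bin/ directory via the existing on-path link...
AXBIN="$(dirname "$(readlink -f "$(which axiom)")")"
# ...then either link AXIOMsys somewhere on the path
ln -s "$AXBIN/AXIOMsys" /usr/local/bin/AXIOMsys
# ...or just start TeXmacs with that directory appended to the path
PATH="$PATH:$AXBIN" texmacs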

RHEL5 yum claims no updates: but RHN claims hundreds, yet they never happen

The list of necessary updates on the RHN (RedHat Network -- management) web-interface grew longer and longer, and attempts at selecting all for immediate update came to nothing. The command yum check-update returned an empty list as though nothing needed doing. The solution was this:

yum clean all
after which yum started realising the true situation.

Cheat-`virtualisation': running in chroots, even graphically

Some other entries on this page use chroots for dealing with compiling certain modules or programs. Another use for chroots is when compiling or trying out another system (of course, the kernel is the same inside and outside the chroot, but all the libraries and applications aren't).

I upgrade my Gentoo systems (used for almost any computer other than servers managed in collaboration with other people [RedHat] or certain laptops [FreeBSD]) perhaps once every two years; in the interim it's just occasional security updates from glsa-check --list. For an upgrade I simply compile from scratch, taking the latest baselayout: Gentoo changes so quickly that it's not worth fooling about trying to do an update after two years, or keeping things updated in between and suffering library breakages with the subsequent rebuilding of large numbers of programs, updates of configuration files, and so on. Compilation takes some days for my huge list of things to install, particularly with the likely circular dependencies when including USE flags such as doc (worked around by building without documentation, then rebuilding with it once the required programs are in place). The compilation is done in a chroot, to allow the existing system to keep running until the new one is complete: for example,

	mkdir /NEW
	cd /NEW
	tar -xjf /tmp/stage3-tarball-xxxxxx.tar.bz2
	rm -rf dev ; mkdir dev       # replace the stage3's static dev with an empty mountpoint
	mount --bind /dev dev ; mount --bind /proc proc
	mount --bind /sys sys ; mount --bind /usr/portage usr/portage
	cp /etc/resolv.conf etc/
	chroot . /bin/bash

Recently there was cause to try running the full system graphically from a chroot, preferably on a further X display (e.g. Ctrl-Alt-F12) while still having the existing system on its normal X display. This was because KDE3 had been removed from Gentoo, and KDE4 was (correctly) suspected of being so fraught with regressions of functionality as to make an upgraded system pointless ... it is fortunate that it was tested this way, as it established the importance of keeping the old system running for a few more years.

The cheat method used to get both systems running graphically was to set the real system's kdm not to listen for XDMCP, but the new chrooted one to listen; the chrooted one's display manager was started with /etc/init.d/xdm start. Then the real system's display manager could be set to broadcast for remote logins, whereupon it found a host of its own name, which was then selected. There was one problem: xterms (konsole etc.) didn't open: no prompt came up. Suspecting some sort of pty trouble, a web search turned up a page going into further uses of chroots and pointing out the need to bind-mount not only /dev but also

mount --bind /dev/pts dev/pts
and possibly /dev/shm too. All worked easily then.
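For reference, the XDMCP side is just kdm's usual setting; a sketch, assuming KDE3's kdmrc location (it varies between versions and distributions):

# in the chrooted system's kdmrc (e.g. /usr/kde/3.5/share/config/kdm/kdmrc):
[Xdmcp]
Enable=true
# the real system's kdmrc keeps Enable=false, and its login screen is set to
# broadcast for remote (XDMCP) hosts, which then finds the chrooted kdm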

OpenGL and good performance in VNC

Using realVNC and tightVNC for computation servers has been a problem for those users who run applications with graphics acceleration; back in 2007 some technical FEM applications had to be worked around to force them to use a software (mesa) opengl implementation. Rather obvious `jpeg-style' noise around edges of fonts was also a highly-noticeable annoyance.

Just now, in 2010, I've `discovered' TigerVNC and have updated all our servers and my own fleet to have this instead. Its X-server (Xvnc) includes acceleration extensions, so applications work happily and acceptably fast even for 3D work. It can be built based on the same Xorg release as the computer's physical X-server (if it has one...) with the advantage of similar appearance of default fonts. The quality of the image is, for whatever reason, higher, with scarcely-detectable noise around edges.

(Note that at least as of May 2010, a fully-functional tigervnc installation in gentoo requires editing the ebuild file as described in bug308465, to cause the Xorg build within the tigerVNC build to use 'glx-tls'.)
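Day-to-day use is the same as with the other VNCs; a minimal sketch (display number and geometry are arbitrary examples):

vncserver :1 -geometry 1600x1000 -depth 24      # start Xvnc as display :1 for this user
vncviewer servername:1                          # connect from a client machine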

Getting sensible DPI (dots per inch) for fonts in Xorg

Sometimes a change in graphics card or driver makes all the fonts on the screen get so much bigger or smaller that changes of several 'points' are needed to make them acceptable. I have seen it written that this is particularly common with nvidia hardware, but that the solution (other than adjusting all font-sizes) applies to all cases. In the Monitor section of xorg.conf (probably /etc/X11/xorg.conf), the Option "DPI" line can force the DPI:

    Section "Monitor"
        ......
        Option "UseEdidDPI" "false"
        Option "DPI" "96 x 96"
    EndSection
The "UseEdidDPI" turned out not to be necessary, but perhaps it would be in some cases. The 96x96 is pretty common.

Including truetype (ttf) and opentype (otf) fonts for XeTeX

My installations of texlive-2008 (in gentoo) were done without the xetex option. A user of one of these systems wanted to compile a CV template from here, which relies on xetex. XeTeX permits direct use of TrueType and OpenType fonts. It was easily installed, but the example wouldn't compile:

 $  xelatex cv_template_xetex_gentium.tex
.....
(/usr/share/texmf-dist/tex/xelatex/xetexconfig/geometry.cfg))kpathsea: Invalid fontname `Gentium Basic', contains ' '
! Font \zf@basefont="Gentium Basic" at 10.0pt not loadable: Metric (TFM) file o
r installed font not found.
The source code of kpathsea makes clear that this is a fatal error in itself. Judging from a web search, this error doesn't mean it was wrong to have a space in the font name, but very likely just that the font isn't installed, so the search has reached places where a space would be wrong... Gentoo's sil-gentium font package was installed, but was found not to contain the `Basic' set, so both sets were fetched from the Gentium download, along with another ttf file (monaco) needed by the template.

Then how to install the fonts so that xelatex would find them? Fortunately, it follows the general font/desktop conventions rather than having its own special index: simply copying all the ttf files into ~/.fonts/ made it all work. xelatex produced pretty pdf output.
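A sketch of the steps (the font file names are examples; fc-cache and fc-list are from fontconfig, which xetex consults for system fonts):

mkdir -p ~/.fonts
cp GenBas*.ttf GenBkBas*.ttf monaco.ttf ~/.fonts/   # the Gentium Basic ttf files, etc.
fc-cache -f ~/.fonts                                # refresh the font cache
fc-list | grep -i gentium                           # confirm the fonts are now visible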

Conversion of other Bibliography formats to Bibtex

The simplest, best thing I've found is bibutils, which provides a set of simple commands, using some XML format as an intermediary: for example, ris2xml, xml2bib, xml2ris.

The annoyance that prompted it: something called RIS, which several journals, all on the same day, were offering instead of bibtex for downloading citations. Clearly I'd chosen a bad subject that day. The first couple were manually edited. Then it seemed so frequent a problem that I went to the web and found a perl script that basically didn't work (too many assumptions about the input field order etc.), then a cb2bib program that was GUI-based, required lots of dependencies, and wasn't really the quick one-shot command I sought. Finally, bibutils was found (via cb2bib!), so my final ris2bib command is just

#!/bin/sh
# ris2bib: RIS in (file arguments or stdin), BibTeX out, via bibutils' intermediate XML
ris2xml "$@" | xml2bib -b -nb
with apparently good robustness.
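Usage is then just (file name hypothetical):

ris2bib citations.ris > citations.bib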

Compiling 32bit (linux) kernel, while running under a 64bit kernel

All that's needed is to add ARCH=i386 after the `make' command, as in make ARCH=i386 menuconfig, make ARCH=i386, make ARCH=i386 install, etc. Without this, the presence of a running 64bit kernel makes the kernel build assume that only 64bit targets should be considered. My need for forcing the choice was that I was preparing a (copied) 32bit system's kernel within a chroot running from a 64bit livecd; it would have been fine to have a 64bit kernel in the new system, but the 32bit chroot had no compiler capable of 64bit output, so the build's autodetection of 64bit caused compilation to fail.
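So, inside the 32bit chroot, the whole build is just the usual sequence with the override repeated each time (modules_install added here for completeness):

make ARCH=i386 menuconfig
make ARCH=i386
make ARCH=i386 modules_install
make ARCH=i386 install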

Persuading `NI' drivers (e.g. GPIB [488.2], PXI) to work

Following the nvidia-related points above, here's another point continuing the theme of huge wastes of time due to hardware manufacturers insulting their customers with proprietary drivers. NI (National Instruments) makes various control/measurement/instrumentation hardware, and therefore supplies kernel modules to communicate with its specialist PCI, USB etc. hardware. These, unfortunately, are proprietary. They work competently with a few `approved' distributions, but attempting to use other, or newer, systems (the release cycle of the drivers is infrequent) generally gives errors, even if the installer sometimes (wrongly) reports a glowing success after the shower of error messages.

One way to have a better chance of the drivers working is to go (back) to a kernel version very close to those in the supported distributions. I've tried this by going back to linuxes 2.6.16 or 2.6.18 (to match RedHat EL5) instead of the natural choice of 2.6.30 (as of August 2009) for my recent Gentoo system.

The first trouble came about ten lines into compiling this older kernel with my gcc-4.3.2: error: ‘PATH_MAX’ undeclared. By mounting the kernel source within the (alternative-boot) RedHat system, the kernel compiled happily (`make') with gcc-4.1.2. I then ran make install && make modules_install from within the kernel source directory when running the Gentoo system. At later times during the experimentation, I instead mounted the relevant parts of the Gentoo system (/boot, /lib/modules) within the RedHat chroot, so as to allow compilation and installation of kernel and modules all in one go.

The next problem came when running the installation for the NI drivers (GPIB) or later when running updateNIDrivers to compile the wrappers for the kernel modules. First some check of kernel version and kernel source version failed: it was claimed they didn't match, though they clearly must as the kernel came from the sources. A first step was to follow advice to make it happy:

cd /lib/modules/$(uname -r)/source/include/asm/
ln -s asm-offsets.h asm_offsets.h        # give the header the older name the NI installer expects
cd /lib/modules/$(uname -r)/source/include/linux/
cat utsrelease.h >> version.h            # put the release string back where the installer looks for it
This removed the nonsense-error about versions, but left (understandably) the warning that different compiler versions had been used for the kernel and for the new modules. The intended Gentoo kernel was running, and the RedHat system was mounted under /sl (Scientific Linux). By some bind mounts, the relevant parts of the Gentoo system replaced the RedHat ones, allowing a chroot into the RedHat system to compile the modules with the RedHat compiler:
list="dev proc sys usr/local lib/modules boot usr/src usr/include"
for t in $list; do mount --bind /$t /sl/$t; done  # where /sl is RedHat mount
chroot  /sl /bin/bash
cd /usr/src/linux
make && make install && make modules_install
updateNIDrivers
for t in $list; do umount  /sl/$t; done

The final trouble was my fault, for not considering that /usr/local/lib was hitherto unused, and wasn't included by the linker. In matlab (instrument control) the following error came up:

??? Error using ==> visa.visa at 242
Invalid RSRCNAME specified. Type 'instrhelp visa' for more information.
Error in ==> icm_connection>icm_connection_open at 82
ih.icm = visa('ni', 'GPIB0::1::INSTR');
This RSRCNAME error turns out to have nothing to do with 'GPIB0::1::INSTR' actually being wrong, or with the specification having changed, but just indicates some problem with the drivers. In my case, it was that the loader path hadn't yet been updated to include the new NI files that had been copied into /usr/local/lib: running env-update (a Gentoo-specific wrapper around ldconfig and other things) and then restarting matlab from a new shell was enough to get GPIB working. So now it's running on a recent Gentoo (2.6.31-kernel age) with a 2.6.18 kernel (full of holes) that was compiled within a RHEL-5 system by its gcc-4.1.2 compiler. If only NI would get drivers into the kernel.
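On a non-Gentoo system the equivalent of env-update would be something like the following (a sketch; the conf-file name is an assumption):

echo "/usr/local/lib" > /etc/ld.so.conf.d/ni.conf   # or append to /etc/ld.so.conf
ldconfig                                            # rebuild the runtime linker cache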

Spell-checker (e.g. aspell) working in Kate/Kile

Any attempt at spell-checking in Kile or Kate failed.

KDE-3.5.10 was in use here, with Kile 2.0.3, using config files inherited from at least 5 years back. The failure persisted in spite of going into KDE's control centre (kcontrol) and configuring the spell checker (under KDE components). The aspell program was present, in the most obvious place of all, /usr/bin/.

This seems to be another strangeness of KDE and perhaps of old config files. The working solution came from: http://wisconsinloco.ubuntuforums.org/showthread.php?t=600772 , which suggested editing ~/.kde/share/config/kdeglobals to change the value of KSpell_Client from 0 to 1, so that the KSpell section starts as:

[KSpell]
KSpell_Client=1
....
It was clear that kcontrol had written the later options into the config correctly, but that this 0 (surmised in the above link to indicate the ispell program) was preventing attempts at using aspell. Everything went fine after a restart of kile or kate.

Page started: 2009-01-14
Last change: 2012-04-11