The curse of three-letter extensions
Lem Bingley argues that this crude legacy of the 1960s should be consigned to the recycle bin.
As we all know, an operating system must know what to do with the various lumps of digital data under its command. It must load documents into Word and images into Photoshop, and not the other way around. This means keeping track of file types.
Back in the 1960s, when bytes were scarce and life was simpler, the three-letter extension was born to do this job. Today, when most operating systems have a tangled family tree stretching back to that period, we seem to be stuck with it.
These identifiers can create huge problems. Files can be rendered unusable by renaming them. Applications are forced to believe the file extension, and to handle mismatches gracefully. The system has to assume that a .yum extension is true, take a big bite, and only then decide if it is eating an apple or a mislabelled onion.
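To make the apple-and-onion problem concrete, here is a minimal sketch of how a program might check a file's extension against its opening bytes before taking that bite. The function name `extension_matches_content` and the small `MAGIC` table are illustrative assumptions, not part of any operating system; the byte signatures themselves are the well-known leading bytes of those formats.

```python
import os

# Leading bytes ("magic numbers") of a few well-known formats.
# This table is a tiny illustrative sample, not a complete registry.
MAGIC = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".gif": b"GIF8",
    ".pdf": b"%PDF-",
    ".zip": b"PK\x03\x04",
}

def extension_matches_content(name, data):
    """Compare the extension in `name` with the first bytes of `data`.

    Returns True or False when the extension is in the table,
    and None when the extension is unknown and nothing can be said.
    """
    ext = os.path.splitext(name)[1].lower()
    signature = MAGIC.get(ext)
    if signature is None:
        return None
    return data.startswith(signature)

# A GIF that has been renamed to look like a PDF is caught as a mismatch.
print(extension_matches_content("report.pdf", b"GIF89a..."))
```

An unknown extension such as .yum returns None here, which is the case the column describes: the system has nothing but the label to go on.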
The dangers are compounded by the fact that three letters give a limited range of extensions. There is no central authority to apportion extensions in the manner of web addresses, so they can conflict, resulting in confusion and risk.
An alarming example arrived a couple of weeks ago, in the shape of the My Party virus. This presented users with an attachment called www.myparty.yahoo.com.
This looks like a URL but isn't, because attachments live in the file system, not in the browser. And in the Windows file system, .com means an executable, so it was fortunate that the My Party payload was fairly innocuous. It is worth ensuring that users are alerted to the difference between the two types of .com, because My Party copycats are sure to follow.
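The confusion can be shown in a few lines. The sketch below is a hypothetical mail-filter check, not a description of any real product: the names `is_executable_attachment` and `is_deceptive` are invented for illustration, and the extension list is a partial sample of those Windows treats as directly runnable.

```python
import os

# A partial, illustrative list of extensions that Windows will run directly.
EXECUTABLE_EXTENSIONS = {".com", ".exe", ".bat", ".pif", ".scr"}

def is_executable_attachment(filename):
    """True if the text after the last dot is an executable extension."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in EXECUTABLE_EXTENSIONS

def is_deceptive(filename):
    """True if an executable attachment is dressed up as a web address."""
    return filename.lower().startswith("www.") and is_executable_attachment(filename)

# The file system sees only the final ".com", so this is a program, not a URL.
print(is_deceptive("www.myparty.yahoo.com"))
```

The check works only because the file system looks at nothing beyond the last dot, which is exactly the weakness the attachment exploited.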
With recent versions of Windows, Microsoft has tried to defuse file-type dangers by optionally hiding the extension and creating a registry to match applications and file extensions. But I doubt whether anyone believes these measures will succeed in properly protecting integrity.
For years Mac users have pitied their PC cousins, because the Mac OS has a far superior system for identifying files. Instead of putting type information in the file name, the Mac sensibly embeds it in the file itself. Renaming the file makes no difference to its type.
Unfortunately for Mac users, Mac OS X supports both the embedded label and the dumb file extension. Apple "strongly encourages" application developers to support both methods, arguing that this is desirable because the internet is built around three-letter extensions.
But Mac OS X does at least show that it is possible to support both methods of type identification. And it seems we would be better off if the internet did the same.
File types and application associations are simply data about the data in the file, otherwise known as metadata. XML could be used to better convey type information.
To a limited extent this is done already but, as far as I know, there is no effort to co-ordinate this work globally. With all the work being done on describing and linking programs for web services, now would be a good time to start.
Older file types could be shipped in XML containers to maintain compatibility, with newer applications adopting XML labels as part of their native format.
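As a rough sketch of what such a container might look like, the code below wraps the bytes of an older file in a small XML envelope that carries its type as a label. Everything about the envelope is an assumption made for illustration: the element names `file` and `payload`, the `type` attribute, the use of a MIME-style string as the label, and base64 as the way of carrying binary data inside XML. No existing standard is being described.

```python
import base64
import xml.etree.ElementTree as ET

def wrap(data, media_type, name):
    """Wrap raw bytes in a hypothetical XML container that labels their type."""
    root = ET.Element("file", {"name": name, "type": media_type})
    payload = ET.SubElement(root, "payload", {"encoding": "base64"})
    payload.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(root, encoding="unicode")

def unwrap(xml_text):
    """Recover the type label, the original name and the original bytes."""
    root = ET.fromstring(xml_text)
    encoded = root.find("payload").text or ""
    return root.get("type"), root.get("name"), base64.b64decode(encoded)

# The type travels inside the container, so the outer file name is irrelevant.
container = wrap(b"GIF89a", "image/gif", "logo.gif")
print(unwrap(container))
```

Renaming the container would make no difference to the label inside it, which is the property the Mac's embedded type information already offers.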
Perhaps then we might finally have systems that can tell apples from onions without spitting out errors.