This is where I put a list of software projects that would be cool to see realized as free software, but which I currently have no time to implement myself. You are welcome to steal these ideas and implement them. Please drop me a mail at triad(at)df.lth.se if you do, but do not expect any contributions or help from me...
This is a cool idea I've had for some time.
What is WHATIS?
You have probably tested the Unix tool "file", which this page calls "whatis" (the real "whatis" command only looks up manual page descriptions). It is capable of recognizing different strange files according to their content, such as string constants and similar byte patterns. The MIME magic module (mod_mime_magic) for the Apache web server implements part of these rules. In technical terms this is called "magic file type detection".
Whatis maintains a large database of binary and string constants to be able to detect different file types. It might look like this:
    0 belong 0x000001b3 video/mpeg
    0 belong 0x000001ba video/mpeg
    0 beshort&0xfff0 0xfff0 audio/mpeg
    4 leshort 0xAF11 video/fli
    4 leshort 0xAF12 video/flc
    0 string MOVI video/x-sgi-movie
    4 string moov video/quicktime
    4 string mdat video/quicktime
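You probably see the idea in this scheme. The first column is the byte position in the file and the second is the data type: "belong" is a big-endian long (32-bit) value, "beshort" and "leshort" are big-endian and little-endian short (16-bit) values, and "string" is a verbatim string that appears at this position. A suffix such as "&0xfff0" masks the value before it is compared with the constant in the third column, and the last column is the MIME type to report.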
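To make the scheme concrete, here is a minimal sketch in Python of how such rules could be evaluated. The tuple layout of RULES and the function name detect are made up for illustration; this is not how the real magic database is parsed, only the eight sample rules above written out by hand.

```python
import struct

# The sample rules above as (offset, type, mask, expected, mime) tuples.
# This layout is an assumption for the sketch, not the real magic file format.
RULES = [
    (0, "belong", None, 0x000001B3, "video/mpeg"),
    (0, "belong", None, 0x000001BA, "video/mpeg"),
    (0, "beshort", 0xFFF0, 0xFFF0, "audio/mpeg"),
    (4, "leshort", None, 0xAF11, "video/fli"),
    (4, "leshort", None, 0xAF12, "video/flc"),
    (0, "string", None, b"MOVI", "video/x-sgi-movie"),
    (4, "string", None, b"moov", "video/quicktime"),
    (4, "string", None, b"mdat", "video/quicktime"),
]

# struct format strings for the numeric types used in the rules.
FORMATS = {"belong": ">I", "beshort": ">H", "leshort": "<H"}


def detect(data: bytes):
    """Return the MIME type of the first matching rule, or None."""
    for offset, kind, mask, expected, mime in RULES:
        if kind == "string":
            if data[offset:offset + len(expected)] == expected:
                return mime
            continue
        fmt = FORMATS[kind]
        size = struct.calcsize(fmt)
        chunk = data[offset:offset + size]
        if len(chunk) < size:
            continue  # the data is too short for this rule
        (value,) = struct.unpack(fmt, chunk)
        if mask is not None:
            value &= mask  # e.g. beshort&0xfff0
        if value == expected:
            return mime
    return None
```

Calling detect() on the first few bytes of a file is enough here, since none of the sample rules looks past byte 8.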
You might run this program on a file and obtain a result like this:
    [linus@Felicia etc]$ whatis printcap
    printcap (5) - printer capability data base
What is WHATISB?
The idea is to take the command "whatis" one level further with "whatisb", which is to be understood as "whatis-binary". Whatisb is to determine file type and contents with more sophisticated statistical methods and algorithms than "whatis". Sample sessions:
    [linus@Felicia etc]$ whatisb rougefile
    rougefile seems to be an image file format of unknown type.
    [linus@Felicia etc]$ whatisb /dev/hdb
    /dev/hdb seems to be a harddisk partition containing a ReiserFS filesystem.
    [linus@Felicia etc]$ whatisb /dev/hdc
    /dev/hdc seems to be a harddisk partition encrypted with 4096 bit 3DES encryption.
    Do you want me to try to crack the 3DES key and see what is on this partition? [Y/N] y
    Cracking 3DES key (this may take 35 billion years or more)...
I got the idea for "whatisb" from reading about the methods used by cryptographic cracking software to determine whether something interesting has been decrypted: you have to know the statistical properties of the material in order to crack it well.
"whatisb" is to be based on heuristics and statistics. By using statistics, even heavily corrupted files can be recognized for what they are.
"Whatisb" needs a good architecture with plug-in modules for different detection algorithms, and a ranking of the algorithms by the computing time they consume, so that the detection system can try the fast detection algorithms first.
It might also be desirable to have "whatisb" act as a library so that other applications (such as encryption crackers) may use whatisb as a backend to determine with a certain probability when something IS something.
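The plug-in architecture and the library use could be sketched roughly like this in Python. Everything here is hypothetical: the names Detector, register, identify and crack_single_byte_xor, the cost numbers and the 0.9 threshold are assumptions for illustration, and the two detectors are crude stand-ins for real detection algorithms.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Guess = Tuple[str, float]  # (description, confidence between 0.0 and 1.0)


@dataclass
class Detector:
    name: str
    cost: int  # relative computing cost; cheaper detectors run first
    run: Callable[[bytes], Optional[Guess]]


REGISTRY: List[Detector] = []


def register(name: str, cost: int):
    """Decorator that adds a detection function to the plug-in registry."""
    def wrap(func):
        REGISTRY.append(Detector(name, cost, func))
        return func
    return wrap


@register("magic", cost=1)
def magic(data: bytes) -> Optional[Guess]:
    # A single magic constant, for brevity.
    if data.startswith(b"\x00\x00\x01\xba"):
        return ("MPEG video", 0.99)
    return None


@register("letter-statistics", cost=10)
def letter_statistics(data: bytes) -> Optional[Guess]:
    # Crude statistic: fraction of bytes that are ASCII letters or spaces.
    if not data:
        return None
    good = sum(1 for b in data if b == 32 or 65 <= b <= 90 or 97 <= b <= 122)
    ratio = good / len(data)
    return ("plain text", ratio) if ratio > 0.9 else None


def identify(data: bytes, threshold: float = 0.9) -> Optional[Guess]:
    """Library entry point: best guess, trying cheap detectors first."""
    best: Optional[Guess] = None
    for detector in sorted(REGISTRY, key=lambda d: d.cost):
        guess = detector.run(data)
        if guess is not None and (best is None or guess[1] > best[1]):
            best = guess
        if best is not None and best[1] >= threshold:
            break  # confident enough; skip the more expensive detectors
    return best


def crack_single_byte_xor(ciphertext: bytes) -> Tuple[int, bytes]:
    """Toy cracker: try all 256 keys, keep the output identify() trusts most."""
    def confidence(key: int) -> float:
        guess = identify(bytes(b ^ key for b in ciphertext))
        return guess[1] if guess else 0.0

    key = max(range(256), key=confidence)
    return key, bytes(b ^ key for b in ciphertext)
```

The toy cracker only stands in for "other applications": it never inspects the data itself, it just asks identify() how confident it is that a candidate decryption IS something.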
A side track is to take the firmware off some CPU that you do not recognize and feed it to whatisb. It will tell you something like:
    [linus@Felicia etc]$ whatisb fw.bin
    fw.bin seems to be binary code for the ARM v6 architecture.
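One statistical hint that might help with this side track: classic 32-bit ARM instructions keep a condition code in their top four bits, and most instructions use the value 0xE ("always"). Below is a minimal sketch in Python, assuming little-endian ARM-state code (not Thumb). The function name arm_score is made up for illustration, and a real detector would need far more evidence than this one number.

```python
def arm_score(data: bytes) -> float:
    """Fraction of aligned 32-bit little-endian words whose top nibble is 0xE.

    In little-endian code the most significant byte of each word is the
    fourth byte, so only that byte is inspected.
    """
    words = len(data) // 4
    if words == 0:
        return 0.0
    hits = sum(1 for i in range(words) if data[4 * i + 3] >> 4 == 0xE)
    return hits / words
```

Uniformly random bytes would score around 1/16, so a score well above that is a hint (not proof) that the image contains ARM code.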