/<-r4d Vaporware

This is where I put a list of software projects that would be cool to see realized as free software, but which I currently have no time to implement myself. You are welcome to steal these ideas and implement them. Please drop me a mail at triad(at)df.lth.se if you do, but do not expect any contributions or help from me...


WHATISB: a binary whatis tool


This is a cool idea I've had for some time.

What is WHATIS?

You have probably tried the Unix tool "whatis", which recognizes all sorts of strange files from their content, such as string constants and similar byte patterns. (On most Unix systems this content-based detection is actually the job of file(1); whatis(1) proper looks up one-line manual page descriptions.) The MIME magic module for the Apache web server implements part of these rules. In technical terms this is called "magic file type detection".

Whatis maintains a large database of binary and string constants to be able to detect different file types. It might look like this:

0       belong          0x000001b3      video/mpeg
0       belong          0x000001ba      video/mpeg
0       beshort&0xfff0  0xfff0          audio/mpeg
4       leshort         0xAF11          video/fli
4       leshort         0xAF12          video/flc
0       string          MOVI            video/x-sgi-movie
4       string          moov            video/quicktime
4       string          mdat            video/quicktime

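You probably see the idea in this scheme. The first column is the byte offset in the file, the second is the type of the value to read there, the third is the value it must equal, and the fourth is the resulting MIME type. "belong" is a big-endian long (32-bit) value, "beshort" and "leshort" are big-endian and little-endian 16-bit values, "&0xfff0" masks the value before comparing, and "string" is a verbatim string that appears at this position.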

You might run this program on a file and obtain a result like this:

linus@Felicia etc]$ whatis printcap
printcap             (5)  - printer capability data base
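
To make the rule table shown earlier in this section concrete, here is a minimal sketch in Python of how such rules could be applied to the first bytes of a file. The names RULES, FORMATS, match_rule and guess_type are made up for illustration, and only the few value types that appear in the table are handled; a real magic database has many more types and nested rules.

```python
import struct

# Rules copied from the table above: (offset, type, expected value, MIME type).
RULES = [
    (0, "belong", 0x000001B3, "video/mpeg"),
    (0, "belong", 0x000001BA, "video/mpeg"),
    (0, "beshort&0xfff0", 0xFFF0, "audio/mpeg"),
    (4, "leshort", 0xAF11, "video/fli"),
    (4, "leshort", 0xAF12, "video/flc"),
    (0, "string", b"MOVI", "video/x-sgi-movie"),
    (4, "string", b"moov", "video/quicktime"),
    (4, "string", b"mdat", "video/quicktime"),
]

# struct format codes for the numeric types used in the table.
FORMATS = {"belong": ">I", "beshort": ">H", "leshort": "<H"}


def match_rule(data, offset, kind, expected):
    """Return True if a single rule matches the given bytes."""
    if kind == "string":
        return data[offset:offset + len(expected)] == expected
    # Split an optional "&mask" suffix off the type name.
    name, _, mask = kind.partition("&")
    fmt = FORMATS[name]
    if len(data) < offset + struct.calcsize(fmt):
        return False  # file too short for this rule
    (value,) = struct.unpack_from(fmt, data, offset)
    if mask:
        value &= int(mask, 16)
    return value == expected


def guess_type(data):
    """Return the MIME type of the first matching rule, or None."""
    for offset, kind, expected, mime in RULES:
        if match_rule(data, offset, kind, expected):
            return mime
    return None
```

Calling guess_type() on the first few bytes of a file is enough here, since no rule in the table looks past byte 8.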

What is WHATISB?

The idea is to take the command "whatis" one level further with "whatisb", which is to be understood as "whatis-binary". Whatisb is to determine file type and contents with more sophisticated statistical methods and algorithms than "whatis". Sample sessions:

linus@Felicia etc]$ whatisb rougefile
rougefile seems to be an image file format of unknown type.

linus@Felicia etc]$ whatisb /dev/hdb
/dev/hdb seems to be a harddisk partition containing a ReiserFS

linus@Felicia etc]$ whatisb /dev/hdc
/dev/hdc seems to be a harddisk partition encrypted with 4096 bit
3DES encryption.

Do you want me to try to crack the 3DES key and see what is on
this partition? [Y/N] y

Cracking 3DES key (this may take 35 billion years or more)...

I got the idea of "whatisb" from reading about methods used by cryptographic cracking software to determine whether something interesting has been decrypted or not: you have to know the statistical properties of the material in order to crack it in a good way.

Implementation outline

"whatisb" is to be based on heuristics and statistics. By using statistics, even heavily corrupted files can be recognized for what they are.
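
As one example of such a statistic, the byte entropy of a file says a lot about what it might be, and it does not depend on any header surviving intact. The sketch below is only an illustration of the idea: the function names and the thresholds in rough_class are assumptions picked for the example, not tuned values.

```python
import math
from collections import Counter


def byte_entropy(data):
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def rough_class(data):
    """Very coarse guess from entropy alone; the thresholds are illustrative."""
    h = byte_entropy(data)
    if h > 7.5:
        return "compressed or encrypted"
    if h < 5.0:
        return "text or sparse data"
    return "structured binary"
```

Encrypted and well-compressed data come close to 8 bits per byte, while plain text and sparse data sit much lower, so even this single number separates some broad classes.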

"Whatisb" needs a good architecture with plug-in modules for the different detection algorithms, and a ranking of the algorithms by how much computing time they consume, so that the detection system can try the fast algorithms first.

It might also be desirable to have "whatisb" act as a library, so that other applications (such as encryption crackers) may use whatisb as a backend to determine, with a certain probability, whether a piece of data really is what it appears to be.
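
The plug-in ranking and the library interface described above could be sketched roughly like this. Everything here is hypothetical: the Plugin and Registry classes, the cost numbers, the good_enough cut-off and the two toy detectors exist only to show the shape of the idea.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A detector takes raw bytes and returns (label, probability) or None.
Detector = Callable[[bytes], Optional[Tuple[str, float]]]


@dataclass
class Plugin:
    name: str
    cost: int  # relative computing cost; lower values run first
    detect: Detector


class Registry:
    def __init__(self) -> None:
        self._plugins: List[Plugin] = []

    def register(self, name: str, cost: int, detect: Detector) -> None:
        self._plugins.append(Plugin(name, cost, detect))

    def identify(self, data: bytes, good_enough: float = 0.9):
        """Run detectors cheapest first; stop once one is confident enough."""
        best = None
        for plugin in sorted(self._plugins, key=lambda p: p.cost):
            result = plugin.detect(data)
            if result is None:
                continue
            if best is None or result[1] > best[1]:
                best = result
            if best[1] >= good_enough:
                break
        return best


def magic_png(data):
    """Cheap toy detector: checks the 8-byte PNG signature."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ("image/png", 0.99)
    return None


def mostly_ascii(data):
    """Slower toy detector: share of printable ASCII bytes as the probability."""
    if not data:
        return None
    printable = sum(1 for b in data if 32 <= b < 127 or b in (9, 10, 13))
    return ("text/plain", printable / len(data))


registry = Registry()
registry.register("ascii-ratio", cost=10, detect=mostly_ascii)
registry.register("png-magic", cost=1, detect=magic_png)
```

An application using this as a library would call registry.identify(data) and get back a (label, probability) pair, or None if no detector had an opinion.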

WHATCPU is a derivative aimed at firmware identification

A side track is to take the firmware off some CPU that you don't recognize and feed it to whatisb. It will tell you something like:

linus@Felicia etc]$ whatisb fw.bin
fw.bin seems to be binary code for the ARM v6 architecture.
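
One possible heuristic for this, given only as a sketch: in the classic 32-bit ARM instruction encoding the top four bits of every instruction hold the condition code, and most instructions use 0b1110 ("always"). Random data would show that pattern in about 1 word out of 16, so a much higher share hints at ARM code. The function names and the 0.5 threshold are assumptions for illustration, and Thumb code or big-endian images would need other tests.

```python
import struct


def arm_always_ratio(firmware):
    """Fraction of little-endian 32-bit words whose top four bits are 0b1110."""
    usable = len(firmware) - len(firmware) % 4  # ignore trailing partial word
    if usable == 0:
        return 0.0
    hits = 0
    for (word,) in struct.iter_unpack("<I", firmware[:usable]):
        if word >> 28 == 0xE:
            hits += 1
    return hits / (usable // 4)


def looks_like_arm(firmware, threshold=0.5):
    """Guess whether the image is 32-bit ARM code; threshold is illustrative."""
    return arm_always_ratio(firmware) >= threshold
```

A real detector would combine several such per-architecture scores and report the best one together with its probability.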