Homepage of Linus Akesson

How does it work?

The SID theme search engine consists of a large database of themes. The database was generated automatically from the HVSC (the High Voltage SID Collection), which took several days.

This is the general idea of the melody extraction program: a 6510 emulator is invoked once for each SID, subtune and voice. The init routine is called with the right subtune number, and then the play routine is called 70000 times (allowing a maximum song length of approximately 23 minutes for a 50 Hz SID). If the play routine fails to return within 200000 CPU cycles, the emulation stops. Every time the play routine returns, the control register of the current voice is examined. If the gate bit was turned off during this call to the play routine, the 16-bit value in the frequency control register is converted into a note name and emitted.
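The register-to-note step can be sketched roughly as follows. This is a minimal illustration, not the actual extraction program: it assumes a PAL clock of 985248 Hz, uses the standard SID relation (output frequency = register value × clock / 2^24), rounds to the nearest equal-tempered semitone with A4 = 440 Hz, and takes a hypothetical list of `(control, freq_reg)` snapshots, one per play call, in place of a real 6510 emulator. The names `freq_to_note` and `extract_notes` are made up for the example.

```python
import math

PAL_CLOCK_HZ = 985248  # assumed PAL C64 clock; NTSC machines use a different value
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
GATE_BIT = 0x01        # bit 0 of a SID voice control register


def freq_to_note(freq_reg, clock_hz=PAL_CLOCK_HZ):
    """Convert a 16-bit SID frequency register value into a note name."""
    if freq_reg <= 0:
        return None  # a zero register has no meaningful pitch
    hz = freq_reg * clock_hz / 2 ** 24
    midi = round(69 + 12 * math.log2(hz / 440.0))  # nearest semitone, A4 = 69
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def extract_notes(frames):
    """Emit a note whenever the gate bit goes from on to off.

    `frames` is a hypothetical stand-in for the emulator: one
    (control, freq_reg) pair sampled after each play call.
    """
    notes = []
    gate_was_on = False
    for control, freq_reg in frames:
        gate_on = bool(control & GATE_BIT)
        if gate_was_on and not gate_on:
            note = freq_to_note(freq_reg)
            if note is not None:
                notes.append(note)
        gate_was_on = gate_on
    return notes
```

For example, feeding it snapshots in which the gate is released once at register value 7493 and once at 4455 gives the two notes A4 and C4.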

This way we obtain three sequences of note names for each SID and subtune, but the resulting database turns out to be huge (207 MB). Therefore, each of the note sequences is scanned for repetition. If the SID appears to restart (from the very beginning, or from some other place), all iterations except the first one are removed from the database. Additionally, if a voice plays no more than 4 notes (e.g. if we're looking at a sound effect subtune), those notes are removed from the database as well; it's not like anyone's going to try to find them. =) After these optimizations, the database is about 32 MB.
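One way to picture the pruning is the brute-force sketch below. The actual repetition heuristic isn't described here, so everything in it is an assumption: `drop_repeats` looks for the earliest point from which the rest of the sequence is purely periodic (with at least two full iterations) and keeps only the first iteration, and `prune_voice` then discards voices left with 4 notes or fewer. Whether the short-voice check ran before or after the loop removal is also a guess.

```python
def drop_repeats(seq):
    """Cut a note sequence after the first iteration of a trailing loop.

    Brute force: try every loop start and period, and accept the first
    combination where the tail repeats all the way to the end.
    """
    n = len(seq)
    for start in range(n):
        # Require at least two full iterations of the candidate loop.
        for period in range(1, (n - start) // 2 + 1):
            if all(seq[i] == seq[i + period] for i in range(start, n - period)):
                return seq[:start + period]
    return seq  # no repetition found


def prune_voice(seq, min_notes=5):
    """Remove loop repeats, then drop voices with 4 notes or fewer."""
    trimmed = drop_repeats(seq)
    return trimmed if len(trimmed) >= min_notes else []
```

This version is far too slow for sequences tens of thousands of notes long; it is only meant to show what "all iterations except the first one" means, including the case where the loop starts after an intro.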

Note that my emulation environment isn't perfect: the illegal opcodes aren't implemented (that's the easy part to fix), and several SIDs don't work at all, for various mysterious reasons. The database would have been a couple of megs bigger if all SIDs had been correctly emulated.

Now, it would be possible to transpose each search query into all twelve keys, and scan the entire database once for each transposition. But to speed things up a bit, the note sequences are converted into change-in-note-value sequences, i.e. the interval in semitones from each note to the next. This way, after the search query has been converted in the same way, it is only necessary to scan the database once. (The drawback is that wildcards will match more tunes than they would have without this conversion.)
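A small sketch shows why the conversion makes transposition irrelevant. It assumes notes are represented as semitone numbers (MIDI-style, so 60 is middle C); the helper names `to_deltas` and `matches` are made up for the example, and the plain list scan stands in for the real database lookup.

```python
def to_deltas(notes):
    """Replace a note sequence by the semitone steps between neighbours."""
    return [b - a for a, b in zip(notes, notes[1:])]


def matches(theme_notes, query_notes):
    """True if the query occurs in the theme in any transposition."""
    theme = to_deltas(theme_notes)
    query = to_deltas(query_notes)
    width = len(query)
    return any(theme[i:i + width] == query
               for i in range(len(theme) - width + 1))


# The same four-note figure in C and in D has identical deltas,
# so one scan covers every key.
in_c = [60, 62, 64, 65]
in_d = [62, 64, 66, 67]
print(to_deltas(in_c), to_deltas(in_d))
```

Note that a sequence of n notes becomes n - 1 deltas, so a single-note query carries no information after conversion.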

The actual pattern matching is then handled by a MySQL server.

Search hints

Here are some general tips and tricks for searching the SID database:

  • A glissando is usually considered to be one note, because the emulator is triggered every time the gate bit is reset. Try searching for the first note in the glissando, or the last one. Wildcards will also work, of course.
  • Try removing the first note. ("defgfgeedefefg" won't match the Parallax melody, for instance, but "efgfgeedefefg" will. Interesting...)
  • Sometimes a melody is split into two voices, each voice playing every other note. Try searching for every other note in your theme. (For instance, you might consider "daeafaeadagafaead" to be the initial theme of Giana Sisters, subtune 5, but it will only match a couple of remixes, and not Hülsbeck's SID. "defedgfed", on the other hand, will find it.)
  • A single note between two wildcards will, unfortunately, match any note. In other words, a sequence such as "d.e.f.e.d.g.f.e" will match every SID in the database. This is because the melodies are stored as delta-note-value sequences in the database: the intervals on both sides of such a note involve a wildcard, so nothing about the note itself is left to compare.
  • Arpeggios may match if you're lucky, but in most cases they don't. The apparent melody of Spellbound, for instance, is too deeply embedded in the arpeggios to be searchable.
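The wildcard hint above can be illustrated with a short sketch. How the search engine really translates wildcards is not documented here, so this is an assumed model: queries use the letters a to g plus "." for a wildcard, intervals are taken modulo 12 because the letters carry no octave, and any interval that touches a wildcard becomes unknown (`None`). The function name `query_to_delta_pattern` is hypothetical.

```python
# Semitone offsets of the natural notes within an octave.
PITCH = {"c": 0, "d": 2, "e": 4, "f": 5, "g": 7, "a": 9, "b": 11}


def query_to_delta_pattern(query):
    """Turn a query such as 'd.e' into a list of intervals.

    Each entry is a semitone step (mod 12), or None where the step
    involves a wildcard and therefore matches anything.
    """
    pattern = []
    for prev, cur in zip(query, query[1:]):
        if prev == "." or cur == ".":
            pattern.append(None)  # interval touches a wildcard: unconstrained
        else:
            pattern.append((PITCH[cur] - PITCH[prev]) % 12)
    return pattern


print(query_to_delta_pattern("def"))      # two concrete intervals
print(query_to_delta_pattern("d.e.f.e"))  # nothing but unknowns
```

Under this model, a query that alternates notes and wildcards turns into a pattern made up entirely of unknown intervals, which is why it matches everything, while two adjacent notes anywhere in the query keep at least one real interval.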