The most merciful thing in the world is the inability of the human mind to correlate all its contents.
H. P. Lovecraft
Helping the environment

A lot of work is done in making linux suspend work better. For me it works perfect (from linux kernel 2.6.20 or so). Therefore I wanted to go a step further and let my class A, heavy power-using receiver switch off when my desktop computer suspends (to ram). And switch back on when my computer wakes up. The motivation is that I only listen music via my computer.

To achieve this I bought the Gembird Silver Shield, a USB-switchable power adapter. I was prepared to do some nice USB snooping and C programming to get this device working in linux, but (un)fortunately there was already a working utility for this device.

Configuration for suspend/hibernate is not so easy and documentation is sparse for the user mode utilities. Since it took me more than the usual googling I will summarize my conclusions here for later use. First the gnome-screensaver measures the idle time. After this timeout is expired the gnome-power-manager starts to measure his own timeout (so before suspend the two times will stack). When the gnome-power-manager times out it will look at the Inhibit flag. If there is no inhibiting (for example my rhythmbox pushes Inhibiting, because I do not want to suspend when music is playing) your computer will suspend.

Since a custom script should be added to switch the power-switch off via usb with sispmctl the suspend-backend is important. Gnome power manager can use multiple backends to go to suspend mode. This works via hal (the hardware abstraction layer). Configuration for this is in /usr/share/hal/information. HAL is responsible for calling the suspend-backend. The default suspend-backend on my machine is pm-utils. (which can be tested with pm-(suspend/hibernate/power-save etc.). I also have a package called hibernate (which can also suspend, confusing isnt' it?). A third one is suspend2 (which can also hibernate....). These backends have different ways of adding custom hooks.

To add a hook to hibernate I added a file called local in /etc/hibernate/scriptlets.d/. The API is as follows (ugly in my book):

# -*- sh -*-

UsbPowerSocketDown() {
      /usr/bin/sispmctl -f1
}
UsbPowerSocketUp() {
      /usr/bin/sispmctl -o1
}

AddUsbOptions() {
      AddSuspendHook 10 UsbPowerSocketDown
      AddResumeHook 10 UsbPowerSocketUp
      return 0
}

AddUsbOptions

Pm-utils has a much nicer API. To add a custom hook add a file to /etc/pm/sleep.d . This uses init style ordering. So look in /usr/lib/pm-utils/sleep.d/ for a proper number. I needed to talk to the usb-bus AFTER the modules were loaded, so a number lower than 50. So I added /etc/pm/sleep.d/10usbpoweroptions with content like this:

#!/bin/bash
case $1 in
    suspend)
        /usr/bin/sispmctl -f1 
    ;;
    resume)
        sleep 1
        /usr/bin/sispmctl -o1 
    ;;
esac

After all this fiddling it works like a charm! Now hopefully one standard will emerge; because how to achieve the same result with KDE I don't know. I had to manually patch rhythmbox to change calling (via dbus) the Inhibit method from org.gnome.powermanager to org.freedesktop.powermanagement (because i used a wrong combination of versions..), so this suggests a move in the right direction.

-- Filed under:

Posted by jochem on 2nd February 2008, last update on 3rd February 2008
Dead on

Good presentation on captchas by google. See this video. The group that is headed by the prof doing the presentation should easily be able to break current current used image captchas if their statements are true...

Perhaps I like it because perfectly reflects my own opinion though ;-).

-- Filed under:

Posted by jochem on 3rd April 2007, last update on 3rd April 2007
Faraway voice

Hacked a bit more on audio captchas lately, but the source is not in releasable form right now.. Anyway, I now recognize the audio captchas from microsoft 95% correct and from google (also blogger/blogspot) 60%+ by tweaking the segmentation. captchas.net (35%) and paypal.com (10%) are also doable, but some improvements are still needed.

Time to add some neural network learning.

-- Filed under:

Posted by jochem on 14th June 2006, last update on 14th June 2006
Defeating audio (voice) captchas

Introduction

For some years semi turing tests under the name of "captchas" can be found on the web, to prevent bots from filling in forms. When I first saw the visual variant I thought recognizing the characters with a computer algoritm should be easy. A bit of surfing and searching on the internet learned me that I was right, most were broken already. Reinventing the wheel is not very useful, so I left the topic alone.

Later I found a post about voice captchas. Since there was not too much information about this on the net and I was bored (ill at home), I decided to give it a shot. I started easy, willing to enhance the used algoritms to those used in speech recognition (like hmm, viterbi, baum-welch, entropy coding, etc.) when needed. This proved not to be necessary, the first feature complete (segmentation and matching) code worked relatively well on microsofts captchas. Later I tweaked it a bit to also work on google captchas.

On this page you can find proof of concept code to break voice captchas. Do not expect advanced software (pattern recnognition science is so much further) or code that can be used in other projects, I quitted the project when it worked. Initially (february 2006) I kept the code on my harddisk, but later (may 2006) I published it (see disclosure motivation).

How does it work

This is not a complete guide, but some pointers to the source (read it luke). As a starting point, consider the configtype struct:

typedef struct {
    int samplerate;
    int byterate;
    int winsize;
    int band_cnt;
    int word_length;
    int word_overlap;
    int threshold_energy;
    int file_offset;
    char trainfile[255];
} configtype;

The program starts with reading the audio file (in the header it could read the samplerate and byterate, but I am lazy). file_offset bytes are skipped in the beginning of the file, because google starts with a bell. The first step is that all samples are treated with a hamming window (arbitrary choice, most window types should do). The winsize is in samples (eg 512 samples on 8000 Hz provides a 64 ms window). Now the blocks are transformed into the frequency domain with a DFT After that the frequencies are put in band_cnt bins. These bins are not equally wide, the higher the frequency, the larger the band (this has to do with human hearing (mel/bark scale), but I doubt this is actually useful at the current incarnation of the program).

Now the program looks at the highest frequency bin. Every block that has more energy in a window than threshold_energy is considered a peak, and these peaks are used the segment the input file in the different spoken words. The word_length tells the program how many windows long a word is (so all words are considered the same length which is a current weakness of devoicecaptcha). word_overlap helps in localizing the peaks. When the locations of the words are know all frequency bins are written for word_length windows around the peaks. This is called the profile of the word.

The profiles for know words are put in trainfile. When a guess has to be made, the profiles for the words in the file are subtracted from those in the trainfile and the smallest deviation is chosen as the proper word. That is all.

The algoritms in devoicecaptcha are at this moment really naive. There are a lot of possible improvements. Perhaps in the future I will enhance the program a bit, for now I think the 33% (as on googles captchas) is good enough (and I am too lazy to reimplement htk, which should do the trick also (I guess)).

Proof of concept

The code is rather messy, but since this applies to most p0f code consider that 1337 ;-). Download devoicecaptcha.c and compile it with it:

gcc -lfftw3 -std=c99 devoicecaptcha.c

As you can see you need fftw, an allround fourier transform library, which is packaged for most distributions, so you can be lazy (apt-get install fftw3-dev or similar).

When started with ./a.out captcha.wav you also need a data set (a msn one and a google one are available. If you have downloaded the same captchas (see links) as I have, it will print a guess on stdout.

As said before, devoicecaptcha works with a comparison to a trained set. To build up a training set and test the effectiveness of various parameters you can start devoicecaptcha with a third bogus argument, eg ./a.out captcha.wav --print.

What I did was download a large set of captchas with lwp and transcribed them with the proper words with something like:

for i in google/*.wav; do aplay $i &> /dev/null &; read j; mv $i google/$j.wav; done

I ended up with a directory with filenames like "123456.wav" where 123456 are the digits spoken in the captcha. On this directory I unleased a small ruby script, which splits the files in a training and testing set, builds a training set and tests the rest. This script can be found under train.rb.

If you have broken other voice captchas with my code (or with an addition to my code), please let me know, so I can update this page.

MSN

MSN (passport) audio captchas are really weak. Only digits are used, there are always ten digits and the noise is weak and constant. The distance between the words is relatively constant. Devoicecaptcha guesses all ten digits correct on around the 75% of all cases, with a training set of about 40 files.

A data set which can be used for the english MSN (aka passport, aka msn live) voice captchas (which I got from passport.net) can be downloaded under the name msn.txt. It is also possible to create your own training data (see above).

Google

Googles voice captchas are more difficult to break than the captchas by microsoft. Google employs different speakers, uses better noise artifacts and a random number of words. The dictionary is (as microsofts) limited to digits only. The devoicecaptcha program scores around 33% on these voice captchas with a training set of 60 files. This is high enough for use in a bot.

A data set which can be used for the google captchas (in english, google also provides captchas in multiple languages) can be found under google.txt. The files were found at Google new account.

These captchas are also in use by blogger and blogspot for comments

Others

If you know other voice captcha systems, let me know. Perhaps I will have some time to look into them (and perhaps not). I will at least add them to the links section on this page, so together with the provided source other people can try to beat them.

Disclosure motivation

I did not release the source code on this page without hesitation, because it might help spammers in their goals. And I hate spam. However there are some reasons I released the code anyway:

Some people might ask what kind of solutions I do suggest for solving the spam problem. Spamassasin catches thousands of spam mails for me; it is expensive in cpu cycles (so putting spammers in jail is preferred), but the multi-tiered approach (neural network detection together with several lists of wrong-doers) works relatively well and can be applied to other forms of spam.

Playing the cat/mouse game with more difficult captchas, when the previous challenge is broken will work, but is not satisfactory in the end. I encounter more human unsolvable captchas everyday. I do understand that corporations play this game however; in the real world thresholds do help.

Links

Information

Broken voice captchas

Working on voice captchas

Not working on voice captchas

Do you know different implementations of audio captchas? Please contact me.


-- Filed under:

Posted by jochem on 20th February 2006, last update on 20th February 2006
Foucault's Pendulum

Some rights reserved by Feuillu (http://www.flickr.com/photos/feuilllu/) These weeks, with much pleasure, I have been reading Umberto Eco's Foucault's Pendulum. The writer truly knows a lot about history, philosophy, literature, different cultures and is very erudite. So besides enjoying the good plot, reading Foucault's Pendulum learns me a lot. However, on one thing the writer is a bit off. In the beginning of the book the main person tries to break into a computer by writing a (inefficient) computer program which generates anagrams of 'JHVH'.

Accidentally two weeks before reading this passage I wrote for my DND group a small program which solves a similar question in general. Since this group is too lazy to solve puzzles, I put the program on line; it is called rotx. Perhaps someone can make good use of it. It finds all rotx puzzles (with x = [1..25]) which deliver again a known word.

So, for example layout is the 'encrypted' version of fusion (rot6, so a->h, b->i, c->j, d->i, e->k, f->l, etc.), curly -> wolfs, arena - river, etc.

In dutch some solutions are urnen -> lieve, opaal -> hitte, knijp -> bezag and kerk->gang.

To use rotx you need a wordlist, for example as generated by aspell:

   aspell --lang=en dump master | ./rotx - > rotated.txt

The output (in the example above copied to rotated.txt) contains all rotated words which can also be found in the original wordlist..

The first incarnation of the program was in bash/sed/tr and awfully slow. (I had to try though, "No premature optimization!"). It should take two weeks to process a 1.5 MB English wordfile. (Eco's Basic script should take what, years??). Enter C++ and STL. The direct approach (rotating all words through the entire alphabet and looking all results up in the original list) should still take around 20 hours. So I cooked up an algorithm which uses more memory, but finishes in approximately 15 seconds on my old and crusty AMD duron 850!

The source can be found at /downloads/rotx.cc.html.

-- Filed under:

Posted by jochem on 22nd May 2005, last update on 22nd May 2005

<- older entries