Helping the environment
A lot of work is done in making linux suspend work better. For me it works perfect (from linux kernel 2.6.20 or so). Therefore I wanted to go a step further and let my class A, heavy power-using receiver switch off when my desktop computer suspends (to ram). And switch back on when my computer wakes up. The motivation is that I only listen music via my computer.
To achieve this I bought the Gembird Silver Shield, a USB-switchable power adapter. I was prepared to do some nice USB snooping and C programming to get this device working in linux, but (un)fortunately there was already a working utility for this device.
Configuration for suspend/hibernate is not so easy and documentation is sparse for the user mode utilities. Since it took me more than the usual googling I will summarize my conclusions here for later use. First the gnome-screensaver measures the idle time. After this timeout is expired the gnome-power-manager starts to measure his own timeout (so before suspend the two times will stack). When the gnome-power-manager times out it will look at the Inhibit flag. If there is no inhibiting (for example my rhythmbox pushes Inhibiting, because I do not want to suspend when music is playing) your computer will suspend.
Since a custom script should be added to switch the power-switch off via usb with sispmctl the suspend-backend is important. Gnome power manager can use multiple backends to go to suspend mode. This works via hal (the hardware abstraction layer). Configuration for this is in /usr/share/hal/information. HAL is responsible for calling the suspend-backend. The default suspend-backend on my machine is pm-utils. (which can be tested with pm-(suspend/hibernate/power-save etc.). I also have a package called hibernate (which can also suspend, confusing isnt' it?). A third one is suspend2 (which can also hibernate....). These backends have different ways of adding custom hooks.
To add a hook to hibernate I added a file called local in /etc/hibernate/scriptlets.d/. The API is as follows (ugly in my book):
# -*- sh -*-
UsbPowerSocketDown() {
/usr/bin/sispmctl -f1
}
UsbPowerSocketUp() {
/usr/bin/sispmctl -o1
}
AddUsbOptions() {
AddSuspendHook 10 UsbPowerSocketDown
AddResumeHook 10 UsbPowerSocketUp
return 0
}
AddUsbOptions
Pm-utils has a much nicer API. To add a custom hook add a file to /etc/pm/sleep.d . This uses init style ordering. So look in /usr/lib/pm-utils/sleep.d/ for a proper number. I needed to talk to the usb-bus AFTER the modules were loaded, so a number lower than 50. So I added /etc/pm/sleep.d/10usbpoweroptions with content like this:
#!/bin/bash
case $1 in
suspend)
/usr/bin/sispmctl -f1
;;
resume)
sleep 1
/usr/bin/sispmctl -o1
;;
esac
After all this fiddling it works like a charm! Now hopefully one standard will emerge; because how to achieve the same result with KDE I don't know. I had to manually patch rhythmbox to change calling (via dbus) the Inhibit method from org.gnome.powermanager to org.freedesktop.powermanagement (because i used a wrong combination of versions..), so this suggests a move in the right direction.
[Permalink] -- Filed under: [music] [linux] [science]
Dead on
Good presentation on captchas by google. See this video. The group that is headed by the prof doing the presentation should easily be able to break current current used image captchas if their statements are true...
Perhaps I like it because perfectly reflects my own opinion though ;-).
[Permalink] -- Filed under: [linux] [science] [web]
Faraway voice
Hacked a bit more on audio captchas lately, but the source is not in releasable form right now.. Anyway, I now recognize the audio captchas from microsoft 95% correct and from google (also blogger/blogspot) 60%+ by tweaking the segmentation. captchas.net (35%) and paypal.com (10%) are also doable, but some improvements are still needed.
Time to add some neural network learning.
[Permalink] -- Filed under: [linux] [science]
Defeating audio (voice) captchas
Introduction
For some years semi turing tests under the name of "captchas" can be found on the web, to prevent bots from filling in forms. When I first saw the visual variant I thought recognizing the characters with a computer algoritm should be easy. A bit of surfing and searching on the internet learned me that I was right, most were broken already. Reinventing the wheel is not very useful, so I left the topic alone.
Later I found a post about voice captchas. Since there was not too much information about this on the net and I was bored (ill at home), I decided to give it a shot. I started easy, willing to enhance the used algoritms to those used in speech recognition (like hmm, viterbi, baum-welch, entropy coding, etc.) when needed. This proved not to be necessary, the first feature complete (segmentation and matching) code worked relatively well on microsofts captchas. Later I tweaked it a bit to also work on google captchas.
On this page you can find proof of concept code to break voice captchas. Do not expect advanced software (pattern recnognition science is so much further) or code that can be used in other projects, I quitted the project when it worked. Initially (february 2006) I kept the code on my harddisk, but later (may 2006) I published it (see disclosure motivation).
How does it work
This is not a complete guide, but some pointers to the source (read it luke). As a starting point, consider the configtype struct:
typedef struct {
int samplerate;
int byterate;
int winsize;
int band_cnt;
int word_length;
int word_overlap;
int threshold_energy;
int file_offset;
char trainfile[255];
} configtype;
The program starts with reading the audio file (in the header it could read the samplerate and byterate, but I am lazy). file_offset bytes are skipped in the beginning of the file, because google starts with a bell. The first step is that all samples are treated with a hamming window (arbitrary choice, most window types should do). The winsize is in samples (eg 512 samples on 8000 Hz provides a 64 ms window). Now the blocks are transformed into the frequency domain with a DFT After that the frequencies are put in band_cnt bins. These bins are not equally wide, the higher the frequency, the larger the band (this has to do with human hearing (mel/bark scale), but I doubt this is actually useful at the current incarnation of the program).
Now the program looks at the highest frequency bin. Every block that has more energy in a window than threshold_energy is considered a peak, and these peaks are used the segment the input file in the different spoken words. The word_length tells the program how many windows long a word is (so all words are considered the same length which is a current weakness of devoicecaptcha). word_overlap helps in localizing the peaks. When the locations of the words are know all frequency bins are written for word_length windows around the peaks. This is called the profile of the word.
The profiles for know words are put in trainfile. When a guess has to be made, the profiles for the words in the file are subtracted from those in the trainfile and the smallest deviation is chosen as the proper word. That is all.
The algoritms in devoicecaptcha are at this moment really naive. There are a lot of possible improvements. Perhaps in the future I will enhance the program a bit, for now I think the 33% (as on googles captchas) is good enough (and I am too lazy to reimplement htk, which should do the trick also (I guess)).
Proof of concept
The code is rather messy, but since this applies to most p0f code consider that 1337 ;-). Download devoicecaptcha.c and compile it with it:
gcc -lfftw3 -std=c99 devoicecaptcha.c
As you can see you need fftw, an allround fourier transform library, which is packaged for most distributions, so you can be lazy (apt-get install fftw3-dev or similar).
When started with ./a.out captcha.wav you also need a data set (a msn one and a google one are available. If you have downloaded the same captchas (see links) as I have, it will print a guess on stdout.
As said before, devoicecaptcha works with a comparison to a trained set. To build up a training set and test the effectiveness of various parameters you can start devoicecaptcha with a third bogus argument, eg ./a.out captcha.wav --print.
What I did was download a large set of captchas with lwp and transcribed them with the proper words with something like:
for i in google/*.wav; do aplay $i &> /dev/null &; read j; mv $i google/$j.wav; done
I ended up with a directory with filenames like "123456.wav" where 123456 are the digits spoken in the captcha. On this directory I unleased a small ruby script, which splits the files in a training and testing set, builds a training set and tests the rest. This script can be found under train.rb.
If you have broken other voice captchas with my code (or with an addition to my code), please let me know, so I can update this page.
MSN
MSN (passport) audio captchas are really weak. Only digits are used, there are always ten digits and the noise is weak and constant. The distance between the words is relatively constant. Devoicecaptcha guesses all ten digits correct on around the 75% of all cases, with a training set of about 40 files.
A data set which can be used for the english MSN (aka passport, aka msn live) voice captchas (which I got from passport.net) can be downloaded under the name msn.txt. It is also possible to create your own training data (see above).
Googles voice captchas are more difficult to break than the captchas by microsoft. Google employs different speakers, uses better noise artifacts and a random number of words. The dictionary is (as microsofts) limited to digits only. The devoicecaptcha program scores around 33% on these voice captchas with a training set of 60 files. This is high enough for use in a bot.
A data set which can be used for the google captchas (in english, google also provides captchas in multiple languages) can be found under google.txt. The files were found at Google new account.
These captchas are also in use by blogger and blogspot for comments
Others
If you know other voice captcha systems, let me know. Perhaps I will have some time to look into them (and perhaps not). I will at least add them to the links section on this page, so together with the provided source other people can try to beat them.
Disclosure motivation
I did not release the source code on this page without hesitation, because it might help spammers in their goals. And I hate spam. However there are some reasons I released the code anyway:
- I do not believe captchas actually work. Almost all visual captchas are broken already (read for example Microsofts paper on visual captchas or ez-gimpy which can defeat the "human insolvable" captchas at yahoo). Or look at the versatile pwntcha. Although I do believe humans can fool computers for quite some time in the future, I suspect that computers can always beat computer generated challenges (without a "secret" as in PKI).
- Spam is a human problem and I believe human problems should be solved by humans. Legislation and law enforcement are in my opinion the best ways to deal with spammers. If captchas did no harm anyone this would of course not be enough reason for releasing this code, but
- Captchas make the web a more difficult place for disabled people (see http://www.w3.org/ and more annoying for everyone. I hope the community will be motivated to find other solutions (and I am happy that the w3 organization cares for people with disabilities and a usable web for everyone, including the deaf-blind).
- I am a proponent of full disclosure.
Some people might ask what kind of solutions I do suggest for solving the spam problem. Spamassasin catches thousands of spam mails for me; it is expensive in cpu cycles (so putting spammers in jail is preferred), but the multi-tiered approach (neural network detection together with several lists of wrong-doers) works relatively well and can be applied to other forms of spam.
Playing the cat/mouse game with more difficult captchas, when the previous challenge is broken will work, but is not satisfactory in the end. I encounter more human unsolvable captchas everyday. I do understand that corporations play this game however; in the real world thresholds do help.
Links
Information
- Wikipedias entry on captchas is a very good starting point on this topic (as usual) and contains links to pages about breaking visual captchas
- Inaccessibility of CAPTCHA by w3.org
- Microsofts paper on visual captchas
- PWNtcha - captcha decoder by sam hocevar
Broken voice captchas
- Google signup
- Microsofts passport service
- blogger and blogspot also use google voice captchas, so these are broken to.
Working on voice captchas
- Google.com A slightly better segmentation and better database gives me 60%+ recognition rate, but this is yet unreleased.
- Paypal audio captcha Relatively well done (best one I found actually), but I expect to release support for this captcha "any time now"
- Captchas.net HEAD works now. I will release this soon, but I am too lazy to generate a training set, so stuck at 33% recognition rate for now.
Not working on voice captchas
- Standards schmmandards proposal The use of generated speech is probably weak; positive point is the use of human parsing ("the three numbers are one, two three"). For now I haven"t seen it actually used.
Do you know different implementations of audio captchas? Please contact me.
[Permalink] -- Filed under: [linux] [science]
Foucault's Pendulum
These weeks, with much pleasure, I have been reading
Umberto Eco's Foucault's Pendulum. The writer truly knows a lot about
history, philosophy, literature, different cultures and is very erudite. So
besides enjoying the good plot, reading Foucault's Pendulum learns me a lot. However,
on one thing the writer is a bit off. In the beginning of the book the main
person tries to break into a computer by writing a (inefficient) computer
program which generates anagrams of 'JHVH'.
Accidentally two weeks before reading this passage I wrote for my DND group a small program which solves a similar question in general. Since this group is too lazy to solve puzzles, I put the program on line; it is called rotx. Perhaps someone can make good use of it. It finds all rotx puzzles (with x = [1..25]) which deliver again a known word.
So, for example layout is the 'encrypted' version of fusion (rot6, so a->h, b->i, c->j, d->i, e->k, f->l, etc.), curly -> wolfs, arena - river, etc.
In dutch some solutions are urnen -> lieve, opaal -> hitte, knijp -> bezag and kerk->gang.To use rotx you need a wordlist, for example as generated by aspell:
aspell --lang=en dump master | ./rotx - > rotated.txt
The output (in the example above copied to rotated.txt) contains all rotated words which can also be found in the original wordlist..
The first incarnation of the program was in bash/sed/tr and awfully slow. (I had to try though, "No premature optimization!"). It should take two weeks to process a 1.5 MB English wordfile. (Eco's Basic script should take what, years??). Enter C++ and STL. The direct approach (rotating all words through the entire alphabet and looking all results up in the original list) should still take around 20 hours. So I cooked up an algorithm which uses more memory, but finishes in approximately 15 seconds on my old and crusty AMD duron 850!
The source can be found at /downloads/rotx.cc.html.
[Permalink] -- Filed under: [linux] [science]

