Monday, 25 January 2016

How to train your Docker using voice recognition

I spent last weekend playing about with some voice recognition tools.  There are lots out there, but PocketSphinx seemed pretty cool.  The plan was to get PocketSphinx running in a container and use the voice decoding to start and stop another container with commands such as "docker start chrome" and "docker stop chrome".  Similar to "Ok Google" and "Siri", I wanted to control containers with the power of speech ... Ok Docker.

The first few hours were spent installing dependencies, tweaking knobs and blowing whistles trying to get the mic to work.  The build steps are all documented in the Dockerfile, which can be found here.  The usage can also be found in the Dockerfile.

There is one Python script, which can be used as an entrypoint or run manually inside the container for debugging.

When the okdocker container is run, we share the sound device (/dev/snd) with the container.

Demo :

docker run -it --privileged --device /dev/snd -v `pwd`/wav:/opt/okdocker/wav --group-add audio thshaw/okdocker --demo

** If you copy and paste the line above be sure to check it is copied intact **

You will be prompted to record some speech for 3 seconds.  This will then be decoded and the text will be output on screen. The okdocker image has the US English language model included.
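For anyone curious what a record-then-decode step might look like, here is a rough Python sketch. It is not the okdocker script itself, which may well do this differently. It assumes the ALSA `arecord` tool and the `pocketsphinx_continuous` binary are installed and on the PATH, and the function names are made up for illustration. The recording is 16 kHz, mono, 16-bit, which is the format the stock US English model expects.

```python
import subprocess


def record_command(wav_path, seconds=3):
    """Build an arecord command line (assumes ALSA's arecord is installed)."""
    # -d duration in seconds, -f sample format, -r sample rate, -c channels
    return ["arecord", "-d", str(seconds), "-f", "S16_LE",
            "-r", "16000", "-c", "1", wav_path]


def decode_command(wav_path):
    """Build a PocketSphinx command line that decodes a .wav file."""
    # Assumption: pocketsphinx_continuous is on the PATH and uses its
    # default (US English) model.
    return ["pocketsphinx_continuous", "-infile", wav_path]


def record_and_decode(wav_path, seconds=3):
    """Record from the default mic, then return whatever the decoder prints."""
    subprocess.run(record_command(wav_path, seconds), check=True)
    result = subprocess.run(decode_command(wav_path), check=True,
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout.strip()
```

Calling `record_and_decode("wav/test.wav")` would need a working microphone, so it is only worth trying inside a container that has /dev/snd shared in.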

The recognition accuracy is quite poor at the moment but the plan is to train the language model to recognise my Northern Irish accent. This may take a number of weeks/years since the human ear, evolved over millions of years, still doesn't understand the Northern Irish accent.

I'll keep updating the source over the next few weekends and hopefully have more accurate decoding which can be pattern matched to actual Docker commands. If anyone wants to expand on this and maybe even present a working prototype at a Docker Meetup or DockerCon 2016 then that would be fantastic.
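As a starting point for the pattern matching, something like the Python sketch below could turn decoded text into a Docker command. It is only a sketch under my own assumptions: the `parse_command` name, the regular expression and the whitelist of allowed verbs are all hypothetical and not part of the okdocker source.

```python
import re

# Hypothetical whitelist: only these verbs are turned into commands.
ALLOWED_ACTIONS = {"start", "stop"}

# Matches phrases such as "docker start chrome", optionally prefixed by "ok".
COMMAND_PATTERN = re.compile(r"^(?:ok\s+)?docker\s+(\w+)\s+([\w.-]+)$")


def parse_command(decoded_text):
    """Return a docker argument list for a recognised phrase, or None."""
    # Lower-case the text and collapse runs of whitespace before matching.
    normalised = " ".join(decoded_text.lower().split())
    match = COMMAND_PATTERN.match(normalised)
    if match is None:
        return None
    action, container = match.groups()
    if action not in ALLOWED_ACTIONS:
        return None
    return ["docker", action, container]
```

With this, `parse_command("Docker START chrome")` gives `["docker", "start", "chrome"]`, while anything outside the whitelist, such as "docker rm chrome", gives `None`. The returned list could then be handed to `subprocess.run`. Keeping a whitelist seems sensible while the recognition accuracy is still poor, since a misheard word should not end up running an arbitrary command.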

Sample Output :

docker run -it --privileged --device /dev/snd -v `pwd`/wav:/opt/okdocker/wav --group-add audio thshaw/okdocker --usage

Ok Docker (Version : 0.1)

    Command Line Usage :

        ./ --option <argument>

    Options :

        record < .wav filename >
        playback < .wav filename >
        decode < .wav filename >

        demo (Recording and decoding demo)