Uncertain Substance

Uncertain substance: The Viterbi Algorithm
A speech recognition algorithm searches radio waves for conversations about money. As an ongoing investigation of the Viterbi algorithm, this project seeks to understand the agency of a mathematical entity that operates as a structural thread within the fabric of contemporary society.

About
Conceived in 1966, the Viterbi was originally used in digital signal processing, where it detects and corrects errors in digital codes. Its use has since extended to speech recognition, DNA analysis, video encryption, deep-space and wireless communications systems. Physical manifestations of this algorithm exist as microchips installed in billions of mobile devices worldwide, enabling communications networks to permeate every conceivable space and blurring the distinction between home, work and social environments.

Used to identify patterns and trends in human behaviour, the Viterbi plays a role in automated systems that interpret, record and report on human activity. These systems increasingly make economic decisions, govern responses to crime, disaster and health, and manage the everyday flow of cities. The Viterbi operates at a deep social level as it constructs new sets of social relations and radically shapes the development of our cities.
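To make the idea of a "most probable sequence" concrete, here is a minimal Python 3 sketch of the textbook Viterbi recursion over a tiny hidden Markov model. It is an illustration only and not the decoder used in this project: the state names, observation names and all probabilities are invented toy values, and production recognisers work with log-probabilities over far larger models.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    # best[t][s] = (probability of the best path ending in s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Keep only the single best way of arriving in state s.
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)

    # Trace the stored back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return prob, path


# Invented toy model: is the radio carrying speech or static?
STATES = ("speech", "static")
START_P = {"speech": 0.6, "static": 0.4}
TRANS_P = {
    "speech": {"speech": 0.7, "static": 0.3},
    "static": {"speech": 0.4, "static": 0.6},
}
EMIT_P = {
    "speech": {"clear": 0.5, "muffled": 0.4, "hiss": 0.1},
    "static": {"clear": 0.1, "muffled": 0.3, "hiss": 0.6},
}

if __name__ == "__main__":
    prob, path = viterbi(["clear", "muffled", "hiss"], STATES, START_P, TRANS_P, EMIT_P)
    print(path, prob)
```

With these toy numbers, the observations clear, muffled, hiss decode to the path speech, speech, static. The point of the recursion is that only the best route into each state is kept at every step, so the work grows linearly with the length of the signal rather than exponentially.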

Installation Description
I tested two versions of the system: one as an installation in an old porter's office at Goldsmiths University, the other as a mobile version built into a shopping trolley, which I tested at Moving Forest at Chelsea College of Art. The porter's office version displayed two very dull-looking computers: a speech recognition server (SRS) built around the open source project CMUSphinx, and a software defined radio server (SDRS) built around a hacked £10 USB TV tuner. The SRS listened to the audio output of the SDRS. If it detected speech, it stayed on that radio station in the hope of finding a keyword from a list (Money, Credit, Debt, Thousand, Billion, Trillion etc). If it didn't find any words within 20 seconds, it triggered the SDRS to find another station, where the process began again.
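The listen, wait and retune behaviour described above can be sketched as a small decision object. This is a simplified Python 3 illustration, not the project's script: the class and method names are hypothetical, the 20 second timeout is taken from the description (the script listed further down uses a 15 second threshold), and the three-word minimum mirrors the check in that script.

```python
class StationHopper:
    """Decide when to abandon a radio station, based on recognised speech."""

    def __init__(self, keywords, timeout=20.0, min_words=3):
        self.keywords = {k.upper() for k in keywords}
        self.timeout = timeout        # seconds of boredom tolerated
        self.min_words = min_words    # utterance length that counts as conversation
        self.last_interest = 0.0      # time of the last interesting utterance

    def hear(self, text, now):
        """Record a recognised utterance and return the keywords it contains."""
        words = text.upper().split()
        found = [w for w in words if w in self.keywords]
        # A keyword, or simply enough words, keeps us on this station.
        if found or len(words) >= self.min_words:
            self.last_interest = now
        return found

    def should_retune(self, now):
        """True once nothing interesting has been heard for `timeout` seconds."""
        if now - self.last_interest >= self.timeout:
            self.last_interest = now  # restart the clock for the next station
            return True
        return False


if __name__ == "__main__":
    hopper = StationHopper(["money", "debt"])
    print(hopper.hear("the national debt rose", now=5.0))
    print(hopper.should_retune(now=10.0), hopper.should_retune(now=25.0))
```

Passing the time in as an argument, instead of reading the clock inside the class, keeps the logic easy to test without waiting for real seconds to pass.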

The porter's office added its own narrative, which I discovered while cleaning it out and getting rid of years of grime and dumped objects. It recorded a pretty depressing history: there were old letters of redundancy, a broken pair of spectacles, betting slips, and a small screen marked "payroll". I incorporated these elements in the space as a subtle way of illustrating the entanglement of algorithms in everyday lives and other media systems, where algorithmic reporting and profiling informs and influences our decision-making processes even though these outputs haven't necessarily been planned or programmed. The technology is then exerting its own power, and it's that mechanism that I want to understand.
Further description of the project can be found in an interview I did with Regine Debatty of We Make Money Not Art.

The Build
Radio server, speech recognition server, shopping trolley, CCTV observation screen, receipt printer, speaker, antenna, notes, betting slips, spectacles. For the speech recognition server I used the FLOSS project CMUSphinx, and for the radio tuning I hacked a cheap £10 USB TV tuner to create a simple software defined radio.

Code

Uses GQRX (C++), CMU Sphinx (C with a Python wrapper) and Python servers to communicate between components situated on multiple machines. The install process is not for the faint-hearted! Follow the instructions for each of the software packages, then use the scripts below to connect them all together.
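The glue between the components is a very small request and reply exchange over TCP: in the scripts on this page the recogniser sends "Change Frequency" to port 50000 and the key server answers "tuning". The following is a minimal Python 3 sketch of that kind of exchange, handy for checking that two machines can talk before installing everything else. The function names are illustrative and do not appear in the project's scripts, and the demo uses the loopback address with a port picked by the operating system.

```python
import socket
import threading


def serve_once(server, reply):
    """Accept one connection, read its message, send `reply`, then close."""
    conn, _addr = server.accept()
    with conn:
        message = conn.recv(1024)
        conn.sendall(reply)
    return message


def send_message(host, port, message):
    """Connect, send `message` and return the server's answer."""
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(message)
        return s.recv(1024)


def round_trip(message, reply):
    """Run a throwaway server on loopback and exchange one message with it."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
    server.listen(1)
    port = server.getsockname()[1]

    received = []
    worker = threading.Thread(
        target=lambda: received.append(serve_once(server, reply))
    )
    worker.start()
    answer = send_message("127.0.0.1", port, message)
    worker.join()
    server.close()
    return received[0], answer


if __name__ == "__main__":
    print(round_trip(b"Change Frequency", b"tuning"))
```

Reading the request before replying and closing each connection after the reply keeps the exchange tidy on both ends.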

Startup script

cd /script/root/dir
nohup gqrx-build-desktop-Desktop_Qt_4_8_1_for_GCC__Qt_SDK__Release/gqrx &
sleep 5
nohup python espeakserver.py &
nohup python keyserver.py &
python voice.py

Python voice recognition code using CMU Sphinx

#!/usr/bin/env python
# Tom Keene
# Script evolved from: Carnegie Mellon University.
# You may modify and redistribute this file under the same terms as
# the CMU Sphinx system.  See
# http://cmusphinx.sourceforge.net/html/LICENSE for more information.
# =======TODO=========
# - Check / set current audio model.
# - Create audio model.
# - Auto soundcard swap.
# - Keep limited text
# ====================
import threading
import re
import time
import socket
import sys
import errno
import pygtk
pygtk.require('2.0')
import gtk
import gobject
import pygst
pygst.require('0.10')
gobject.threads_init()
import gst
 
class DemoApp(object):
 
    """GStreamer/PocketSphinx Demo Application"""
    def __init__(self):
        """Initialize a DemoApp object"""
        self.init_gui()
        self.init_gst()
        self.init_keywords()
        self.init_timer()
        self.init_client()
 
    def init_client(self):
        # Key server: asked to change frequency
        self.TCP_IP = '127.0.0.1'
        self.TCP_PORT = 50000
        # "Found Money" server
        self.TCP_IP2 = '127.0.0.1'
        self.TCP_PORT2 = 50001
        # Shared vars
        self.BUFFER_SIZE = 1024
        self.MESSAGE="Change Frequency"
 
    def client_connection(self): 
        """KeyserverToChangeFrequency"""
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((self.TCP_IP, self.TCP_PORT))
        s.send(self.MESSAGE)
        data = s.recv(self.BUFFER_SIZE)
        s.close()
        # print "Received:", data

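    def init_timer(self):
        self.starttimer()
        self.schedule_check(5.0)

    def schedule_check(self, delay):
        """Run checkbordem after `delay` seconds on a daemon timer thread."""
        # Timer.start() returns None, so keep the Timer object itself
        self.myshed = threading.Timer(delay, self.checkbordem)
        self.myshed.daemon = True  # do not keep the process alive after the GUI quits
        self.myshed.start()

    def checkbordem(self):
        mytimer = self.checktimer()
        if(mytimer>=15):
            self.starttimer()
            self.client_connection()
            print "Don't understand || Bad reception"
            print "Changing Frequency"
        self.schedule_check(2.0)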
 
    def checktimer(self):
        return time.time()-self.timer
 
    def starttimer(self):
        self.timer = time.time()
 
    def init_keywords(self):
        """Load External File With Keyword List"""
        with open("keywords.txt") as f:  # close the file once it has been read
            keywords = f.read()
        keywords = keywords.replace("\n", '')
        keywords = keywords.replace(' ', '')
        keywords = keywords.upper()
        keywords = keywords.split(',')
        print "KEYWORDS:"
        print keywords
        self.keywords = keywords
 
    def init_gui(self):
        """Initialize the GUI components"""
        # Setup the window
        self.window = gtk.Window()
        self.screen = self.window.get_screen()
        w = self.screen.get_width()
        h = self.screen.get_height() / 3
        self.window.connect("delete-event", gtk.main_quit)
        self.window.set_default_size(w, h)
        self.window.set_usize(w, h) # make window fixed size
        self.window.set_position(gtk.WIN_POS_CENTER)
        self.window.set_border_width(3)
        self.window.set_keep_above(0)
        self.window.set_title("<!-----Searching Conversation-----!>")
        self.window.move(0,0) 
        vbox = gtk.VBox()  
 
        # Manage the textarea
        self.textbuf = gtk.TextBuffer()
        self.text = gtk.TextView(self.textbuf)
        self.text.set_wrap_mode(gtk.WRAP_WORD)
        self.text.set_justification(gtk.JUSTIFY_CENTER)
        vbox.pack_start(self.text)
 
        # Setup the button
        #self.button = gtk.ToggleButton("Report")
        #self.button.connect('clicked', self.button_clicked)
        #vbox.pack_start(self.button, False, False, 2) # reference: expand, fill, padding

        self.window.add(vbox)
        self.window.show_all()
 
    def init_gst(self):
        """Initialize the speech components"""
        # Set audio source to gconfaudiosrc OR  alsasrc OR pulseaudiosrc OR jacksrc
        self.pipeline = gst.parse_launch('alsasrc ! audioconvert ! audioresample '
                                         + '! vader name=vad auto-threshold=true '
                                         + '! pocketsphinx name=asr ! fakesink')
 
        asr = self.pipeline.get_by_name('asr')
        asr.connect('partial_result', self.asr_partial_result)
        asr.connect('result', self.asr_result)
        asr.set_property('configured', True)
 
        bus = self.pipeline.get_bus()
        bus.add_signal_watch()
        bus.connect('message::application', self.application_message)
 
        #self.pipeline.set_state(gst.STATE_PAUSED)
        self.pipeline.set_state(gst.STATE_PLAYING)
 
    def asr_partial_result(self, asr, text, uttid):
        """Forward partial result signals on the bus to the main thread."""
        struct = gst.Structure('partial_result')
        struct.set_value('hyp', text)
        struct.set_value('uttid', uttid)
        asr.post_message(gst.message_new_application(asr, struct))
 
    def asr_result(self, asr, text, uttid):
        """Forward result signals on the bus to the main thread."""
        struct = gst.Structure('result')
        struct.set_value('hyp', text)
        struct.set_value('uttid', uttid)
        asr.post_message(gst.message_new_application(asr, struct))
 
    def application_message(self, bus, msg):
        """Receive application messages from the bus."""
 
        msgtype = msg.structure.get_name()
        self.partial = 0
        if msgtype == 'partial_result':
            self.partial_result(msg.structure['hyp'], msg.structure['uttid'])
            if self.partial == 0:
                #print "Viterbi: Defining most probable sequence"
                self.partial = 1
 
        elif msgtype == 'result':
            # Print complete message to text box
            hyp = msg.structure['hyp']
            self.final_result(hyp, msg.structure['uttid'])
            self.partial = 0
            searchtext = hyp
            nums = len(hyp.split(" "))
            if nums >= 3:
                print "Interesting conversation: "+str(nums)+" words"
                print "Continue search on this frequency"
                self.starttimer()
 
            # Perform keyword search
            for item in self.keywords:
                if searchtext.find(item) > -1:
                    print "!!!!Matched Keyword:"+item
                    self.starttimer()
                    # Notify the "Found Money" server
                    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                    s.connect((self.TCP_IP2, self.TCP_PORT2))
                    s.send("Found - "+item)
                    data = s.recv(self.BUFFER_SIZE)
                    s.close()
 
            # Create a new paragraph / divider
            self.textbuf.insert_at_cursor(" | ")               
            if(self.textbuf.get_char_count()>1800):
                self.textbuf.set_text("TEXT BUFFER: ")
 
    def partial_result(self, hyp, uttid):
        """Delete any previous selection, insert text and select it."""
        # All this stuff appears as one single action
        self.textbuf.begin_user_action()
        self.textbuf.delete_selection(True, self.text.get_editable())
        self.textbuf.insert_at_cursor(hyp)
        ins = self.textbuf.get_insert()
        iter = self.textbuf.get_iter_at_mark(ins)
        iter.backward_chars(len(hyp))
        self.textbuf.move_mark(ins, iter)
        self.textbuf.end_user_action()
        nums = len(hyp.split(" "))
        if(nums>=5):
            self.starttimer()
 
    def final_result(self, hyp, uttid):
        """Insert the final result."""
        # All this stuff appears as one single action
        self.textbuf.begin_user_action()
        self.textbuf.delete_selection(True, self.text.get_editable())
        #self.textbuf.delete()
        print " "
        print "Viterbi matched most likely text:"
        print hyp
        print " "
        self.textbuf.insert_at_cursor(hyp)
        self.textbuf.end_user_action()
 
    #def button_clicked(self, button):
    #    """Handle button presses."""
    #    if button.get_active():
    #        button.set_label("Report")
    #        #self.pipeline.set_state(gst.STATE_PLAYING)
    #    else:
    #        button.set_label("Report:2")
    #       # self.pipeline.set_state(gst.STATE_PAUSED)
    #        #vader = self.pipeline.get_by_name('vad')
    #        #vader.set_property('silent', True)

app = DemoApp()
gtk.main()

List of keywords to search for in keywords.txt

account,add,asset,bank, balance,billion,borrow,broke,buy,cash,cheque,check,cheap,cleared,coin,
commission,consume,contract,credit,debt,dollar,dosh,eight,eleven,fifty,fifth,five,four,funds,
hundred,invest,market,minus,million,money,note,nine,one,owed,plus,pound,rate,
record,rate,share,stock,six,seven,secure,sale,shop,tax,term,two,three,dollar,
trillion,ten,twelve,thirteen,fifteen,twenty,thirty,thousand
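The recognition script above loads this list by stripping newlines and spaces, upper-casing it and splitting on commas, then looks for each keyword with a substring search. That means a short keyword such as "one" also matches inside "money", which may or may not be wanted. As a hedged alternative, the Python 3 sketch below parses the same comma-separated format and matches whole words only; the function names are hypothetical and not part of the project.

```python
def parse_keywords(raw):
    """Split a comma-separated keyword list, ignoring case, spaces and repeats."""
    keywords = []
    for word in raw.replace("\n", "").split(","):
        word = word.strip().upper()
        # Skip empty entries (e.g. from a trailing comma) and duplicates.
        if word and word not in keywords:
            keywords.append(word)
    return keywords


def match_keywords(hypothesis, keywords):
    """Return the keywords that occur as whole words in the recognised text."""
    words = set(hypothesis.upper().split())
    return [k for k in keywords if k in words]


if __name__ == "__main__":
    kws = parse_keywords("bank, balance,rate,\nrate,one")
    print(kws)
    print(match_keywords("send money to the bank", kws))
```

Skipping empty entries also guards against a stray trailing comma in the file, which under a substring search would match every utterance.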

Quick hack to get GQRX to change radio channels by using key commands

#!/usr/bin/env python
from socket import *     
import os
 
# Grab name of GQRX window
p = os.popen("xwininfo -root -all | grep ezcap |  awk '{print $1}'")
WINDOWREF = p.readline()
WINDOWREF = WINDOWREF.replace("\n", '')
p.close()
wincommand = 'xvkbd -window '+str(WINDOWREF)+' -text "f"'
if(WINDOWREF==''):
   print "No window reference"
   wincommand = 'xvkbd -text "No available window"'
 
 
 
 
##let's set up some constants
HOST = ''    #we are the host
PORT = 50000    #arbitrary port not currently in use
ADDR = (HOST,PORT)    #we need a tuple for the address
BUFSIZE = 4096    #reasonably sized buffer for data
 
# If the port is already open then kill the process
#while True:
command = 'kill -9 $( lsof -i:'+str(PORT)+' -t )'
os.system(command)
 
## now we create a new socket object (serv)
## see the python docs for more information on the socket types/flags
serv = socket( AF_INET,SOCK_STREAM)    
serv.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
 
##bind our socket to the address
serv.bind(ADDR)    #ADDR is already a (host, port) tuple
serv.listen(5)    #5 is the maximum number of queued connections we'll allow

print 'listening...'
 
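while True:
   conn,addr = serv.accept() #accept the connection
   print '...connected!'
   data = conn.recv(BUFSIZE) #read the request so the connection closes cleanly
   os.system(wincommand)
   print "COMMAND:"+wincommand
   conn.send('tuning')
   conn.close() #close each connection once the reply has been sent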

Python server to receive "Found Money" Notifications

from socket import *
from datetime import datetime
import os
 
##let's set up some constants
HOST = ''    #we are the host
PORT = 50001    #arbitrary port not currently in use
ADDR = (HOST,PORT)    #we need a tuple for the address
BUFSIZE = 4096    #reasonably sized buffer for data

# If the port is already open then kill the process
#while True:
command = 'kill -9 $( lsof -i:'+str(PORT)+' -t )'
os.system(command)
 
## now we create a new socket object (serv)
## see the python docs for more information on the socket types/flags
serv = socket( AF_INET,SOCK_STREAM)
 
##bind our socket to the address
serv.bind(ADDR)    #ADDR is already a (host, port) tuple
serv.listen(5)    #5 is the maximum number of queued connections we'll allow
print 'listening...'
 
while True:
   conn,addr = serv.accept() #accept the connection
   data = conn.recv(1024) # receive up to 1K bytes
   mytime = str(datetime.now())
   print "          Found Money: "+mytime
   os.system('espeak "'+data+'" ')
   conn.send('Got message')
   conn.close() #close each connection once the reply has been sent

Exhibited / Performed
6th-8th July 2012: MA Interactive Media Exposition. Installed at Goldsmiths University in the janitor's office.
Mobile version performed at Moving Forest 2012 (see last image in series below).