Thursday, August 31, 2006

Phoneme object

please help me define the software properties of the phoneme object.

the object should hold the audio data as relational tables within its properties. The tables should use SRGS for markup and storage, and they should include preprocessed entries for the highly weighted grammar tokens.

a standardized diagonally congruent search vector should be an object property.

it may seem like a weighty way of storing this data, but without an indexing system I just don't see how we can parse for polyphones or cope with high-noise environments.

anyway it seems like it would be nice to use SQL and datamine the dictionary...
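to make the SQL idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema (a `pronunciation` table with `word`, `variant`, `position` and `phoneme` columns), the sample words and the helper function names are all my own assumptions for illustration, not an existing dictionary format.

```python
import sqlite3

# Hypothetical schema: one row per phoneme position in a dictionary entry.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pronunciation (
        word     TEXT    NOT NULL,
        variant  INTEGER NOT NULL,  -- 1 = first pronunciation, 2 = alternate
        position INTEGER NOT NULL,  -- index of the phoneme within the variant
        phoneme  TEXT    NOT NULL
    );
    -- The index is what lets a lookup skip a full dictionary scan.
    CREATE INDEX idx_pron_phoneme ON pronunciation (phoneme);
""")


def add_entry(word, variant, phonemes):
    """Store one pronunciation variant as a sequence of phoneme rows."""
    conn.executemany(
        "INSERT INTO pronunciation VALUES (?, ?, ?, ?)",
        [(word, variant, i, p) for i, p in enumerate(phonemes)],
    )


def words_with_phoneme(phoneme):
    """Return every word whose pronunciation contains the given phoneme."""
    rows = conn.execute(
        "SELECT DISTINCT word FROM pronunciation "
        "WHERE phoneme = ? ORDER BY word",
        (phoneme,),
    )
    return [r[0] for r in rows]


def words_with_multiple_pronunciations():
    """Return words stored with more than one pronunciation variant."""
    rows = conn.execute(
        "SELECT word FROM pronunciation GROUP BY word "
        "HAVING COUNT(DISTINCT variant) > 1 ORDER BY word"
    )
    return [r[0] for r in rows]


# Made-up sample entries, only to exercise the queries.
add_entry("read", 1, ["R", "IY", "D"])
add_entry("read", 2, ["R", "EH", "D"])
add_entry("red", 1, ["R", "EH", "D"])
add_entry("reed", 1, ["R", "IY", "D"])

print(words_with_phoneme("EH"))
print(words_with_multiple_pronunciations())
```

with a layout like this, the questions you would want to ask of a dictionary become ordinary queries instead of custom parsing code.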

Wednesday, August 30, 2006

Corpora dictionary data storage algorithm

the binary format is difficult to search.

it seems that each phoneme should be considered a software object. The object should contain properties reflecting the cepstral acoustic data in vector format, using a standardized identity matrix.


the phoneme objects should be related to database tables containing the weighted transitions for the phoneme. Lookup will not require full dictionary parsing, merely the datasets related to the object. The database should construct relational tables for phonemes when new entries are added.
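a rough sketch of how this could look, again with Python and sqlite3. Everything named here is hypothetical: the `Phoneme` and `PhonemeStore` classes, the two-table schema, the JSON encoding of the cepstral vector and the sample numbers. One simplification to note: instead of constructing a separate table per phoneme, the sketch keeps all transitions in one table keyed by the source phoneme, so loading an object still touches only the rows related to it.

```python
import json
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Phoneme:
    symbol: str
    cepstrum: tuple  # hypothetical cepstral feature vector
    transitions: dict = field(default_factory=dict)  # next symbol -> weight


class PhonemeStore:
    """Hypothetical store that relates each phoneme to its transitions."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript("""
            CREATE TABLE phoneme (
                symbol   TEXT PRIMARY KEY,
                cepstrum TEXT NOT NULL      -- JSON-encoded list of floats
            );
            CREATE TABLE transition (
                source TEXT NOT NULL REFERENCES phoneme(symbol),
                target TEXT NOT NULL,
                weight REAL NOT NULL,
                PRIMARY KEY (source, target)  -- also indexes lookups by source
            );
        """)

    def add_phoneme(self, symbol, cepstrum):
        self.conn.execute(
            "INSERT INTO phoneme VALUES (?, ?)",
            (symbol, json.dumps(list(cepstrum))),
        )

    def add_transition(self, source, target, weight):
        self.conn.execute(
            "INSERT OR REPLACE INTO transition VALUES (?, ?, ?)",
            (source, target, weight),
        )

    def load(self, symbol):
        """Build one Phoneme from only the rows related to that symbol."""
        row = self.conn.execute(
            "SELECT cepstrum FROM phoneme WHERE symbol = ?", (symbol,)
        ).fetchone()
        if row is None:
            raise KeyError(symbol)
        transitions = dict(self.conn.execute(
            "SELECT target, weight FROM transition WHERE source = ?",
            (symbol,),
        ))
        return Phoneme(symbol, tuple(json.loads(row[0])), transitions)


# Made-up values, only to show the round trip.
store = PhonemeStore()
store.add_phoneme("R", (1.5, -0.25, 0.0))
store.add_phoneme("EH", (0.5, 0.75, -1.0))
store.add_transition("R", "EH", 0.6)
store.add_transition("R", "IY", 0.4)
store.add_transition("EH", "D", 1.0)

r = store.load("R")
print(r.symbol, r.cepstrum, r.transitions)
```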

Friday, August 25, 2006

Hub Needs Speech Recognition Grammar Specification (SRGS) recognizer

The Speech Recognition Grammar Specification is a W3C standard for marking up speech recognition grammars in XML. It is developing alongside VoiceXML and the voice browser projects for the web.

The current version of Sphinx uses the more convenient Java Speech Grammar Format (JSGF) provided by Sun Microsystems.

JSGF simply doesn't cover data retention and lookup. It is meant for recognizers, not for dictionaries. SRGS has a good deal of markup allowing several methods of lookups. Also, due to the nature of XML, standardized corpora can be developed, with a data structure that is shared among the data sources. This will facilitate large and robust community development.
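As a small illustration of why the XML form is convenient for lookup, the sketch below reads a toy SRGS grammar with Python's standard xml.etree.ElementTree and lists a rule's items ordered by their weight attribute. The grammar content, the rule id "command" and the `weighted_items` helper are made up for this example. The namespace URI and the `weight` attribute on `item` come from the SRGS specification, and treating a missing weight as 1.0 follows my reading of that spec.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Toy grammar, invented for this example.
GRAMMAR = """<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" mode="voice" root="command">
  <rule id="command" scope="public">
    <one-of>
      <item weight="3.0">open</item>
      <item weight="1.5">close</item>
      <item>cancel</item>
    </one-of>
  </rule>
</grammar>
"""


def weighted_items(grammar_xml, rule_id):
    """Return (token, weight) pairs for one rule, heaviest first."""
    root = ET.fromstring(grammar_xml)
    for rule in root.findall("g:rule", {"g": SRGS_NS}):
        if rule.get("id") != rule_id:
            continue
        items = []
        for item in rule.iter("{%s}item" % SRGS_NS):
            # Assumption: an item without a weight counts as weight 1.0.
            weight = float(item.get("weight", "1.0"))
            items.append(((item.text or "").strip(), weight))
        return sorted(items, key=lambda pair: pair[1], reverse=True)
    raise KeyError(rule_id)


print(weighted_items(GRAMMAR, "command"))
# [('open', 3.0), ('close', 1.5), ('cancel', 1.0)]
```

Because the markup is plain XML, the same few lines would work against any corpus that follows the shared structure, which is the point of standardizing on it.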

So, if you will, please consider for a moment the needs of algorithm development. Sun has left the data structuring and lookup to future developers (me). Sphinx would rather ignore the need for a more complex system and uses a binary ARPA format for data storage instead. Everyone admits that this doesn't produce quick, dynamic recognition results.

I admit it's not easy, but then there are lots of HTML jobs out there if you don't like the rough stuff...