Digital Speech Recognition Software for Digital UNIX
_______________________________________

Release Notes

Order Number: AA-QS49B-TE

March 1997

This document summarizes the performance considerations, known problems, and restrictions in the Digital Speech Recognition Software for Digital UNIX Version V1.1A product.

Revision/Update Information: This is a revised document.

Software Version: Digital Speech Recognition Software Version V1.1A

Digital Equipment Corporation
Nashua, New Hampshire

The information in this document is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no responsibility for any errors that may appear in this document.

The software described in this document is furnished under a license and may be used or copied only in accordance with the terms of such license.

No responsibility is assumed for the use or reliability of software on equipment that is not supplied by Digital Equipment Corporation or its affiliated companies.

Restricted Rights: Use, duplication, or disclosure by the U.S. Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013.

(c) Digital Equipment Corporation 1996. All Rights Reserved.

Printed in U.S.A.

The following are trademarks of Digital Equipment Corporation: DIGITAL logo.

Motif is a registered trademark of Open Software Foundation, Inc. Netscape Communications, Netscape, Netscape Navigator, and the Netscape Communications logo are trademarks of Netscape Communications Corporation. Telex Voice Commander is a trademark of Telex Communications, Inc. Dragon Speech is a registered trademark of Dragon Systems, Inc.

All other trademarks and registered trademarks are the property of their respective holders.
Contents

Preface
1 Installation
    Prerequisite Software
    Prerequisite Hardware
    Installation Procedure
2 Features
    Features
3 Restrictions
    Restrictions
    Known Problems
    Where to Send Problem Reports and Suggestions

Preface

This document contains the release notes for the Digital Speech Recognition Software (DSRS) for Digital UNIX Version V1.1A product. Digital Speech Recognition Software is a speech recognition program that lets you interact by voice with various X Window System applications under Motif and CDE on Digital UNIX computers.

Purpose of This Guide

This document describes the features, performance considerations, known problems, and limitations of the Digital Speech Recognition Software Version V1.1A product.

Who Should Use This Guide

This document is for all users who want to use speech as an additional input method to control applications on Digital UNIX.

For More Information

In addition to these release notes, the Digital Speech Recognition Software documentation set contains the following:

   o Digital Speech Recognition Software Users Guide
     Order Number: AA-QS47B-TE

Other related documents include:

   o Multimedia Services for Digital UNIX Installation Guide

   o The DSRS Users Guide in HTML format, which you can open in Netscape as follows:

     1. Select File from the menu bar.
     2. Select Open File... from the File menu.
     3. In the box labeled Selection, enter:
        /usr/opt/DSRS110/html_docs/dsrsUsersGuide.html
     4. Select OK in the dialog box.
Installation

Prerequisite Software:

Digital Speech Recognition Software requires the following versions of the component software:

   o Digital UNIX V3.2 or later
   o Multimedia Services for Digital UNIX V1.5 or later

NOTE: If Digital UNIX V4.0 is installed, Multimedia Services for Digital UNIX V2.0 or later is required.

Prerequisite Hardware:

   o Telex Nomad or Voice Commander microphone
   o One of the following audio options:
     - Sound and Motion J300 (preferred) or Base Board Audio for DEC3000 series workstations
     - An Alpha AXP Microsoft Sound Board-compatible card for AlphaStation series workstations

The Plantronics microphone supplied with many DEC3000 series workstations is not supported, although some users may find that it provides acceptable performance.

Installation Procedure:

Before beginning the installation, make sure that the prerequisite software is installed and that the Multimedia Services for Digital UNIX server is running.

To install the kit from disk, log in as root and use setld as follows:

   # cd
   # /usr/sbin/setld -l .

Answer the questions about the installation of the Digital Speech Recognition Software.

The installation procedure creates a DSRS110 subdirectory in the /usr/opt directory tree. All files installed by the Digital Speech Recognition Software installation procedure are located in this subdirectory. The installation procedure also creates softlinks to these files in the /usr/shlib, /usr/lib/dsrs, /usr/include/X11/bitmaps/dsrs, and /usr/bin/X11 directories. The Common Desktop Environment (CDE) subset creates additional softlinks in the /etc/dt directory.

For more information on other installation procedures, refer to the Digital Speech Recognition Software Users Guide.
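As an optional check after setld finishes, you can list the softlinks described above with a short script. The following is a minimal POSIX sh sketch. It is not part of the DSRS kit. The function name list_dsrs_links and its optional alternate-root argument are hypothetical. The script assumes only that each link's destination contains the DSRS110 directory name.

```sh
#!/bin/sh
# Sketch (not part of the DSRS kit): list symbolic links in the
# directories named in these release notes whose destination lies
# under the DSRS110 kit directory.
#
# list_dsrs_links is a hypothetical helper. Its optional first argument
# is an alternate root prefix (an assumption added for illustration);
# with no argument it checks the running system.
list_dsrs_links() {
    root="${1:-}"
    kit="DSRS110"
    found=0
    for dir in /usr/shlib /usr/lib/dsrs /usr/include/X11/bitmaps/dsrs \
               /usr/bin/X11 /etc/dt
    do
        if [ ! -d "$root$dir" ]; then
            # /etc/dt links exist only if the CDE subset was installed.
            echo "not present: $dir"
            continue
        fi
        # ls -ld prints each link as "name -> destination"; keep only
        # lines that mention the kit directory. "|| true" keeps a
        # no-match grep from failing the assignment under "set -e".
        links=$(find "$root$dir" -type l -exec ls -ld {} \; | grep -- "$kit" || true)
        if [ -n "$links" ]; then
            printf '%s\n' "$links"
            # Count matching links; tr strips the padding some wc versions add.
            count=$(printf '%s\n' "$links" | wc -l | tr -d ' ')
            found=$((found + count))
        fi
    done
    echo "DSRS softlinks found: $found"
}

list_dsrs_links "$@"
```

For example, running sh check_dsrs_links.sh (a hypothetical file name) prints each matching link followed by a total. A line such as "not present: /etc/dt" is expected when the CDE subset was not installed.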
The command and control application that you just installed is called speechmgr. To run it, type the following at the Digital UNIX shell prompt:

   > speechmgr

Features

This section describes the features of Digital Speech Recognition Software Version V1.1A.

   o This version supports the vocabulary for the following X Motif applications:

        C Programming
        calculator (dxcalc)
        calendar (dxcalendar)
        cardfiler (dxcardfiler)
        clock (dxclock)
        emacs
        decterm (combined with the UNIX command line)
        mail (dxmail)
        Netscape
        Speech Manager (speechmgr)
        vi

   o This version supports the vocabulary for the following CDE-based applications:

        C Programming
        calculator (dtcalc)
        calendar (dtcm)
        editor (dtpad)
        emacs
        dtterm (combined with the UNIX command line)
        mail (dtmail)
        file manager (dtfile)
        Netscape
        Speech Manager (speechmgr)
        vi

DSRS has predefined vocabulary for the applications listed above. Speech Manager is not intended to be used as a dictation system, although in some cases, such as C Programming, the spoken commands do enter text.

   o DSRS supports continuous speech in the following instances:

     i.   Launching applications in continuous mode with "bring up", "switch to", or "go to".

     ii.  Calculator utterances. An utterance consists of a number, an operator, a second number, and "equals"; as the examples show, it can also stop partway through. For example:

             "6 4 3 plus 1 2 3 equals"
             "6 4 0 plus 1 2 3"
             "6 4 3 5 6 plus"
             "6 4"

          To enter a number with a decimal point, such as 643.12, say "decimal" for the decimal point:

             "6 4 3 decimal 1 2"

     iii. "Next" or "previous", for example "next decterm".

     iv.  "Include vocabulary for" or "Exclude vocabulary for" followed by "C-Programming".

     v.   Calendar date entry in the CDE workspace. First say the discrete phrase "go to date", then speak the date as a continuous string.
          Here are some examples:

             January 1st
             April 2nd
             July 4th
             September 23rd
             October 31st
             December 24th

Restrictions

   1. Vocabulary for continuous recognition cannot be modified in Vocabulary Manager. For example, if you add a word or group in Vocabulary Manager, it is added to the list of words for discrete recognition only.

   2. DSRS allocates the wavein device and keeps it open at 11 kHz as long as the microphone is on. As a result, the waveout device is available for use only at 11 kHz. If other applications are using the same sound card at a different sample rate, DSRS cannot access the wavein device at 11 kHz and displays an error status.

   3. Take care to preserve the integrity of the files that contain the vocabulary data (*.voc) and acoustic data (*.usr). When using on-line training, avoid training in the presence of unusual noise, which can corrupt the acoustic data. Do not allow other users to train using your acoustic files; doing so degrades the quality of your acoustic models.

Known Problems

   1. This version of the Digital Speech Recognition Software has problems on early baselevel releases of Digital UNIX V4.0. It might work, but you might also need to fix an XTrap problem caused by the following line in the /var/X11/Xserver.conf file:

         < dbe libdbe.so DbeExtensionInit DOUBLE-BUFFER >

      Comment out the line by prefixing it with an exclamation point (!), then reboot your machine. The edited line looks like this:

         !< dbe libdbe.so DbeExtensionInit DOUBLE-BUFFER >

      This problem should be fixed by the Digital UNIX BL11 subset.

   2. This version of the Digital Speech Recognition Software has problems on the Digital UNIX V4.0 release. The problem occurs when a multithreaded program issues a command to start another application.
      The symptom is that the other application never starts and a core file is created. The UNIX team has fixed this problem with the OSF400-037 patch.

   3. Immediate training, which uses the utterance that has just been recognized and displayed in the history window, is intended as a convenience. For a more robust training environment, use the training tools in Speech Manager or Vocabulary Manager. Immediate training has the following characteristics:

      a) Only the current utterance is used for immediate training. Immediate training currently works only with discrete recognition results.

      b) Use mouse button 3 to display a choice list for the current utterance. To make a selection in the choice list, drag with mouse button 3 to the desired entry and release the button. Because training starts as soon as you make a selection, select carefully.

      c) If the recognized result is correct, select the top choice in the choice list to perform training.

      d) If the recognized result is incorrect, search the choice list for the correct result.

         i)  If the correct result is present, select it using mouse button 3.

         ii) If the correct result is not present, immediate training is not possible. Do not make any selection in the choice list; training with incorrect data will corrupt your acoustic models.

      e) In the GUI, the check mark that indicates whether an entry is trained is not updated immediately.

Where to Send Problem Reports and Suggestions

Report PTTs by e-mail to axp_mme@wsqar.enet.dec.com.