DECtalk Software for Digital UNIX Programmer's Guide

March 1996

This manual provides information on installation, overview, getting started and step-by-step procedures for the DECtalk Software Runtime kit for the Digital UNIX product.

Revision/Update Information: This is a revised manual
Operating System: Digital UNIX 3.0 or later
Software Product Version: 4.2A

Digital Equipment Corporation
Maynard, Massachusetts

---------------------------------------------------------------------------
Preface: About this Guide

This guide contains instructions for the installation of the DECtalk Software product. It also contains the tutorial and reference information you need to build a DECtalk Software application.

---------------------------------------------------------------------------
What's the Difference Between the DECtalk Software Runtime Kit and the DECtalk Software Development Kit?

DECtalk Software is packaged as a Runtime kit and a Development kit. The Runtime kit gives you access to the following DECtalk Software applications: mailtalk, say, speak, emacspeak, and DECface. To develop your own DECtalk Software applications, you must order the DECtalk Software Developer's kit. The Developer's kit gives you access to the DECtalk Software API and some sample C programs.

---------------------------------------------------------------------------
License Requirements

You can run one copy of any DECtalk Software application at a time without needing an LMF license. A license is required to run more than one copy of the Runtime kit or to use the DECtalk Software Development kit. See the section on LMF Licensing in Chapter 1 for more information.
---------------------------------------------------------------------------
Features in DECtalk Software 4.2A

The following is a list of important features in DECtalk V4.2A:

* Expanded main dictionary
* Added user-dictionary suffix processing to help locate words in user dictionary
* Expanded pronunciation rules and improved pronunciation
* Homograph processing
* Improved inline index-mark processing
* Added the following inline commands:
  o Play command to play audio files in line with text
  o Tone command to generate tones
  o Dial command to generate DTMF tones used to dial telephone numbers
  o Stereo volume control commands
* A new version of the mailtalk program that is fully integrated with mail
* An enhanced Motif windows-based user dictionary builder that automatically translates text strings into phonemes
* An improved command-line program, say, used to run DECtalk from the Digital UNIX command line
* Improved computational efficiency
* Two new sample applications
  o DECface
  o Emacspeak
* Support for CDE desktop environment

---------------------------------------------------------------------------
Purpose and Audience

This guide is for the application programmer who wants to design and build text-to-speech applications with DECtalk Software. This guide contains instructions for installing the DECtalk Software development subset. The installation procedure installs all files in subdirectories under the following directory with links to the system directory hierarchy:

/usr/opt/DTKDEV420

---------------------------------------------------------------------------
Structure

This guide is designed to provide you with quick and easy access to all information. You can easily learn about new topics and perform specific tasks related to running DECtalk Software application programs for the Digital UNIX operating system.
This guide's organization is listed below:

---------------------------------------------------------------------------
Chapter         Description
---------------------------------------------------------------------------
Chapter 1       Installing DECtalk Software
Chapter 2       Introduction to DECtalk Software API
Chapter 3       Using DECtalk Software Sample Programs
Chapter 4       Creating a Customized DECtalk Software Voice
Chapter 5       DECtalk Software API Functions

---------------------------------------------------------------------------
On-line Help

DECtalk Software on-line help is accessible in two forms:

* Manpages -- Invoke manpage help from the UNIX command line with the % man speak command.
* HTML Hypertext -- Start Netscape hypertext help by launching Netscape and loading the DtkDevGuide.html file.

---------------------------------------------------------------------------
Conventions

This guide uses the following conventions:

Convention      Explanation
enter           Enter means type the required information and press the Return key.
mouse           Mouse refers to any pointing device, such as a mouse, a puck, or a stylus.
MB1             MB1 indicates the left mouse button.
click on        Click on means to press and release MB1.
double click    Double click means to press and release MB1 twice in rapid succession without moving the mouse.
drag            The phrase drag means to press and hold MB1, move the mouse, and then release MB1 when the pointer is in the desired position.
Ctrl/x          A sequence such as Ctrl/x indicates that you must press the Ctrl key while you press another key.
Menu -> Command The right arrow indicates an abbreviated instruction for choosing a command from a menu. For example, File -> Exit means pull down the File menu, move the pointer to the Exit command, and release MB1.
Courier type    Courier type indicates text that you type and is displayed on the screen. This is most often used for program code examples.
User Input      Boldface type in interactive examples indicates information you enter from the keyboard.
                For example:
                % ls
speak "xxx"     Indicates a word, words, or phrases you must speak.

Unless otherwise noted, press Return after entering commands or responses to command prompts.

---------------------------------------------------------------------------
Chapter 1: Installing DECtalk Software

This chapter covers the preinstallation, installation, and post-installation tasks required to install DECtalk Software on your system. Topics include:

* Installing DECtalk Software
  o Preinstallation Tasks
    + Accessing the Release Notes
    + Registering Your Software Licenses
    + Checking the Software Distribution Kit
  o Installation Procedure Requirements
    + Hardware Requirements
    + Software Requirements
    + Checking Current Disk Space
    + Increasing Disk Space by Using Alternative Disks
  o Installation Tasks
    + Using the CD-ROM Consolidated Distribution Media
    + Using an RIS Distribution Area
    + Starting the Installation Procedure
    + Selecting Subsets
    + Stopping the Installation
  o Post-Installation Tasks
    + Running the Installation Verification Procedure
    + Deleting DECtalk Software from Your System
    + Displaying Documentation from the CD-ROM Distribution Disc
    + Correcting Problems During Product Installation
    + Reporting Problems

---------------------------------------------------------------------------
Preinstallation Tasks

This section covers the tasks you must perform before installing DECtalk Software. Topics include:

* Accessing the release notes (see Accessing the Release Notes)
* Checking installation procedure requirements (see Installation Procedure Requirements)
* Hardware requirements (see Hardware Requirements)
* Checking current disk space (see Checking Current Disk Space)

---------------------------------------------------------------------------
Accessing the Release Notes

DECtalk Software provides release notes. The release notes contain information about changes to DECtalk Software for Digital UNIX.
Digital strongly recommends that you read these release notes before using the product. See the Compact Disc User's Guide shipped with your media for instructions about how to access the release notes prior to the software installation.

The release notes for DECtalk Software are in the following files after the DTKDEVRELNOT420 subset is installed:

/usr/opt/DTKDEV420/docs/ascii/release_notes_dev.txt
/usr/opt/DTKDEV420/docs/postscript/release_notes_dev.ps

Use the following command to read the release notes for DECtalk Software after the DTKDEVRELNOT420 subset is installed:

# more /usr/opt/DTKDEV420/docs/ascii/release_notes_dev.txt

You can also print either file.

---------------------------------------------------------------------------
Registering Your Software Licenses

DECtalk Software includes support for the License Management Facility (LMF). You must register your license product authorization keys (License PAKs) in the license database (LDB) in order to use DECtalk Software on a newly licensed system. The License PAKs are shipped with the kit if you ordered the licenses and media together; otherwise, they are shipped separately to a location specified on your license order.

Note
You must have root privileges to install DECtalk Software and to register the License PAK.

If you are installing DECtalk Software as an update on a node already licensed for this software, you have already completed the License PAK registration requirements.

To register a license under the Digital UNIX operating system:

1. Log in as root.
2. At the superuser prompt, edit an empty PAK template with the lmf register command as follows, and include all the information on your License PAK:

# lmf register

LMF displays a blank template and invokes an editor to allow you to edit the template. LMF invokes the editor that is defined by your EDITOR environment variable. If the environment variable is undefined, LMF invokes the vi editor.
You must enter the license information from the PAK accurately. When you finish entering the license data, exit from the editor. If the license data is correct, LMF copies it into the license database. If the license data is incorrect, you may reenter the editor and correct mistakes.

Alternatively, you can create a command script enclosing the license information (the license information is in the cover letter with this kit) found between

lmf register - << ENDLMF

and

ENDLMF

Execute this script as root.

After you register your license, use the following command to copy the license details from the license database (LDB) to the kernel cache:

# lmf load 0 DECTALK-SW

For complete information on using the License Management Facility, see the Guide to Software License Management and the lmf reference page.

---------------------------------------------------------------------------
Checking the Software Distribution Kit

Use the bill of materials (BOM) to check the contents of your DECtalk Software distribution kit. In addition to this guide, the software distribution kit includes the following:

* CD-ROM optical disc for systems with optical disc drives
* CD-ROM booklet

If your software distribution kit is damaged or incomplete, contact your Digital representative.
Directories and files included in the distribution kit are listed in the following screen display: /usr/opt/DTKDEV420/docs/ascii: dtk420_prog_guide.txt filelist_dev.txt dtk420_release_notes_dev.txt /usr/opt/DTKDEV420/docs/html: DtkDevGuideGuide.html dt_u.html dt_11.html dt_22.html dt_33.html dt_44.html dt_55.html dectalkR.gif redball.gif pinkball.gif yellowball.gif /usr/opt/DTKDEV420/docs/man/man3: TextToSpeechAddBuffer.3 TextToSpeechPause.3 TextToSpeechCloseInMemory.3 TextToSpeechReset.3 TextToSpeechCloseLogFile.3 TextToSpeechResume.3 TextToSpeechCloseWaveOutFile.3 TextToSpeechReturnBuffer.3 TextToSpeechGetCaps.3 TextToSpeechSetLanguage.3 TextToSpeechGetLanguage.3 TextToSpeechSetRate.3 TextToSpeechGetRate.3 TextToSpeechSetSpeaker.3 TextToSpeechGetSpeaker.3 TextToSpeechShutdown.3 TextToSpeechGetStatus.3 TextToSpeechSpeak.3 TextToSpeechLoadUserDictionary.3 TextToSpeechStartup.3 TextToSpeechOpenInMemory.3 TextToSpeechSync.3 TextToSpeechOpenLogFile.3 TextToSpeechUnloadUserDictionary.3 TextToSpeechOpenWaveOutFile.3 /usr/opt/DTKDEV420/docs/postscript: dtk420_prog_guide.ps dtk420_release_notes_dev.ps /usr/opt/DTKDEV420/examples/dtk/dtsamples: Imakefile aclock.c mailtalk.c xmsay.c README.txt dtmemory.c say.c xmsay.uil /usr/opt/DTKDEV420/include/dtk: dtmmedefs.h dtmmiodefs.h engphon.h ttsapi.h /usr/opt/DTKDEV420/share/man/man3: TextToSpeechAddBuffer.3dtk TextToSpeechPause.3dtk TextToSpeechCloseInMemory.3dtk TextToSpeechReset.3dtk TextToSpeechCloseLogFile.3dtk TextToSpeechResume.3dtk TextToSpeechCloseWaveOutFile.3dtk TextToSpeechReturnBuffer.3dtk TextToSpeechGetCaps.3dtk TextToSpeechSetLanguage.3dtk TextToSpeechGetLanguage.3dtk TextToSpeechSetRate.3dtk TextToSpeechGetRate.3dtk TextToSpeechSetSpeaker.3dtk TextToSpeechGetSpeaker.3dtk TextToSpeechShutdown.3dtk TextToSpeechGetStatus.3dtk TextToSpeechSpeak.3dtk TextToSpeechLoadUserDictionary.3dtk TextToSpeechStartup.3dtk TextToSpeechOpenInMemory.3dtk TextToSpeechSync.3dtk TextToSpeechOpenLogFile.3dtk 
TextToSpeechUnloadUserDictionary.3dtk TextToSpeechOpenWaveOutFile.3dtk --------------------------------------------------------------------------- Installation Procedure Requirements This section discusses the requirements for installing DECtalk Software. Installing DECtalk Software takes approximately 5 minutes, depending on your type of media and system configuration. --------------------------------------------------------------------------- Hardware Requirements To install DECtalk Software, you need the following: * distribution device (if installing from media) Locate the drive for the CD-ROM software distribution media. The CD booklet or the documentation for the CD-ROM drive you are using explains how to load the CD-ROM media. * Terminal You can use either a hardcopy or video terminal to communicate with the operating system and respond to the prompts from the installation procedure. See the DECtalk Software for Digital UNIX Software Product Description (SPD) for additional hardware requirements. --------------------------------------------------------------------------- Software Requirements DECtalk Software for Digital UNIX Version 4.2A requires: * The Digital UNIX operating system Version 3.x or 4.0. * The Multimedia Services for Digital UNIX Version 2.x. * The Realtime extension. * DECtalk Software Runtime subset V4.2A. --------------------------------------------------------------------------- Checking Current Disk Space To check the current amount of free space for a directory path, log in to the system where you will install DECtalk Software. You can check which directories are mounted and their locations by viewing the /etc/fstab file. 
For example:

# more /etc/fstab
/dev/rz3a   /       ufs   rw   1   1
/dev/rz3g   /usr    ufs   rw   1   2
/dev/rz3b   swap1   ufs   sw   0   2

The display indicates that /usr mounted on /dev/rz3g is the only mount point that affects where DECtalk Software files will reside; the system has only one local disk drive, and the /usr/opt file system resides in the g partition of the disk on that drive.

To check the total space and the free space for the directories where DECtalk Software will reside, enter the df command. Given the previous display of the /etc/fstab file, which shows that only /usr is a mount point, you need to check free space only in the /usr file system. For example:

# df /usr
Filesystem   512-blocks   Used     Avail    Capacity  Mounted on
/dev/rz3a         79608   45648    25998    64%       /
/dev/rz3g       1482190   921846   412124   69%       /usr

This display shows that there are 412124 512-byte blocks (about 201 Mbytes) free in /usr. This free space must accommodate the subsets that you opt to install. If you choose to install all the subsets in the DECtalk Software Development kit, you will need approximately 2 Mbytes of free disk space.

---------------------------------------------------------------------------
Increasing Disk Space by Using Alternative Disks

The DECtalk Software installation procedure creates and loads files into the subdirectory:

/usr/opt/DTKDEV420

If this directory already exists, the installation procedure uses it. If you find that there is insufficient disk space for the DECtalk Software subsets and you know that you have additional space on alternative disks or disk partitions for your system, perform the following steps before installing DECtalk Software:

1. Log in as root.
2. Create the directory /usr/opt/DTKDEV420.
3. Specify in the /etc/fstab file that one or more of the newly created directories are mount points to new disk partitions where there is additional space.
4. Enter the mount -a command so that the new mount points take effect.
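
The free-space check described in Checking Current Disk Space can be sketched as a small shell script. This is a minimal illustration, not part of the kit: the function names blocks_to_mbytes and avail_blocks are hypothetical, the df text is the sample display from that section, and the 2 Mbyte figure is the approximate size quoted for the full Development kit. The sketch assumes df reports sizes in 512-byte blocks, as the column header in the display indicates.

```shell
#!/bin/sh
# Convert a count of 512-byte blocks to whole Mbytes (rounded down).
# 2048 blocks of 512 bytes make one Mbyte.
blocks_to_mbytes() {
    echo $(( $1 / 2048 ))
}

# Print the Avail column for one mount point from df-style output.
# $1 = mount point; the df output is read from standard input.
avail_blocks() {
    awk -v mp="$1" '$NF == mp { print $4 }'
}

# Sample display from this guide; on a live system pipe in: df /usr
sample_df='Filesystem 512-blocks Used Avail Capacity Mounted on
/dev/rz3a 79608 45648 25998 64% /
/dev/rz3g 1482190 921846 412124 69% /usr'

required_mb=2   # approximate size of the full Development kit
avail=$(printf '%s\n' "$sample_df" | avail_blocks /usr)
avail_mb=$(blocks_to_mbytes "$avail")

if [ "$avail_mb" -ge "$required_mb" ]; then
    echo "/usr: $avail_mb Mbytes free, enough for the kit"
else
    echo "/usr: only $avail_mb Mbytes free, kit needs about $required_mb"
fi
```

Matching on the last field keeps the root file system line (mounted on /) from being mistaken for /usr.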
---------------------------------------------------------------------------
Installation Tasks

This section covers the tasks you must perform to install DECtalk Software. Topics include:

* Using the CD-ROM consolidated distribution media (see Using the CD-ROM Consolidated Distribution Media)
* Responding to installation procedure prompts (see Starting the Installation Procedure)
* Selecting subsets (see Selecting Subsets)
* Using an RIS distribution area (see Using an RIS Distribution Area)
* Starting the installation procedure (see Starting the Installation Procedure)
* Stopping the installation (see Stopping the Installation)

---------------------------------------------------------------------------
Using the CD-ROM Consolidated Distribution Media

The following procedure loads DECtalk Software files onto a disk belonging to the system where you perform the installation. When DECtalk Software is run, its executable images are mapped into memory on your system.

To install DECtalk Software from CD-ROM media:

1. Mount the media on the appropriate disk drive.
2. Log in as superuser (login name root) to the system where you will install DECtalk Software.
3. Make sure that you are at the root (/) directory by entering the following command:

# cd /

4. Specify the /cdrom directory to be the mount point for the distribution file system on the drive. If your drive is rz4c, enter the following command:

# mount -dr /dev/rz4c /cdrom

5. Enter a setld command that requests the load function (-l) and identifies the directory in the mounted file system where DECtalk Software subsets are located. For example, if the directory location for these subsets is /cdrom/DTK420/kit, enter the following command:

# /usr/sbin/setld -l /cdrom/DTK420/kit

6. The installation procedure now displays the names of DECtalk Software subsets and asks you to specify the subsets you want to load.
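
As a rough illustration of steps 3 through 5, the following shell sketch only prints the commands for a given drive and kit directory, so the sequence can be reviewed before it is entered as root. The function name show_install_commands and its arguments are hypothetical; the device, mount point, and kit directory are the example values used in the procedure and will differ on other systems.

```shell
#!/bin/sh
# Print (do not execute) the commands for a CD-ROM installation.
# $1 = CD-ROM device, $2 = mount point, $3 = kit directory under the mount point.
show_install_commands() {
    echo "cd /"
    echo "mount -dr $1 $2"
    echo "/usr/sbin/setld -l $2/$3"
}

# Example values taken from the procedure above.
show_install_commands /dev/rz4c /cdrom DTK420/kit
```

Printing the commands instead of running them keeps the sketch safe to try on any system, including one without a CD-ROM drive.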
See Starting the Installation Procedure to continue the installation.

---------------------------------------------------------------------------
Using an RIS Distribution Area

If you are installing DECtalk Software subsets that reside in an /etc/ris RIS distribution area on a remote system, take the following steps:

1. Log in as superuser (login name root) to the system where you will install DECtalk Software.
2. Make sure that you are at the root directory (/) by entering the following command:

# cd /

3. Enter a setld command that requests the load function (-l) and identifies the system where the DECtalk Software subsets are located. For example, if you are loading DECtalk Software subsets from an RIS distribution area on node axpmme, enter the following:

# /usr/sbin/setld -l axpmme

4. RIS now displays a menu that lists all the software subsets available to you and asks you to specify the subsets you want to load.

See Starting the Installation Procedure to continue the installation.

---------------------------------------------------------------------------
Starting the Installation Procedure

To start the installation procedure:

1. Log in as superuser and verify that you are at the root directory. Check to see if there are any previously installed DECtalk Software subsets by entering the following commands:

% su root
# cd /
# /usr/sbin/setld -i | grep DTKDEV

2. Deinstall any installed subsets with the prefix DTKDEV by entering the following commands:

# cd /
# /usr/sbin/setld -d (name of subset)

3. To start the installation procedure, enter the following command:

# /usr/sbin/setld -l /dev/rmt0h

4. Then, respond to the installation procedure prompts as described in Selecting Subsets.

---------------------------------------------------------------------------
Selecting Subsets

The following section presents a complete installation procedure, including all messages that are displayed on your screen during the installation.
You must specify which DECtalk Software subsets you want to load. If you specify more than one number at the prompt, separate each number with a space, not a comma.

# setld -l .

The subsets listed below are optional:

There may be more optional subsets than can be presented on a single
screen. If this is the case, you can choose subsets screen by screen
or all at once on the last screen. All of the choices you make will
be collected for your confirmation before any subsets are installed.

1) DECtalk Software V4.2A for Digital UNIX Development Documentation.
2) DECtalk Software V4.2A for Digital UNIX Development Kit.
3) DECtalk Software V4.2A for Digital UNIX Release Notes.
4) DECtalk Software V4.2A for Digital UNIX Sample Programs.

Or you may choose one of the following options:

5) ALL of the above
6) CANCEL selections and redisplay menus
7) EXIT without installing any subsets

Enter your choices or press RETURN to redisplay menus.

Choices (for example, 1 2 4-6): 5

Next, the script lets you verify your choice. For example, if you enter 5 in response to the previous prompt, you will see the following display:

You are installing the following optional subsets:

DECtalk Software V4.2A for Digital UNIX Development Documentation.
DECtalk Software V4.2A for Digital UNIX Development Kit.
DECtalk Software V4.2A for Digital UNIX Release Notes.
DECtalk Software V4.2A for Digital UNIX Sample Programs.

Is this correct? (y/n): y

If the displayed subsets are not the ones you intended to choose, enter n. In this case, the subset selection menu is displayed again and you can correct your choice of optional subsets. If the displayed subsets are the ones you want to load, enter y.

After you respond to this question, the rest of the installation proceeds automatically and all the selected subsets are loaded. A sample of the rest of the installation script is listed below.

Checking file system space required to install selected subsets:

File system space checked OK.
4 subset(s) will be installed. Loading 1 of 4 subset(s).... DECtalk Software V4.2A for Digital UNIX Development Documentation. Copying from . (disk) Verifying Loading 2 of 4 subset(s).... *********************************************************************** * * * DECtalk Software Application Services V4.2A * * Development Subset * * * * Copyright(c)Digital Equipment Corporation, 1996 All Rights * * Reserved * * * * Unpublished rights reserved under the copyright laws of the United * * States. The software contained on this media is proprietary to * * and embodies the confidential technology of Digital Equipment * * Corporation. Possession, use, duplication or dissemination of the * * software and media is authorized only pursuant to a valid written * * license from Digital Equipment Corporation. * * * * RESTRICTED RIGHTS LEGEND Use, duplication, or disclosure by the * * U.S. Government is subject to restrictions as set forth in * * Subparagraph (c)(1)(ii) of DFARS 252.227-7013, or in FAR 52.227-19, * * or in FAR 52.227-14 Alt. III as applicable. * * * *********************************************************************** DECtalk Software V4.2A for Digital UNIX Development Kit. Copying from . (disk) Verifying Loading 3 of 4 subset(s).... *********************************************************************** * * * DECtalk Software Application Services V4.2A * * Sample Programs Subset * * * * Copyright(c)Digital Equipment Corporation, 1996 All Rights * * Reserved * * * * Unpublished rights reserved under the copyright laws of the United * * States. The software contained on this media is proprietary to * * and embodies the confidential technology of Digital Equipment * * Corporation. Possession, use, duplication or dissemination of the * * software and media is authorized only pursuant to a valid written * * license from Digital Equipment Corporation. * * * * RESTRICTED RIGHTS LEGEND Use, duplication, or disclosure by the * * U.S. 
Government is subject to restrictions as set forth in * * Subparagraph (c)(1)(ii) of DFARS 252.227-7013, or in FAR 52.227-19, * * or in FAR 52.227-14 Alt. III as applicable. * * * *********************************************************************** DECtalk Software V4.2A for Digital UNIX Sample Programs. Copying from . (disk) Verifying Loading 4 of 4 subset(s).... DECtalk Software V4.2A for Digital UNIX Release Notes. Copying from . (disk) Verifying 4 of 4 subset(s) installed successfully. *********************************************************************** DECtalk Software V4.2A development documentation subset (DTKDEVDOC420) installation completed successfully. This installation puts the DECtalk Software runtime documents in html format in the following directory /usr/opt/DTKDEV420/docs/html. You can use the Netscape™ browser to view the documents. Start by opening the file: /usr/opt/DTKDEV420/docs/html/DtkDevGuideGuide.html *********************************************************************** Configuring "DECtalk Software V4.2A for Digital UNIX Development Documentation." (DTKDEVDOC420) ************************************************************************** DECtalk Software V4.2A development subset (DTKDEV420) installation completed successfully. ************************************************************************** Configuring "DECtalk Software V4.2A for Digital UNIX Development Kit." (DTKDEV420) ************************************************************************** DECtalk Software V4.2A sample program subset (DTKSAMP420) installation completed successfully. This installation puts the sample programs in the following directory: /usr/examples/dtk/dtsamples ************************************************************************** Configuring "DECtalk Software V4.2A for Digital UNIX Sample Programs." 
(DTKSAMP420)

****************************************************************************
DECtalk Software V4.2A development release notes subset (DTKDEVRELNOT420)
installation completed successfully.
This installation put DECtalk Software development kit release notes
in the following directories:
/usr/opt/DTKDEV420/docs/ascii and /usr/opt/DTKDEV420/docs/postcript
****************************************************************************

Configuring "DECtalk Software V4.2A for Digital UNIX Release Notes." (DTKDEVRELNOT420)

---------------------------------------------------------------------------
Stopping the Installation

To stop the installation procedure at any time:

1. Enter Ctrl/C.
2. Interactively delete the files created by the installation up to the point where you stopped the installation. The directories and files created during the DECtalk Software installation are listed in the following file:

/usr/opt/DTKDEV420/docs/ascii/filelist.txt

If you encounter any failures during installation, see Reporting Problems.

You may interrupt the installation procedure at any point. However, if you do, the installation may not be left in a useful state. Remove all the subsets you installed and reinstall them.

---------------------------------------------------------------------------
Post-Installation Tasks

This section explains what you need to do after the installation to make DECtalk Software ready for use. Topics include:

* Running the installation verification procedure (see Running the Installation Verification Procedure)
* Deleting DECtalk Software from your system (see Deleting DECtalk Software from Your System)
* Displaying documentation from the CD-ROM distribution disc (see Displaying Documentation from the CD-ROM Distribution Disc)
* Solving problems during product installation (see Correcting Problems During Product Installation)
* Failures during product use
  (see Reporting Problems)

---------------------------------------------------------------------------
Running the Installation Verification Procedure

You can run the Installation Verification Procedure (IVP) during the installation or you can run the IVP independently after installing DECtalk Software to verify that the software is available on your system. You might also want to run the IVP after a system failure to be sure that users can access DECtalk Software.

To run the IVP, enter the following commands:

% su root
# /usr/sbin/setld -v DTKRT420

The DECtalk Software IVP verifies the installation as follows:

* A check for a valid LMF license is made. If no license is found, the IVP fails because the software cannot be tested.
* DECtalk Software requires that the Multimedia Services for Digital UNIX server, mmeserver, be up and running. If mmeserver is not already running, the IVP fails. Start the server and try again.

To start the server, enter the following commands:

% su root
# mmeserver &

---------------------------------------------------------------------------
Deleting DECtalk Software from Your System

If you must remove a version of DECtalk Software from your system, delete each subset that you previously installed. For example, to delete a subset, do the following:

1. Log in as superuser (login name root), as follows:

% su root

2. Verify that you are at the root directory (/) by entering the following command:

# cd /

3. List the DECtalk Software subsets by entering the following form of the setld command:

# setld -i | grep DTK

4. Look for the word installed in the listing produced.
5. Delete the installed subsets. For example:

# setld -d DTKDEV420 DTKDEVDOC420

---------------------------------------------------------------------------
Displaying Documentation from the CD-ROM Distribution Disc

The DECtalk Software documentation is provided on the Digital UNIX Layered Products Online Documentation CD-ROM in Bookreader (.decw_book) file format.
You can display the Bookreader files on your workstation using the DECwindows Bookreader application. For information on accessing and displaying these files, see the Digital UNIX Layered Products Disc User's Guide. --------------------------------------------------------------------------- Correcting Problems During Product Installation If errors occur during the installation, the system displays failure messages. For example, if the installation fails due to insufficient disk space, a message similar to the following is displayed: There is not enough space for subset SUBSET_NAME SUBSET_DESCRIPTION (SUBSET_NAME) will not be loaded. where: SUBSET_NAME is the name of the subset SUBSET_DESCRIPTION is the description of the subset For example, "DTKDEVRELNOT420" is a subset name, and "DECtalk Software for Digital UNIX Release Notes V4.2A" is a subset description. Errors can occur during the installation if any of the following conditions exist: * Operating system version is incorrect. * Prerequisite software version is incorrect. * Disk space is insufficient. * System parameter values for successful installation are insufficient. For descriptions of error messages generated by these conditions, see the Digital UNIX documentation on system messages, recovery procedures, and software installation. --------------------------------------------------------------------------- Reporting Problems If an error occurs while DECtalk Software is in use and you believe the error is caused by a problem with the product, take one of the following actions: * If you have a Software Product Services Support Agreement, contact your Customer Support Center (CSC) by telephone or by using the electronic means provided with your support agreement (such as DSNlink). The CSC provides telephone support for high-level advisory and remedial assistance. 
When you initially contact the CSC, indicate the following: * The name and version number of the operating system you are using * The version number of DECtalk Software you are using * The hardware system you are using (such as a model number) * A brief description of the problem (one sentence, if possible) * How critical the problem is * If you have a Self-Maintenance Software Agreement, you can submit a Software Performance Report (SPR). * If you do not have any type of software services support agreement and you purchased DECtalk Software within the past year, you can submit an SPR if you think the problem is caused by a software error. When you submit an SPR, take the following steps: * Describe as accurately as possible the circumstances and state of the system when the problem occurred. Include the description and version number of the DECtalk Software being used. Explain the problem with specific examples. * Reduce the problem to its elements. * Remember to include listings of any command files, include files, relevant data files, and so forth. * Provide a listing of the program. * If the program is longer than 50 lines, submit a copy of it on machine-readable media (floppy diskette or magnetic tape). If necessary, also submit a copy of the program library used to build the application. For information about submitting media, see the tar(1) reference page. * Report only one problem per SPR. This will facilitate a faster response. * Mail the SPR package to Digital. If the problem is related to DECtalk Software documentation, you can do one of the following: * Report the problem to the CSC (if you have a Software Product Services Support Agreement and the problem is severe). * Fill out the Reader's Comments form (located at the back of the document that contains the error) and send the form to Digital. Be sure to include the action and page number where the error occurs. 
---------------------------------------------------------------------------
Chapter 2: Introduction to the DECtalk Software API

This chapter provides an introduction to the DECtalk Software Text-To-Speech API services and a discussion of programming text-to-speech applications using the API services. Topics include:

* DECtalk Software Text-To-Speech Services
* Using the Text-To-Speech API

---------------------------------------------------------------------------
DECtalk Software Text-To-Speech Services

The Text-To-Speech API is a Digital extension to the multimedia API specified by the MME services for the Digital UNIX operating system. The API function set gives you a flexible method of manipulating the various parameters of DECtalk Software functionality from within your application. These functions perform a wide range of tasks associated with the Text-To-Speech system and are listed by functional category in Table 1-1.

Table 1-1 -- Functions Listed by Category

Function                            Purpose

Core API Functions
TextToSpeechStartup()               Initializes and starts up the text-to-speech system.
TextToSpeechSpeak()                 Speaks text from a buffer.
TextToSpeechShutdown()              Shuts down the text-to-speech system.

Audio Output Control Functions
TextToSpeechPause()                 Pauses output.
TextToSpeechResume()                Resumes output.
TextToSpeechReset()                 Purges the text-to-speech system and stops output.

Blocking Synchronization Function
TextToSpeechSync()                  Synchronizes to the text stream.

Control and Status Functions
TextToSpeechSetSpeaker()            Selects one of nine speaking voices.
TextToSpeechGetSpeaker()            Returns the last speaking voice to have spoken.
TextToSpeechSetRate()               Sets the speaking rate of the text-to-speech system.
TextToSpeechGetRate()               Gets the speaking rate of the text-to-speech system.
TextToSpeechSetLanguage()           Sets the language to be used.
TextToSpeechGetLanguage()           Returns the language in use.
TextToSpeechGetStatus()             Gets the status of the text-to-speech system.
TextToSpeechOpenWaveOutFile()       Opens a file for output. TextToSpeechSpeak() writes audio data in wave format to this file.
TextToSpeechCloseWaveOutFile()      Closes the specified wave file.
TextToSpeechOpenLogFile()           Opens a log file.
TextToSpeechCloseLogFile()          Closes a log file.
TextToSpeechOpenInMemory()          Produces buffered speech samples in shared memory.
TextToSpeechCloseInMemory()         Returns the text-to-speech system to its normal state.
TextToSpeechAddBuffer()             Adds a shared-memory buffer to the memory buffer list.
TextToSpeechReturnBuffer()          Returns the current shared-memory buffer.
TextToSpeechGetCaps()               Retrieves the capabilities of the text-to-speech system.

Special Text-To-Speech Modes

Loading and Unloading a User Dictionary
TextToSpeechLoadUserDictionary()    Loads a user dictionary.
TextToSpeechUnloadUserDictionary()  Unloads a user dictionary.

---------------------------------------------------------------------------
Using the Text-To-Speech API

This section describes how to write application programs using the DECtalk API. The DECtalk Software API can be called from within any C program on the Digital UNIX system. This API has been designed to be extensible for future Text-To-Speech growth while still being easy to use. The current DECtalk Software implementation supports only one instance of Text-To-Speech per process. However, several copies of Text-To-Speech can be run simultaneously as separate processes.

Core API Functions

The core Text-To-Speech API functions are the following:

* TextToSpeechStartup() allocates system resources.
* TextToSpeechSpeak() queues text to the system.
* TextToSpeechShutdown() returns all system resources allocated by the TextToSpeechStartup() function.

The simplest application might use only these functions.

About the TextToSpeechSpeak() Function

The TextToSpeechSpeak() function is used to pass a null-terminated string of characters to the Text-To-Speech system.
The system queues all characters up to the null character. If the TTS_FORCE flag is not used in the call to the TextToSpeechSpeak() function, then the queued characters are seamlessly concatenated with previously queued characters. The TTS_FORCE flag is used to force a string of characters to be spoken even though the string might not complete a clause. For example:

    TextToSpeechSpeak("This will be spoken. ", TTS_NORMAL );

This text is spoken immediately by the system because it is terminated by a period and a space. These last two characters are one way to create a clause boundary.

    TextToSpeechSpeak("This will be spok", TTS_NORMAL );

This produces output only after the following line of code executes to complete the phrase.

    TextToSpeechSpeak("en. ", TTS_NORMAL );

Finally, a nonphrase string can be forced to be spoken by using the TTS_FORCE flag.

    TextToSpeechSpeak("This will be spok", TTS_FORCE );

Note that the word spoken is not pronounced correctly in this case even if the final characters in the word spoken (en) are queued immediately afterward. The TTS_FORCE flag causes the previous line to be spoken before taking any subsequently queued characters into account.

It is important that all sentences are separated with a space character. To make sure of this, it is recommended that a space character is routinely included after the final punctuation in a sentence. An example of what will happen without this is shown below:

    TextToSpeechSpeak("They are tired.", TTS_NORMAL );
    TextToSpeechSpeak("I am Cold.", TTS_NORMAL );

Because there is no space, the Text-To-Speech system processes the following string:

    "They are tired.I am Cold."

The string "tired.I" will be pronounced incorrectly because the system will treat it as one item instead of two words.

Audio Output Control Functions

An application can control speech output using the TextToSpeechPause() function, the TextToSpeechResume() function, and the TextToSpeechReset() function.
These functions pause the audio output, resume output after pausing, and reset the Text-To-Speech system. A reset discards all queued text, and stops and discards all queued audio. If the application has called the TextToSpeechOpenInMemory() function to store speech samples in memory, a reset causes all buffers to be returned to the application.

Blocking Synchronization Function

A special function called TextToSpeechSync() is provided to block until all text previously queued by the TextToSpeechSpeak() function is spoken. Once this function is called, there is no way to abort until all text is processed. This could take hours if there is sufficient text queued. Nonblocking synchronization can be provided using the index mark command. See the Runtime User's Guide for more information on the index mark command.

---------------------------------------------------------------------------
Control and Status Functions

The functions described in the following table provide additional control and status information for the Text-To-Speech system.

Table 1-2 -- Control and Status Functions

Function                    Description

TextToSpeechSetSpeaker()    Sets the speaker's voice (which becomes active at the next clause boundary).
TextToSpeechGetSpeaker()    Returns the value of the last speaker to have spoken. This value might not be the value previously set by the TextToSpeechSetSpeaker() function.
TextToSpeechSetRate()       Sets the speaking rate, which becomes active at the next clause boundary.
TextToSpeechGetRate()       Gets the speaking rate (the current rate setting is returned even if it has not been activated).
TextToSpeechSetLanguage()   Sets the Text-To-Speech system language. (Currently, this must be TTS_AMERICAN_ENGLISH.)
TextToSpeechGetLanguage()   Returns the current Text-To-Speech system language.
TextToSpeechGetStatus()     Returns various Text-To-Speech system parameters, such as the number of characters in the text pipe, the ID of the wave output device, and a Boolean value that indicates whether the system is speaking or silent.
TextToSpeechGetCaps()       Returns the capabilities of the Text-To-Speech system, which include the version number of the system, the number of speakers, the maximum and minimum speaking rate, and the supported languages.

---------------------------------------------------------------------------
Special Text-To-Speech Modes

After the TextToSpeechStartup() function is called by an application, it can then call the TextToSpeechSpeak() function to speak text. The application can also use the Text-To-Speech API to select different modes. These modes allow for writing wave files; writing a log file, which can contain text, phonemes, or syllables; or writing the audio (speech) samples to memory. Each mode-switch function has a corresponding function to return the Text-To-Speech system to the startup state. These functions are listed below.

Open                             Close
TextToSpeechOpenWaveOutFile()    TextToSpeechCloseWaveOutFile()
TextToSpeechOpenLogFile()        TextToSpeechCloseLogFile()
TextToSpeechOpenInMemory()       TextToSpeechCloseInMemory()

The Text-To-Speech system must be in the startup state before calling any of the Open functions listed above. The corresponding Close functions return the system to the startup state.

---------------------------------------------------------------------------
Loading and Unloading a User Dictionary

The TextToSpeechLoadUserDictionary() function is used to load a user dictionary created with the userdic program. The TextToSpeechUnloadUserDictionary() function is used to unload a user dictionary.

---------------------------------------------------------------------------
Creating a Wave File

After calling the TextToSpeechStartup() function, an application can call the function TextToSpeechOpenWaveOutFile().
This function blocks until all previously queued text has been processed. After the function returns, all text subsequently queued by the function TextToSpeechSpeak() is converted to speech and written into a wave file. Function TextToSpeechCloseWaveOutFile() blocks until the speech from all previously queued text is written to the file.

---------------------------------------------------------------------------
Creating a Log File

After calling the TextToSpeechStartup() function, an application can call the TextToSpeechOpenLogFile() function. This function blocks until all previously queued text has been processed. After the function returns, all text subsequently queued by the TextToSpeechSpeak() function is written to a log file as either text, phonemes, or syllables. The phonemes and syllables are written using the arpabet phoneme alphabet. The TextToSpeechCloseLogFile() function terminates phoneme logging and blocks until the speech from all previously queued text is processed.

---------------------------------------------------------------------------
Storing Speech Samples in Memory

To cause all speech samples to be put in memory, the application must call the TextToSpeechOpenInMemory() function. This function blocks until all previously queued text has been processed.

The memory buffers to store the speech samples are supplied to the Text-To-Speech system by the TextToSpeechAddBuffer() function. This function is used to pass a pointer to a structure of type TTS_BUFFER_T. (The TTS_BUFFER_T structure is defined in the include file ttsapi.h.)

When a buffer is completed, the buffer is returned to the application by sending a message to the callback function that corresponds to the callback function passed to the TextToSpeechStartup() function. A pointer to the returned TTS_BUFFER_T structure is contained in the LPARAM parameter of the message.
The user is responsible for the allocation and freeing of memory for the following elements in the TTS_BUFFER_T structure: lpData, lpPhonemeArray, and lpIndexArray.

The TTS_BUFFER_T structure is considered completed when any one of the following conditions occurs:

o The sample buffer, which is pointed to by element lpData, is filled.
o The phoneme array is filled.
o The index mark array is filled.
o A TTS_FORCE is used in a call to the TextToSpeechSpeak() function.

The application must not modify any buffer passed to the Text-To-Speech system by function TextToSpeechAddBuffer() until the buffer is returned from the Text-To-Speech system in a message. The application then owns the buffer. If no buffers are available, the system blocks. If the application is processing relatively long passages of text, it is recommended that the application queue several buffers and then requeue each buffer after finishing with it so that the system is never idle. A call to the TextToSpeechReset() function returns all buffers to the application.

The TextToSpeechReturnBuffer() function is supplied to force the return of the current TTS_BUFFER_T structure, whether it is filled or not. This function might not be required by most applications. It is included so that an application can obtain the last buffer without forcing that buffer to be sent with the TTS_FORCE command in the TextToSpeechSpeak() function. This might be required if the application performs its own buffer management.
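The ownership rule above (the application allocates lpData and the two optional arrays, and frees them once the buffer is back in its hands) can be sketched as follows. This is an illustration under stated assumptions, not code from the kit: the helper names alloc_speech_buffer() and free_speech_buffer() are hypothetical, and the type definitions are local stand-ins that mirror the ttsapi.h layout so that the fragment compiles on its own. A real application would include ttsapi.h instead. Treating DWORD as a 32-bit unsigned integer and LPSTR as char * is also an assumption.

```c
#include <stdlib.h>

/* Local stand-ins for the ttsapi.h types (assumed: DWORD is a 32-bit
 * unsigned integer, LPSTR is char *). Real code includes ttsapi.h. */
typedef unsigned int DWORD;
typedef char *LPSTR;

typedef struct {
    DWORD dwPhoneme;
    DWORD dwPhonemeSampleNumber;
    DWORD dwPhonemeDuration;
    DWORD dwReserved;
} TTS_PHONEME_T;

typedef struct {
    DWORD dwIndexValue;
    DWORD dwIndexSampleNumber;
    DWORD dwReserved;
} TTS_INDEX_T;

typedef struct {
    LPSTR lpData;
    TTS_PHONEME_T *lpPhonemeArray;
    TTS_INDEX_T *lpIndexArray;
    DWORD dwMaximumBufferLength;
    DWORD dwMaximumNumberOfPhonemeChanges;
    DWORD dwMaximumNumberOfIndexMarks;
    DWORD dwBufferLength;
    DWORD dwNumberOfPhonemeChanges;
    DWORD dwNumberOfIndexMarks;
    DWORD dwReserved;
} TTS_BUFFER_T;

/* Hypothetical helper: free a buffer and the arrays it points to.
 * Call it only while the application owns the buffer. */
void free_speech_buffer(TTS_BUFFER_T *buf)
{
    if (buf == NULL)
        return;
    free(buf->lpData);
    free(buf->lpPhonemeArray);   /* free(NULL) is a no-op */
    free(buf->lpIndexArray);
    free(buf);
}

/* Hypothetical helper: allocate a buffer with data_bytes of sample
 * storage. Pass 0 for max_phonemes or max_index_marks to leave the
 * corresponding array pointer NULL. Returns NULL on failure. */
TTS_BUFFER_T *alloc_speech_buffer(DWORD data_bytes,
                                  DWORD max_phonemes,
                                  DWORD max_index_marks)
{
    TTS_BUFFER_T *buf;

    if (data_bytes == 0)
        return NULL;

    buf = calloc(1, sizeof *buf);   /* zeroes pointers and counters */
    if (buf == NULL)
        return NULL;

    buf->lpData = malloc(data_bytes);
    if (max_phonemes > 0)
        buf->lpPhonemeArray = calloc(max_phonemes, sizeof *buf->lpPhonemeArray);
    if (max_index_marks > 0)
        buf->lpIndexArray = calloc(max_index_marks, sizeof *buf->lpIndexArray);

    if (buf->lpData == NULL ||
        (max_phonemes > 0 && buf->lpPhonemeArray == NULL) ||
        (max_index_marks > 0 && buf->lpIndexArray == NULL)) {
        free_speech_buffer(buf);
        return NULL;
    }

    /* The maximum-length elements must match the allocated sizes. */
    buf->dwMaximumBufferLength = data_bytes;
    buf->dwMaximumNumberOfPhonemeChanges = max_phonemes;
    buf->dwMaximumNumberOfIndexMarks = max_index_marks;
    return buf;
}
```

An application that follows the recommendation to keep several buffers queued might call alloc_speech_buffer() a few times before speaking, pass each result to TextToSpeechAddBuffer(), and call free_speech_buffer() only on buffers that the system has returned.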
The TTS_BUFFER_T structure and its elements are defined as follows:

    typedef struct TTS_PHONEME_TAG {
        DWORD dwPhoneme;
        DWORD dwPhonemeSampleNumber;
        DWORD dwPhonemeDuration;
        DWORD dwReserved;
    } TTS_PHONEME_T;

    typedef TTS_PHONEME_T * LPTTS_PHONEME_T;

    typedef struct TTS_INDEX_TAG {
        DWORD dwIndexValue;
        DWORD dwIndexSampleNumber;
        DWORD dwReserved;
    } TTS_INDEX_T;

    typedef TTS_INDEX_T * LPTTS_INDEX_T;

    typedef struct TTS_BUFFER_TAG {
        LPSTR lpData;
        LPTTS_PHONEME_T lpPhonemeArray;
        LPTTS_INDEX_T lpIndexArray;
        DWORD dwMaximumBufferLength;
        DWORD dwMaximumNumberOfPhonemeChanges;
        DWORD dwMaximumNumberOfIndexMarks;
        DWORD dwBufferLength;
        DWORD dwNumberOfPhonemeChanges;
        DWORD dwNumberOfIndexMarks;
        DWORD dwReserved;
    } TTS_BUFFER_T;

    typedef TTS_BUFFER_T * LPTTS_BUFFER_T;

TTS_BUFFER_T Structure Initialization

The TTS_BUFFER_T structure and the memory that its lpData, lpPhonemeArray, and lpIndexArray elements point to must be allocated and freed by the user. (Note that the last two pointers can optionally be set to NULL if they are not used by the application.)

* The lpData element points to a byte array. The dwMaximumBufferLength element must be set to the length of this array.
* If the lpPhonemeArray element is set to NULL, then no phonemes are returned. Otherwise, the lpPhonemeArray element must point to an application-allocated array of structures of type TTS_PHONEME_T. The length of this array must be copied into the dwMaximumNumberOfPhonemeChanges element.
* If the lpIndexArray element is set to NULL, then no index marks are returned. Otherwise, the lpIndexArray element must point to an application-allocated array of structures of type TTS_INDEX_T. The length of this array must be copied into the dwMaximumNumberOfIndexMarks element.

TTS_BUFFER_T Return Values

When the TTS_BUFFER_T structure is returned to the application, it contains the following return values:

* The number of bytes of audio samples pointed to by the lpData element is returned in the dwBufferLength element.
* The number of phoneme changes contained in the array pointed to by the lpPhonemeArray element is returned in the dwNumberOfPhonemeChanges element.
* The number of index marks contained in the array pointed to by the lpIndexArray element is returned in the dwNumberOfIndexMarks element.

The index and phoneme arrays each contain a time stamp in the form of a sample number. This sample number is initialized to zero at startup and after each call to the TextToSpeechReset() function. The phoneme array also contains the current phoneme duration in frames. Each frame is approximately 6.4 milliseconds.

---------------------------------------------------------------------------
Chapter 3: DECtalk Software Sample Programs

This chapter provides instructions on how to build the sample programs. Topics include:

* DECtalk Software Sample Programs
* Building DECtalk Software Sample Programs

---------------------------------------------------------------------------
Sample Programs

Some applications are included with DECtalk Software. These sample applications have been included to demonstrate the use of DECtalk Software APIs. These sources can be used as templates for other applications that you might want to develop. Sources to these programs can be found in:

/usr/examples/dtk/dtsamples

The samples and a brief description are listed below.

* xmsay.c -- xmsay and its companion uil file, xmsay.uil, demonstrate the use of DECtalk Software APIs in the Motif windows environment.
* say.c -- A command-line program that speaks the text typed on the command line.
* mailtalk.c -- mailtalk announces the arrival of mail when new mail is received. The file mailtalk.ini in /usr/lib/dtk/ contains default announcement messages that mailtalk uses. To have mailtalk speak your own custom messages, copy the mailtalk.ini file into your login directory and edit the strings.
* aclock.c -- Announces the time at specified intervals.
* dtmemory.c -- In dtmemory, DECtalk Software passes back synthesized speech in buffers. These buffers are written out into a wave file.

---------------------------------------------------------------------------
Building the Sample Programs

Sample programs can be created from the sources provided in /usr/examples/dtk/dtsamples. This section describes the procedure for building the sample programs. Before proceeding, make sure that the DECtalk Software development kit has been installed. See the DECtalk Software User's Guide for more information on the different components of DECtalk Software.

1. Create a local directory in which to build the sample programs.
2. Copy all the files in /usr/examples/dtk/dtsamples into the directory that you just created.
3. Generate a Makefile from the Imakefile by typing:
   /usr/bin/X11/xmkmf
4. Compile and link the sample application programs by typing the following while still in the directory that you just created:
   make all
5. After the make program completes successfully, the sample programs are ready to run.

In addition to the sample programs, you will also find some demo text files in your directory. These files demonstrate some of the DECtalk Software capabilities.

---------------------------------------------------------------------------
Programming

This section describes the DECtalk API programming environment. Topics include:

1. Header files
2. Shareable libraries
3. Compiling and linking applications

Header Files

DECtalk provides three header files that contain all the public data-structure definitions that the DECtalk Software API references. They are ttsapi.h, dtmmedefs.h, and engphon.h. When DECtalk Software is installed, these files are in /usr/include/dtk.

+ ttsapi.h contains definitions of constants used in the DECtalk Software API calls, data structures that define the buffers that DECtalk Software returns, and the API function prototype definitions.
+ dtmmedefs.h contains the basic data structure definitions used by DECtalk Software. It also contains definitions of error codes and audio formats. This file enables you to compile, link, and run certain DECtalk programs even if Multimedia Services for Digital UNIX is not installed. Specifically, if you are writing an application program that does not use the audio drivers but uses DECtalk Software to produce synthesized speech buffers (via TextToSpeechOpenInMemory() and the related in-memory calls), then using dtmmedefs.h circumvents the requirement for Multimedia Services for Digital UNIX.
+ engphon.h contains a list of American English phoneme codes.

---------------------------------------------------------------------------
Shareable Libraries

DECtalk Software APIs are available to programmers in two shareable libraries.

+ libtts.so contains device-independent DECtalk Software routines.
+ libttsmme.so contains the DECtalk Software library that requires Multimedia Services for Digital UNIX.

As in the case of the header files, if you want to use DECtalk Software to write an application that produces buffers of synthesized speech, then the program is linked with libtts.so. If, on the other hand, you want to use Multimedia Services for Digital UNIX to communicate with the audio subsystem, then the application has to be linked with libttsmme.so.

---------------------------------------------------------------------------
Chapter 4: Customizing a DECtalk Software Voice

The DECtalk Software voices provide an adequate selection for most applications. However, if you have a special application requiring a monotone or unusual voice, you can modify the parameters provided in this section to design your own voice.
* Customizing a DECtalk Software Voice
  o Parameters [:dv_]
  o Changing Sex and Head Size
  o Changing Voice Quality
  o Changing Pitch and Intonation

---------------------------------------------------------------------------
Parameters [:dv_]

The nine built-in voices of DECtalk are distinguished from one another by a large set of speaker-definition parameters. Speakers can differ in sex, age, head size and shape, larynx size and behavior, pitch range, pitch and timing habits, dialect, and emotional state. DECtalk Software cannot approximate all of these options. Therefore, the space of distinguishable voices is limited, even though DECtalk Software has many speaker-definition parameters that can be modified. The design voice [:dv _] command introduces the speaker-definition parameters, which can be entered as a string or one at a time.

The following sections discuss speech production, acoustics, and perception. Some of the information is relatively technical, but the examples should make it possible for all developers to modify any parameter effectively and listen to the results.

---------------------------------------------------------------------------
Changing Sex and Head Size

Six speaker-definition parameters control the size and shape of the head. These parameters are listed below and are described later in this chapter.

sx    Sex, 1 (male) or 0 (female)
hs    Head size, in %
f4    Fourth formant resonance frequency, in Hz
f5    Fifth formant resonance frequency, in Hz
b4    Fourth formant bandwidth, in Hz
b5    Fifth formant bandwidth, in Hz

Sex, sx

Male and female voices have many differences, including head size, pharynx length, larynx mass, and speaking habits such as degree of breathiness, liveliness of pitch, choice of articulatory target values, and speed of articulation. Some of these differences are under the control of a single parameter, sx, the sex of the speaker.
Speakers Paul, Harry, Frank, and Dennis are male (sx = 1), while speakers Betty, Rita, Ursula, Wendy, and Kit are female (sx = 0). Actually, Kit the Kid can be male or female because children younger than 10 years old have similar voices for both sexes. Changing the sx parameter causes DECtalk Software to access a different (male or female) table of target values for formant frequencies, bandwidths, and source amplitudes. The male and female tables are patterned after two individuals who were judged to have pleasant, intelligible voices. DECtalk Software's built-in voices are simply scaled transformations of Paul and Betty, the two basic voices. You can change the sex of any of DECtalk Software's voices by making the voice current and then modifying the sx parameter. For example, the following command gives Paul some of the speaking characteristics of a woman. (The sx parameter does not change the average pitch or breathiness, so a peculiar combination of simultaneous male and female traits results from this sx change.) [:np :dv sx 0] Am I a man or woman? The sx parameter can also be specified as m or f with the commands [:dv sx m] or [:dv sx f]. Note If you change the sex of the voice, some phonemes might cause DECtalk Software's filters to overload, producing a squawk. The modification of certain parameters such as f4, f5, and g1 (explained in a later section) can help to correct this problem. Head Size, hs Head size (hs) is specified as the average size for an adult man (if sx = 1) or an adult woman (if sx = 0). A head size of 100 % is normal or average for a given sex, but people can differ significantly in this characteristic. Head size has a strong influence on a person's voice. Large musical instruments produce low notes, and humans with large heads tend to have low, resonant voices. 
For example, to make Paul sound like a larger man with a 15 % longer vocal tract (and formant frequencies that are scaled down by a factor of about 0.85), use the following command:

    [:np :dv hs 115] Do I sound more like Huge Harry this way?

Head size is one of the best variables to use if you want to make dramatic voice changes. For example, Paul has a head size of 100, while Harry's deep voice is caused in part by a head-size change to 115, or 15 % greater than normal. Decreasing head size produces a higher voice, such as in a child or adolescent. Voices with extreme changes in head size, as in the following examples, are somewhat difficult to understand.

    [:nh :dv hs 135] Do I have a swelled head?
    [:nk] I am about 10 years old.
    [:nk :dv hs 65] Do I sound like a six year old?

Note

Extreme changes in head size can cause overloads, as well as difficulties in understanding the speech. The modification of certain parameters such as f4, f5, and g1 can help to correct this problem. (See the next section.)

Higher Formants, f4, f5, b4, and b5

A male voice typically has five prominent resonant peaks in the spectrum (over the range from 0 to 5 kHz); a female voice typically has only four (because of a smaller head size); and a child has three. If fourth and fifth formant resonances exist for a particular voice, they are fixed in frequency and bandwidth characteristics. These characteristics are specified, in Hz, by the parameters f4, f5, b4, and b5.

If a higher formant does not exist, the frequency and bandwidth of the speaker definition are set to special values that cause the resonance to disappear. To make a resonance disappear, the frequency is set to above 5500 Hz and the bandwidth is set to 5500 Hz. (This disables the formant filter.) This is what has been done to the fourth and fifth formants for Kit.

The permitted values for f4 and f5 have fairly complicated restrictions. Violating these restrictions can cause overloads and squawks.
The following restrictions apply to cases where a higher formant exists:

* f5 must be at least 300 Hz higher than f4.
* If sx is 1 (male), f4 must be at least 3250 Hz.
* If sx is 0 (female), f4 must be at least 3700 Hz.
* If hs is not 100, the preceding values should be multiplied by (hs/100).

These higher formants produce peaks in the spectrum that become more prominent if b4 and b5 are smaller, and if f4 and f5 are closer together. The limits placed on b4 and b5 should ensure that no problems occur. However, smaller values for bandwidths may produce an overload in the synthesizer. You can correct these overloads by increasing the bandwidths or by changing the gain control g1.

---------------------------------------------------------------------------
Changing Voice Quality

Six speaker-definition parameters control aspects of the output of the larynx, which, in turn, control voice quality. These parameters are listed as follows:

br    Breathiness, in decibels (dB)
lx    Lax breathiness, in %
sm    Smoothness, in %
ri    Richness, in %
nf    Number of fixed samples of open glottis
la    Laryngealization, in %

Breathiness, br

Some voices can be characterized as breathy. The vocal folds vibrate to generate voicing and breath noise simultaneously. Breathiness is a characteristic of many female voices, but it is also common under certain circumstances for male voices. The range of the br parameter is from 0 dB (no breathiness) to 70 dB (strong breathiness). By experimenting, you can learn what intermediate values sound like. For example, to turn Paul into a breathy, whispering speaker, use the following command:

    [:np :dv br 55 gv 56] Do I sound more like Dennis now?

This voice is not as loud as the others because of the simultaneous decrease in the gain of voicing (gv), but it is intelligible and human sounding.

Lax Breathiness, lx

The br parameter creates simultaneous breathiness whenever voicing is turned on.
Another type of breathiness occurs only at the ends of sentences and when going from voiced to voiceless sounds. This type of "lax" breathiness is controlled by the lx parameter in %. A nonbreathy, tense voice would have lx set to 0, while a maximally breathy, lax voice would have lx set to 100. The difference between these two voices is not great, but you can hear it if you listen closely. Smoothness, sm Smoothness refers to vocal fold vibrations. The vocal folds meet at the midline, as they do in normal voicing, but they do not slam together forcefully to create a very sudden cessation of airflow. DECtalk Software uses a variable-cutoff, gradual low-pass filter to model changes to smoothness. The range of sm is from 0 % (least smooth and most brilliant) to 100 % (most smooth and least brilliant). The voicing source spectrum is tilted so that energy at higher frequencies is attenuated by as much as 30 dB when sm is set to the maximum but is not attenuated at all when sm is set to 0. Professional singing voices that are trained to sing above an orchestra are usually brilliant, while anyone who talks softly becomes breathy and smooth. To synthesize a breathy voice, an sm value of about 50 or more is good. Changes to sm do not have a great effect on perceived voice quality. Richness, ri Richness is similar to smoothness and brilliance except that the spectral change occurs at lower frequencies and is because of a different physiological mechanism. Brilliant, rich voices carry well and are more intelligible in noisy environments, while smooth, soft voices sound more friendly. For example, the following command produces a soft, smooth version of Paul's voice: [:np :dv ri 0 sm 70] Do I sound more mellow? The following command produces a maximally rich and brilliant (forceful) voice: [:np :dv ri 90 sm 0] Do I sound more forceful? Smoothness and richness are usually negatively correlated when a speaker dynamically changes laryngeal output. 
The sm and ri parameters do not influence the speaker's identity very much.

Nopen Fixed, nf

The number of samples in the open part of the glottal cycle is determined not only by ri, but also by a second parameter, nf. The nf parameter is the number of fixed samples in the open portion of the glottal cycle. Most speakers adjust the open phase to be a certain fraction of the period, and this fraction is determined by ri. Other speakers keep the open phase fixed in duration when the overall period varies. To simulate this behavior, set ri to 100 and adjust nf to the desired duration of the open phase. The shortest possible open phase is 10 (1 ms), and the longest is three quarters of the period duration (about 70 for a male voice).

Laryngealization, la

Many speakers turn voicing on and off irregularly at the beginnings and ends of sentences, which gives a querulous tone to the voice. This departure from perfect periodicity is called laryngealization or creaky voice quality. The la parameter controls the amount of laryngealization in the voice. A value of 0 results in no laryngealized irregularity, and a value of 100 (the maximum) produces laryngealization at all times. For example, to make Betty moderately laryngealized, type the following command:

    [:nb :dv la 20]

The la parameter creates a noticeable difference in the voice, although it is not altogether a pleasant change.

---------------------------------------------------------------------------
Changing Pitch and Intonation

Seven speaker-definition parameters control aspects of the fundamental frequency (f0) contour of the voice. These parameters are as follows and are described in the chapter on modifying voices.

bf    Baseline fall, in Hz
hr    Hat rise, in Hz
sr    Stress rise, in Hz
as    Assertiveness, in %
qu    Quickness, in %
ap    Average pitch, in Hz
pr    Pitch range, in %

Baseline Fall, bf

The bf parameter, in Hz, determines one aspect of the dynamic fundamental frequency contour for a sentence.
If bf is 0, the reference baseline fundamental frequency of a sentence begins and ends at 115 Hz. All rule-governed dynamic swings in f0 are computed with respect to the reference baseline. Some speakers begin a sentence at a higher f0 and gradually fall as the sentence progresses. This "falling baseline" behavior can be simulated by setting bf to the desired fall in Hz. For example, setting bf to 20 Hz causes the f0 pattern for a sentence to begin at 125 Hz (115 Hz plus half of bf) and to fall at a rate of 16 Hz per second until it reaches 105 Hz (115 Hz minus half of bf). The baseline remains at this lower value until it is reset automatically before the beginning of the next full sentence (right after a period, question mark, or exclamation point). The rate of fall (16 Hz per second) is fixed, regardless of the extent of the fall, so a 20 Hz fall is complete after 1.25 seconds.

Whenever you include a [+] phoneme in the text to indicate the beginning of a paragraph, the baseline is automatically set to begin slightly higher for the first sentence of the paragraph. While baseline fall differs among the speakers, it is not a good cue for differentiating among them. As long as the fall is not excessive, its presence or absence is hardly noticeable.

Hat Rise, hr, and Stress Impulse Rise, sr

The hr (nominal hat rise, in Hz) and sr (nominal stress impulse rise, in Hz) parameters determine aspects of the dynamic fundamental frequency contour for a sentence. To modify these values selectively, you should understand how the f0 contour is computed as a function of the lexical stress pattern and syntactic structure of the sentence. A sentence is first analyzed and broken into clauses, using punctuation and clause-introducing words to determine the locations of clause boundaries. Within each clause, the f0 contour rises on the first stressed syllable, stays at a high level for the remainder of the clause up to the last stressed syllable, and falls dramatically on the last stressed syllable.
This rise-at-the-beginning and fall-at-the-end pattern has been called the "hat pattern" by linguists, using the analogy of jumping from the brim of a hat to the top of the hat and back down again. The hr parameter indicates the nominal height, in Hz, of a pitch rise to a plateau on the first stress of a phrase. A corresponding pitch fall is placed by rule on the last stress of the phrase. Some speakers use relatively large hat rises and falls, while others use a local "impulse-like" rise and fall on each stressed syllable. The default hr value for Paul is 22 Hz, indicating that the f0 contour rises a nominal 22 Hz when going from the brim to the top of the hat. To simulate a speaker who does not use hat rises and falls, use the following command:

    [:dv hr 0]

Other aspects of the hat pattern are important for natural intonation but are not accessible by speaker-definition commands. For example, the hat fall becomes a weaker fall followed by a slight continuation rise if the clause is to be succeeded by more clauses in the same sentence. Also, if unstressed syllables follow the last stressed syllable in a clause, part of the hat fall occurs on the very last (unstressed) syllable of the clause. If the clause is long, DECtalk Software may break it into two hat patterns by finding the boundary between the noun phrase and the verb phrase. If DECtalk Software is in phoneme input mode and you use the pitch rise [/] and pitch fall [\] symbols, the hr parameter determines the actual rise and fall in Hz.

Stress Rise, sr

The sr parameter indicates the nominal height, in Hz, of a local pitch rise and fall on each stressed syllable. This rise-fall is added to any hat rise or fall that is also present. For example, Paul has sr set to 32 Hz, resulting in an f0 rise-fall gesture of 32 Hz over a span of about 150 ms, which is located on the first and succeeding stressed syllables.
However, DECtalk Software rules reduce the actual height of successive stress rises and falls in each clause and cause the last stress pulse to occur early so that there is time for the hat fall during the vowel. If the sr parameter is set too low, the speech sounds monotone within long phrases. Large changes to hr and sr from their default values for each speaker are not necessary or desirable, except in unusual circumstances.

Assertiveness, as

Assertive voices have a dramatic fall in pitch at the end of utterances. Neutral or meek speakers often end a sentence with a slight "questioning" rise in pitch to deflect any challenges to their assertions. The as parameter, in %, indicates the degree to which the voice tends to end statements with a conclusive final fall. A value of 100 is very assertive, while a value of 0 is extremely meek.

Quickness, qu

The qu parameter, in %, controls the speed of response to a request to change the pitch. All hat rises, hat falls, and stress rises can be thought of as suddenly applied commands to change the pitch, but the larynx is sluggish and responds only gradually to each command. A smaller larynx typically responds more quickly, so while Harry has a quickness value of 10, Kit has a value of 50. In engineering terms, a value of 10 implies a time constant (time to get to 70 % of a suddenly applied step target) of about 100 ms. A value of 90 % corresponds to a time constant of about 50 ms. Lower quickness values may mean that the f0 never reaches the target value before a new command comes along to change the target.

Average Pitch, ap, and Pitch Range, pr

The ap (average pitch, in Hz) and pr (pitch range, in % of normal range) parameters modify the computed values of fundamental frequency, f0, according to the formula:

    f0' = ap + (((f0 - 120) * pr) / 100)

If ap is set to 120 Hz and pr to 100 %, there is no change to the "normal" f0 contour that is computed for a typical male voice.
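The formula can be tried out numerically. The following is a minimal sketch, not DECtalk Software code: the function name adjust_f0 and the two constants are hypothetical, and the 50 Hz and 500 Hz bounds are the f0 limits that this guide gives for the computed contour.

```python
# Sketch of the ap/pr formula from this guide. The names are illustrative
# only; nothing here is part of the DECtalk Software API.

F0_MIN_HZ = 50.0    # lower f0 limit stated in this guide
F0_MAX_HZ = 500.0   # upper f0 limit stated in this guide


def adjust_f0(f0_hz, ap_hz, pr_percent):
    """Apply f0' = ap + (((f0 - 120) * pr) / 100), then clamp to the limits."""
    adjusted = ap_hz + ((f0_hz - 120.0) * pr_percent) / 100.0
    return min(max(adjusted, F0_MIN_HZ), F0_MAX_HZ)


# ap = 120 and pr = 100 leave the contour unchanged.
print(adjust_f0(140, ap_hz=120, pr_percent=100))  # 140.0

# pr = 0 flattens every value to ap, which gives a monotone voice.
print(adjust_f0(140, ap_hz=90, pr_percent=0))     # 90.0

# Values that land outside the limits are pulled back into range.
print(adjust_f0(300, ap_hz=400, pr_percent=100))  # 500.0
```

Because pr multiplies only the distance from 120 Hz, changing ap shifts the whole contour while changing pr stretches or compresses it.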
The effect of a change in ap is simply to raise or lower the entire pitch contour by a constant number of Hz, whereas the effect of pr is to expand or contract the swings in pitch about 120 Hz. Normally, a smaller larynx simultaneously produces f0 values that are higher in average pitch and higher in pitch range by about the same factor (the whole f0 contour is multiplied by a constant factor). Observing the values assigned to ap and pr for each of the voices, you can see that the voices rank in average pitch from low (Harry) to high (Kit). Rankings for pr are similar, except that Frank has a flat, nonexpressive pitch range as compared with his average pitch. The best way to determine a good pitch range for a new voice is by trial and error.

You can create a monotone or robotlike voice by setting the pitch range to 0. For example, to make Harry speak in a monotone at exactly 90 Hz, type the following command:

    [:nh :dv ap 90 pr 0] I am a robot.

Reducing the pitch range reduces the dynamics of the voice, suggesting emotions such as sadness in the speaker. Increasing the pitch range while leaving the average pitch the same or setting it slightly higher suggests excitement.

Due to constraints involved in pitch-synchronous updating of other dynamically changing parameters, the fundamental frequency contour that is computed by the preceding formula is then checked for values that are outside the following limits:

    f0 maximum = 500 Hz
    f0 minimum = 50 Hz

Any value outside this range is limited to fall within the range. To keep you from exceeding reasonable limits on the parameters that control pitch, certain constraints apply to the values selected. If a [:dv _] command specifies values outside these limits, the value is limited to the nearest listed value before execution.

Changing Relative Gains and Avoiding Overloads

Nine speaker-definition parameters control the output levels of various internal resonators.
These parameters are:

    gv    Gain of voicing source, in dB
    gh    Gain of aspiration source, in dB
    gf    Gain of frication source, in dB
    gn    Gain of nasalization, in dB
    g1    Gain of cascade formant resonator 1, in dB
    g2    Gain of cascade formant resonator 2, in dB
    g3    Gain of cascade formant resonator 3, in dB
    g4    Gain of cascade formant resonator 4, in dB
    g5    Loudness of the voice, in dB

Loudness, g5

Each predefined voice has been adjusted to have about the same perceived loudness -- a value that is optimal for telephone conversation. The value chosen is near maximum. (If loudness were increased much, some phonemes would probably cause an overload squawk.) A near-maximum value was selected to maximize the signal-to-noise ratio of DECtalk Software. If you want to decrease the loudness of a voice or temporarily increase the loudness of a phrase that is known not to overload, determine the g5 value in dB for the voice in question. Then adjust the voice by using a command such as the following:

    [:np :dv g5 76] I am speaking at about half my normal level.

Because the g5 entry for Paul is 86, this command reduces loudness by 10 dB. Perceived loudness approximately doubles (or halves) for each 10 dB increment (or decrement) in g5.

Software control over loudness is useful in a loudspeaker application where the background noise level in the room might change. For example, a vocally handicapped, wheelchair-bound person does not want to appear to be shouting in a quiet interpersonal conversation, but he or she may want to be able to converse in a noisy room as well. Using a software abbreviation facility, such a person could type "lo" to select a command making the voice maximally loud, or "sof" to invoke a command setting g5 to a reduced value.

Note

DECtalk Software comes with volume control, so modification of the g5 parameter should not be necessary. Using the [:volume ...] command or the volume control knob on the external loudspeaker is recommended.
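The rule of thumb that perceived loudness doubles or halves for each 10 dB step in g5 can be written as a one-line calculation. This is only a sketch of that approximation; the function name loudness_ratio is hypothetical, and the result is a rough perceptual estimate, not a measurement.

```python
# Sketch of the "10 dB is roughly a doubling of perceived loudness" rule
# quoted in this guide. The function name is illustrative only.

def loudness_ratio(g5_new_db, g5_default_db):
    """Approximate perceived loudness relative to the default g5 setting."""
    return 2.0 ** ((g5_new_db - g5_default_db) / 10.0)


# Paul's g5 entry is 86, so g5 76 is about half as loud.
print(loudness_ratio(76, 86))  # 0.5
```

For example, a setting 20 dB below the default would come out at about one quarter of the default loudness under the same approximation.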
Sound Source Gains, gv, gh, gf, and gn

Several types of sound sources are activated during speech production: voicing, aspiration, frication, and nasalization. The relative output levels of these sounds, in dB, are determined by the gv, gh, gf, and gn parameters, respectively. The default settings for these parameters have been factory preset to maximize the intelligibility of each voice. However, changing the settings can be useful in debugging the system or in demonstrating aspects of the acoustic theory of speech production. You can change the level of one sound source globally; for example, you can turn off frication to hear just the output of the larynx. You might need to reduce these parameters to overcome certain kinds of overloads, but try the procedure described in the next section first.

Cascade Vocal Tract Gains, g1, g2, g3, and g4

Changes in head size or other parameters can sometimes produce overloads in the synthesizer circuits. If this occurs, make sure that f4 and f5 are set to reasonable values. If the squawk remains, you can adjust several gain controls -- g1 through g4, in dB -- in the cascade of formant resonators of the synthesizer to attenuate the signal at critical points. The attenuated signal can then be amplified back to the desired output level later in the synthesis.

Use the following procedure to correct an overload (typically indicated by a squawk during part of a word):

1. Synthesize the word or phrase several times to make sure the squawk occurs consistently. Use the same test word each time a change to a gain is made.

2. Determine the default values for g1 through g4 for the speaker that overloads.

3. Reduce g1 by increments of 3 until the squawk goes away. When the squawk goes away, note the reduction that was needed. If more than a 10 dB decrement is required, some other parameter has probably been changed too much. If the squawk does not go away at all, then you may need to reduce gv instead of g1.

4. Increase g5 to return the output to its original level. For example, if g1 was reduced by 6 dB, add 6 dB to g5 (or to g4 if g5 is already at a maximum). If incrementing g5 causes the squawk to return, then decrease g5 slowly until the squawk goes away.

This procedure works in most cases, but using g2 rather than g1 can work better. If you can return g1 to its factory-preset value and reduce g2 instead to make the squawk go away, then the signal-to-quantization-noise level in g1 remains maximized. If you can eliminate the squawk by using g3 or g4 rather than g2, more of the cascaded resonator system can be made immune to quantization noise accumulation.

The [save] Parameter and [:nv] Voice

You can save a modified speaker definition in a buffer while synthesizing speech with one of the other voices. The Val voice [:nv] is either male or female, depending on what values are stored in the buffer. If you call Val before storing any values in the buffer, DECtalk Software uses the Perfect Paul voice [:np]. The following commands store a modified Betty voice in Val and then recall it.

    [:nb :dv sex m save ]    (Store the modified Betty voice in Val.)
    [:np] I am Paul.         (Use another voice.)
    [:nv] I am Val.          (Recall the Val [modified-Betty] voice.)

The buffer holds its contents until you power down DECtalk Software. You must reenter new voice characteristics if you turn off DECtalk Software.

Note

If you want to use the save command, leave a space between the command and the trailing bracket; for example, [:dv save ].

Summary on Speaker-Definition Parameters

Of the 27 parameters, only a few cause dramatic changes in the voice. The greatest effects are obtained with changes to hs, ap, pr, and sx, while moderate changes occur when modifying la and br. To some extent, DECtalk Software's nine predefined speakers cover most of the possible voices, so don't expect to be able to find a voice that is highly novel and intelligible.
However, you might easily find ways to improve one of the standard voices slightly.

---------------------------------------------------------------------------

Chapter 5: DECtalk Software API Function Calls

This chapter is an alphabetical listing of DECtalk Software API functions. It includes the following topics:

* Control and Status Functions
* Text-to-Speech Modes
* Text-to-Speech Functions: Alphabetical Listing
* Function Listed by Category
* TextToSpeechAddBuffer
* TextToSpeechCloseInMemory
* TextToSpeechCloseLogFile
* TextToSpeechCloseWaveOutFile
* TextToSpeechGetCaps
* TextToSpeechGetLanguage
* TextToSpeechGetRate
* TextToSpeechGetSpeaker
* TextToSpeechGetStatus
* TextToSpeechLoadUserDictionary
* TextToSpeechOpenInMemory
* TextToSpeechOpenLogFile
* TextToSpeechOpenWaveOutFile
* TextToSpeechPause
* TextToSpeechReset
* TextToSpeechResume
* TextToSpeechReturnBuffer
* TextToSpeechSetLanguage
* TextToSpeechSetRate
* TextToSpeechSetSpeaker
* TextToSpeechShutdown
* TextToSpeechSpeak
* TextToSpeechStartup
* Loading of the Main Pronunciation Dictionary
* Loading of the User Dictionary
* TextToSpeechSync
* TextToSpeechUnloadUserDictionary

Conventions used in API functions

bold
    Bold text is used to indicate function names, data structures, and field names.

italics
    Italic text is used to indicate function arguments and to emphasize important information.

---------------------------------------------------------------------------

Control and Status Functions

The functions described in the following table provide additional control and status information for the text-to-speech system.

Function                     Description

TextToSpeechSetSpeaker()     Sets the speaker's voice (which becomes active at the next clause boundary).

TextToSpeechGetSpeaker()     Returns the value of the last speaker to have spoken. This value might not be the value previously set by the TextToSpeechSetSpeaker() function.

TextToSpeechSetRate()        Sets the speaking rate, which becomes active at the next clause boundary.

TextToSpeechGetRate()        Gets the speaking rate (the current rate setting is returned even if it has not been activated).

TextToSpeechSetLanguage()    Sets the text-to-speech system language. (Currently, this must be TTS_AMERICAN_ENGLISH.)

TextToSpeechGetLanguage()    Returns the current text-to-speech system language.

TextToSpeechGetStatus()      Returns various text-to-speech system parameters, such as the number of characters in the text pipe, the ID of the wave output device, and a Boolean value that indicates whether the system is speaking or silent.

TextToSpeechGetCaps()        Returns the capabilities of the text-to-speech system, which include the version number of the system, the number of speakers, the maximum and minimum speaking rate, and the supported languages.

---------------------------------------------------------------------------

Text-to-Speech Modes

After an application calls the TextToSpeechStartup() function, it can call the TextToSpeechSpeak() function to speak text. The application can also use the text-to-speech API to select different modes. These modes allow for writing wave files; writing a log file, which can contain text, phonemes, or syllables; or writing the audio (speech) samples to memory. Each mode-switch function has a corresponding function to return the text-to-speech system to the startup state. These functions are listed below.

Open                             Close

TextToSpeechOpenWaveOutFile()    TextToSpeechCloseWaveOutFile()
TextToSpeechOpenLogFile()        TextToSpeechCloseLogFile()
TextToSpeechOpenInMemory()       TextToSpeechCloseInMemory()

The text-to-speech system must be in the startup state before calling any of the Open functions listed above. The corresponding Close functions return the system to the startup state.
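The open/close rule above amounts to a small state machine: a mode can be opened only from the startup state, and the matching Close function returns to it. The following toy model illustrates that rule only. The class ModeTracker, its method names, and the mode labels are hypothetical; the sketch does not call or reproduce the DECtalk Software API.

```python
# Toy model of the mode rule described above. This is NOT the DECtalk API;
# the class, its methods, and the mode names are illustrative assumptions.

class ModeTracker:
    MODES = ("wave_file", "log_file", "in_memory")

    def __init__(self):
        self.mode = "startup"   # state after a successful startup

    def open(self, mode):
        """Enter a special mode; allowed only from the startup state."""
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        if self.mode != "startup":
            raise RuntimeError(f"cannot open {mode} while in {self.mode}")
        self.mode = mode

    def close(self, mode):
        """Leave a special mode and return to the startup state."""
        if self.mode != mode:
            raise RuntimeError(f"{mode} is not open")
        self.mode = "startup"


tracker = ModeTracker()
tracker.open("log_file")
tracker.close("log_file")
tracker.open("wave_file")   # allowed again after the close
print(tracker.mode)          # wave_file
```

An application could keep a similar record of its own so that it calls the matching Close function before switching to another mode.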
---------------------------------------------------------------------------

Text-to-Speech Functions: Alphabetical Listing

TextToSpeechAddBuffer()
TextToSpeechCloseInMemory()
TextToSpeechCloseLogFile()
TextToSpeechCloseWaveOutFile()
TextToSpeechGetCaps()
TextToSpeechGetLanguage()
TextToSpeechGetRate()
TextToSpeechGetSpeaker()
TextToSpeechGetStatus()
TextToSpeechLoadUserDictionary()
TextToSpeechOpenInMemory()
TextToSpeechOpenLogFile()
TextToSpeechOpenWaveOutFile()
TextToSpeechPause()
TextToSpeechReset()
TextToSpeechResume()
TextToSpeechReturnBuffer()
TextToSpeechSetLanguage()
TextToSpeechSetRate()
TextToSpeechSetSpeaker()
TextToSpeechShutdown()
TextToSpeechSpeak()
TextToSpeechStartup()
TextToSpeechSync()
TextToSpeechUnloadUserDictionary()

---------------------------------------------------------------------------

Function Listed by Category

Function                              Purpose

Core API Functions
TextToSpeechStartup()                 Initializes and starts up text-to-speech system.
TextToSpeechSpeak()                   Speaks text from a buffer.
TextToSpeechShutdown()                Shuts down text-to-speech system.

Audio Output Control Functions
TextToSpeechPause()                   Pauses output.
TextToSpeechResume()                  Resumes output.
TextToSpeechReset()                   Text-to-speech system is purged and output stopped.

Blocking Synchronization Function
TextToSpeechSync()                    Synchronizes to the text stream.

Control and Status Functions
TextToSpeechSetSpeaker()              Selects one of nine speaking voices.
TextToSpeechGetSpeaker()              Returns the last speaking voice to have spoken.
TextToSpeechSetRate()                 Sets the speaking rate of the text-to-speech system.
TextToSpeechGetRate()                 Gets the speaking rate of the text-to-speech system.
TextToSpeechSetLanguage()             Sets the language to be used.
TextToSpeechGetLanguage()             Returns the language in use.
TextToSpeechGetStatus()               Gets status of text-to-speech system.
TextToSpeechGetCaps()                 Retrieves the capabilities of the text-to-speech system.

Special Text-To-Speech Modes
TextToSpeechOpenWaveOutFile()         Opens a file for output. TextToSpeechSpeak() writes audio data in wave format to this file.
TextToSpeechCloseWaveOutFile()        Closes the specified wave file.
TextToSpeechOpenLogFile()             Opens a log file.
TextToSpeechCloseLogFile()            Closes a log file.
TextToSpeechOpenInMemory()            Produces buffered speech samples in shared memory.
TextToSpeechCloseInMemory()           Returns the text-to-speech system to its normal state.
TextToSpeechAddBuffer()               Adds a shared-memory buffer to the memory buffer list.
TextToSpeechReturnBuffer()            Returns the current shared-memory buffer.

Loading and Unloading a User Dictionary
TextToSpeechLoadUserDictionary()      Loads user dictionary.
TextToSpeechUnloadUserDictionary()    Unloads user dictionary.

---------------------------------------------------------------------------

TextToSpeechAddBuffer

This function adds a buffer to the memory list the application uses in the speech-to-memory mode.

Syntax

MMRESULT TextToSpeechAddBuffer (LPTTS_HANDLE_T phTTS, LPTTS_BUFFER_T pTTSbuffer)

Parameters

LPTTS_HANDLE_T phTTS
A pointer to a structure of type TTS_HANDLE_T.

LPTTS_BUFFER_T pTTSbuffer
A pointer to a structure of type TTS_BUFFER_T.

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

Constant                Description
MMSYSERR_NOERROR        Normal successful completion.
MMSYSERR_INVALPARAM     Invalid parameter.
MMSYSERR_ERROR          Output to memory not enabled or unable to create a system object.
MMSYSERR_INVALHANDLE    The text-to-speech handle was invalid.

Comments

The application must call the TextToSpeechOpenInMemory() function before calling this function. The buffer is passed using the structure TTS_BUFFER_T. The user must allocate the structure and the memory buffer. The text-to-speech system returns the buffer to the application when the buffer is full.
The structure of type TTS_BUFFER_T is returned to the application in a message to the window procedure that corresponds to the window handle passed to the TextToSpeechStartup() function. A pointer to the structure of type TTS_BUFFER_T is in the LPARAM field of the message. The message ID value is obtained with the following call:

    uiID_Buffer_Message = RegisterWindowMessage("DECtalkBufferMessage");

See the topic, Storing Speech Samples in Memory.

See Also

TextToSpeechOpenInMemory()
TextToSpeechReturnBuffer()
Storing Speech Samples in Memory
Asynchronous Messages

---------------------------------------------------------------------------

TextToSpeechCloseInMemory

This function terminates the text-to-speech system's speech-to-memory capability and returns the text-to-speech system to its startup state. If audio is enabled at startup, then speech samples are routed to the audio device.

Syntax

MMRESULT TextToSpeechCloseInMemory (LPTTS_HANDLE_T phTTS)

Parameters

LPTTS_HANDLE_T phTTS
A pointer to a text-to-speech handle.

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

Constant                Description
MMSYSERR_NOERROR        Normal successful completion.
MMSYSERR_ERROR          Output to memory not enabled or unable to create a system object.
MMSYSERR_INVALHANDLE    The text-to-speech handle was invalid.

Comments

The TextToSpeechOpenInMemory() function must be called before calling this function.

See Also

TextToSpeechOpenInMemory()

---------------------------------------------------------------------------

TextToSpeechCloseLogFile

This function closes a log file opened by the TextToSpeechOpenLogFile() function.

Syntax

MMRESULT TextToSpeechCloseLogFile (LPTTS_HANDLE_T phTTS)

Parameters

LPTTS_HANDLE_T phTTS
A pointer to a text-to-speech handle.

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful.
The return value is one of the following constants:

Constant                Description
MMSYSERR_NOERROR        Normal successful completion.
MMSYSERR_ERROR          Failure to wait for pending speech, unable to close the output file, or no output file is open.
MMSYSERR_INVALHANDLE    The text-to-speech handle was invalid.

Comments

This function, when called, closes any open log file, even if it was opened with the Log [:log] voice-control command. The application must call the TextToSpeechOpenLogFile() function before calling this function.

See Also

TextToSpeechOpenLogFile()

---------------------------------------------------------------------------

TextToSpeechCloseWaveOutFile

This function closes a wave file opened by the TextToSpeechOpenWaveOutFile() function.

Syntax

MMRESULT TextToSpeechCloseWaveOutFile (LPTTS_HANDLE_T phTTS)

Parameters

LPTTS_HANDLE_T phTTS
Specifies a text-to-speech handle identifying the opened text-to-speech device.

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

Constant                Description
MMSYSERR_NOERROR        Normal successful completion.
MMSYSERR_ERROR          Failure to wait for pending speech. Unable to update wave file header. Unable to close the wave file.
MMSYSERR_INVALHANDLE    The text-to-speech handle was invalid.

Comments

The application must call the TextToSpeechOpenWaveOutFile() function before calling this function.

See Also

TextToSpeechOpenWaveOutFile()

---------------------------------------------------------------------------

TextToSpeechGetCaps

This function provides the capabilities of the text-to-speech system by filling in a structure of type TTS_CAPS_T. The caller must have space allocated for this structure before calling this function.

Syntax

MMRESULT TextToSpeechGetCaps (LPTTS_CAPS_T lpTTScaps)

Parameters

LPTTS_CAPS_T lpTTScaps
A pointer to a structure of type TTS_CAPS_T.
This structure returns the capabilities of the text-to-speech system.

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

Constant                Description
MMSYSERR_NOERROR        Normal successful completion.
MMSYSERR_INVALHANDLE    The text-to-speech handle was invalid.
MMSYSERR_ERROR          The pointer to the TTS_CAPS_T structure was invalid.

Comments

Information returned in the TTS_CAPS_T structure includes languages and proper-name pronunciation support, sample rate, minimum and maximum speaking rate, number of predefined speaking voices, character set supported, and version number.

---------------------------------------------------------------------------

TextToSpeechGetLanguage

This function returns the current language.

Syntax

MMRESULT TextToSpeechGetLanguage (LPTTS_HANDLE_T phTTS, LANGUAGE_T *pLanguage)

Parameters

LPTTS_HANDLE_T phTTS
Specifies a text-to-speech handle identifying the opened text-to-speech device.

LANGUAGE_T *pLanguage
Specifies a language from the following list:

Constant                Description
TTS_AMERICAN_ENGLISH    Specifies American English. Currently, American English is the only supported language (defined in include file ttsapi.h).

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

Constant                Description
MMSYSERR_NOERROR        Normal successful completion.
MMSYSERR_INVALHANDLE    The text-to-speech handle was invalid.

See Also

TextToSpeechSetLanguage()

---------------------------------------------------------------------------

TextToSpeechGetRate

This function returns the current setting of the speaking rate.

Syntax

MMRESULT TextToSpeechGetRate (LPTTS_HANDLE_T phTTS, LPDWORD pdwRate)

Parameters

LPTTS_HANDLE_T phTTS
Specifies a text-to-speech handle identifying the opened text-to-speech device.
LPDWORD pdwRate
A pointer to a DWORD that is used to return the speaking rate. Valid values range from 75 to 600 words per minute.

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

Constant                Description
MMSYSERR_NOERROR        Normal successful completion.
MMSYSERR_INVALHANDLE    The text-to-speech handle was invalid.

Comments

The current setting of the speaking rate is returned even if the speaking rate change has not occurred. (The speaking-rate change occurs on clause boundaries.)

See Also

TextToSpeechSetRate()

---------------------------------------------------------------------------

TextToSpeechGetSpeaker

This function returns the value of the identifier for the last voice that has spoken.

Syntax

MMRESULT TextToSpeechGetSpeaker (LPTTS_HANDLE_T phTTS, LPSPEAKER_T lpSpeaker)

Parameters

LPTTS_HANDLE_T phTTS
Specifies a text-to-speech handle identifying the opened text-to-speech device.

LPSPEAKER_T lpSpeaker
A pointer to a DWORD that returns a speaker value from the following list. These symbols are defined in include file ttsapi.h.

Speaker    Description
PAUL       Default (male) voice
HARRY      Full male voice
FRANK      Aged male voice
DENNIS     Male voice
BETTY      Full female voice
URSULA     Aged female voice
WENDY      Whispering female voice
RITA       Female voice
KIT        Child's voice

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

Constant                Description
MMSYSERR_NOERROR        Normal successful completion.
MMSYSERR_INVALHANDLE    The text-to-speech handle was invalid.

Comments

Note that even after a call to the TextToSpeechSetSpeaker() function, this function returns the value for the previous speaking voice until the new voice actually speaks.
See Also

TextToSpeechSetSpeaker()

---------------------------------------------------------------------------

TextToSpeechGetStatus

This function returns the state of one or more text-to-speech system parameters.

Syntax

MMRESULT TextToSpeechGetStatus (LPTTS_HANDLE_T phTTS, DWORD dwIdentifier[ ], DWORD dwStatus[ ], DWORD dwNumberOfStatusValues)

Parameters

LPTTS_HANDLE_T phTTS
Specifies a text-to-speech handle identifying the opened text-to-speech device.

DWORD dwIdentifier[ ]
An array of values of type DWORD that contains identifiers specifying the status values to return in array dwStatus[ ]. These values can be one of the following constants defined in include file ttsapi.h:

Constant                 Description
INPUT_CHARACTER_COUNT    Returns a count of the characters that the text-to-speech system is currently processing.
STATUS_SPEAKING          The status value is TRUE if audio samples are playing and FALSE if no audio sample is playing.
WAVE_OUT_DEVICE_ID       The current wave output device ID is returned.

DWORD dwStatus[ ]
An array of type DWORD that contains the status values corresponding to each of the identifiers in array dwIdentifier[ ].

DWORD dwNumberOfStatusValues
A DWORD that contains the number of entries to return.

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

Constant                Description
MMSYSERR_NOERROR        Normal successful completion.
MMSYSERR_INVALPARAM     An invalid parameter was passed.
MMSYSERR_ERROR          Error obtaining status values.
MMSYSERR_INVALHANDLE    The text-to-speech handle was invalid.

Comments

The STATUS_SPEAKING status identifier has no meaning if the application is sending speech to a wave file or sending speech to memory.

---------------------------------------------------------------------------

TextToSpeechLoadUserDictionary

This function loads a user-defined pronunciation dictionary into the text-to-speech system.
Syntax

MMRESULT TextToSpeechLoadUserDictionary (LPTTS_HANDLE_T phTTS, LPSTR pszFileName)

Parameters

LPTTS_HANDLE_T phTTS
Specifies a text-to-speech handle identifying the opened text-to-speech device.

LPSTR pszFileName
A pointer to a NULL-terminated string that specifies the name of the user dictionary file to be loaded.

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

Constant                Description
MMSYSERR_NOERROR        Normal successful completion.
MMSYSERR_INVALHANDLE    The text-to-speech handle was invalid.
MMSYSERR_NOMEM          Unable to allocate memory for dictionary.
MMSYSERR_INVALPARAM     Dictionary file not found. (Invalid dictionary file name.)
MMSYSERR_ERROR          Illegal dictionary format or a dictionary is already loaded.

Comments

This function loads a dictionary created by the User Dictionary Build Tool applet. The text-to-speech system loads a default user dictionary at startup if it finds a file named user.dic in the default directory or in the directory specified in the directory. Any previously loaded user dictionary must be unloaded before loading a new user dictionary.

See Also

TextToSpeechUnloadUserDictionary()
Automatic Loading of a User Dictionary

---------------------------------------------------------------------------

TextToSpeechOpenInMemory

The TextToSpeechOpenInMemory() function allows speech to be stored in memory buffers supplied by the application. These buffers are passed to the text-to-speech system using the TextToSpeechAddBuffer() function.

Syntax

MMRESULT TextToSpeechOpenInMemory (LPTTS_HANDLE_T phTTS, DWORD dwFormat)

Parameters

LPTTS_HANDLE_T phTTS
A pointer to a text-to-speech handle.

DWORD dwFormat
An identifier that determines the audio sample format. It is one of the following constants defined in the include files mmsystem.h and ttsapi.h.
  Constant           Description
  WAVE_FORMAT_11M08  Mono, 8-bit, 11.025 kHz sample rate
  WAVE_FORMAT_11M16  Mono, 16-bit, 11.025 kHz sample rate
  WAVE_FORMAT_08M08  Mono, 8-bit μ-law, 8 kHz sample rate

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

  Constant              Description
  MMSYSERR_NOERROR      Normal successful completion.
  MMSYSERR_INVALPARAM   An invalid parameter was passed. (An illegal output format value.)
  MMSYSERR_NOMEM        Unable to allocate memory.
  MMSYSERR_ERROR        Illegal output state.
  MMSYSERR_INVALHANDLE  The text-to-speech handle was invalid.

Comments

The buffer is passed using the structure TTS_BUFFER_T. The user must allocate both the structure and the memory buffer. The text-to-speech system returns the buffer to the application when the buffer is full.

The TextToSpeechStartup() function must be called to start the text-to-speech system before calling this function.

The buffer is sent in a message to the window procedure that corresponds to the window handle passed to the function TextToSpeechStartup(). A pointer to the structure of type TTS_BUFFER_T is in the LPARAM field of the message. The message ID value can be obtained by the following call:

  uiID_Buffer_Message = RegisterWindowMessage("DECtalkBufferMessage");

See the section, Storing Speech Samples in Memory, at the beginning of this Appendix for more information.

See Also

TextToSpeechAddBuffer()
TextToSpeechCloseInMemory()
TextToSpeechReturnBuffer()
Special text-to-speech Modes
Storing Speech Samples in Memory

---------------------------------------------------------------------------
TextToSpeechOpenLogFile

This function creates a file that contains text, phonemes, or syllables. The phonemes and syllables are written using the arpabet alphabet.
After this function is called, all subsequent calls to the TextToSpeechSpeak() function cause the log data to be written to the specified file until the TextToSpeechCloseLogFile() function is called.

Syntax

MMRESULT TextToSpeechOpenLogFile (LPTTS_HANDLE_T phTTS, LPSTR pszFileName, DWORD dwFlags)

Parameters

LPTTS_HANDLE_T phTTS
  A pointer to a text-to-speech handle.

LPSTR pszFileName
  A pointer to a NULL terminated string that specifies the name of the log file to be opened.

DWORD dwFlags
  Specifies the type of output. It can contain one or more of the following constants:

  Constant       Description
  LOG_TEXT       Log text
  LOG_PHONEMES   Log phonemes
  LOG_SYLLABLES  Log syllable structure

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

  Constant              Description
  MMSYSERR_NOERROR      Normal successful completion.
  MMSYSERR_INVALPARAM   An invalid parameter was passed.
  MMSYSERR_NOMEM        Unable to allocate memory.
  MMSYSERR_ALLOCATED    A log file is already open.
  MMSYSERR_ERROR        Unable to open the output file.
  MMSYSERR_INVALHANDLE  The text-to-speech handle was invalid.

Comments

If more than one flag is passed, the logged output is mixed in an unpredictable fashion. If a log file is already open, this function returns an error. The voice-control Log command [:Log] has no effect when a log file is already open.

The TextToSpeechStartup() function must be called to start the text-to-speech system before calling this function.

See Also

TextToSpeechCloseLogFile()
Creating a Log File
Special text-to-speech Modes

---------------------------------------------------------------------------
TextToSpeechOpenWaveOutFile

This function opens the named file for speech output as a wave file.
Syntax

MMRESULT TextToSpeechOpenWaveOutFile (LPTTS_HANDLE_T phTTS, LPSTR pszFileName, DWORD dwFormat)

Parameters

LPTTS_HANDLE_T phTTS
  Specifies a text-to-speech handle identifying the opened text-to-speech device.

LPSTR pszFileName
  Specifies a pointer to the name of the wave file.

DWORD dwFormat
  Determines the audio sample format. It can be one of the following constants that are defined in include files mmsystem.h and ttsapi.h:

  Constant           Description
  WAVE_FORMAT_11M08  Mono, 8-bit, 11.025 kHz sample rate
  WAVE_FORMAT_11M16  Mono, 16-bit, 11.025 kHz sample rate
  WAVE_FORMAT_08M08  Mono, 8-bit μ-law, 8 kHz sample rate

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

  Constant              Description
  MMSYSERR_NOERROR      Normal successful completion.
  MMSYSERR_INVALPARAM   An invalid parameter was passed. (An illegal wave output format.)
  MMSYSERR_NOMEM        Memory allocation error.
  MMSYSERR_ALLOCATED    A wave file is already open.
  MMSYSERR_ERROR        Unable to open the wave file, or unable to write to the wave file.
  MMSYSERR_INVALHANDLE  The text-to-speech handle was invalid.

Comments

After an application calls the TextToSpeechOpenWaveOutFile() function, all subsequent calls to the TextToSpeechSpeak() function write the audio to the wave file until the TextToSpeechCloseWaveOutFile() function is called.

The TextToSpeechStartup() function must be called to start the text-to-speech system before calling this function.

See Also

TextToSpeechCloseWaveOutFile()
Creating a Wave File
Special text-to-speech Modes

---------------------------------------------------------------------------
TextToSpeechPause

This function pauses text-to-speech audio output.

Syntax

MMRESULT TextToSpeechPause (LPTTS_HANDLE_T phTTS)

Parameters

LPTTS_HANDLE_T phTTS
  Specifies a text-to-speech handle identifying the opened text-to-speech device.

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful.
The return value is one of the following constants:

  Constant              Description
  MMSYSERR_NOERROR      Normal successful completion.
  MMSYSERR_INVALHANDLE  The system is not speaking, or the text-to-speech handle is invalid.

Comments

This function affects only the audio output. It has no effect when writing log files or wave files, or when using the speech-to-memory capability of the text-to-speech system. The text-to-speech system remains paused until one of the following functions is called:

  * TextToSpeechResume()
  * TextToSpeechSync()
  * TextToSpeechOpenInMemory()
  * TextToSpeechOpenLogFile()
  * TextToSpeechOpenWaveOutFile()

If the wave output (audio) device is being shared by the text-to-speech system (that is, OWN_AUDIO_DEVICE was NOT specified when the TextToSpeechStartup() function started the text-to-speech system), and the TextToSpeechPause() function is called while the system is speaking, the wave output device is not released until one of the functions listed above is called and the system finishes speaking, or until the TextToSpeechReset() function is called.

Note that the TextToSpeechReset() function will NOT resume audio output if the text-to-speech system has been paused by the TextToSpeechPause() function.

See Also

TextToSpeechResume()
Audio Output Control Functions

---------------------------------------------------------------------------
TextToSpeechReset

This function flushes all previously queued text from the text-to-speech system and stops any audio output.

Syntax

MMRESULT TextToSpeechReset (LPTTS_HANDLE_T phTTS, BOOL bReset)

Parameters

LPTTS_HANDLE_T phTTS
  Specifies a text-to-speech handle identifying the opened text-to-speech device.

BOOL bReset
  bReset takes one of the following Boolean values:

  Value  Description
  FALSE  Preserves the current mode of the text-to-speech system.
  TRUE   The text-to-speech system is returned to the startup state and any open text-to-speech files are closed. The one exception is that this function will NOT resume the text-to-speech system if it has been paused by the TextToSpeechPause() function.

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

  Constant              Description
  MMSYSERR_NOERROR      Normal successful completion.
  MMSYSERR_NOMEM        Unable to allocate memory.
  MMSYSERR_ERROR        Unable to flush the system.
  MMSYSERR_INVALHANDLE  The text-to-speech handle was invalid.

Comments

If bReset is TRUE and the application has opened a file with the TextToSpeechOpenWaveOutFile() function or the TextToSpeechOpenLogFile() function, that file is closed. The TextToSpeechReset() function then flushes all previously queued text and stops all audio output.

If the TextToSpeechOpenInMemory() function has enabled output of the speech samples to memory, all queued TTS_BUFFER_T structures are returned to the application by a message that is sent to the application's window procedure. See the TextToSpeechOpenInMemory() function for more information.

See Also

TextToSpeechStartup()
TextToSpeechShutdown()
Audio Output Control Functions

---------------------------------------------------------------------------
TextToSpeechResume

This function resumes text-to-speech output after it has been paused by calling the TextToSpeechPause() function.

Syntax

MMRESULT TextToSpeechResume (LPTTS_HANDLE_T phTTS)

Parameters

LPTTS_HANDLE_T phTTS
  Specifies a text-to-speech handle identifying the opened text-to-speech device.

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

  Constant              Description
  MMSYSERR_NOERROR      Normal successful completion.
  MMSYSERR_INVALHANDLE  The system was not paused, or the text-to-speech handle was invalid.

Comments

This function affects only the audio output. It has no effect when writing log files or wave files, or when using the speech-to-memory capability of the text-to-speech system.

See Also

TextToSpeechPause()
Audio Output Control Functions

---------------------------------------------------------------------------
TextToSpeechReturnBuffer

This function returns the current buffer when an application is using the text-to-speech system's speech-to-memory capability. The buffer can be empty or partially full when it is returned. The dwBufferLength element of the TTS_BUFFER_T structure contains the number of samples in the buffer. If no buffer is available, a NULL pointer is returned in ppTTSbuffer.

Syntax

MMRESULT TextToSpeechReturnBuffer (LPTTS_HANDLE_T phTTS, LPTTS_BUFFER_T *ppTTSbuffer)

Parameters

LPTTS_HANDLE_T phTTS
  A pointer to a structure of type TTS_HANDLE_T.

LPTTS_BUFFER_T *ppTTSbuffer
  The address of a pointer to a structure of type TTS_BUFFER_T.

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

  Constant              Description
  MMSYSERR_NOERROR      Normal successful completion.
  MMSYSERR_INVALPARAM   Invalid parameter.
  MMSYSERR_ERROR        Output to memory not enabled or unable to create a system object.
  MMSYSERR_INVALHANDLE  The text-to-speech handle was invalid.

Comments

Most applications do not require this function because buffers are automatically returned when filled or when a TTS_FORCE flag is passed in the TextToSpeechSpeak() function. The TextToSpeechReturnBuffer() function is provided so that an application can return a buffer before it is filled and, therefore, obtain more speech samples immediately.
See Also

TextToSpeechStartup()
TextToSpeechShutdown()

---------------------------------------------------------------------------
TextToSpeechSetLanguage

This function selects a language for the text-to-speech system to use as the default language.

Syntax

MMRESULT TextToSpeechSetLanguage (LPTTS_HANDLE_T phTTS, LANGUAGE_T Language)

Parameters

LPTTS_HANDLE_T phTTS
  Specifies a text-to-speech handle identifying the opened text-to-speech device.

LANGUAGE_T Language
  Specifies a language. It must be one of the languages listed below. (Currently there is only one supported language.)

  Constant              Description
  TTS_AMERICAN_ENGLISH  Specifies American English. This symbol is defined in include file ttsapi.h.

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

  Constant              Description
  MMSYSERR_NOERROR      Normal successful completion.
  MMSYSERR_INVALPARAM   An invalid parameter was passed.
  MMSYSERR_INVALHANDLE  The text-to-speech handle was invalid.

Comments

Currently, American English is the only supported language.

See Also

TextToSpeechGetLanguage()

---------------------------------------------------------------------------
TextToSpeechSetRate

This function sets the text-to-speech speaking rate.

Syntax

MMRESULT TextToSpeechSetRate (LPTTS_HANDLE_T phTTS, DWORD dwRate)

Parameters

LPTTS_HANDLE_T phTTS
  Specifies a text-to-speech handle identifying the opened text-to-speech device.

DWORD dwRate
  Sets the speaking rate. Valid values range from 75 to 600 words per minute.

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

  Constant              Description
  MMSYSERR_NOERROR      Normal successful completion.
  MMSYSERR_INVALPARAM   An invalid parameter was passed.
  MMSYSERR_INVALHANDLE  The text-to-speech handle was invalid.
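Because dwRate is valid only from 75 to 600 words per minute, an application that takes the rate from user input may want to bring it into range before calling TextToSpeechSetRate(). The following is a minimal sketch of such a check, not part of the DECtalk API. The helper name clamp_speaking_rate and the two macro names are hypothetical, and unsigned int stands in for DWORD so that the sketch compiles without ttsapi.h.

```c
#include <assert.h>

/* Limits quoted in the dwRate description above (words per minute).
 * The macro names are hypothetical; they are not defined by ttsapi.h. */
#define SKETCH_RATE_MIN_WPM 75u
#define SKETCH_RATE_MAX_WPM 600u

/* Hypothetical helper: force a requested rate into the documented range.
 * unsigned int is an assumed stand-in for DWORD. */
unsigned int clamp_speaking_rate(unsigned int requested_wpm)
{
    unsigned int rate = requested_wpm;

    if (rate < SKETCH_RATE_MIN_WPM)
        rate = SKETCH_RATE_MIN_WPM;
    else if (rate > SKETCH_RATE_MAX_WPM)
        rate = SKETCH_RATE_MAX_WPM;

    /* Postcondition: the result always lies within the documented range. */
    assert(rate >= SKETCH_RATE_MIN_WPM && rate <= SKETCH_RATE_MAX_WPM);
    return rate;
}
```

An application linked against the DECtalk library could then pass the clamped value as the dwRate argument of TextToSpeechSetRate(), rather than relying on an MMSYSERR_INVALPARAM return to detect an out-of-range request.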
Comments

The change in speaking rate does not take effect until the next phrase boundary. All queued audio encountered before the phrase boundary is unaffected.

See Also

TextToSpeechGetRate()

---------------------------------------------------------------------------
TextToSpeechSetSpeaker

This function sets the voice of the speaker that the text-to-speech system will use.

Syntax

MMRESULT TextToSpeechSetSpeaker (LPTTS_HANDLE_T phTTS, SPEAKER_T Speaker)

Parameters

LPTTS_HANDLE_T phTTS
  Specifies a text-to-speech handle identifying the opened text-to-speech device.

SPEAKER_T Speaker
  Selects a speaker from the following list. These values are defined in include file ttsapi.h.

  Speaker  Description
  PAUL     Default (male) voice
  HARRY    Full male voice
  FRANK    Aged male voice
  DENNIS   Male voice
  BETTY    Full female voice
  URSULA   Aged female voice
  WENDY    Whispering female voice
  RITA     Female voice
  KIT      Child's voice

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

  Constant              Description
  MMSYSERR_NOERROR      Normal successful completion.
  MMSYSERR_INVALPARAM   An invalid parameter was passed.
  MMSYSERR_INVALHANDLE  The text-to-speech handle was invalid.

Comments

The change in speaking voice does not take effect until the next phrase boundary. All queued audio encountered before the phrase boundary is unaffected.

See Also

TextToSpeechGetSpeaker()

---------------------------------------------------------------------------
TextToSpeechShutdown

This function shuts down the text-to-speech system and frees all system resources used by the text-to-speech system.

Syntax

MMRESULT TextToSpeechShutdown (LPTTS_HANDLE_T phTTS)

Parameters

LPTTS_HANDLE_T phTTS
  Specifies a text-to-speech handle identifying the opened text-to-speech device.

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful.
The return value is one of the following constants:

  Constant              Description
  MMSYSERR_NOERROR      Normal successful completion.
  MMSYSERR_INVALHANDLE  The text-to-speech handle was invalid.

Comments

This function is called when you close an application. Any previously loaded user-defined dictionaries are automatically unloaded. All previously queued text is discarded and the text-to-speech system immediately stops speaking.

See Also

TextToSpeechStartup()

---------------------------------------------------------------------------
TextToSpeechSpeak

This function queues a null-terminated string to the text-to-speech system.

Syntax

MMRESULT TextToSpeechSpeak (LPTTS_HANDLE_T phTTS, LPSTR pszTextString, DWORD dwFlags)

Parameters

LPTTS_HANDLE_T phTTS
  Specifies a text-to-speech handle identifying the opened text-to-speech device.

LPSTR pszTextString
  Specifies a pointer to a null-terminated string of characters to be queued.

DWORD dwFlags
  Specifies whether the text is to be pushed through the text-to-speech system even if it does NOT end on a clause boundary. It can be set to one of the following constants defined in include file ttsapi.h:

  Constant    Description
  TTS_NORMAL  Insert characters in the text-to-speech queue.
  TTS_FORCE   Insert characters in the text-to-speech queue and force all text to be output even if the text stream does NOT end on a clause boundary.

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

  Constant              Description
  MMSYSERR_NOERROR      Normal successful completion.
  MMSYSERR_NOMEM        Unable to allocate memory.
  MMSYSERR_INVALHANDLE  The text-to-speech handle was invalid.

Comments

The speaker, speaking rate, and volume can also be changed in the text string by inserting voice-control commands as shown in the following example:

  [:name paul] I am Paul.
  [:nb] I am Betty.
  [:volume set 50] The volume has been set to 50% of the maximum level.
  [:ra 120] I am speaking at 120 words per minute.

See Also

About TextToSpeechSpeak()

---------------------------------------------------------------------------
TextToSpeechStartup

This function initializes the text-to-speech system and returns a value of type MMRESULT. This value is zero if initialization was successful. A single process can run only one instance of DECtalk.

Syntax

MMRESULT TextToSpeechStartup (HWND hWnd, LPTTS_HANDLE_T *phTTS, UINT uiDeviceNumber, DWORD dwDeviceOptions, VOID (*DTCallbackRoutine)(), LONG dwDTCallbackParameter)

Parameters

HWND hWnd
  A handle to the parent window. This can be NULL.

LPTTS_HANDLE_T *phTTS
  A pointer to a pointer to a structure of type TTS_HANDLE_T.

UINT uiDeviceNumber
  Specifies the device number of the wave output device. A value of WAVE_MAPPER can be used to select the first available device.

DWORD dwDeviceOptions
  Specifies how the wave output device is managed. It can be a combination of the following constants defined in include file ttsapi.h:

  Constant                 Description
  OWN_AUDIO_DEVICE         The wave output device is opened. No other process can allocate the wave output device until the TextToSpeechShutdown() function is called. If OWN_AUDIO_DEVICE is NOT specified, the wave output device is opened after audio is queued by the TextToSpeechSpeak() function, and it is released when the text-to-speech system has completed speaking.
  DO_NOT_USE_AUDIO_DEVICE  The text-to-speech system can be used only to write wave files, to write speech samples to memory, or to write log files. No error is returned if a wave output device is not present.
  OUTPUT_TO_MME_DEVICE     This flag no longer needs to be specified; it is still available for compatibility with previous versions of DECtalk Software.

VOID (*DtCallbackRoutine)()
  This parameter is used to specify a callback routine.
  The callback routine is used by DECtalk Software to inform the application when the buffer is full (if DECtalk Software in-memory calls are used) or when the TextToSpeechSpeak() function encounters an index mark. A value of NULL should be passed in if no callback routine is desired.

LONG dwCallbackParameter
  This is a pointer to a user-specified parameter. It is used to pass parameters into the callback routine. A value of NULL should be passed in if no user-specified parameters are desired.

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

  Constant              Description
  MMSYSERR_NOERROR      Normal successful completion.
  MMSYSERR_NODRIVER     No wave output device present.
  MMSYSERR_NOMEM        Memory allocation error.
  MMSYSERR_ERROR        DECtalk dictionary not found.
  MMSYSERR_BADDEVICEID  Device ID out of range.

Comments

The default parameters are:

  Language:       American English
  Speaking rate:  180 words per minute
  Speaker:        Paul

See Also

TextToSpeechShutdown()

---------------------------------------------------------------------------
Loading of the Main Pronunciation Dictionary

The TextToSpeechStartup() function loads the DECtalk main pronunciation dictionary, dectalk.dic, from the directory /usr/lib/dtk/. If the dictionary file cannot be found there, the TextToSpeechStartup() function returns a value of MMSYSERR_ERROR.

---------------------------------------------------------------------------
Loading of the User Dictionary

When started, DECtalk Software loads the default user dictionary, called user.dic, if it is available. The TextToSpeechStartup() function first attempts to load this user-specified pronunciation dictionary from the user's login home directory. If the dictionary file cannot be found there, the TextToSpeechStartup() function attempts to load the user dictionary from the application's default directory.
If this second attempt fails, no user dictionary is loaded.

See Also

TextToSpeechLoadUserDictionary()
TextToSpeechUnloadUserDictionary()

---------------------------------------------------------------------------
TextToSpeechSync

This function blocks until all previously queued text has been processed. This function automatically resumes audio output if the text-to-speech system has been paused by the TextToSpeechPause() function.

Syntax

MMRESULT TextToSpeechSync (LPTTS_HANDLE_T phTTS)

Parameters

LPTTS_HANDLE_T phTTS
  Specifies a text-to-speech handle identifying the opened text-to-speech device.

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

  Constant              Description
  MMSYSERR_NOERROR      Normal successful completion.
  MMSYSERR_ERROR        Unable to complete queued text.
  MMSYSERR_INVALHANDLE  The text-to-speech handle was invalid.

Comments

This function automatically resumes audio output if the text-to-speech system was placed in a paused state by an earlier call to the TextToSpeechPause() function.

---------------------------------------------------------------------------
TextToSpeechUnloadUserDictionary

This function unloads a user dictionary. Only one user dictionary can be loaded at a time, so you must unload any previously loaded dictionary before you can load a new one.

Syntax

MMRESULT TextToSpeechUnloadUserDictionary (LPTTS_HANDLE_T phTTS)

Parameters

LPTTS_HANDLE_T phTTS
  Specifies a text-to-speech handle identifying the opened text-to-speech device.

Return Value

This function returns a value of type MMRESULT. The value is zero if the function is successful. The return value is one of the following constants:

  Constant              Description
  MMSYSERR_NOERROR      Normal successful completion.
  MMSYSERR_INVALHANDLE  The text-to-speech handle was invalid.

Comments

A user dictionary is created using the User Dictionary Build tool. See Creating a user dictionary.
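The one-dictionary-at-a-time rule suggests an unload-then-load sequence whenever an application switches dictionaries. The sketch below illustrates that sequence only. So that it compiles without the DECtalk kit, it supplies stand-in typedefs, placeholder numeric values for the result constants (only the zero value for success is stated in this guide), and stub versions of the two dictionary functions that merely imitate the documented return codes. The wrapper name replace_user_dictionary is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles without ttsapi.h and mmsystem.h.
 * The real types and values come from those headers; the nonzero
 * numbers here are placeholders, not documented values. */
typedef unsigned int MMRESULT;
typedef void *LPTTS_HANDLE_T;
typedef char *LPSTR;
#ifndef MMSYSERR_NOERROR
#define MMSYSERR_NOERROR     0
#define MMSYSERR_ERROR       1
#define MMSYSERR_INVALHANDLE 5
#define MMSYSERR_INVALPARAM  11
#endif

/* Stub state: nonzero while a (pretend) user dictionary is loaded. */
static int stub_dictionary_loaded = 0;

/* Stub imitating the documented return codes of the real function. */
MMRESULT TextToSpeechLoadUserDictionary(LPTTS_HANDLE_T phTTS, LPSTR pszFileName)
{
    if (phTTS == NULL)
        return MMSYSERR_INVALHANDLE;
    if (pszFileName == NULL)
        return MMSYSERR_INVALPARAM;
    if (stub_dictionary_loaded)
        return MMSYSERR_ERROR;      /* a dictionary is already loaded */
    stub_dictionary_loaded = 1;
    return MMSYSERR_NOERROR;
}

/* Stub imitating the documented return codes of the real function. */
MMRESULT TextToSpeechUnloadUserDictionary(LPTTS_HANDLE_T phTTS)
{
    if (phTTS == NULL)
        return MMSYSERR_INVALHANDLE;
    stub_dictionary_loaded = 0;
    return MMSYSERR_NOERROR;
}

/* Hypothetical wrapper: unload whatever is loaded, then load the new file.
 * Returns the first nonzero result, or MMSYSERR_NOERROR on success. */
MMRESULT replace_user_dictionary(LPTTS_HANDLE_T phTTS, LPSTR pszFileName)
{
    MMRESULT result = TextToSpeechUnloadUserDictionary(phTTS);

    if (result != MMSYSERR_NOERROR)
        return result;
    return TextToSpeechLoadUserDictionary(phTTS, pszFileName);
}
```

In a real application the typedefs, constants, and stubs would be dropped in favor of the kit's headers and library, leaving only the wrapper. Whether the real unload call succeeds when no dictionary is loaded is an assumption here, based on its return table listing only MMSYSERR_NOERROR and MMSYSERR_INVALHANDLE.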
See Also TextToSpeechLoadUserDictionary()
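
Every function in this reference reports its outcome through the same small set of MMRESULT constants, so an application may find it convenient to route all return values through one checking routine. The following is a minimal sketch under stated assumptions: the helper names tts_result_text and tts_check are hypothetical, unsigned int stands in for MMRESULT, and the nonzero numeric values are placeholders that apply only when mmsystem.h has not been included. The description strings are taken from the tables above; the meaning of MMSYSERR_ERROR varies by function, so it is described generically.

```c
#include <assert.h>
#include <stdio.h>

/* Placeholder values used only when the real header is absent.
 * Zero for success is documented; the other numbers are assumptions. */
#ifndef MMSYSERR_NOERROR
#define MMSYSERR_NOERROR     0
#define MMSYSERR_ERROR       1
#define MMSYSERR_BADDEVICEID 2
#define MMSYSERR_ALLOCATED   4
#define MMSYSERR_INVALHANDLE 5
#define MMSYSERR_NODRIVER    6
#define MMSYSERR_NOMEM       7
#define MMSYSERR_INVALPARAM  11
#endif

/* Hypothetical helper: map a result code to a short description. */
const char *tts_result_text(unsigned int result)
{
    switch (result) {
    case MMSYSERR_NOERROR:     return "Normal successful completion";
    case MMSYSERR_ERROR:       return "Function-specific error";
    case MMSYSERR_BADDEVICEID: return "Device ID out of range";
    case MMSYSERR_ALLOCATED:   return "File already open";
    case MMSYSERR_INVALHANDLE: return "The text-to-speech handle was invalid";
    case MMSYSERR_NODRIVER:    return "No wave output device present";
    case MMSYSERR_NOMEM:       return "Unable to allocate memory";
    case MMSYSERR_INVALPARAM:  return "An invalid parameter was passed";
    default:                   return "Unrecognized result code";
    }
}

/* Hypothetical helper: return 1 on success; otherwise print a
 * diagnostic to stderr and return 0. */
int tts_check(const char *call_name, unsigned int result)
{
    assert(call_name != NULL);
    if (result == MMSYSERR_NOERROR)
        return 1;
    fprintf(stderr, "%s failed: %s (code %u)\n",
            call_name, tts_result_text(result), result);
    return 0;
}
```

A caller would wrap each API call, for example passing the name "TextToSpeechSpeak" together with the value that the call returned, and stop or recover when tts_check() returns 0.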