Copyright (c) Hyperion Entertainment and contributors.
Narrator Device
This page has not yet been fully updated for AmigaOS 4.x; some of the information here may be partly or wholly inapplicable.
Contents
- 1 Narrator Device
- 2 Narrator Device Commands and Functions
- 3 Device Interface
- 4 Writing to the Narrator Device
- 5 Reading from the Narrator Device
- 6 How to Write Phonetically for Narrator
- 7 A More Technical Explanation
- 8 Example Speech and Mouth Movement Program
- 9 Additional Information on the Narrator Device
Narrator Device
This article describes the narrator device which, together with the translator library, provides all of the Amiga’s text-to-speech functions. The narrator device is used to produce high-quality human-like speech in real time.
Narrator Device Commands and Functions
Command | Command Operation |
---|---|
CMD_FLUSH | Purge all active and queued requests for the narrator device. |
CMD_READ | Read mouth shapes associated with an active write from the narrator device. |
CMD_RESET | Reset the narrator port to its initialized state. All active and queued I/O requests will be aborted. Restarts the device if it has been stopped. |
CMD_START | Restart the currently active speech (if any) and resume queued I/O requests. |
CMD_STOP | Stop any currently active speech and prevent queued I/O requests from starting. |
CMD_WRITE | Write a stream of characters to the narrator device and generate mouth movement data for reads. |
Device Interface
The narrator device operates like all other Amiga devices. To use the narrator device, you must first open it. This initializes certain global areas, opens the audio device, allocates audio channels, and performs other housekeeping functions. Once open, the device is ready to receive I/O commands (most typically CMD_WRITE and CMD_READ). Finally, when finished, the user should close the device. This will free some buffers and allow the entire device to be expunged should the system require memory. See Exec Device I/O for general information on device usage.
The narrator device uses two extended I/O request structures: narrator_rb for write commands (to produce speech output) and mouth_rb for read commands (to receive mouth shape changes and word/syllable synchronization events).
struct narrator_rb
{
    struct IOStdReq message;    /* Standard IORequest Block */
    UWORD rate;                 /* Speaking rate (words/minute) */
    UWORD pitch;                /* Baseline pitch in Hertz */
    UWORD mode;                 /* Pitch mode */
    UWORD sex;                  /* Sex of voice */
    UBYTE *ch_masks;            /* Pointer to audio allocation maps */
    UWORD nm_masks;             /* Number of audio allocation maps */
    UWORD volume;               /* Volume. 0 (off) thru 64 */
    UWORD sampfreq;             /* Audio sampling frequency */
    UBYTE mouths;               /* If non-zero, generate mouths */
    UBYTE chanmask;             /* Which ch mask used (internal - do not modify) */
    UBYTE numchan;              /* Num ch masks used (internal - do not modify) */
    UBYTE flags;                /* New feature flags */
    UBYTE F0enthusiasm;         /* F0 excursion factor */
    UBYTE F0perturb;            /* Amount of F0 perturbation */
    BYTE F1adj;                 /* F1 adjustment in +- 5% steps */
    BYTE F2adj;                 /* F2 adjustment in +- 5% steps */
    BYTE F3adj;                 /* F3 adjustment in +- 5% steps */
    BYTE A1adj;                 /* A1 adjustment in decibels */
    BYTE A2adj;                 /* A2 adjustment in decibels */
    BYTE A3adj;                 /* A3 adjustment in decibels */
    UBYTE articulate;           /* Transition time multiplier */
    UBYTE centralize;           /* Degree of vowel centralization */
    char *centphon;             /* Pointer to central ASCII phon */
    BYTE AVbias;                /* Amplitude of voicing bias */
    BYTE AFbias;                /* Amplitude of frication bias */
    BYTE priority;              /* Priority while speaking */
    BYTE pad1;                  /* For alignment */
};

struct mouth_rb
{
    struct narrator_rb voice;   /* Speech IORequest Block */
    UBYTE width;                /* Mouth width (returned value) */
    UBYTE height;               /* Mouth height (returned value) */
    UBYTE shape;                /* Internal use, do not modify */
    UBYTE sync;                 /* Returned sync events */
};
Details on the meaning of the various fields of the two I/O request blocks can be found in the Writing to the Narrator Device and Reading from the Narrator Device sections later in this article. See the include file devices/narrator.h for the complete structure definitions.
The Amiga Speech System
The speech system on the Amiga is divided into two subsystems:
- The translator library, consisting of a single function: Translate(), which converts an English string into its phonetic representation.
- The narrator device, which uses the phonetic representation (generated either manually or by the translator library) as input to generate human-like speech and play it out via the audio device.
The two subsystems can be used either together or individually. Generally, hand coding phonetic text will produce better quality speech than using the translator library, but this requires the programmer to “hard code” the phonetic text in the program or otherwise restrict the input to phonetic text only. If the program must handle arbitrary English input, the translator library should be used.
Below is an example of how you would use the translator library to translate a string for the narrator device.
#define BUFLEN 500

APTR EnglStr;                   /* pointer to sample input string */
LONG EnglLen;                   /* input length */
UBYTE PhonBuffer[BUFLEN];       /* place to put the translation */
LONG rtnCode;                   /* return code from function */
struct narrator_rb *VoiceIO;    /* speaking I/O request block */
struct mouth_rb *MouthIO;       /* mouth movement I/O request block */

EnglStr = "This is Amiga speaking.";    /* a test string */
EnglLen = strlen(EnglStr);
rtnCode = ITranslator->Translate(EnglStr, EnglLen, (APTR)&PhonBuffer[0], BUFLEN);

/* VoiceIO must already have been created and used to open the device */
VoiceIO->message.io_Command = CMD_WRITE;
VoiceIO->message.io_Offset = 0;
VoiceIO->message.io_Data = PhonBuffer;
VoiceIO->message.io_Length = strlen((char *)PhonBuffer);
IExec->DoIO((struct IORequest *)VoiceIO);
This article discusses only the narrator device; refer to Translator Library for more information on the translator library.
While the narrator device on the Amiga supports all of the major device commands (see the Narrator Device Commands and Functions section), two of these commands do most of the work in the device. They are:
- CMD_WRITE
- This command is used to send a phonetic string to the device to be spoken. The narrator_rb I/O request block also contains several parameters which can be set to control various aspects of the speech, such as pitch, speaking rate, male/female voice, and so on. Some of the options are rather arcane. See the Writing to the Narrator Device section for a complete list of options and their descriptions.
- CMD_READ
- The narrator device can be told to generate various synchronization events which the user can query. These events are: mouth shape changes, word sync, and/or syllable sync. The events can be generated singly or in any combination, as requested by the user. See the Reading from the Narrator Device section for more details.
Opening the Narrator Device
Three primary steps are required to open the narrator device:
- Create a message port using CreatePort(). Reply messages from the device must be directed to a message port.
- Create an extended I/O request structure of type narrator_rb. The narrator_rb structure is created by the CreateExtIO() function.
- Open the narrator device. Call OpenDevice() passing the I/O request.
struct MsgPort *VoiceMP;
struct narrator_rb *VoiceIO;

if (VoiceMP = CreatePort("speech_write", 0))
    if (VoiceIO = (struct narrator_rb *)CreateExtIO(VoiceMP, sizeof(struct narrator_rb)))
        if (OpenDevice("narrator.device", 0, (struct IORequest *)VoiceIO, 0))
            printf("narrator.device did not open\n");
When the narrator device is first opened, it initializes certain fields in the user’s narrator_rb I/O request structure. In order to maintain backwards compatibility with older versions of the narrator device, a mechanism was needed for the device to ascertain whether it was being opened with a V37 or pre-V37 style I/O request structure. The pad field in the pre-V37 narrator_rb I/O request structure (which no one should have ever touched!) has been replaced by the flags field in the V37 narrator_rb structure, and is our path to upward compatibility. The device checks to see if a bit is set in this flags field. This bit must be set before opening the device if V37 or later features of the narrator device are to be used. There are two defined constants in the include file, NDB_NEWIORB and NDF_NEWIORB. NDB_NEWIORB specifies the bit which must be set in the flags field, NDF_NEWIORB is the field definition of the bit (1 << NDB_NEWIORB).
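The relationship between a bit number (the NDB_ form) and its field mask (the NDF_ form) can be sketched in portable C. This is only an illustration: the EX_ names, the ex_request structure and the bit number 0 are stand-ins assumed for the example, not definitions taken from devices/narrator.h. A real program should use NDB_NEWIORB and NDF_NEWIORB from the include file and set the flags field of its narrator_rb before calling OpenDevice().

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration only. The bit number 0 is an
 * assumption; real code must take NDB_NEWIORB and NDF_NEWIORB from
 * devices/narrator.h. */
#define EX_NDB_NEWIORB 0                        /* bit number */
#define EX_NDF_NEWIORB (1 << EX_NDB_NEWIORB)    /* field mask derived from the bit */

/* Hypothetical miniature request holding only the flags byte. */
struct ex_request
{
    uint8_t flags;
};

/* Mark the request as a new-style request. With the real device this
 * has to happen before the device is opened. */
void ex_mark_new_style(struct ex_request *req)
{
    req->flags |= EX_NDF_NEWIORB;
}

/* Test the same bit, roughly as a device could when it is opened. */
int ex_is_new_style(const struct ex_request *req)
{
    return (req->flags & EX_NDF_NEWIORB) != 0;
}
```

The point of the two constants is that the NDB_ value names a bit position, while the NDF_ value is what is actually ORed into the field.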
Once the device is opened, the mouth_rb (read) I/O request structure can be set up. Each CMD_READ request must be matched with an associated CMD_WRITE request. This is necessary for the device to match the various sync events with a particular utterance. The read I/O request structure is easily set up as follows:
- Create a read message port using the CreatePort() function.
- Allocate memory for the mouth_rb extended I/O request structure using AllocMem().
- Copy the narrator_rb I/O request structure used to open the device into the voice field of the mouth_rb I/O request structure. This will set the fields necessary for the device to make the correct correspondence between read and write requests.
- Copy the pointer to the read message port returned from CreatePort() into the voice.message.io_Message.mn_ReplyPort field of the mouth_rb structure.
The following code fragment, in conjunction with the OpenDevice() code fragment above, shows how to set up the mouth_rb structure:
struct MsgPort *MouthMP;
struct mouth_rb *MouthIO;

if (MouthMP = CreatePort("narrator_read", 0))
    if (MouthIO = (struct mouth_rb *)AllocMem(sizeof(struct mouth_rb), MEMF_PUBLIC|MEMF_CLEAR))
    {
        MouthIO->voice = *VoiceIO;    /* Copy I/O request used in OpenDevice */
        MouthIO->voice.message.io_Message.mn_ReplyPort = MouthMP;    /* Set port */
    }
    else
        printf("AllocMem failed\n");
else
    printf("CreatePort failed\n");
Closing the Narrator Device
Each OpenDevice() must eventually be matched by a call to CloseDevice(). This is necessary to allow the system to expunge the device in low memory conditions. As long as any task has the device open, or has forgotten to close it before terminating, the narrator device will not be expunged.
All I/O requests must have completed before the task can close the device. If any requests are still pending, the user must abort them before closing the device.
if ( !IExec->CheckIO((struct IORequest *)VoiceIO) )
{
    IExec->AbortIO((struct IORequest *)VoiceIO);    /* Abort queued or in progress request */
}
IExec->WaitIO((struct IORequest *)VoiceIO);         /* Wait for abort to do its job */
IExec->CloseDevice((struct IORequest *)VoiceIO);    /* Close the device */
Writing to the Narrator Device
You write to the narrator device by passing a narrator_rb I/O request to the device with CMD_WRITE set in io_Command, the number of bytes to be written set in io_Length and the address of the write buffer set in io_Data.
VoiceIO->message.io_Command = CMD_WRITE;
VoiceIO->message.io_Offset = 0;
VoiceIO->message.io_Data = PhonBuffer;
VoiceIO->message.io_Length = strlen(PhonBuffer);
IExec->DoIO((struct IORequest *)VoiceIO);
You can control several characteristics of the speech, as indicated in the narrator_rb struct shown in the “Device Interface” section.
Generally, the narrator device attempts to speak in a non-regional dialect of American English. With pre-V37 versions of the device, the user could change only a few of the more basic aspects of the speaking voice such as pitch, male/female, speaking rate, etc. With the V37 and later versions of the narrator device, the user can now change many more aspects of the speaking voice. In addition, in the pre-V37 device, only mouth shape changes could be queried by the user. With the V37 device, the user can also receive start of word and start of syllable synchronization events. These events can be generated independently, giving the user much greater flexibility in synchronizing voice to animation or other effects.
The following describes the fields of the narrator_rb structure:
- message.io_Data
- Points to a NULL-terminated ASCII phonetic input string. For backwards compatibility issues, the string may also be terminated with a ‘#’ symbol. See How to Write Phonetically for Narrator for details.
- message.io_Length
- Length of the input string. The narrator device will parse the input string until either a NULL or a ‘#’ is encountered, or until io_Length characters have been processed.
- rate
- The speaking rate in words/minute. Range is from 40 to 400 wpm.
- pitch
- The baseline pitch of the speaking voice. Range is 65 to 320 Hertz.
- mode
- The F0 (pitch) mode. ROBOTICF0 produces a monotone pitch, NATURALF0 produces a normal pitch contour, and MANUALF0 (new for V37 and later) gives the user more explicit control over the pitch contour by creative use of accent numbers. In MANUALF0 mode, a given accent number will have the same effect on the pitch regardless of its position in the sentence and its relation to other accented syllables. In NATURALF0 mode, accent numbers have a reduced effect towards the end of sentences (especially long ones). In addition, the proximity of other accented syllables, the number of syllables in the word, and the number of phrases and words in the sentence all affect the pitch contour. In MANUALF0 mode these things are ignored and it’s up to the user to do the controlling. This has the advantage of being able to have the pitch be more expressive. The F0enthusiasm field will scale the effect.
- sex
- Controls the sex of the speaking voice (MALE or FEMALE). In actuality, only the formant targets are changed. The user must still change the pitch and speaking rate of the voice to get the correct sounding sex. See the include files for default pitch and rate settings.
- ch_masks
- Pointer to a set of audio allocation maps. See Audio Device for details.
- nm_masks
- Number of audio allocation maps. See Audio Device for details.
- volume
- Sets the volume of the speaking voice. Range 0 - 64.
- sampfreq
- The synthesizer is “tuned” to a sampling frequency of 22,200 Hz. Changing sampfreq affects pitch and formant tunings and can be used to create unusual vocal effects. It is recommended that F1, F2, and F3adj be used instead to achieve this effect.
- mouths
- If set to a non-zero value will direct the narrator device to generate mouth shape changes and send this data to the user in response to read requests. See Reading from the Narrator Device for more details.
- chanmask
- Used internally by the narrator device. The user should not modify this field.
- numchan
- Used internally by the narrator device. The user should not modify this field.
- flags
- Used to specify features of the device. Possible bit settings are:
- NDB_NEWIORB - I/O request block uses V37 features.
- NDB_WORDSYNC - Device should generate start of word sync events.
- NDB_SYLSYNC - Device should generate start of syllable sync events.
- These bit definitions and their corresponding field definitions (NDF_NEWIORB, NDF_WORDSYNC, and NDF_SYLSYNC) can be found in the include files.
- F0enthusiasm
- The value of this field controls the scaling of pitch (F0) excursions used on accented syllables and has the effect of making the narrator device sound more or less “enthusiastic” about what it is saying. It is calibrated in 1/32s with unity (32) being the default value. Higher values cause more F0 variation, lesser values cause less. This feature is most useful in manual F0 mode.
- F0perturb
- Non-zero values in this field cause varying amounts of random low-frequency modulation of the pitch (F0). In other words, the pitch shakes in much the same way as an elderly person’s voice does. Range is 0 to 255.
- F1adj, F2adj, F3adj
- Changes the tuning of the formant frequencies. A formant is a major vocal tract resonance, and the frequencies of these formants move continuously as we speak. Traditionally, they have been given the abbreviations of F1, F2, F3... with F1 being the one lowest in frequency. Moving these formants away from their normal positions causes drastic changes in the sound of the voice and is a very powerful tool in the creation of character voices. This adjustment is in +/- 5% steps. Positive values raise the formant frequencies and vice versa. The default is zero. Use these adjustments instead of changing sampfreq.
- A1adj, A2adj, A3adj
- In a parallel formant synthesizer, the amplitudes of the formants need to be specified along with their frequencies. These fields bias the amplitudes computed by the narrator device. This is useful for creating different tonal balances (bass or treble), and listening to formants in isolation for educational purposes. The adjustments are calibrated directly in +/- 1db (decibel) steps. Using negative values will cause no problems; use of positive numbers can cause clipping. If you want to raise an amplitude, try cutting the others the same relative amount, then bring them all up equally until clipping is heard, then back them off. This should produce an optimum setting. This field has a +31 to -32 db range and the value -32db is equivalent to -infinity, shutting that formant off completely.
- articulate
- According to the popular theories of speech production, we move our articulators (jaw, tongue, lips, etc.) smoothly from one “target” position to the next. These articulatory targets correspond to acoustic targets specified by the narrator device for each phoneme. The device calculates the time it should take to get from one target to the next and this field allows you to intervene in that process. Values larger than the default will cause the transitions to be proportionately longer and vice versa. This field is calibrated in percent with 100 being the default. For example, a value of 50 will cause the transitions to take half the normal time, with the result being “sharper”, more deliberate sounding speech (not necessarily more natural). A value of 200 will cause the transitions to be twice as long, slurring the speech. Zero is a special value: the narrator device will take special measures to create no transitions at all, and each phoneme will simply be abutted to the next.
- centralize
- This field together with centphon can be used to create regional accent effects by modifying vowel sounds. centralize specifies the degree (in percent) to which vowel targets are “pulled” towards the targets of the vowel specified by centphon. The default value of 0% indicates that each vowel in the utterance retains its own target values. The maximum value of 100% indicates that each vowel’s targets are replaced by the targets of the specified vowel. Intermediate values control the degree of interpolation between the utterance vowel’s targets and the targets of the vowel specified by centphon.
- centphon
- Pointer to an ASCII string specifying the vowel whose targets are used in the interpolation specified by centralize. The vowels which can be specified are: IY, IH, EH, AE, AA, AH, AO, OW, UH, ER, UW. Specifying other than these will result in an error code being returned.
- AVbias, AFbias
- Controls the relative amplitudes of the voiced and unvoiced speech sounds. Voiced sounds are those made with the vocal cords vibrating, such as vowels and some consonants like y, r, w, and m. Unvoiced sounds are made without the vocal cords vibrating and use the sound of turbulent air, such as s, t, sh, and f. Some sounds are combinations of both such as z and v. AVbias and AFbias change the default amplitude of the voiced and unvoiced components of the sounds respectively. (AV stands for Amplitude of Voicing and AF stands for Amplitude of Frication). These fields are calibrated in +/- 1db steps and have the same range as the other amplitude biases, namely +31 to -32 db. Again, positive values may cause clipping. Negative values are the most useful.
- priority
- Task priority while speaking. When the narrator device begins to synthesize a sentence, the task priority remains unchanged while it is calculating acoustic parameters. However, when speech begins at the end of this process, the priority is bumped to 100 (the default value). If you wish, you may change this to anything you want. Higher values will tend to lock out most anything while speech is going on, and lower values may cause audible breaks in the speech output.
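Some of the calibrations in the field descriptions above can be sketched as small portable C helpers. These are illustrations only, not part of the narrator device interface: every ex_ name is hypothetical, the clamping is something a caller might do before filling in the request (the device itself is not described as clamping), and the linear interpolation used for centralize is an assumption about what "degree of interpolation" means.

```c
#include <stddef.h>

/* Hypothetical helper: keep a value inside a documented range. */
unsigned ex_clamp(unsigned value, unsigned lo, unsigned hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

/* Documented ranges: rate 40-400 wpm, pitch 65-320 Hz, volume 0-64. */
unsigned ex_clamp_rate(unsigned wpm)   { return ex_clamp(wpm, 40, 400); }
unsigned ex_clamp_pitch(unsigned hz)   { return ex_clamp(hz, 65, 320); }
unsigned ex_clamp_volume(unsigned vol) { return ex_clamp(vol, 0, 64); }

/* F0enthusiasm is calibrated in 1/32s, so 32 means a scale of 1.0. */
double ex_enthusiasm_scale(unsigned char f0enthusiasm)
{
    return f0enthusiasm / 32.0;
}

/* articulate is calibrated in percent, so 100 means a multiplier of 1.0.
 * Zero is a special case in the device (no transitions at all). */
double ex_transition_multiplier(unsigned char articulate)
{
    return articulate / 100.0;
}

/* centralize: 0% keeps the vowel's own target, 100% replaces it with the
 * centphon vowel's target. Linear interpolation in between is assumed. */
double ex_centralized_target(double own, double central, unsigned percent)
{
    if (percent > 100) percent = 100;
    return own + (central - own) * percent / 100.0;
}

/* Number of characters that would be parsed: stop at NUL, at '#',
 * or after io_length characters, whichever comes first. */
size_t ex_effective_length(const char *data, size_t io_length)
{
    size_t i;
    for (i = 0; i < io_length; i++)
    {
        if (data[i] == '\0' || data[i] == '#')
            break;
    }
    return i;
}
```

For instance, an F0enthusiasm of 64 corresponds to twice the default pitch excursion, and an articulate value of 50 to transitions that take half the normal time.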
The following example shows how to issue a write request to the narrator device. The first write is done with the default parameter settings. The second write is done after modifying the first and third formant loudness and using the centralization feature.
/*
 * Speak_Narrator.c
 *
 * This example program sends a string of phonetic text to the narrator
 * device twice, changing some of the characteristics the second time.
 *
 * Compile with gcc 4.2.4:
 * gcc -o Speak Speak_Narrator.c
 *
 */

#include <proto/exec.h>
#include <proto/dos.h>
#include <proto/utility.h>
#include <devices/narrator.h>

int main ( void )
{
    struct MsgPort* VoiceMP;
    struct narrator_rb* VoiceIO;
    STRPTR PhoneticText = "DHIHS IHZ AHMIY5GAH SPIY5KIHNX.";
    UBYTE audio_chan [ 4 ] = { 3, 5, 10, 12 };    /* ch_masks is a UBYTE pointer */

    /* Create the message port */
    VoiceMP = IExec->AllocSysObjectTags ( ASOT_PORT, TAG_END );

    /* Message port exists */
    if ( VoiceMP )
    {
        /* Create the I/O request */
        VoiceIO = IExec->AllocSysObjectTags ( ASOT_IOREQUEST,
                                              ASOIOR_Size, sizeof ( struct narrator_rb ),
                                              ASOIOR_ReplyPort, VoiceMP,
                                              TAG_END );

        /* I/O request exists */
        if ( VoiceIO )
        {
            /* Set the NEWIORB bit in the flags field to use the new fields */
            VoiceIO->flags = NDF_NEWIORB;

            /* Open the narrator device */
            if ( IExec->OpenDevice ( "narrator.device", 0, ( struct IORequest* ) VoiceIO, 0 ) )
            {
                /* Inform user that it could not be opened */
                IDOS->Printf( "Error: narrator.device did not open\n" );
            }
            else
            {
                /* Speak the string using the default parameters */
                VoiceIO->ch_masks = &audio_chan [ 0 ];
                VoiceIO->nm_masks = sizeof ( audio_chan );
                VoiceIO->message.io_Command = CMD_WRITE;
                VoiceIO->message.io_Data = PhoneticText;
                VoiceIO->message.io_Length = IUtility->Strlen ( PhoneticText );
                IExec->DoIO ( ( struct IORequest* ) VoiceIO );

                /* Now change some of the characteristics:
                 * Shut off the first formant, raise the third formant,
                 * and move 50% of the way towards AO,
                 * and speak it again.
                 */
                VoiceIO->A1adj = -32;        /* Shut off first formant */
                VoiceIO->A3adj = 11;         /* Raise the third formant */
                VoiceIO->centralize = 50;    /* Move 50% of the way */
                VoiceIO->centphon = "AO";    /* towards AO */
                IExec->DoIO ( ( struct IORequest* ) VoiceIO );

                /* Close the narrator device */
                IExec->CloseDevice ( ( struct IORequest* ) VoiceIO );
            }

            /* Delete the IORequest */
            IExec->FreeSysObject ( ASOT_IOREQUEST, VoiceIO );
        }
        /* Inform user that the I/O request could not be created */
        else
        {
            IDOS->Printf( "Error: Could not create I/O request\n" );
        }

        /* Delete the message port */
        IExec->FreeSysObject ( ASOT_PORT, VoiceMP );
    }
    /* Inform user that the message port could not be created */
    else
    {
        IDOS->Printf ( "Error: Could not create message port\n" );
    }

    return 0;
}
Reading from the Narrator Device
All read requests to the narrator device must be matched to an associated write request. This is done by copying the narrator_rb structure used in the OpenDevice() call into the voice field of the mouth_rb I/O request structure. You must do this after the call to OpenDevice(). Matching the read and write requests allows the narrator device to coordinate I/O requests across multiple uses of the device.
In pre-V37 versions of the narrator device, only mouth shape changes can be queried from the device. This is done by setting the mouths field of the narrator_rb I/O request structure (the write request) to a non-zero value. The write request is then sent asynchronously to the device and while it is in progress, synchronous read requests are sent to the device using the mouth_rb I/O request structure. When the mouth shape has changed, the device will return the read request to the user with bit 0 set in the sync field of the mouth_rb. The fields width and height of the mouth_rb structure will contain byte values which are proportional to the actual width and height of the mouth for the phoneme currently being spoken. Read requests sent to the narrator device are not returned to the user until one of two things happens: either the mouth shape has changed (this prevents the user from having to constantly redraw the same mouth shape), or the speech has completed. The user can check io_Error to determine if the mouth shape has changed (a return code of 0) or if the speech has completed (return code of ND_NoWrite).
In addition to returning mouth shapes, reads to the V37 narrator device can also perform two new functions: word and syllable sync. To generate word and/or syllable sync events, the user must specify several bits in the flags field of the write request (narrator_rb structure). The bits are NDB_WORDSYNC and NDB_SYLSYNC, for start of word and start of syllable synchronization events, respectively, and, of course, NDB_NEWIORB, to indicate that the V37 I/O request is required.
NDB_WORDSYNC and NDB_SYLSYNC tell the device to expect read requests and to generate the appropriate event(s). As with mouth shape change events, the write request is sent asynchronously to the device and, while it is in progress, synchronous read requests are sent to the device. The sync field of the mouth_rb structure will contain flags indicating which events (mouth shape changes, word sync, and/or syllable sync) have occurred.
The returned sync field flags are:
bit 0 (0x01) | mouth shape change event |
bit 1 (0x02) | start-of-word synchronization event |
bit 2 (0x04) | start-of-syllable synchronization event |
One or more flags may be set for any particular read.
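Because several flags can arrive in one read, each bit should be tested independently rather than with an if/else chain. A minimal sketch in portable C follows; the EX_SYNC_ names, the ex_sync_events structure and ex_decode_sync() are hypothetical helpers for illustration, and only the bit values 0x01, 0x02 and 0x04 come from the table above.

```c
#include <stdint.h>

/* Bit values from the table above; the EX_ names are hypothetical. */
#define EX_SYNC_MOUTH    0x01    /* mouth shape change event */
#define EX_SYNC_WORD     0x02    /* start-of-word synchronization event */
#define EX_SYNC_SYLLABLE 0x04    /* start-of-syllable synchronization event */

/* Hypothetical decoded form of the sync field. */
struct ex_sync_events
{
    int mouth;
    int word;
    int syllable;
};

/* Decode a sync byte as returned in the sync field of a mouth_rb.
 * Each bit is checked on its own because any combination may be set. */
struct ex_sync_events ex_decode_sync(uint8_t sync)
{
    struct ex_sync_events ev;
    ev.mouth    = (sync & EX_SYNC_MOUTH) != 0;
    ev.word     = (sync & EX_SYNC_WORD) != 0;
    ev.syllable = (sync & EX_SYNC_SYLLABLE) != 0;
    return ev;
}
```

A sync value of 0x05, for example, decodes to a mouth shape change together with a start-of-syllable event, with no word event.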
As with mouth shape changes, read requests will not return until the requested event(s) have occurred, and the user must test the io_Error field of the mouth_rb structure to tell when the speech has completed (an error return of ND_NoWrite).
Several read events can be compressed into a single event. This can occur in two ways: first when two dissimilar events occur between two successive read requests. For example, a single read may return both a mouth change and a syllable sync event. This should not present a problem if the user checks for all events. The second is when multiple events of the same type occur between successive read requests. This is of no great concern in dealing with mouth shape changes because, presumably, mouth events are used to drive animation, and the animation procedure will simply draw the current mouth shape.
Watch Those Sync Events |
---|
When word or syllable sync is desired, the narrator device may compress multiple sync events into a single sync event. Missing a word or syllable sync may cause word highlighting (for example) to lose sync with the speech output. A future version of the device will include an extension to the mouth_rb I/O request structure which will contain word and syllable counts and, possibly, other synchronization methods. |
The following code fragment shows the basics of how to perform reads from the narrator device. For a more complete example, see the sample program at the end of this article. For this fragment, take the code of the previous write example as a starting point. Then the following code would need to be added:
struct mouth_rb *MouthIO;    /* Pointer to read IORequest block */
struct MsgPort *MouthMP;     /* Pointer to read message port */

/*
 * (1) Create a message port for the read request.
 */
if (!(MouthMP = CreatePort("narrator_read", 0L)))
    BellyUp("Read CreatePort failed");

/*
 * (2) Create an extended IORequest of type mouth_rb.
 */
if (!(MouthIO = (struct mouth_rb *)CreateExtIO(MouthMP, sizeof(struct mouth_rb))))
    BellyUp("Read CreateExtIO failed");

/*
 * (3) Set up the read IORequest. Must be done after the call to OpenDevice().
 *     We assume that the write IORequest (VoiceIO) and the OpenDevice have been done
 */
MouthIO->voice = *VoiceIO;
MouthIO->voice.message.io_Message.mn_ReplyPort = MouthMP;
MouthIO->voice.message.io_Command = CMD_READ;

/*
 * (4) Set the flags field of the narrator_rb write request to return the desired
 *     sync events. If mouth shape changes are required, then the mouths field
 *     of the IORequest should be set to a non-zero value.
 */
VoiceIO->mouths = 1;             /* Generate mouth shape changes */
VoiceIO->flags = NDF_NEWIORB |   /* Indicates V37 style IORequest */
                 NDF_WORDSYNC |  /* Request start-of-word sync events */
                 NDF_SYLSYNC;    /* Request start-of-syllable sync events */

/*
 * (5) Issue asynchronous write request. The driver initiates the write request
 *     and returns immediately.
 */
SendIO((struct IORequest *)VoiceIO);

/*
 * (6) Issue synchronous read requests. For each request we check the sync field
 *     to see which events have occurred. Since any combination of events can
 *     be returned in a single read, we must check all possibilities. We
 *     continue looping until the read request returns an error of ND_NoWrite,
 *     which indicates that the write request has completed.
 */
for (DoIO((struct IORequest *)MouthIO);
     MouthIO->voice.message.io_Error != ND_NoWrite;
     DoIO((struct IORequest *)MouthIO))
{
    if (MouthIO->sync & 0x01) DoMouthShape();
    if (MouthIO->sync & 0x02) DoWordSync();
    if (MouthIO->sync & 0x04) DoSyllableSync();
}

/*
 * (7) Finally, we must perform a WaitIO() on the original write request.
 */
WaitIO((struct IORequest *)VoiceIO);
How to Write Phonetically for Narrator
This section describes in detail the procedure used to specify phonetic strings to the narrator speech synthesizer. No previous experience with phonetics is required. The only thing you may need is a good pronunciation dictionary for those times when you doubt your own ears. You do not have to learn a foreign language or computer language. You are just going to learn how to write down the English that comes out of your own mouth. In writing phonetically you do not have to know how a word is spelled, just how it is said.
Table of Phonemes
Phoneme | Example |
---|---|
IY | beet, eat |
IH | bit, in |
EH | bet, end |
AE | bat, ad |
AA | bottle, on |
AH | but, up |
AO | ball, awl |
UH | book, look |
ER | bird, early |
OH | border |
AX* | about |
IX* | information, infinite |
* AX and IX should never be used in stressed syllables.
Phoneme | Example |
---|---|
EY | bay, aid |
AY | bide, I |
OY | boy, oil |
AW | bound, owl |
OW | boat, own |
UW | brew, boolean |
Phoneme | Example |
---|---|
R | red |
L | long |
W | wag |
Y | yellow, comp(Y)uter |
M | men |
N | no |
NX | sing |
SH | shy |
S | soon |
TH | thin |
F | fed |
ZH | pleasure |
Z | has, zoo |
DH | then |
V | very |
WH | when |
CH | check |
J | judge |
/H | hole |
/C | loch |
B | but |
P | put |
D | dog |
T | toy |
K | keg, copy |
G | guest |
Phoneme | Example | Explanation |
---|---|---|
DX | pity | tongue flap |
LX | fall | |
RX | star | |
Q | kitt(Q)en | glottal stop |
QX | | silent vowel |
UL | = | AXL |
IL | = | IXL |
UM | = | AXM |
IM | = | IXM |
UN | = | AXN |
IN | = | IXN |
Digits 1-9 | Syllabic stress, ranging from secondary through emphatic |
. | Period - sentence final character. |
? | Question mark - sentence final character |
- | Dash - phrase delimiter |
, | Comma - clause delimiter |
() | Parentheses - noun phrase delimiters (see text) |
The narrator device works on utterances at the sentence level. Even if you want to say only one word, it will treat it as a complete sentence. Therefore, narrator wants one of two punctuation marks to appear at the end of every sentence - a period or a question mark. The period is used for almost all utterances and will cause a final fall in pitch to occur at the end of a sentence. The question mark is used at the end of yes/no questions only, and results in a final rise in pitch.
For example, the question, Do you enjoy using your Amiga? would take a question mark at the end, while the question, What is your favorite color? should be followed (in the phonetic transcription) with a period. If no punctuation appears at the end of a string, narrator will append a dash to it, which will result in a short pause. Narrator recognizes other punctuation marks as well, but these are left for later discussion.
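The punctuation rule can be sketched as a small check that a program might run on a phonetic string before sending it. This is a portable C illustration with hypothetical ex_ helper names; it only looks at the final character and does not model anything else the device does.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the phonetic string already ends in a sentence-final
 * character (period or question mark), 0 otherwise. */
int ex_has_sentence_terminator(const char *phon)
{
    size_t len = strlen(phon);
    if (len == 0)
        return 0;
    return phon[len - 1] == '.' || phon[len - 1] == '?';
}

/* Yes/no questions take a question mark (final rise in pitch);
 * everything else, including "what" questions, takes a period. */
char ex_terminator_for(int is_yes_no_question)
{
    return is_yes_no_question ? '?' : '.';
}

/* Append the terminator if the string has none. Returns 1 on success,
 * 0 if the buffer is too small to hold the extra character. */
int ex_terminate(char *buf, size_t bufsize, char term)
{
    size_t len = strlen(buf);
    if (ex_has_sentence_terminator(buf))
        return 1;
    if (len + 2 > bufsize)
        return 0;
    buf[len] = term;
    buf[len + 1] = '\0';
    return 1;
}
```

Terminating the string explicitly avoids the dash, and the short pause it causes, that narrator would otherwise append.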
Phonetic Spelling
Utterances are usually written phonetically using an alphabet of symbols known as IPA (International Phonetic Alphabet). This alphabet is found at the front of most good dictionaries. The symbols can be hard to learn and were not readily available on computer keyboards, so the Advanced Research Projects Agency (ARPA) came up with the ARPABET, a way of representing each symbol using one or two upper case letters. Narrator uses an expanded version of the ARPABET to specify phonetic sounds.
A phonetic sound, or phoneme, is a basic speech sound, a speech atom. Working backwards: sentences can be broken into words, words into syllables, and syllables into phonemes. The word cat has three letters and (coincidentally) three phonemes. Looking at the table of phonemes we find the three sounds that make up the word cat. They are the phonemes K, AE, and T, written as KAET. The word cent translates as SEHNT. Notice that both words begin with the letter c, but because they are pronounced differently they have different phonetic spellings. These examples introduce a very important concept of phonetic spelling: spell it like it sounds, not like it looks.
Choosing the Right Vowel
Phonemes, like letters, are divided into two categories: vowels and consonants. Loosely defined, a vowel is a continuous sound made with the vocal cords vibrating and air exiting the mouth (as opposed to the nose). A consonant is any other sound, such as those made by rushing air (like S or TH), or by interruptions in the air flow by the lips or tongue (B or T). All vowels use a two letter ASCII phonetic code while consonants use a one or two letter code.
In English we write with only five vowels: a, e, i, o, and u. It would be easy if we only said five vowels. However, we say more than 15 vowels. Narrator provides for most of them. Choose the proper vowel by listening: Say the word aloud, perhaps extending the vowel sound you want to hear and then compare the sound you are making to the sounds made by the vowels in the examples on the phoneme list. For example, the a in apple sounds the same as the a in cat, not like the a in Amiga, talk, or made. Notice also that some of the example words in the list do not even use any of the same letters contained in the phoneme code; for example AA as in bottle.
Vowels are divided into two groups: those that maintain the same sound throughout their durations and those that change their sound. The ones that change are called diphthongs. Some of us were taught the terms long and short to describe vowel sounds. Diphthongs fall into the long category, but these two terms are inadequate to fully differentiate between vowels and should be avoided. The diphthongs are the last six vowels listed in the table. Say the word made out loud very slowly. Notice how the a starts out like the e in bet but ends up like the e in beet. The a, therefore, is a diphthong in this word and we would use EY to represent it. Some speech synthesis systems require you to specify the changing sounds in diphthongs as separate elements, but narrator takes care of the assembly of diphthongal sounds for you.
Choosing the Right Consonant
Consonants are divided into many categories by phoneticians, but we need not concern ourselves with most of them. Picking the correct consonant is very easy if you pay attention to just two categories: voiced and unvoiced. A voiced consonant is made with the vocal cords vibrating, and an unvoiced one is made when the vocal cords are silent. Sometimes English uses the same letter combinations to represent both. Compare the th in thin with the th in then. Notice that the first is made with air rushing between the tongue and upper teeth. In the second, the vocal cords are vibrating also. The voiced th phoneme is DH and the unvoiced one is TH. Therefore, thin is phonetically spelled as THIHN while the word then is spelled DHEHN.
A sound that is particularly subject to mistakes is voiced and unvoiced s, phonemes Z and S, respectively. Clearly the word bats ends with an S and the word has ends with a Z. But, how do you spell close? If you say “What time do you close?”, you spell it with a Z, and if you are saying “I love to be close to you.” you use an S.
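When experimenting with a word such as close, it can help to have the voiced and unvoiced codes paired up. The sketch below is only an illustration: the text names the TH/DH and S/Z pairs, while the remaining pairs are filled in from standard phonetics using codes from the consonant table, and the function name `voiced_counterpart` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Unvoiced/voiced consonant pairs. Only TH/DH and S/Z are named in the
 * text; the others are an assumption based on standard phonetics. */
struct voicing_pair {
    const char *unvoiced;
    const char *voiced;
};

static const struct voicing_pair pairs[] = {
    { "TH", "DH" }, { "S", "Z" },  { "F", "V" }, { "SH", "ZH" },
    { "P", "B" },   { "T", "D" },  { "K", "G" }, { "CH", "J" }
};

/* Return the voiced partner of an unvoiced code, or NULL if the code
 * has no partner in the table. */
const char *voiced_counterpart(const char *code)
{
    size_t i;

    for (i = 0; i < sizeof pairs / sizeof pairs[0]; ++i) {
        if (strcmp(code, pairs[i].unvoiced) == 0)
            return pairs[i].voiced;
    }
    return NULL;
}
```

For example, `voiced_counterpart("S")` yields `"Z"`, the difference between the adjective and the verb close.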
Another sound that causes some confusion is the r sound. There are two different r-like phonemes in the Narrator alphabet: R under the consonants and ER under the vowels. Use ER if the r sound is the vowel sound in the syllable like in bird, absurd, and flirt. Use the R if the r sound precedes or follows another vowel sound in that syllable as in car, write, and craft.
Contractions and Special Symbols
There are several phoneme combinations that appear very often in English words. Some of these are caused by our laziness in pronunciation. Take the word connector for example. The o in the first syllable is almost swallowed out of existence. You would not use the AA phoneme; you would use the AX phoneme instead. It is because of this relaxation of vowels that we find ourselves using AX and IX very often. Since this relaxation frequently occurs before l, m, and n, narrator has a shortcut for typing these combinations. Instead of personal being spelled PERSIXNAXL, we can spell it PERSINUL, making it a little more readable. Anomaly goes from AXNAAMAXLIY to UNAAMULIY, and KAAMBIXNEYSHIXN becomes KAAMBINEYSHIN for combination. It may be hard to decide whether to use the AX or IX brand of relaxed vowel. The only way to find out is to use both and see which sounds best.
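The shortcut codes can be expanded mechanically, which is a convenient way to confirm that two spellings are equivalent. This is a minimal sketch with a hypothetical function name, `expand_shortcuts`. It relies on the observation that no phoneme code in the tables ends in U or I, so a U or I followed by L, M, or N can only be one of the six shortcuts; stress digits and punctuation are copied through unchanged.

```c
#include <stddef.h>

/* Expand UL/UM/UN to AXL/AXM/AXN and IL/IM/IN to IXL/IXM/IXN.
 * Hypothetical helper; returns 0 on success, -1 if out is too small. */
int expand_shortcuts(const char *in, char *out, size_t outsize)
{
    size_t o = 0;

    if (outsize == 0)
        return -1;

    while (*in != '\0') {
        char c = in[0];
        char next = in[1];
        int shortcut = (c == 'U' || c == 'I') &&
                       (next == 'L' || next == 'M' || next == 'N');

        if (shortcut) {
            if (o + 3 >= outsize)       /* three chars plus terminator */
                return -1;
            out[o++] = (c == 'U') ? 'A' : 'I';
            out[o++] = 'X';
            out[o++] = next;
            in += 2;
        } else {
            if (o + 1 >= outsize)       /* one char plus terminator */
                return -1;
            out[o++] = c;
            in += 1;
        }
    }
    out[o] = '\0';
    return 0;
}
```

Run on the three examples from the paragraph above, it turns PERSINUL, UNAAMULIY, and KAAMBINEYSHIN back into their long spellings.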
Other special symbols are used internally by narrator. Sometimes they are inserted into or substituted for part of your input sentence. You can type them in directly if you wish. The most useful is probably the Q or glottal stop, an interruption of air flow in the glottis. The word Atlantic has one between the t and the l. Narrator knows there should be a glottal stop there and saves you the trouble of typing it. But narrator is only close to perfect, so sometimes a word or word pair might slip by that would have sounded better with a Q stuck in someplace.
Stress and Intonation
It is not enough to tell narrator what you want said. For the best results you must also tell narrator how you want it said. In this way you can alter a sentence’s meaning, stress important words, and specify the proper accents in polysyllabic words. These things improve the naturalness and thus the intelligibility of the spoken output.
Stress and intonation are specified by the single digits 1-9 following a vowel phoneme code. Stress and intonation are two different things, but are specified by a single number.
Stress is, among other things, the elongation of a syllable. A syllable is either stressed or not, so the presence of a number after the vowel in a syllable indicates stress on that syllable. The value of the number indicates the intonation. These numbers are referred to here as stress marks but keep in mind that they also affect intonation.
Intonation here means the pitch pattern or contour of an utterance. The higher the stress mark, the higher the potential for an accent in pitch. A sentence’s basic contour consists of a quickly rising pitch gesture up to the first stressed syllable in the sentence, followed by a slowly declining tone throughout the sentence, and finally, a quick fall to a low pitch on the last syllable. The presence of additional stressed syllables causes the pitch to break its slow, declining pattern with rises and falls around each stressed syllable. Narrator uses a very sophisticated procedure to generate natural pitch contours based on how you mark the stressed syllables.
How and Where to Put the Stress Marks
The stress marks go immediately to the right of vowel phoneme codes. The word cat has its stress marked after the AE, e.g., KAE5T. You generally have no choice about the location of a number; there is definitely a right and wrong location. A number should either go after a vowel or it should not. Narrator will not flag an error if you forget to put a stress mark in or if you place it on the wrong vowel. It will only tell you if a stress mark has been put after a non-vowel, i.e., consonant or punctuation.
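Narrator reports a stress mark that follows a non-vowel, but the same check is easy to run ahead of time. The sketch below is a portable pre-check with hypothetical names, not the device's own parser. It assumes the vowel codes listed in the phoneme tables and treats the shortcut codes (UL, IN, and so on) as non-vowels, since they stand for unstressed AX or IX plus a consonant.

```c
#include <stddef.h>
#include <string.h>

/* True if the two characters at p form a vowel code from the tables. */
static int is_vowel_code(const char *p)
{
    static const char *const vowels[] = {
        "IY", "IH", "EH", "AE", "AA", "AH", "AO", "UH", "ER",
        "OH", "AX", "IX", "EY", "AY", "OY", "AW", "OW", "UW"
    };
    size_t i;

    for (i = 0; i < sizeof vowels / sizeof vowels[0]; ++i) {
        if (p[0] == vowels[i][0] && p[1] == vowels[i][1])
            return 1;
    }
    return 0;
}

/* Return the index of the first stress digit (1-9) that does not
 * immediately follow a vowel code, or -1 if every digit is well placed.
 * Hypothetical helper; narrator performs its own check. */
int first_misplaced_stress(const char *phon)
{
    int prev_vowel = 0;
    size_t i = 0;

    while (phon[i] != '\0') {
        char c = phon[i];

        if (c >= '1' && c <= '9') {
            if (!prev_vowel)
                return (int)i;
            prev_vowel = 0;
            i += 1;
        } else if (strchr("AEIOU", c) != NULL && phon[i + 1] != '\0') {
            /* Every code starting with A, E, I, O, or U is two letters. */
            prev_vowel = is_vowel_code(phon + i);
            i += 2;
        } else {
            /* Consonant letter, '/', blank, or punctuation. */
            prev_vowel = 0;
            i += 1;
        }
    }
    return -1;
}
```

Note that this only catches marks in impossible positions. A mark on the wrong vowel still passes, just as it does in narrator.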
The rules for placing stress marks are as follows:
- Always place a stress mark in a content word. A content word is one that contains some meaning. Nouns, verbs, and adjectives are all content words; they tell the listener what you are talking about. Words like but, if, and the are not content words. They do not convey any real world meaning, but are required to make the sentence function, so they are given the name function words.
- Always place a stress mark on the accented syllable(s) of polysyllabic words, whether they are content or function words. A polysyllabic word is any word of more than one syllable. Commodore has its stress (often called accent) on the first syllable and would be spelled KAA5MAXDOHR, while computer is stressed on the second syllable: KUMPYUW5TER.
If you are in doubt about which syllable gets the stress, look up the word in a dictionary and you will find an accent mark over the stressed syllable. If more than one syllable in a word receives stress, they usually are not of equal value. These are referred to as primary and secondary stresses. The word understand has its first and last syllables stressed, with the syllable stand getting the primary stress and the syllable un getting the secondary stress. This produces the phonetic representation AH1NDERSTAE4ND. Syllables with secondary stress should be marked with a value of only 1 or 2.
Compound words (words with more than one root) such as baseball, software, and lunchwagon can be written as one word, but should be thought of as separate words when marking stress. Thus, lunchwagon would be spelled LAH5NCHWAE2GIN. Notice that the lunch got a higher stress mark than the wagon. This is common in compound words: the first word usually receives the primary stress.
Which Stress Value Do I Use?
If you get the spelling and stress mark positions correct, you are 95 percent of the way to a good sounding sentence. The next thing to do is decide on the stress mark values. They can be roughly related to parts of speech, and you can use the table shown below as a guide to assigning values.
Part of Speech | Stress Value |
---|---|
Exclamations | 9 |
Adverbs | 7 |
Quantifiers | 7 |
Nouns | 5 |
Adjectives | 5 |
Verbs | 4 |
Pronouns | 3 |
Secondary stress | 1 or 2 |
Everything else | None |
The above values merely suggest a range. If you want attention directed to a certain word, raise its value. If you want to downplay a word, lower it. Sometimes even a function word can be the focus of a sentence. It is quite conceivable that the word to in the sentence Please deliver this to Mr. Smith. could receive a stress mark of 9. This would add focus to the word, indicating that the item should be delivered to Mr. Smith in person.
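The table and the raise-or-lower advice can be captured in a few lines. This is only a sketch: the enum, the function names, and the decision to clamp adjusted values to the range 0 (no mark) through 9 are assumptions made for illustration.

```c
/* Parts of speech from the table above (hypothetical enum). */
enum part_of_speech {
    POS_EXCLAMATION, POS_ADVERB, POS_QUANTIFIER, POS_NOUN,
    POS_ADJECTIVE, POS_VERB, POS_PRONOUN, POS_OTHER
};

/* Suggested starting stress value; 0 means "no stress mark". */
int suggested_stress(enum part_of_speech pos)
{
    switch (pos) {
    case POS_EXCLAMATION: return 9;
    case POS_ADVERB:      return 7;
    case POS_QUANTIFIER:  return 7;
    case POS_NOUN:        return 5;
    case POS_ADJECTIVE:   return 5;
    case POS_VERB:        return 4;
    case POS_PRONOUN:     return 3;
    default:              return 0;
    }
}

/* Raise or lower a stress value to shift the listener's attention,
 * keeping the result within 0 (unmarked) to 9 (emphatic). */
int adjust_stress(int base, int delta)
{
    int value = base + delta;

    if (value < 0)
        value = 0;
    if (value > 9)
        value = 9;
    return value;
}
```

The Mr. Smith example corresponds to `adjust_stress(suggested_stress(POS_OTHER), 9)`, which lifts a normally unmarked function word to the emphatic value 9.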
Punctuation
In addition to the period or question mark that is required at the end of a sentence, Narrator also recognizes dashes, commas, and parentheses.
The comma goes where you would normally put a comma in an English sentence. It causes narrator to pause with a slightly rising pitch, indicating that there is more to come. The use of additional commas—that is, more than would be required for written English—is often helpful. They serve to set clauses off from one another. There is a tendency for a listener to lose track of the meaning of a sentence if the words run together. Read your sentence aloud while pretending to be a newscaster. The locations for additional commas should leap out at you.
The dash serves almost the same purpose as the comma, except that the dash does not cause the pitch to rise so severely. A rule of thumb is: Use dashes to divide phrases and commas to divide clauses.
Parentheses provide additional information to narrator’s intonation function. They should be put around noun phrases of two or more content words. This means that the noun phrase, a giant yacht should be surrounded with parentheses because it contains two content words, giant and yacht. The phrase my friend should not have parentheses around it because it contains only one content word. Noun phrases can get fairly large, like the best time I’ve ever had or a big basket of fruit and nuts. The parentheses are most effective around these large phrases; the smaller ones can sometimes go without. The effect of parentheses is subtle, and in some sentences you might not notice their presence. In sentences of great length, however, they help provide for a very natural contour.
Hints for Intelligibility
There are a few tricks you can use to improve the intelligibility of a sentence. Often, a polysyllabic word is more recognizable than a monosyllabic word. For instance, instead of saying huge, say enormous. The longer version contains information in every syllable, thus giving the listener a greater chance to hear it correctly.
Another good practice is to keep sentences to an optimal length. Writing for reading and writing for speaking are two different things. Try not to write a sentence that cannot be easily spoken in one breath. Such a sentence tends to give the impression that the speaker has an infinite lung capacity and sounds unnatural. Try to keep sentences confined to one main idea; run-on sentences tend to lose their meaning.
New terms should be highly stressed the first time they are heard. This gives the listener something to cue on, and can aid in comprehension.
The insertion of the glottal stop phoneme Q at the end of a word can sometimes help prevent slurring of one word into another. When we speak, we do not pause at the end of each word, but instead transition smoothly between words. This can sometimes reduce intelligibility by eliminating word boundary cues. Placing a Q (not the silent vowel QX) at the end of a word results in some phonological effects taking place which can restore the word boundary cues.
Example of English and Phonetic Texts
Cardiomyopathy. I had never heard of it before, but there it was listed as the form of heart disease that felled not one or two but all three of the artificial heart recipients. A little research produced some interesting results. According to an article in the Nov. 8, 1984, New England Journal of Medicine, cigarette smoking causes this lethal disease that weakens the heart’s pumping power. While the exact mechanism is not clear, Dr. Arthur J. Hartz speculated that nicotine or carbon monoxide in the smoke somehow poisons the heart and leads to heart failure.
KAA1RDIYOWMAYAA5PAXTHIY. AY /HAED NEH1VER /HER4D AXV IHT BIXFOH5R, BAHT DHEH5R IHT WAHZ - LIH4STIXD AEZ (DHAX FOH5RM AXV /HAA5RT DIHZIY5Z) DHAET FEH4LD (NAAT WAH5N OHR TUW5) - BAHT (AO7L THRIY5 AXV DHAX AA5RTAXFIHSHUL /HAA5RTQ RIXSIH5PIYINTS). (AH LIH5TUL RIXSER5CH) PROHDUW5ST (SAHM IH5NTRIHSTIHNX RIXZAH5LTS). AHKOH5RDIHNX TUW (AEN AA5RTIHKUL IHN DHAX NOWVEH5MBER EY2TH NAY5NTIYNEYTIYFOH1R NUW IY5NXGLIND JER5NUL AXV MEH5DIXSIN), (SIH5GEREHT SMOW5KIHNX) KAO4ZIHZ (DHIHS LIY5THUL DIHZIY5Z) DHAET WIY4KINZ (DHAX /HAA5RTS PAH4MPIHNX PAW2ER). WAYL (DHIY IHGZAE5KT MEH5KINIXZUM) IHZ NAAT KLIY5R, DAA5KTER AA5RTHER JEY2 /HAARTS SPEH5KYULEYTIHD DHAET NIH5KAXTIYN, OHR KAA5RBIN MUNAA5KSAYD IHN DHAX SMOW5K - SAH5M/HAW1 POY4ZINZ DHAX /HAA5RT, AEND LIY4DZ TUW (/HAA5RT FEY5LYER).
Concluding Remarks
This guide should get you off to a good start in phonetic writing for Narrator. The only way to get really proficient is to practice. Many people become good at it in as little as one day. Others make continual mistakes because they find it hard to let go of the rules of English spelling, so trust your ears.
A More Technical Explanation
The narrator speech synthesizer is a computer model of the human speech production process. It attempts to produce accurately spoken utterances of any English sentence, given only a phonetic representation as input. Another component of the Amiga speech system, the translator library, derives the required phonetic spelling from English text. Timing and pitch contours are produced automatically by the synthesizer software.
In humans, the physical act of producing speech sounds begins in the lungs. To create a voiced sound, the lungs force air through the vocal folds (commonly called the vocal cords), which are held under tension and which periodically interrupt the flow of air, thus creating a buzz-like sound. This buzz, which has a spectrum rich in harmonics, then passes through the vocal tract and out the lips and nose, which alters its spectrum drastically. This is because the vocal tract acts as a frequency filter, selectively reinforcing some harmonics and suppressing others. It is this filtering that gives a speech sound its identity. The amplitude versus frequency graph of the filtering action is called the vocal tract transfer function. Changing the shape of the throat, tongue, and mouth retunes the filter system to accentuate different frequencies.
The sound travels as a pressure wave through the air, and it causes the listener’s eardrum to vibrate. The ear and brain of the listener decode the incoming frequency pattern. From this the listener can subconsciously make a judgement about what physical actions were performed by the speaker to make the sound. Thus the speech chain is completed, the speaker having encoded his physical actions on a buzz via selective filtering and the listener having turned the sound into guesses about physical actions by frequency decoding.
Now that we know how humans produce speech, how does the Amiga do it? It turns out that the vocal tract transfer function is not random, but tends to accentuate energy in narrow bands called formants. The formant positions move fairly smoothly as we speak, and it is the formant frequencies to which our ears are sensitive. So, luckily, we do not have to model throat, tongue, teeth, and lips with our computer; we can imitate formant actions instead.
A good representation of speech requires up to five formants, but only the lowest three are required for intelligibility. The pre-V37 Narrator had only three formants, while the V37 Narrator has five formants for a more natural sounding voice. We begin with an oscillator that produces a waveform similar to that which is produced by the vocal folds, and we pass it through a series of resonators, each tuned to a different formant frequency. By controlling the volume and pitch of the oscillator and the frequencies of the resonators, we can produce highly intelligible and natural-sounding speech. Of course the better the model the better the speech; but more importantly, experience has shown that the better the control of the model’s parameters, the better the speech.
Oscillators, volume controls, and resonators can all be simulated mathematically in software, and it is by this method that the narrator system operates. The input phonetic string is converted into a series of target values for the various parameters. A system of rules then operates on the string to determine things such as the duration of each phoneme and the pitch contour. Transitions between target values are created and smoothed to produce natural, continuous changes from one sound to the next.
New values are computed for each parameter for every 8 milliseconds of speech, which produces about 125 acoustic changes per second. These values drive a mathematical model of the speech synthesizer. The accuracy of this simulation is quite good. Human speech has more formants than the narrator model, but they are high in frequency and low in energy content.
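The frame arithmetic and the idea of a smoothed transition between targets can be sketched as follows. The 8 millisecond frame comes from the text (1000 / 8 = 125 updates per second); everything else is an assumption for illustration. In particular, the straight-line interpolation and the function names are hypothetical, and narrator's real smoothing rules are not documented here.

```c
#include <stddef.h>

#define FRAME_MS 8UL   /* one parameter update per 8 ms, per the text */

/* Number of parameter frames needed to cover a duration, rounding up. */
size_t frames_for_duration(unsigned long ms)
{
    return (size_t)((ms + FRAME_MS - 1) / FRAME_MS);
}

/* Fill out[] with one value per frame, moving in a straight line from
 * one target (for example a formant frequency in Hz) to the next.
 * Linear interpolation is an assumption, not narrator's actual rule.
 * Returns the number of values written (at most outlen). */
size_t interpolate_target(long from, long to, size_t nframes,
                          long *out, size_t outlen)
{
    size_t n = (nframes < outlen) ? nframes : outlen;
    size_t i;

    for (i = 0; i < n; ++i) {
        if (nframes == 1)
            out[i] = to;
        else
            out[i] = from + (to - from) * (long)i / (long)(nframes - 1);
    }
    return n;
}
```

A 32 millisecond glide therefore needs `frames_for_duration(32)`, or four, intermediate values per parameter.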
The human speech production mechanism is a complex and wonderful thing. The more we learn about it, the better we can make our computer simulations. Meanwhile, we can use synthetic speech as yet another computer output device to enhance the man/machine dialogue.
Example Speech and Mouth Movement Program
    /*
     * Full_Narrator.c
     *
     * This example program sends a string of phonetic text to the narrator
     * device and, while it is speaking, highlights, word-by-word, a
     * corresponding English string.  In addition, mouth movements are drawn
     * in a separate window.
     *
     * Compile with SAS C 5.10  lc -b1 -cfistq -v -y -L
     *
     * Requires Kickstart V37 or greater.
     */

    #include "exec/types.h"
    #include "exec/memory.h"
    #include "dos/dos.h"
    #include "intuition/intuition.h"
    #include "ctype.h"
    #include "exec/exec.h"
    #include "fcntl.h"
    #include "devices/narrator.h"

    #include "clib/exec_protos.h"
    #include "clib/alib_protos.h"
    #include "clib/intuition_protos.h"
    #include "clib/graphics_protos.h"
    #include "clib/dos_protos.h"

    #include "stdlib.h"
    #include "string.h"
    #include "stdio.h"

    #ifdef LATTICE
    int CXBRK(void) { return(0); }      /* Disable SAS CTRL/C handling */
    int chkabort(void) { return(0); }   /* really */
    #endif

    /*
     * Due to an omission, the sync field defines were not included in older
     * versions of the narrator device include files.  So, if they haven't
     * already been defined, do so now.
     */
    #ifndef NDF_READMOUTH               /* Already defined ? */
    #define NDF_READMOUTH 0x01          /* No, define here   */
    #define NDF_READWORD  0x02
    #define NDF_READSYL   0x04
    #endif

    #define PEN3 3                      /* Drawing pens */
    #define PEN2 2
    #define PEN1 1
    #define PEN0 0

    BOOL FromCLI = TRUE;
    BYTE chans[4] = {3, 5, 10, 12};

    LONG EyesLeft;                      /* Left edge of left eye     */
    LONG EyesTop;                       /* Top of eyes box           */
    LONG EyesBottom;                    /* Bottom of eyes box        */
    LONG YMouthCenter;                  /* Pixels from top edge      */
    LONG XMouthCenter;                  /* Pixels from left edge     */
    LONG LipWidth, LipHeight;           /* Width and height of mouth */

    struct TextAttr MyFont = {"topaz.font", TOPAZ_SIXTY, FS_NORMAL, FPF_ROMFONT,};

    struct IntuitionBase *IntuitionBase = NULL;
    struct GfxBase       *GfxBase = NULL;

    struct MsgPort     *VoicePort = NULL;
    struct MsgPort     *MouthPort = NULL;

    struct narrator_rb *VoiceIO = NULL;
    struct mouth_rb    *MouthIO = NULL;

    struct IntuiText HighLight;
    struct NewWindow NewWindow;
    struct Window    *TextWindow;
    struct Window    *FaceWindow;
    struct RastPort  *FaceRast;

    void main(int argc, char **argv)
    {
        LONG  i;
        LONG  sentence;
        LONG  Offset;
        LONG  CharsLeft;
        LONG  ScreenPos;
        LONG  WordLength;
        LONG  LineNum;
        UBYTE *Tempptr;
        UBYTE *English;
        UBYTE *OldEnglish;
        UBYTE c;

        UBYTE *PhonPtr;             /* Pointer to phonetic text      */
        LONG  PhonSize;             /* Size of phonetic text         */
        UBYTE *PhonStart[100];      /* Start of phonetic sentences   */
        LONG  NumPhonStarts;        /* Number of phonetic sentences  */

        UBYTE *EngPtr;              /* Pointer to English text       */
        LONG  EngSize;              /* Size of English text          */
        UBYTE *EngStart[100];       /* Start of English sentences    */
        LONG  NumEngStarts;         /* Number of English sentences   */

        UBYTE *EngLine[24];         /* Start of line on screen       */
        LONG  EngBytes[24];         /* Bytes per line on screen      */
        LONG  NumEngLines;          /* Number of lines on screen     */

        extern void Cleanup(UBYTE *errmsg);
        extern void ClearWindow(struct Window *TextWindow);
        extern void DrawFace(void);
        extern void UpdateFace(void);

        /*
         * (0) Note whether the program was started from the CLI or from
         *     Workbench.
         */
        if (argc == 0)
            FromCLI = FALSE;

        /*
         * (1) Setup the phonetic text to be spoken.  If there are any
         *     non-alphabetic characters in the text (such as NEWLINES or
         *     TABS) replace them with spaces.  Then break up the text into
         *     sentences, storing the start of each sentence in PhonStart
         *     array elements.
         */
        PhonPtr = "KAA1RDIYOWMAYAA5PAXTHIY. AY /HAED NEH1VER /HER4D AXV IHT "
                  "BIXFOH5R, BAHT DHEH5R IHT WAHZ - LIH4STIXD AEZ (DHAX FOH5RM "
                  "AXV /HAA5RT DIHZIY5Z) DHAET FEH4LD (NAAT WAH5N OHR TUW5) - "
                  "BAHT (AO7L THRIY5 AXV DHAX AA5RTAXFIHSHUL /HAA5RTQ "
                  "RIXSIH5PIYINTS). (AH LIH5TUL RIXSER5CH) PROHDUW5ST (SAHM "
                  "IH5NTRIHSTIHNX RIXZAH5LTS). AHKOH5RDIHNX TUW (AEN AA5RTIHKUL "
                  "IHN DHAX NOWVEH5MBER EY2THQX NAY5NTIYNEYTIYFOH1R NUW IY5NXGLIND "
                  "JER5NUL AXV MEH5DIXSIN), (SIH5GEREHT SMOW5KIHNX) KAO4ZIHZ "
                  "(DHIHS LIY5THUL DIHZIY5Z) DHAET WIY4KINZ (DHAX /HAA5RTS "
                  "PAH4MPIHNX PAW2ER). WAYL (DHIY IHGZAE5KT MEH5KINIXZUM) IHZ "
                  "NAAT KLIY5R, DAA5KTER AA5RTHER JEY2 /HAARTS SPEH5KYULEYTIHD "
                  "DHAET NIH4KAXTIY2N- OHR KAA5RBIN MUNAA5KSAYD IHN DHAX SMOW5K- "
                  "SAH5M/HAW1 POY4ZINZ DHAX /HAA5RT, AEND LIY4DZ TUW (/HAA5RT "
                  "FEY5LYER).";

        PhonSize = strlen(PhonPtr);
        NumPhonStarts = 0;
        PhonStart[NumPhonStarts++] = PhonPtr;
        for (i = 0; i < PhonSize; ++i)
        {
            if (isspace((int)(c = *PhonPtr++)))
                *(PhonPtr-1) = ' ';
            if ((c == '.') || (c == '?'))
            {
                *PhonPtr = '\0';
                PhonStart[NumPhonStarts++] = ++PhonPtr;
            }
        }

        /*
         * (2) Create the English text corresponding to the phonetic text above.
         *     As before, ensure that there are no TABS or NEWLINES in the text.
         *     Break the text up into sentences and store the start of each
         *     sentence in EngStart array elements.
         */
        EngPtr = "Cardiomyopathy. I had never heard of it before, but there it was "
                 "listed as the form of heart disease that felled not one or two but "
                 "all three of the artificial heart recipients. A little research "
                 "produced some interesting results. According to an article in the "
                 "November 8, 1984, New England Journal of Medicine, cigarette smoking "
                 "causes this lethal disease that weakens the heart's pumping power. "
                 "While the exact mechanism is not clear, Doctor Arthur J Hartz "
                 "speculated that nicotine or carbon monoxide in the smoke somehow "
                 "poisons the heart and leads to heart failure.";

        EngSize = strlen(EngPtr);
        NumEngStarts = 0;
        EngStart[NumEngStarts++] = EngPtr;
        for (i = 0; i < EngSize; ++i)
        {
            if (isspace((int)(c = *EngPtr++)))
                *(EngPtr-1) = ' ';
            if ((c == '.') || (c == '?'))
            {
                *EngPtr = '\0';
                EngStart[NumEngStarts++] = ++EngPtr;
            }
        }

        /*
         * (3) Open Intuition and Graphics libraries.
         */
        if (!(IntuitionBase=(struct IntuitionBase *)OpenLibrary("intuition.library",0)))
            Cleanup("can't open intuition");

        if ((GfxBase=(struct GfxBase *)OpenLibrary("graphics.library", 0)) == NULL)
            Cleanup("can't open graphics");

        /*
         * (4) Setup the NewWindow structure for the text display and
         *     open the text window.
         */
        NewWindow.LeftEdge    = 20;
        NewWindow.TopEdge     = 100;
        NewWindow.Width       = 600;
        NewWindow.Height      = 80;
        NewWindow.DetailPen   = 0;
        NewWindow.BlockPen    = 1;
        NewWindow.Title       = " Narrator Demo ";
        NewWindow.Flags       = SMART_REFRESH | ACTIVATE | WINDOWDEPTH | WINDOWDRAG;
        NewWindow.IDCMPFlags  = NULL;
        NewWindow.Type        = WBENCHSCREEN;
        NewWindow.FirstGadget = NULL;
        NewWindow.CheckMark   = NULL;
        NewWindow.Screen      = NULL;
        NewWindow.BitMap      = NULL;
        NewWindow.MinWidth    = 600;
        NewWindow.MinHeight   = 80;
        NewWindow.MaxWidth    = 600;
        NewWindow.MaxHeight   = 80;

        if ((TextWindow = (struct Window *)OpenWindow(&NewWindow)) == NULL)
            Cleanup("Text window could not be opened");

        /*
         * (4) Setup the NewWindow structure for the face display, open the
         *     window, cache the RastPort pointer, and draw the initial face.
         */
        NewWindow.LeftEdge    = 20;
        NewWindow.TopEdge     = 12;
        NewWindow.Width       = 120;
        NewWindow.Height      = 80;
        NewWindow.DetailPen   = 0;
        NewWindow.BlockPen    = 1;
        NewWindow.Title       = " Face ";
        NewWindow.Flags       = SMART_REFRESH | WINDOWDEPTH | WINDOWDRAG;
        NewWindow.IDCMPFlags  = NULL;
        NewWindow.Type        = WBENCHSCREEN;
        NewWindow.FirstGadget = NULL;
        NewWindow.CheckMark   = NULL;
        NewWindow.Screen      = NULL;
        NewWindow.BitMap      = NULL;
        NewWindow.MinWidth    = 120;
        NewWindow.MinHeight   = 80;
        NewWindow.MaxWidth    = 120;
        NewWindow.MaxHeight   = 80;

        if ((FaceWindow = (struct Window *)OpenWindow(&NewWindow)) == NULL)
            Cleanup("Face window could not be opened");

        FaceRast = FaceWindow->RPort;

        DrawFace();

        /*
         * (5) Create read and write msg ports.
         */
        if ((MouthPort = CreatePort(NULL,0)) == NULL)
            Cleanup("Can't get read port");

        if ((VoicePort = CreatePort(NULL,0)) == NULL)
            Cleanup("Can't get write port");

        /*
         * (6) Create read and write I/O request blocks.
         */
        if (!(MouthIO = (struct mouth_rb *)
                        CreateExtIO(MouthPort,sizeof(struct mouth_rb))))
            Cleanup("Can't get read IORB");

        if (!(VoiceIO = (struct narrator_rb *)
                        CreateExtIO(VoicePort,sizeof(struct narrator_rb))))
            Cleanup("Can't get write IORB");

        /*
         * (7) Set up the write I/O request block and open the device.
         */
        VoiceIO->ch_masks           = &chans[0];
        VoiceIO->nm_masks           = sizeof(chans);
        VoiceIO->message.io_Command = CMD_WRITE;
        VoiceIO->flags              = NDF_NEWIORB;

        if (OpenDevice("narrator.device", 0, VoiceIO, 0) != NULL)
            Cleanup("OpenDevice failed");

        /*
         * (8) Set up the read I/O request block.
         */
        MouthIO->voice.message.io_Device = VoiceIO->message.io_Device;
        MouthIO->voice.message.io_Unit   = VoiceIO->message.io_Unit;
        MouthIO->voice.message.io_Message.mn_ReplyPort = MouthPort;
        MouthIO->voice.message.io_Command = CMD_READ;

        /*
         * (9) Initialize highlighting IntuiText structure.
         */
        HighLight.FrontPen  = 1;
        HighLight.BackPen   = 0;
        HighLight.DrawMode  = JAM1;
        HighLight.ITextFont = &MyFont;
        HighLight.NextText  = NULL;

        /*
         * (10) For each sentence, put up the English text in BLACK.  As
         *      Narrator says each word, highlight that word in BLUE.  Also
         *      continuously draw mouth shapes as Narrator speaks.
         */
        for (sentence = 0; sentence < NumPhonStarts; ++sentence)
        {
            /*
             * (11) Begin by breaking the English sentence up into lines of
             *      text in the window.  EngLine is an array containing a
             *      pointer to the start of each English text line.
             */
            English = EngStart[sentence] + strspn((UBYTE *)EngStart[sentence], " ");
            NumEngLines = 0;
            EngLine[NumEngLines++] = English;
            CharsLeft = strlen(English);
            while (CharsLeft > 51)
            {
                for (Offset = 51; *(English+Offset) != ' '; --Offset)
                    ;
                EngBytes[NumEngLines-1] = Offset;
                English += Offset + 1;
                *(English-1) = '\0';
                EngLine[NumEngLines++] = English;
                CharsLeft -= Offset + 1;
            }
            EngBytes[NumEngLines-1] = CharsLeft;

            /*
             * (12) Clear the window and draw in the unhighlighted English text.
             */
            ClearWindow(TextWindow);

            HighLight.FrontPen = 1;
            HighLight.LeftEdge = 10;
            HighLight.TopEdge  = 20;

            for (i = 0; i < NumEngLines; ++i)
            {
                HighLight.IText = EngLine[i];
                PrintIText(TextWindow->RPort, &HighLight, 0, 0);
                HighLight.TopEdge += 10;
            }

            HighLight.TopEdge  = 20;
            HighLight.FrontPen = 3;
            HighLight.IText    = EngLine[0];

            /*
             * (13) Set up the write request with the address and length of
             *      the phonetic text to be spoken.  Also tell device to
             *      generate mouth shape changes and word sync events.
             */
            VoiceIO->message.io_Data   = PhonStart[sentence];
            VoiceIO->message.io_Length = strlen(VoiceIO->message.io_Data);
            VoiceIO->flags             = NDF_NEWIORB | NDF_WORDSYNC;
            VoiceIO->mouths            = 1;

            /*
             * (14) Send the write request to the device.  This is an
             *      asynchronous write, the device will return immediately.
             */
            SendIO(VoiceIO);

            /*
             * (15) Initialize some variables.
             */
            ScreenPos  = 0;
            LineNum    = 0;
            English    = EngLine[LineNum];
            OldEnglish = English;
            MouthIO->voice.message.io_Error = 0;

            /*
             * (16) Issue synchronous read requests.  For each request we
             *      check the sync field to see if the read returned a mouth
             *      shape change, a start of word sync event, or both.  We
             *      continue issuing read requests until we get a return code
             *      of ND_NoWrite, which indicates that the write has finished.
             */
            for (DoIO(MouthIO);
                 MouthIO->voice.message.io_Error != ND_NoWrite;
                 DoIO(MouthIO))
            {
                /*
                 * (17) If bit 1 of the sync field is on, this is a start
                 *      of word sync event.  In that case we highlight the
                 *      next word.
                 */
                if (MouthIO->sync & NDF_READWORD)
                {
                    if ((Tempptr = strchr(English, ' ')) != NULL)
                    {
                        English = Tempptr + 1;
                        *(English-1) = '\0';
                    }
                    PrintIText(TextWindow->RPort, &HighLight, 0, 0);
                    WordLength = strlen(OldEnglish) + 1;
                    HighLight.IText = English;
                    OldEnglish = English;
                    ScreenPos += WordLength;
                    if (ScreenPos >= EngBytes[LineNum])
                    {
                        HighLight.LeftEdge = 10;
                        HighLight.TopEdge += 10;
                        ScreenPos = 0;
                        English = OldEnglish = EngLine[++LineNum];
                        HighLight.IText = English;
                    }
                    else
                        HighLight.LeftEdge += 10*WordLength;
                }

                /*
                 * (18) If bit 0 of the sync field is on, this is a mouth
                 *      shape change event.  In that case we update the face.
                 */
                if (MouthIO->sync & NDF_READMOUTH)
                    UpdateFace();
            }

            /*
             * (19) The write has finished (return code from last read equals
             *      ND_NoWrite).  We must wait on the write I/O request to
             *      remove it from the message port.
             */
            WaitIO(VoiceIO);
        }

        /*
         * (20) Program completed, cleanup and return.
         */
        Cleanup("Normal completion");
    }

    void Cleanup(UBYTE *errmsg)
    {
        /*
         * (1) Cleanup and go away.  This routine does not return but EXITs.
         *     Everything it does is pretty self-explanatory.
         */
        if (FromCLI)
            printf("%s\n\r", errmsg);

        if (TextWindow)
            CloseWindow(TextWindow);
        if (FaceWindow)
            CloseWindow(FaceWindow);
        if (VoiceIO && VoiceIO->message.io_Device)
            CloseDevice(VoiceIO);
        if (VoiceIO)
            DeleteExtIO(VoiceIO);
        if (VoicePort)
            DeletePort(VoicePort);
        if (MouthIO)
            DeleteExtIO(MouthIO);
        if (MouthPort)
            DeletePort(MouthPort);
        if (GfxBase)
            CloseLibrary(GfxBase);
        if (IntuitionBase)
            CloseLibrary(IntuitionBase);

        exit(RETURN_OK);
    }

    void ClearWindow(struct Window *TextWindow)
    {
        LONG OldPen;

        /*
         * (1) Clears a window.
         */
        OldPen = (LONG)TextWindow->RPort->FgPen;
        SetAPen(TextWindow->RPort, 0);
        SetDrMd(TextWindow->RPort, JAM1);
        RectFill(TextWindow->RPort, 3, 12, TextWindow->Width-3, TextWindow->Height-2);
        SetAPen(TextWindow->RPort, OldPen);
    }

    void DrawFace()
    {
        /*
         * (1) Draws the initial face.  The variables defined here are used in
         *     UpdateFace() to redraw the mouth shape.
         */
        EyesLeft   = 15;
        EyesTop    = 20;
        EyesBottom = 35;

        XMouthCenter = FaceWindow->Width >> 1;
        YMouthCenter = FaceWindow->Height - 25;

        SetAPen(FaceWindow->RPort, PEN1);
        RectFill(FaceWindow->RPort, 3, 10, FaceWindow->Width-3, FaceWindow->Height-2);

        SetAPen(FaceWindow->RPort, PEN0);
        RectFill(FaceWindow->RPort, EyesLeft, EyesTop, EyesLeft+25, EyesTop+15);
        RectFill(FaceWindow->RPort, EyesLeft+65, EyesTop, EyesLeft+90, EyesTop+15);

        SetAPen(FaceWindow->RPort, PEN3);
        Move(FaceWindow->RPort, XMouthCenter-(FaceWindow->Width >> 3), YMouthCenter);
        Draw(FaceWindow->RPort, XMouthCenter+(FaceWindow->Width >> 3), YMouthCenter);
    }

    void UpdateFace()
    {
        /*
         * (1) Redraws mouth shape in response to a mouth shape change message
         *     from the device.  It's all pretty self-explanatory.
         */
        WaitBOVP(&FaceWindow->WScreen->ViewPort);
        SetAPen(FaceRast, PEN1);
        RectFill(FaceRast, 3, EyesBottom, FaceWindow->Width-3, FaceWindow->Height-2);

        LipWidth  = MouthIO->width*3;
        LipHeight = MouthIO->height*2/3;

        SetAPen(FaceRast, PEN3);
        Move(FaceRast, XMouthCenter - LipWidth, YMouthCenter);
        Draw(FaceRast, XMouthCenter, YMouthCenter - LipHeight);
        Draw(FaceRast, XMouthCenter + LipWidth, YMouthCenter);
        Draw(FaceRast, XMouthCenter, YMouthCenter + LipHeight);
        Draw(FaceRast, XMouthCenter - LipWidth, YMouthCenter);
    }
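Steps (17) and (18) of the listing test individual bits of the sync field with a bitwise AND. The following portable sketch isolates that decoding so it can be tried without an Amiga. The flag values mirror the fallback defines in the listing; the `sync_events` structure and `decode_sync` function are hypothetical names introduced only for this illustration.

```c
/* Values mirror the fallback defines in the example listing. */
#define NDF_READMOUTH 0x01   /* bit 0: mouth shape changed   */
#define NDF_READWORD  0x02   /* bit 1: start of a new word   */
#define NDF_READSYL   0x04   /* bit 2: start of a syllable   */

/* Hypothetical helper type: one flag per event kind. */
struct sync_events {
    int mouth;
    int word;
    int syllable;
};

/* Split a sync byte into its individual events.  A single read may
 * report more than one event, so each bit is tested independently. */
struct sync_events decode_sync(unsigned char sync)
{
    struct sync_events ev;

    ev.mouth    = (sync & NDF_READMOUTH) != 0;
    ev.word     = (sync & NDF_READWORD)  != 0;
    ev.syllable = (sync & NDF_READSYL)   != 0;
    return ev;
}
```

A sync value of 0x03, for instance, reports both a mouth shape change and a word boundary, which is why the listing uses two separate `if` tests rather than an `else`.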
Additional Information on the Narrator Device
Additional programming information on the narrator device can be found in the include files and the Autodocs for the narrator device and the Autodocs for the translator library. All are contained in the SDK.
Includes |
---|
devices/narrator.h |
AutoDocs |
---|
narrator.doc |
translator.doc |