Date: Mon, 13 Jan 1997 03:57:38 -1000

From: Norman Roberts nroberts[AT SYMBOL GOES HERE]HAWAII.EDU

Subject: Re: IPA to Internet?

> Does anybody know where I can find an IPA to Internet legend?

Here is one I downloaded some time ago. It's rather long so I hope your

mail server accepts it.

sci.lang #38113 (11 more) [1]

From: Georgy Pruss georgy[AT SYMBOL GOES HERE]zs.kiev.ua

[1] Repost: FAQ: Representing IPA Phonetics in ASCII

Date: Wed May 10 10:22:59 HST 1995

Organization: Zest Systems

Lines: 524

Distribution: world

NNTP-Posting-Host: render.gu.kiev.ua

X-Return-Path: zs!zs.kiev.ua!georgy[AT SYMBOL GOES HERE]figaro.gu.kiev.ua

Some people asked me to re-send it. Here you are.

Newsgroups: sci.lang,alt.usage.english

From: evan[AT SYMBOL GOES HERE]hplerk.hpl.hp.com (Evan Kirshenbaum)

Subject: FAQ: Representing IPA Phonetics in ASCII

Sender: news[AT SYMBOL GOES HERE]hplabsz.hpl.hp.com (News Subsystem (Rigel))

Message-ID: D25JCv.DzA[AT SYMBOL GOES HERE]hplabsz.hpl.hp.com

Date: Mon, 9 Jan 1995 18:58:07 GMT

Reply-To: kirshenbaum[AT SYMBOL GOES HERE]hpl.hp.com

Nntp-Posting-Host: hplerk.hpl.hp.com

Organization: Hewlett-Packard Laboratories

Lines: 502

Xref: lyra.csx.cam.ac.uk sci.lang:16066 alt.usage.english:38835

[Last Modified, 4 Jan 1993]

This article describes a standard scheme for representing IPA

transcriptions in ASCII for use in Usenet articles and email. The

following guidelines were kept in mind:

o It should be usable for both phonemic and narrow phonetic

transcription.

o It should be possible to represent *all* symbols and

diacritics in the IPA.

o The previous guideline notwithstanding, it is expected that

(as in the past) most use will be in transcribing English,

so where tradeoffs are necessary, decisions should be made

in favor of ease of representation of phonemes which are

common in English.

o The representation should be readable.

o It should be possible to mechanically translate from the

representation to a character set which includes IPA. The

reverse would also be nice.

In order to be able to represent a wide range of segments while making

common segments easy to type, we allow more than one representation

for a given segment. Each segment has an "explicit" representation,

which is a set of features between curly braces ("{" and "}"). Each

feature is represented as a three letter abbreviation taken from a

standardized set. The phoneme /b/ (a voiced, bilabial stop) could be

represented as /{vcd,blb,stp}/. A first cut at the feature set

appears in appendix A below.

The word "tag" could thus be represented phonemically as

/{vls,alv,stp}{low,fnt,unr,vwl}{vcd,vel,stp}/

and phonetically as

[{vls,asp,alv,stp}{low,fnt,lng,unr,vwl}{unx,vcd,vel,stp}]
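As an illustration only, the explicit form can be read mechanically.
The sketch below is not part of the scheme; the function name
parse_explicit is hypothetical, and it assumes features are exactly
three lowercase letters separated by commas, as described above.

```python
import re

FEATURE = re.compile(r"[a-z]{3}")  # "three letter abbreviation"


def parse_explicit(transcription):
    """Return one frozenset of feature abbreviations per {...} segment."""
    segments = []
    # Each segment is a brace-delimited, comma-separated feature list.
    for group in re.findall(r"\{([^{}]*)\}", transcription):
        features = [name.strip() for name in group.split(",")]
        for name in features:
            if not FEATURE.fullmatch(name):
                raise ValueError(f"not a three-letter feature: {name!r}")
        segments.append(frozenset(features))
    return segments


phonemic = parse_explicit("/{vls,alv,stp}{low,fnt,unr,vwl}{vcd,vel,stp}/")
narrow = parse_explicit(
    "[{vls,asp,alv,stp}{low,fnt,lng,unr,vwl}{unx,vcd,vel,stp}]"
)
# The narrow form adds features to each segment of the phonemic form.
extra = [sorted(n - p) for n, p in zip(narrow, phonemic)]
```

Using sets reflects that the order of features inside the braces
carries no meaning in the examples above.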

This works, but it's a bit of a pain. To simplify transcription, we

allow an "implicit" representation for a segment which consists of a

(generally alphabetic) symbol followed by diacritics. Thus /b/ stands

for /{vcd,blb,stp}/. Case is significant (/n/ and /N/ are different

segments). The segment symbols are given in appendix B below.

The word "tag" can thus be represented phonemically as

/t&g/
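A minimal sketch of how an implicit transcription expands to the
explicit one. The table holds only the four symbols whose features
are spelled out in this article (the full list is in appendix B);
the names SEGMENTS and to_explicit are hypothetical.

```python
# Symbol -> features, copied from the examples in this article.
# Lookup is case-sensitive, so /n/ and /N/ would be separate entries.
SEGMENTS = {
    "b": ("vcd", "blb", "stp"),
    "t": ("vls", "alv", "stp"),
    "&": ("low", "fnt", "unr", "vwl"),
    "g": ("vcd", "vel", "stp"),
}


def to_explicit(implicit):
    """Expand a diacritic-free implicit transcription, e.g. "t&g"."""
    parts = []
    for symbol in implicit:
        if symbol not in SEGMENTS:
            raise KeyError(f"symbol not in this partial table: {symbol!r}")
        parts.append("{" + ",".join(SEGMENTS[symbol]) + "}")
    return "".join(parts)


expanded = to_explicit("t&g")
```

The result for "t&g" is the same string given earlier for the
phonemic explicit form of "tag".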

The diacritics for a segment are represented between angle brackets

("<" and ">") and consist of symbols or features. (In the common case

where the diacritic symbol is a single character which does not encode

a segment, the brackets may be removed.) The features which the

diacritics map to override those of the segment.

The word "tag" thus becomes narrowly

[t<asp>&<lng>g<unx>]

or

[t<h>&<:>g<o>]

or

[t<h>&:g<o>]

Some diacritic symbols encode more than one feature set. Which one is

meant should be apparent from context. For example, "." stands for

"{rnd}" when attached to a vowel, but "{rfx}" when attached to a

consonant.
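The following sketch shows one way to attach diacritics to segments
and to resolve a context-dependent symbol such as ".". It assumes
diacritics are written between "<" and ">" or as a bare character.
The symbol table is inferred from the spellings of "tag" and the
click example in this article (appendix C has the real list); the
bare-diacritic and vowel sets are assumptions made for the example,
and all names are hypothetical.

```python
# Diacritic symbol -> feature, inferred from this article's examples.
DIACRITIC_SYMBOLS = {"h": "asp", ":": "lng", "o": "unx", "!": "clk"}
# Assumed: symbols that may appear without brackets.
BARE_DIACRITICS = {":", "!"}
# Assumed: just enough vowels for the examples.
VOWELS = {"&"}


def parse_segments(text):
    """Split an implicit transcription into (symbol, [diacritics]) pairs."""
    segments = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "<":
            end = text.index(">", i)  # ValueError if the bracket is unclosed
            diacritic = text[i + 1:end]
            i = end + 1
        elif ch in BARE_DIACRITICS:
            diacritic = ch
            i += 1
        else:
            segments.append((ch, []))
            i += 1
            continue
        if not segments:
            raise ValueError("diacritic with no segment to attach to")
        segments[-1][1].append(diacritic)
    return segments


def diacritic_feature(diacritic, symbol):
    """Map a diacritic to a feature, using the segment as context."""
    if diacritic == ".":
        return "rnd" if symbol in VOWELS else "rfx"
    # Three-letter features such as "asp" pass through unchanged.
    return DIACRITIC_SYMBOLS.get(diacritic, diacritic)


def features_of(text):
    return [[diacritic_feature(d, s) for d in ds]
            for s, ds in parse_segments(text)]
```

This only collects the diacritic features; deciding which segment
features they override would need the feature classes from
appendix A.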

Clicks are common to many languages (especially in Africa), but there

is no IPA diacritic that means "click". Rather than use up several

characters for clicks (which are infrequent in the languages most

often discussed), we instead use the diacritic "!" after the

homorganic unvoiced stop. Thus /t!/ (= /t<clk>/ = /{alv,clk}/) is the

sound commonly written "tsk" and used in English to show disapproval.

The complete set of diacritic symbols appears in appendix C below.

Appendices D and E contain representations of segments more or less

ordered by feature (appendix D in tabular form, appendix E as a list).

Appendix F contains a list of all of the ASCII characters and the uses

they have been pressed to.

For transcription of any specific language a group can by convention

alter the character mappings (as an example, for Spanish /R/ may be

better used to represent /{alv,trl}/ than /{mid,cnt,rzd,vwl}/). An

author may also press a little-used symbol (for the language under

consideration) into service to highlight a distinction. Such an

alteration should be made explicitly to avoid confusion.

The diacritics "+" and "=" and the segment symbols "$" and "%" are

explicitly left unspecified so that they can be used to mark

language-specific features (that are otherwise cumbersome to mark).

Such symbols can be assigned either by convention for a specific

language or in an ad-hoc manner by an individual author.

Stress marks are prepended to the syllable they attach to. "'"

signals primary stress, "," signals secondary stress. Spaces should

be employed to separate words (cliticized words may be written

unseparated). When discussing single words, it may be helpful to

insert a space before each syllable that doesn't carry a

suprasegmental marker.

The sentence "I hear the secretary" for an American might be something like

/aI hir D@ 'sEkrI,t&ri/

while to an Englishman it might be more like

/aI hi@ DI 'sEkr^tri/
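As a rough sketch, the stress convention can be read back with a
single regular expression. It assumes an implicit transcription of
one word (no braces, since "," also separates features there), and
the spaced syllabification in the second call is an assumption made
for illustration. The names are hypothetical.

```python
import re

STRESS = {"'": "primary", ",": "secondary", "": "unmarked"}


def split_syllables(word):
    """Split one transcribed word into (stress, chunk) pairs.

    A chunk starts at a stress mark or after a space, so syllables are
    only recovered individually when the writer has separated them.
    """
    pairs = re.findall(r"([',]?)([^\s',]+)", word)
    return [(STRESS[mark], chunk) for mark, chunk in pairs]


unspaced = split_syllables("'sEkrI,t&ri")
spaced = split_syllables("'sEk rI ,t& ri")
```

Without the extra spaces only the stressed boundaries can be found,
which is why the spacing advice above helps for single words.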

Transcribing tone is harder. Here's an attempt. For register tone

languages (e.g., Hausa, Navajo), numbers should be used with one being

the lowest. Thus in Navajo, "1" is low tone and "2" is high. In

Yoruba "1" is low, "2" is mid, and "3" is high. The language's

"default" tone need not be specified. For contour tone languages

(e.g., Mandarin, Thai), there is generally a numeric system in place

(Mandarin: "1" is high, "2" is rising, "3" is falling-rising, "4" is

falling). The tone indication should follow the syllable (vowel?).
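The tone numbering above can be kept in a small lookup table. This
is only a sketch: it assumes the tone is a single digit written
directly after the syllable, the names are hypothetical, and "ma" is
a placeholder syllable rather than a checked transcription.

```python
# Tone numbers as given in the paragraph above.
TONES = {
    "Navajo": {1: "low", 2: "high"},
    "Yoruba": {1: "low", 2: "mid", 3: "high"},
    "Mandarin": {1: "high", 2: "rising", 3: "falling-rising", 4: "falling"},
}


def describe_tone(language, syllable):
    """Return (syllable without tone digit, tone name or None)."""
    if syllable and syllable[-1] in "0123456789":
        number = int(syllable[-1])
        # .get() yields None for a number the language does not use.
        return syllable[:-1], TONES[language].get(number)
    # No digit: the language's default tone was left unspecified.
    return syllable, None


marked = describe_tone("Mandarin", "ma3")
unmarked = describe_tone("Yoruba", "ma")
```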

The symbol "#" is used to represent a syllable or word boundary.

Appendix A. Feature Abbreviations

----