Kenneth Noble Stevens
|Died||August 19, 2013 (aged 89)|
|Alma mater||MIT, University of Toronto|
|Awards||National Medal of Science (1999)|
|Fields||Electrical engineering, Acoustic phonetics|
|Doctoral advisor||Leo Beranek|
|Other academic advisors||J. C. R. Licklider, Walter A. Rosenblith|
|Doctoral students||James L. Flanagan, Lawrence R. Rabiner|
Kenneth Noble Stevens (March 24, 1924 – August 19, 2013) was the Clarence J. LeBel Professor of Electrical Engineering and Computer Science, and Professor of Health Sciences and Technology, at MIT. Stevens was head of the Speech Communication Group in MIT's Research Laboratory of Electronics (RLE), and was one of the world's leading scientists in acoustic phonetics.
He received the National Medal of Science from President Bill Clinton in 1999, and the IEEE James L. Flanagan Speech and Audio Processing Award in 2004.
He died in 2013 from complications of Alzheimer's disease.
Ken Stevens was born in Toronto on March 23, 1924. His older brother, Pete, was born in England; Ken was born four years later, shortly after the family emigrated to Canada. His childhood ambition was to become a doctor, because he admired an uncle who was a doctor. He attended high school at a school attached to the Department of Education at the University of Toronto.
Stevens attended college in the School of Engineering at the University of Toronto on a full scholarship. He lived at home throughout his undergraduate years. Though Stevens himself could not fight in World War II because of his visual impairment, his brother was away for the entire war; his parents tuned in nightly to the BBC for updates. Stevens majored in engineering physics at the university, covering topics from the design of motorized machines through to basic physics, which was taught by the physics department. During summers he worked in the defense industry, including one summer at a company that was developing radar. He received both his S.B. and S.M. degrees in 1945.
Stevens had been a teacher since his undergraduate years, when he lectured sections of home economics that involved some aspect of physics. After receiving his master's degree, he stayed at the University of Toronto as an instructor, teaching courses to young men returning from the war, including his own older brother. He was a fellow of the Ontario Foundation from 1945 to 1946, then worked as an instructor at the University of Toronto until 1948.
During his master's research Stevens became interested in control theory, and took courses from the applied mathematics department, where one of his professors recommended that he should apply to MIT for doctoral studies.
Shortly after Stevens was admitted to MIT, a new professor named Leo Beranek noticed that Stevens had taken acoustics. Beranek contacted Stevens in Toronto, to ask if he would be a teaching assistant for Beranek's new acoustics course, and Stevens agreed. Shortly after that, Beranek contacted Stevens again to offer him a research position on a new speech project, which Stevens also accepted. The Radiation Laboratory at MIT (building 20) was converted, after the war, into the Research Laboratory of Electronics (RLE); among other labs, RLE hosted Beranek's new Acoustics Lab.
In November 1949, the office next to Ken's was given to a visiting doctoral student from Sweden named Gunnar Fant, with whom he formed a friendship and collaboration that would last more than half a century. Stevens focused on the study of vowels during his doctoral research; in 1950 he published a short paper arguing that the autocorrelation could be used to discriminate vowels, while his 1952 doctoral thesis reported perceptual results for vowels synthesized using a set of electronic resonators. Fant convinced Stevens that a transmission-line model of the vocal tract was more flexible than a resonator model and the two published this work together in 1953.
Stevens credited Fant with the association between the Linguistics Department and the Research Laboratory of Electronics at MIT. Roman Jakobson, a phonologist at Harvard, had an office at MIT by 1957, while Morris Halle joined the MIT Linguistics Department and moved to RLE in 1951. Stevens' collaborations with Halle began with acoustics, but grew to focus on the way in which acoustics and articulation organize the sound systems of language.
Stevens defended his doctoral thesis in 1952; his doctoral committee included his adviser Leo Beranek, as well as J. C. R. Licklider and Walter A. Rosenblith. After receiving his doctorate, Stevens went to work at Bolt, Beranek and Newman (now BBN Technologies) in Harvard Square. In the early 1950s, Beranek decided to retire from the MIT faculty in order to work full-time at BBN. He knew that Stevens loved to teach, so he encouraged Stevens to apply for a position on the MIT faculty. Stevens did so, and joined the faculty in 1954.
Stevens is best known for his contributions to the fields of phonology, speech perception, and speech production. Stevens' best-known book, Acoustic Phonetics, is organized according to the distinctive features of Stevens' phonological system.
Stevens is perhaps best known for his proposal of a theory that answers the question: Why are the sounds of the world's languages (their phonemes or segments) so similar to one another? On first learning a foreign language, one is struck by the remarkable differences that can exist between one language's sound system and that of any other. Stevens turned the student's perception on its head: rather than asking why languages are different, he asked, if the sound system of each language is completely arbitrary, why are languages so similar? His answer is the quantal theory of speech. Quantal theory is supported by a theory of language change, developed in collaboration with Samuel Jay Keyser, which postulates the existence of redundant or enhancement features.
Stevens' methodology for investigating speech sounds is organized into three steps. The first step is to use physics (mainly tube models) to model the shape of the articulators (e.g. the shapes of the front and back cavities, or whether the lips are rounded). From the articulatory tube models, the resonant frequencies, which are the formant frequencies, can be calculated. The second step is mainly experimental: speech data are collected and analyzed for comparison with the theoretical calculations. Tokens of interest are usually recorded in isolation and/or embedded in a controlled carrier phrase, usually spoken by several female and male native speakers of the language. The key to data collection is to control for as many factors as possible, so that the acoustic evidence of interest can be investigated with a minimum of artifacts. The last step is to compare the data with the theoretical predictions and to account for the differences that occur. Differences can sometimes be explained by the fact that tube models are usually simplified and do not account for losses due to the softness of the vocal tract walls (though resistors can be added to the theoretical model). The subglottal system may also affect the vocal tract system when the glottal opening is large (see research on the effects of subglottal resonances on speech). Theoretical models give general predictions about what one can expect to find in real speech, and evidence from real speech can in turn help refine the original model and give better insight into the production of speech sounds.
Quantal theory aims to describe (using physics) and organize the acoustic features of all possible sounds into a matrix (see chapter five of Acoustic Phonetics). The ultimate constraint on all speech sounds is the physical articulatory system itself, which supports the claim that there can be only a finite set of sounds among languages. The set of speech sounds is finite because, while the movement of the articulators is continuous, only certain configurations tend to be articulatorily and/or acoustically stable, giving rise to fixed formant frequencies that form sounds (vowels and consonants) that are relatively universal across languages. Each sound can thus be described by a handful of defining features (usually binary). For example, lip rounding (either on or off) is a feature; tongue height (either high or low) is another. In addition to these defining features, which serve as the essential description of the sounds, there are also enhancing features, which help to make the sounds more recognizable. For each feature, one can apply Stevens' methodology: first use a tube model of the articulators to predict the resonant frequencies, then collect data to examine the acoustic properties of that feature, and finally reconcile the data with the theoretical model and summarize the acoustic properties of that feature.
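The idea of describing each sound by a small set of binary features can be illustrated with a toy data structure. The sketch below is a deliberate simplification, not Stevens' actual feature system: it uses only the two features named above (lip rounding and tongue height), three vowels chosen for illustration, and hypothetical names (`FEATURES`, `distinguishing_features`).

```python
# Toy feature matrix: rows are segments, columns are binary features.
# Assumption: only two features and three vowels, chosen for illustration.
FEATURES = {
    "i": {"round": False, "high": True},   # high, unrounded
    "u": {"round": True,  "high": True},   # high, rounded
    "a": {"round": False, "high": False},  # low, unrounded
}


def distinguishing_features(seg_a, seg_b, matrix=FEATURES):
    """Return the sorted names of features whose values differ between two segments."""
    return sorted(
        name for name in matrix[seg_a]
        if matrix[seg_a][name] != matrix[seg_b][name]
    )


if __name__ == "__main__":
    print(distinguishing_features("i", "u"))  # ['round']
    print(distinguishing_features("i", "a"))  # ['high']
    print(distinguishing_features("u", "a"))  # ['high', 'round']
```

A full feature system would have more features and many more segments, but the lookup logic would stay the same.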
For an introduction to speech science, one can first read The Speech Chain by P. Denes and E. Pinson, which gives a broad overview of the production and transmission of speech. It introduces spectrograms and formant frequencies, which are the main acoustic descriptions of sound segments.
As the vocal folds vibrate, puffs of air are pushed through the glottis and filtered by the vocal tract, producing sound. The sound source is modeled as a current source in a circuit that models the production of sound. Changes in the vocal tract change the sound that is produced. The vocal folds of females tend to vibrate at a higher frequency than those of males, giving female voices a higher pitch than male voices.
Research (Hanson, 1997) has shown that females and males differ in how their vocal folds vibrate: the female glottis tends to be more spread, which gives female voices a breathier quality than male voices.
The subglottal system is the system below the glottis in the human body. It includes the trachea, the bronchi, and the lungs. It is essentially fixed for a given speaker, so its acoustic properties do not change during speech. Research has shown that during the open phase of the glottal cycle (when the glottis is open), the subglottal system introduces coupling, which manifests acoustically as pole/zero pairs in the frequency domain. These pole/zero pairs are hypothesized to act as prohibited or unstable regions in the spectrum, serving as natural boundaries for vowel features such as +front or +back.
For adult males, the resonant frequencies of the subglottal system have been measured (using invasive methods) to be 600, 1550, and 2200 Hz (Acoustic Phonetics, p. 197; Ishizaka et al.; Cranen & Boves). The subglottal resonant frequencies of females are slightly higher because of their smaller dimensions. One non-invasive way of measuring these peaks is to use an accelerometer placed above the sternal notch (Henke) to record the acceleration of the skin during phonation. The recorded vibration captures the resonant frequencies of the subglottal system.
The vocal tract is the passageway above the glottis, extending to the opening of the lips. A two-tube model is usually used to model the vocal tract, with one tube capturing the dimensions (cross-sectional area and length) of the back cavity and the other modeling the front cavity. The resonant frequencies calculated from the tube model are the formant frequencies. To produce the schwa vowel /ə/, the vocal tract is relatively open all the way from the glottis to the mouth, so the tube model can be thought of as a relatively uniform open tube, which makes the resonant frequencies (formants) evenly spaced. Radiation at the mouth lowers these resonant frequencies by about five percent (Acoustic Phonetics, p. 139). Female vocal tracts (average 14.1 cm) are on average shorter than male vocal tracts (average 17.7 cm), and therefore have higher formant frequencies.
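The uniform-tube case can be worked through numerically. The sketch below assumes the standard quarter-wavelength formula for a lossless tube closed at the glottis and open at the lips, F_n = (2n - 1)c / 4L, and a speed of sound of 35,400 cm/s. That value and the function name are assumptions introduced here, not figures from the text above. The roughly five percent lowering from radiation at the mouth is applied as an optional scale factor.

```python
SPEED_OF_SOUND_CM_PER_S = 35400.0  # assumed value for warm, moist air


def uniform_tube_formants(length_cm, n_formants=3, radiation_lowering=0.0):
    """Resonances (Hz) of an idealized uniform tube, closed at one end and open at the other.

    radiation_lowering is the fractional reduction applied to every
    resonance (e.g. 0.05 for the roughly five percent effect of radiation).
    """
    formants = []
    for n in range(1, n_formants + 1):
        f = (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
        formants.append(f * (1.0 - radiation_lowering))
    return formants


if __name__ == "__main__":
    for label, length in (("male, 17.7 cm", 17.7), ("female, 14.1 cm", 14.1)):
        ideal = uniform_tube_formants(length)
        lowered = uniform_tube_formants(length, radiation_lowering=0.05)
        print(label, [round(f) for f in ideal], [round(f) for f in lowered])
```

Under these assumptions the 17.7 cm tube gives evenly spaced resonances near 500, 1500, and 2500 Hz, and the shorter 14.1 cm tube gives proportionally higher values, which is the male/female difference described above.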
Since the vocal tract walls are soft, energy is lost in the vocal tract, which increases the bandwidth of the formants.
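The link between losses and bandwidth can be made concrete with a single damped resonance. The sketch below assumes the textbook second-order resonator, whose impulse response is a sinusoid at the formant frequency F with an envelope exp(-πBt), where B is the bandwidth in Hz. The function names and the example bandwidths (50 and 100 Hz) are illustrative assumptions, not measurements.

```python
import math


def envelope_time_constant_s(bandwidth_hz):
    """Time (s) for the envelope exp(-pi * B * t) to fall to 1/e of its start value."""
    return 1.0 / (math.pi * bandwidth_hz)


def damped_resonance(formant_hz, bandwidth_hz, sample_rate_hz, n_samples):
    """Sampled impulse response of one idealized formant resonator."""
    out = []
    for k in range(n_samples):
        t = k / sample_rate_hz
        envelope = math.exp(-math.pi * bandwidth_hz * t)
        out.append(envelope * math.sin(2.0 * math.pi * formant_hz * t))
    return out


if __name__ == "__main__":
    for bw in (50.0, 100.0):  # illustrative bandwidths only
        print(bw, "Hz bandwidth ->", envelope_time_constant_s(bw) * 1000.0, "ms")
```

Doubling the bandwidth halves the decay time: a lossier tract rings for a shorter time, which in the frequency domain appears as a broader formant peak.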
When the velopharyngeal port opens during the production of certain sounds, such as /n/ and /m/, coupling to the nasal cavity is introduced, which gives the output a nasal quality.
The quantal theory suggests that the phonological inventory of a language is defined primarily by the acoustic characteristics of each segment, with boundaries specified by the acoustic-articulatory mapping. The implication is that phonological segments must have some type of acoustic invariance. Blumstein and Stevens demonstrated what appeared to be an invariant relationship between the acoustic spectrum and the perceived sound: by adding energy to the burst spectrum of "pa" at a particular frequency, it is possible to turn it into "ta" or "ka" respectively, depending on the frequency. Presence of the extra energy causes perception of the lingual consonant; its absence causes perception of the labial.
Stevens' later work restructured the theory of acoustic invariance into a shallow hierarchical perceptual model, the model of acoustic landmarks and distinctive features.
While on sabbatical at KTH in Sweden in 1962, Stevens volunteered as a participant in cineradiography experiments being conducted by Sven Öhman. Stevens' cineradiographic films are among the most widely distributed; copies exist on laserdisc, and some are available online.
After returning to MIT, Stevens agreed to supervise the research of a dentistry student named Joseph S. Perkell. Perkell's knowledge of oral anatomy permitted him to trace Stevens' X-ray films onto paper, and to publish the results.
Other contributions to the study of speech production include a model by which one can predict the spectral shape of turbulent speech excitation (depending on the dimensions of the turbulent jet), and work related to the vocal fold configurations that lead to different modes of phonation.
In fact, the spectral properties (formants, formant bandwidths, and glottal characteristics) of all possible phonemes in all languages can theoretically be modeled and predicted using physics-based resonator models. Basic tube resonators can be used to give a general prediction of the formants of vowels. The basic model can be refined by adding resistors and/or capacitors to represent energy losses due to the vocal tract walls. Acoustic coupling to the subglottal system can also be modeled by adding tubes to the model of the original vocal tract, which introduces pole/zero pairs in the spectrum that represent the effects of subglottal coupling. (The locations of these pole/zero pairs are the resonant frequencies of the subglottal system.) Glottal characteristics such as vocal pitch (F0), open quotient (H1-H2), and degree of breathiness (H1-A3) can also be modeled and measured from the spectrum (Hanson & Stevens).
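As an illustration of measuring such a quantity from the spectrum, the sketch below computes H1-H2, the level difference in dB between the first and second harmonics, for a signal whose F0 is already known. It is a minimal sketch under stated assumptions, not the procedure used by Hanson and Stevens: the signal is assumed to be steady, the analysis window to span a whole number of pitch periods, and no correction for formant influence is applied. All function names are hypothetical.

```python
import cmath
import math


def harmonic_level_db(samples, sample_rate_hz, freq_hz):
    """Level (dB re unit amplitude) of the sinusoidal component at freq_hz.

    Uses a direct single-frequency DFT; assumes the component is present
    (non-zero) and that the window holds a whole number of its periods.
    """
    n = len(samples)
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
        for k, x in enumerate(samples)
    )
    amplitude = 2.0 * abs(acc) / n
    return 20.0 * math.log10(amplitude)


def h1_minus_h2_db(samples, sample_rate_hz, f0_hz):
    """Difference in dB between the first (F0) and second (2*F0) harmonics."""
    h1 = harmonic_level_db(samples, sample_rate_hz, f0_hz)
    h2 = harmonic_level_db(samples, sample_rate_hz, 2.0 * f0_hz)
    return h1 - h2


def synth_two_harmonics(f0_hz, a1, a2, sample_rate_hz, n_samples):
    """Synthetic test signal: two harmonics with known amplitudes a1 and a2."""
    return [
        a1 * math.sin(2.0 * math.pi * f0_hz * k / sample_rate_hz)
        + a2 * math.sin(2.0 * math.pi * 2.0 * f0_hz * k / sample_rate_hz)
        for k in range(n_samples)
    ]


if __name__ == "__main__":
    # 0.1 s at 16 kHz holds exactly 20 periods of a 200 Hz fundamental.
    signal = synth_two_harmonics(200.0, 1.0, 0.5, 16000, 1600)
    print(round(h1_minus_h2_db(signal, 16000, 200.0), 2))  # about 6.02 dB
```

With real speech, F0 would first have to be estimated and the harmonic peaks located in a windowed spectrum. The synthetic signal is used here only so that the expected answer (a 2:1 amplitude ratio, about 6 dB) is known in advance.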
Stevens joined MIT as an assistant professor in 1954. He became an associate professor in 1957 and a full professor in 1963, and was appointed Clarence J. LeBel Professor in 1977. One of his long-time collaborators, Dennis Klatt (who wrote DECtalk while working in Stevens' lab), said that "As a leader, Ken is known for his devotion to students and his miraculous ability to run a busy laboratory while appearing to manage by a principle of benevolent anarchy."
The first doctoral thesis Stevens signed at MIT was that of his fellow student, James L. Flanagan, in 1955. Flanagan started graduate school at MIT in the same year as Stevens, but without a prior master's degree; he earned his M.S. in 1950 under Beranek's supervision, then finished his doctoral thesis under Stevens' supervision in 1955.
Stevens estimated in 2001 that he had supervised approximately forty Ph.D. candidates.
On the occasion of his receipt of the Gold Medal of the Acoustical Society of America in 1995, colleagues wrote of Stevens' Speech Group that "during its existence of almost four decades" it "has been outstanding in the support that it has provided to women researchers, many of whom have gone on to populate the upper echelons of research labs throughout the world." Stevens' laboratory has been referred to by colleagues as a "national treasure".
Stevens had been active in the Acoustical Society of America since his time as a graduate student. He was a member of the executive council from 1963 to 1966, Vice President from 1971 to 1972, and President of the Society from 1976 to 1977. He was a Fellow of the ASA. In 1983 he received its Silver Medal in Speech Communication, and in 1995 he received the Gold Medal from the society.
Stevens was also active in the IEEE, where he held the rank of IEEE Life Fellow. In 2004, Ken Stevens and Gunnar Fant were the joint first winners of the IEEE James L. Flanagan Speech and Audio Processing Award.
Stevens was a Fellow of the American Academy of Arts and Sciences, a member of the National Academy of Engineering, a member of the National Academy of Sciences, and a 1999 recipient of the United States National Medal of Science.