Philip Rubin
Philip E. Rubin is an American cognitive scientist, technologist, and science administrator known for raising the visibility of behavioral and cognitive science, neuroscience, and ethical issues related to science, technology, and medicine, at a national level.
His research career is noted for his theoretical contributions and pioneering technological developments, starting in the 1970s, related to speech synthesis and speech production, including articulatory synthesis and sinewave synthesis, and their use in studying complex temporal events, particularly understanding the biological bases of speech and language.
Rubin is the President and a Trustee of Rothschild Wilder, a private foundation that supports social justice and ethics, science and innovation, the arts and humanities, and the preservation of popular culture artifacts. He is also Chair of the Board of Directors of Haskins Laboratories in New Haven, Connecticut, where he is Chief Executive Officer emeritus and was for many years a senior scientist. In addition, he is a Professor Adjunct in the Department of Surgery, Otolaryngology at the Yale University School of Medicine, a Research Affiliate in the Department of Psychology at Yale University, a Fellow at Yale's Trumbull College,
and a Trustee of the University of Connecticut.
He is currently the Past President of the Federation of Associations in Behavioral and Brain Sciences, a role in which he will serve through 2025.
From 2012 through February 2015 he was the Principal Assistant Director for Science at the Office of Science and Technology Policy (OSTP) in the Executive Office of the President of the United States, where he led the White House's neuroscience initiative, which included the BRAIN Initiative. He also served as the Assistant Director for Social, Behavioral and Economic Sciences at OSTP. For many years he has been involved with issues of science advocacy, education, funding, and policy.
Education
Philip Rubin received his BA in psychology and linguistics in 1971 from Brandeis University and subsequently attended the University of Connecticut, where he received his PhD in experimental psychology in 1975 under the tutelage of Michael Turvey, Ignatius Mattingly, Philip Lieberman, and Alvin Liberman.
Career
Philip Rubin's research spans a number of disciplines, combining computational, engineering, linguistic, physiological, and psychological approaches to study embodied cognition, most particularly the biological bases of speech and language. He is best known for his work on articulatory synthesis, speech perception, sinewave synthesis, signal processing, perceptual organization, and theoretical approaches to and modeling of complex temporal events. At the same time, he has been involved in leadership roles related to science administration, policy, and advocacy.
Speech Synthesis and Speech Production
Starting in the early 1970s, Rubin worked on foundational issues in speech technology. These include: participating with Rod McGuire on Haskins aspects of the ARPANET Network Voice Protocol, a predecessor of Voice over IP;
collaborating with Leonard Szubowicz, Douglas Whalen, and others on digitized speech, particularly extensions of the Haskins pulse-code modulation (PCM) implementation,
focusing on expanding temporal markers and event labels; and working with Patrick Nye on the Digital Pattern Playback, which was eventually replaced by Rubin's HADES system.
During his time at Haskins Laboratories, Rubin was responsible for the design of many computational models and other software systems. Most prominent are ASY, the Haskins articulatory synthesis program,
and SWS, the Haskins sinewave synthesis program, both developed in the 1970s.
ASY expanded the Mermelstein vocal-tract model developed at Bell Laboratories, adding further articulatory control, simulation of nasal sounds, sound generation, and digital sound production.
Most importantly, Rubin designed and implemented an approach for describing and controlling articulatory events, now known as speech gestures.
In addition to its use in standard articulatory synthesis, the ASY program has been used as part of a gestural-computational model that combines articulatory phonology, task dynamics, and articulatory synthesis. With Louis Goldstein and Mark Tiede, Rubin designed a radical revision of the articulatory synthesis model, known as CASY, the configurable articulatory synthesizer. This three-dimensional model of the vocal tract permits researchers to replicate MRI images of actual speakers and has been used to study the relation between speech production and perception. With colleagues Hosung Nam, Catherine Browman, Louis Goldstein, Michael Proctor, Elliot Saltzman, and Mark Tiede, a software system called TADA was developed. It implemented the task dynamic model of inter-articulator speech coordination, also incorporating a coupled-oscillator model of inter-gestural planning, a gestural-coupling model, and portions of the Haskins articulatory model. The system also generated articulatory movements for English utterances from either phonetic or orthographic (text) input.
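A minimal sketch of the coupled-oscillator idea follows (written in Python, with invented parameter values; it is not code from TADA). Each gesture is associated with a planning oscillator, and coupling nudges the pair toward a target relative phase, which in models of this kind determines the relative timing of the gestures.

```python
# A hedged, minimal sketch of coupled-oscillator timing (not the TADA code).
# Two planning oscillators, one per gesture, are coupled so that their
# relative phase settles toward a target value (here in-phase, as for a
# consonant-vowel onset). All parameter values are illustrative assumptions.
import numpy as np

def settle(theta1, theta2, target_phase=0.0, omega=2.0 * np.pi, coupling=2.0,
           dt=0.001, steps=5000):
    for _ in range(steps):
        # Each oscillator runs at its natural frequency and is nudged toward
        # the desired relative phase with respect to the other oscillator.
        d1 = omega + coupling * np.sin((theta2 - theta1) - target_phase)
        d2 = omega + coupling * np.sin((theta1 - theta2) + target_phase)
        theta1 += d1 * dt
        theta2 += d2 * dt
    return (theta2 - theta1) % (2.0 * np.pi)

# Starting out of phase, the pair relaxes toward the in-phase pattern.
print(settle(theta1=0.0, theta2=1.5))  # close to 0 (mod 2*pi) after settling
```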
The sinewave synthesis system designed by Rubin, known as SWS, is based on a technique for synthesizing speech by replacing the formants with time-varying pure tones (whistles) that follow the formant center frequencies, and was designed to explore the spatiotemporal aspects of speech signals. It was the first sinewave synthesis system developed for the automatic, large-scale creation of stimuli for perceptual experiments, and has been used by Robert Remez, Rubin, David B. Pisoni, and other colleagues and researchers to study the time-varying characteristics of the speech signal.
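The core idea can be sketched as follows (a minimal Python illustration with hypothetical formant tracks; it is not the Haskins SWS code): each of the lowest formants is replaced by a single time-varying sinusoid whose frequency and amplitude follow that formant's track.

```python
# A minimal sketch of the sinewave-replica idea (not the Haskins SWS code).
# Each formant track is replaced by one time-varying sinusoid whose frequency
# and amplitude follow the measured formant values; the tracks used below are
# hypothetical placeholders for a steady vowel-like sound.
import numpy as np

def sinewave_replica(formant_freqs, formant_amps, frame_rate=100, sample_rate=16000):
    """formant_freqs, formant_amps: arrays of shape (n_frames, n_formants)."""
    n_frames, n_formants = formant_freqs.shape
    n_samples = int(n_frames * sample_rate / frame_rate)
    t_frames = np.arange(n_frames) / frame_rate
    t_samples = np.arange(n_samples) / sample_rate
    signal = np.zeros(n_samples)
    for k in range(n_formants):
        # Interpolate the frame-rate tracks up to the audio sample rate.
        freq = np.interp(t_samples, t_frames, formant_freqs[:, k])
        amp = np.interp(t_samples, t_frames, formant_amps[:, k])
        # Integrate instantaneous frequency to obtain a continuous phase.
        phase = 2.0 * np.pi * np.cumsum(freq) / sample_rate
        signal += amp * np.sin(phase)
    return signal / max(1e-9, np.abs(signal).max())  # normalize to [-1, 1]

# Three flat, hypothetical formant tracks: 2 seconds at 100 frames per second.
freqs = np.tile([500.0, 1500.0, 2500.0], (200, 1))
amps = np.tile([1.0, 0.6, 0.3], (200, 1))
tones = sinewave_replica(freqs, amps)
```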
Rubin is also the designer of the HADES signal processing system and the SPIEL programming language, a predecessor of MATLAB.
From 1992 through 2012, Rubin was the core and administrative leader of Haskins Laboratories' main research activity, the National Institutes of Health/NICHD funded P-01 program project, “The Nature and Acquisition of the Speech Code and Reading.”
In 1998, he was the co-founder and first President of AVISA, the Auditory-Visual Speech Association, now part of the International Speech Communication Association.
He was the co-creator, with Eric Vatikiotis-Bateson, of the Talking Heads website, which is no longer active.
Theoretical Contributions
Dynamical systems / action theory perspective on speech.
With Carol Fowler, Robert Remez, and Michael Turvey, Rubin introduced the consideration of speech in terms of a dynamical systems / action theory perspective. Rubin's theoretical approach to perception and production, particularly in the case of speech, eschews attention to the momentary and punctate aspects of the signal, focusing not on traditional features and cues, but on spatiotemporal coordination of global aspects of the system, such as spectral coherence over long stretches of time.
Perceptual organization.
With Robert Remez and various other colleagues, he has used the technique of sinewave synthesis to explore perceptual organization. They have noted that "the criteria for the perceptual organization of speech - visible, audible, and even palpable - are actually specified in a general form, removed from any particular sensory modality...", criteria that, for Rubin, are related to the underlying spectral coherence of signals created by coordinated physiological activity.
Events, gestures, and embodiment.
Rubin's approach stresses the constraints and structure stemming from the realities of embodied systems, again across both time and physical space. Building on the conceptual approach developed by Paul Mermelstein and colleagues at Bell Laboratories, he expanded the modeling of speech production to incorporate an event-based approach for controlling the movement of the vocal tract over time and articulatory space. This work was influenced, in part, by the event-based focus of James J. Jenkins. Rubin's articulatory synthesis model, ASY, illustrates how simple physical changes, such as velar opening, directly account for degrees of nasality, avoiding the complexity of attempting to reconcile numerous spectral cues. This event orientation evolved into a gestural-computational system developed at Haskins Laboratories that combined ASY with the articulatory phonology of Catherine Browman and Louis Goldstein and the task dynamic model of Elliot Saltzman. In this system, utterances are organized ensembles of units of articulatory action called gestures. Each gesture is modeled as a dynamical system that characterizes the formation of a local constriction within the vocal tract. Goldstein and Rubin have described the "dances of the vocal tract" that underlie the production of continuous speech.
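As an illustration of this style of modeling, the sketch below (Python, with illustrative parameter values not taken from ASY or TADA) treats a single gesture as a critically damped second-order dynamical system that drives a tract variable, such as lip aperture, toward its constriction target.

```python
# A minimal sketch, under simplifying assumptions, of one gesture modeled as a
# critically damped second-order ("point attractor") dynamical system: a tract
# variable such as lip aperture is driven toward its constriction target while
# the gesture is active. The numbers are illustrative, not taken from ASY or TADA.
import numpy as np

def gesture(x0, target, stiffness=400.0, duration=0.3, dt=0.001):
    damping = 2.0 * np.sqrt(stiffness)  # critical damping: no overshoot
    x, v = x0, 0.0
    trajectory = []
    for _ in range(int(duration / dt)):
        accel = -stiffness * (x - target) - damping * v  # mass-normalized dynamics
        v += accel * dt
        x += v * dt
        trajectory.append(x)
    return np.array(trajectory)

# Lip aperture (mm) closing toward a bilabial constriction target near 0 mm.
trajectory = gesture(x0=12.0, target=0.0)
print(trajectory[-1])  # approaches the target smoothly, without oscillation
```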
Biomechanical constraints on inverse mapping.
Biomechanical constraints stemming from such embodiment can also be exploited in the recovery of vocal tract shapes from the acoustic signal, as seen in the continuity mapping approach of John Hogden, used by Hogden, Rubin, and colleagues to re-conceptualize how realistic physical constraints affect pattern recognition.
This involves reverse engineering the path from the acoustic signal to its physiological source using a gradient maximum likelihood approach.
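A toy sketch of the general continuity idea follows (Python; it illustrates smoothness-constrained inversion with invented numbers, not Hogden's actual maximum-likelihood continuity mapping algorithm). When the forward mapping from articulation to acoustics is many-to-one, penalizing discontinuous articulatory paths can resolve the ambiguity.

```python
# A toy illustration of the general continuity idea (not Hogden's
# maximum-likelihood continuity mapping itself). Among candidate articulatory
# states that could have produced each acoustic frame, prefer the sequence that
# both matches the acoustics and moves smoothly, reflecting the fact that
# articulators cannot jump discontinuously. All numbers here are invented.
import numpy as np

def smooth_inverse(acoustics, candidates, forward, smooth_weight=1.0):
    """acoustics: (T,) observed values; candidates: (K,) articulatory states;
    forward: function mapping an articulatory state to a predicted acoustic value."""
    T, K = len(acoustics), len(candidates)
    predicted = np.array([forward(c) for c in candidates])
    cost = np.full((T, K), np.inf)       # best cumulative cost per state
    back = np.zeros((T, K), dtype=int)   # backpointers for the best path
    cost[0] = (acoustics[0] - predicted) ** 2
    for t in range(1, T):
        for k in range(K):
            jump = smooth_weight * (candidates[k] - candidates) ** 2
            prev = cost[t - 1] + jump
            back[t, k] = np.argmin(prev)
            cost[t, k] = prev[back[t, k]] + (acoustics[t] - predicted[k]) ** 2
    # Trace back the minimum-cost (smooth, acoustically consistent) path.
    path = [int(np.argmin(cost[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return candidates[np.array(path[::-1])]

# Toy forward model: the acoustics are a many-to-one function of articulation.
states = np.linspace(-1.0, 1.0, 21)
observed = np.abs(np.linspace(-0.8, 0.8, 10))  # ambiguous: the sign is lost
recovered = smooth_inverse(observed, states, forward=np.abs)
print(recovered)  # a coherent single-branch path, not frame-by-frame sign flips
```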
Audiovisual speech and multimodality.
In his collaborations on audiovisual speech with Eric Vatikiotis-Bateson, Hani Yehia, and other colleagues, Rubin extended this emphasis on spatiotemporal coordination to multimodality, exploring the simultaneous combination of speech, facial information, and gesture, which led to innovations in analysis, synthesis, and simulation.
National Science Foundation
From 2000 to 2003 Rubin was the Director of the Division of Behavioral and Cognitive Sciences at the National Science Foundation in Arlington, Virginia, where he helped launch the Cognitive Neuroscience, Human Origins, Documenting Endangered Languages, and other programs, was part of the NBIC convergence activities, and was the first chair of the Human and Social Dynamics priority area.
Rubin returned to the NSF during the second term of the Obama Administration to serve as a senior advisor in the Directorate for Social, Behavioral and Economic Sciences.