Re: [xmca] Deb Roy: The birth of a word
That is a fair point, Lauren. But it would be rhetorically stronger if you were to get us started! ;)
Let's take a look, then, at "New Horizons in the Study of Child Language Acquisition."
First Roy taught a robot to segment recordings of naturally occurring speech and pair these segments with visual categories of objects, but realized that this approach "could never in principle learn the difference in meaning of “ball” versus “round” since both terms would be grounded in terms of the same perceptual category." This got him started on his baby study.
Now he has 230,000 hours of data (90,000 video, 140,000 audio) in "the Human Speechome Project." (Named, obviously, after the Human Genome Project, another massive investigation that, it is now often said, has not made good on its original aims.) With the new corpus he aims to feed a bunch of data into "a machine learner that embodies a computationally precise hypothesis of child language acquisition." The data will include machine word-level transcription and identification of speaker and prosody, based on the audio recordings, plus machine encoding of "what [participants] were doing and how (activity classification, manner analysis), and with what objects (object tracking and classification)," based on the video recordings.
Leaving aside how a computer can determine the appropriate level at which to describe what someone is doing, Roy at least recognizes that "A fundamental limitation of the framework is its implication that language learning wholly consists of passively processing sensory input," and apologizes for the fact that "The passive nature of the analysis framework reflects the inherent limitations of working with “dead data” – frozen records of human interactions."
And yet (David K.), he celebrates "the objective stance towards his development that the Speechome corpus enables me to take."
Here too, as in the presentation, I see lots of "amazing" claims, but these seem to me to rest largely on very dubious assumptions (some of which Roy recognizes, to give him credit) and to be mostly promises about what will be achieved at some time in the future.
For example, the automated transcription process had an error rate of over 75%, so Roy's "BlitzScribe" system now feeds "sound bites" to a human transcriber. In other words, it enables human transcribers to type faster and, Roy claims, still accurately. But speaking as someone who has transcribed a lot of recordings of natural speech, I have questions about the accuracy inherent in an approach that deliberately eliminates spoken and nonverbal context.
It is also strange that Roy's data on "word birth" show a curve very different from the exponential growth curves previously reported for vocabulary growth. Instead he finds a "shark's fin" curve, in which new words appear with increasing frequency up to 20 months, and then "word birth" drops back dramatically through 24 months (where data collection ended). Roy promises that "Further investigations will aim to explain the shape of the curve."
How about you, Lauren? What do you see of value?
On Mar 19, 2011, at 12:26 PM, Lauren Zentz wrote:
> With all due respect to all the brilliant minds on this list and in this
> discussion, I have been following along here and there since this
> conversation started and wondering the entire time exactly what research and
> knowledge implications we should be worried about based on a 20 minute TED
> Talk. It seems that for us as researchers it is very important to know what
> Roy is doing with language acquisition and development research, and who
> will be buying which ideas that he puts forth; but I feel like the intended
> message of his talk, which was given to a *very* broad, and generally
> non-linguist, non-cognitivist, and non-social scientist audience, was
> basically to demonstrate how amazing are the technological tools he is using
> to do this research, and to generally inspire a larger population of
> listeners regarding how complex and precious is the nature of human
> (language) development.
> I wonder if maybe, if we want to discuss the implications of his research,
> those of us interested could take a look at the actual publications he has
> written, where he has published them, and what audiences read them:
> Lauren Zentz
> Doctoral Candidate, Language, Reading and Culture
> College of Education, University of Arizona
> On Sat, Mar 19, 2011 at 9:48 AM, mike cole <firstname.lastname@example.org> wrote:
>> I wonder if criticisms of the sort voiced in this company might not
>> influence the subsequent course of inquiry. There are a bunch of critical
>> comments below the Roy
>> presentation that could benefit from this discussion.
>> On Sat, Mar 19, 2011 at 9:14 AM, Martin Packer <email@example.com> wrote:
>>> On Mar 16, 2011, at 9:16 PM, David Kellogg wrote:
>>>> I am not entirely sure I agree with Martin's and Jim's criticisms. First
>>> of all, when I read Halliday's work on early language acquisition, it seems
>>> MORE objective than Deb Roy's "space time worms". Halliday is looking at
>>> grammar and especially at function. But I am really not sure at all what
>>> Roy is looking at. I can't even understand, when I am looking at the
>>> what is space and what is time, but above all I can't understand how it
>>> helps him organize his transcriptions. (I can see how it makes for a cool
>>> presentation, though!)
>>> Like Jim, I'd like to clarify my previous message. I didn't mean to sound
>>> as though I were rejecting any use of technology for this kind of
>>> Obviously videorecording and other techniques of objectification are
>>> for the study of a phenomenon as fleeting as speech. But any
>>> of children's acquisition of language has to make use of the intuitions of
>>> speakers of that language. One needs to be able to recognize the legal
>>> combinations of phonemes, and syllables, and the illegal combinations, in
>>> order to plot the movement from one to the other. One needs to recognize a
>>> word, and approximations to it, and what it signifies in a specific context
>>> of use. The utility of computers, then, to help conduct an analysis of a
>>> child's speech depends on one's ability to program them with the
>>> of these intuitions. The degree of success with which we have been able to
>>> program computers to recognize human speech is still very limited, and our
>>> ability to program them to understand context has been even more limited.
>>> Yet once one collects massive amounts of data, as Roy has done, the use of
>>> computers becomes virtually unavoidable. My point about Halliday's
>>> was that he drew not only on his speaker/hearer's intuitions, he also drew
>>> on what was available to him as a participant interacting intimately with
>>> the child speaker. Roy of course had the same type of interactions, but
>>> rather than build on these he chose instead the strategy of massive data
>>> collection. There is, presumably as a consequence of, apparently no
>>> attention to semantics in Roy's analysis - not that one would expect to
>>> the child showing an understanding of concepts, but knowing something of the
>>> adults' interpretations of his words in context would surely be
>>> helpful in understanding the acquisition process.
>>> I assume that the fact that in his presentation Roy could provide only
>>> sound bites of the child's approximations to "water" indicates that his
>>> system for automated analysis of the videos was not able to parse those
>>> events. Was the computer able to judge these utterances to be tokens of a
>>> single type? Or did humans still need to go through the recordings to make
>>> such judgments? If the latter, then it seems to me that the accumulation of
>>> massive amounts of data made the researchers' task more difficult, not
>>> easier, and it is not clear to me what the benefit is of Roy's approach.
>>> Martin
>>> __________________________________________
>>> xmca mailing list