
Artificial intelligence model trained to learn from a child by watching and listening to them

New Delhi: In a recent study, headcam video recorded from a child’s six-month birthday to their second birthday was used to train an AI model to recognize words and concepts through the eyes and ears of a single child. The researchers demonstrated that the artificial intelligence (AI) model could pick up a substantial number of words and concepts from small slices of the child’s experience. They said genuine language learning was possible even though the video captured only 1% of the child’s waking hours.

“By using AI models to study the real language-learning problem faced by children, we can address classic debates about what ingredients children need to learn words – whether they need language-specific biases, innate knowledge, or just associative learning to get started,” said Brenden Lake, an assistant professor in NYU’s Center for Data Science and Department of Psychology and senior author of the study published in the journal Science.

To create the model, the researchers first examined a child’s learning process, recorded weekly with a lightweight, head-mounted camera between the ages of six and twenty-five months.

The scientists analyzed almost 60 hours of footage and found that it contained about a quarter of a million word instances – the number of words spoken, many of them repeatedly – each linked to video frames showing what the child was seeing at the moment those words were said.

According to the team, the video also captured a wide variety of developmental activities, such as mealtimes, book reading, and the child playing.

The researchers then trained a multimodal neural network with two separate modules: one processed individual video frames, while the other processed the transcribed speech that was directed at the child.
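
As a rough illustration, a two-module network of this kind might look like the sketch below, written in PyTorch; the layer choices, sizes, and names are assumptions made for this example, not the architecture used in the study.

import torch
import torch.nn as nn

class VisionEncoder(nn.Module):
    """Maps a video frame (3 x 224 x 224) to a fixed-size embedding."""
    def __init__(self, embed_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, frames):            # frames: (batch, 3, 224, 224)
        return self.proj(self.backbone(frames))

class TextEncoder(nn.Module):
    """Maps a transcribed utterance (token ids) into the same embedding space."""
    def __init__(self, vocab_size=10000, embed_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)

    def forward(self, token_ids):         # token_ids: (batch, seq_len)
        # Average the word embeddings of the utterance (a deliberate simplification).
        return self.embed(token_ids).mean(dim=1)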

These modules were combined and trained using an algorithm known as contrastive learning, which learns by building associations across the input data.
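
A minimal sketch of such a contrastive objective, written in the common CLIP-style form, is shown below; it assumes that a frame and the utterance heard at the same moment form a positive pair, with all other pairings in a batch serving as negatives. The exact loss used in the study may differ.

import torch
import torch.nn.functional as F

def contrastive_loss(frame_emb, text_emb, temperature=0.07):
    # frame_emb, text_emb: (batch, dim); row i of each comes from the same moment.
    frame_emb = F.normalize(frame_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = frame_emb @ text_emb.t() / temperature                   # pairwise similarities
    targets = torch.arange(frame_emb.size(0), device=logits.device)   # matches lie on the diagonal
    # Pull co-occurring pairs together and push mismatched pairs apart, in both directions.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2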

For instance, they explained, when a parent said something within the child’s view, it was likely that some of the words referred to something the child could see, so comprehension was instilled by linking visual and linguistic cues.

“This gives the model a hint as to which words should be linked to which objects,” research scientist Wai Keen Vong of NYU’s Center for Data Science said.

“Combining these cues is what enables contrastive learning to gradually determine which words belong with which visuals and to capture the learning of a child’s first words,” Vong stated.

After training, the team tested the model by presenting it with a target word and four candidate images, asking it to select the image that best matched the word.
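
That test can be sketched as follows, assuming hypothetical encode_word and encode_image helpers that return normalized embeddings from the trained modules; it simply scores each candidate image against the word and picks the highest.

import torch

def pick_best_image(target_word, candidate_images, encode_word, encode_image):
    word_emb = encode_word(target_word)                                        # (dim,)
    image_embs = torch.stack([encode_image(img) for img in candidate_images])  # (4, dim)
    scores = image_embs @ word_emb        # cosine similarity, given normalized embeddings
    return int(scores.argmax())           # index of the best-matching image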

According to the researchers, the model was able to learn a “substantial” number of words and concepts present in the child’s everyday experience.

Furthermore, the model was found to generalize some of the words it had learned to visual instances that were not in its training set.

According to the researchers, this showed a generalization characteristic that is also observed in children during laboratory studies.
