Stanford Team Creates Computer Vision Algorithm to Describe Photos


Computer software only recently became smart enough to recognize objects in photographs. Now, Stanford researchers using machine learning have created a system that takes the next step, writing a simple story of what’s happening in any digital image.

“The system can analyze an unknown image and explain it in words and phrases that make sense,” said Fei-Fei Li, a professor of computer science and director of the Stanford Artificial Intelligence Lab.

“This is an important milestone,” Li said. “It’s the first time we’ve had a computer vision system that could tell a basic story about an unknown image by identifying discrete objects and also putting them into some context.”

Humans, Li said, create mental stories that put what we see into context. “Telling a story about a picture turns out to be a core element of human visual intelligence, but so far it has proven very difficult to do this with computer algorithms,” she said.

At the heart of the Stanford system are algorithms that enable the system to improve its accuracy by scanning scene after scene, looking for patterns, then using the accumulation of previously described scenes to extrapolate what is being depicted in the next unknown image.

“It’s almost like the way a baby learns,” Li said.

She and her collaborators, including Andrej Karpathy, a graduate student in computer science, describe their approach in a paper submitted in advance of a forthcoming conference on cutting-edge research in the field of computer vision.

Eventually these advances will lead to robotic systems that can navigate unknown situations.

In the near term, machine-based systems that can discern the story in a picture will enable people to search photo or video archives and find specific images.
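As a rough illustration of why machine-written descriptions matter for search, the sketch below assumes an archive whose images already carry generated captions and matches a text query against them. The file names, captions, and the `search` function are hypothetical and are not taken from the Stanford paper; plain keyword matching stands in for whatever retrieval method a real system would use.

```python
# Hypothetical captions, standing in for what a captioning system might produce.
CAPTIONS = {
    "img_001.jpg": "a dog jumping to catch a frisbee",
    "img_002.jpg": "two children playing soccer on a field",
    "img_003.jpg": "a man riding a bicycle down a street",
}


def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().split())


def search(query, captions):
    """Return, sorted by name, the files whose caption contains every query word."""
    terms = tokenize(query)
    return sorted(
        name for name, caption in captions.items()
        if terms <= tokenize(caption)  # subset test: all query words appear
    )


if __name__ == "__main__":
    print(search("dog frisbee", CAPTIONS))
```

Once each image has a sentence attached, finding it reduces to ordinary text search, which is the near-term benefit the article describes.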

“Most of the traffic on the Internet is visual data files, and this might as well be dark matter as far as current search tools are concerned,” Li said. “Computer vision seeks to illuminate that dark matter.”

The new Stanford paper describes two years of effort that flows from research that Li has been pursuing for a decade.

Her work builds on advances that have come, slowly at times, in the nearly 50 years since MIT scientist Seymour Papert convened a “summer project” to create computer vision in 1966.

That project was conceived during the early days of artificial intelligence, and its timeline proved exceedingly optimistic, as computer scientists struggled to replicate in machines what took millions of years to evolve in living beings.

It took researchers 20 years to create systems that could take the relatively simple first step of recognizing discrete objects in photographs.

More recently the emergence of the Internet has helped to propel computer vision.

On one hand, the growth of photo and video uploads has created a demand for tools to sort, search and sift visual information.

On the other, sophisticated algorithms running on powerful computers have led to electronic systems that can train themselves by performing repetitive tasks, improving as they go.

