Wednesday, January 16, 2013
15:30

Alladi Ramakrishnan Hall

NONLINEAR TRANSFER LEARNING ACROSS HIGH-DIMENSIONAL DATASETS

Sridhar Mahadevan

University of Massachusetts, Amherst

As the size and variety of massive digital datasets grow, a large number
of real-world applications involve analyzing paired high-dimensional data
representing multiple "views" of an object. For example, on Mars the
Curiosity rover zaps rocks with a laser and sends back high-dimensional
spectra, which chemists analyze using laser-induced breakdown
spectroscopy (LIBS) to determine a rock's chemical composition. Curiosity
zaps the same rock multiple times, as well as rocks in nearby areas. In
bioinformatics, genome-wide association studies analyze paired
gene-expression and DNA copy number datasets. In computer graphics,
researchers automate the building of 3D models of objects by transferring
shape and texture information across related objects. In cross-lingual
information retrieval, users search for documents in a target language
(e.g., Arabic) by issuing queries in a source language (e.g.,
English). In eldercare, researchers analyze activity data streamed from
multiple sensors to minimize the risk of falling, a
leading cause of injury and death. In decipherment studies, researchers
are interested in modeling the statistics of pairs of unknown and known
languages.

Canonical correlation analysis (CCA), a statistical method invented by
Hotelling almost 80 years ago, is widely used to analyze paired
high-dimensional datasets. We introduce a
new class of manifold-based statistical methods for analyzing paired
high-dimensional datasets that provides significant advantages over CCA
and appears to outperform it in many cases. Our approach --
based on modeling the underlying geometric structure of paired datasets
-- is more flexible than CCA, and easily adaptable to a variety of
different optimization criteria and types of correspondence information
available. A range of algorithms will be presented in this talk, coupled
with interesting applications illustrating their widespread usefulness
across many domains.

BIO: Sridhar Mahadevan is a professor and Graduate Program Director at the
School of Computer Science at the University of Massachusetts, Amherst,
one of the largest research and graduate training programs in computer
science with 40 faculty and nearly 300 graduate students. His research
interests span a variety of areas in artificial intelligence and machine
learning, from sequential decision making and reinforcement learning to
manifold learning. He received the NSF CAREER award in 1995, and his
research has been recognized by a number of best paper awards or
nominations at major international conferences. He is the author of a
number of books, including a pioneering book on robot learning in 1993,
and a recent book on representation discovery in 2008. He directs the
Autonomous Learning Laboratory, an interdisciplinary laboratory with 20
graduate students working on a variety of research projects in AI, machine
learning, neuroscience, robotics, and scientific applications. His
research is funded by a variety of grants from the NSF and AFOSR.


