Invited Speakers of IDA 2012
The program of IDA 2012 will include invited talks by distinguished members of the intelligent data analysis community. We are proud to present the following confirmed invited speakers:
Arno Siebes, University of Utrecht, The Netherlands
Title: Queries for Data Analysis
Abstract of the talk: If we view data as a set of queries with an answer, what would a model be? In this paper we explore this question. The motivation is that there are more and more kinds of data that have to be analyzed, data of such a diverse nature that it is not easy to define precisely what data analysis actually is. Since all these different types of data share one characteristic, namely that they can be queried, it seems natural to base a notion of data analysis on this characteristic.
The discussion in this talk is preliminary at best. No attempt is made to connect the basic ideas to other, well-known foundations of data analysis. Rather, it just explores some simple consequences of its central tenet: data is a set of queries with their answers.
Short bio:
Arno Siebes' group focusses on the theory and algorithmics
of the extraction of information from data. In particular,
the research concentrates on the algorithmic questions in
designing information systems that have to deal with large,
and ever more quickly growing, amounts of data. Such data
may be stored in many varieties, from neatly organised
databases to unordered documents on the web. The
fundamental principles of fitting search methods, data
mining and knowledge discovery and the design of algorithmic
technology for this form some of the major challenges in the
field. The research context consists of the many data-rich
environments in which information systems for domain
research and support have to be devised. Important
applications that drive his fundamental research aim at
solving problems encountered in, for example, the biomedical
domain.
Paola Sebastiani, Boston University, USA
Title: Intelligent data analysis of human genetic data
Short bio: Paola Sebastiani, Ph.D., joined the
Department of Biostatistics in 2003 as an Associate
Professor, after holding faculty positions in Italy, England
and the United States. She is the author of over 70 peer-reviewed
publications in theoretical and methodological statistics,
artificial intelligence and computational biology. She is a
member of the editorial board of the Machine Learning
journal and Evaluation of Intelligent Systems. She is also a
regular reviewer for major journals in statistics and
computer science, and serves on the program committee of
several international conferences at the interface between
statistics and artificial intelligence. Paola's research
interests focus primarily on the development of Bayesian
methods and their application to a wide spectrum of
problems, ranging from cognitive robotics to bioinformatics,
from biosurveillance to knowledge discovery. She is
particularly interested in the development of automated
analytical methods and many of her methodological
contributions have been implemented in computer programs,
such as Bayesware Discoverer: the first publicly available
program for the automated discovery of Bayesian networks
from incomplete databases, and Caged, a program for Bayesian
model-based clustering of gene expression profiles measured
in temporal experiments. Paola is founder of Bayesware LLC,
a software company developing and commercializing knowledge
discovery programs based on Bayesian methods.
Gavin Cawley, University of East Anglia, United Kingdom
Title: Over-fitting in Model Selection and Its Avoidance
Short bio: Dr Cawley obtained his PhD in electronic
systems engineering from the University of Essex in 1996,
and is currently a senior lecturer in the School of
Computing Sciences at the University of East Anglia. His
research interests lie in theoretical and algorithmic issues
with a direct impact on the practical application of machine
learning techniques, including topics such as feature
selection, model selection, performance estimation, model
comparison, covariate shift, dealing with imbalanced or
"non-standard" data and predictive
uncertainty. Application areas focus on computational
biology and environmental sciences. He has won several
machine learning and data mining challenges associated with
the IEEE World Congress on Computational Intelligence.
Participation in these competitions has led to the
realisation that over-fitting in model selection is a
significant issue in the practical application of machine
learning methods, and the development of techniques to avoid
this form of over-fitting is likely to be a fruitful area
of research.