Invited Speakers of IDA 2012

The program of IDA 2012 will include invited talks by distinguished members of the intelligent data analysis community. We are proud to present the following confirmed invited speakers:

Arno Siebes, University of Utrecht, The Netherlands

Title: Queries for Data Analysis

Abstract of the talk: If we view data as a set of queries with an answer, what would a model be? In this paper we explore this question. The motivation is that there are more and more kinds of data that have to be analyzed, data of such a diverse nature that it is not easy to define precisely what data analysis actually is. Since all these different types of data share one characteristic (they can be queried), it seems natural to base a notion of data analysis on this characteristic.

The discussion in this talk is preliminary at best. No attempt is made to connect the basic ideas to other, well-known foundations of data analysis. Rather, the talk just explores some simple consequences of its central tenet: data is a set of queries with their answers.

Short bio: Arno Siebes' group focusses on the theory and algorithmics of the extraction of information from data. In particular, the research concentrates on the algorithmic questions in designing information systems that have to deal with large, and ever more quickly growing, amounts of data. Such data may be stored in many varieties, from neatly organised databases to unordered documents on the web. The fundamental principles of fitting search methods, data mining and knowledge discovery, and the design of algorithmic technology for them, form some of the major challenges in the field. The research context consists of the many data-rich environments in which information systems for domain research and support have to be devised. Important applications that drive his fundamental research aim at solving problems encountered in, for example, the biomedical domain.

Paola Sebastiani, Boston University, USA

Title: Intelligent data analysis of human genetic data

Short bio: Paola Sebastiani, Ph.D., joined the Department of Biostatistics in 2003 as an Associate Professor, after holding faculty positions in Italy, England and the United States. She is the author of over 70 peer-reviewed publications in theoretical and methodological statistics, artificial intelligence and computational biology. She is a member of the editorial board of the Machine Learning journal and Evaluation of Intelligent Systems. She is also a regular reviewer for major journals in statistics and computer science, and serves on the program committee of several international conferences at the interface between statistics and artificial intelligence. Paola's research interests focus primarily on the development of Bayesian methods and their application to a wide spectrum of problems, ranging from cognitive robotics to bioinformatics, from biosurveillance to knowledge discovery. She is particularly interested in the development of automated analytical methods, and many of her methodological contributions have been implemented in computer programs, such as Bayesware Discoverer, the first publicly available program for the automated discovery of Bayesian networks from incomplete databases, and Caged, a program for Bayesian model-based clustering of gene expression profiles measured in temporal experiments. Paola is the founder of Bayesware LLC, a software company developing and commercializing knowledge discovery programs based on Bayesian methods.

Gavin Cawley, University of East Anglia, United Kingdom

Title: Over-fitting in Model Selection and Its Avoidance

Short bio: Dr Cawley obtained his PhD in electronic systems engineering from the University of Essex in 1996, and is currently a senior lecturer in the School of Computing Sciences at the University of East Anglia. His research interests lie in theoretical and algorithmic issues with a direct impact on the practical application of machine learning techniques, including topics such as feature selection, model selection, performance estimation, model comparison, covariate shift, dealing with imbalanced or "non-standard" data and predictive uncertainty. Application areas focus on computational biology and environmental sciences. He has won several machine learning and data mining challenges associated with the IEEE World Congress on Computational Intelligence. Participation in these competitions has led to the realisation that over-fitting in model selection is a significant issue in the practical application of machine learning methods, and that the development of techniques to avoid this form of over-fitting is likely to be a fruitful area of research.