Interdisciplinary Cluster Workshop "Challenges in statistical inference"

Exzellenzzentrum / IGSSE, Seminar room, ground floor (5530.EG.003)

Exzellenzzentrum / IGSSE (International Graduate School of Science and Engineering), Boltzmannstrasse 17, 85748 Garching
Andreas Müller (Excellence Cluster Universe)
The Excellence Cluster Universe is organising an interdisciplinary workshop, "Challenges in statistical inference", which will take place from 7 to 9 November 2016.

Venue: International Graduate School of Science and Engineering (IGSSE), Boltzmannstr. 17, on the Garching campus close to the Mensa.

This workshop aims to connect experts working on challenging data-analysis problems across disciplines ranging from the natural sciences to computer science and engineering. A broad spectrum of methodologies for the assessment and interpretation of data will be presented, discussed, and advanced.

A central topic of the workshop is the statistical inference of complex models. Differences and synergies between various data-analysis approaches will be highlighted in order to establish interdisciplinary links and knowledge transfer.

The key questions addressed during the three-day workshop are:
1. What are the current advanced methods and what are their strengths and limitations?  
2. What are the upcoming challenges with the interpretation of complex data?  
3. How can these challenges be addressed?

The workshop covers the following topics:
–Machine learning,
–Data interpretation for complex models,  
–Model comparison and testing,  
–Information theory,  
–Data-constrained simulation,  
–Optimization and optimal control 

If you are interested in attending, please register.
The submission of titles and abstracts has closed.

There will be no fee.  

Scientific Organising Committee:
Frederik Beaujean (LMU)
Torsten Ensslin (MPA)
Fabrizia Guglielmetti (MPA)
Stefan Hilbert (LMU)
Jens Jasche (TUM)
Andreas Müller (TUM); Chair

This event is organised and funded by the Excellence Cluster Universe.

Update 26th August: First titles+abstracts online (see timetable)
Update 18th October: Registration deadline changed to 31st October.
Update 27th October: Call for abstracts closed.
Update 27th October: Final program is online!
Update 4th November: Map to venue added
Update 6th November: Buchner talk moved from Monday to Wednesday
Update 9th November: First talk files (pdf) online
How to find the venue (map)
Program with abstracts (Update 4 Nov 2016)
Program without abstracts (Update 4 Nov 2016)
  • Alex Saro
  • Andreas Hoenle
  • Andreas Zöller
  • Anjishnu Bandyopadhyay
  • Christian Graf
  • Daniel Pumpe
  • David Straub
  • Eva Krägeloh
  • Fabian Knust
  • Fabio Baruffa
  • Felix Bott
  • Florian Kaspar
  • Frederik Beaujean
  • Guenter Duckeck
  • Ignacio Izaguirre
  • Inh Jee
  • Isabell Franck
  • Javad Komijani
  • Johannes Buchner
  • Jorge S. Diaz
  • Jovan Mitrevski
  • Julia Sawatzki
  • Jörg Dietrich
  • Lukas Bruder
  • Malin Renneby
  • Maria Cordero
  • Martin Losekamm
  • Maximilian Koschade
  • Maximilian Totzauer
  • Mohammad Mirkazemi
  • Natalia Porqueres
  • Paola Andreani
  • Peter Krizan
  • Philipp Bauer
  • Philipp Gadow
  • Reimar Leike
  • Rui Zhang
  • Sebastian Grandis
  • Tamas Norbert Varga
  • Thomas Kuhr
  • Thomas Pöschl
  • Varvara Batozskaya
  • Xun Shi
  • Yu Wang
    • 09:00 12:00
      Model comparison and testing
    • 09:00 09:45
      Overview of Model Comparison and Testing Approaches 45m
      Model testing and model comparison are performed in a number of ways in the sciences, and there is no consensus concerning best practice. The conceptual basis for different approaches will be presented. Model testing in frequentist and Bayesian style analysis will be discussed, and model comparison and model selection using Bayes factors, p-values and frequentist tests will be reviewed and commented on.
      Speaker: Allen Caldwell (MPP)
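As a toy illustration of the Bayes-factor approach mentioned above (not taken from the talk itself), the following sketch compares a "fair coin" model against a model with an unknown bias for binomial count data; all numbers are invented:

```python
import math

def log_evidence_fair(heads, tails):
    """Log evidence of the fair-coin model: L = 0.5^(heads + tails)."""
    return (heads + tails) * math.log(0.5)

def log_evidence_biased(heads, tails):
    """Log evidence with a uniform prior on the unknown bias p:
    Z = integral of p^h (1-p)^t dp = Beta(h+1, t+1)."""
    return (math.lgamma(heads + 1) + math.lgamma(tails + 1)
            - math.lgamma(heads + tails + 2))

# Log Bayes factor "biased vs fair" for 75 heads in 100 tosses.
log_bf = log_evidence_biased(75, 25) - log_evidence_fair(75, 25)
print(round(log_bf, 1))  # strongly positive: the data favour a biased coin
```

For balanced data (50 heads in 100 tosses) the same Bayes factor turns negative, favouring the simpler fair-coin model: an automatic Occam penalty that a p-value does not provide.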
    • 09:45 10:15
      Quantifying Tensions between Independent Data Sets 30m
      Modern cosmology is blessed by a wealth of high-precision data sets, putting independent constraints on the underlying cosmological model. In this context, an important test for every model is the consistency and agreement of the constraints derived from the different independent measurements. Given the heterogeneity of cosmological data sets, this comparison is best performed in the space of model parameters. We present here a recently developed measure of data set consistency, the ‘Surprise’, derived from information theory. After comparing it to other proposed measures of data set agreement, we show how the Surprise can be estimated from samples of the prior and posterior distributions. Furthermore, we present different applications of the Surprise in a cosmological context.
      Speaker: Sebastian Grandis (LMU)
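The Surprise builds on the relative entropy between the constraints from two data sets. As a simplified sketch (not the talk's actual implementation), the relative entropy between two one-dimensional Gaussian parameter constraints can be computed in closed form; the numbers below are invented:

```python
import math

def kl_gaussian(mu1, s1, mu2, s2):
    """Relative entropy D(P1 || P2) between two 1D Gaussians, in nats."""
    return math.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2 * s2 ** 2) - 0.5

# Two hypothetical constraints on the same parameter from independent surveys.
d = kl_gaussian(0.30, 0.02, 0.32, 0.02)
print(round(d, 3))  # 0.5 nats: the two means differ by one sigma
```

The actual Surprise statistic additionally subtracts the relative entropy expected under the model, so that only the unexpected part of the information gain signals tension.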
    • 10:15 11:00
      Coffee 45m
    • 11:00 11:30
      Dynamic System Classifier 30m
      Stochastic differential equations describe many physical, biological and sociological systems well, despite the simplifications often made in their derivation. Here the usage of simple stochastic differential equations to characterize and classify complex dynamical systems is proposed within a Bayesian framework. To this end, we develop a dynamic system classifier (DSC). The DSC first abstracts training data of a system in terms of time-dependent coefficients of the descriptive stochastic differential equation. Thereby the DSC identifies unique correlation structures within the training data. For definiteness we restrict the presentation of the DSC to oscillation processes with a time-dependent frequency ω(t) and damping factor γ(t). Although real systems might be more complex, this simple oscillator captures many characteristic features. The ω and γ timelines represent the abstract system characterization and permit the construction of efficient signal classifiers. Numerical experiments show that such classifiers perform well even in the low signal-to-noise regime.
      Speaker: Daniel Pumpe (MPA)
    • 11:30 12:00
      Model comparison challenges in ATLAS 30m
      The ATLAS experiment searches for new physics in the enormous amount of data recorded in proton-proton collisions delivered by the Large Hadron Collider (LHC), and measures the properties of the Standard Model of Particle Physics to increasingly higher precision. The challenges in a typical search for new physics are numerous, but one key aspect is the construction of the best possible statistical model to describe the data most precisely, which allows testing for any kind of new physics. This talk will describe the statistical methods and models used by the ATLAS experiment, with a particular emphasis on how to set limits on new physics or how to claim a discovery.
      Speaker: Jeanette Lorenz (LMU)
    • 12:00 14:00
      Lunch 2h
    • 14:00 16:15
      Data interpretation for complex models
    • 14:00 14:45
      Scalable Scientific Data Visualization 45m
      In this talk a number of recent developments in the area of large data visualization at the Chair for Computer Graphics and Visualization at TUM will be discussed. Some of the results of this research will be presented, which have been achieved in collaboration with astrophysicists and meteorologists. The focus of the talk will be on scalability issues with respect to both the increasing amount of data and the increasing complexity of this data. Recent approaches for data compression and feature extraction, as well as approaches for visualizing the uncertainty that is present in ensembles of fields will be shown. In addition, the use of parallel graphics hardware to achieve interactivity will be demonstrated.
      Speaker: Rüdiger Westermann (TUM)
    • 14:45 15:15
      Challenges of atmospheric data assimilation 30m
      In this talk, we present the mechanisms of data assimilation algorithms using examples that range from toy models to atmospheric applications. We focus on the ensemble Kalman filter algorithm to estimate the atmospheric state, as well as on the modifications necessary for our application. We argue that relaxing the underlying assumptions of data assimilation algorithms might be possible by improving the link between the data assimilation and the model. For example, a stronger connection can be established by constraining the analysis by imposing conservation laws and other physical constraints. Besides the inclusion of constraints in order to obtain a more physically based solution that is consistent with both nature and the prediction model, the problems of the representativeness error (a mismatch of scales and processes present in observation and model) and of the model error are discussed.
      Speaker: Tijana Janjic Pfander (LMU Meteorology)
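A minimal sketch of an ensemble Kalman filter analysis step, reduced to a single scalar state variable that is observed directly; this is a drastic simplification of any atmospheric application, and all numbers are invented:

```python
import random
import statistics

def enkf_update(ensemble, obs, obs_err, rng):
    """Stochastic ensemble Kalman filter analysis step for a scalar state
    that is observed directly (observation operator H = identity)."""
    var = statistics.variance(ensemble)      # background (forecast) variance P
    gain = var / (var + obs_err ** 2)        # Kalman gain K = P / (P + R)
    # Each member is nudged toward its own perturbed copy of the observation.
    return [x + gain * (obs + rng.gauss(0, obs_err) - x) for x in ensemble]

rng = random.Random(1)
background = [rng.gauss(10.0, 2.0) for _ in range(500)]   # forecast ensemble
analysis = enkf_update(background, 12.0, 1.0, rng)
print(statistics.fmean(background), statistics.fmean(analysis))
```

The analysis mean is pulled toward the observation and the ensemble spread shrinks, reflecting the information gained; the physically motivated constraints discussed in the talk would be imposed on top of such an update.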
    • 15:15 15:45
      Coffee 30m
    • 15:45 16:15
      Bayesian Component Separation 30m
      Speaker: Jakob Knollmüller (MPA)
    • 09:00 14:30
      Machine learning
    • 09:00 09:45
      Machine learning methods in Astrophysics 45m
      Speaker: Giuseppe Longo (University Federico II, Napoli and Caltech, USA)
    • 09:45 10:15
      Machine learning in Cosmology 30m
      Ben will provide a brief introduction to machine learning, and discuss within which regimes it is a suitable statistical tool. He will highlight recent uses of machine learning in the astrophysics and cosmology literature, and describe some recent projects for which he has found machine learning to be a competitive tool, including star-galaxy separation, optimised target selection, and photometric redshifts.
      Speaker: Ben Hoyle (LMU)
    • 10:15 10:45
      Coffee 30m
    • 10:45 11:15
      ROSAT-2RXS counterparts using Nway – An accurate algorithm to pair sources simultaneously between N catalogs 30m
      The increasing number of surveys available at every wavelength is enabling the construction of Spectral Energy Distributions (SEDs) for any kind of astrophysical object. However, a) different surveys/instruments, in particular at X-ray, UV and MIR wavelengths, have different positional accuracy and resolution, and b) the survey depths do not match each other, so that, depending on redshift and SED, a given source might or might not be detected at a certain wavelength. All this makes the pairing of sources among catalogs non-trivial, especially in crowded fields. In order to overcome this issue, we propose a new algorithm that combines the best of Bayesian and frequentist methods, but that reduces to the common Likelihood Ratio (LR) technique in the simplest of applications. In this talk Mara will introduce the code and show how it has been used for finding the ALLWISE counterparts to the X-ray ROSAT All-sky survey.
      Speaker: Mara Salvato (MPE)
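A much-simplified sketch of the Bayesian ingredient in such cross-matching: the posterior probability that a single candidate counterpart is the true match, weighing a Gaussian positional likelihood against a uniform background of chance alignments. This is not the Nway algorithm itself, which pairs N catalogs simultaneously; the function and all numbers here are illustrative:

```python
import math

def match_posterior(sep, pos_err, prior_match, density):
    """Posterior probability that a candidate at angular separation `sep`
    (arcsec) is the true counterpart rather than a chance alignment.
    pos_err: 1-sigma positional uncertainty (arcsec)
    density: sky density of unrelated contaminating sources (arcsec^-2)"""
    # Likelihood of the offset if both detections are the same source.
    like_match = math.exp(-sep ** 2 / (2 * pos_err ** 2)) / (2 * math.pi * pos_err ** 2)
    # Likelihood under a uniform background of unrelated sources.
    like_chance = density
    num = prior_match * like_match
    return num / (num + (1.0 - prior_match) * like_chance)

# Hypothetical candidate: 1.5" offset, 2" positional error, 50% prior,
# contaminant density 1e-4 per square arcsecond.
p = match_posterior(1.5, 2.0, 0.5, 1e-4)
print(round(p, 3))  # -> 0.997
```

The same posterior drops rapidly with separation, which is what makes the pairing reliable even in moderately crowded fields.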
    • 11:15 11:45
      New Smart Data Services of the Digital Future 30m
      The ongoing digital transformation produces massive amounts of new data at every instant that may be used as the basis for the development of novel Smart Data Services with an increasing importance for our everyday life. In this talk René will review the status of data driven applications and will give an outlook on Smart Data innovations that will be possible in the next few years. He will discuss examples of current data challenges to be solved in the fields of Smart Energy, Smart Mobility, Smart City, Smart Factory as well as consumer data innovation.
      Speaker: René Fassbender (OmegaLambdaTec)
    • 11:45 12:15
      Mixture models in high dimensions 30m
      Maksim will talk about density estimation using mixture models, more specifically Gaussian and categorical mixture models. He will address their performance with increasing numbers of parameters (1000 parameters and more) and with complex non-linear dependencies between the parameters. He will also talk about conditional sampling from the trained mixture model.
      Speaker: Maksim Greiner (TUM)
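As an illustration of mixture-model density estimation (in one dimension, far below the parameter counts discussed in the talk), a two-component Gaussian mixture can be fitted with the EM algorithm in a few lines; the data are synthetic:

```python
import math
import random

def em_gmm_1d(data, n_iter=50):
    """Fit a two-component 1D Gaussian mixture by expectation-maximization."""
    mu = [min(data), max(data)]              # crude but sufficient initialisation
    sigma = [1.0, 1.0]
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each data point.
        resp = []
        for x in data:
            p = [w * math.exp(-(x - m) ** 2 / (2 * s ** 2)) / s
                 for w, m, s in zip(weight, mu, sigma)]
            resp.append(p[0] / (p[0] + p[1]))
        # M-step: update weights, means and widths from the responsibilities.
        n0 = sum(resp)
        n1 = len(data) - n0
        weight = [n0 / len(data), n1 / len(data)]
        mu = [sum(r * x for r, x in zip(resp, data)) / n0,
              sum((1 - r) * x for r, x in zip(resp, data)) / n1]
        sigma = [math.sqrt(sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, data)) / n0),
                 math.sqrt(sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, data)) / n1)]
    return sorted(zip(mu, sigma, weight))

rng = random.Random(0)
data = ([rng.gauss(-3.0, 1.0) for _ in range(300)]
        + [rng.gauss(4.0, 1.0) for _ in range(300)])
components = em_gmm_1d(data)
print([round(m, 1) for m, s, w in components])
```

The same E/M structure carries over to categorical mixtures and, with care, to the high-dimensional and conditional-sampling settings of the talk.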
    • 12:15 14:00
      Lunch 1h 45m
    • 14:00 14:30
      SOMBI: Bayesian identification of parameter relations in unstructured cosmological data 30m
      Generally, cosmological or astronomical observations are generated by a combination of different physical effects. Separating these observations into different subgroups, which permit the study of individual aspects of the corresponding physical theory, is a particularly challenging task in regimes of huge datasets or high dimensionality, where manual clustering of data is not feasible. As a response to this problem we present SOMBI, a Bayesian inference approach to search for data clusters and relations between observed parameters without human intervention. SOMBI aims to automatically identify relations between different observed parameters by first identifying data clusters in high-dimensional datasets via the self-organizing map neural network algorithm. Parameter relations are then revealed by means of a Bayesian inference within the respective identified data clusters.
      Speaker: Philipp Frank (LMU)
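The clustering stage of SOMBI relies on the self-organizing map. A minimal one-dimensional SOM, unrelated to the actual SOMBI code, can be sketched as follows; data and training parameters are invented:

```python
import random

def train_som_1d(data, n_nodes=10, n_iter=2000, seed=0):
    """Train a one-dimensional self-organizing map on scalar data."""
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    # Initialise node weights evenly across the data range.
    nodes = [lo + (hi - lo) * i / (n_nodes - 1) for i in range(n_nodes)]
    for t in range(n_iter):
        x = rng.choice(data)
        rate = 0.5 * (1 - t / n_iter)                        # decaying learning rate
        radius = max(1, round(n_nodes / 2 * (1 - t / n_iter)))  # shrinking neighbourhood
        best = min(range(n_nodes), key=lambda i: abs(nodes[i] - x))
        # Pull the best-matching node and its neighbours toward the sample.
        for i in range(n_nodes):
            if abs(i - best) <= radius:
                nodes[i] += rate * (x - nodes[i])
    return nodes

rng = random.Random(1)
# Two well-separated "clusters" of scalar observations.
data = ([rng.gauss(0.0, 0.5) for _ in range(200)]
        + [rng.gauss(10.0, 0.5) for _ in range(200)])
nodes = train_som_1d(data)
print(sorted(round(n, 1) for n in nodes))
```

After training, the node weights concentrate where the data density is high, so the map's nodes serve as automatically discovered cluster representatives.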
    • 14:30 15:15
      Compressive Computation 45m
      Asymmetry between prior proposals and posterior conclusions lies at the heart of inference. Passage between the two is generally irreversible. The prior supports the posterior, but *not* the other way round. Prior possibilities are eliminated when modulated by a zero likelihood factor, and cannot be recovered because dividing by that zero is impossible. Accordingly, the time-honoured techniques of reversible detailed balance do not apply. Instead, we need a one-way algorithm to do systematic compression. The standard technique of “simulated annealing” introduces likelihood modulation gradually through fractional powers, which fails because any fractional power of zero is still zero. That shows up as failure to compute phase changes. Compression is best accomplished iteratively by elimination of successive layers of low likelihood, which amounts to programming Lebesgue integration. This direct and general technique is “nested sampling”.
      Speaker: John Skilling (MaxEnt Data Consultant Ltd.)
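A minimal, deliberately inefficient sketch of the nested-sampling compression described above, for a one-dimensional toy problem with a uniform prior on [0, 1] and likelihood L(x) = exp(-x), whose evidence is analytically 1 - 1/e ≈ 0.632. Real implementations replace the naive rejection step with smarter constrained sampling:

```python
import math
import random

def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def nested_sampling(loglike, n_live=100, n_iter=1000, seed=0):
    """Estimate log Z = log of the integral of L(x) dx for a uniform prior on [0, 1]."""
    rng = random.Random(seed)
    live = [rng.random() for _ in range(n_live)]
    logz = -math.inf
    for i in range(n_iter):
        worst = min(live, key=loglike)
        # The enclosed prior volume shrinks geometrically, X_i ~ exp(-i / n_live),
        # so the i-th likelihood shell has width X_i - X_{i+1}.
        log_width = -i / n_live + math.log(1.0 - math.exp(-1.0 / n_live))
        logz = logaddexp(logz, loglike(worst) + log_width)
        # Replace the worst point by a new prior draw above the likelihood
        # threshold (naive rejection; real codes use constrained sampling).
        threshold = loglike(worst)
        live.remove(worst)
        while True:
            x = rng.random()
            if loglike(x) > threshold:
                live.append(x)
                break
    # The remaining live-point mass (about e^-10 here) is neglected.
    return logz

logz = nested_sampling(lambda x: -x)     # L(x) = exp(-x), so Z = 1 - 1/e
print(round(math.exp(logz), 2))
```

Note the one-way character: each iteration discards the lowest-likelihood layer for good, which is exactly the irreversible compression the abstract describes.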
    • 14:30 17:45
      Information theory
    • 15:15 15:45
      Information field theory 30m
      Information field theory (IFT) describes probabilistic image reconstruction from incomplete and noisy data. Based on field theoretical concepts IFT provides optimal methods to generate images exploiting all available information. Applications in astrophysics are galactic tomography, gamma- and radio- astronomical imaging, and the analysis of cosmic microwave background data.
      Speaker: Torsten Ensslin (MPA)
    • 15:45 16:15
      Coffee 30m
    • 16:15 16:45
      Quantification of model inadequacy within high-dimensional Bayesian inverse problems 30m
      While calibration can almost always be achieved, it becomes problematic if the underlying model is incorrect, which will lead to wrong predictions and interpretations. Traditional approaches use an additional regression model (e.g. a Gaussian process) added to the model output or within a submodel to account for an underlying model error. This can either violate physical constraints and/or be infeasible in high dimensions. In this work we unfold conservation and constitutive laws to estimate model discrepancies accurately and use Variational Bayes to decrease computational costs. We investigate this problem within a high-dimensional inverse problem from solid mechanics, where an identification of the mechanical properties can lead to noninvasive medical diagnosis.
      Speaker: Isabell Franck (TUM)
    • 16:45 17:15
      Operator Calculus for Information Field Theory 30m
      Signal inference problems with non-Gaussian posteriors are very hard to tackle. By using the concept of Gibbs free energy, these problems can be rephrased as Gaussian problems at the price of computing expectation values of various functions with respect to a Gaussian distribution. We present a new way of translating these expectation values into the language of operators, which allows us to simplify many calculations, especially those arising from log-normal priors, the natural priors for signals that vary over many orders of magnitude.
      Speaker: Reimar Leike (MPA)
    • 17:15 17:45
      NIFTy and D2O: A framework for numerical IFT implementations 30m
      NIFTy, “Numerical Information Field Theory”, is a versatile library designed to enable the development of signal inference algorithms that operate regardless of the underlying spatial grid and its resolution. Its object-oriented framework is written in Python, although it accesses libraries written in Cython, C++, and C for efficiency. NIFTy offers a toolkit that abstracts discretized representations of continuous spaces, fields in these spaces, and operators acting on fields into classes. In order to utilize the power of high-performance computing clusters NIFTy is built on D2O, a Python module for cluster-distributed multi-dimensional numerical arrays. An overview will be given.
      Speaker: Theo Steininger (MPA)
    • 18:30 20:30
      Dinner in Garching (self-paid) 2h
    • 09:00 12:00
      Data-constrained simulation
    • 09:00 09:30
      Large scale Bayesian inference in cosmology 30m
      Presently proposed and designed future cosmological probes and surveys permit us to anticipate an avalanche of cosmological information during the next decades. The increase of valuable observations needs to be accompanied by the development of efficient and accurate information-processing technology in order to analyse and interpret this data. The analysis of the structure and evolution of our inhomogeneous Universe therefore requires solving non-linear statistical inference problems in very high dimensional parameter spaces, involving on the order of 10^7 or more parameters. In this talk Jens will address the problem of high dimensional Bayesian inference from cosmological data sets via the recently proposed BORG algorithm. This method couples an approximate model of structure formation to a Hybrid Monte Carlo algorithm, providing a fully probabilistic, physical model of the non-linearly evolved density field as probed by galaxy surveys. Besides highly accurate and detailed measurements of three dimensional density and velocity fields, this methodology also infers plausible dynamic formation histories for the observed large scale structure.
      Speaker: Jens Jasche (TUM)
    • 09:30 10:00
      Modeling star and planet formation: how to infer the physics from observations? 30m
      Understanding the physics that governs the star and planet formation process is one of the major challenges of modern astrophysics. These are key processes that govern the cycling of diffuse matter into stars and the formation of planetary systems. The key difficulty that we face as observational astrophysicists is that we are not able to run controlled experiments. We do observe the possible initial conditions, the processes at work, and their final outcome, but not as a well defined sequential experiment, as, in most cases, we can only make educated guesses about the past history and future outcome of any object we observe. Leonardo will present two approaches that can be followed to understand the underlying physics of the star and planet formation process: simplified modeling of large samples of objects, to identify the main physical properties, and the comparison of detailed numerical simulations with observed objects. The latter approach offers the possibility of running controlled (numerical) experiments, but the challenge is to compare these with real observations.
      Speaker: Leonardo Testi (ESO)
    • 10:00 10:30
      Physical Modelling of X-Shooter data 30m
      One of the main steps in the analysis and calibration of optical spectroscopy is removing the signatures of the instrument as well as of the atmosphere. In particular, this means generating a one-dimensional spectrum (wavelength, intensity) from a 2D CCD frame. Wolfgang has developed a model that allows a good reconstruction of this 2D frame and is working on the fitting routines. In this talk he will show how combining linear least squares with non-linear least squares is a good approach to combat the large number of parameters in this novel technique. Wolfgang will conclude by showing first results and a discussion of the broader applications of this technique.
      Speaker: Wolfgang Kerzendorf (ESO)
    • 10:30 11:00
      Coffee 30m
    • 11:00 11:30
      Modeling gene expression from genetic data 30m
      The cells of nearly all cell types of your body contain the same copy of your genome. Why do they do such different things? Different cells read different parts of the genome. For each of the 22,000 genes of your genome, the amount of RNA and protein molecules that a cell makes depends on the cell type and on stimulations from the cellular environment. In turn, this regulatory program is encoded in the genome, within and also between the genes. Understanding the genetic regulatory code and how errors in the regulatory program can lead to diseases is the research topic of Julien’s lab. In this talk, he will present statistical and machine learning models which, by integrating large-scale genetic and molecular datasets, help decipher the regulatory code, and he will show how these models allow quantitative predictions of the effects of genetic mutations on gene regulation. He will finish by showing how such integrative analyses can help pinpoint the genetic defects of patients with rare diseases.
      Speaker: Julien Gagneur (TUM)
      Gagneur Slides
    • 11:30 12:00
      Population inference in Astronomy - Demographics from limited samples, luminosity functions, hierarchical models 30m
      Hierarchical models allow inferring the distributions of an underlying population from a sample of detected objects with uncertain measurements. For decades, astronomers have employed what is now popular as hierarchical (Bayesian) inference, including the consistent handling of selection biases. Yet these powerful methods are under-used today when dealing with samples and their uncertainties. I will give an introduction and demonstrate that the method is simple to apply to a wide range of problems, including deriving the intrinsic luminosity distribution of a flux-limited sample, estimating population properties, and distinguishing between physical models.
      Speaker: Johannes Buchner (MPE)
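A toy version of the selection-bias problem mentioned above: a flux-limited sample is biased high, and a likelihood that accounts for the truncation recovers the population mean. This is only an illustrative sketch, not the method from the talk; all numbers are invented:

```python
import math
import random

def truncated_loglike(mu, sigma, data, limit):
    """Log-likelihood of Gaussian(mu, sigma) draws that survived a cut
    x > limit; each datum's density is renormalised by the survival
    probability P(x > limit) = 0.5 * erfc((limit - mu) / (sigma * sqrt(2)))."""
    z = (limit - mu) / (sigma * math.sqrt(2.0))
    log_norm = math.log(0.5 * math.erfc(z))
    return sum(-0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - log_norm
               for x in data)

rng = random.Random(0)
# True population: mean 5, width 2; only objects above the "flux limit" 5 survive.
sample = [x for x in (rng.gauss(5.0, 2.0) for _ in range(4000)) if x > 5.0]
naive = sum(sample) / len(sample)                 # biased high by the selection
best = max((m / 10.0 for m in range(0, 101)),     # crude grid over the mean
           key=lambda m: truncated_loglike(m, 2.0, sample, 5.0))
print(round(naive, 2), round(best, 2))
```

The naive sample mean lands well above the true value of 5, while maximising the truncated likelihood recovers it; a full hierarchical treatment would additionally propagate per-object measurement uncertainties.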
    • 12:00 14:00
      Lunch 2h
    • 14:00 14:45
      Adaptive sparse grids for high-dimensional machine learning 45m
      Speaker: Valeriy Khakhutskyy (TUM)
    • 14:00 17:15
      Optimization and optimal control
    • 14:45 15:15
      Sampling from a Gaussian in high dimensions 30m
      Samples from a Gaussian form the basis of many more sophisticated algorithms. While sampling is simple in a low number of dimensions, it turns out to be very challenging in millions of dimensions, when matrix inversions or factorizations have to be avoided because of their poor scaling. Fred will present a comparison of reflective slice sampling and Hamiltonian Monte Carlo, both relying on the gradient of the target density.
      Speaker: Frederik Beaujean (LMU)
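A sketch of Hamiltonian Monte Carlo applied to a standard Gaussian target, for which the gradient of -log p(x) is simply x, so no matrix operations are needed. This is a toy in 50 dimensions, far from the millions discussed in the talk; step sizes and lengths are invented:

```python
import math
import random

def hmc_step(x, rng, step=0.1, n_leapfrog=20):
    """One HMC step targeting a standard Gaussian, where -log p(x) = |x|^2 / 2
    and its gradient is simply x."""
    p = [rng.gauss(0.0, 1.0) for _ in x]              # fresh momenta
    x_new, p_new = list(x), list(p)
    for _ in range(n_leapfrog):                       # leapfrog integration
        p_new = [pi - 0.5 * step * xi for pi, xi in zip(p_new, x_new)]
        x_new = [xi + step * pi for xi, pi in zip(x_new, p_new)]
        p_new = [pi - 0.5 * step * xi for pi, xi in zip(p_new, x_new)]
    h_old = 0.5 * (sum(v * v for v in x) + sum(v * v for v in p))
    h_new = 0.5 * (sum(v * v for v in x_new) + sum(v * v for v in p_new))
    # Metropolis accept/reject corrects the leapfrog integration error.
    if math.log(max(rng.random(), 1e-300)) < h_old - h_new:
        return x_new
    return x

rng = random.Random(0)
x = [0.0] * 50                                       # a 50-dimensional chain
first_coord = []
for _ in range(2000):
    x = hmc_step(x, rng)
    first_coord.append(x[0])
mean = sum(first_coord) / len(first_coord)
var = sum(v * v for v in first_coord) / len(first_coord)
print(round(mean, 2), round(var, 2))
```

The marginal mean and variance of the first coordinate come out close to 0 and 1, as expected; only gradient evaluations are required, which is what makes gradient-based samplers attractive when factorizations are out of reach.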
    • 15:15 15:45
      Robust Parameter estimation 30m
      Data collections, like astronomical surveys or sets of simulation results, generally contain entries with limited information about their accuracy. In addition, the ubiquitous assumption of Gaussian uncertainties is often not adequate. We review the nature of this problem and present recent methodological developments in the area of robust estimation. In particular, recent developments in the application of L1-regression techniques to multidimensional gridded data sets are discussed.
      Speaker: Udo von Toussaint (IPP)
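As an illustration of why L1 (least-absolute-deviation) estimation is robust, the following sketch fits a straight line by iteratively reweighted least squares; a single gross outlier barely affects the result. This is a generic textbook technique, not the speaker's gridded-data method; the data are invented:

```python
def l1_line_fit(xs, ys, n_iter=100, eps=1e-8):
    """Fit y = a*x + b by minimising the L1 norm of the residuals via
    iteratively reweighted least squares (weights ~ 1 / |residual|)."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        w = [1.0 / max(abs(y - (a * x + b)), eps) for x, y in zip(xs, ys)]
        sw = sum(w)
        sx = sum(wi * x for wi, x in zip(w, xs))
        sy = sum(wi * y for wi, y in zip(w, ys))
        sxx = sum(wi * x * x for wi, x in zip(w, xs))
        sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
        # Solve the 2x2 weighted normal equations for slope and intercept.
        det = sw * sxx - sx * sx
        a = (sw * sxy - sx * sy) / det
        b = (sxx * sy - sx * sxy) / det
    return a, b

# A clean line y = 2x + 1 with one grossly corrupted measurement.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 50.0]   # the last point is an outlier
a, b = l1_line_fit(xs, ys)
print(round(a, 2), round(b, 2))
```

An ordinary least-squares fit to the same data would be dragged far off by the outlier, whereas the L1 fit essentially recovers the clean line.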
    • 15:45 16:15
      Coffee 30m
    • 16:15 16:45
      Sparse Optimal Control in Measure Spaces 30m
      In this talk we consider optimal control problems governed by elliptic and parabolic equations, where the control variable lies in a measure space. Such formulations lead to a sparse structure of the optimal control, which provides, among other things, an elegant way to attack problems of optimal actuator or sensor placement as well as point-source identification problems. We discuss the functional analytic setting of such problems and the regularity of the optimal solutions. Moreover, we present a discretization concept and discuss error estimates for the discretization error.
      Speaker: Boris Vexler (TUM)
    • 16:45 17:15
      Performance enhancement via surrogate minimization 30m
      Finding the Maximum a Posteriori solution is a numerically challenging problem, especially when estimating an expensive objective function defined on a high-dimensional design domain. We propose to use Kriging surrogates to speed up optimization schemes such as steepest descent. Surrogate models are built and incorporated in a sequential optimization strategy. Results are presented with applications to astronomical images, showing that the proposed method can effectively search for the global optimum.
      Speaker: Fabrizia Guglielmetti (MPA)
    • 17:15 17:45
      Discussion session