IG4DS

International Conference on Information Geometry for Data Science

September 19 - 23, 2022, Hamburg, Germany

Traditionally, information geometry has been concerned with the identification of natural geometric structures of statistical models. It has been demonstrated that their use has a great impact on the quality of statistical methods and learning algorithms. One instance of this is given by the natural gradient method, which improves the learning simply by utilising the natural geometry induced by the Fisher-Rao metric. The general geometric perspective of information geometry has already had a great influence on machine learning and is expected to further influence the general field of data science.
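
For readers unfamiliar with the natural gradient method mentioned above, its update rule can be sketched as follows. The notation is an assumption introduced here for illustration and does not come from the announcement: θ denotes the model parameters, L the loss, η a step size, p_θ the statistical model, and G(θ) the Fisher information matrix, which represents the Fisher-Rao metric in the coordinates θ.

```latex
\theta_{t+1} = \theta_t - \eta \, G(\theta_t)^{-1} \nabla_\theta L(\theta_t),
\qquad
G(\theta) = \mathbb{E}_{x \sim p_\theta}\!\left[ \nabla_\theta \log p_\theta(x) \, \nabla_\theta \log p_\theta(x)^{\top} \right]
```

Compared with ordinary gradient descent, the only change is the preconditioning by the inverse of G(θ), which makes the direction of the update independent of the chosen parametrisation, assuming G(θ) is invertible.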

This conference will bring together scientists from various fields in order to explore the potential of information geometry for the foundations of data science. In addition to invited keynote presentations of leading experts, it will accommodate contributed oral and poster presentations. The submissions of corresponding extended abstracts (two pages at least and five pages at most) will be reviewed by the scientific committee with regard to their quality and their potential relevance for data science.

The general worldwide situation creates serious challenges for international travel, making long-term planning very difficult. This is why we decided, unfortunately, to run the conference fully virtually. On the other hand, this offers some flexibility in our planning. First of all, we can now extend the deadline for submissions of contributions. The new deadline is July 24th, 2022, at 23:59 (CEST). Furthermore, the conference fee can be further reduced to 60 Euros. All registrations already made will be billed at this new amount. The authors will receive a notification of the review outcome by August 12th, 2022.

The lectures of the conference will be recorded and made accessible to registered participants, assuming the consent of the speakers.

In conjunction with the conference, there will be several special issues of the Springer journal Information Geometry (INGE, https://www.springer.com/journal/41884). We particularly encourage submissions of articles to be considered for publication in these special issues if the authors intend to participate in the conference. When submitting an article, the authors should kindly indicate whether they would like their work also to be considered for a presentation at IG4DS and therefore reviewed by its scientific committee. The calls for the following special issues are now closed. Please contact the editor if you are considering submitting an article to one of these special issues:

INGE Special Issue on “Information Geometry for Data Science” (https://www.dsf.tuhh.de/index.php/information-geometry/),
Editor:
Nihat Ay (Hamburg University of Technology)

INGE Special Issue on “Information Geometry for Algorithms: Learning, Signal Processing and Applications”,
Editors:
Noboru Murata (Waseda University, Japan)
Klaus-Robert Müller (Technical University of Berlin, Germany)
Aapo Hyvärinen (University of Helsinki, Finland)
Shotaro Akaho (AIST, Japan)

 

 

Confirmed invited speakers

  • Kenji Fukumizu (The Institute of Statistical Mathematics, Tokyo, Japan)
    Biography

    Kenji Fukumizu is a Professor with The Institute of Statistical Mathematics, where he also serves as Director of the Research Center for Statistical Machine Learning. He received his B.S. and Ph.D. degrees from Kyoto University in 1989 and 1996, respectively. He joined the Research and Development Center, Ricoh Co., Ltd in 1989, worked as a Researcher at the Institute of Physical and Chemical Research (RIKEN) from 1998, and became an Associate Professor with The Institute of Statistical Mathematics in 2000. He was a visiting scholar at the Department of Statistics, UC Berkeley in 2002-2003, and a Humboldt Fellow at Max Planck Institute for Biological Cybernetics in 2006-2007. He has been serving as Technical Advisor at Preferred Networks, Inc. since 2018. He has served as an organizing committee member for many conferences, including Program Co-Chair of AISTATS 2021, and Area Chair on the Program Committees of NeurIPS and ICML. He also served as Chief Editor of the Annals of the Institute of Statistical Mathematics from 2011 to 2017 and currently serves as an Action Editor for the Journal of Machine Learning Research.
  • Hideitsu Hino (The Institute of Statistical Mathematics, Tokyo, Japan)
    Biography

    Hideitsu Hino received his Bachelor’s degree in engineering in 2003, and his Master’s degree in Applied Mathematics and Physics in 2005, from Kyoto University.
    He earned his Doctor’s degree in engineering in 2010 from Waseda University.
    He is a Professor at The Institute of Statistical Mathematics. His research interests include the analysis of learning algorithms from the viewpoint of geometry.
  • Dominik Janzing (Amazon Research, Tübingen, Germany)
    Biography

    Dominik Janzing received his Diploma in Physics in 1995 and his PhD in Mathematics in 1998 in Tübingen. In 2006 he received his Habilitation (teaching permission) in Computer Science at the Karlsruhe Institute of Technology (KIT). From 1995 to 2006, he worked in quantum information theory, quantum thermodynamics, and quantum complexity theory at KIT, where he simultaneously started research on Causal Inference in 2003. In 2007 he joined the Max Planck Institute for Biological Cybernetics in Tübingen, where he co-founded the team “Causal Inference” together with Bernhard Schölkopf, later at the Max Planck Institute for Intelligent Systems. In 2018 he joined Amazon Research Tübingen, where he leads foundational and applied research on Causal Inference.
  • Emtiyaz Khan (RIKEN Center for Advanced Intelligence Project, Japan)
    Biography

    Emtiyaz Khan (also known as Emti) is a team leader at the RIKEN Center for Advanced Intelligence Project (AIP) in Tokyo, where he leads the Approximate Bayesian Inference Team. Previously, he was a postdoc and then a scientist at Ecole Polytechnique Fédérale de Lausanne (EPFL), where he also taught two large machine learning courses and received a teaching award. He finished his PhD in machine learning at the University of British Columbia in 2012. The main goal of Emti’s research is to understand the principles of learning from data and use them to develop algorithms that can learn like living beings. For more than a decade, his work has focused on developing Bayesian methods that could lead to such fundamental principles. The Approximate Bayesian Inference Team now continues to use these principles, as well as derive new ones, to solve real-world problems.
  • Wuchen Li (University of South Carolina, USA)
    Biography

    Wuchen Li received his BSc in Mathematics from Shandong University in 2009. He obtained a Ph.D. in Mathematics from the Georgia Institute of Technology in 2016. He was a CAM Assistant Adjunct Professor in the Department of Mathematics at the University of California, Los Angeles, from 2016 to 2020. He is now an assistant professor at the University of South Carolina. His research interests include optimal transport, information geometry, and mean field games with applications in data science, scientific computing and elsewhere.
  • James Martens (DeepMind, London, UK)
    Biography

    James Martens is a Staff Research Scientist at DeepMind working on theoretically-motivated neural network training methods (including approximate natural gradient methods), initialization schemes, regularizers, and architecture design principles. He obtained his undergraduate degree in Pure Math and Computer Science from U of Waterloo, and his Masters and PhD in Machine Learning from U of Toronto, supervised by Geoff Hinton and Rich Zemel.
  • Guido Montúfar (MPI for Mathematics in the Sciences, Germany & UCLA, USA)
    Biography

    Guido Montúfar is an Associate Professor of Mathematics and Statistics at UCLA and Head of the Math Machine Learning Group at the Max Planck Institute for Mathematics in the Sciences. His research focuses on deep learning theory and more generally mathematical aspects of machine learning. He studied mathematics and theoretical physics at TU Berlin, obtained the Dr. rer. nat. in 2012 as an IMPRS fellow in Leipzig, and was a postdoc at Penn State and MPI MiS. Guido Montúfar is the recipient of an ERC Starting Grant and an NSF CAREER award, and he is a 2022 Alfred P. Sloan Research Fellow.
  • Klaus-Robert Müller (TU Berlin, Germany)
    Biography

    I am Full Professor for Machine Learning at the Department of Computer Science at Technische Universität Berlin and at the Department of Cognitive Science and Engineering at Korea University, Seoul. For 5 years I was director of the Bernstein Center for Neurotechnology; in 2014 I became Co-director of the Berlin Center for Big Data, and in 2018 I simultaneously became director of the Berlin Machine Learning Center. In 2020/2021, I was on a short sabbatical from academia to lead a team at Google Brain. In 2012, I was elected a member of the German National Academy of Sciences – Leopoldina, in 2017 of the Berlin Brandenburg Academy of Sciences, in 2022 of the German National Academy of Engineering, and also in 2017 an external scientific member of the Max-Planck Society (MPII). In 2014, I received the Berlin Science Prize awarded by the Governing Mayor of Berlin, and in 2017 the Vodafone Innovation Award. Since 2019 I have been an ISI Highly Cited Researcher. My research interest is in the field of machine learning, deep learning and data analysis, covering a wide range of theory and numerous scientific (Physics, Chemistry and Neuroscience) and industrial applications.
  • Masafumi Oizumi (The University of Tokyo, Japan)
  • Gabriel Peyré (CNRS & École normale supérieure, France)
    Biography

    Gabriel Peyré is a CNRS senior researcher and professor at the Ecole Normale Supérieure, Paris.
    He works at the interface between applied mathematics, imaging and machine learning.
    He obtained 2 ERC grants (starting in 2010 and consolidator in 2017), the Blaise Pascal prize from the French academy of sciences in 2017,
    the Magenes Prize from the Italian Mathematical Union in 2019 and the silver medal from CNRS in 2021.
    He was an invited speaker at the European Congress of Mathematics in 2020. He is the deputy director of the Prairie Institute for artificial intelligence,
    the director of the ENS center for data science and the former director of the GdR CNRS MIA.
    He is the head of the ELLIS (European Lab for Learning & Intelligent Systems) Paris Unit (https://ellis-paris.github.io/).
    He is engaged in reproducible research and code education, in particular through the platform www.numerical-tours.com.
  • Minh Ha Quang (RIKEN Center for Advanced Intelligence Project, Japan)
    Biography

    Minh Ha Quang received his PhD in Mathematics from Brown University (USA) under the supervision of Stephen Smale.
    He currently leads the Functional Analytic Learning Unit at the RIKEN Center for Advanced Intelligence Project in Tokyo, Japan.
    Before joining RIKEN, he was a researcher at the Italian Institute of Technology in Genova, Italy.
    His current research interests are machine learning and statistical methodologies using functional analysis, information geometry, and optimal transport.
  • Sho Sonoda (RIKEN Center for Advanced Intelligence Project, Japan)
    Biography

    I am a research scientist at the Deep Learning Theory team (PI: Assoc. Prof. Taiji Suzuki), RIKEN Center for Advanced Intelligence Project (RIKEN AIP), Tokyo, Japan. I received the degree of Doctor of Engineering from Waseda University in 2017 under the supervision of Prof. Noboru Murata, and my major is in machine learning. One of my primary research questions is to understand/control the parameters of neural networks. I have been working on a mathematical theory of neural networks, called ridgelet analysis, since my undergrad research in 2008. Since then, I have investigated neural network theories using Wasserstein geometry, probabilistic numerics, functional analysis, harmonic analysis on homogeneous spaces, and representation theory.
  • Leonard Wong (University of Toronto, Canada)
    Biography

    Leonard Wong is an Assistant Professor in Statistics at the Department of Statistical Sciences, University of Toronto. He obtained his BSc and MPhil degrees at the Chinese University of Hong Kong, and received his PhD in Mathematics from the University of Washington. Before joining U of T in 2018, he spent two years at the University of Southern California as a non-tenure track Assistant Professor in Financial Mathematics. His current research interests include mathematical finance, probability, optimal transport and information geometry, as well as applications in statistics and machine learning. He is currently an associate editor of Information Geometry.

 

 

September 19 – 23, 2022

Monday, Sep 19, 2022

09:15 – 09:30 Nihat Ay: Welcome Address
09:30 – 10:30 Emtiyaz Khan: The Bayesian Learning Rule for Adaptive AI
Humans and animals have a natural ability to autonomously learn and quickly adapt to their surroundings. How can we design AI systems that do the same? In this talk, I will present Bayesian principles to bridge such gaps between humans and AI. I will show that a wide variety of machine-learning algorithms are instances of a single learning rule called the Bayesian learning rule. The rule unravels a dual perspective yielding new adaptive mechanisms for machine-learning based AI systems. My hope is to convince the audience that Bayesian principles are indispensable for an AI that learns as efficiently as we do.

10:30 – 10:45 break
10:45 – 11:15 Csongor Huba Varady, Luigi Malago, Riccardo Volpi, Nihat Ay: Natural Reweighted Wake-Sleep
11:15 – 11:45 Masanari Kimura, Hideitsu Hino: Information Geometry of Dropout Training
11:45 – 12:15 Geoffrey Wolfer, Shun Watanabe: Information Geometry of Reversible Markov Chains
12:15 – 13:30 break
13:30 – 14:30 Guido Montúfar: Memoryless policy optimization in POMDPs
We consider the problem of optimizing the expected long term reward in a Partially Observable Markov Decision Process over the set of memoryless stochastic policies. In this talk I will discuss the properties of the objective function, in particular the existence of policy improvement cones and optimizers in low-dimensional subsets of the search space. Then I will discuss how the problem can be formulated as the optimization of a linear function over a constrained set of state-action frequencies and present descriptions of the parametrization and the constraints, which allows us to estimate the number of critical points and formulate optimization strategies in state-action space. The talk is based on works with Johannes Rauh and Nihat Ay and recent works with Johannes Müller.

14:30 – 14:45 break
14:45 – 15:15 Rob Brekelmans, Frank Nielsen: Rho-Tau Bregman Information and the Geometry of Annealing Paths
15:15 – 15:45 Alessandro Bravetti, Maria L. Daza-Torres, Hugo Flores-Arguedas, Michael Betancourt: Bregman dynamics, contact transformations and convex optimization
15:45 – 16:15 Wu Lin: Structured second-order methods via natural-gradient descent
16:15 – 16:30 break
16:30 – 17:30 James Martens: Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping
Using an extended and formalized version of the Q/C map analysis of Poole et al. (2016), along with Neural Tangent Kernel theory, we identify the main pathologies present in deep networks that prevent them from training fast and generalizing to unseen data, and show how these can be avoided by carefully controlling the “shape” of the network’s initialization-time kernel function. We then develop a method called Deep Kernel Shaping (DKS), which accomplishes this using a combination of precise parameter initialization, activation function transformations, and small architectural tweaks, all of which preserve the model class. In our experiments we show that DKS enables SGD training of residual networks without normalization layers on Imagenet and CIFAR-10 classification tasks at speeds comparable to standard ResNetV2 and Wide-ResNet models, with only a small decrease in generalization performance. And when using K-FAC as the optimizer, we achieve similar results for networks without skip connections. Our results apply for a large variety of activation functions, including those which traditionally perform very badly, such as the logistic sigmoid. In addition to DKS, we contribute a detailed analysis of skip connections, normalization layers, special activation functions like RELU and SELU, and various initialization schemes, explaining their effectiveness as alternative (and ultimately incomplete) ways of “shaping” the network’s initialization-time kernel.

Tuesday, Sep 20, 2022

09:30 – 10:30 Minh Ha Quang: Rényi divergences in RKHS and Gaussian process settings
Rényi divergences, including in particular their special case, the Kullback-Leibler divergence, play an important role in numerous problems in statistics, probability, and machine learning. In this talk, we present their regularized versions in the reproducing kernel Hilbert space (RKHS) and Gaussian process settings. These are formulated using the Alpha Log-Det divergences on the Hilbert manifold of positive definite Hilbert-Schmidt operators on a Hilbert space. We show that these infinite-dimensional divergences can be consistently estimated from finite sample data, with dimension-independent convergence rates. The theoretical formulations will be illustrated with applications in functional data analysis.

10:30 – 10:45 break
10:45 – 11:15 Jakub Bober, Anthea Monod, Emil Saucan, Kevin N. Webster: Rewiring Networks for Graph Neural Network Training Using Discrete Geometry
11:15 – 11:45 Geoffrey Wolfer, Shun Watanabe: Geometric Aspects of Data-Processing of Markov Chains
11:45 – 12:15 Riccardo Volpi, Luigi Malago: Alpha-Embeddings for Natural Language Processing
12:15 – 13:30 break
13:30 – 14:30 Hideitsu Hino: A Geometrical Generalization of Covariate Shift
Many machine learning methods assume that the training data and the test data follow the same distribution, but in the real world, this assumption is very often violated. In particular, the phenomenon that the marginal distribution of the data changes is called the covariate shift, and it is one of the most important research topics. We show that the well-known family of methods for covariate shift adaptation can be unified in the framework of information geometry. Furthermore, we show that parameter search for geometrically generalized methods of covariate shift adaptation can be achieved efficiently by an information criterion for a simple parametric case, or by a Bayesian optimization method in the general case. It is experimentally shown that the proposed generalization almost always achieves better performance than the existing methods it encompasses. This work was done in collaboration with Mr. Masanari Kimura.

14:30 – 14:45 break
14:45 – 15:15 Jesse van Oostrum, Johannes Müller, Nihat Ay: Parametrisation Invariance of the Natural Gradient in Overparametrised Systems
15:15 – 15:45 Henrique K. Miyamoto, Fábio C. C. Meneghetti, Sueli I. R. Costa: The Fisher-Rao Loss for Learning under Label Noise
15:45 – 16:15 Keiji Miura, Ruriko Yoshida: Plücker Coordinates of the best-fit Stiefel Tropical Linear Space to a Mixture of Gaussian Distributions
16:15 – 16:30 break
16:30 – 17:30 Leonard Wong: Logarithmic divergences
Divergences such as Bregman and Kullback-Leibler divergences are fundamental in probability, statistics and machine learning. We begin by explaining how divergences arise naturally from the geometry of optimal transport. Then, we study a family of logarithmic costs – originally motivated by financial applications – which may be regarded as a canonical deformation of the negative dot product in Euclidean quadratic transport. It induces a logarithmic divergence which has remarkable probabilistic and geometric properties. As an application, we introduce a generalization of continuous-time mirror descent.

Wednesday, Sep 21, 2022

09:30 – 10:30 Gabriel Peyré: Scaling Optimal Transport for High dimensional Learning
Optimal transport (OT) has recently gained a lot of interest in machine learning. It is a natural tool to compare probability distributions in a geometrically faithful way. It finds applications in both supervised learning (using geometric loss functions) and unsupervised learning (to perform generative model fitting). OT is however plagued by the curse of dimensionality, since it might require a number of samples which grows exponentially with the dimension. In this talk, I will explain how to leverage entropic regularization methods to define computationally efficient loss functions, approximating OT with a better sample complexity. More information and references can be found on the website of our book “Computational Optimal Transport”: https://optimaltransport.github.io/

10:30 – 10:45 break
10:45 – 11:15 Emil Saucan, Vladislav Barkanass, Jürgen Jost: Coarse geometric kernels for networks embedding
11:15 – 11:45 Emil Saucan, Vladislav Barkanass: Can we see the shape of our data?
11:45 – 12:15 Pablo A. Morales, Jan Korbel, Fernando E. Rosas: Geometric and thermodynamic implications of deformed Legendre transform on curved statistical manifolds
12:15 – 13:30 break
13:30 – 14:30 Dominik Janzing: Causal Maximum Entropy Principle: inferring distributions from causal directions and vice versa
The principle of insufficient reason (PIR) assigns equal probabilities to each alternative of a random experiment whenever there is no reason to prefer one over the other. The maximum entropy principle (MaxEnt) generalizes PIR to the case where statistical information like expectations are given. It is known that both principles result in paradoxical probability updates for joint distributions of cause and effect. This is because constraints on the conditional P(effect∣cause) result in changes of P(cause) that assign higher probability to those values of the cause that offer more options for the effect, suggesting “intentional behavior.” Earlier work therefore suggested sequentially maximizing (conditional) entropy according to the causal order, but without further justification apart from plausibility on toy examples. I justify causal modifications of PIR and MaxEnt by separating constraints into restrictions for the cause and restrictions for the mechanism that generates the effect from the cause. I further sketch why causal PIR also entails “Information Geometric Causal Inference.” I will also briefly discuss problems of generalizing the causal version of MaxEnt to arbitrary causal DAGs, which are related to the non-trivial relation between directed and undirected graphical models. I will also describe our recent work on merging datasets to obtain more causal insights.

[1] D. Janzing: Causal versions of maximum entropy and principle of insufficient reason, Journal of Causal Inference, 2021.
[2] S. H. Garrido-Mejia, E. Kirschbaum, D. Janzing: Obtaining Causal Information by Merging Datasets with MAXENT, AISTATS 2022.
[3] D. Janzing, J. Mooij, K. Zhang, J. Lemeire, J. Zscheischler, P. Daniusis, B. Steudel, B. Schölkopf: Information-geometric approach to inferring causal directions, Artificial Intelligence 2013.

14:30 – 14:45 break
14:45 – 15:15 M. Ashok Kumar, Kumar Vijay Mishra: Information Geometry of Relative Alpha-Entropy
15:15 – 15:45 José Crispín Ruíz-Pantaleón: Areas on the space of smooth probability density functions on S^2
15:45 – 16:15 Max von Renesse: Entropic Regularization and Iterative Scaling for Unbalanced Optimal Transport – A Reprise: The Sinkhole Algorithm
16:15 – 16:30 break
16:30 – 17:30 Wuchen Li: Transport information Bregman divergences
We study Bregman divergences in probability density space embedded with the Wasserstein-2 metric. Several properties and dualities of transport Bregman divergences are provided. In particular, we derive the transport Kullback-Leibler (KL) divergence by a Bregman divergence of negative Boltzmann-Shannon entropy in Wasserstein-2 space. We also derive analytical formulas and generalizations of transport KL divergence for one-dimensional probability densities and Gaussian families. We also discuss some connections between Wasserstein-2 geometry and information geometry.

Thursday, Sep 22, 2022

09:30 – 10:30 Masafumi Oizumi: Unified framework for quantifying causal influences based on information geometry
Assessment of causal influences is a ubiquitous and important subject across diverse research fields. Whereas pairwise causal influences between elements can be easily quantified, quantifying multiple influences among many elements poses two major mathematical difficulties. First, overestimation occurs due to interdependence among influences if each influence is separately quantified in a part-based manner and then simply summed over. Second, it is difficult to isolate causal influences while avoiding noncausal confounding influences. To resolve these difficulties, we propose a theoretical framework based on information geometry for the quantification of multiple causal influences with a holistic approach. In this framework, we quantify causal influences as the divergence between the actual probability distribution of a system and a constrained probability distribution where causal influences among elements are statistically disconnected. This framework provides intuitive geometric interpretations harmonizing various information theoretic measures in a unified manner, including mutual information (predictive information), transfer entropy, stochastic interaction, and integrated information, each of which is characterized by how causal influences are disconnected. Our framework should help to analyze causal relationships in complex systems in a complete and hierarchical manner.

10:30 – 10:45 break
10:45 – 11:15 Carlotta Langer, Nihat Ay: Gradually Increasing the Latent Space in the em-Algorithm
11:15 – 11:45 Masahito Hayashi: Algorithm for rate distortion theory based on em algorithm
11:45 – 12:15 Hisatoshi Tanaka: Efficient Design of Randomised Experiments
12:15 – 13:30 break
13:30 – 14:30 Kenji Fukumizu: Stability in learning of generative adversarial networks
Generative adversarial networks (GANs) learn the probability distribution of data and generate samples from the learned distribution. While GANs show a remarkable ability to generate samples of high quality, it is known that GANs often show unstable behavior during training. In this work, we develop a theoretical framework for understanding the stability of learning GAN models. We discuss the dynamics of probabilities acquired by the generator of GAN, and derive sufficient conditions that guarantee the convergence of the gradient descent learning. We show that existing GAN variants with stabilization techniques satisfy some, but not all, of these conditions. Using tools from convex analysis, optimal transport, and reproducing kernels, we construct a GAN that fulfills these conditions simultaneously.

14:30 – 14:45 break
14:45 – 15:15 Jun Zhang: Partially-Flat Geometry and Natural Gradient Method
15:15 – 15:45 Ionas Erb: Power Transformations of Relative Count Data as a Shrinkage Problem
15:45 – 16:15 Uriel Legaria, Sergio Martinez, Sergio Mota, Alfredo Coba, Argenis Chable, Antonio Neme: Anomaly detection in the probability simplex under different geometries and distances
16:15 – 16:30 break
16:30 – 17:30 Klaus-Robert Müller: Applications of Geometrical Concepts for Learning
I will address the usage of geometrical concepts across a number of application domains. One is to add to the theoretical backbone of explainable AI by studying Diffeomorphic Counterfactuals and Generative Models. The other one, time permitting, will consider the inclusion of problem-inherent (e.g. Lie-group) invariance structure to build less data-hungry models.

Friday, Sep 23, 2022

09:30 – 10:30 Sho Sonoda: The Ridgelet Transforms of Neural Networks on Manifolds and Hilbert Spaces
To investigate how neural network parameters are organized and arranged, it is easier to study the distribution of parameters than to study the parameters in each neuron. The ridgelet transform is a pseudo-inverse operator (or an analysis operator) that maps a given function f to the parameter distribution \gamma so that the network S[\gamma] represents f. For depth-2 fully-connected networks on Euclidean space, the closed-form expression is known, so it can describe how the parameters are organized. However, the closed-form expression has not been known for a variety of today’s neural networks. Recently, our research group has found a way to systematically derive ridgelet transforms for fully-connected layers on manifolds (non-compact symmetric spaces) and for group convolution layers on abstract Hilbert spaces. In this talk, the speaker will explain a natural way to derive those ridgelet transforms.

10:30 – 10:45 break
10:45 – 11:15 Kazu Ghalamkari, Mahito Sugiyama: Non-negative low-rank approximations for multi-dimensional arrays on statistical manifold
11:15 – 11:45 Hiroshi Matsuzoe: Geometry of quasi-statistical manifolds and geometric pre-divergences
11:45 – 12:15 Domenico Felice, Nihat Ay: A canonical divergence from the perspective of data science
12:15 – 13:30 break
13:30 – 14:30 Frank Nielsen, Jun Zhang: Questions and Answers (see tutorials)

Frank Nielsen:
Video: “Introduction to Information Geometry” by Frank Nielsen
Slides: PrintIntroductionInformationGeometry-FrankNielsen.pdf
Email: frank.nielsen.x@gmail.com

Jun Zhang:
Video: Information Geometry Tutorial (2021, BANFF-CMO)
Email: junz@umich.edu

14:30 – 14:45 break
14:45 – 15:45 Nihat Ay: Summary and concluding discussion on Information Geometry for Data Science

Hideitsu Hino
Title: A Geometrical Generalization of Covariate Shift
short Abstract

Many machine learning methods assume that the training data and the test data follow the same distribution, but in the real world, this assumption is very often violated. In particular, the phenomenon that the marginal distribution of the data changes is called the covariate shift, and it is one of the most important research topics. We show that the well-known family of methods for covariate shift adaptation can be unified in the framework of information geometry. Furthermore, we show that parameter search for geometrically generalized methods of covariate shift adaptation can be achieved efficiently by an information criterion for a simple parametric case, or by a Bayesian optimization method in the general case. It is experimentally shown that the proposed generalization almost always achieves better performance than the existing methods it encompasses. This work was done in collaboration with Mr. Masanari Kimura.

Dominik Janzing
Title: Causal Maximum Entropy Principle: inferring distributions from causal directions and vice versa
short Abstract

The principle of insufficient reason (PIR) assigns equal probabilities to each alternative of a random experiment whenever there is no reason to prefer one over the other. The maximum entropy principle (MaxEnt) generalizes PIR to the case where statistical information like expectations are given. It is known that both principles result in paradoxical probability updates for joint distributions of cause and effect. This is because constraints on the conditional P(effect∣cause) result in changes of P(cause) that assign higher probability to those values of the cause that offer more options for the effect, suggesting “intentional behavior.” Earlier work therefore suggested sequentially maximizing (conditional) entropy according to the causal order, but without further justification apart from plausibility on toy examples. I justify causal modifications of PIR and MaxEnt by separating constraints into restrictions for the cause and restrictions for the mechanism that generates the effect from the cause. I further sketch why causal PIR also entails “Information Geometric Causal Inference.” I will also briefly discuss problems of generalizing the causal version of MaxEnt to arbitrary causal DAGs, which are related to the non-trivial relation between directed and undirected graphical models. I will also describe our recent work on merging datasets to obtain more causal insights.

Literature:
[1] D. Janzing: Causal versions of maximum entropy and principle of insufficient reason, Journal of Causal Inference, 2021.
[2] S.-H. Garrido-Mejia, E. Kirschbaum, D. Janzing: Obtaining Causal Information by Merging Datasets with MAXENT, AISTATS 2022.
[3] D. Janzing, J. Mooij, K. Zhang, J. Lemeire, J. Zscheischler, P. Daniusis, B. Steudel, B. Schölkopf: Information-geometric approach to inferring causal directions, Artificial Intelligence 2013.

Emtiyaz Khan
Title: The Bayesian Learning Rule for Adaptive AI
short Abstract

Humans and animals have a natural ability to autonomously learn and quickly adapt to their surroundings. How can we design AI systems that do the same? In this talk, I will present Bayesian principles to bridge such gaps between humans and AI. I will show that a wide variety of machine-learning algorithms are instances of a single learning rule called the Bayesian learning rule. The rule unravels a dual perspective yielding new adaptive mechanisms for machine-learning-based AI systems. My hope is to convince the audience that Bayesian principles are indispensable for an AI that learns as efficiently as we do.

Wuchen Li
Title: Transport information Bregman divergences
short Abstract

We study Bregman divergences in probability density space equipped with the Wasserstein-2 metric. Several properties and dualities of transport Bregman divergences are provided. In particular, we derive the transport Kullback-Leibler (KL) divergence as a Bregman divergence of the negative Boltzmann-Shannon entropy in Wasserstein-2 space. We also derive analytical formulas and generalizations of the transport KL divergence for one-dimensional probability densities and Gaussian families, and we discuss some connections between Wasserstein-2 geometry and information geometry.

James Martens
Title: Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping
short Abstract

Using an extended and formalized version of the Q/C map analysis of Poole et al. (2016), along with Neural Tangent Kernel theory, we identify the main pathologies present in deep networks that prevent them from training fast and generalizing to unseen data, and show how these can be avoided by carefully controlling the “shape” of the network’s initialization-time kernel function. We then develop a method called Deep Kernel Shaping (DKS), which accomplishes this using a combination of precise parameter initialization, activation function transformations, and small architectural tweaks, all of which preserve the model class. In our experiments we show that DKS enables SGD training of residual networks without normalization layers on ImageNet and CIFAR-10 classification tasks at speeds comparable to standard ResNetV2 and Wide-ResNet models, with only a small decrease in generalization performance. And when using K-FAC as the optimizer, we achieve similar results for networks without skip connections. Our results apply for a large variety of activation functions, including those which traditionally perform very badly, such as the logistic sigmoid. In addition to DKS, we contribute a detailed analysis of skip connections, normalization layers, special activation functions like ReLU and SELU, and various initialization schemes, explaining their effectiveness as alternative (and ultimately incomplete) ways of “shaping” the network’s initialization-time kernel.

Guido Montúfar
Title: Memoryless policy optimization in POMDPs
short Abstract

We consider the problem of optimizing the expected long-term reward in a Partially Observable Markov Decision Process over the set of memoryless stochastic policies. In this talk I will discuss the properties of the objective function, in particular the existence of policy improvement cones and optimizers in low-dimensional subsets of the search space. Then I will discuss how the problem can be formulated as the optimization of a linear function over a constrained set of state-action frequencies and present descriptions of the parametrization and the constraints, which allow us to estimate the number of critical points and formulate optimization strategies in state-action space. The talk is based on works with Johannes Rauh and Nihat Ay and recent works with Johannes Müller.

Masafumi Oizumi
Title: Unified framework for quantifying causal influences based on information geometry
short Abstract

Assessment of causal influences is a ubiquitous and important subject across diverse research fields. Whereas pairwise causal influences between elements can be easily quantified, quantifying multiple influences among many elements poses two major mathematical difficulties. First, overestimation occurs due to interdependence among influences if each influence is separately quantified in a part-based manner and then simply summed over. Second, it is difficult to isolate causal influences while avoiding noncausal confounding influences. To resolve these difficulties, we propose a theoretical framework based on information geometry for the quantification of multiple causal influences with a holistic approach. In this framework, we quantify causal influences as the divergence between the actual probability distribution of a system and a constrained probability distribution where causal influences among elements are statistically disconnected. This framework provides intuitive geometric interpretations harmonizing various information theoretic measures in a unified manner, including mutual information (predictive information), transfer entropy, stochastic interaction, and integrated information, each of which is characterized by how causal influences are disconnected. Our framework should help to analyze causal relationships in complex systems in a complete and hierarchical manner.

Gabriel Peyré
Title: Scaling Optimal Transport for High dimensional Learning
short Abstract

Optimal transport (OT) has recently gained a lot of interest in machine learning. It is a natural tool to compare probability distributions in a geometrically faithful way. It finds applications in both supervised learning (using geometric loss functions) and unsupervised learning (to perform generative model fitting). OT is, however, plagued by the curse of dimensionality, since it might require a number of samples which grows exponentially with the dimension. In this talk, I will explain how to leverage entropic regularization methods to define computationally efficient loss functions, approximating OT with a better sample complexity. More information and references can be found on the website of our book “Computational Optimal Transport”.
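For readers unfamiliar with entropic regularization, the following is a minimal sketch of the standard Sinkhorn iteration between two small histograms. The grid, the histograms, the regularization strength `eps`, and the iteration count are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def sinkhorn(a, b, cost, eps=0.5, n_iters=200):
    """Entropy-regularised OT between histograms a and b.

    Returns the transport plan and its transport cost <plan, cost>.
    """
    K = np.exp(-cost / eps)            # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                # rescale to match row marginals
        v = b / (K.T @ u)              # rescale to match column marginals
    plan = u[:, None] * K * v[None, :]
    return plan, float(np.sum(plan * cost))

# Two histograms on a common grid in [0, 1] with squared-distance cost.
x = np.linspace(0.0, 1.0, 5)
cost = (x[:, None] - x[None, :]) ** 2
a = np.full(5, 0.2)
b = np.array([0.1, 0.1, 0.2, 0.3, 0.3])

plan, transport_cost = sinkhorn(a, b, cost)
```

Each iteration only needs matrix-vector products with the kernel `K`, which is what makes the regularized problem cheap to solve. Smaller values of `eps` bring the plan closer to unregularized OT but slow convergence and can underflow in this naive (non-log-domain) form.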

Minh Ha Quang
Title: Rényi divergences in RKHS and Gaussian process settings
short Abstract

Rényi divergences, including in particular their special case the Kullback-Leibler divergence, play an important role in numerous problems in statistics, probability, and machine learning. In this talk, we present their regularized versions in the reproducing kernel Hilbert space (RKHS) and Gaussian process settings. These are formulated using the Alpha Log-Det divergences on the Hilbert manifold of positive definite Hilbert-Schmidt operators on a Hilbert space. We show that these infinite-dimensional divergences can be consistently estimated from finite sample data, with dimension-independent convergence rates. The theoretical formulations will be illustrated with applications in functional data analysis.
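To make the relation between Rényi and Kullback-Leibler divergences concrete, here is a small finite-dimensional sketch for discrete distributions. It is only an illustration of the classical definition, not of the regularized RKHS or Gaussian process versions discussed in the talk; the function names and example distributions are assumptions.

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence between strictly positive discrete p and q."""
    return float(np.sum(p * np.log(p / q)))

def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha > 0 between discrete p and q.

    D_alpha(p || q) = log(sum_i p_i^alpha * q_i^(1 - alpha)) / (alpha - 1),
    with the KL divergence as the limiting case alpha -> 1.
    """
    if abs(alpha - 1.0) < 1e-12:
        return kl_divergence(p, q)
    return float(np.log(np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

# Orders approaching 1 from above: the values approach the KL divergence.
values = [renyi_divergence(p, q, alpha) for alpha in (2.0, 1.5, 1.1, 1.0)]
```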

Leonard Wong
Title: Logarithmic divergences
short Abstract

Divergences such as Bregman and Kullback-Leibler divergences are fundamental in probability, statistics and machine learning. We begin by explaining how divergences arise naturally from the geometry of optimal transport. Then, we study a family of logarithmic costs – originally motivated by financial applications – which may be regarded as a canonical deformation of the negative dot product in Euclidean quadratic transport. It induces a logarithmic divergence which has remarkable probabilistic and geometric properties. As an application, we introduce a generalization of continuous-time mirror descent.
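As background for the Bregman divergences mentioned in this abstract, the sketch below checks numerically that the Bregman divergence generated by negative Shannon entropy equals the Kullback-Leibler divergence on the probability simplex, and that the one generated by half the squared norm is half the squared Euclidean distance. It illustrates only the classical Bregman case, not the logarithmic divergence of the talk; the helper names are assumptions.

```python
import numpy as np

def bregman(phi, grad_phi, p, q):
    """Bregman divergence B_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    return float(phi(p) - phi(q) - np.dot(grad_phi(q), p - q))

def neg_entropy(p):
    """Negative Shannon entropy, a convex function on the positive orthant."""
    return float(np.sum(p * np.log(p)))

def neg_entropy_grad(p):
    return np.log(p) + 1.0

def half_sq_norm(x):
    return 0.5 * float(np.dot(x, x))

def kl_divergence(p, q):
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

# Negative entropy generates KL (for p, q that both sum to one).
b_entropy = bregman(neg_entropy, neg_entropy_grad, p, q)

# Half the squared norm generates half the squared Euclidean distance.
b_euclid = bregman(half_sq_norm, lambda x: x, p, q)
```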

 

 

 

Nihat Ay (Hamburg University of Technology, Germany)
Shinto Eguchi (The Institute of Statistical Mathematics, Japan)
Hiroshi Matsuzoe (Nagoya Institute of Technology, Japan)
Noboru Murata (Waseda University, Japan)
Frank Nielsen (Sony CSL, Japan)
Jun Zhang (University of Michigan, USA)

The conference will be fully virtual!

Hamburg University of Technology

Blohmstraße 15
21079 Hamburg
Germany

 

The conference will be organised by the Institute for Data Science Foundations led by Nihat Ay (nihat.ay@tuhh.de).
The administrative contact is Sandra Krüger (ig4ds@tuhh.de).


 

Important notes:

  • The conference will be fully virtual.
  • The conference fee is 60 EUR.
  • The fee includes all taxes.
  • The payment can be made by bank transfer only.
  • The registration period is from January 31st, 2022 to July 24th, 2022.
  • In case of acceptance, the confirmation/invoice will be sent via email by August 12th, 2022.

By registering for the conference, you accept the Terms and Conditions, the Declaration of Consent, and the Declaration of Consent for Filming and Photography.
Please register here.

 
