Topics in Deep Learning
This topics course presents the mathematical, statistical and computational challenges of building stable representations for high-dimensional data such as images, text and audio. We will delve into selected topics in deep learning, discussing recent models from both supervised and unsupervised learning. Special emphasis will be placed on convolutional architectures, invariance learning, unsupervised learning and non-convex optimization.
Detailed Syllabus and Lectures
Lec1: Intro and Logistics

Lec2: Representations for Recognition: stability, variability. Kernel approaches / feature extraction.
 The Elements of Statistical Learning, Chapter 12, Hastie, Tibshirani, Friedman.

Lec3: Groups, Invariants and Filters.
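
The basic construction behind invariant representations, averaging a nonlinear filter response over a group orbit, can be sketched in a few lines. This is an illustrative NumPy toy (the rectified filter response and the cyclic translation group are our choices, not material from the lecture notes):

```python
import numpy as np

rng = np.random.default_rng(0)

def invariant_feature(signal, w):
    # Average a rectified filter response over the cyclic translation group:
    # phi(x) = (1/|G|) * sum_g relu(<w, g.x>). Translating x only permutes
    # the terms of the sum, so phi is exactly translation-invariant.
    n = len(signal)
    responses = [max(float(np.dot(w, np.roll(signal, k))), 0.0) for k in range(n)]
    return sum(responses) / n

x = rng.standard_normal(16)
w = rng.standard_normal(16)                 # one filter of a hypothetical filter bank
phi = invariant_feature(x, w)
phi_shifted = invariant_feature(np.roll(x, 5), w)   # same value: invariance
```

The price of full group averaging is a loss of discriminability, which motivates the localized averaging used by scattering networks in the next lectures.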

Lec4: Scattering Convolutional Networks.
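
As a rough illustration of the first-order scattering coefficients discussed here, the following NumPy sketch filters a signal with a small bandpass filter bank, takes the modulus, and averages. The Gaussian bandpass filters are a stand-in assumption for a proper Morlet wavelet filter bank:

```python
import numpy as np

def scattering_first_order(x, filters):
    """First-order scattering-style coefficients: for each bandpass filter
    psi, compute |x * psi| and average the modulus. The full average makes
    each coefficient exactly invariant to cyclic translations of x."""
    X = np.fft.fft(x)
    coeffs = []
    for psi_hat in filters:                     # filters given in the Fourier domain
        u = np.abs(np.fft.ifft(X * psi_hat))    # modulus of the filtered signal
        coeffs.append(u.mean())                 # low-pass averaging
    return np.array(coeffs)

n = 64
freqs = np.fft.fftfreq(n)
# Toy Gaussian bandpass filters at a few center frequencies (an assumption,
# standing in for a real wavelet filter bank):
filters = [np.exp(-((np.abs(freqs) - f0) ** 2) / (2 * 0.02 ** 2))
           for f0 in (0.1, 0.2, 0.3)]
rng = np.random.default_rng(0)
x = rng.standard_normal(n)
s = scattering_first_order(x, filters)
s_shift = scattering_first_order(np.roll(x, 7), filters)   # identical coefficients
```

Second-order coefficients, covered in lecture, iterate the same wavelet-modulus operation on each `u` before averaging.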

Lec5: Further Scattering: Properties and Extensions.

Lec6: Convolutional Neural Networks: Geometry and First Properties.
 Deep Learning, Y. LeCun, Y. Bengio & G. Hinton, Nature'15.
 Understanding Deep Convolutional Networks, S. Mallat.
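
The elementary linear operation of a CNN layer is easy to write out explicitly. A minimal NumPy sketch of a valid 2-D cross-correlation followed by a ReLU (sizes and weights are illustrative only):

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Valid 2-D cross-correlation: slide the kernel over the image and take
    inner products with each patch, the basic linear map of a CNN layer."""
    H, W = img.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
k = rng.standard_normal((3, 3))
feat = np.maximum(conv2d_valid(img, k), 0.0)   # ReLU nonlinearity
```

Because the same kernel is applied at every position, the feature map is covariant with translations of the input, one of the geometric properties examined in this lecture.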

Lec7: Properties of learnt CNN representations: Covariance and Invariance, redundancy, invertibility.
 Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?, R. Giryes, G. Sapiro, A. Bronstein.
 Intriguing Properties of Neural Networks, C. Szegedy et al.
 Geodesics of Learnt Representations, O. Henaff & E. Simoncelli.
 Inverting Visual Representations with Convolutional Networks, A. Dosovitskiy, T. Brox.
 Visualizing and Understanding Convolutional Networks, M. Zeiler, R. Fergus.

Lec8: Connections with other models (Dict. Learning, Random Forests)
 Proximal Splitting Methods in Signal Processing, Combettes & Pesquet.
 A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, Beck & Teboulle.
 Learning Fast Approximations of Sparse Coding, K. Gregor & Y. LeCun.
 Task-Driven Dictionary Learning, J. Mairal, F. Bach, J. Ponce.
 Exploiting Generative Models in Discriminative Classifiers, T. Jaakkola & D. Haussler.
 Improving the Fisher Kernel for Large-Scale Image Classification, F. Perronnin et al.
 NetVLAD, R. Arandjelovic et al.

Lec9: Other high level tasks: localization, regression, embedding, inverse problems.
 Object Detection with Discriminatively Trained Deformable Parts Models, Felzenszwalb, Girshick, McAllester and Ramanan, PAMI'10.
 Deformable Parts Models are Convolutional Neural Networks, Girshick, Iandola, Darrell and Malik, CVPR'15.
 Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, Girshick, Donahue, Darrell and Malik, PAMI'14.
 Graphical Models, Message-Passing Algorithms and Convex Optimization, M. Wainwright.
 Conditional Random Fields as Recurrent Neural Networks, Zheng et al., ICCV'15.
 Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation, Tompson, Jain, LeCun and Bregler, NIPS'14.

Lec10: Extensions to non-Euclidean domains. Representations of stationary processes. Properties.
 Dimensionality Reduction by Learning an Invariant Mapping, Hadsell, Chopra, LeCun,'06.
 Deep Metric Learning via Lifted Structured Feature Embedding, Oh Song, Xiang, Jegelka, Savarese,'15.
 Spectral Networks and Locally Connected Networks on Graphs, Bruna, Szlam, Zaremba, LeCun,'14.
 Spatial Transformer Networks, Jaderberg, Simonyan, Zisserman, Kavukcuoglu,'15.
 Intermittent Process Analysis with Scattering Moments, Bruna, Mallat, Bacry, Muzy,'14.
Lec11: Guest Lecture (W. Zaremba, OpenAI): Discrete Neural Turing Machines.

Lec12: Representations of Stationary Processes (contd). Sequential Data: Recurrent Neural Networks.
 Intermittent Process Analysis with Scattering Moments, J. Bruna, Mallat, Bacry and Muzy, Annals of Statistics,'13.
 A Mathematical Motivation for Complex-Valued Convolutional Networks, Tygert et al., Neural Computation'16.
 Texture Synthesis Using Convolutional Neural Networks, Gatys, Ecker, Bethge, NIPS'15.
 A Neural Algorithm of Artistic Style, Gatys, Ecker, Bethge,'15.
 Time Series Analysis and Its Applications, Shumway, Stoffer, Chapter 6.
 Deep Learning, Goodfellow, Bengio, Courville,'16. Chapter 10.

Lec13: Recurrent Neural Networks (contd). Long Short-Term Memory. Applications.
 Deep Learning, Goodfellow, Bengio, Courville,'16. Chapter 10.
 Generating Sequences with Recurrent Neural Networks, A. Graves.
 The Unreasonable Effectiveness of Recurrent Neural Networks, A. Karpathy.
 The Unreasonable Effectiveness of Character-level Language Models, Y. Goldberg.
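
The core recurrence behind these lectures fits in a few lines. A minimal NumPy sketch of a vanilla RNN forward pass (dimensions, initialization scale, and the tanh nonlinearity are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 8
Wxh = 0.1 * rng.standard_normal((d_h, d_in))   # input-to-hidden weights
Whh = 0.1 * rng.standard_normal((d_h, d_h))    # hidden-to-hidden weights
b = np.zeros(d_h)

def rnn_forward(xs):
    """Run a vanilla RNN over a sequence, returning all hidden states.
    The same weights are shared across every time step."""
    h = np.zeros(d_h)
    hs = []
    for x in xs:
        h = np.tanh(Wxh @ x + Whh @ h + b)
        hs.append(h)
    return np.stack(hs)

xs = rng.standard_normal((5, d_in))            # a length-5 input sequence
hs = rnn_forward(xs)                           # shape (5, d_h)
```

LSTMs replace the single tanh update with gated additive updates to a cell state, which mitigates the vanishing-gradient problem of this plain recurrence.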

Lec14: Unsupervised Learning: curse of dimensionality, density estimation. Graphical models, latent variable models.
 Describing Multimedia Content Using Attention-based Encoder-Decoder Networks, K. Cho, A. Courville, Y. Bengio.
 Graphical Models, Exponential Families and Variational Inference, M. Wainwright, M. Jordan.

Lec15: Autoencoders. Variational Inference. Variational Autoencoders.
 Graphical Models, Exponential Families and Variational Inference, Chapter 3, M. Wainwright, M. Jordan.
 Variational Inference with Stochastic Search, J. Paisley, D. Blei, M. Jordan.
 Stochastic Variational Inference, M. Hoffman, D. Blei, Wang, Paisley.
 Auto-Encoding Variational Bayes, Kingma & Welling.
 Stochastic Backpropagation and Variational Inference in Deep Latent Gaussian Models, D. Rezende, S. Mohamed, D. Wierstra.
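
The variational autoencoder objective combines a reconstruction term with a closed-form KL penalty, estimated via the reparameterization trick. A minimal NumPy sketch under simplifying assumptions (linear decoder, Gaussian likelihood, one Monte Carlo sample; all weights here are toy placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

def elbo(x, mu, log_var, decode):
    """One-sample Monte Carlo estimate of the ELBO with a Gaussian encoder
    q(z|x) = N(mu, diag(exp(log_var))) and a standard-normal prior p(z)."""
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps       # reparameterization trick
    x_hat = decode(z)
    recon = -0.5 * np.sum((x - x_hat) ** 2)    # Gaussian log-likelihood, up to a constant
    # KL(q(z|x) || p(z)) in closed form for diagonal Gaussians:
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return recon - kl

W = 0.1 * rng.standard_normal((10, 2))         # toy linear "decoder" weights
x = rng.standard_normal(10)
val = elbo(x, mu=np.zeros(2), log_var=np.zeros(2), decode=lambda z: W @ z)
```

In a real VAE both `mu`, `log_var` (the encoder) and `decode` are neural networks trained jointly by ascending this estimate with stochastic gradients.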

Lec16: Variational Autoencoders (contd). Normalizing Flows. Generative Adversarial Networks.
 Semi-supervised Learning with Deep Generative Models, Kingma, Rezende, Mohamed, Welling.
 Importance Weighted Autoencoders, Burda, Grosse, Salakhutdinov.
 Variational Inference with Normalizing Flows, Rezende, Mohamed.
 Unsupervised Learning using Nonequilibrium Thermodynamics, Sohl-Dickstein et al.
 Generative Adversarial Networks, Goodfellow et al.

Lec17: Generative Adversarial Networks (contd).
 Generative Adversarial Networks, Goodfellow et al.
 Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks, Denton, Chintala, Szlam, Fergus.
 Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, Radford, Metz, Chintala.
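
The two losses of the GAN minimax game (Goodfellow et al.) can be written down directly from discriminator logits. A minimal NumPy sketch, using the non-saturating generator loss; the random logits stand in for real discriminator outputs:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def gan_losses(d_real, d_fake):
    """GAN objectives: the discriminator maximizes
    E[log D(x)] + E[log(1 - D(G(z)))]; the generator, in the common
    non-saturating form, maximizes E[log D(G(z))].
    d_real / d_fake are discriminator logits on real and generated samples."""
    d_loss = -np.mean(np.log(sigmoid(d_real)) + np.log(1.0 - sigmoid(d_fake)))
    g_loss = -np.mean(np.log(sigmoid(d_fake)))
    return d_loss, g_loss

rng = np.random.default_rng(0)
d_loss, g_loss = gan_losses(rng.standard_normal(64), rng.standard_normal(64))
```

Training alternates gradient steps on these two losses; at the equilibrium where D outputs 1/2 everywhere, the discriminator loss equals 2 log 2.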

Lec18: Maximum Entropy Distributions. Self-supervised models (analogies, video prediction, text, word2vec).
 Graphical Models, Exponential Families and Variational Inference, Chapter 3, M. Wainwright, M. Jordan.
 An Introduction to MCMC for Machine Learning, Andrieu, de Freitas, Doucet, Jordan.
 Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, Geman & Geman.
 Distributed Representations of Words and Phrases and their Compositionality, Mikolov et al.
 word2vec Explained: Deriving Mikolov et al.'s Negative-Sampling Embedding Method, Goldberg & Levy.
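
The skip-gram negative-sampling objective from the Mikolov et al. and Goldberg & Levy readings reduces to a few dot products. A minimal NumPy sketch with random toy vectors (real training samples negatives from a unigram distribution and updates the embeddings by gradient descent, omitted here):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def neg_sampling_loss(v_center, u_context, u_negatives):
    """Skip-gram with negative sampling: maximize
    log sigma(u_o . v_c) + sum_k log sigma(-u_k . v_c),
    i.e. score the true context word high and k sampled words low.
    Returned negated, so lower is better."""
    pos = np.log(sigmoid(u_context @ v_center))
    neg = np.sum(np.log(sigmoid(-(u_negatives @ v_center))))
    return -(pos + neg)

rng = np.random.default_rng(0)
dim = 8
loss = neg_sampling_loss(rng.standard_normal(dim),        # center word vector
                         rng.standard_normal(dim),        # true context vector
                         rng.standard_normal((5, dim)))   # 5 negative samples
```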

Lec19: Self-supervised models (contd). Non-convex Optimization. Stochastic Optimization.
 Pixel Recurrent Neural Networks, A. van den Oord, N. Kalchbrenner, K. Kavukcuoglu.
 The Tradeoffs of Large Scale Learning, Bottou, Bousquet.
 Introduction to Statistical Learning Theory, Bousquet, Boucheron, Lugosi.
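
The workhorse of the optimization lectures, stochastic gradient descent, can be demonstrated on a toy problem in a few lines. A minimal NumPy sketch on noisy least squares (problem size, step-size schedule, and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = A @ w_true + 0.01 * rng.standard_normal(n)    # noisy linear observations

w = np.zeros(d)
for t in range(2000):
    i = rng.integers(n)                           # draw one example at random
    grad = (A[i] @ w - y[i]) * A[i]               # gradient of 0.5 * (a_i . w - y_i)^2
    w -= 0.05 / (1.0 + 0.01 * t) * grad           # decaying step size
```

Each step costs O(d) instead of O(nd) for the full gradient, the computation/statistics trade-off analyzed by Bottou & Bousquet.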
Lec20: Guest Lecture (S. Chintala, Facebook AI Research), "The Adversarial Network Nonsense".

Lec21: Accelerated Gradient Descent, Regularization, Dropout.
 Convex Optimization: Algorithms and Complexity, S. Bubeck.
 Optimization, Simons Big Data Boot Camp, B. Recht.
 The Zen of Gradient Descent, M. Hardt.
 Train Faster, Generalize Better: Stability of Stochastic Gradient Descent, M. Hardt, B. Recht, Y. Singer.
 Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Srivastava, Hinton et al.
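
Dropout as described by Srivastava, Hinton et al. is a two-line operation. A minimal NumPy sketch of the common "inverted" variant (the rescaling at train time is what lets the network run unchanged at test time):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p_drop=0.5, train=True):
    """Inverted dropout: at training time, zero each unit with probability
    p_drop and rescale survivors by 1/(1 - p_drop) so the expected
    activation is unchanged; at test time, apply the identity."""
    if not train:
        return h
    mask = rng.random(h.shape) >= p_drop
    return h * mask / (1.0 - p_drop)

h = np.ones(10000)
h_train = dropout(h)       # roughly half zeros, survivors scaled to 2.0
```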

Lec22: Dropout (contd). Batch Normalization, Tensor Decompositions.
 Dropout Training as Adaptive Regularization, Wager, Wang, Liang.
 Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Ioffe, Szegedy.
 Global Optimality in Tensor Factorization, Deep Learning and Beyond, Haeffele, Vidal.
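
The batch-normalization forward pass from the Ioffe & Szegedy reading is a per-feature standardization followed by a learned affine map. A minimal NumPy sketch of the training-time computation (inference additionally uses running averages of the batch statistics, omitted here):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization forward pass: standardize each feature over the
    mini-batch, then apply a learned per-feature scale and shift."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = 3.0 * rng.standard_normal((32, 4)) + 5.0   # a badly scaled mini-batch
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```
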
Lec23: Guest Lecture (Yann Dauphin, Facebook AI Research), "Optimizing Deep Nets".

Lec24: Tensor Decompositions (contd), Spin Glasses.
 On the Expressive Power of Deep Learning: A Tensor Analysis, Cohen, Sharir, Shashua.
 Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods, Janzamin, Sedghi, Anandkumar.
 The Loss Surfaces of Multilayer Networks, Choromanska, Henaff, Mathieu, Ben Arous, LeCun.