Lecture 1: The Curse of Dimensionality
Main References:

Breaking the Curse of Dimensionality with Convex Neural Networks, F. Bach

Understanding Machine Learning: From Theory to Algorithms, S. Shalev-Shwartz, S. Ben-David

Nesterov’s Punctuated Equilibrium, argmin blog post by Frostig & Recht

Failures of Gradient-Based Deep Learning, S. Shalev-Shwartz et al.
Further References:

Equivalence of Distance-Based and RKHS-Based Statistics in Hypothesis Testing, Sejdinovic et al.

Random Gradient-Free Minimization of Convex Functions, Y. Nesterov
Lecture 2: Geometric Stability in Euclidean Domains
Main References:

Group Invariant Scattering, S. Mallat

Invariant Scattering Convolutional Networks, J. Bruna, S. Mallat
Lecture 3: The Scattering Transform and Beyond
Main References:

Group Invariant Scattering, S. Mallat

Scattering Representations for Recognition, J. Bruna, PhD thesis

Rotation, Scaling and Deformation Invariant Scattering for Texture Discrimination, Sifre and Mallat, CVPR’15
Further References:

Exponential Decay of Scattering Coefficients, I. Waldspurger.

Analysis of Time-Frequency Scattering Transforms, Czaja and Li

Energy Propagation in Deep Convolutional Neural Networks, Wiatowski et al.
Lecture 4: Non-Euclidean Geometric Stability and Graph Neural Networks
Main References:

Geometric Deep Learning: Going beyond Euclidean Data, M. Bronstein et al., ’17
Further References:

i-RevNet: Deep Invertible Networks, Jacobsen, Smeulders, Oyallon, ICLR’18

Spherical CNNs, Cohen, Welling et al., ICLR’18

Deep Image Prior, Ulyanov, Vedaldi et al., ’17

Community Detection with Graph Neural Networks, B. and Li, ’18
Lecture 5: Graph Neural Network Applications
Main References:

Geometric Deep Learning: Going beyond Euclidean Data, M. Bronstein et al., ’17

Community Detection with Graph Neural Networks, B. and Li, ’18

Neural Message Passing for Quantum Chemistry, Gilmer et al., ’17
Further References:

Semi-Supervised Classification with Graph Convolutional Networks, Kipf & Welling

Representation Learning on Graphs: Methods and Applications, Hamilton, Ying and Leskovec

Quadratic Assignment with Graph Neural Networks, Nowak et al
Lecture 6: Unsupervised Learning under Geometric Priors
Main References:
Further References:
Lecture 7: Discrete vs Continuous Time Optimization: The Convex Case
Main References:

Large-scale Machine Learning and Convex Optimization, F. Bach, ’17

A Differential Equation for Modeling Nesterov’s Accelerated Gradient Method, Su, Boyd, Candès, ’14
Further References:
Lecture 8: Discrete vs Continuous Time Optimization: Stochastic and Nonconvex case
Main References:
Lecture 9: Discrete vs Continuous Time Optimization: Stochastic and Nonconvex case
Main References:
Lecture 10: Nonconvex Optimization
Main References:

Gradient Descent Can Take Exponential Time to Escape Saddle Points, Du et al., ’17

Escaping from Saddle Points: Online Stochastic Gradient for Tensor Decomposition, Ge et al., ’15
Further References:
Lecture 11: Landscape of Optimization
Main References:

Random Matrices and Complexity of Spin Glasses, Auffinger et al., ’10

Neural Networks with Finite Intrinsic Dimension have no Spurious Valleys, Venturi et al.

Topology and Geometry of Half-Rectified Network Optimization, Freeman et al.
Further References:
Lecture 12: Guest Lecture Behnam Neyshabur (IAS/NYU): Generalization in Deep Learning
Main References:

Understanding Machine Learning: From Theory to Algorithms. Shai Shalev-Shwartz and Shai Ben-David. Cambridge University Press, 2014: Part I (Foundations) and Part IV (Advanced Theory).
Further References:

Implicit Regularization in Deep Learning. Behnam Neyshabur. PhD Thesis, 2017. Part I (Implicit Regularization and Generalization)
Lecture 13: Landscape of Optimization of Deep Neural Networks. Positive and Negative Results
Main References:

Neural Networks with Finite Intrinsic Dimension have no Spurious Valleys, Venturi et al.

A Critical View of Global Optimality in Deep Learning, Yun et al., ’18

Are ResNets Provably Better than Linear Predictors?, Shamir, ’18