# YiDa.Xu ML notes

# Infinity in Deep Learning

Detailed derivation of neural networks as (1) a Gaussian Process, using the Central Limit Theorem, and (2) the Neural Tangent Kernel (NTK)

Discussion of Neural ODEs, in particular the use of the adjoint equation in parameter training

# Sinovation DeeCamp

Properties of softmax, estimating softmax without computing the denominator, probability re-parameterization: Gumbel-Max trick and REBAR algorithm
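
As a rough illustration of the Gumbel-Max trick (my own minimal sketch, not code from the notes), the snippet below draws a category from softmax(logits) by adding independent Gumbel(0, 1) noise to each logit and taking the argmax, so the softmax denominator is never computed. The function names `gumbel_max_sample` and `empirical_frequencies` are hypothetical, chosen for this example only.

```python
import math
import random


def gumbel_max_sample(logits, rng=random):
    """Draw one index distributed as softmax(logits) via the Gumbel-Max trick."""
    best_index, best_value = 0, -math.inf
    for i, logit in enumerate(logits):
        u = rng.random()
        # random() can return exactly 0.0, where log(u) is undefined; redraw.
        while u == 0.0:
            u = rng.random()
        gumbel = -math.log(-math.log(u))  # standard Gumbel(0, 1) noise
        if logit + gumbel > best_value:
            best_index, best_value = i, logit + gumbel
    return best_index


def empirical_frequencies(logits, n_samples=20000, seed=0):
    """Estimate the sampling distribution by repeated draws (illustration only)."""
    rng = random.Random(seed)
    counts = [0] * len(logits)
    for _ in range(n_samples):
        counts[gumbel_max_sample(logits, rng)] += 1
    return [c / n_samples for c in counts]


if __name__ == "__main__":
    # Logits are log-probabilities of (0.5, 0.3, 0.2); frequencies should be close.
    print(empirical_frequencies([math.log(0.5), math.log(0.3), math.log(0.2)]))
```

The printed frequencies should land near 0.5, 0.3 and 0.2, up to sampling noise.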

Expectation-Maximization & Matrix Capsule Networks; Determinantal Point Process & Neural Network compression; Kalman Filter & LSTM; Model estimation & Binary classifier

# Video Tutorials for these notes

- I recorded about 20% of these notes as videos in Mandarin in 2015 (all my notes and writings are in English). You may find them on YouTube, bilibili and Youku.

# 3D Geometry Computer Vision

### 3D Geometry Fundamentals

Camera Models, Intrinsic and Extrinsic parameter estimation, Epipolar Geometry, 3D reconstruction, Depth Estimation

### Recent Deep 3D Geometry based Research

Recent research on the following topics: Single image to Camera Model estimation, Multi-Person 3D pose estimation from multi-view, GAN-based 3D pose estimation, Deep Structure-from-Motion, Deep Learning based Depth Estimation

# Deep Learning

### New Research on Softmax function

Out-of-distribution, Neural Network Calibration, Gumbel-Max trick, Stochastic Beam Search (some of these lectures overlap with DeeCamp2019)

### Optimisation methods

Optimisation methods in general, not limited to Deep Learning

### Neural Networks

Basic neural networks and the multilayer perceptron

### Convolutional Neural Networks: from basic to recent Research

Detailed explanation of CNN, various loss functions, Centre Loss, Contrastive Loss, Residual Networks, Capsule Networks, YOLO, SSD

### Word Embeddings

Word2Vec, skip-gram, GloVe, fastText

### Deep Natural Language Processing

RNN, LSTM, Seq2Seq with Attention, Beam search, Attention is all you need, Convolutional Seq2Seq, Pointer Networks

### Mathematics for Generative Adversarial Networks

How GAN works, Traditional GAN, Mathematics on W-GAN, Duality and KKT conditions, Info-GAN, Bayesian GAN

### Restricted Boltzmann Machine

Basics of the Restricted Boltzmann Machine (RBM)

# Reinforcement Learning

### Reinforcement Learning Basics

Basics of reinforcement learning: Markov Decision Process, Bellman Equation, then moving on to Deep Q-Learning

### Monte Carlo Tree Search

Monte Carlo Tree Search, AlphaGo learning algorithm

### Policy Gradient

Policy Gradient Theorem, Mathematics on Trust Region Optimization in RL, Natural Gradients on TRPO, Proximal Policy Optimization (PPO), Conjugate Gradient Algorithm

# Data Science

### 30-minute introduction to AI and Machine Learning

An extremely gentle 30-minute introduction to AI and Machine Learning. Thanks to my PhD student Haodong Chang for assisting with the editing.

### Regression methods

Classification: Logistic and Softmax; Regression: Linear, polynomial; Mixed Effects model

**[costFunction.m]** and **[soft_max.m]**

### Recommendation system

Collaborative filtering, Factorization Machines, Non-Negative Matrix Factorisation, Multiplicative Update Rule

### Dimension Reduction

classic PCA and t-SNE

### Introduction to Data Analytics and associated Jupyter notebook

Supervised vs Unsupervised Learning, Classification accuracy

# Probability and Statistics Background

### Bayesian model

Revision of Bayesian models, including the Bayesian predictive model and conditional expectation

### Probabilistic Estimation

Some useful distributions, conjugacy, MLE, MAP, Exponential family and natural parameters

### Statistical Properties

Useful statistical properties to help us prove things, including the Chebyshev and Markov inequalities
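
For quick reference, the two inequalities named above can be stated as follows (standard textbook forms, written here in my own notation rather than copied from the notes; $X$ is a random variable, $a > 0$ and $k > 0$ are constants, and $\mu$, $\sigma^2$ denote the mean and variance of $X$):

```latex
% Markov's inequality, for a non-negative random variable X and a > 0
P(X \ge a) \le \frac{\mathbb{E}[X]}{a}

% Chebyshev's inequality, for X with finite mean \mu and variance \sigma^2, and k > 0
P\bigl(|X - \mu| \ge k\bigr) \le \frac{\sigma^2}{k^2}
```

Chebyshev follows from Markov by applying it to the non-negative variable $(X - \mu)^2$ with threshold $k^2$.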

# Probabilistic Model

### Expectation Maximisation

Proof of convergence for E-M, examples of E-M through the Gaussian Mixture Model

**[gmm_demo.m]** and **[kmeans_demo.m]** and **[bilibili video]**

### State Space Model (Dynamic model)

Detailed explanation of the Kalman Filter

**[bilibili video]**, **[kalman_demo.m]** and Hidden Markov Model **[bilibili video]**
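
To give a flavour of the predict/update cycle, here is a minimal scalar Kalman filter sketch. It is not the linked `kalman_demo.m`; it assumes a one-dimensional random-walk state observed directly with Gaussian noise, and the name `kalman_1d` and its parameters are hypothetical.

```python
def kalman_1d(measurements, process_var, meas_var, init_mean=0.0, init_var=1.0):
    """Scalar Kalman filter for x_t = x_{t-1} + w_t, z_t = x_t + v_t.

    process_var is Var(w_t) and meas_var is Var(v_t) (assumed known constants).
    Returns a list of (posterior mean, posterior variance) per measurement.
    """
    mean, var = init_mean, init_var
    estimates = []
    for z in measurements:
        # Predict: the random-walk model keeps the mean and inflates the variance.
        var = var + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = var / (var + meas_var)
        mean = mean + gain * (z - mean)
        var = (1.0 - gain) * var
        estimates.append((mean, var))
    return estimates


if __name__ == "__main__":
    for mean, var in kalman_1d([1.0, 1.0, 1.0, 1.0], process_var=0.0, meas_var=1.0):
        print(round(mean, 4), round(var, 4))
```

With zero process variance, a unit prior variance and unit measurement variance, the posterior mean after n readings of 1.0 is n/(n+1) and the posterior variance is 1/(n+1), which makes the sketch easy to check by hand.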

# Inference

### Variational Inference

Explains Variational Bayes for both non-exponential and exponential family distributions, plus stochastic variational inference.

**[vb_normal_gamma.m]** and **[bilibili video]**

### Stochastic Matrices

Stochastic matrix, Power Method Convergence Theorem, detailed balance and PageRank algorithm
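
As a small illustration of the power method on a stochastic matrix (a sketch under my own assumptions, not code from the notes), the snippet below repeatedly applies a row-stochastic matrix `P` to a probability vector until it stops changing, which yields the stationary distribution. The function name `power_method` and the 2-state example matrix are made up for this example.

```python
def power_method(P, n_iters=200, tol=1e-12):
    """Iterate pi <- pi P for a row-stochastic matrix P (list of rows)."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(n_iters):
        new_pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_pi, pi)) < tol:
            return new_pi
        pi = new_pi
    return pi


# Hypothetical 2-state chain; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
stationary = power_method(P)

if __name__ == "__main__":
    print(stationary)  # approximately [5/6, 1/6]
```

For this chain the stationary distribution can be verified by hand: it also satisfies detailed balance, since (5/6)(0.1) = (1/6)(0.5).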

### Introduction to Monte Carlo

Inverse CDF, rejection, adaptive rejection, importance sampling

**[adaptive_rejection_sampling.m]** and **[hybrid_gmm.m]**
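
The simplest of these methods, inverse-CDF sampling, can be sketched in a few lines. The example below is my own illustration (not one of the linked files) and assumes an exponential target, whose CDF $F(x) = 1 - e^{-\lambda x}$ inverts in closed form; `sample_exponential` and `sample_mean` are hypothetical helper names.

```python
import math
import random


def sample_exponential(rate, rng=random):
    """Inverse-CDF sample from Exponential(rate): solve u = 1 - exp(-rate * x)."""
    u = rng.random()  # uniform on [0, 1), so 1 - u is in (0, 1] and log is safe
    return -math.log(1.0 - u) / rate


def sample_mean(rate, n_samples=50000, seed=0):
    """Average of many draws; should approach the true mean 1 / rate."""
    rng = random.Random(seed)
    return sum(sample_exponential(rate, rng) for _ in range(n_samples)) / n_samples


if __name__ == "__main__":
    print(sample_mean(2.0))  # close to 0.5, up to sampling noise
```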

### Markov Chain Monte Carlo

M-H, Gibbs, Slice Sampling, Elliptical Slice sampling, Swendsen-Wang, demonstration of collapsed Gibbs using LDA

**[lda_gibbs_example.m]** and **[test_autocorrelation.m]** and **[gibbs.m]** and **[bilibili video]**
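
For readers who want to see the basic M-H loop before opening the notes, here is a minimal random-walk Metropolis-Hastings sketch. It is my own illustration, not one of the linked files; it assumes a symmetric Gaussian proposal (so the proposal ratio cancels) and a one-dimensional target given by its unnormalised log-density. The names `metropolis_hastings` and `log_standard_normal` are hypothetical.

```python
import math
import random


def metropolis_hastings(log_target, n_samples, step=2.0, x0=0.0, seed=0):
    """Random-walk M-H with a symmetric Gaussian proposal of std `step`."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # Symmetric proposal: acceptance ratio is target(proposal) / target(x).
        log_accept = log_target(proposal) - log_target(x)
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < log_accept:
            x = proposal
        samples.append(x)  # a rejected move repeats the current state
    return samples


def log_standard_normal(x):
    """Unnormalised log-density of N(0, 1)."""
    return -0.5 * x * x


if __name__ == "__main__":
    draws = metropolis_hastings(log_standard_normal, 50000)
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    print(mean, var)  # should be near 0 and 1 respectively
```

Only differences of log-densities are used, so the normalising constant of the target never needs to be known.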

### Particle Filter (Sequential Monte-Carlo)

Sequential Monte-Carlo, Condensation algorithm, Auxiliary Particle Filter

**[bilibili video]**

# Advanced Probabilistic Model

### Bayesian Non Parametrics (BNP) and its inference basics

Dirichlet Process (DP), Chinese Restaurant Process insights, Slice sampling for DP

**[dirichlet_process.m]** and **[bilibili video]** and **[Jupyter Notebook]**
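
The Chinese Restaurant Process is easy to simulate, which helps build intuition for how a DP clusters data. The sketch below is my own illustration (not the linked `dirichlet_process.m` or notebook): each new customer joins an existing table with probability proportional to its size, or opens a new table with probability proportional to the concentration parameter `alpha`. The function name `chinese_restaurant_process` is hypothetical.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Simulate table assignments under a CRP with concentration alpha > 0."""
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = customers currently at table k
    assignments = []   # assignments[i] = table index of customer i
    for n in range(n_customers):  # n customers are already seated
        # Existing table k has weight size_k; a new table has weight alpha.
        r = rng.random() * (n + alpha)
        cumulative = 0.0
        chosen = len(table_sizes)  # default: open a new table
        for k, size in enumerate(table_sizes):
            cumulative += size
            if r < cumulative:
                chosen = k
                break
        if chosen == len(table_sizes):
            table_sizes.append(0)
        table_sizes[chosen] += 1
        assignments.append(chosen)
    return assignments, table_sizes


if __name__ == "__main__":
    _, sizes = chinese_restaurant_process(100, alpha=1.0)
    print(sizes)  # a few large tables and several small ones is typical
```

Larger values of `alpha` tend to produce more tables for the same number of customers.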

### Bayesian Non Parametrics (BNP) extensions

Hierarchical DP, HDP-HMM, Indian Buffet Process (IBP)

### Completely Random Measure (early draft – written in 2015)

Levy-Khintchine representation, Compound Poisson Process, Gamma Process, Negative Binomial Process

### Sample correlated integers from HDP and Copula

This is an alternative explanation of our IJCAI 2016 papers. The derivations are different from the paper, but portray the same story.

### Determinantal Point Process

Explains the details of the DPP's marginal distribution, L-ensemble, its sampling strategy, and our work on time-varying DPP
