## Bloom Filters

When a web crawler checks for and filters out duplicate URLs, a Bloom filter is a good choice for curtailing memory cost. Here is a brief introduction.
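As a quick illustration, here is a minimal Bloom filter sketch in Python. The class name, sizes, and the salted-SHA-256 hashing scheme are illustrative choices, not from the post itself:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k salted hashes set bits in a fixed-size array."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _indexes(self, item):
        # Derive k independent indexes by salting the hash with i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx] = True

    def might_contain(self, item):
        # False means definitely absent; True may be a false positive.
        return all(self.bits[idx] for idx in self._indexes(item))
```

A crawler would call `might_contain(url)` before fetching: a `False` answer guarantees the URL is new, while a `True` answer is only probably a duplicate, which is the memory/accuracy trade-off the filter makes.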

## Likelihood-based Generative Models II: Flow Models

Flow models learn distributions over **continuous data**.
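At their core, flow models rely on the change-of-variables formula: an invertible map $f$ sends data $x$ to a simple base variable $z = f(x)$, and the log-likelihood picks up the log-determinant of the Jacobian (a standard identity, stated here as background rather than taken from the post):

$$\log p_X(x) = \log p_Z\big(f(x)\big) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|$$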

## Likelihood-based Generative Models I: Autoregressive Models

The brain has about 10^{14} synapses and we only live for about 10^{9} seconds. So we have a lot more parameters than data. This motivates the idea that we must do a lot of unsupervised learning since the perceptual input (including proprioception) is the only place we can get 10^{5} dimensions of constraint per second. (Geoffrey Hinton)
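Autoregressive models turn this unsupervised learning problem into a sequence of supervised ones via the chain rule of probability (a standard factorization, added here for context):

$$p(x) = \prod_{i=1}^{d} p(x_i \mid x_1, \dots, x_{i-1})$$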

## Neural Network Tricks

Techniques for training neural networks. Continuously updated.

## BERTology: An Introduction!

This is an introduction to the recent family of BERT models.

## Efficient Softmax Explained

`Softmax` incurs a large computational cost when the output vocabulary is very large. Some feasible approaches are explained in the context of the skip-gram pretraining task.
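One well-known remedy in the skip-gram setting is negative sampling, which replaces the full-vocabulary softmax with one positive and a few sampled negative binary classifications. A sketch of the per-pair loss, with illustrative names and random vectors standing in for learned embeddings:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_loss(center_vec, context_vec, negative_vecs):
    """Skip-gram negative-sampling loss for one (center, context) pair.

    Cost scales with the number of negatives (here 5), not with the
    vocabulary size as the full softmax would.
    """
    pos = -np.log(sigmoid(context_vec @ center_vec))
    neg = -np.sum(np.log(sigmoid(-negative_vecs @ center_vec)))
    return pos + neg

# Toy example with random embeddings in place of trained ones.
rng = np.random.default_rng(0)
dim = 8
loss = negative_sampling_loss(rng.normal(size=dim),
                              rng.normal(size=dim),
                              rng.normal(size=(5, dim)))
```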

## Relational Reasoning Networks

Reasoning about the relations between objects and their properties is a hallmark of intelligence. Here are some notes on relational reasoning neural networks.

## Transformer Variants: A Peek

This is an introduction to Transformer variants.^{[1]}

## Counting the Number of Parameters in Deep Learning

Calculating the number of trainable parameters by hand.
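The basic counting rules for two common layer types can be sketched as short helper functions (the function names are illustrative):

```python
def dense_params(n_in, n_out, bias=True):
    # Weight matrix: n_in * n_out, plus one bias per output unit.
    return n_in * n_out + (n_out if bias else 0)

def conv2d_params(in_ch, out_ch, k_h, k_w, bias=True):
    # Each filter spans all input channels; one filter per output channel.
    return in_ch * k_h * k_w * out_ch + (out_ch if bias else 0)

print(dense_params(784, 128))      # 784 * 128 + 128 = 100480
print(conv2d_params(3, 64, 3, 3))  # 3 * 3 * 3 * 64 + 64 = 1792
```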

## Dynamic Programming in NLP

**Dynamic Programming** (DP) is ubiquitous in NLP: minimum edit distance, Viterbi decoding, the forward/backward algorithm, the CKY algorithm, and more.
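Minimum edit distance is the simplest of these; the classic DP table can be sketched as:

```python
def min_edit_distance(a, b):
    """Levenshtein distance with unit insert/delete/substitute costs."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # delete all of a[:i]
    for j in range(n + 1):
        dp[0][j] = j          # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[m][n]
```

For example, `min_edit_distance("intention", "execution")` is 5 under unit substitution cost; the same table structure underlies the forward algorithm and Viterbi, just with sums/maxes of probabilities in place of edit costs.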