Yekun's Note

Inductive Positions in Transformers

Posted on 2023-01-26 Edited on 2023-09-18 In Transformer

We summarize the positional encoding approaches in transformers.

Summary

PE	Relative	Trainable	Each Layer	Extrapolation
Sinusoidal	✘	✘	✘	✘
T5 bias	✔	✔	✔	✔
RoPE	✔	✔	✔	✘
ALiBi	✔	✘	✔	✔
KERPLE	✔	✔	✔	✔
Sandwich	✔	✘	✔	✔
xPos	✔	✘	✔	✔
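
To make two rows of the table concrete, here is a minimal sketch of the sinusoidal encoding (absolute, fixed) and the ALiBi bias (relative, fixed). It assumes the standard formulations from the original papers: a base of 10000 for the sinusoidal frequencies, and ALiBi slopes of 2^(-8h/n) for head h out of n, which is the form given for power-of-two head counts. The function names are illustrative and not taken from any library.

```python
import math

def sinusoidal_encoding(num_positions, dim):
    """Fixed absolute encoding: sin on even channels, cos on odd channels."""
    table = []
    for pos in range(num_positions):
        row = []
        for k in range(dim):
            # Channels 2i and 2i+1 share the frequency 1 / 10000^(2i / dim).
            angle = pos / (10000 ** ((k - k % 2) / dim))
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def alibi_bias(seq_len, num_heads):
    """Fixed relative bias: head h penalizes distance with slope 2^(-8h/num_heads)."""
    biases = []
    for h in range(1, num_heads + 1):
        slope = 2.0 ** (-8.0 * h / num_heads)
        # Causal setting (an assumption here): query i sees keys j <= i only.
        head = [[-slope * (i - j) if j <= i else float("-inf")
                 for j in range(seq_len)] for i in range(seq_len)]
        biases.append(head)
    return biases
```

The sinusoidal table depends on the absolute position and is added once to the input embeddings, whereas the ALiBi bias depends only on the distance i - j and is added to the attention scores in every layer. Neither has trainable parameters.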

Diffusion Models: A Mathematical Note from Scratch

Posted on 2022-12-12 Edited on 2023-03-07 In Diffusion Models, ML

A diffusion probabilistic model is a parameterized Markov chain trained to reverse a predefined forward process, and it is closely related to both likelihood-based optimization and score matching. The forward diffusion process is a stochastic process constructed to gradually corrupt the original data into random noise.
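
As an illustration of how the forward process corrupts data, the sketch below assumes the common Gaussian (DDPM-style) formulation with a linear noise schedule, where a noisy sample can be drawn in closed form as x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps. The schedule endpoints (1e-4 and 0.02) and the function names are assumptions chosen for illustration, not details taken from the note.

```python
import math
import random

def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) under a linear beta schedule."""
    out, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) in closed form."""
    signal = math.sqrt(abar[t])
    noise = math.sqrt(1.0 - abar[t])
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]

# Usage: the signal coefficient shrinks toward 0 as t grows,
# so late samples are close to pure Gaussian noise.
abar = alpha_bars(1000)
rng = random.Random(0)
x_early = q_sample([1.0, -1.0, 0.5], 0, abar, rng)
x_late = q_sample([1.0, -1.0, 0.5], 999, abar, rng)
```

Because abar_t decreases monotonically toward zero, the contribution of the original data vanishes over time, which is the gradual corruption described above.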

Large Language Models for Programming Languages

Posted on 2022-05-13 Edited on 2023-07-21

A note on pre-trained language models (PLMs) for code.

Mask Denoising Strategy for Pre-trained Language Models

Posted on 2022-01-10 Edited on 2023-04-07 In LLM, Pre-training

Mask modeling plays a crucial role in pre-training language models. This note provides a short summary.

Subword Tokenization in Natural Language Processing

Posted on 2021-11-29 Edited on 2023-09-15 In LLM, Tokenization

A summary of subword tokenization in natural language processing.

Scaling Up Large Language Models: A Summary

Posted on 2021-10-09 Edited on 2023-04-07 In LLM, Scaling

A summary of large language models (LLMs) at large scale (beyond 10B parameters).

Sequence GANs in a Nutshell

Posted on 2020-08-30 Edited on 2021-01-01 In NLP, NLG, GAN

Background: Conventional maximum likelihood approaches to sequence generation with teacher forcing are inherently prone to exposure bias at inference time because of the training-testing discrepancy: the generator produces a sequence iteratively, conditioning on its own previous predictions, which may never have been observed during training. The resulting mismatch accumulates as the generated sequence grows. In other words, the model is trained only on demonstrated behaviors (real data samples), not in free-running mode.
Generative Adversarial Networks (GANs) hold the promise of mitigating such issues in discrete sequence generation tasks such as language modeling and speech or music generation.

Automatic Evaluation Metrics for Language Generation

Posted on 2020-06-05 In NLP, NLG, NLG Evaluation

A summary of automatic evaluation metrics for natural language generation (NLG) applications.

Human evaluation considers adequacy, fidelity, and fluency, but it is quite expensive.

Adequacy: Does the output convey the same meaning as the input sentence? Is part of the message lost, added, or distorted?
Fluency: Is the output good, fluent English? This involves both grammatical correctness and idiomatic word choices.

Thus, a useful automatic evaluation metric holds promise for NLG applications such as machine translation, text summarization, image captioning, dialogue generation, and poetry or story generation.
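
The excerpt does not name a specific metric, so the following is only a hedged illustration of what an automatic metric can look like: clipped n-gram precision, the building block of n-gram overlap metrics such as BLEU. The function names and the single-reference setup are assumptions made for this sketch.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams found in the reference, with each
    n-gram counted at most as often as it occurs in the reference."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total

# Usage: a degenerate candidate that repeats one word is penalized by clipping.
reference = "the cat is on the mat".split()
candidate = "the the the the the the the".split()
score = clipped_precision(candidate, reference, n=1)
```

Clipping matters because plain precision would give the repeated-word candidate a perfect score. A metric of this kind is cheap to compute, but it only measures surface overlap and does not directly assess adequacy or fluency.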