Fork me on GitHub

The brain has about 1014 synapses and we only live for about 109 seconds. So we have a lot more parameters than data. This motivates the idea that we must do a lot of unsupervised learning since the perceptual input (including proprioception) is the only place we can get 105 dimensions of constraint per second.
(Geoffrey Hinton)

Read more »

Softmax encounters large computing cost when the output vocabulary size is very large. Some feasible approaches will be explained under the circumstance of skip-gram pretraining task.

Read more »