Reasoning about the relations between objects and their properties is a hallmark of intelligence. Here are some notes on relational reasoning neural networks.

# Relation Network

## Relation Networks (RNs)

Relation Networks (RNs)^{[1]} adopt the functional form of a neural network for relational reasoning. RNs consider the potential relations between all object pairs.

“RNs learn to infer the existence and implications of object relations.” (Santoro et al., 2017)^{[1]}

The general form is

$$\mathrm{RN}(O) = f_\phi\Big(\color{red}{\pmb{a}}\big(\{g_\theta(o_i, o_j)\}\big)\Big)$$

where $\color{red}{\pmb{a}}$ is the aggregation function.

When we take it as **summation**, the simplest form is:

$$\mathrm{RN}(O) = f_\phi\Big(\sum_{i,j} g_\theta(o_i, o_j)\Big)$$

where

- the input is a set of objects $O = \{o_1, o_2, \ldots, o_n\}$, $o_i \in \mathbb{R}^m$
- $f_\phi$ and $g_\theta$ are two MLPs with parameters $\phi$ and $\theta$. The same MLP $g_\theta$ operates on all possible pairs; $g_\theta$ captures the representation of pair-wise relations, and $f_\phi$ integrates information about all pairs.

The **summation** in the RN equation indicates the **order (permutation) invariance** of the object set. Max or average pooling can be used instead.
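A minimal PyTorch sketch of the summation form (layer sizes and the use of all ordered pairs are illustrative assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """RN(O) = f_phi( sum_{i,j} g_theta(o_i, o_j) )."""

    def __init__(self, object_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        # g_theta: shared MLP applied to every ordered object pair
        self.g = nn.Sequential(
            nn.Linear(2 * object_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # f_phi: MLP applied to the aggregated relation vector
        self.f = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, objects: torch.Tensor) -> torch.Tensor:
        # objects: (batch, n, object_dim)
        b, n, d = objects.shape
        # Build all ordered pairs (o_i, o_j): (batch, n, n, 2d)
        o_i = objects.unsqueeze(2).expand(b, n, n, d)
        o_j = objects.unsqueeze(1).expand(b, n, n, d)
        pairs = torch.cat([o_i, o_j], dim=-1)
        # Sum over all pairs => permutation invariance
        relations = self.g(pairs).sum(dim=(1, 2))
        return self.f(relations)

rn = RelationNetwork(object_dim=16, hidden_dim=64, out_dim=10)
out = rn(torch.randn(4, 8, 16))  # 4 scenes, 8 objects each
```

Because the pairwise outputs are summed before $f_\phi$, shuffling the objects leaves the result unchanged.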

(See [1] and [2] for architecture figures.)

## Wild Relation Network (WReN)

The Wild Relation Network (WReN)^{[3]} applies the RN module multiple times to infer the inter-panel relationships: each candidate answer panel is combined with the context panels and scored by a shared RN. Afterward, all candidate scores are passed to a softmax function.
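A rough sketch of this candidate-scoring loop; the panel encoder and the RN scorer `rn` are assumed given, and all shapes are illustrative:

```python
import torch
import torch.nn as nn

def wren_scores(context: torch.Tensor, candidates: torch.Tensor,
                rn: nn.Module) -> torch.Tensor:
    """Score each candidate answer panel with a shared RN module.

    context:    (batch, 8, d) -- the eight context panel embeddings
    candidates: (batch, 8, d) -- the eight candidate answer embeddings
    rn:         a module mapping a panel set (batch, 9, d) -> (batch, 1) score
    """
    scores = []
    for k in range(candidates.size(1)):
        # Complete the matrix with candidate k, then score the 9-panel set
        panels = torch.cat([context, candidates[:, k:k+1]], dim=1)
        scores.append(rn(panels))
    # Softmax over the eight candidate scores
    return torch.softmax(torch.cat(scores, dim=1), dim=1)
```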

## Visual Interaction Network (VIN)

The Visual Interaction Network (VIN) uses ConvNets to encode images: each pair of consecutive input frames is convolved into a state code.

Afterward, an RN-style computation is employed in its **Interaction Net** (IN):

- For each slot, RN is applied to the slot’s concatenation with each other slot.
- Then a self-dynamics net is applied to the slot itself.
- Finally, sum all the results to produce the output (see the sketch below).
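A sketch of one interaction step, assuming slot-structured state codes; the layer sizes and the masking of self-pairs are illustrative choices:

```python
import torch
import torch.nn as nn

class InteractionCore(nn.Module):
    """One IN step over state-code slots (sizes are illustrative)."""

    def __init__(self, slot_dim: int, hidden_dim: int):
        super().__init__()
        # Relation net: applied to each slot concatenated with every other slot
        self.relation = nn.Sequential(
            nn.Linear(2 * slot_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, slot_dim),
        )
        # Self-dynamics net: applied to the slot alone
        self.self_dyn = nn.Sequential(
            nn.Linear(slot_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, slot_dim),
        )

    def forward(self, slots: torch.Tensor) -> torch.Tensor:
        # slots: (batch, n, slot_dim)
        b, n, d = slots.shape
        s_i = slots.unsqueeze(2).expand(b, n, n, d)
        s_j = slots.unsqueeze(1).expand(b, n, n, d)
        pairwise = self.relation(torch.cat([s_i, s_j], dim=-1))
        # Zero out self-pairs so each slot only interacts with the others
        mask = 1.0 - torch.eye(n, device=slots.device).view(1, n, n, 1)
        interactions = (pairwise * mask).sum(dim=2)
        # Sum the pairwise effects with the slot's own dynamics
        return interactions + self.self_dyn(slots)
```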

## Relational Memory Core (RMC)

The Relational Memory Core (RMC)^{[4]} combines LSTMs with non-local networks (i.e., Transformer-style attention).

- Encoding new memories

Let the matrix $M$ denote the stored memories, one memory $m_i$ per row. The RMC applies multi-head dot-product attention (MHDPA) to allow memories to interact with each other and with new inputs:

$$\tilde{M} = \mathrm{softmax}\left(\frac{M W^q \big(\color{green}{[M;x]} W^k\big)^\top}{\sqrt{d^k}}\right) \color{green}{[M;x]} W^v$$

where $\color{green}{[M;x]}$ is the row-wise concatenation of the **memories** and the **new observations**: queries come from $M$ alone, while keys and values also see the new input. The output size of $\tilde{M}$ is the same as $M$.
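A single-head sketch of this attention step (the paper uses multiple heads; the projection matrices here are plain tensors for brevity):

```python
import torch

def attend_over_memory(M: torch.Tensor, x: torch.Tensor,
                       Wq: torch.Tensor, Wk: torch.Tensor,
                       Wv: torch.Tensor) -> torch.Tensor:
    """Single-head sketch of MHDPA: queries from M, keys/values from [M; x].

    M: (batch, n_mem, d)  stored memories (one per row)
    x: (batch, n_in, d)   new observations
    """
    Mx = torch.cat([M, x], dim=1)          # [M; x]: memories + observations
    q = M @ Wq                             # queries from the memories only
    k, v = Mx @ Wk, Mx @ Wv                # keys/values see the new input too
    d_k = k.size(-1)
    attn = torch.softmax(q @ k.transpose(-2, -1) / d_k ** 0.5, dim=-1)
    return attn @ v                        # M_tilde: same number of rows as M

d = 64
M, x = torch.randn(2, 8, d), torch.randn(2, 4, d)
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
M_tilde = attend_over_memory(M, x, Wq, Wk, Wv)   # (2, 8, 64)
```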

- Introducing recurrence via a variant LSTM

The attended memories are then folded into an LSTM-style, memory-wise gated update:

$$m_{i,t} = \sigma(f_{i,t}) \circ m_{i,t-1} + \sigma(i_{i,t}) \circ g_\psi(\tilde{m}_{i,t}), \qquad h_{i,t} = \sigma(o_{i,t}) \circ \tanh(m_{i,t})$$

where $g_\psi$ is a row/memory-wise MLP with layer normalization, $\tilde{m}_{i,t}$ is row $i$ of the attended memory $\tilde{M}$, and the input, forget, and output gates $i_{i,t}$, $f_{i,t}$, $o_{i,t}$ are computed from the input $x_t$ and the previous output $h_{i,t-1}$, as in a standard LSTM.^{[4]}
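A sketch of the gated update under the assumptions above; the class name and the way the input is broadcast across memory rows are illustrative:

```python
import torch
import torch.nn as nn

class GatedMemoryUpdate(nn.Module):
    """LSTM-style, memory-row-wise gating over the attended memory."""

    def __init__(self, d: int, hidden: int):
        super().__init__()
        self.gates = nn.Linear(2 * d, 3 * d)   # input, forget, output gates
        self.g_psi = nn.Sequential(            # row/memory-wise MLP + layer norm
            nn.Linear(d, hidden), nn.ReLU(),
            nn.Linear(hidden, d), nn.LayerNorm(d),
        )

    def forward(self, x, m_prev, h_prev, m_tilde):
        # x: (batch, d); m_prev, h_prev, m_tilde: (batch, n_mem, d)
        x_rows = x.unsqueeze(1).expand_as(h_prev)  # same input for every row
        i, f, o = self.gates(torch.cat([x_rows, h_prev], -1)).chunk(3, dim=-1)
        m = torch.sigmoid(f) * m_prev + torch.sigmoid(i) * self.g_psi(m_tilde)
        h = torch.sigmoid(o) * torch.tanh(m)
        return m, h
```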

# Recurrent Memory, Attention and Composition (MAC)

The MAC recurrent cell^{[5]} consists of a control unit, a read unit, and a write unit.

- control unit: attends to different parts of the task description (the question)
- read unit: extracts information out of the knowledge base (the image in a VQA task)
- write unit: integrates the retrieved information into the memory state

Input:

- concatenate the final states of a bi-LSTM run over the question words to form $\pmb{q}$; the bi-LSTM outputs serve as the contextual words $cw_s$
- convolve the image to obtain the knowledge base (see the sketch below)
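A minimal sketch of this input stage; the vocabulary size and the embedding and feature dimensions are illustrative, and the real model starts from pretrained image features:

```python
import torch
import torch.nn as nn

embed = nn.Embedding(5000, 300)
lstm = nn.LSTM(300, 256, bidirectional=True, batch_first=True)
cnn = nn.Conv2d(1024, 512, kernel_size=3, padding=1)  # on pretrained features

tokens = torch.randint(0, 5000, (4, 12))     # a batch of questions
cw, (h_n, _) = lstm(embed(tokens))           # contextual words cw: (4, 12, 512)
q = torch.cat([h_n[0], h_n[1]], dim=-1)      # concat last fwd/bwd states (4, 512)
kb = cnn(torch.randn(4, 1024, 14, 14))       # knowledge base: (4, 512, 14, 14)
```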

## Control unit

Given the contextual question words $cw_s$, the (step-specific) question representation $q_i$, and the previous control state $c_{i-1}$:

- Concat $c_{i-1}$ and $q_i$ and feed them into a feed-forward layer: $cq_i = W[c_{i-1}; q_i] + b$.
- Measure the similarity between $cq_i$ and each question word $cw_s$; then use a softmax layer to normalize the weights, acquiring an attention distribution $cv_{i,s}$ over the question words.
- Take the weighted average of the question context words to get the current control state $c_i = \sum_s cv_{i,s}\, cw_s$ (see the sketch below).
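A sketch of one control step, with illustrative dimensions; the per-step position-aware projection of $q$ is omitted:

```python
import torch
import torch.nn as nn

class ControlUnit(nn.Module):
    """One MAC control step (dimensions are illustrative)."""

    def __init__(self, d: int):
        super().__init__()
        self.cq_proj = nn.Linear(2 * d, d)   # fuse previous control with q_i
        self.attn = nn.Linear(d, 1)          # similarity score per word

    def forward(self, c_prev, q_i, cw):
        # c_prev: (batch, d), q_i: (batch, d), cw: (batch, seq, d)
        cq = self.cq_proj(torch.cat([c_prev, q_i], dim=-1))
        # Element-wise interaction with every contextual word, then score
        scores = self.attn(cq.unsqueeze(1) * cw)   # (batch, seq, 1)
        cv = torch.softmax(scores, dim=1)          # attention over words
        return (cv * cw).sum(dim=1)                # new control state c_i
```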

## Read unit

- Interact each knowledge-base element $k_{h,w}$ with the memory $m_{i-1}$ to get $I_{i,h,w} = (W_m m_{i-1}) \circ (W_k k_{h,w})$.
- Concat $I_{i,h,w}$ with $k_{h,w}$ and feed the result into a dense layer.
- Compute an attention distribution over the knowledge base, guided by the control state $c_i$, and finally take the weighted average to obtain the retrieved information $r_i$ (see the sketch below).
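A sketch of one read step over a flattened knowledge base (H×W cells); the dimensions and the exact fusion layers are illustrative:

```python
import torch
import torch.nn as nn

class ReadUnit(nn.Module):
    """One MAC read step over a flattened knowledge base."""

    def __init__(self, d: int):
        super().__init__()
        self.mem_proj = nn.Linear(d, d)
        self.kb_proj = nn.Linear(d, d)
        self.fuse = nn.Linear(2 * d, d)      # dense layer on [interaction; k]
        self.attn = nn.Linear(d, 1)

    def forward(self, m_prev, c_i, kb):
        # m_prev, c_i: (batch, d); kb: (batch, hw, d) -- flattened image cells
        inter = self.mem_proj(m_prev).unsqueeze(1) * self.kb_proj(kb)
        fused = self.fuse(torch.cat([inter, kb], dim=-1))
        # The control state guides where to attend in the knowledge base
        rv = torch.softmax(self.attn(c_i.unsqueeze(1) * fused), dim=1)
        return (rv * kb).sum(dim=1)          # retrieved information r_i
```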

## Write unit

- Integrate the retrieved information $r_i$ with the previous memory state: $m_i = W[r_i; m_{i-1}] + b$. (Optional variants in the paper add self-attention over earlier memories and a memory gate.)
## Output unit

Concat $q$ and the final memory state $m_p$, then pass the result through a 2-layer feed-forward network followed by a softmax over the candidate answers.
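A minimal sketch of the output classifier, with illustrative sizes:

```python
import torch
import torch.nn as nn

# 2-layer feed-forward classifier over [q; m_p] (sizes are illustrative)
d, n_answers = 512, 28
classifier = nn.Sequential(
    nn.Linear(2 * d, d), nn.ReLU(),
    nn.Linear(d, n_answers),
)

q, m_p = torch.randn(4, d), torch.randn(4, d)   # question vec, final memory
logits = classifier(torch.cat([q, m_p], dim=-1))
probs = torch.softmax(logits, dim=-1)           # answer distribution
```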

# References

- 1. Santoro, A., Raposo, D., Barrett, D. G., Malinowski, M., Pascanu, R., Battaglia, P., & Lillicrap, T. (2017). A simple neural network module for relational reasoning. In Advances in Neural Information Processing Systems (pp. 4967-4976).
- 2. Raposo, D., Santoro, A., Barrett, D., Pascanu, R., Lillicrap, T., & Battaglia, P. (2017). Discovering objects and their relations from entangled scene representations. arXiv preprint arXiv:1702.05068.
- 3. Barrett, D. G., Hill, F., Santoro, A., Morcos, A. S., & Lillicrap, T. (2018). Measuring abstract reasoning in neural networks. arXiv preprint arXiv:1807.04225.
- 4. Santoro, A., Faulkner, R., Raposo, D., Rae, J., Chrzanowski, M., Weber, T., ... & Lillicrap, T. (2018). Relational recurrent neural networks. In Advances in Neural Information Processing Systems (pp. 7299-7310).
- 5. Hudson, D. A., & Manning, C. D. (2018). Compositional attention networks for machine reasoning. arXiv preprint arXiv:1803.03067.