Graph Neural Networks (GNNs) have demonstrated efficacy on non-Euclidean data, such as social networks and bioinformatics graphs.

Image source: [1]

# Related background

## Convolution

Denoting $g$ as the filter applied to $f$, the convolution is:

$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$$

## Fourier transformation

• Fourier transformation: $\mathcal{F}(f)(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i \omega t}\, dt$
• Inverse Fourier transformation: $\mathcal{F}^{-1}(\hat{f})(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat{f}(\omega)\, e^{i \omega t}\, d\omega$

• The convolution can then be rewritten as: $f * g = \mathcal{F}^{-1}\big(\mathcal{F}(f) \cdot \mathcal{F}(g)\big)$
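This Fourier-domain identity can be checked numerically. A minimal sketch using NumPy's FFT, in the discrete (circular) setting; the signal length and random values are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(8)
g = rng.standard_normal(8)
N = len(f)

# Direct circular convolution: (f * g)[n] = sum_m f[m] g[(n - m) mod N]
direct = np.array(
    [sum(f[m] * g[(n - m) % N] for m in range(N)) for n in range(N)]
)

# Fourier-domain computation: multiply the spectra, then invert
spectral = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

# The two agree up to floating-point error
assert np.allclose(direct, spectral)
```

This is the same identity spectral graph methods exploit, with the graph Fourier basis (Laplacian eigenvectors) in place of the complex exponentials.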

# Spectral methods

## Graph Convolutional Networks (GCN)

GCN [2] applies convolutions in the Fourier domain by computing the eigendecomposition of the graph Laplacian [3] and using a first-order approximation. Convolving the signal $x$ with the filter $g_\theta = \text{diag}(\theta)$, parameterized by $\theta \in \mathbb{R}^N$, in the Fourier domain gives:

$$g_\theta \star x = U g_\theta U^\top x,$$

where $U$ is the matrix of eigenvectors of the normalized graph Laplacian $L = I_N - D^{-1/2} A D^{-1/2} = U \Lambda U^\top$.

Let $\tilde{A} = A + I_N$ and $\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}$; with the first-order approximation we get:

$$Z = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X \Theta,$$

where $X$ is the matrix of node features and $\Theta$ is the matrix of filter parameters.

• Spectral convolution can only be applied to undirected graphs, which ensures that $L$ is a symmetric matrix.
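One GCN propagation step can be sketched in NumPy. The toy graph, feature dimensions, and random weights below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

# Toy undirected graph: a path on 3 nodes
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
N = A.shape[0]

A_tilde = A + np.eye(N)                       # add self-loops: A~ = A + I_N
d_tilde = A_tilde.sum(axis=1)                 # D~_ii = sum_j A~_ij
D_inv_sqrt = np.diag(1.0 / np.sqrt(d_tilde))  # D~^{-1/2}
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # symmetric normalization

rng = np.random.default_rng(0)
X = rng.standard_normal((N, 4))       # node features, C = 4
Theta = rng.standard_normal((4, 2))   # filter parameters, F = 2

# Z = D~^{-1/2} A~ D~^{-1/2} X Theta  -> one propagation step, shape (N, 2)
Z = A_hat @ X @ Theta
```

Since the graph is undirected, `A_hat` stays symmetric, mirroring the symmetry requirement on $L$ noted above.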

# Non-spectral methods

## GraphSAGE

GraphSAGE is one of the archetypal non-spectral approaches. It samples nodes from the local neighborhood in the graph and aggregates their feature information. [4]
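The sample-and-aggregate step can be sketched as follows, using the mean aggregator from [4]. The function names, sample size `k`, and toy graph are hypothetical, introduced only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_neighbors(neighbors, k):
    """Uniformly sample k neighbors (with replacement if fewer than k exist)."""
    idx = rng.choice(len(neighbors), size=k, replace=len(neighbors) < k)
    return [neighbors[i] for i in idx]

def sage_mean_layer(h, adj_list, W_self, W_neigh, k=2):
    """One GraphSAGE layer with a mean aggregator: average sampled
    neighbor features, combine with the node's own features, apply ReLU."""
    out = []
    for v, neighbors in adj_list.items():
        sampled = sample_neighbors(neighbors, k)
        h_neigh = np.mean([h[u] for u in sampled], axis=0)
        out.append(np.maximum(0.0, h[v] @ W_self + h_neigh @ W_neigh))
    return np.stack(out)

adj = {0: [1, 2], 1: [0], 2: [0, 1]}      # toy adjacency list
h = rng.standard_normal((3, 4))           # input features
W_self = rng.standard_normal((4, 3))
W_neigh = rng.standard_normal((4, 3))
h_next = sage_mean_layer(h, adj, W_self, W_neigh)   # shape (3, 3)
```

Sampling a fixed number of neighbors is what keeps the per-node cost bounded and makes the approach inductive: the same weights apply to unseen nodes.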

## Graph Attention Networks (GAT)

Notation:

• Input (node features): $\mathbf{h}= \{ \overrightarrow{h}_1, \overrightarrow{h}_2, \cdots, \overrightarrow{h}_N\}, \overrightarrow{h}_i \in \mathbb{R}^F$, where $N$ is the number of nodes and $F$ is the number of features per node.
• Output: $\mathbf{h}^\prime= \{ \overrightarrow{h}^\prime_1, \overrightarrow{h}^\prime_2, \cdots, \overrightarrow{h}^\prime_N\}, \overrightarrow{h}^\prime_i \in \mathbb{R}^{F^\prime}$, where $F^\prime$ is the output dimension.
1. Pass inputs $\mathbf{h}$ into a learnable shared linear transformation, parameterized by a weight matrix $\mathbf{W} \in \mathbb{R}^{F^\prime \times F}$.
2. Perform self-attention on the nodes
• Compute the attention coefficients using a shared attention mechanism $a: \mathbb{R}^{F^\prime} \times \mathbb{R}^{F^\prime} \rightarrow \mathbb{R}$: $e_{ij} = a(\mathbf{W}\overrightarrow{h}_i, \mathbf{W}\overrightarrow{h}_j)$.
• Compute $e_{ij}$ for nodes $j \in \mathcal{N}_i$, where $\mathcal{N}_i$ is the neighborhood of node $i$ (including $i$) in the graph. Let $\Vert$ denote concatenation; the normalized coefficients are:

$$\alpha_{ij} = \text{softmax}_j(e_{ij}) = \frac{\exp\big(\text{LeakyReLU}\big(\overrightarrow{a}^\top [\mathbf{W}\overrightarrow{h}_i \Vert \mathbf{W}\overrightarrow{h}_j]\big)\big)}{\sum_{k \in \mathcal{N}_i} \exp\big(\text{LeakyReLU}\big(\overrightarrow{a}^\top [\mathbf{W}\overrightarrow{h}_i \Vert \mathbf{W}\overrightarrow{h}_k]\big)\big)}$$
3. Multi-head attention. $K$ independent attention heads are computed and their outputs concatenated, so the final output dimension is $\mathbb{R}^{K F^\prime}$, where $K$ is the number of attention heads.
4. For the final layer, GAT aggregates across heads by averaging instead of concatenating:

$$\overrightarrow{h}^\prime_i = \sigma\Big(\frac{1}{K}\sum_{k=1}^{K}\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k}\, \mathbf{W}^{k} \overrightarrow{h}_j\Big)$$
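Steps 1–2 above can be sketched for a single attention head in NumPy. Shapes follow the notation in this section; the toy graph, LeakyReLU slope, and random weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, F, F_out = 3, 4, 2
h = rng.standard_normal((N, F))        # input node features
W = rng.standard_normal((F_out, F))    # shared linear transformation
a = rng.standard_normal(2 * F_out)     # attention vector for [Wh_i || Wh_j]
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}   # N_i includes i itself

Wh = h @ W.T                           # step 1: transform all nodes

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

alpha = {}
for i, Ni in neighbors.items():
    # step 2: e_ij = LeakyReLU(a^T [Wh_i || Wh_j]) for each j in N_i
    e = np.array(
        [leaky_relu(a @ np.concatenate([Wh[i], Wh[j]])) for j in Ni]
    )
    # softmax over the neighborhood gives the attention coefficients
    alpha[i] = np.exp(e) / np.exp(e).sum()
```

Each `alpha[i]` sums to 1 over $\mathcal{N}_i$, so a node's output is a convex combination of its (transformed) neighbor features.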

Source code (TensorFlow)

# References

1. Zhou, J., Cui, G., Zhang, Z., Yang, C., Liu, Z., & Sun, M. (2018). Graph Neural Networks: A Review of Methods and Applications. ArXiv, abs/1812.08434.
2. Kipf, T., & Welling, M. (2016). Semi-Supervised Classification with Graph Convolutional Networks. ArXiv, abs/1609.02907.
3.
4. Hamilton, W.L., Ying, Z., & Leskovec, J. (2017). Inductive Representation Learning on Large Graphs. NIPS.
5. Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., & Bengio, Y. (2018). Graph Attention Networks. ICLR.
6. Yun, S., Jeong, M., Kim, R., Kang, J., & Kim, H.J. (2019). Graph Transformer Networks. NeurIPS.