# InfoGCL: Information-Aware Graph Contrastive Learning

# 1. Introduction

## 1.1 Abstract

Various graph contrastive learning models have been proposed to improve the performance of learning tasks on graph datasets in recent years. While effective and prevalent, these models are usually carefully customized. In particular, although all recent researches create two contrastive views, they differ greatly in view augmentations, architectures, and objectives. It remains an open question how to build your graph contrastive learning model from scratch for particular graph learning tasks and datasets. In this work, we aim to fill this gap by studying how graph information is transformed and transferred during the contrastive learning process and proposing an information-aware graph contrastive learning framework called InfoGCL. The key point of this framework is to follow the Information Bottleneck principle to reduce the mutual information between contrastive parts while keeping task relevant information intact at both the levels of the individual module and the entire framework so that the information loss during graph representation learning can be minimized. We show for the first time that all recent graph contrastive learning methods can be unified by our framework. We empirically validate our theoretical analysis on both node and graph classification benchmark datasets, and demonstrate that our algorithm significantly outperforms the state-of-the-arts.

# 2. Method

## 2.1 Data Augmentation

Corollary 1. (Optimal Augmented Views) For a downstream task $T$ whose goal is to predict a semantic label $y$, the optimal views, $\mathbf{v}_i^{*}$, $\mathbf{v}_j^{*}$, generated from the input graph $\mathcal{G}$ are the solutions to the following optimization problem:

\begin{align} \left(\mathbf{v}_{i}^{*}, \mathbf{v}_{j}^{*}\right)=& \underset{\mathbf{v}_{i}, \mathbf{v}_{j}}{\arg \min } I\left(\mathbf{v}_{i} ; \mathbf{v}_{j}\right)\\ \text { s.t. } & I\left(\mathbf{v}_{i} ; y\right)=I\left(\mathbf{v}_{j} ; y\right) \\ & I\left(\mathbf{v}_{i} ; y\right)=I(\mathcal{G} ; y) \end{align}

## 2.2 Encoder

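In GCL, the encoder learns node or graph embeddings for the two views. Many architectures can be used, such as GCN, GAT, and GIN. As with augmentation, the authors state an "optimal view encoder" corollary to guide the choice of encoder.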

Corollary 2. (Optimal View Encoder) Given the optimal views, $\mathbf{v}_i^{*}$, $\mathbf{v}_j^{*}$, for a downstream task $T$ whose goal is to predict a semantic label $y$, the optimal view encoder for view $\mathbf{v}_i^{*}$ is the solution to the following optimization problem:

$\begin{gathered} f_{i}^{*}=\underset{f_{i}}{\arg \min } I\left(f_{i}\left(\mathbf{v}_{i}^{*}\right) ; \mathbf{v}_{i}^{*}\right) \\ \text { s.t. } I\left(f_{i}\left(\mathbf{v}_{i}^{*}\right) ; \mathbf{v}_{j}^{*}\right)=I\left(\mathbf{v}_{i}^{*} ; \mathbf{v}_{j}^{*}\right) \end{gathered}$

## 2.3 Contrastive Mode

Corollary 3. (Optimal Contrastive Mode) Given the latent representations, $\mathbf{z}_i^{*}$, $\mathbf{z}_j^{*}$, extracted by the optimal view encoders, i.e., $\mathbf{z}_i^{*} = f_i^{*}(\mathbf{v}_i^{*})$, $\mathbf{z}_j^{*} = f_j^{*}(\mathbf{v}_j^{*})$, and a downstream task $T$ with label $y$, the optimal contrastive mode is the solution to the following optimization problem, where $c_i$, $c_j$ are the aggregation operations applied to the latent representations:

$\left(c_{i}^{*}, c_{j}^{*}\right)=\underset{\left(c_{i}, c_{j}\right)}{\arg \min }-I\left(c_{i}\left(\mathbf{z}_{i}^{*}\right) ; c_{j}\left(\mathbf{z}_{j}^{*}\right)\right)$
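To make the aggregation operations $c_i$, $c_j$ more concrete, here is a minimal sketch of how different contrastive modes pair up embeddings. It assumes mean pooling as the graph-level readout and uses hypothetical names (`mean_readout`, `contrast_pairs`, and the mode labels `"node-node"` and `"graph-graph"`); none of these come from the paper, and the node-graph case is shown in one direction only.

```python
def mean_readout(z):
    """Average node embeddings into one graph-level vector (assumed readout)."""
    if not z:
        raise ValueError("need at least one node embedding")
    dim = len(z[0])
    return [sum(row[d] for row in z) / len(z) for d in range(dim)]


def contrast_pairs(z_i, z_j, mode):
    """Return the (anchor, target) embedding pairs that a mode would contrast.

    z_i, z_j: lists of node embeddings from the two views (same node order).
    mode: hypothetical label selecting the aggregation applied to each side.
    """
    if mode == "node-node":
        # c_i and c_j are both the identity: node n in view i vs node n in view j.
        return list(zip(z_i, z_j))
    if mode == "graph-graph":
        # c_i and c_j are both readouts: one summary vector per view.
        return [(mean_readout(z_i), mean_readout(z_j))]
    if mode == "node-graph":
        # c_i is the identity, c_j is a readout: every node vs the other view's summary.
        summary_j = mean_readout(z_j)
        return [(node, summary_j) for node in z_i]
    raise ValueError(f"unknown mode: {mode}")
```

Under this picture, choosing a contrastive mode amounts to choosing which of these pairings is fed to the contrastive objective.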

## 2.4 InfoGCL

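• Proposition 1: Given a task $T$ with label $y$ and a set of augmentation strategies $\{q_1(\cdot),q_2(\cdot),\cdots\}$, let $\mathbf v_i,\mathbf v_j$ denote the two generated views. The recommended augmentation strategies $q_i(\cdot)$, $q_j(\cdot)$ are the ones that maximize $I\left(\mathbf{v}_{i} ; y\right)+I\left(\mathbf{v}_{j} ; y\right)-I\left(\mathbf{v}_{i} ; \mathbf{v}_{j}\right)$, i.e., the region $A+B+D$ in Figure 2.
• Proposition 2: Given a task $T$ with label $y$ and a set of encoders $\{f_i^1(\cdot),f_i^2(\cdot),\cdots\}$, let $\mathbf z_i$ denote the encoder output for view $\mathbf v_i$. The optimal encoder should maximize the mutual information among $\mathbf v_i,\mathbf z_i,y$. The same holds for view $\mathbf v_j$.
• Proposition 3: Given a task $T$ with label $y$, the extracted representations $\mathbf z_i,\mathbf z_j$, and a set of aggregation operations $\{c_1(\cdot),c_2(\cdot),\cdots\}$, the optimal contrastive mode $(c_i,c_j)$ should maximize the mutual information among $c_i(\mathbf z_i),c_j(\mathbf z_j),y$.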
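The quantity in Proposition 1 can be illustrated on toy data. The sketch below is not the paper's procedure: it assumes views and labels are small discrete variables so that a plug-in estimate of mutual information is exact, and the names `mutual_information` and `augmentation_score` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    # sum over observed pairs of p(x, y) * log(p(x, y) / (p(x) * p(y)))
    return sum(
        (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in joint.items()
    )


def augmentation_score(v_i, v_j, y):
    """I(v_i; y) + I(v_j; y) - I(v_i; v_j), the quantity in Proposition 1."""
    return (
        mutual_information(v_i, y)
        + mutual_information(v_j, y)
        - mutual_information(v_i, v_j)
    )


# Toy setup (an assumption for illustration): each view keeps the label y
# plus one bit of label-irrelevant noise, a or b.
samples = [(y, a, b) for y in (0, 1) for a in (0, 1) for b in (0, 1)]
labels = [y for y, _, _ in samples]
view_a = [(y, a) for y, a, _ in samples]
view_b = [(y, b) for y, _, b in samples]

diverse = augmentation_score(view_a, view_b, labels)    # views share only y
redundant = augmentation_score(view_a, view_a, labels)  # identical views
```

In this toy case both pairs of views are fully informative about $y$, but the pair that shares only the label has the lower $I(\mathbf v_i;\mathbf v_j)$ and therefore the higher score, which matches the intuition behind the proposition.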

## 2.5 The Role of Negative Sampling

$\mathcal{L}=-\frac{1}{N} \sum_{n=1}^{N} \frac{\mathbf{z}_{i, n}}{\left\|\mathbf{z}_{i, n}\right\|} \cdot \frac{\mathbf{z}_{j, n}}{\left\|\mathbf{z}_{j, n}\right\|}$
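As written, this loss is the negative mean cosine similarity over the $N$ paired embeddings of the two views, and it contains no term for negative samples. A minimal pure-Python sketch of the formula follows; the function name `cosine_alignment_loss` is hypothetical, and embeddings are assumed to be plain lists of floats with non-zero norm.

```python
import math


def cosine_alignment_loss(z_i, z_j):
    """Negative mean cosine similarity between paired embeddings.

    z_i, z_j: lists of N embedding vectors; row n of z_i is paired with row n of z_j.
    """
    if len(z_i) != len(z_j) or not z_i:
        raise ValueError("z_i and z_j must be non-empty and of equal length")
    total = 0.0
    for a, b in zip(z_i, z_j):
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        if norm_a == 0.0 or norm_b == 0.0:
            raise ValueError("embeddings must have non-zero norm")
        dot = sum(x * y for x, y in zip(a, b))
        total += dot / (norm_a * norm_b)
    return -total / len(z_i)
```

The loss ranges from $-1$ (every pair perfectly aligned) to $1$ (every pair pointing in opposite directions), and rescaling any embedding leaves it unchanged.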

# 3. Experiments

## 3.3 InfoGCL Principle

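1. GCA: using graph augmentation gives better downstream performance than not using it, because augmentation achieves a smaller $I(v_i;v_j)$ (Proposition 1).
2. GCA: using different augmentations for the two views gives better performance, because two different augmentations further reduce $I(v_i;v_j)$ (Proposition 1).
3. GCA: node dropping and subgraph sampling are more generally applicable, because compared with attribute masking and edge perturbation they change graph semantics less, so $I(v_i;y)$ and $I(v_j;y)$ are larger (Proposition 1).
4. GCA: edge perturbation helps on social networks but hurts on molecular graphs. The semantics of social networks are fairly robust to edge perturbation, whereas the semantics of some molecular graphs depend heavily on local structure and are sensitive to edge perturbation, which sharply reduces $I(v_i;y)$.
5. MVGRL: the node-graph contrastive mode tends to outperform other contrastive modes, because node-graph contrast extracts more graph structural information, which helps predict the task label (Proposition 3). (I do not fully follow this argument; it seems somewhat far-fetched.)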
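Two of the augmentations named in the list above, node dropping and edge perturbation, can be sketched on a plain edge list. This is only an illustrative sketch under stated assumptions: the graph is undirected and simple, edges are `(u, v)` tuples over nodes `0..num_nodes-1`, and the function names and ratio parameters are hypothetical rather than taken from any of the cited methods.

```python
import random


def drop_nodes(edges, num_nodes, ratio, rng):
    """Drop a random fraction of nodes and every edge incident to them."""
    k = int(num_nodes * ratio)
    dropped = set(rng.sample(range(num_nodes), k))
    kept = [(u, v) for u, v in edges if u not in dropped and v not in dropped]
    return kept, dropped


def perturb_edges(edges, num_nodes, ratio, rng):
    """Remove a random fraction of edges and add the same number of new ones."""
    edge_set = {tuple(sorted(e)) for e in edges}
    k = int(len(edge_set) * ratio)
    removed = set(rng.sample(sorted(edge_set), k))
    # Candidate new edges are node pairs that were not connected originally.
    candidates = [
        (u, v)
        for u in range(num_nodes)
        for v in range(u + 1, num_nodes)
        if (u, v) not in edge_set
    ]
    added = set(rng.sample(candidates, min(k, len(candidates))))
    return sorted((edge_set - removed) | added)


# Usage on a 5-node path graph with a seeded generator for reproducibility.
rng = random.Random(0)
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
view_i, dropped = drop_nodes(path, num_nodes=5, ratio=0.4, rng=rng)
view_j = perturb_edges(path, num_nodes=5, ratio=0.5, rng=rng)
```

Node dropping only ever removes structure, while edge perturbation also inserts edges that were never present, which is one way to see why the latter can damage graphs whose labels depend on exact local structure.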

## 3.4 Negative Sample Ablation

