Work

- Efficient Self-supervised Vision Transformers: learning visual representations from unlabelled images
- Self-supervised sentence embedding: organizing sentences via pre-trained modeling of a latent space
- Vision and Language Pre-training: enriching cross-modal representations by connecting image understanding with rich semantics from language

Fun

- Planning reserved for the next project