In this post, I will outline some important aspects of this language and vision paper. From a bird’s-eye view, the authors try to unify two modalities (language and vision) in the same semantic space, to obtain a generalized model that performs well on different language and vision tasks such as visual…