I presented my candidacy exam on May 1, 2019. The overall theme was word embeddings; the specific topics were word embedding methods, analysis of word embeddings, and word embedding theory.
Slides from my exam can be found here. References are listed below. Each paper's title links to the version available while I was preparing for the exam; for preprints that were later published, a link to the published version is also provided.
Experiments with LSA Scoring: Optimal Rank and Basis. John Caron. 2000.
Distributed Representations of Words and Phrases and their Compositionality. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado and Jeffrey Dean. NeurIPS 2013.
Neural Word Embedding as Implicit Matrix Factorization. Omer Levy and Yoav Goldberg. NeurIPS 2014.
GloVe: Global Vectors for Word Representation. Jeffrey Pennington, Richard Socher and Christopher Manning. EMNLP 2014.
Dependency-Based Word Embeddings. Omer Levy and Yoav Goldberg. ACL 2014.
Word Representations via Gaussian Embedding. Luke Vilnis and Andrew McCallum. ICLR 2015.
Predicting human similarity judgments with distributional models: The value of word associations. Simon De Deyne, Amy Perfors and Daniel Navarro. COLING 2016.
Poincaré Embeddings for Learning Hierarchical Representations. Maximilian Nickel and Douwe Kiela. NeurIPS 2017.
Deep contextualized word representations. Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee and Luke Zettlemoyer. NAACL 2018.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. Preprint, 2018. (Published at NAACL 2019.)
How much do word embeddings encode about syntax? Jacob Andreas and Dan Klein. ACL 2014.
Improving Distributional Similarity with Lessons Learned from Word Embeddings. Omer Levy, Yoav Goldberg and Ido Dagan. TACL 2015.
Orthogonality of Syntax and Semantics within Distributional Spaces. Jeff Mitchell and Mark Steedman. ACL-IJCNLP 2015.
Factorization of Latent Variables in Distributional Semantic Models. Arvid Österlund, David Ödling and Magnus Sahlgren. EMNLP 2015.
The Role of Context Types and Dimensionality in Learning Word Embeddings. Oren Melamud, David McClosky, Siddharth Patwardhan and Mohit Bansal. NAACL 2016.
The (Too Many) Problems of Analogical Reasoning with Word Vectors. Anna Rogers, Aleksandr Drozd and Bofang Li. *SEM 2017.
What Analogies Reveal about Word Vectors and their Compositionality. Gregory Finley, Stephanie Farmer and Serguei Pakhomov. *SEM 2017.
The strange geometry of skip-gram with negative sampling. David Mimno and Laure Thompson. EMNLP 2017.
Factors Influencing the Surprising Instability of Word Embeddings. Laura Wendlandt, Jonathan Kummerfeld and Rada Mihalcea. NAACL 2018.
A Structural Probe for Finding Syntax in Word Representations. John Hewitt and Christopher Manning. NAACL 2019.
Model-based Word Embeddings from Decompositions of Count Matrices. Karl Stratos, Michael Collins and Daniel Hsu. ACL-IJCNLP 2015.
A Latent Variable Model Approach to PMI-based Word Embeddings. Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma and Andrej Risteski. TACL 2016.
Word Embeddings as Metric Recovery in Semantic Spaces. Tatsunori Hashimoto, David Alvarez-Melis and Tommi Jaakkola. TACL 2016.
Skip-Gram − Zipf + Uniform = Vector Additivity. Alex Gittens, Dimitris Achlioptas and Michael Mahoney. ACL 2017.
Linear Algebraic Structure of Word Senses, with Applications to Polysemy. Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma and Andrej Risteski. TACL 2018.
On the Dimensionality of Word Embedding. Zi Yin and Yuanyuan Shen. NeurIPS 2018.
Gromov-Wasserstein Alignment of Word Embedding Spaces. David Alvarez-Melis and Tommi Jaakkola. EMNLP 2018.
Towards Understanding Linear Word Analogies. Kawin Ethayarajh, David Duvenaud and Graeme Hirst. Preprint, 2018. (Published at ACL 2019.)
Analogies Explained: Towards Understanding Word Embeddings. Carl Allen and Timothy Hospedales. Preprint, 2019. (Published at ICML 2019.)
Understanding Composition of Word Embeddings via Tensor Decomposition. Abraham Frandsen and Rong Ge. Preprint, 2019. (Published at ICLR 2019.)