Edit distances and sequence alignment
During my senior year, I finally took a class on advanced C++. Surprisingly enough, it didn’t seem nearly as hard as the first one I had to struggle through a few years ago, and I ended up having a lot of fun with it.
As my final project, I decided to work on edit distances and implemented the Wagner–Fischer algorithm as an instance of dynamic programming. Later on, I expanded the project to also cover the Needleman–Wunsch algorithm for global sequence alignment.
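For illustration, here is a minimal Python sketch of both dynamic programs. It is not the project's C++ code: the function names and the scoring parameters (`match`, `mismatch`, `gap`) are assumptions chosen for the example. Both functions fill a table over prefixes of the two strings. Wagner–Fischer minimizes edit cost, while Needleman–Wunsch maximizes an alignment score.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance via the Wagner-Fischer dynamic program."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = cost of turning source[:i] into target[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i characters
    for j in range(cols):
        dist[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[-1][-1]


def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score (Needleman-Wunsch), without traceback."""
    score = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, len(b) + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diagonal = score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch
            )
            score[i][j] = max(
                diagonal,
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )
    return score[-1][-1]


print(edit_distance("kitten", "sitting"))  # 3
```

With `match=0`, `mismatch=-1`, and `gap=-1`, the alignment score is exactly the negated edit distance, which shows how closely the two algorithms are related.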
Semantic role labeling using linear-chain CRF
This was my very last undergrad project, for a class on advanced language modeling where we discussed the theoretical foundations of hidden Markov models, the Viterbi and EM algorithms, log-linear models, maximum entropy Markov models (MEMMs), and conditional random fields (CRFs).
String to semantic graph alignment
For my undergrad thesis, I started working on semantic parsing: the problem of mapping natural language strings to meaning representations. To train a semantic parser from English to Abstract Meaning Representation (AMR), we first need to know which phrases in the input sentence evoke which concepts in the corresponding AMR graph. The project aimed to build an English/AMR aligner that solves this task automatically.
At WayBlazer, our product manager kept joking about how we needed a “dish AI” to review our catered lunches every day. This is it, featuring a preprocessed review data set, topic models, a Markov chain generator, and a Flask API to put it all together!
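As a rough idea of what a Markov chain generator does, here is a minimal word-level sketch in Python. It is not the project's code: the function names, the `<s>`/`</s>` boundary markers, and the toy reviews are assumptions made for this example, and it expects at least one non-empty training sentence.

```python
import random
from collections import defaultdict


def build_chain(sentences):
    """Map each word to the list of words observed directly after it."""
    chain = defaultdict(list)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for current, following in zip(tokens, tokens[1:]):
            chain[current].append(following)
    return chain


def generate(chain, rng, max_words=20):
    """Walk the chain from the start marker until the end marker or max_words."""
    word, output = "<s>", []
    for _ in range(max_words):
        # Repeated successors make frequent transitions more likely.
        word = rng.choice(chain[word])
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)


# Toy data, invented for illustration only.
reviews = ["the soup was great", "the salad was cold"]
chain = build_chain(reviews)
print(generate(chain, random.Random(0)))
```

Each generated sentence recombines transitions seen in the training reviews, so this toy chain can also produce "the soup was cold".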
Semantic dependency graph parsing
For a class on semantic dependency graph parsing, I wrote a short script that computes statistics for semantic dependency graphs and plots the distribution of words per indegree and outdegree. As my final project, I submitted a comprehensive review of Abstract Meaning Representation (AMR), a set of English sentences paired with simple, readable semantic representations.
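The degree statistics can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original script: the function name, the `(head, dependent)` edge format, and the toy graph are all hypothetical, and plotting is left out.

```python
from collections import Counter


def degree_distributions(nodes, edges):
    """Count how many words have each indegree and each outdegree.

    Assumes edges are (head, dependent) pairs over the given nodes.
    """
    # Start every word at zero so unattached words are counted too.
    indegree = Counter({node: 0 for node in nodes})
    outdegree = Counter({node: 0 for node in nodes})
    for head, dependent in edges:
        outdegree[head] += 1
        indegree[dependent] += 1
    # Invert: degree value -> number of words with that degree.
    words_per_indegree = Counter(indegree.values())
    words_per_outdegree = Counter(outdegree.values())
    return words_per_indegree, words_per_outdegree


# Toy graph, invented for illustration: "John" is an argument of both verbs.
nodes = ["John", "wants", "to", "sleep"]
edges = [("wants", "John"), ("wants", "sleep"), ("sleep", "John")]
per_in, per_out = degree_distributions(nodes, edges)
print(sorted(per_in.items()))
print(sorted(per_out.items()))
```

Unlike in a dependency tree, a word in a semantic dependency graph can have an indegree above one or no incoming edge at all, which is why the full distribution is worth plotting.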
Research internship at Textkernel
In 2014, I was a research intern at Textkernel, where we explored new methods for improving resume parsing of multilingual documents.
In order to extract structured information in the form of specific phrases like name or address, we adopted the probabilistic conditional random fields (CRF) framework. In addition, we experimented with a novel approach that integrates continuous vector representations of words as input features for such a model.
Word meaning in context
In a nutshell, they attempt to model the intuition that word meaning is represented as a probability distribution over a set of latent senses, and thus modulated by context. They employ two different models: the first based on non-negative matrix factorization (NMF), and the second implementing Latent Dirichlet Allocation (LDA).
I studied abroad and learned some linguistics:
Consider an example where a zombie has died and been reanimated, and John drowns him.
Presentation slides may or may not help to get to the bottom of this!
Shortly after I learned that computational semantics was a thing, I implemented word similarity according to Dekang Lin (1998).
I took some classes on psycholinguistics, where I presented a range of interesting papers including “Expectation-based syntactic comprehension” (Levy, 2008) and “Dependency Locality Theory” (DLT) (Gibson, 2000). Slides below!