Can Generative Language Models Learn to Generate Valid Statements from Premises?
- Investigated whether a state-of-the-art (SOTA) pretrained generative language model (Google's T5) can learn to generate valid statements from two premise statements (e.g., “steel is a metal” + “metal is a thermal conductor” -> “steel is a thermal conductor”); a minimal fine-tuning sketch follows this list.
- We used the QASC dataset because it is in natural language and its premise–conclusion pairs are explicitly annotated.
- We found that the model generates simple statements well but fails on more complex ones, such as those requiring monotonicity reasoning or rephrasing.
- The project is implemented in Python and PyTorch.
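
A minimal sketch of the fine-tuning setup described above, assuming the Hugging Face `transformers` interface on top of PyTorch; the `combine:` task prefix, model size, and hyperparameters are illustrative assumptions, not the project's exact configuration.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
model.train()

# A QASC-style premise pair and its gold conclusion (example from above).
source = "combine: steel is a metal. metal is a thermal conductor."
target = "steel is a thermal conductor."

inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

# One sequence-to-sequence training step with cross-entropy loss.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()

# Inference: generate a candidate conclusion for the premise pair.
model.eval()
with torch.no_grad():
    generated = model.generate(**inputs, max_length=32)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```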
Hybrid Polarity Classifier for Biomedical Events
- Compared an LSTM-based classifier with a linguistic-knowledge-informed (rule-based) classifier for polarity detection of biomedical events.
- Integrated the LSTM classifier into Reach (an information-extraction system for biomedical publications, implemented in Scala).
- Implemented a hybrid classifier that combines the LSTM and linguistic classifiers. The router counts negation words in the text: if negation words are frequent, it dispatches to the LSTM classifier; otherwise it falls back to the linguistic classifier (see the sketch after this list).
- This project is implemented in Python, DyNet, and Scala.
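
A minimal sketch of the routing rule described above; the negation lexicon, the threshold, and the classifier callables (`lstm_clf`, `linguistic_clf`) are hypothetical placeholders, not the project's actual values.

```python
# Hypothetical negation-cue lexicon; the real system's list differs.
NEGATION_CUES = {"not", "no", "never", "without", "fails", "unable",
                 "lack", "absence", "neither", "nor"}
NEGATION_THRESHOLD = 2  # assumed cutoff for "many" negation words

def count_negations(text: str) -> int:
    """Count negation cue words in a whitespace-tokenized text."""
    return sum(token in NEGATION_CUES for token in text.lower().split())

def hybrid_polarity(text: str, lstm_clf, linguistic_clf) -> str:
    """Route to the LSTM classifier when negation cues are frequent;
    otherwise fall back to the linguistic (rule-based) classifier."""
    if count_negations(text) >= NEGATION_THRESHOLD:
        return lstm_clf(text)       # better at compositional negation
    return linguistic_clf(text)     # precise on simple, low-negation cases
```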