Nguyen Van Hoang, Lee Pei Xuan, Kevin Leonardo, Calvin Tantio, Luong Quoc Trung, Tan Joon Kai Daniel
School of Computing, National University of Singapore
Table 1: A sample QA pair in the test set.
Contributions
• Constructing LNQA, a QA dataset on lecture notes.
• Examining transfer learning from a QA model pre-trained on a larger dataset (SQuAD) to a smaller dataset (LNQA).
• Examining the improvement in context retrieval when the departments of questions are specified.
• Examining a SOTA sentence classifier for department prediction.
• Examining the reduction in hyper-parameter tuning time when using the Tree-structured Parzen Estimator (TPE).

Hypotheses
• Due to reduced complexity and better generalization, training on LNQA from a scaled-down warm-start model (sd-wsQA) pre-trained on SQuAD improves reader performance compared to the direct inference model (iQA), the cold-start model (csQA), and the full warm-start model (wsQA).
• Narrowing the search space by specifying departments improves retriever performance.
• A SOTA sentence classifier (fastText) can obtain relatively good performance on department prediction for questions.

Document Reader
• Input: question q, paragraph p
• Output: best answer span

Word representations (signals) for each paragraph token (a sketch of these features follows the list):
• GloVe word embedding (the only feature used for q)
• Exact match: 1 if the token can be exactly matched to a question word, 0 otherwise
• Linguistic features: POS, NER, TF
• Aligned question embedding: similarity between p and q tokens
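A minimal sketch of how the paragraph-side features can be assembled, assuming a GloVe lookup table `emb` and a learned projection `align_proj` for the alignment scores (both hypothetical names; the POS/NER/TF features are omitted):

```python
import numpy as np

def paragraph_features(p_tokens, q_tokens, emb, align_proj):
    """Per-token paragraph features: GloVe vector, exact match, aligned question embedding."""
    q_words = {t.lower() for t in q_tokens}
    q_emb = np.stack([emb[t] for t in q_tokens])      # (l, d) question GloVe vectors
    q_proj = align_proj(q_emb)                        # (l, d') projected question vectors

    feats = []
    for tok in p_tokens:
        e = emb[tok]                                  # (d,) GloVe vector
        exact = 1.0 if tok.lower() in q_words else 0.0
        # Aligned question embedding: soft attention over question words.
        scores = q_proj @ align_proj(e[None, :]).ravel()   # (l,) similarity scores
        a = np.exp(scores - scores.max())
        a /= a.sum()
        f_align = a @ q_emb                           # (d,) weighted sum of question vectors
        feats.append(np.concatenate([e, [exact], f_align]))
    return np.stack(feats)                            # (m, 2d + 1) feature matrix
```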
The paragraph feature vectors p̃_1, ..., p̃_m and the question word embeddings q̃_1, ..., q̃_l are encoded with BiLSTMs:

p_1, ..., p_m = BiLSTM(p̃_1, ..., p̃_m)    (2)
q_1, ..., q_l = BiLSTM(q̃_1, ..., q̃_l)    (3)

Two independent classifiers predict the answer start and end positions:

P_start(i) ∝ exp(p_i W_s q)    (4)
P_end(i) ∝ exp(p_i W_e q)    (5)

where W_s and W_e are the weight matrices to be trained and q is a single vector summarizing the question encoding q_1, ..., q_l. We choose the best span from token i to token i' such that i ≤ i' ≤ i + 15 and P_start(i) × P_end(i') is maximized, as sketched below.
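A minimal sketch of the start/end scoring in Eqs. (4)-(5) and the span decoding rule, assuming the paragraph encodings, the aggregated question vector, and the trained matrices W_s, W_e are given as NumPy arrays:

```python
import numpy as np

def best_span(p_vecs, q_vec, W_s, W_e, max_len=15):
    """Score start/end positions (Eqs. 4-5) and return the best span (i, i')."""
    s_scores = p_vecs @ W_s @ q_vec                    # unnormalized start scores
    e_scores = p_vecs @ W_e @ q_vec                    # unnormalized end scores
    p_start = np.exp(s_scores - s_scores.max()); p_start /= p_start.sum()
    p_end = np.exp(e_scores - e_scores.max()); p_end /= p_end.sum()

    best, best_score = (0, 0), -1.0
    for i in range(len(p_start)):
        # Only consider spans of at most max_len + 1 tokens (i <= i' <= i + 15).
        for j in range(i, min(i + max_len, len(p_end) - 1) + 1):
            score = p_start[i] * p_end[j]              # maximize P_start(i) * P_end(i')
            if score > best_score:
                best, best_score = (i, j), score
    return best
```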
Approach & Results

Constructing the LNQA dataset:
• Outsourcing the data-gathering process to the public using Amazon Mechanical Turk (MTurk).

Department specification on the Document Retriever - experiments:
• Retrieving with and without the department specified (a retrieval sketch follows Table 2).

Data-set   Dept   Rec@1   Rec@5
SQuAD             0.74    0.91
LN-test           0.93    0.98
LN                0.81    0.97
LN         X      0.91    0.98

Table 2: Recall@k of the retriever on different datasets.
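Department specification can be viewed as filtering the retriever's candidate pool before ranking. A minimal Recall@k sketch under that reading, where the `rank` function and the `dept`/`gold_paragraph_id` fields are hypothetical names rather than the actual implementation:

```python
def recall_at_k(questions, paragraphs, rank, k=5, dept_of=None):
    """Recall@k of the retriever, optionally restricted to a predicted department.

    rank(text, paras) -> list of paragraph ids ordered by relevance (assumed interface)
    dept_of(text)     -> predicted department, or None to disable filtering
    """
    hits = 0
    for q in questions:
        cands = paragraphs
        if dept_of is not None:
            dept = dept_of(q["text"])                          # narrow the search space
            cands = [p for p in paragraphs if p["dept"] == dept]
        top_k = rank(q["text"], cands)[:k]
        hits += int(q["gold_paragraph_id"] in top_k)
    return hits / len(questions)
```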
Department prediction: fastText, a SOTA sentence classifier, is trained to predict the department of each question (a minimal training sketch follows Table 3).

Classifier   Data-set   Rec@1    Rec@5
fastText     LN         0.7231   0.86

Table 3: Recall@k of the question classifier on department prediction.
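A minimal sketch of training and querying a fastText department classifier with the official Python bindings; the file name and hyper-parameter values are illustrative, not the ones used here:

```python
import fasttext

# Training file (hypothetical path): one question per line, prefixed with its
# department label in fastText's "__label__" format, e.g.
#   __label__CS  What is the time complexity of quicksort?
model = fasttext.train_supervised(
    input="lnqa_dept_train.txt",      # assumed file name
    epoch=25, lr=0.5, wordNgrams=2,   # illustrative hyper-parameters
)

# Recall@5 asks whether the gold department appears among the top 5 predictions.
labels, probs = model.predict("Explain the pumping lemma for regular languages", k=5)
print(labels)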
Transfer learning on the Document Reader - experiments:
• Datasets: SQuAD (S), LNQA (L).
• Models: direct inference (iQA), cold start (csQA), and warm start from DrQA / sdDrQA (wsQA / sd-wsQA); a fine-tuning sketch follows Table 4.

Model     Pre      Train   Test   EM     F1
DrQA      -        S       S      69.5   78.8
sdDrQA    -        S       S      62.9   72.6
iQA       -        S       L      13.5   43.3
csQA      -        L       L      9.9    41.6
wsQA      DrQA     L       L      26.1   56.2
sd-wsQA   sdDrQA   L       L      28.7   56.9

Table 4: Exact match and F1 scores of different QA models.
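Warm starting (wsQA / sd-wsQA) amounts to initializing the reader from a SQuAD-trained checkpoint before fine-tuning on LNQA. A PyTorch-style sketch under that reading; `ReaderModel`, the checkpoint path, and the hidden sizes are hypothetical stand-ins, not the actual DrQA code:

```python
import torch

from reader import ReaderModel          # hypothetical module and class

def warm_start(squad_ckpt="drqa_squad.pt", scaled_down=False):
    """Build an LNQA reader initialized from a SQuAD-trained checkpoint (wsQA / sd-wsQA)."""
    hidden = 64 if scaled_down else 128  # illustrative sizes only
    model = ReaderModel(hidden_size=hidden)
    pretrained = torch.load(squad_ckpt, map_location="cpu")
    own = model.state_dict()
    # Copy every compatible tensor; layers whose shapes changed in the
    # scaled-down model keep their fresh initialization.
    compatible = {k: v for k, v in pretrained.items()
                  if k in own and v.shape == own[k].shape}
    model.load_state_dict(compatible, strict=False)
    return model

model = warm_start(scaled_down=True)     # sd-wsQA-style initialization
# ...then fine-tune on LNQA with the usual training loop.
```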
Hyper-parameter tuning:
• Grid search (GS)
• Tree-structured Parzen Estimator (TPE); a minimal tuning sketch follows Table 5.

Method   Time     EM     F1
GS       3d 20h   24.4   55.22
TPE      <1d      28.7   57.79

Table 5: Performance of different tuning methods.
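A minimal sketch of TPE-based tuning with the hyperopt library; the objective, the search space, and the `train_and_eval` helper are assumptions for illustration, not the actual configuration:

```python
from hyperopt import Trials, fmin, hp, tpe

# Hypothetical objective: train the reader with the given hyper-parameters on
# LNQA and return a value to minimize (negative dev F1). train_and_eval is assumed.
def objective(params):
    f1 = train_and_eval(lr=params["lr"],
                        dropout=params["dropout"],
                        hidden=int(params["hidden"]))
    return -f1

# Illustrative search space, not the one used in this work.
space = {
    "lr": hp.loguniform("lr", -9, -3),
    "dropout": hp.uniform("dropout", 0.1, 0.5),
    "hidden": hp.quniform("hidden", 64, 256, 32),
}

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50, trials=trials)
print(best)
```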
References

[1] Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. Reading Wikipedia to answer open-domain questions. arXiv preprint arXiv:1704.00051, 2017.

Acknowledgements

We would like to extend our gratitude to the CS3244 teaching team for the opportunity to embark on this project, our anonymous reviewers for their invaluable feedback, and our Mechanical Turk respondents for their great work on our HITs. Special thanks go out to MIT OpenCourseWare for making educational materials openly accessible, without which LNQA would not have been built.