Statement of purpose

Name: Seong-Hwan Jun

Objective

My research interest is in machine learning. I am particularly interested in developing computationally efficient statistical inference methods to solve problems that arise in large-data settings. My objective in pursuing a PhD is to become an expert researcher in machine learning, able to identify problems suited to machine learning methods and to develop new methods (or extend existing ones) to solve them. By the end of my PhD, I intend to be an ideal candidate for a research position in industry as well as in academia.

Why pursue a PhD in statistics

The current trend in machine learning appears to be toward applying probabilistic and statistical modeling techniques for accurate inference. Efficient computation is also a challenge, as many existing methods do not scale easily to larger data. I realize that to become an expert in machine learning, I need to build a strong foundation in mathematics (specifically, probability theory) and computer science as well as statistics. Although it is more common to approach machine learning from computer science, I have decided to approach it from statistics, for two reasons. First, statisticians use probabilistic models to capture uncertainty for accurate inference; hence, to become a statistician, one needs to be familiar with modern advances in probability theory. Second, to deal with the abundance of data, modern statistics places a growing emphasis on computation. Computational statistics, a relatively new field, is at the frontier of statistics research, driven by the need to apply statistics to large-data settings. Therefore, I concluded that the best way to approach machine learning is from statistics, as it is the only discipline that focuses specifically on all three components (probability theory, computation, and inference) necessary for becoming an expert researcher in machine learning.

Research experience

As part of the research curriculum during my Master's degree, I participated in a weekly machine learning reading group in which members took turns reading a paper and presenting its main results to the group. Topics included computational methods for large datasets, Bayesian computational methods, and non-parametric Bayesian modeling techniques, to name a few. We also read many applied research papers in which these statistical methods are applied to problems in computational linguistics and phylogenetics. The reading group trained me to read research papers and extract their main points efficiently.

I have one publication, titled Entangled Monte Carlo, which was published in the proceedings of the 25th conference on Neural Information Processing Systems (NIPS). I attended the conference, where I gave a spotlight presentation as well as a poster presentation. The paper proposes a method for efficiently distributing the computation of the popular Sequential Monte Carlo method over multiple computing nodes.

The reading group also helped me shape my research interests. Currently, I am interested in the problem of inferring evolutionary relationships between (natural) languages using statistical modeling and computational techniques. I intend to tackle many problems arising in this area of computational linguistics by applying statistical models and machine learning methods.

Another interest of mine is non-parametric statistical methods. I gained an appreciation for this class of methods while attending the NIPS conference, where I noticed that many machine learning researchers apply non-parametric statistical methods, both Bayesian and frequentist, to solve a variety of problems. I was introduced to Bayesian non-parametric methods through the aforementioned reading group; however, I have had little contact with recent developments in non-parametric (frequentist) statistics. Recently, I have often found myself wanting to learn about non-parametric methods and to extend them so that I can apply them in my research. It is one of my newer interests, and I intend to explore it in the future.

I gave a presentation on the Sequential Monte Carlo and Entangled Monte Carlo methods at the SFU-UBC joint seminar in September 2012. I also gave two presentations at the UBC Department of Statistics student seminar. In the first, I introduced fellow students to the basics of C/C++ programming and the GNU Scientific Library (GSL). In the second, I gave a walk-through of how to use Amazon EC2 servers for free and provided tips for running computations on department servers.