|
author | Wyatt C Olney |
title | Automatic Summarization of Source Code for Novice Programmers |
abstract |
Part-of-speech tagging is a well-established problem in computer science. A wide variety of taggers exist that have been trained on English text to extract
part-of-speech information automatically. However, these taggers are traditionally applied only to conventional written English, such as news articles, and many
of them are evaluated on the Wall Street Journal corpus, which consists of such articles. Natural language artifacts also appear in software source code, for
example in method names. This thesis proposes a methodology for comparing these taggers on source code artifacts and evaluating their overall accuracy. It also
presents a potential application of part-of-speech tagging to source code. Specifically, a tool for novice programmers is developed, and the thesis shows how
linguistic information extracted from method names could improve the tool by generating better and more detailed summaries for novices. Such summaries would
help beginning programmers learn how to read and work with code written by others. This is a major component of learning to work with code, especially given the
collaborative nature of many modern software projects. Automatically generated summaries make production-level source code less daunting and easier for a novice
to approach and understand.
|
school | The College of Liberal Arts, Drew University |
degree | B.A. (2016) |
advisor | Emily Hill |
committee | Steven Kass, Patrick Dolan, John Muccigrosso |
full text | WCOlney.pdf |
| |