Iowa State University, Com S 472 project: NewsPet

Summary
NewsPet is a news-reader web application that categorizes RSS news items using a trainable engine.

Goals
The application monitors a configurable set of RSS feeds for each user. For each news item retrieved, the application places a link to it into one of several user-specified categories, depending on the item's text content. A "trash" category always exists and holds items that fit no other category. The user can view each category's news items through a web interface. The user can also re-categorize news items, which provides performance feedback to the application.

Application overview: Architecture sketches

Approach

Design

Categorization
For each news item read, a vector of per-category probabilities is retrieved from a Naive Bayes classifier. The most probable category is assigned to the item, provided its probability meets a lower bound that depends on the number of categories. Items that do not meet the bound fall into the trash category.

Feedback
For every item in every category, the user can either confirm that the article is accurately categorized, so that the system becomes more confident in accepting documents like it, or indicate that the article should be categorized differently.

Logistics
The main categorization portion of the application uses a Naive Bayes classifier, with Mallet as the library. We use Java for the classification and classifier training services, and Django, a Python web framework, for the web-based front end. Informa is used as the RSS retriever and parser.

Testing
We have tested the classification logic of our application with data from the Reuters-21578 collection.

Report
The most recent draft of the report for this project can be viewed here. Presentation slides are viewable here.
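
Example: category assignment sketch
The categorization rule described under Design (take the most probable category, but only if it clears a lower bound tied to the number of categories, and otherwise fall back to trash) might look roughly like the Python sketch below. This is an illustration, not the project's code: the classification service is written in Java, and the page does not give the actual bound. The names `assign_category` and `lower_bound`, the `margin` parameter, and the formula `margin / n` are all assumptions made for this example.

```python
from typing import Dict

TRASH = "trash"  # catch-all category for otherwise uncategorized items


def lower_bound(num_categories: int, margin: float = 1.5) -> float:
    """Hypothetical threshold: a multiple of the uniform probability 1/n.

    The source only says the bound depends on the number of categories;
    this particular formula is an assumption chosen for illustration.
    """
    if num_categories < 1:
        raise ValueError("need at least one category")
    return min(1.0, margin / num_categories)


def assign_category(probabilities: Dict[str, float], margin: float = 1.5) -> str:
    """Return the most probable category, or TRASH if it misses the bound.

    `probabilities` maps each user-specified category to the probability
    reported by the classifier for one news item.
    """
    # The trash category is a fallback, not a classifier target.
    candidates = {c: p for c, p in probabilities.items() if c != TRASH}
    if not candidates:
        return TRASH

    best = max(candidates, key=candidates.get)
    if candidates[best] >= lower_bound(len(candidates), margin):
        return best
    return TRASH
```

With three categories and the assumed margin of 1.5, the bound is 0.5, so an item scored 0.7 for "sports" is filed under "sports", while an item whose best score is 0.4 goes to trash. When the user later re-categorizes such an item, that correction would be fed back to the classifier as a new training example.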