CSC581 Homepage
Topics in AI: Introduction to Machine Learning with Support Vector Machines
Spring 2016
Instructor:
Dr. Lutz Hamel
Tyler, Rm 251
Office Hours: Tuesday 12:30-1:30pm and Thursday 2-3pm
email: hamel@cs.uri.edu
Description:
Support vector machines (SVMs) belong to a new class of machine learning
algorithms with their origins firmly rooted in statistical learning theory. Due
to this strong theoretical foundation, these algorithms possess desirable
properties such as the ability to learn from very small sample sets and a firm
estimation of the generalization capacity of the learned model. These
properties make this new class of learning algorithms extremely attractive to
the practitioner who is frequently faced with "not enough data" and needs to
understand "how good a constructed model" actually is. The fact that SVMs have
surpassed the performance of artificial neural networks in many areas such as
text categorization, speech recognition and bioinformatics bears witness to the
power of this new class of learning algorithms.
This course is an introduction to machine learning and SVMs. We begin by
framing the notion of machine learning and then develop basic concepts such as
hyperplanes, feature spaces and kernels necessary for the construction of SVMs.
Once the theoretical groundwork has been laid we look at practical examples
where this class of algorithms can be applied. Here we use machine learning as
a knowledge discovery tool. We will use the statistical computing environment R
for our experiments.
The goals of this course are for you,
 To have a basic understanding of machine learning and knowledge discovery.
 To be familiar with the mathematical framework of describing data and constructing models.
 To be able to apply SVM packages in R to real-world problems.
Announcements:
*** The final is due Tuesday 5/10 @ midnight in Sakai ***
[4/14/16] posted assignment #8
[4/7/16] posted assignment #7
[3/30/16] Posted assignment #6
[3/16/16] ** Midterm: due Thursday 3/31 in Sakai **
[3/14/16] ** no class Thursday 3/17 **
[3/14/16] Hint for assignment #5: for data sets with a large number of independent variables
you don't have to plot all the distributions. You should only plot the distributions of variables
that are "interesting", i.e. not normally or close to normally distributed.
[3/9/16] posted solutions to #4.
[3/8/16] posted assignment #5.
[2/29/16] posted assignment #4.
[2/29/16] Posted solutions for assignment #3.
[2/24/16] Hint for assignment #3: a b value interval of [-20,20] with step .1 is sufficient, and
you can plot the decision surface with its corresponding margin as:
plot.decision.surface <- function(w,b) {
    slope = -(w[1]/w[2])
    offset = (b)/w[2]
    offset1 = (b+1)/w[2]
    offset2 = (b-1)/w[2]
    cat("slope = ", slope, "offset = ", offset,"\n")
    # plot the decision surface with supporting hyperplanes
    abline(offset,slope,lty="solid",lwd=2,col="green")
    abline(offset1,slope,lty="dashed")
    abline(offset2,slope,lty="dashed")
}
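The slope and offsets in the hint above come from solving the line equation for the second coordinate. The sketch below assumes the decision surface is written as w . x = b, with supporting hyperplanes w . x = b + 1 and w . x = b - 1; that sign convention is an assumption, so adjust the signs if your formulation differs. The helper name margin.lines and the values of w and b are made up for illustration.

```r
# Hypothetical helper: returns the slope and the three intercepts that
# abline() needs, assuming the surface w . x = b in two dimensions.
margin.lines <- function(w, b) {
    stopifnot(length(w) == 2, w[2] != 0)   # a vertical line has no slope/intercept form
    list(slope   = -w[1] / w[2],
         offset  = b / w[2],          # decision surface w . x = b
         offset1 = (b + 1) / w[2],    # supporting hyperplane w . x = b + 1
         offset2 = (b - 1) / w[2])    # supporting hyperplane w . x = b - 1
}

# made-up normal vector and offset term
w <- c(1, 2)
b <- 3
l <- margin.lines(w, b)

# pick one point on each line and check that it satisfies its equation
x1 <- 4
on.surface <- c(x1, l$slope * x1 + l$offset)
on.upper   <- c(x1, l$slope * x1 + l$offset1)
on.lower   <- c(x1, l$slope * x1 + l$offset2)
cat("w . x on the decision surface:", sum(w * on.surface), "\n")
cat("w . x on the upper hyperplane:", sum(w * on.upper), "\n")
cat("w . x on the lower hyperplane:", sum(w * on.lower), "\n")
```

If the three dot products come out as b, b + 1 and b - 1, the lines drawn by abline() are the ones you intended.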
[2/20/16] posted solutions to assignment #2
[2/17/16] posted assignment #3
[2/3/16] posted assignment #2
[2/2/16] here are two data sets to use for 1.4: mammals and
biomedical
[1/29/16] CSC581 Sakai page is now live.
[1/29/16] A majority label classifier is a model that ignores all information except
the number of occurrences of each label in the target attribute. The
model itself is a function that, regardless of the object you hand it, always returns the
label with the largest number of occurrences: the majority label.
Here are some R hints for assignment #1, problem 1.4. Assume that training.df is a
data frame whose last attribute is a categorical attribute (an attribute with labels). The following
R code computes the majority label for that attribute:
n <- ncol(training.df) # number of columns in a frame
target.attribute <- training.df[[n]] # another way of retrieving columns from a frame using the [[ ]] notation
target.levels <- table(target.attribute) # tabulate the levels in the target attribute
ix <- which.max(target.levels) # find out which level appears most often
majority.label <- names(target.levels[ix]) # convert the level descriptor into a string
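To see what each step of the hint produces, here is a small worked run. The data frame below is made up purely for illustration; only the steps themselves come from the hint.

```r
# made-up training data; the last column is the categorical target
training.df <- data.frame(
    legs  = c(4, 2, 4, 4, 2),
    label = factor(c("mammal", "bird", "mammal", "mammal", "bird")))

n <- ncol(training.df)                      # the target is the last column
target.attribute <- training.df[[n]]
target.levels <- table(target.attribute)    # counts per label
print(target.levels)

ix <- which.max(target.levels)              # position of the largest count
majority.label <- names(target.levels[ix])  # the label as a string
print(majority.label)
```

Note that which.max returns the first maximum, so when two labels are tied the one that comes first in the table wins.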
Here is a learner that constructs a model that
given an appropriate object will always return the first label found in the training data set.
# The file contains a function 'learner' that constructs
# a model function based on the first label of the
# dependent attribute in the training data that it finds.
# The learner makes the assumption that the
# dependent attribute is always the last column
# in the training data.
#
# use:
# model <- learner(training.df) # to build the model
# model(x) # to make some prediction of object x
learner <- function(training.df) {
    # make sure we are actually handed a data frame
    if (!is.data.frame(training.df))
        stop("not a data frame")
    # find the number of columns
    n <- ncol(training.df)
    # find the first label in the training data
    label <- training.df[[1,n]]
    # build our model
    # our model is a **function** that given any object always returns the label that appeared first in the
    # training data
    function(x) label
}
Here is an example of how to use this learner and the model it builds.
Assume that the above code was saved in the file 'firstlabel.r' in some directory.
### get the current working directory
> getwd()
[1] "/assignment #1"
### my function is saved in the 'code' subfolder
> setwd("code")
### read in the function definition
> source("firstlabel.r")
### let's make sure the learner is what we expect it to be...
> learner
function(training.df) {
    # make sure we are actually handed a data frame
    if (!is.data.frame(training.df))
        stop("not a data frame")
    # find the number of columns
    n <- ncol(training.df)
    # find the first label in the training data
    label <- training.df[[1,n]]
    # build our model
    # our model always returns the label that appeared first in the
    # training data
    function(x) label
}
### looks good ... load a data set
> data(iris)
### build a model
> m <- learner(iris)
### let's take a look at the model: the model consists of a function with an appropriate
### environment that has the variable 'label' defined.
> m
function(x) label
<environment: 0x100d0cdb8>
### build a data frame that only has object descriptions, no labels
> objects <- subset(iris,select=-Species)
### apply the model to the first object ... note the notation!!
> m(objects[[1,]])
[1] setosa
Levels: setosa versicolor virginica
### apply the model to the 100th object
> m(objects[[100,]])
[1] setosa
Levels: setosa versicolor virginica
Now, putting all of this together you should be able to solve 1.4.
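The model printout above shows a function together with an environment. As a minimal sketch of why that matters, the snippet below builds a model the same way and looks inside its environment. The name make.model is a hypothetical stand-in for the learner, not part of the assignment code.

```r
# Hypothetical stand-in for 'learner': builds a model that always
# returns the label it was constructed with.
make.model <- function(label) {
    function(x) label
}

m <- make.model("setosa")

# the model's environment is where the variable 'label' lives
print(ls(environment(m)))
print(get("label", envir = environment(m)))

# the argument x is never used, so any object gives the same answer
print(m(42))
print(m("anything"))
```

Because the label is stored in the model's own environment, the model keeps working after the training data has been removed from the workspace.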
[1/28/16] Posted assignment #1, see below
[1/26/16] Welcome!
Documents of Interest:
Data Sets:
Many of the packages above have accompanying data sets. But the
premier source for experimental machine learning data sets is the UCI
Machine Learning Repository. The Statlib library
at CMU is another great place to look for data.
Assignments:

Assignment #1: Read Chapter 1, Read Appendix B. Do problems 1.1, 1.2, and 1.4. For problem 1.4
write your program in R and demonstrate that your program works with at least two different data sets.
Hand in your source code, your data sets, and your example runs.
Due in Sakai 2/3.

Assignment #2: Read Chapters 2 through 5. Do problem 3.2 (prove the identities in Table 3.3), problem 3.4, and
problem 5.3. Hand in your code and your runs (you can use this dataset for training).
Due in Sakai 2/17.

Assignment #3: Read Chapter 6. Do problem 6.2. You can reuse your code from assignment #2 for parts 1
and 2 (or the code given on the solutions page). Use the QP solver given in file solveQP.r.
Submit copies of your code, your runs, and your analysis.
Due in Sakai 2/24.

Assignment #4: Read Chapter 7. Do problems 7.1, 7.3, 7.5. Due in Sakai Monday 3/7.

Assignment #5: Midterm Data Proposal. Due in Sakai Monday 3/14.

Assignment #6: Do Problem 11.2.
Use the svm function as implemented
in e1071, but only use it in binary mode. That is, you are supposed to implement the multiclass framework
around the binary svm implementation. For part b, compare your multiclass implementation
for the R svm to the multiclass implementation of the e1071 package using the iris data set.
Due in Sakai Wednesday 4/6.
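As a starting point, here is a minimal sketch of one way to keep svm in binary mode on a three-class data set: relabel the target so that it has only two values. The column name target, the choice of "versicolor" against the rest, and the radial kernel are illustrative assumptions, not requirements of the assignment. The sketch assumes the e1071 package is installed.

```r
library(e1071)
data(iris)

# binary target: is the flower versicolor or not?
# ('target' is a made-up column name for this sketch)
binary.df <- data.frame(
    iris[, 1:4],
    target = factor(ifelse(iris$Species == "versicolor", "versicolor", "rest")))

# with a two-level factor as the response, svm trains a binary classifier
model <- svm(target ~ ., data = binary.df, kernel = "radial")

# training predictions and the corresponding confusion matrix
predicted <- predict(model, binary.df)
confusion <- table(actual = binary.df$target, predicted = predicted)
print(confusion)
```

How the binary models are combined into a multiclass decision is the part you are expected to design yourself.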

Assignment #7: Read The Backprop Algorithm for ANNs. Install the 'neuralnet' package in R. Transform the iris data set into a 3-class numeric data set
appropriate for training MLPs. Experiment with building MLPs. Build an MLP with the smallest **training classification error**. Hint: the plot function
and the model itself report the MSE network error on the training set. Can you get the classification error down to 0? What is the relationship between
the MSE and the classification error? You can vary the number of hidden
units to give the network different learning abilities. Write a brief report describing your findings. Due in Sakai Thursday 4/14. Here is some example code
to get you going:
# load our data set
data(iris)
# make sure the ANN library is available
library(neuralnet)
# convert the labels into numeric labels and put them into a data frame
Species.numeric <- as.numeric(iris$Species)
iris.df <- data.frame(iris,Species.numeric)
# train a neural network with two hidden nodes
net <- neuralnet(Species.numeric ~ Sepal.Width+Sepal.Length+Petal.Width+Petal.Length,iris.df,hidden=2)
# display the ANN
plot(net)
# report the ANN
net
# the training predictions from the ANN are numeric values, turn them into labels by rounding
result <- round(net$net.result[[1]])
# print the confusion matrix
table(iris.df$Species.numeric,result)
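The confusion matrix lets you read off the classification error by eye, but a single number is easier to compare across networks. The following sketch computes it from rounded outputs. The vectors actual and output are made-up values standing in for the numeric labels and the network outputs; they are not results from a trained network.

```r
# made-up class labels (1, 2, 3) and made-up numeric network outputs
actual <- c(1, 1, 2, 2, 3, 3)
output <- c(1.1, 0.8, 2.4, 1.4, 2.9, 3.3)

# turn the numeric outputs into labels by rounding
result <- round(output)

# confusion matrix: rows are the actual labels, columns the predicted ones
confusion <- table(actual, result)
print(confusion)

# fraction of training objects whose predicted label is wrong
classification.error <- mean(result != actual)
print(classification.error)
```

Comparing the vectors directly rather than summing the diagonal of the table also works when rounding produces a value that is not one of the three class labels.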
Assignment #8: Implement the ID3 algorithm as given in class and test it on the Play Tennis data set. Your report should
include: the confusion matrix showing your classification of the training data; a copy of the tree you generated for the data set, and your source code.
Due in Sakai Thursday 4/21 @ midnight. Extra Credit: Extend the basic ID3 algorithm with continuous variable support as discussed in class and
test it on this data set. Show your generated tree and the confusion matrix.
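ID3 chooses its splits using the entropy of the class labels. As a small building block, here is a sketch of an entropy function in R. It assumes the usual 14-example version of the Play Tennis data with 9 "yes" and 5 "no" labels; check this against the data set handed out in class. The function name entropy is chosen for this sketch.

```r
# entropy (in bits) of a vector of class labels
entropy <- function(labels) {
    p <- table(labels) / length(labels)   # relative frequency of each label
    p <- p[p > 0]                         # skip empty levels to avoid 0 * log2(0)
    -sum(p * log2(p))
}

# assumed label distribution of the 14-example Play Tennis data
play <- c(rep("yes", 9), rep("no", 5))
e <- entropy(play)
print(round(e, 3))

# a node where all labels agree has entropy 0
print(entropy(rep("yes", 4)))
```

The information gain of an attribute is then the entropy of the whole node minus the weighted average entropy of the subsets the attribute produces.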