public class SVM extends Object implements Predictor<SVM>
This class can be used for binary classification problems, with the labels set to +1.0 to indicate a positive example and -1.0 to indicate a negative example.
The algorithm solves the following minimization problem:
$$\min_{w \in \mathbb{R}^d} \frac{\lambda}{2} \|w\|^2 + \frac{1}{n} \sum_{i=1}^{n} l_i(w^T x_i)$$
with $w$ being the weight vector, $\lambda$ being the regularization constant, $x_i \in \mathbb{R}^d$ being the data points and $l_i$ being the convex loss functions, which can also depend on the labels $y_i \in \mathbb{R}$.
In the current implementation the regularizer is the 2-norm and the loss functions are the hinge-loss functions:
$$l_i = \max(0, 1 - y_i \, w^T x_i)$$
With these choices, the problem definition is equivalent to a soft-margin SVM, so the algorithm trains an SVM with soft-margin.
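To make the objective concrete, the following is a minimal sketch in plain Java that evaluates the regularized hinge-loss objective for a given weight vector. The class `SoftMarginObjective` and its method names are hypothetical and are not part of this API; the sketch uses plain arrays instead of the library's vector types.

```java
// Hypothetical helper, not part of the SVM class: evaluates the objective
// lambda/2 * ||w||^2 + 1/n * sum_i max(0, 1 - y_i * w^T x_i) on plain arrays.
final class SoftMarginObjective {
    /** Dot product of two equally sized vectors. */
    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * b[i];
        }
        return s;
    }

    /** Hinge loss max(0, 1 - y * w^T x) for a single labeled example. */
    static double hingeLoss(double[] w, double[] x, double y) {
        return Math.max(0.0, 1.0 - y * dot(w, x));
    }

    /** Regularization term plus the average hinge loss over all examples. */
    static double objective(double[] w, double[][] xs, double[] ys, double lambda) {
        double lossSum = 0.0;
        for (int i = 0; i < xs.length; i++) {
            lossSum += hingeLoss(w, xs[i], ys[i]);
        }
        return 0.5 * lambda * dot(w, w) + lossSum / xs.length;
    }

    public static void main(String[] args) {
        double[][] xs = {{1.0, 0.0}, {0.0, 1.0}};
        double[] ys = {1.0, -1.0};
        double[] w = {0.5, 0.5};
        // Regularizer: 0.5 * 1.0 * 0.5 = 0.25; hinge losses: 0.5 and 1.5, average 1.0.
        System.out.println(objective(w, xs, ys, 1.0)); // prints 1.25
    }
}
```

In this toy example the second point lies on the wrong side of the hyperplane, so its hinge loss exceeds 1 and dominates the objective.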
The minimization problem is solved by applying stochastic dual coordinate ascent (SDCA). To make the algorithm efficient in a distributed setting, the CoCoA algorithm runs several iterations of SDCA locally on each data block before merging the local updates into a valid global state. This state is redistributed to the different data partitions, where the next round of local SDCA iterations is then executed. The numbers of outer iterations and local SDCA iterations together control the overall network cost, because network communication is required only once per outer iteration. The local SDCA iterations are embarrassingly parallel once the individual data partitions have been distributed across the cluster.
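The outer/local iteration structure described above can be sketched as a sequential, single-machine simulation in plain Java. This is an illustration under stated assumptions, not the library's implementation: the class `CocoaSketch`, the assignment of examples to blocks by `i % blocks`, and the merge factor `stepsize / blocks` are all assumptions made for this sketch. The local step is the standard closed-form SDCA update for the hinge loss, with dual variables kept in [0, 1].

```java
import java.util.Random;

// Hypothetical single-machine simulation of CoCoA with local SDCA steps.
final class CocoaSketch {
    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * b[i];
        }
        return s;
    }

    static double[] train(double[][] xs, double[] ys, int blocks, int iterations,
                          int localIterations, double lambda, double stepsize, long seed) {
        int n = xs.length;
        int d = xs[0].length;
        double[] w = new double[d];        // global weight vector
        double[] alpha = new double[n];    // dual variables, one per example
        double scaling = stepsize / blocks; // assumed merge factor
        Random rng = new Random(seed);

        for (int outer = 0; outer < iterations; outer++) {
            double[] deltaW = new double[d];
            double[] deltaAlpha = new double[n];
            for (int k = 0; k < blocks; k++) {
                // Block k owns the examples i with i % blocks == k (an assumption).
                int blockSize = (n - k + blocks - 1) / blocks;
                if (blockSize == 0) {
                    continue;
                }
                double[] wLocal = w.clone(); // each block starts from the global state
                for (int h = 0; h < localIterations; h++) {
                    int i = k + blocks * rng.nextInt(blockSize);
                    double norm2 = dot(xs[i], xs[i]);
                    if (norm2 == 0.0) {
                        continue;
                    }
                    double current = alpha[i] + deltaAlpha[i];
                    double margin = ys[i] * dot(wLocal, xs[i]);
                    // Closed-form coordinate maximization, clipped to [0, 1].
                    double proposed = current + (1.0 - margin) * lambda * n / norm2;
                    double clipped = Math.max(0.0, Math.min(1.0, proposed));
                    double step = clipped - current;
                    deltaAlpha[i] += step;
                    for (int j = 0; j < d; j++) {
                        wLocal[j] += step * ys[i] * xs[i][j] / (lambda * n);
                    }
                }
                for (int j = 0; j < d; j++) {
                    deltaW[j] += wLocal[j] - w[j];
                }
            }
            // Merge step: the only point where blocks would communicate.
            for (int j = 0; j < d; j++) {
                w[j] += scaling * deltaW[j];
            }
            for (int i = 0; i < n; i++) {
                alpha[i] += scaling * deltaAlpha[i];
            }
        }
        return w;
    }
}
```

Because the weight and dual updates are scaled by the same factor, the relation w = 1/(lambda n) * sum_i alpha_i y_i x_i is preserved after every merge. In a real distributed run the loop over `k` would execute in parallel, one block per partition.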
Further details of the algorithm can be found here.
Modifier and Type | Class and Description |
---|---|
static class | SVM.Blocks$ |
static class | SVM.Iterations$ |
static class | SVM.LocalIterations$ |
static class | SVM.OutputDecisionFunction$ |
static class | SVM.Regularization$ |
static class | SVM.Seed$ |
static class | SVM.Stepsize$ |
static class | SVM.ThresholdValue$ |
Constructor and Description |
---|
SVM() |
Modifier and Type | Method and Description |
---|---|
static SVM | apply() |
static Object | fitSVM() FitOperation which trains an SVM with soft-margin based on the given training data set. |
static <T extends Vector> Object | predictVectors() Provides the operation that makes the predictions for individual examples. |
SVM | setBlocks(int blocks) Sets the number of data blocks/partitions |
SVM | setIterations(int iterations) Sets the number of outer iterations |
SVM | setLocalIterations(int localIterations) Sets the number of local SDCA iterations |
SVM | setOutputDecisionFunction(boolean outputDecisionFunction) Sets whether the predictions should return the raw decision function value or the thresholded binary value. |
SVM | setRegularization(double regularization) Sets the regularization constant |
SVM | setSeed(long seed) Sets the seed value for the random number generator |
SVM | setStepsize(double stepsize) Sets the stepsize for the weight vector updates |
SVM | setThreshold(double threshold) Sets the threshold above which elements are classified as positive. |
static String | WEIGHT_VECTOR() |
scala.Option<DataSet<DenseVector>> | weightsOption() Stores the learned weight vector after the fit operation |
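
How the threshold and the output-decision-function flag from the table above interact can be sketched in plain Java as follows. The class `SvmPrediction` and its `predict` method are hypothetical and only illustrate the documented behavior on plain arrays; how a decision value exactly equal to the threshold is treated is an assumption of this sketch, since only the "above" and "below" cases are documented.

```java
// Hypothetical illustration of the prediction rule, not the library's code.
final class SvmPrediction {
    /**
     * Returns the raw decision value w^T x when outputDecisionFunction is true,
     * otherwise +1.0 if the decision value is above the threshold and -1.0 if not.
     */
    static double predict(double[] w, double[] x, double threshold,
                          boolean outputDecisionFunction) {
        double decision = 0.0;
        for (int j = 0; j < w.length; j++) {
            decision += w[j] * x[j];
        }
        if (outputDecisionFunction) {
            return decision;
        }
        // Assumption: a value exactly equal to the threshold maps to -1.0.
        return decision > threshold ? 1.0 : -1.0;
    }

    public static void main(String[] args) {
        double[] w = {1.0, -1.0};
        double[] x = {3.0, 1.0};
        System.out.println(predict(w, x, 0.0, true));  // prints 2.0
        System.out.println(predict(w, x, 0.0, false)); // prints 1.0
        System.out.println(predict(w, x, 2.5, false)); // prints -1.0
    }
}
```

Raising the threshold trades recall on the positive class for precision, which is why the same example flips from +1.0 to -1.0 in the last call.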
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
parameters
public static String WEIGHT_VECTOR()
public static SVM apply()
public static <T extends Vector> Object predictVectors()
public static Object fitSVM()
FitOperation which trains an SVM with soft-margin based on the given training data set.
public scala.Option<DataSet<DenseVector>> weightsOption()
public SVM setBlocks(int blocks)
blocks - the number of data blocks/partitions
public SVM setIterations(int iterations)
iterations - the number of outer iterations
public SVM setLocalIterations(int localIterations)
localIterations - the number of local SDCA iterations
public SVM setRegularization(double regularization)
regularization - the regularization constant
public SVM setStepsize(double stepsize)
stepsize - the stepsize for the weight vector updates
public SVM setSeed(long seed)
seed - the seed value for the random number generator
public SVM setThreshold(double threshold)
The predict and evaluate functions return +1.0 for items with a decision function value above this threshold, and -1.0 for items below it.
threshold - the threshold above which elements are classified as positive
public SVM setOutputDecisionFunction(boolean outputDecisionFunction)
When this is set to true, predict and evaluate return the raw decision function value, which is proportional to the signed distance from the separating hyperplane. When it is set to false, they return thresholded (+1.0, -1.0) values.
outputDecisionFunction - When set to true, predict and evaluate return the raw decision function values. When set to false, they return the thresholded binary values (+1.0, -1.0).
Copyright © 2014–2017 The Apache Software Foundation. All rights reserved.