org.carrot2.matrix.factorization

## Class KMeansMatrixFactorization

• All Implemented Interfaces:
IIterativeMatrixFactorization, IMatrixFactorization

```java
public class KMeansMatrixFactorization
extends Object
```
Performs matrix factorization using the K-means clustering algorithm. This kind of factorization is sometimes referred to as Concept Decomposition Factorization.
• ### Field Summary

Fields

| Modifier and Type | Field | Description |
| --- | --- | --- |
| `protected DoubleMatrix2D` | `A` | Input matrix |
| `protected double[]` | `aggregates` | Sorting aggregates |
| `protected double` | `approximationError` | Current approximation error |
| `protected double[]` | `approximationErrors` | Approximation errors during subsequent iterations |
| `protected static int` | `DEFAULT_K` | |
| `protected static int` | `DEFAULT_MAX_ITERATIONS` | |
| `protected static boolean` | `DEFAULT_ORDERED` | |
| `protected static ISeedingStrategy` | `DEFAULT_SEEDING_STRATEGY` | |
| `protected static double` | `DEFAULT_STOP_THRESHOLD` | |
| `protected int` | `iterationsCompleted` | Iteration counter |
| `protected int` | `k` | The desired number of base vectors |
| `protected int` | `maxIterations` | The maximum number of iterations the algorithm is allowed to run |
| `protected boolean` | `ordered` | Whether to order base vectors according to their 'activity' |
| `protected ISeedingStrategy` | `seedingStrategy` | Seeding strategy |
| `protected double` | `stopThreshold` | If the percentage decrease in approximation error becomes smaller than `stopThreshold`, the algorithm will stop. |
| `protected DoubleMatrix2D` | `U` | Base vector result matrix |
| `protected DoubleMatrix2D` | `V` | Coefficient result matrix |

• ### Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| `KMeansMatrixFactorization(DoubleMatrix2D A)` | Creates the KMeansMatrixFactorization object for matrix A. |

• ### Method Summary

All Methods

| Modifier and Type | Method | Description |
| --- | --- | --- |
| `void` | `compute()` | Computes the factorization. |
| `double[]` | `getAggregates()` | Returns column aggregates for a sorted factorization, and `null` for an unsorted factorization. |
| `double` | `getApproximationError()` | Returns approximation error achieved after the last iteration of the algorithm or -1 if the approximation error is not available. |
| `double[]` | `getApproximationErrors()` | |
| `int` | `getIterationsCompleted()` | Returns the number of iterations the algorithm has completed. |
| `int` | `getK()` | Returns the number of base vectors k. |
| `int` | `getMaxIterations()` | Returns the maximum number of iterations the algorithm is allowed to run. |
| `ISeedingStrategy` | `getSeedingStrategy()` | |
| `double` | `getStopThreshold()` | Returns the algorithm's `stopThreshold`. |
| `DoubleMatrix2D` | `getU()` | Returns the U matrix (base vectors matrix). |
| `DoubleMatrix2D` | `getV()` | Returns the V matrix (coefficient matrix). |
| `boolean` | `isOrdered()` | Returns `true` when the factorization is set to generate an ordered basis. |
| `protected void` | `order()` | Orders U and V matrices according to the 'activity' of base vectors. |
| `void` | `setK(int k)` | Sets the number of base vectors k. |
| `void` | `setMaxIterations(int maxIterations)` | Sets the maximum number of iterations the algorithm is allowed to run. |
| `void` | `setOrdered(boolean ordered)` | Set to `true` to generate an ordered basis. |
| `void` | `setSeedingStrategy(ISeedingStrategy seedingStrategy)` | |
| `void` | `setStopThreshold(double stopThreshold)` | Sets the algorithm's `stopThreshold`. |
| `String` | `toString()` | |
| `protected boolean` | `updateApproximationError()` | |

• ### Methods inherited from class java.lang.Object

`clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait`
• ### Methods inherited from interface org.carrot2.matrix.factorization.IMatrixFactorization

`getU, getV`
• ### Field Detail

• #### k

`protected int k`
The desired number of base vectors
• #### DEFAULT_K

`protected static int DEFAULT_K`
• #### maxIterations

`protected int maxIterations`
The maximum number of iterations the algorithm is allowed to run
• #### DEFAULT_MAX_ITERATIONS

`protected static final int DEFAULT_MAX_ITERATIONS`
• #### stopThreshold

`protected double stopThreshold`
If the percentage decrease in approximation error becomes smaller than `stopThreshold`, the algorithm will stop. Note: calculating the approximation error is quite costly. Setting the threshold to -1 turns off approximation error calculation, so the algorithm runs for the maximum number of iterations.
• #### DEFAULT_STOP_THRESHOLD

`protected static double DEFAULT_STOP_THRESHOLD`
• #### seedingStrategy

`protected ISeedingStrategy seedingStrategy`
Seeding strategy
• #### DEFAULT_SEEDING_STRATEGY

`protected static final ISeedingStrategy DEFAULT_SEEDING_STRATEGY`
• #### ordered

`protected boolean ordered`
Whether to order base vectors according to their 'activity'
• #### DEFAULT_ORDERED

`protected static final boolean DEFAULT_ORDERED`
• #### approximationError

`protected double approximationError`
Current approximation error
• #### approximationErrors

`protected double[] approximationErrors`
Approximation errors during subsequent iterations
• #### iterationsCompleted

`protected int iterationsCompleted`
Iteration counter
• #### aggregates

`protected double[] aggregates`
Sorting aggregates
• #### A

`protected DoubleMatrix2D A`
Input matrix
• #### U

`protected DoubleMatrix2D U`
Base vector result matrix
• #### V

`protected DoubleMatrix2D V`
Coefficient result matrix
• ### Constructor Detail

• #### KMeansMatrixFactorization

`public KMeansMatrixFactorization(DoubleMatrix2D A)`
Creates the KMeansMatrixFactorization object for matrix A. Before accessing results, perform computations by calling the `compute()` method.
Parameters:
`A` - matrix to be factorized. The matrix must have Euclidean length-normalized columns.
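The constructor expects the columns of `A` to have unit Euclidean length. A minimal sketch of that preprocessing step on a plain `double[][]` (rows by columns) follows. The class `ColumnNormalization` and its method are hypothetical helpers written for illustration and are not part of the library; converting the result to a `DoubleMatrix2D` is left out.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of the library. */
final class ColumnNormalization {
    /** Returns a copy of a (rows x columns) whose columns have unit Euclidean length. */
    static double[][] normalizeColumns(double[][] a) {
        int rows = a.length;
        int cols = rows == 0 ? 0 : a[0].length;
        double[][] out = new double[rows][cols];
        for (int j = 0; j < cols; j++) {
            double sumOfSquares = 0.0;
            for (int i = 0; i < rows; i++) {
                sumOfSquares += a[i][j] * a[i][j];
            }
            double norm = Math.sqrt(sumOfSquares);
            for (int i = 0; i < rows; i++) {
                // All-zero columns stay zero to avoid dividing by zero.
                out[i][j] = norm == 0.0 ? 0.0 : a[i][j] / norm;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {{3.0, 0.0}, {4.0, 2.0}};
        System.out.println(Arrays.deepToString(normalizeColumns(a)));
    }
}
```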
• ### Method Detail

• #### compute

`public void compute()`
Computes the factorization.
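This page does not describe the computation beyond naming K-means. As a rough illustration of the idea only, the sketch below runs a spherical K-means on plain arrays: each column of `A` joins the centroid with the largest dot product, and each centroid becomes the normalized sum of its members. The class `KMeansFactorizationSketch`, seeding with the first k columns, and the one-hot layout of `v` are assumptions made for this example and may differ from what `compute()` actually does.

```java
import java.util.Arrays;

/** Illustrative spherical K-means factorization; not the library's implementation. */
final class KMeansFactorizationSketch {
    final double[][] u; // m x k: unit-length centroids (base vectors)
    final double[][] v; // n x k: assumed one-hot cluster membership per column of A

    KMeansFactorizationSketch(double[][] a, int k, int maxIterations) {
        int m = a.length;
        int n = a[0].length;
        if (k < 1 || k > n || maxIterations < 1) {
            throw new IllegalArgumentException("need 1 <= k <= columns and maxIterations >= 1");
        }
        u = new double[m][k];
        v = new double[n][k];
        // Assumed seeding: the first k columns of A are the initial centroids.
        for (int c = 0; c < k; c++) {
            for (int i = 0; i < m; i++) {
                u[i][c] = a[i][c];
            }
        }
        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            // Assignment step: each column joins the centroid with the largest dot product.
            boolean changed = false;
            for (int j = 0; j < n; j++) {
                int best = 0;
                double bestDot = Double.NEGATIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double dot = 0.0;
                    for (int i = 0; i < m; i++) {
                        dot += a[i][j] * u[i][c];
                    }
                    if (dot > bestDot) {
                        bestDot = dot;
                        best = c;
                    }
                }
                if (assignment[j] != best) {
                    assignment[j] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // assignments are stable, so the centroids would not move
            }
            // Update step: each centroid becomes the normalized sum of its members.
            for (int c = 0; c < k; c++) {
                double[] sum = new double[m];
                for (int j = 0; j < n; j++) {
                    if (assignment[j] == c) {
                        for (int i = 0; i < m; i++) {
                            sum[i] += a[i][j];
                        }
                    }
                }
                double norm = 0.0;
                for (double s : sum) {
                    norm += s * s;
                }
                norm = Math.sqrt(norm);
                if (norm > 0.0) { // empty clusters keep their previous centroid
                    for (int i = 0; i < m; i++) {
                        u[i][c] = sum[i] / norm;
                    }
                }
            }
        }
        for (int j = 0; j < n; j++) {
            v[j][assignment[j]] = 1.0;
        }
    }

    public static void main(String[] args) {
        double[][] a = {{1.0, 0.0, 0.8, 0.6}, {0.0, 1.0, 0.6, 0.8}};
        KMeansFactorizationSketch f = new KMeansFactorizationSketch(a, 2, 10);
        System.out.println(Arrays.deepToString(f.u));
        System.out.println(Arrays.deepToString(f.v));
    }
}
```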
• #### toString

`public String toString()`
Overrides:
`toString` in class `Object`
• #### setK

`public void setK(int k)`
Sets the number of base vectors k.
Parameters:
`k` - the number of base vectors
• #### getK

`public int getK()`
Returns the number of base vectors k.
• #### updateApproximationError

`protected boolean updateApproximationError()`
Returns:
`true` if the percentage decrease in the approximation error is smaller than `stopThreshold`
• #### order

`protected void order()`
Orders U and V matrices according to the 'activity' of base vectors.
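The page does not define 'activity' or say how the aggregates are computed. Purely as an illustration, the sketch below assumes the aggregate of a base vector is the sum of its column in V (n x k) and derives the permutation that would put the most active base vectors first. `OrderingSketch`, `columnSums` and `descendingOrder` are hypothetical names, and both the aggregate definition and the descending direction are assumptions.

```java
import java.util.Arrays;

/** Hypothetical ordering helper; the aggregate definition is an assumption. */
final class OrderingSketch {
    /** Assumed aggregate: the sum of each column of v (n x k). */
    static double[] columnSums(double[][] v) {
        double[] sums = new double[v.length == 0 ? 0 : v[0].length];
        for (double[] row : v) {
            for (int c = 0; c < sums.length; c++) {
                sums[c] += row[c];
            }
        }
        return sums;
    }

    /** Returns base-vector indices sorted by descending aggregate (ties keep index order). */
    static int[] descendingOrder(double[] aggregates) {
        Integer[] boxed = new Integer[aggregates.length];
        for (int i = 0; i < boxed.length; i++) {
            boxed[i] = i;
        }
        Arrays.sort(boxed, (x, y) -> Double.compare(aggregates[y], aggregates[x]));
        int[] order = new int[boxed.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = boxed[i];
        }
        return order;
    }

    public static void main(String[] args) {
        double[][] v = {{1.0, 0.0}, {0.0, 1.0}, {0.0, 1.0}};
        System.out.println(Arrays.toString(descendingOrder(columnSums(v))));
    }
}
```

Applying the returned permutation to the columns of both U and V keeps each base vector paired with its coefficients.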
• #### getSeedingStrategy

`public ISeedingStrategy getSeedingStrategy()`
• #### setSeedingStrategy

`public void setSeedingStrategy(ISeedingStrategy seedingStrategy)`
• #### getMaxIterations

`public int getMaxIterations()`
Returns the maximum number of iterations the algorithm is allowed to run.
• #### setMaxIterations

`public void setMaxIterations(int maxIterations)`
Sets the maximum number of iterations the algorithm is allowed to run.
• #### getStopThreshold

`public double getStopThreshold()`
Returns the algorithm's `stopThreshold`. If the percentage decrease in approximation error becomes smaller than `stopThreshold`, the algorithm will stop.
• #### setStopThreshold

`public void setStopThreshold(double stopThreshold)`
Sets the algorithm's `stopThreshold`. If the percentage decrease in approximation error becomes smaller than `stopThreshold`, the algorithm will stop.

Note: calculating the approximation error is quite costly. Setting the threshold to -1 turns off calculation of the approximation error, so the algorithm runs for the maximum allowed number of iterations.
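One way to read the stopping rule is sketched below. It is a hedged illustration, not the library's code: the class `StopRule`, the exact formula `(previous - current) / previous`, the interpretation of the threshold as a fraction rather than a value in percent, and the treatment of every negative threshold like -1 are all assumptions.

```java
/** Hypothetical stopping test; the formula and units are assumptions. */
final class StopRule {
    /**
     * Returns true when the relative decrease from previousError to currentError
     * is smaller than stopThreshold. A negative threshold disables early stopping.
     */
    static boolean shouldStop(double previousError, double currentError, double stopThreshold) {
        if (stopThreshold < 0.0) {
            return false; // e.g. -1: always run the maximum number of iterations
        }
        if (previousError <= 0.0) {
            return true; // the error cannot decrease any further
        }
        double relativeDecrease = (previousError - currentError) / previousError;
        return relativeDecrease < stopThreshold;
    }

    public static void main(String[] args) {
        System.out.println(shouldStop(10.0, 9.0, 0.001));
        System.out.println(shouldStop(10.0, 9.999, 0.001));
        System.out.println(shouldStop(10.0, 10.0, -1.0));
    }
}
```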

• #### getApproximationError

`public double getApproximationError()`
Description copied from interface: `IIterativeMatrixFactorization`
Returns approximation error achieved after the last iteration of the algorithm or -1 if the approximation error is not available.
Specified by:
`getApproximationError` in interface `IIterativeMatrixFactorization`
Returns:
approximation error or -1
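The page does not say how the approximation error is measured. A common choice for factorizations of this kind is the Frobenius norm of the residual, sketched here on plain arrays. The error measure itself, the shapes (A is m x n, U is m x k, V is n x k, so that A is approximated by U times the transpose of V), and the class `ApproximationErrorSketch` are assumptions made for this example.

```java
/** Hypothetical error measure; the library may define the error differently. */
final class ApproximationErrorSketch {
    /** Frobenius norm of A - U * V^T, with a (m x n), u (m x k), v (n x k). */
    static double frobeniusError(double[][] a, double[][] u, double[][] v) {
        double sumOfSquares = 0.0;
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[i].length; j++) {
                double approximation = 0.0;
                for (int c = 0; c < u[i].length; c++) {
                    approximation += u[i][c] * v[j][c];
                }
                double difference = a[i][j] - approximation;
                sumOfSquares += difference * difference;
            }
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        double[][] a = {{1.0, 0.0}, {0.0, 1.0}};
        double[][] u = {{1.0}, {0.0}};
        double[][] v = {{1.0}, {0.0}};
        System.out.println(frobeniusError(a, u, v));
    }
}
```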
• #### getApproximationErrors

`public double[] getApproximationErrors()`
• #### getIterationsCompleted

`public int getIterationsCompleted()`
Description copied from interface: `IIterativeMatrixFactorization`
Returns the number of iterations the algorithm has completed.
Specified by:
`getIterationsCompleted` in interface `IIterativeMatrixFactorization`
Returns:
the number of iterations the algorithm has completed
• #### isOrdered

`public boolean isOrdered()`
Returns `true` when the factorization is set to generate an ordered basis.
• #### setOrdered

`public void setOrdered(boolean ordered)`
Set to `true` to generate an ordered basis.
• #### getAggregates

`public double[] getAggregates()`
Returns column aggregates for a sorted factorization, and `null` for an unsorted factorization.
• #### getU

`public DoubleMatrix2D getU()`
Description copied from interface: `IMatrixFactorization`
Returns the U matrix (base vectors matrix).
Specified by:
`getU` in interface `IMatrixFactorization`
Returns:
U matrix
• #### getV

`public DoubleMatrix2D getV()`
Description copied from interface: `IMatrixFactorization`
Returns the V matrix (coefficient matrix).
Specified by:
`getV` in interface `IMatrixFactorization`
Returns:
V matrix