Mining Tensors

MACH: Fast Randomized Tensor Decompositions

Tensors naturally model a wide variety of datasets and are therefore used in many data mining applications, including anomaly detection, intrusion detection, sensor networks, and face recognition. However, tensor decompositions are computationally expensive. MACH is an easy-to-implement sampling method for performing low-rank tensor decompositions. It can significantly speed up the computation at the cost of a quantifiable (and typically small) loss of accuracy.


MACH using 10% of the input sensor data. The qualitative analysis remains unaffected by the sampling procedure.
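The sampling idea can be illustrated with a few lines of NumPy. The sketch below assumes the simplest variant, in which each entry is kept independently with probability p and rescaled by 1/p, so that the sparsified tensor equals the original in expectation. The function name `sparsify` and the synthetic data are hypothetical and only serve the illustration.

```python
import numpy as np

def sparsify(X, p, rng):
    """Keep each entry with probability p and rescale it by 1/p; zero the rest."""
    mask = rng.random(X.shape) < p
    return np.where(mask, X / p, 0.0), mask

rng = np.random.default_rng(0)
X = rng.uniform(1.0, 2.0, size=(4, 5, 6))  # small synthetic tensor (assumption)
X_hat, mask = sparsify(X, p=0.1, rng=rng)

# Averaging many independent samples approaches X, because E[X_hat] = X.
trials = 5000
mean_hat = sum(sparsify(X, 0.1, rng)[0] for _ in range(trials)) / trials
rel_err = np.linalg.norm(mean_hat - X) / np.linalg.norm(X)
print(f"relative error of the averaged samples: {rel_err:.3f}")
```

A single sample is very noisy entry by entry. The low-rank decomposition that follows is what averages this noise out.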

The original code was a few-line script implemented in Matlab, using the Tensor Toolbox by Tamara Kolda. In general, given a tensor decomposition library, MACH is easy to implement. For instance, using the scikit-tensor library in Python, one can use the following corrected version of MACH from K. Hayashi and Y. Yoshida (see also their NIPS’17 paper and their full code):


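import numpy as np
import sktensor as st  # scikit-tensor


def mach(X, ranks, p):
    """Implementation of MACH proposed in C. E. Tsourakakis.
    Mach: Fast randomized tensor decompositions. In SDM, pages 689–700, 2010."""
    # Sample a fraction p of the entries uniformly at random, without replacement.
    prod_ns = np.prod(X.shape)
    indn = np.random.choice(prod_ns, int(prod_ns * p), replace=False)
    multinds = np.unravel_index(indn, X.shape)
    # Rescale the kept entries by 1/p and store them as a sparse tensor.
    X_sp = st.sptensor(multinds, 1 / p * X[multinds], shape=X.shape)
    # Clip each requested rank to at most (mode size - 1).
    _ranks = np.array(ranks)
    _shape = np.array(X.shape)
    _ind = _ranks >= _shape
    _ranks[_ind] = _shape[_ind] - 1
    return st.tucker_hooi(X_sp, _ranks.tolist(), init='nvecs')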
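For readers without scikit-tensor at hand, the same pipeline can be sketched with NumPy alone. This is an illustration under stated assumptions, not the code from the paper: it swaps HOOI for a plain truncated HOSVD, stores the sampled tensor densely, and runs on a synthetic rank-2 tensor. All helper names (`unfold`, `mode_dot`, `hosvd_approx`, `sample_tensor`) are hypothetical.

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` unfolding: one row per index of that mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_dot(T, M, mode):
    """Multiply tensor T by matrix M along axis `mode`."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

def hosvd_approx(T, ranks):
    """Low-rank approximation of T via truncated HOSVD (not HOOI)."""
    factors = [np.linalg.svd(unfold(T, n), full_matrices=False)[0][:, :r]
               for n, r in enumerate(ranks)]
    approx = T
    for n, U in enumerate(factors):
        approx = mode_dot(approx, U @ U.T, n)  # project onto span(U)
    return approx

def sample_tensor(X, p, rng):
    """Keep a random fraction p of the entries of X, rescaled by 1/p."""
    n = X.size
    keep = rng.choice(n, size=int(n * p), replace=False)
    X_s = np.zeros(n)
    X_s[keep] = X.ravel()[keep] / p
    return X_s.reshape(X.shape)

rng = np.random.default_rng(0)
shape, R = (50, 60, 70), 2  # synthetic sizes and rank (assumptions)
A, B, C = (rng.uniform(1.0, 2.0, size=(n, R)) for n in shape)
X = np.einsum('ir,jr,kr->ijk', A, B, C)

def rel_err(Y):
    """Relative Frobenius error with respect to the full tensor X."""
    return np.linalg.norm(Y - X) / np.linalg.norm(X)

err_full = rel_err(hosvd_approx(X, (R, R, R)))
err_mach = rel_err(hosvd_approx(sample_tensor(X, 0.1, rng), (R, R, R)))
print(f"full data: {err_full:.3f}   10% sample: {err_mach:.3f}")
```

The two printed numbers show the trade-off described above: the decomposition of the sampled tensor is compared against the full tensor, so the second value is the accuracy lost by looking at only 10% of the entries.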

Two heads better than one: Pattern Discovery in Time-evolving Multi-Aspect Data



Tensors naturally model multi-aspect time series. Consider a sensor-monitoring application in which, at each time tick, each sensor transmits a set of numerical values (e.g., temperature, humidity, light intensity). The time aspect of the tensor is special because, for instance, consecutive values tend to be correlated. Standard tensor decompositions ignore this fact. We developed a method called 2-heads Tensor Analysis that combines classic multilinear analysis with wavelets. The proposed method is a powerful multi-dimensional time-series mining tool. For details, check our paper.
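To give a rough feel for what combining wavelets with multilinear analysis can look like, the sketch below applies a Haar wavelet transform along the time mode and a Tucker-style SVD projection along the sensor and measurement-type modes. It is a simplified illustration and not the algorithm from the paper. The tensor sizes, the synthetic signal, and the helper name `haar_dwt` are assumptions.

```python
import numpy as np

def haar_dwt(x):
    """Full orthonormal Haar transform along axis 0 (length must be a power of 2)."""
    x = np.asarray(x, dtype=float)
    details = []
    while x.shape[0] > 1:
        even, odd = x[0::2], x[1::2]
        details.append((even - odd) / np.sqrt(2))  # detail coefficients
        x = (even + odd) / np.sqrt(2)              # coarser approximation
    return np.concatenate([x] + details[::-1], axis=0)

rng = np.random.default_rng(0)
T, S, K = 64, 8, 3  # time ticks, sensors, measurement types (assumed sizes)
t = np.arange(T)[:, None, None]
X = (np.sin(2 * np.pi * t / 32)
     * rng.uniform(0.5, 1.5, size=(1, S, 1))
     * rng.uniform(0.5, 1.5, size=(1, 1, K)))
X = X + 0.05 * rng.standard_normal((T, S, K))

# Head 1: wavelets on the time mode.
W = haar_dwt(X)

# Head 2: leading singular vectors of the sensor and type unfoldings.
U_s = np.linalg.svd(np.moveaxis(W, 1, 0).reshape(S, -1), full_matrices=False)[0][:, :1]
U_k = np.linalg.svd(np.moveaxis(W, 2, 0).reshape(K, -1), full_matrices=False)[0][:, :1]
core = np.einsum('tsk,sa,kb->tab', W, U_s, U_k)

energy_kept = np.linalg.norm(core) ** 2 / np.linalg.norm(X) ** 2
print(core.shape, f"energy kept: {energy_kept:.3f}")
```

Because the Haar transform used here is orthonormal, it preserves the energy of the tensor, so the fraction reported at the end reflects only what the projection on the non-time modes discards.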
