Tensors naturally model a wide variety of datasets and are therefore used in many data mining applications, including anomaly detection, intrusion detection, sensor networks, and face recognition. However, tensor decompositions are computationally expensive. MACH is an easy-to-implement sampling method for performing low-rank tensor decompositions. It can significantly speed up the computation at the cost of a quantifiable (and typically small) loss of accuracy.
The original code was a few-line script implemented in MATLAB, using the Tensor Toolbox by Tamara Kolda. In general, given a tensor decomposition library, MACH is easy to implement. For instance, using the scikit-tensor library in Python, one can use the following corrected version of MACH from K. Hayashi and Y. Yoshida (see also their NIPS’17 paper and their full code):
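import numpy as np
import sktensor as st  # scikit-tensor


def mach(X, ranks, p):
    """
    Implementation of MACH proposed in
    C. E. Tsourakakis. Mach: Fast randomized tensor decompositions.
    In SDM, pages 689–700, 2010.
    """
    # Sample a fraction p of the flat entry positions, without replacement.
    prod_ns = np.prod(X.shape)
    indn = np.random.choice(prod_ns, int(prod_ns * p), replace=False)
    multinds = np.unravel_index(indn, X.shape)
    # Build a sparse tensor from the sampled entries, rescaled by 1/p.
    X_sp = st.sptensor(multinds, 1/p*X[multinds], shape=X.shape)
    # Clip any requested rank that is not smaller than its mode size.
    _ranks = np.array(ranks)
    _shape = np.array(X.shape)
    _ind = _ranks >= _shape
    _ranks[_ind] = _shape[_ind] - 1
    # Tucker decomposition (HOOI) of the sparsified tensor.
    return st.tucker_hooi(X_sp, _ranks.tolist(), init='nvecs')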
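To make the sampling step concrete without depending on a tensor library, here is a minimal pure-Python sketch of the sparsification that the function above performs before it calls the decomposition: it draws a fixed fraction p of the entries uniformly without replacement and rescales the kept values by 1/p. The function name `sparsify` and the dict-of-coordinates representation are assumptions made for illustration; they are not part of the original MACH code or of scikit-tensor.

```python
import itertools
import random


def sparsify(dense, shape, p, seed=None):
    """Keep a fraction p of the entries of a dense tensor, rescaled by 1/p.

    `dense` maps every index tuple in `shape` to a number (a hypothetical
    representation chosen for this sketch). Returns a dict that holds only
    the sampled entries.
    """
    rng = random.Random(seed)
    all_indices = list(itertools.product(*(range(n) for n in shape)))
    n_keep = int(len(all_indices) * p)
    # Uniform sample of positions, without replacement.
    kept = rng.sample(all_indices, n_keep)
    # Rescaling by 1/p keeps the sparse tensor equal to the dense one in expectation.
    return {idx: dense[idx] / p for idx in kept}


# Toy 4 x 5 x 6 tensor filled with ones (made-up data).
shape = (4, 5, 6)
dense = {idx: 1.0 for idx in itertools.product(*(range(n) for n in shape))}
sparse = sparsify(dense, shape, p=0.25, seed=0)
```

With a constant tensor the rescaled sum of the kept entries matches the original sum exactly; for general data the match holds only in expectation, which is the source of the accuracy loss that MACH trades for speed.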
Tensors naturally model multi-aspect time-series. Consider a sensor monitoring application where at each time-tick each sensor transmits a set of numerical values (e.g., temperature, humidity, light intensity). The time-series aspect of the tensor is special, since, for instance, consecutive values tend to be correlated. Standard tensor decompositions ignore this fact. We developed a method, called 2-heads Tensor Analysis, that combines classic multilinear analysis with wavelets. The proposed method is a powerful multi-dimensional time-series mining tool. For details, check our paper.
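The paragraph above notes that consecutive values along the time mode tend to be correlated, which is the property a wavelet transform exploits. As a rough illustration only, the sketch below applies one level of a Haar transform to a single sensor's time series. The choice of the Haar wavelet, the function name `haar_step`, and the sample readings are assumptions made for this sketch; they are not taken from the 2-heads Tensor Analysis paper, which should be consulted for the actual method.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length list.

    Returns (averages, details): scaled pairwise sums and differences.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    averages = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    details = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return averages, details


# A slowly varying sensor reading (made-up values): neighbouring samples are
# close, so the detail coefficients are small compared with the averages.
temperature = [20.0, 20.2, 20.4, 20.4, 20.8, 21.0, 21.0, 21.2]
averages, details = haar_step(temperature)
```

Because neighbouring readings are similar, most of the signal's energy ends up in the averages, while the details stay near zero. This is why a wavelet basis suits the time mode, whereas a standard decomposition treats time like any other mode.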