Linear Algebra Techniques and Tools for Selected Web-science Applications
Internet Algorithmics and Information Retrieval have become grand challenge application
areas for Numerical Linear Algebra. Important problems include dimensionality reduction
of very large datasets, and ranking webpages based on link analysis of the Web graph.
Image and text collections are encoded by matrices whose elements are frequently
nonnegative. This motivates the construction of low rank approximations whose
constituent factors are also nonnegative matrices. It has been argued in the literature that
this could lead to better interpretability. Unfortunately, the nonnegativity constraint also
severely increases the mathematical and computational complexity of the problem relative to
the SVD; for example, S. Vavasis recently showed that exact nonnegative matrix factorization
(NMF) is NP-hard.
We have been considering means to compute fast nonnegative approximate factorizations
directly from the unit rank terms in the SVD expansion of the matrix. Both the partial and
full SVD of the positive section of each such term (the term with its negative entries set
to zero) are easily computable and can be used to
generate columns and rows for the left and right factors of the NMF. We present some
important features of this approach that can be readily applied to initialize NMF algorithms.
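
To make the "easily computable" claim concrete, here is a minimal sketch, not the authors' implementation. It assumes a singular vector pair u, v of one unit rank term is already available, and the function name positive_section_svd is hypothetical. It relies on the identity max(u v^T, 0) = u+ v+^T + u- v-^T, where u+ and u- are the positive and negative parts of u. Because u+ and u- have disjoint supports (likewise v+ and v-), the two products are orthogonal rank-one terms, so the positive section has rank at most two and its SVD follows from four vector norms.

```python
import math


def positive_section_svd(u, v):
    """Singular triplets of the positive section of the rank-one matrix u v^T.

    Hypothetical helper for illustration only. Returns a list of
    (sigma, x, y) tuples sorted by decreasing sigma, where x and y are
    nonnegative unit vectors and sum(sigma * x y^T) equals max(u v^T, 0).
    """
    # Split each vector into positive and negative parts: u = up - un.
    up = [max(a, 0.0) for a in u]
    un = [max(-a, 0.0) for a in u]
    vp = [max(b, 0.0) for b in v]
    vn = [max(-b, 0.0) for b in v]

    def norm(x):
        return math.sqrt(sum(a * a for a in x))

    triplets = []
    # up*vp^T collects entries where both factors are positive,
    # un*vn^T collects entries where both factors are negative.
    for x, y in ((up, vp), (un, vn)):
        nx, ny = norm(x), norm(y)
        if nx > 0.0 and ny > 0.0:
            triplets.append((nx * ny, [a / nx for a in x], [b / ny for b in y]))

    triplets.sort(key=lambda t: t[0], reverse=True)
    return triplets


if __name__ == "__main__":
    # Example pair of unit vectors with mixed signs (made-up data).
    u = [1 / 3, 2 / 3, -2 / 3]
    v = [2 / 3, 2 / 3, -1 / 3]
    for sigma, x, y in positive_section_svd(u, v):
        print(sigma, x, y)
```

One plausible use, stated here as an assumption rather than as the method of the talk, is to scale the dominant triplet of each positive section and take its nonnegative vectors as a column of the left factor and a row of the right factor when initializing an NMF iteration.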
Algorithm development is easier when tools for rapid prototyping exist. Such tools are
particularly beneficial in fields such as Information Retrieval (IR), which have a
significant interdisciplinary character.
Driven by this need, we will describe the development of TMG, a MATLAB-based toolbox
that enables rapid dataset development and algorithmic prototyping in IR and Web-IR.
TMG is being used by several researchers and educators worldwide.
Finally, if time permits, we will also describe some Linear Algebra methods related to
ranking webpages based on link analysis of the Web graph and sensor network