References

(under construction)

Table of Contents
Books
Surveys
Selected Papers by Topic:

Mixture Models
Topic Models
Clustering
Dimensionality Reduction
ICA
PAC learning
Online learning
Statistical learning
Cortical learning
Deep learning

Books

Foundations of Data Science (A. Blum, J. Hopcroft and R. Kannan)

An Introduction to Computational Learning Theory (M. Kearns and U. Vazirani)

Understanding Machine Learning: From Theory to Algorithms (S. Shalev-Shwartz and S. Ben-David)

Spectral Algorithms (R. Kannan and S. Vempala)

Introduction to Online Convex Optimization (E. Hazan)

Surveys (talks and articles):

Online Learning (by N. Cesa-Bianchi)

The Complexity of Unsupervised Learning (by S. Vempala)

N. Cesa-Bianchi and G. Lugosi.
Prediction, learning, and games. Cambridge University Press, 2006.

A. Blum and Y. Mansour.
Learning, Regret Minimization, and Equilibria.
Book chapter in Algorithmic Game Theory, 2007.

Selected Papers by Topic:

Mixture Models

D. Achlioptas and F. McSherry.
On spectral learning of mixtures of distributions.
In Proc. of COLT, 2005.

A. Anandkumar, D. Foster, D. Hsu, S. M. Kakade, and Y.-K. Liu.
A spectral algorithm for latent dirichlet allocation.
In Advances in Neural Information Processing Systems 25, pp. 926–934, 2012.

S. Arora and R. Kannan.
Learning mixtures of arbitrary gaussians.
Annals of Applied Probability, 15(1A):69-92, 2005.

M. Belkin and K. Sinha.
Polynomial learning of distribution families.
In FOCS, 2010.

S. C. Brubaker.
Robust pca and clustering on noisy mixtures.
In Proc. of SODA, 2009.

S. C. Brubaker and S. Vempala.
Isotropic pca and affine-invariant clustering.
In M. Grötschel and G. Katona, editors, Building Bridges
Between Mathematics and Computer Science, volume 19 of Bolyai Society
Mathematical Studies, 2008.

K. Chaudhuri and S. Rao.
Beyond gaussians: Spectral methods for learning mixtures of
heavy-tailed distributions.
In Proc. of COLT, 2008.

S. DasGupta.
Learning mixtures of gaussians.
In Proc. of FOCS, 1999.

A. Dasgupta, J. Hopcroft, J. Kleinberg, and M. Sandler.
On learning mixtures of heavy-tailed distributions.
In Proc. of FOCS, 2005.

S. DasGupta and L. Schulman.
A two-round variant of em for gaussian mixtures.
In Proc. of UAI, 2000.

J. Feldman and R. O’Donnell.
Learning mixtures of product distributions over discrete domains.
SIAM Journal on Computing, 37(5):1536-1564, 2008.

Y. Freund and Y. Mansour.
Estimating a mixture of two product distributions.
In Proc. of COLT, pages 53-62, 1999.

M. Hardt and E. Price
Sharp bounds for learning a mixture of two gaussians
CoRR abs/1404.4997 (2014).

D. Hsu and S. M. Kakade.
Learning mixtures of spherical gaussians: moment methods and spectral decompositions.
In Proc of ITCS, pp, 11-20, 2013.

R. Kannan, H. Salmasian, and S. Vempala.
The spectral method for general mixture models.
SIAM Journal on Computing, 38(3):1141-1156, 2008.

A. T. Kalai, A. Moitra, and G. Valiant.
Efficiently learning mixtures of two Gaussians.
In Proc of STOC, pp. 553-562, 2010.

A. Moitra, and G. Valiant.
Settling the Polynomial Learnability of Mixtures of Gaussians.
In Proc of FOCS, pp. 93-102, 2010.

S. Vempala and G. Wang.
A spectral algorithm for learning mixtures of distributions.
Journal of Computer and System Sciences, 68(4):841-860, 2004.

Topic models

A. Anandkumar, D. Foster, D. Hsu, S. Kakade and Y. Liu. A Spectral Algorithm for Latent Dirichlet Allocation. NIPS 2012.

S. Arora, R. Ge, and A. Moitra.
Learning Topic Models — Going Beyond SVD.
In FOCS, pp. 1-10, 2012.

T. Bansal, C. Bhattacharyya, and R. Kannan.
A provable SVD-based algorithm for learning topics in dominant admixture corpus.
arXiv preprint arXiv:1410.6991 (2014).

D. M. Blei, A. Y. Ng, and M. I. Jordan.
Latent dirichlet allocation.
The Journal of machine Learning research, 3, pp. 993-1022, 2003.

C. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala.
Latent semantic indexing: A probabilistic analysis.
In Proc. of PODS, pp. 159–168, 1998.

Clustering

M. Ackerman and S. Ben-David. Measures of Clustering Quality: A working set of axioms for clustering. NIPS 2008.

D. Arthur and S. Vassilvitskii.
k-means++: The advantages of careful seeding.
In Proc. of SODA, 2007.

O. Awasthi and O. Sheffet.
Improved spectral-norm bounds for clustering.
In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pp. 37-49, 2012.

A. Blum, M.F. Balcan and S. Vempala. A Discriminative Framework for Clustering via Similarity Functions. STOC 2008.

A. Blum, M.F. Balcan and A. Gupta. Clustering Under Approximation Stability, JACM,Volume 60, Issue 2, 2013.

M. Charikar, S. Guha, É. Tardos, and D. B. Shmoys.A constant-factor approximation algorithm for the k-median problem. In Proc. of the 31st Annual ACM Symposium on Theory of Computing, pages 1-10, 1999.

D. Cheng, R. Kannan, S. Vempala, and G. Wang.
A divide-and-merge methodology for clustering.
ACM Trans. Database Syst., 31(4):1499-1525, 2006.

M. B. Cohen, S. Elder, C. Musco, C. Musco, and M. Persu.
Dimensionality reduction for k-means clustering and low rank approximation.
arXiv preprint arXiv:1410.6801 (2014).

A. Dasgupta, J. Hopcroft, R. Kannan, and P. Mitra.
Spectral clustering with limited independence.
In Proc. of SODA, pages 1036-1045, Philadelphia, PA, USA,
2007. Society for Industrial and Applied Mathematics.

W. Fernandez de la Vega, M. Karpinski, C. Kenyon, and Y. Rabani.
Approximation schemes for clustering problems.
In Proc. 35th ACM STOC, pages 50-58, 2003.

S. Har-Peled and K. R. Varadarajan.
Projective clustering in high dimensions using core-sets.
In Symposium on Computational Geometry, pages 312-318, 2002.

J. A. Hartigan and M. A. Wong.
A k-means clustering algorithm.
Applied Statistics, 28(1):100-108, 1979.

P. Indyk.
A sublinear time approximation scheme for clustering in metric spaces.
In Proc. 40th IEEE FOCS, pages 154-159, 1999.

R. Kannan, S. Vempala, and A. Vetta.
On clusterings: Good, bad and spectral.
J. ACM, 51(3):497-515, 2004.

T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman and A. Y. Wu.
A Local Search Approximation Algorithm for k-Means Clustering.
In Proceedings of the eighteenth annual symposium on Computational geometry, pp. 10-18. ACM, 2002.

J. Kleinberg.
An impossibility theorem for clustering.
NIPS 2002.

Dimensionality Reduction

N. Ailon and B. Chazelle.
The fast Johnson-Lindenstrauss transform and approximate nearest neighbors.
SIAM Journal on Computing, 39(1), 302-322, 2009

R. I. Arriaga and S. Vempala.
An algorithmic theory of learning: Robust concepts and random projection.
In Foundations of Computer Science, 1999. 40th Annual Symposium on, pp. 616-623, 1999.

S. Dasgupta and A. Gupta.
An elementary proof of a theorem of Johnson and Lindenstrauss.
Random Structures & Algorithms 22, no. 1, pp 60-65, 2003.

W. Johnson, and J. Lindenstrauss.
Extensions of Lipschitz mapping into a Hilbert space.
Conference in modern analysis and probability, pp. 189—206, 1984.

P. Indyk and R. Motwani.
Approximate nearest neighbors: towards removing the curse of dimensionality.
In Proc. 30th ACM STOC, pp. 604-613, 1998.

S. Vempala.
The Random Projection Method.
AMS, 2004.

ICA

A. Anandkumar, D. Foster, D. Hsu, S. M. Kakade, and Y.-K. Liu.
A spectral algorithm for latent dirichlet allocation.
In Advances in Neural Information Processing Systems 25, pp. 926–934, 2012.

A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky.
Tensor decompositions for learning latent variable models.
CoRR abs/1210.7559 (2012).

S. Arora, R. Ge, A. Moitra, and S. Sachdeva.
Provable ICA with unknown gaussian noise, with implications for gaussian mixtures and autoencoders.
In Proc. of NIPS, pp. 2384–2392, 2012.

M. Belkin, L. Rademacher, and J. Voss.
Blind signal separation in the presence of Gaussian noise.
In Proc. of COLT, 2013.

J. Cardoso.
Source separation using higher order moments.
In International Conference on Acoustics, Speech, and Signal Processing, 1989.

P. Comon, and C. Jutten.
Eds. Handbook of Blind Source Separation.
Academic Press, 2010.

A. M. Frieze, M. Jerrum, and R. Kannan.
Learning linear transformations.
In Proc. of FOCS, pp. 359–368, 1996.

N. Goyal, S. Vempala, and Y. Xiao.
Fourier PCA and robust tensor decomposition.
In Proc. of STOC, pp. 584–593, 2014

A. Hyvarinen, J. Karhunen, and E. Oja.
Independent Component Analysis.
Wiley, 2001.

P. Q. Nguyen, and O. Regev.
Learning a parallelepiped: Cryptanalysis of GGH and NTRU signatures.
J. Cryptology 22, 2, pp. 139–160, 2009.

S. Vempala and Y. Xiao
Max vs Min: Independent Component Analysis with nearly Linear Sample Complexity
CoRR abs/1412.2954 (2014)

Y. Xiao.
Fourier pca package.
GitHuB (2014). http://github.com/yingusxiaous/libFPCA

A. Yeredor.
Blind source separation via the second characteristic function.
Signal Processing 80, 5, pp. 897–902, 2000

PAC learning

D. Angluin, and P. Laird.
Learning from noisy examples.
Machine Learning, 2(4), 343-370,1988.

A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth.
Occam’s razor.
Information processing letters, 24(6), 377-380, 1987.

A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth.
Learnability and the Vapnik-Chervonenkis dimension.
Journal of the ACM (JACM) 36.4, pp. 929-965, 1989.

M. Collins, R. E. Schapire, and Y. Singer.
Logistic regression, AdaBoost and Bregman distances.
Machine Learning, 48(1-3), 253-285, 2002

C. Cortes and V. Vapnik.
Support-vector networks.
Machine learning, 20(3), 273-297, 1995.

A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant.
A general lower bound on the number of examples needed for learning.
Information and Computation, 82(3), 247-261, 1989

Y. Freund, and R. E. Schapire.
A decision-theoretic generalization of on-line learning and an application to boosting.
Journal of computer and system sciences, 55(1), 119-139, 1997.

P. Gopalan, A. T. Kalai, A. R. Klivans.
Agnostically Learning Decision Trees
In the Proceedings of the 40th ACM Symposium on Theory of Computing (STOC), 2008.

D. Haussler.
Learning conjunctive concepts in structural domains.
Machine learning 4, no. 1, pp. 7-40. 1989.

D. Haussler.
Probably approximately correct learning.
University of California, Santa Cruz, Computer Research Laboratory, 1990.

D. Haussler, M. Kearns, N. Littlestone, and M. K. Warmuth.
Equivalence of models for polynomial learnability.
Information and Computation, 95(2), pp 129-161, 1991.

A. T. Kalai, A. R. Klivans, Y. Mansour, and R. A. Servedio.
Agnostically Learning Halfspaces.
In Proceedings of the Foundations of Computer Science (FOCS), 2006.

D. M. Kane.
The Average Sensitivity of an Intersection of Half Spaces.
In Proceedings of STOC, pp. 437-440, 2014.

A. T. Kalai, Y. Mansour, and E. Verbin.
On agnostic boosting and parity learning.
In Proceedings of the fortieth annual ACM symposium on Theory of computing, pp. 629-638, 2008.

M. Kearns and M. Li.
Learning in the presence of malicious errors.
SIAM Journal on Computing, 22(4), 807-837, 1993

M. J. Kearns and R. E. Schapire.
Efficient distribution-free learning of probabilistic concepts.
In Foundations of Computer Science, 1990. Proceedings., 31st Annual Symposium on, pp. 382-391, 1990.

M. J. Kearns, R. E. Schapire, and L. M. Sellie.
Toward efficient agnostic learning.
Mach. Learn., 17(2-3), 1994.

A. R. Klivans, P. M. Long, & R. A. Servedio.
Learning halfspaces with malicious noise.
The Journal of Machine Learning Research, 10, 2715-2740, 2009.

A. R. Klivans and R. A. Servedio.
Toward attribute efficient learning of decision lists and parities.
The Journal of Machine Learning Research, 7, 587-602, 2006.

A. R. Klivans and A. A. Sherstov.
Cryptographic hardness for learning intersections of halfspaces.
In Proc. of FOCS, pp 553-562, 2006.

N. Littlestone.
Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm.
Machine learning, 2(4), 285-318, 1988.

L. Pitt and L. G. Valiant
Computational limitations on learning from examples.
Journal of the ACM (JACM), 35(4), pp. 965-984, 1988.

J. R. Quinlan.
Induction of decision trees.
Machine learning, 1(1), 81-106, 1986.

J. R. Quinlan and R. L. Rivest.
Inferring decision trees using the minimum description length principle.
Information and computation, 80(3), 227-248, 1989.

F. Rosenblatt.
Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms.
Spartan, 1962.

R. E. Schapire.
The strength of weak learnability
Machine learning, 5(2), 197-227, 1990.

L.G. Valiant.
Deductive learning.
Philosophical Transactions of the Royal Society of London A, 312:441-446, 1984.

L.G. Valiant
A Theory of the Learnable
Communications of the ACM 27, no. 11, pp. 1134-1142, 1984.

L.G. Valiant.
Learning disjunctions of conjunctions.
In Proc. of 9th IJCAI, pp. 560-566, 1985.

V. N. Vapnik and A. Y. Chervonenkis.
On the uniform convergence of relative frequencies of events to thier probabilities.
Theory of Probability and its Applications, 16(2):264-280, 1971.

S. Vempala.
A random-sampling-based algorithm for learning intersections of halfspaces.
Journal of the ACM (JACM), 57(6), 32, 2010.

S. Vempala. Learning convex concepts from Gaussian distributions with PCA. FOCS 2010.

Online learning

J. Abernethy, E. Hazan, and A. Rakhlin.
Competing in the dark: An efficient algorithm for bandit linear optimization.
In Proceedings of COLT, 2008.

S. Arora, E. Hazan, and S. Kale.
The Multiplicative Weights Update Method: a Meta-Algorithm and Applications.
Theory of Computing, 8(1), 121-164, 2012.

P. Auer.
Using confidence bounds for exploitation-exploration trade-offs.
Journal of Machine Learning Research, vol. 3, pp. 397–422, 2002.

P. Auer, N. Cesa-Bianchi, and P. Fischer
Finite-time analysis of the multiarmed bandit problem.
Machine Learning Journal, vol. 47, no. 2–3, pp. 235–256, 2002.

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire.
The non-stochastic multi-armed bandit problem.
SIAM Journal on Computing, vol. 32, no. 1, pp. 48–77, 2002.

N. Cesa-Bianchi, Y. Freund, D. Haussler, D. P. Helmbold, R. E. Schapire, and M. K. Warmuth.
How to use expert advice.
Journal of the ACM (JACM), 44(3), 427-485, 1997.

A. Flaxman, A. Kalai, and B. McMahan
Online convex optimization in the bandit setting: Gradient descent without a gradient
In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2005.

Y. Freund and R. E. Schapire.
Game theory, on-line prediction and boosting.
In Proceedings of the ninth annual conference on Computational learning theory, pp. 325-332, 1996.

C. Gentile.
The robustness of the p-norm algorithms.
Machine Learning, 53(3), 265-299, 2003.

E. Hazan, A. Agarwal, and S. Kale.
Logarithmic regret algorithms for online convex optimization.
Machine Learning, 69(2-3), 169-192, 2007.

A. Kalai and S. Vempala.
Efficient algorithms for online decision problems.
Journal of Computer and System Sciences, 71(3):291–307, 2005.

N. Littlestone and M. K. Warmuth.
The weighted majority algorithm.
Information and computation, 108(2), 212-261, 1994.

M. Zinkevich.
Online convex programming and generalized infinitesimal gradient ascent.
In Proc. of ICML, pp. 928-936, 2003.

Statistical learning

J. A. Aslam and S. E. Decatur.
General bounds on statistical query learning and PAC learning with noise via hypothesis boosting.
Information and Computation, 141(2), 85-118, 1998.

P. Awasthi, M. F. Balcan, and P. M. Long.
The power of localization for efficiently learning linear separators with noise.
In Proceedings of STOC, pp. 449-458, 2014.

A. Blum, A. Frieze, R. Kannan, and S. Vempala.
A polynomial-time algorithm for learning noisy linear threshold functions.
Algorithmica, 22(1-2), 35-52, 1998.

A. Blum, M. Furst, J. Jackson, M. Kearns, Y. Mansour, and S. Rudich.
Weakly learning DNF and characterizing statistical query learning using Fourier analysis.
In Proceedings of STOC, pages 253–262, 1994.

A. Blum, A. Kalai, and H. Wasserman.
Noise-tolerant learning, the parity problem, and the statistical query model.
Journal of the ACM (JACM), 50(4), 506-519, 2003.

V. Feldman, E. Grigorescu, L. Reyzin, S. Vempala, and Y. Xiao.
Statistical algorithms and a lower bound for detecting planted cliques.
In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pp. 655-664, 2013.

V. Feldman and V. Kanade.
Computational Bounds on Statistical Query Learning.
In Proc. of COLT, pp. 16-1, 2012.

V. Feldman, W. Perkins, and S. Vempala.
On the complexity of random satisfiability problems with planted solutions.
In Proc. of STOC, 2015.

M. Kearns.
Efficient noise-tolerant learning from statistical queries.
Journal of the ACM, 45(6):983–1006, 1998.

Cortical learning

V. Feldman and L. G. Valiant.
Experience-induced neural circuits that achieve high capacity.
Neural computation, 21(10), 2715-2754, 2009

C. H. Papadimitriou and S. Vempala.
Unsupervised Learning through Prediction in a Model of Cortex.
CoRR abs/1412.7955, 2014.

L. G. Valiant.
Circuits of the mind.
Oxford University Press, 1995.

L. G. Valiant.
A neuroidal architecture for cognitive computation.
J. ACM, 47(5):854–882, 2000.

L. G. Valiant.
Memorization and association on a realistic neural model.
Neural Computation, 17(3):527–555, 2005

L. G. Valiant.
A quantitative theory of neural computation.
Biological Cybernetics, 95(3):205–211, 2006.

Leslie G. Valiant.
The hippocampus as a stable memory allocator for cortex.
Neural Computation, 24(11):2873–2899, 2012.

Deep learning

S. Arora, A. Bhaskara, R. Ge, and T. Ma.
Provable bounds for learning some deep representations.
arXiv preprint arXiv:1310.6343, 2013.

Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle.
Greedy layer-wise training of deep networks.
In Proc. of NIPS, 2007.

G. E. Hinton, S. Osindero, and Y.-W. Teh.
A fast learning algorithm for deep belief nets.
Neural Comput., 18(7):1527–1554, 2006.

C. Poultney, S. Chopra, and Y. Lecun.
Efficient learning of sparse representations with an energy-based model.
In Proc. of NIPS 2006, 2006.

Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, and D. Sivakumar.
An information statistics approach to data stream and communication complexity.
J. Comput. Syst. Sci., 68(4):702-732, 2004.
[ link ]

Noga Alon, Yossi Matias, and Mario Szegedy.
The space complexity of approximating the frequency moments.
J. Comput. Syst. Sci., 58(1):137-147, 1999.
[ link ]

Piotr Indyk and David P. Woodruff.
Optimal approximations of the frequency moments of data streams.
In STOC, pages 202-208, 2005.
[ link ]

Christos Boutsidis, Michael W. Mahoney, and Petros Drineas.
An improved approximation algorithm for the column subset selection
problem.
In SODA, pages 968-977, 2009.
[ link ]

Kenneth L. Clarkson and David P. Woodruff.
Numerical linear algebra in the streaming model.
In STOC, pages 205-214, 2009.
[ link ]

A. Frieze and R. Kannan.
A new approach to the planted clique problem.
In Proc. of FST & TCS, 2008.

N. Alon, M. Krivelevich, and B. Sudakov.
Finding a large hidden clique in a random graph.
Random Structures and Algorithms, 13:457-466, 1998.

R. Karp.
The probabilistic analysis of some combinatorial search algorithms.
In Algorithms and Complexity: New Directions and Recent
Results, pages 1-19. Academic Press, 1976.

L. Kucera.
Expected complexity of graph partitioning problems.
Discrete Applied Mathematics, 57:193-212, 1995.

U. Feige and R. Krauthgamer.
Finding and certifying a large hidden clique in a semirandom graph.
Random Structures and Algorithms, 16(2):195-208, 2000.

F. McSherry.
Spectral partitioning of random graphs.
In FOCS, pages 529-537, 2001.

M. Jerrum.
Large cliques elude the metropolis process.
Random Structures and Algorithms, 3(4):347-360, 1992.

V. H. Vu.
Spectral norm of random matrices.
In Proc. of STOC, pages 423-430, 2005.

Z. Füredi and J. Komlós.
The eigenvalues of random symmetric matrices.
Combinatorica, 1(3):233-241, 1981.

M. Rudelson.
Random vectors in the isotropic position.
Journal of Functional Analysis, 164:60-72, 1999.

M. Rudelson and R. Vershynin.
Sampling from large matrices: An approach through geometric
functional analysis.
Journal of the ACM, 54(4), 2007.

A.P. Dempster, N.M. Laird, and D.B. Rubin.
Maximum likelihood from incomplete data via the em algorithm.
JRSS B, 39:1-38, 1977.

M. Kearns and M. Li.
Learning in the presence of malicious errors.
SIAM Journal on Computing, 22(4):807-837, 1993.

J. B. MacQueen.
Some methods for classification and analysis of multivariate
observations.
In Proceedings of 5-th Berkeley Symposium on Mathematical
Statistics and Probability, volume 1, pages 281-297, 1967.

S. C. Brubaker and S. Vempala.
Random tensors and planted cliques.
In Proc. of RANDOM, 2009.

J. Feldman, R. A. Servedio, and R. O’Donnell.
Pac learning axis-aligned mixtures of gaussians with no separation
assumption.
In Proc. of COLT, pages 20-34, 2006.

L. Lovász and S. Vempala.
The geometry of logconcave functions and sampling algorithms.
Random Structures and Algorithms, 30(3):307-358, 2007.

G.W. Stewart and Ji guang Sun.
Matrix Perturbation Theory.
Academic Press, Inc., 1990.

F. De la Torre and M.J. Black.
A framework for robust subspace learning.
International Journal on Computer Vision, 54:117-142, 2004.

R. O. Duda, P.E. Hart, and D.G. Stork.
Pattern Classification.
John Wiley & Sons, 2001.

K. Fukunaga.
Introduction to Statistical Pattern Recognition.
Academic Press, 1990.

C. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala.
Latent semantic indexing: A probabilistic analysis.
In Proc. of PODS, 1998.

M.W. Berry, S.T. Dumais, and G.W. O’Brien.
Using linear algebra for intelligent information retrieval.
SIAM Review, 37(4):573-595, 1995.

S.T. Dumais, G.W. Furnas, T.K. Landauer, and S. Deerwester.
Using latent semantic analysis to improve information retrieval.
In Proc. of CHI, pages 281-285, 1988.

K. Pearson.
Contributions to the mathematical theory of evolution.
Philosophical Transactions of the Royal Society, London, A., page 71, 1894.

K. Pearson.
On lines and planes of closest fit to systems of points in space.
Philosophical Magazine, 2(6):559-572, 1901.

G. Golub and H.A. van der Vorst.
Eigenvalue computation in the 20th century.
Journal of Computational and Applied Mathematics, 123:35-65,
2000.

J.G.F. Francis.
The qr transformation: a unitary analogue to the lr transformation.
Computer Journal, 4(1-2):265-272,332-345, 1961.

F. Lust-Piquard.
Inégalites de khinchin dans c_p (1<p<).
C.R. Acad. Sci., Paris, 303:289-292, 1986.

V.N. Kublanovskaya.
On some algorithms for the solution of the complete eigenvalue
problem.
USSR Computational Mathematics and Mathematical Physics,
3:637-657, 1961.

T. Hawkins.
Cauchy and the spectral theory of matrices.
Historia Mathematica, 2:1-20, 1975.

M. Kline.
Mathematical thought from ancient to modern times.
Oxford University Press, 1971.

R. Karp.
Reducibility among combinatorial problems.
In Complexity of Computer Computation, pages 85-103. Plenum
Press, 1972.

M.X. Goemans and D.P. Willamson.
Improved approximation algorithms for maximum cut and satisfiability
problems using semidefinite programming.
Journal of the ACM, 42(6):1115-1145, 1995.

S. Khot, G. Kindler, E. Mossel, and R. O’Donnell.
Optimal inapproximability results for max-cut and other 2-variable
csps?
SIAM Journal on Computing, 37(1):319-357, 2007.

S. Khot.
On the power of unique 2-prover 1-round games.
In Proc. of IEEE Conference on Computational Complexity, 2002.

M. Turk and A. Pentland.
Faces recognition using eigenfaces.
In Proc. of CVPR, pages 586-591, 1991.

Gilbert Strang.
Linear Algebra and Its Applications.
Brooks Cole, 1988.

G. H. Golub and C. F. van Loan.
Matrix Computations.
Johns Hopkins University Press, 3rd edition, 1996.

J.H. Wilkinson.
The algebraic eigenvalue problem (paperback ed.).
Clarendon Press, Oxford, 1988.

Daniel Aloise, Amit Deshpande, Pierre Hansen, and Preyas Popat.
Np-hardness of euclidean sum-of-squares clustering.
Machine Learning, 75(2):245-248, 2009.

R. Bhatia.
Matrix Analysis.
Springer, 1997.

R. Bhatia.
Matrix factorizations and their perturbations.
Linear Algebra and its applications, 197, 198:245-276, 1994.

Y. Azar, A. Fiat, A. Karlin, and F. McSherry.
Spectral analysis of data.
In Proc. of STOC, pages 619-626, 2001.

P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay.
Clustering large graphs via the singular value decomposition.
Machine Learning, 56:9-33, 2004.

M. R. Garey and David S. Johnson.
Computers and Intractability: A Guide to the Theory of
NP-Completeness.
W. H. Freeman, 1979.

F. T. Leighton and S. Rao.
Multicommodity max-flow min-cut theorems and their use in designing
approximation algorithms.
J. ACM, 46(6):787-832, 1999.

Sanjeev Arora, Satish Rao, and Umesh Vazirani.
Expander flows, geometric embeddings and graph partitioning.
In STOC ’04: Proceedings of the thirty-sixth annual ACM
symposium on Theory of computing, pages 222-231, 2004.

N. Megiddo and A. Tamir.
On the complexity of locating facilities in the plane.
Operations Research Letters, I:194-197, 1982.

R. Boppana.
Eigenvalues and graph bisection: An average-case analysis.
Proceedings of the 28th IEEE Symposium on Foundations of
Computer Science, pages 280-285, 1987.

O. Goldreich, S. Goldwasser, and D. Ron.
Property testing and its connection to learning and approximation.
Journal of the ACM, 5(4):653-750, 1998.

A. Frieze and R. Kannan.
Quick approximation to matrices and applications.
Combinatorica, 19(2):175-200, 1999.

W. Fernandez de-la Vega.
MAX-CUT has a Randomized Approximation Scheme in Dense Graphs.
Random Structures and Algorithms, 8:187-199, 1996.

A. Frieze and R. Kannan.
The regularity lemma and approximation schemes for dense problems.
Proceedings of the 37th Annual IEEE Symposium on Foundations of
Computing, pages 12-20, 1996.

P. Drineas, I. Kerenidis, and P. Raghavan.
Competitive Recommendation Systems.
Proceedings of the 34th Annual ACM Symposium on Theory of
Computing, pages 82-90, 2002.

S. Arora, D. Karger, and M. Karpinski.
Polynomial time approximation schemes for dense instances of np-hard
problems.
Proceedings of the 27th Annual ACM Symposium on Theory of
Computing, pages 284-293, 1995.

N. Alon, W.F. DeLaVega, R. Kannan, and M. Karpinski.
Random sub-problems of max-snp problems.
Proceedings of the 34th Annual ACM Symposium on Theory on
Computing, pages 668-677, 2002.

J. Shi and J. Malik.
Normalized cuts and image segmentation.
IEEE Transactions on Pattern Analysis and Machine Intelligence,
22(8):888-905, 2000.

A. Sinclair and M. Jerrum.
Approximate counting, uniform generation and rapidly mixing markov
chains.
Information and Computation, 82:93-133, 1989.

Alan Frieze, Ravi Kannan, and Santosh Vempala.
Fast monte-carlo algorithms for finding low-rank approximations.
J. ACM, 51(6):1025-1041, 2004.

W. Fernandez de la Vega, Marek Karpinski, Ravi Kannan, and Santosh Vempala.
Tensor decomposition and approximation schemes for constraint
satisfaction problems.
In STOC ’05: Proceedings of the thirty-seventh annual ACM
symposium on Theory of computing, pages 747-754, 2005.

Amit Deshpande, Luis Rademacher, Santosh Vempala, and Grant Wang.
Matrix approximation and projective clustering via volume sampling.
Theory of Computing, 2(1):225-247, 2006.

Amit Deshpande and Santosh Vempala.
Adaptive sampling and fast low-rank matrix approximation.
In APPROX-RANDOM, pages 292-303, 2006.

Petros Drineas and Ravi Kannan.
Fast monte-carlo algorithms for approximate matrix multiplication.
In In Proceedings of the 42nd Annual IEEE Symposium on
Foundations of Computer Science, pages 452-459, 2001.

A. Frieze, R. Kannan, and S. Vempala.
Fast monte-carlo algorithms for finding low-rank approximations.
In Proc. of FOCS, pages 370-378, 1998.

Petros Drineas and Ravi Kannan.
Pass efficient algorithms for approximating large matrices.
In SODA ’03: Proceedings of the fourteenth annual ACM-SIAM
symposium on Discrete algorithms, pages 223-232, 2003.

P. Drineas, R. Kannan, and M. Mahoney.
Fast monte carlo algorithms for matrices II: Computing a low-rank
approximation to a matrix.
SIAM J. on Computing, 36:132-157, 2006.

P. Drineas, R. Kannan, and M. Mahoney.
Fast monte carlo algorithms for matrices II: Computing a low-rank
approximation to a matrix.
SIAM J. on Computing, 36:158-183, 2006.

P. Drineas, R. Kannan, and M. Mahoney.
Fast monte carlo algorithms for matrices ii: Computing a low-rank
approximation to a matrix.
SIAM J. on Computing, 36:184-206, 2006.

W. Fernandez de la Vega and M. Karpinski.
Polynomial time approximation of dense weighted instances of max-cut.
Random Structures and Algorithms, 16:314-332, 2000.

W. Fernandez de la Vega and C. Kenyon.
A randomized approximation scheme for metric max-cut.
J. Computer and System Sciences, 63:531-541, 2001.

W. Fernandez de la Vega, M. Karpinski, and C. Kenyon.
Approximation schemes for metric bisection and partitioning.
In Proc. 15th ACM-SIAM SODA, pages 499-508, 2004.

Daniel A. Spielman and Shang-Hua Teng.
Spectral partitioning works: Planar graphs and finite element meshes.
Linear Algebra and its Applications, 421(2-3):284 – 305, 2007.

Dimitris Achlioptas and Frank McSherry.
Fast computation of low-rank matrix approximations.
J. ACM, 54(2), 2007.

Jonathan A. Kelner.
Spectral partitioning, eigenvalue bounds, and circle packings for
graphs of bounded genus.
SIAM J. Comput., 35(4):882-902, 2006.

Machine Learning Theory