# Readings

Bertero, M., T. Poggio, and V. Torre. "Ill-posed Problems in Early Vision." *Proc. of the IEEE* 76 (1988): 869-889.

Though restricted to early vision, it contains an easy-to-read introduction to ill-posed problems and regularization methods.

Girosi, F., M. Jones, and T. Poggio. "Regularization Theory and Neural Network Architectures." *Neural Computation* 7 (1995): 219-269.

A thorough introduction to the connection between learning and Regularization Theory. We will often refer to this paper in this and in the next few classes.

Vapnik, V. *The Nature of Statistical Learning Theory*. Springer, 1995.

Chapter 1 is a readable first-hand introduction to the subject.

**Further Readings:**

Bertero, M. "Regularization Methods for Linear Inverse Problems." In *Inverse Problems*. Edited by G. Talenti. Lecture Notes in Mathematics. Vol. 1225. 1986, pp. 52-112.

Still a very good survey of the subject.

Tikhonov, A. N. and V. Y. Arsenin. *Solutions of Ill-posed Problems*. W. H. Winston, 1977.

Everybody's first book on Regularization Theory.

Vapnik, V. *Statistical Learning Theory*. Wiley, 1998.

Browse the first chapters of this book if you want to go deeper into the foundations of SLT.

Girosi, F., M. Jones, and T. Poggio. "Regularization Theory and Neural Network Architectures." *Neural Computation* 7 (1995): 219-269.

A thorough introduction to the connection between learning and Regularization Theory. We will often refer to this paper in this and in the next few classes.

Kolmogorov, A. N., and S. V. Fomine. *Elements of the Theory of Functions and Functional Analysis*. Dover, 1975.

A classic. Though you should be able to follow the class without it, go through Sec. 5.1, 6.4, and 6.5 of Ch. 2 and Sec. 13.1, 13.2, 13.3, 13.5, 13.6, and 15.1 of Ch. 4, paying particular attention to everything concerning function spaces.

Strang, G. *Calculus.* Wellesley-Cambridge Press, 1991.

Chapter 13 contains an excellent exposition of the Lagrange multipliers technique.

**Further Readings:**

Bertero, M. "Regularization Methods for Linear Inverse Problems." In *Inverse Problems*. Edited by G. Talenti. Lecture Notes in Mathematics. Vol. 1225. 1986, pp. 52-112.

Still a very good survey of the subject.

Tikhonov, A. N., and V. Y. Arsenin. *Solutions of Ill-posed Problems*. W. H. Winston, 1977.

Everybody's first book on Regularization Theory.

Kolmogorov, A. N., and S. V. Fomine. *Elements of the Theory of Functions and Functional Analysis*. Dover, 1975.

A classic. Though you should be able to follow the class without it, go through Sec. 5.1, 6.4, and 6.5 of Ch. 2 and Sec. 13.1, 13.2, 13.3, 13.5, 13.6, and 15.1 of Ch. 4, paying particular attention to everything concerning function spaces.

Strang, G. *Introduction to Linear Algebra*. Wellesley-Cambridge Press, 1993.

Chapter 6 contains the matrix algebra used in this class (and more!).

**Further Readings:**

Aronszajn, N. "Theory of Reproducing Kernels." *Trans. Amer. Math. Soc.* 68 (1950): 337-404.

RKHS the hard way.

Girosi, F. "An Equivalence Between Sparse Approximation and Support Vector Machines." *Neural Computation* 10 (1998): 1455-1480.

In Appendix A of this paper you find a smooth introduction to RKHS.

Wahba, G. *Spline Models for Observational Data.* SIAM, 1990.

Chapter 1 introduces you to the world of RKHS.

Girosi, F., M. Jones, and T. Poggio. "Regularization Theory and Neural Network Architectures." *Neural Computation* 7 (1995): 219-269.

A thorough introduction to the connection between Learning and Regularization Theory. Most of this class can be found in this paper.

Strang, G. *Calculus*. Wellesley-Cambridge Press, 1991.

Chapter 13 contains an excellent exposition of the Lagrange multipliers technique.

Girosi, F., M. Jones, and T. Poggio. "Regularization Theory and Neural Network Architectures." *Neural Computation* 7 (1995): 219-269.

Part of this paper is good for this class too!

**Further Readings:**

Vapnik, V. N. *Estimation of Dependences Based on Empirical Data.* Springer, 1982.

Chapter 9 contains a discussion of the Parzen windows method within the framework of Regularization Theory.

Bishop, C. M. *Neural Networks for Pattern Recognition*. Clarendon, 1995.

Chapters 3 and 4 discuss single and multi-layer perceptrons at length.

Girosi, F., M. Jones, and T. Poggio. "Regularization Theory and Neural Network Architectures." *Neural Computation* 7 (1995): 219-269.

Once more a very good source of information about connections between different approximation techniques.

**Further Readings:**

Hertz, J., A. Krogh, and R. G. Palmer. *Introduction to the Theory of Neural Computation.* Addison Wesley, 1991.

A good book on Neural Networks viewed from the physicist perspective.

Girosi, F., M. Jones, and T. Poggio. "Regularization Theory and Neural Network Architectures." *Neural Computation* 7 (1995): 219-269.

This is really the last time you have to go through it!

**Further Readings:**

Hertz, J., A. Krogh, and R. G. Palmer. *Introduction to the Theory of Neural Computation*. Addison Wesley, 1991.

A good book on Neural Networks viewed from the physicist perspective.

Vapnik, V. *The Nature of Statistical Learning Theory*. Springer, 1995.

Chapter 1 is a readable first-hand introduction to the subject.

**Further Readings:**

Vapnik, V. *Statistical Learning Theory.* Wiley, 1998.

Browse the first chapters of this book if you want to go deeper into the foundations of SLT.

Vapnik, V. *Statistical Learning Theory*. Wiley, 1998.

Chapter 3 contains all the material covered in this class (and much more!). Several parts of Chapter 2 give you the perspective behind the theory, but if you want to appreciate the difference between stating a result and proving it, browse chapter 14...

Vapnik, V. *Statistical Learning Theory*. Wiley, 1998.

Chapter 4 contains all the material covered in this class (and much more!)

Alon, N., S. Ben-David, N. Cesa-Bianchi, and D. Haussler. "Scale-sensitive Dimensions, Uniform Convergence, and Learnability." *Symposium on Foundations of Computer Science* (1993).

This paper gives the necessary and sufficient conditions for distribution independent uniform convergence for real valued functions.

Evgeniou, T., M. Pontil, and T. Poggio. "Regularization Networks and Support Vector Machines." *Advances in Computational Mathematics* 13 (2000): 1-50.

Most of this class can be found in this paper.

Vapnik, V. *Statistical Learning Theory*. Wiley, 1998.

Chapters 5 and 6 tell you most but not the whole story about the results discussed in this class.

Strang, G. *Calculus*. Wellesley-Cambridge Press, 1991.

Chapter 13 contains an excellent exposition of the Lagrange multipliers technique.

Vapnik, V. *Statistical Learning Theory*. Wiley, 1998.

This class will cover part of Chapter 10. You may want to go through Chapter 8 to put SVMs in perspective relative to other techniques.

Evgeniou, T., M. Pontil, and T. Poggio. "Regularization Networks and Support Vector Machines." *Advances in Computational Mathematics* 13 (2000): 1-50.

The discussion on the Bayesian interpretation of RN and SVM can be found in this paper.

Girosi, F. "An Equivalence between Sparse Approximation and Support Vector Machines." *Neural Computation* 10 (1998): 1455-1480.

This is the paper in which the relation between SVM and BPD is studied.

Vapnik, V. *Statistical Learning Theory*. Wiley, 1998.

This class will cover part of chapters 11 and 13.

**Further Readings:**

Chen, S., D. Donoho, and M. Saunders. "Atomic Decomposition by Basis Pursuit." Tech Rep 479. Dept. of Statistics. Stanford University. 1995.

Daubechies, I. "Time Frequency Localization Operators: a Geometric Phase Space Approach." *IEEE Trans. on Information Theory* 34 (1988): 605-612.

Mallat, S., and Z. Zhang. "Matching Pursuits with Time-Frequency Dictionaries." *IEEE Trans. on Signal Proc.* 41 (1993): 3397-3415.

Pontil, M., S. Mukherjee, and F. Girosi. "On the Noise Model of Support Vector Machine Regression." CBCL Paper #168, AI Memo #1651, Massachusetts Institute of Technology, Cambridge, MA (1998).

Cristianini, N., and J. Shawe-Taylor. *Support Vector Machines and Other Kernel-based Learning Methods*. Cambridge, 2000.

Chapter 3 of this book covers kernels in depth.

Vapnik, V. *Statistical Learning Theory*. Wiley, 1998.

You'll find kernels and ideas on kernels throughout chapters 10, 11 and 12.

**Further Readings:**

Berg, C., J. P. R. Christensen, and P. Ressel. *Harmonic Analysis on Semigroups*. Springer-Verlag.

The title is intimidating, but chapter 3 is easy to read and contains a lucid introduction to positive definite functions.

Jaakkola, T., and D. Haussler. "Exploiting Generative Models in Discriminative Classifiers." *NIPS* (1998).

Niyogi, P., T. Poggio, and F. Girosi. "Incorporating Prior Information in Machine Learning by Creating Virtual Examples." *IEEE Proceedings on Intelligent Signal Processing* 86 (1998): 2196-2209.

Logothetis, N. K., T. Vetter, A. Hulbert, and T. Poggio. "View-Based Models of 3D Object Recognition and Class-Specific Invariances." AI Memo 1473, CBCL Paper 94 (1994).

Riesenhuber, M., and T. Poggio. "Hierarchical Models of Object Recognition in Cortex." *Nature Neuroscience* 2 (1999): 1019-1025.

A difficult but very compact and rigorous introduction to the subject (Chapters 1, 5, and 8-10 in particular).

Niyogi, P., and F. Girosi. "On the Relationship between Generalization Error, Hypothesis Complexity, and Sample Complexity for Radial Basis Functions." *Neural Computation* 8 (1996): 819-842.

Here you find the material for the discussion on the various types of error.

*Nonlinear Programming, Theory and Techniques*. John Wiley & Sons, 1993.

A textbook on Optimization Theory.

**Is the SVM solution unique?**

Burges, C. J. C., and D. J. Crisp. "Uniqueness of the SVM Solution." *NIPS* 12 (1999).

**The Decomposition Method for SVMs:**

Osuna, Edgar. *Support Vector Machines: Training and Applications*. Ph.D. Thesis (1998).

**Optimizing over 2 variables at a time:**

Platt, John C. "Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines." Microsoft Research MSR-TR-98-14 (1998).

**Analysis of the Decomposition Method:**

Chang, Chih-Chung, Chih-Wei Hsu, and Chih-Jen Lin. "The Analysis of Decomposition Methods For Support Vector Machines." *Proceedings of IJCAI99*, SVM workshop (1999).

Keerthi, S. S., and E. G. Gilbert. "Convergence of a Generalized SMO Algorithm for SVM Classifier Design." Control Division, Dept. of Mechanical and Production Engineering, National University of Singapore, CD-00-01 (2000).

Keerthi, S. S., S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy. "Improvements to Platt's SMO Algorithm for SVM Classifier Design." Control Division, Dept. of Mechanical and Production Engineering, National University of Singapore, CD-99-14 (1999).

**Sparsity Control:**

Osuna, Freund, and Girosi. "Reducing Run-time Complexity in SVMs." *Proceedings of the 14th Int'l Conference on Pattern Recognition.*

*Machine Learning* 26 (1996): 123-140.

Schapire, R. E., Y. Freund, P. Bartlett, and W. S. Lee. "Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods." *The Annals of Statistics* 26 (1998): 1651-1686.

A readable introduction to wavelets.

Daubechies, I. "Ten Lectures on Wavelets." *CBMS-NSF Regional Conferences Series in Applied Mathematics*, SIAM, Philadelphia PA (1992).

More advanced but it also contains the basic theoretical results on frames.