Lecture Notes

This section contains documents that could not be made accessible to screen reader software. A "#" symbol is used to denote such documents.

Selected Lecture Notes from Spring 2003 Class

The course notes below are a work in progress. The lecture numbers do not correspond to the class session numbers. The notes are available as a single file (PDF - 4.3 MB)# or as separate chapters below.

Chapter 1: Introduction (PDF - 1.2 MB)

1.1. The Machines
1.2. The Software
1.3. The Reality of High Performance Computing
1.4. Modern Algorithms
1.5. Compilers
1.6. Scientific Algorithms
1.7. History, State-of-Art, and Perspective

  • 1.7.1. Things that are not Traditional Supercomputers

1.8. Analyzing the Top500 List Using Excel

  • 1.8.1. Importing the XML File
  • 1.8.2. Filtering
  • 1.8.3. Pivot Tables

1.9. Parallel Computing: An Example
1.10. Exercises

Chapter 2: MPI, OpenMP, MATLAB®*P (PDF)

2.1. Programming Style
2.2. Message Passing

  • 2.2.1. Who am I?
  • 2.2.2. Sending and Receiving
  • 2.2.3. Tags and Communicators
  • 2.2.4. Performance, and Tolerance
  • 2.2.5. Who's got the floor?

2.3. More on Message Passing

  • 2.3.1. Nomenclature
  • 2.3.2. The Development of Message Passing
  • 2.3.3. Machine Characteristics
  • 2.3.4. Active Messages

2.4. OpenMP for Shared Memory Parallel Programming
2.5. STARP

Chapter 3: Parallel Prefix (PDF)

3.1. Parallel Prefix
3.2. The "Myth" of lg n
3.3. Applications of Parallel Prefix

  • 3.3.1. Segmented Scan
  • 3.3.2. Csanky's Matrix Inversion
  • 3.3.3. Babbage and Carry Look-Ahead Addition

3.4. Parallel Prefix in MPI

Chapter 4: Dense Linear Algebra (PDF)

4.1. Dense Matrices
4.2. Applications

  • 4.2.1. Uncovering the Structure from Seemingly Unstructured Problems

4.3. Records
4.4. Algorithms, and Mapping Matrices to Processors
4.5. The Memory Hierarchy
4.6. Single Processor Condiderations for Dense Linear Algebra

  • 4.6.1. LAPACK and the BLAS
  • 4.6.2. Reinventing Dense Linear Algebra Optimization

4.7. Parallel Computing Considerations for Dense Linear Algebra
4.8. Better Load Balancing

  • 4.8.1. Problems

Chapter 5: Sparse Linear Algebra (PDF)

5.1. Cyclic Reduction for Structured Sparse Linear Systems
5.2. Sparse Direct Methods

  • 5.2.1. LU Decomposition and Gaussian Elimination
  • 5.2.2. Parallel Factorization: The Multifrontal Algorithm

5.3. Basic Iterative Methods

  • 5.3.1. SuperLU-dist
  • 5.3.2. Jacobi Method
  • 5.3.3. Gauss-Seidel Method
  • 5.3.4. Splitting Matrix Method
  • 5.3.5. Weighted Splitting Matrix Method

5.4. Red-Black Ordering for Parallel Implementation
5.5. Conjugate Gradient Method

  • 5.5.1. Parallel Conjugate Gradient

5.6. Preconditioning
5.7. Symmetric Supernodes

  • 5.7.1. Unsymmetric Supernodes
  • 5.7.2. The Column Elimination Tree
  • 5.7.3. Relaxed Supernodes
  • 5.7.4. Supernodal Numeric Factorization

5.8. Efficient Sparse Matrix Algorithms

  • 5.8.1. Scalable Algorithms
  • 5.8.2. Cholesky Factorization
  • 5.8.3. Distributed Sparse Cholesky and the Model Problem
  • 5.8.4. Parallel Block-Oriented Sparse Cholesky Factorization

5.9. Load Balance with Cyclic Mapping

  • 5.9.1. Empirical Load Balance Results

5.10. Heuristic Remapping
5.11. Scheduling Local Computations

Chapter 6: Parallel Machines (PDF)#

  • 6.0.1. More on Private Versus Shared Addressing
  • 6.0.2. Programming Model
  • 6.0.3. Machine Topology
  • 6.0.4. Homogeneous and Heterogeneous Machines
  • 6.0.5. Distributed Computing on the Internet and Akamai Network

Chapter 7: FFT (PDF)

7.1. FFT

  • 7.1.1. Data Motion
  • 7.1.2. FFT on Parallel Machines
  • 7.1.3. Exercises

7.2. Matrix Multiplication
7.3. Basic Data Communication Operations

Chapter 8: Domain Decomposition (PDF)#

8.1. Geometric Issues

  • 8.1.1. Overlapping vs. Non-overlapping Regions
  • 8.2.2. Geometric Discretization

8.2. Algorithmic Issues

  • 8.2.1. Classical Iterations and their Block Equivalents
  • 8.2.2. Schwarz Approaches: Additive vs. Multiplicative
  • 8.2.3. Substructuring Approaches
  • 8.2.4. Accellerants

8.3. Theoretical Issues
8.4. A Domain Decomposition Assignment: Decomposing MIT

Chapter 9: Particle Methods (PDF)#

9.1. Reduce and Broadcast: A Function Viewpoint
9.2. Particle Methods: An Application
9.3. Outline
9.4. What is N-Body Simulation?
9.5. Examples
9.6. The Basic Algorithm

  • 9.6.1. Finite Difference and the Euler Method

9.7. Methods for Force Calculation

  • 9.7.1. Direct Force Calculation
  • 9.7.2. Potential Based Calculation
  • 9.7.3. Poisson Methods
  • 9.7.4. Hierarchical Methods

9.8. Quadtree (2D) and Octtree (3D) : Data Structures for Canonical Clustering
9.9. Barnes-hut Method (1986)

  • 9.9.1. Approximating Potentials

9.10. Outline
9.11. Introduction
9.12. Multipole Algorithm: An Overview
9.13. Multipole Expansion
9.14. Taylor Expansion
9.15. Operation No.1 - SHIFT
9.16. Operation No.2 - FLIP
9.17. Application on Quad Tree
9.18. Expansion from 2-D to 3-D
9.19. Parallel Implementation

Chapter 10: Partitioning and Load Balancing (PDF - 1.1 MB)

10.1. Motivation from the Parallel Sparse Matrix Vector Multiplication
10.2. Separators
10.3. Spectral Partitioning - One way to Slice a Problem in Half

  • 10.3.1. Electrical Networks
  • 10.3.2. Laplacian of a Graph
  • 10.3.3. Spectral Partitioning

10.4. Geometric Methods

  • 10.4.1. Geometric Graphs
  • 10.4.2. Geometric Partitioning: Algorithm and Geometric Modeling
  • 10.4.3. Other Graphs with Small Separators
  • 10.4.4. Other Geometric Methods
  • 10.4.5. Partitioning Software

10.5. Load-Balancing N-body Simulation for Non-uniform Particles

  • 10.5.1. Hierarchical Methods of Non-uniformly Distributed Particles
  • 10.5.2. The Communication Graph for N-body Simulations
  • 10.5.3. Near-field Graphs
  • 10.5.4. N-body Communication Graphs
  • 10.5.5. Geometric Modeling of N-body Graphs

Chapter 11: Mesh Generation (PDF)

11.1. How to Describe a Domain?
11.2. Types of Meshes
11.3. Refinement Methods

  • 11.3.1. Hierarchical Refinement
  • 11.3.2. Delaunay Triangulation
  • 11.3.2. Delaunay Refinement

11.4. Working With Meshes
11.5. Unsolved Problems

Chapter 12: Support Vector Machines and Singular Value Decomposition (PDF )

12.1. Support Vector Machines

  • 12.1.1. Learning Models
  • 12.1.2. Developing SVMs
  • 12.1.3. Applications

12.2. Singular Value Decomposition

Bibliography (PDF)

Selected Lecture Notes from Spring 2003 Class


LEC # TOPICS
1 Introduction (PDF - 1.3 MB)
2 MPI, MATLAB®*P (TXT)
4 Parallel Prefix (PDF)
5 Parallel Computer Architecture I (PDF)
6 Parallel Computer Architecture II (PDF)
7 Dense Linear Algebra I

(PDF 1) (Courtesy of James Demmel. Used with permission.)
(PDF 2)
(PDF 3 - 1.1 MB)
(PDF 4) (Courtesy of Jack Dongarra, University of Tennessee. Used with permission.)
9 FFT (PDF)
11 Graph Partitioning (PDF)
14 N-Body Problem - Barnes Hut (PDF)#
18 Domain Decomposition (PDF)#
23 Support Vector Machines (PDF 1) (PDF 2)# (Courtesy of Tong Wen. Used with permission.)