I’m posting a message forwarded to me via the NSF-CSSI PI list that I think may be of interest to others at CIG. This is an online, learn-at-your-own-pace course, requiring about 6 hours per week for 5-6 weeks.
[Forwarded on behalf of Robert van de Geijn <email@example.com>]
Title: New MOOC on Programming for High Performance
We are excited to announce that edX has opened registration for
“Programming for High Performance”. This free-to-audit,
four-week, self-paced course developed by UT-Austin faculty Robert van
de Geijn, Maggie Myers, and Devangi Parikh starts on June 4, 2019.
This material may be of particular interest to the SI2 community:
The techniques that are discussed are key to almost two decades
of our NSF-sponsored research. They underlie our BLAS-like
Library Instantiation Software (BLIS) [1], which is now the open source
BLAS library of choice for CPUs, was partially funded by two SSI grants,
and has been embraced by industry. BLIS, in our opinion, is an example
of extremely well-structured and high-performing software with an effective
development community that is worthy of study by those who build
software infrastructure for scientific computing.
The course uses the simple but important example of matrix-matrix
multiplication to illustrate fundamental techniques for attaining
high performance on modern CPUs. A carefully designed sequence of
exercises leads the learner from a naive implementation to one that
effectively utilizes instruction-level parallelism and culminates in a
high-performance, multithreaded implementation. Along the way, learners
discover that careful attention to data movement is key to efficient
computing. In other words, learners are exposed to techniques for
attaining high performance through carefully scaffolded exercises that
illustrate how BLIS implements dgemm, which is itself based on Goto’s
algorithm [2].
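
To give a flavor of the starting point and of why data movement matters, here is a minimal sketch in C. It is not taken from the course or from BLIS: the function names gemm_naive and gemm_blocked, the ELEM macro, and the block size parameter nb are all hypothetical, and the blocked version only illustrates the general idea of reusing data while it sits in cache. It assumes column-major storage, as the BLAS use, and computes C += A*B.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major access: element (i,j) of a matrix with leading
   dimension ld is stored at index i + j*ld. */
#define ELEM(X, ld, i, j) ((X)[(size_t)(j) * (size_t)(ld) + (size_t)(i)])

/* Naive triple loop: C += A*B, with A m x k, B k x n, C m x n.
   The innermost loop walks a row of A with stride lda, which
   makes poor use of the cache for large matrices. */
void gemm_naive(int m, int n, int k,
                const double *A, int lda,
                const double *B, int ldb,
                double *C, int ldc)
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            for (int p = 0; p < k; p++)
                ELEM(C, ldc, i, j) += ELEM(A, lda, i, p) * ELEM(B, ldb, p, j);
}

static int min_int(int a, int b) { return a < b ? a : b; }

/* Blocked variant (illustrative only): the same arithmetic, but the
   loops are tiled by nb so that each tile of A, B, and C is reused
   while it is still in cache. The innermost loop runs over i, so it
   accesses A and C with stride 1. */
void gemm_blocked(int m, int n, int k,
                  const double *A, int lda,
                  const double *B, int ldb,
                  double *C, int ldc,
                  int nb)
{
    assert(nb > 0);
    for (int jj = 0; jj < n; jj += nb) {
        int jend = min_int(jj + nb, n);
        for (int pp = 0; pp < k; pp += nb) {
            int pend = min_int(pp + nb, k);
            for (int ii = 0; ii < m; ii += nb) {
                int iend = min_int(ii + nb, m);
                for (int j = jj; j < jend; j++)
                    for (int p = pp; p < pend; p++) {
                        double b = ELEM(B, ldb, p, j);
                        for (int i = ii; i < iend; i++)
                            ELEM(C, ldc, i, j) += ELEM(A, lda, i, p) * b;
                    }
            }
        }
    }
}
```

Both functions compute the same result; only the order in which memory is touched differs. A real high-performance implementation goes much further (packing, a hand-tuned micro-kernel, multithreading), which is what the course exercises build toward.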
We believe this course is appropriate for a novice yet of interest to an
expert. It may be, for example, a great way to get a summer intern
quite literally up to speed. Others may want to use it as a component
in a class they teach. Some learners may merely come to the
conclusion that they should be using high-performance libraries.
Others may find they enjoy low-level optimization.
Please help us spread the word!
Robert van de Geijn
[1] Field G. Van Zee, Robert A. van de Geijn. BLIS: A Framework for
Rapidly Instantiating BLAS Functionality. ACM TOMS, 2015.
[2] Kazushige Goto, Robert A. van de Geijn. Anatomy of
high-performance matrix multiplication. ACM TOMS, 2008.