Online course: New MOOC on Programming for High Performance

I’m posting a message forwarded to me via the NSF-CSSI PI list that I think might be of interest to others at CIG. This is an online, learn-at-your-own-pace course requiring about 6 hours per week for 5-6 weeks.

[Forwarded on behalf of Robert van de Geijn]

Title: New MOOC on Programming for High Performance

We are excited to announce that edX has opened registration for “Programming for High Performance” [1]. This free-to-audit, four-week, self-paced course, developed by UT-Austin faculty Robert van de Geijn, Maggie Myers, and Devangi Parikh, starts on June 4, 2019.

This material may be of particular interest to the SI2 community: the techniques discussed are key to almost two decades of our NSF-sponsored research. They underlie our BLAS-like Library Instantiation Software (BLIS) [2], which is now the open-source BLAS library of choice for CPUs, was partially funded by two SSI grants, and has been embraced by industry. BLIS, in our opinion, is an example of extremely well-structured, high-performing software with an effective development community, and is worthy of study by those who build software infrastructure for scientific computing.

The course uses the simple but important example of matrix-matrix multiplication to illustrate fundamental techniques for attaining high performance on modern CPUs. A carefully designed sequence of exercises leads the learner from a naive implementation to one that effectively utilizes instruction-level parallelism, and culminates in a high-performance, multithreaded implementation. Along the way, learners discover that careful attention to data movement is key to efficient computing. In other words, learners are exposed to techniques for attaining high performance through carefully scaffolded exercises that illustrate how BLIS implements dgemm, which is itself based on Goto’s algorithm [3].
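To give a flavor of the starting point and of why data movement matters, here is a minimal sketch in C. It is not taken from the course or from BLIS; the function names, the column-major layout, and the tile size NB = 32 are all assumptions made for illustration. The first routine is a naive triple loop, and the second performs the same arithmetic tile by tile so that the data being worked on is more likely to stay in cache.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major indexing (an assumption for this sketch): element (i, j)
   of a matrix with leading dimension ld. */
#define IDX(i, j, ld) ((size_t)(i) + (size_t)(j) * (size_t)(ld))

/* Hypothetical tile size; a real implementation would tune this per machine. */
#define NB 32

static int min_int(int a, int b) { return a < b ? a : b; }

/* Naive version: C += A * B, with A m-by-k, B k-by-n, C m-by-n. */
void gemm_naive(int m, int n, int k,
                const double *A, const double *B, double *C)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            for (int p = 0; p < k; p++)
                C[IDX(i, j, m)] += A[IDX(i, p, m)] * B[IDX(p, j, k)];
}

/* Blocked version: identical arithmetic, but organized over NB-by-NB tiles.
   The innermost loop walks down a column, which is contiguous in
   column-major storage. */
void gemm_blocked(int m, int n, int k,
                  const double *A, const double *B, double *C)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int jj = 0; jj < n; jj += NB)
        for (int pp = 0; pp < k; pp += NB)
            for (int ii = 0; ii < m; ii += NB) {
                int jmax = min_int(jj + NB, n);
                int pmax = min_int(pp + NB, k);
                int imax = min_int(ii + NB, m);
                for (int j = jj; j < jmax; j++)
                    for (int p = pp; p < pmax; p++) {
                        double b = B[IDX(p, j, k)];
                        for (int i = ii; i < imax; i++)
                            C[IDX(i, j, m)] += A[IDX(i, p, m)] * b;
                    }
            }
}

/* Largest absolute difference between two arrays of length len. */
double max_abs_diff(size_t len, const double *x, const double *y)
{
    double worst = 0.0;
    for (size_t t = 0; t < len; t++) {
        double d = x[t] - y[t];
        if (d < 0.0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

Both routines perform exactly the same floating-point operations; only the order in which memory is touched differs. Calling each on the same inputs and comparing the results with max_abs_diff is a simple correctness check before timing them. Production implementations go considerably further, for example by packing tiles into contiguous buffers and using hand-tuned inner kernels.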

We believe this course is appropriate for a novice yet of interest to an expert. It may be, for example, a great way to get a summer intern quite literally up to speed. Others may want to use it as a component in a class they teach. Some learners may merely come to the conclusion that they should be using high-performance libraries. Others may find they enjoy low-level optimization.

Please help us spread the word!

Robert van de Geijn

Maggie Myers

Devangi Parikh


[2] Field G. Van Zee, Robert A. van de Geijn. BLIS: A Framework for Rapidly Instantiating BLAS Functionality. ACM TOMS, 2015.

[3] Kazushige Goto, Robert A. van de Geijn. Anatomy of high-performance matrix multiplication. ACM TOMS, 2008.