Invited talk 1
Title: Just-in-time compilation: Speeding up small linear algebra operations
Presenter: Sarah Knepper (Intel, USA)
Abstract: Small linear algebra operations often form building blocks for modern scientific applications, ranging from machine learning to finite element methods to high performance computing. Due to the combinatorial explosion of all possible sizes and other parameters, it is not feasible for a library to contain specific pre-compiled code paths for every combination of parameters. However, just-in-time compilation can be utilized to build the needed kernels at run-time, relying on pre-existing generators. The cost of the run-time generation can be amortized across many uses of the kernels, resulting in significant performance improvements. This talk will explore the use of just-in-time compilation to speed up small linear algebra operations on modern architectures.
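The amortization idea above can be sketched in a few lines. This is an illustrative toy, not Intel MKL's implementation: a fully unrolled matrix-multiply kernel is generated at run time for one fixed (m, k, n) size, then cached so the one-time generation cost is spread over many calls.

```python
# Illustrative sketch (not MKL's actual JIT): generate a fully unrolled
# kernel for one fixed (m, k, n) problem size and cache it for reuse.

_kernel_cache = {}

def get_matmul_kernel(m, k, n):
    """Return a kernel specialized for C(m x n) = A(m x k) * B(k x n),
    with matrices stored as flat row-major lists."""
    key = (m, k, n)
    if key in _kernel_cache:
        return _kernel_cache[key]
    # Emit straight-line code: all loops unrolled, all indices constant.
    lines = ["def kernel(a, b, c):"]
    for i in range(m):
        for j in range(n):
            terms = " + ".join(f"a[{i*k+p}]*b[{p*n+j}]" for p in range(k))
            lines.append(f"    c[{i*n+j}] = {terms}")
    namespace = {}
    exec("\n".join(lines), namespace)   # one-time generation cost
    _kernel_cache[key] = namespace["kernel"]
    return _kernel_cache[key]

# Usage: a 2x2 multiply; the second call for the same size reuses the kernel.
a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
c = [0.0] * 4
get_matmul_kernel(2, 2, 2)(a, b, c)
# c == [19.0, 22.0, 43.0, 50.0]
```

A production JIT emits specialized machine code rather than Python source, but the caching and amortization pattern is the same.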
Biography: Sarah Knepper is a Software Engineer at Intel, with seven years of experience optimizing dense linear algebra computations for the Intel® Math Kernel Library. She holds a B.A. in computer science from the College of Saint Benedict and a Ph.D. in computer science from Emory University. Her background and research interests include numerical linear algebra, autonomous driving, machine learning, high performance and scientific computing, and numerical solutions to discrete ill-posed problems in image processing.
Invited talk 2
Title: High-Performance GEMM on GPUs is Like a 3-Year Old
Presenter: David E. Tanner (AMD, USA)
Abstract: Messy, complicated and demanding. Enormous difficult-to-realize potential. Programming GPUs is like parenting a 3-year-old. The Tensile project seeks to fully automate achieving peak performance for all GEMMs, tensor contractions and convolutions; for all precisions, transposes, sizes and strides; and for all GPUs. Doing so requires addressing a variety of obstacles (diverse problem types, GPU performance bottlenecks, skinny matrices, small matrices, slow transposes and edge cases) while emitting source and assembly kernels, auto-tuning kernels per problem size and auto-generating host library code. Tensile addresses these issues to achieve high-performing GEMMs and tensor contractions across a wide range of traditionally difficult problem sizes.
Biography: David Tanner has a BS in Physics from Brigham Young University, where he studied relativistic physics. He earned his PhD in Biophysics and Computational Biology from the University of Illinois at Urbana-Champaign, where he was both a developer and a researcher of molecular dynamics on supercomputers. He has worked at AMD for 5 years accelerating applications on GPUs, including networking encryption/decryption, 4K video decoding, optical character recognition and, most importantly, matrix-matrix multiplication. David and his wife have 4 children ranging from 2 to 11 years old, and enjoy cycling in Austin, TX, USA.