iWAPT 2011 will be held in conjunction with the ICCS 2011 conference.

Conference Overview

Recent advances in computer architecture and computing systems (such as multicore processors and hybrid systems with accelerators) have made it very challenging for programmers to reach the performance potential of current systems. In addition, recent advances in numerical algorithms and software optimizations have tremendously increased the number of alternatives for solving a problem, which further complicates the software tuning process.

These issues have led researchers to the idea of software systems that can automate the performance tuning process by running a large set of empirical evaluations to configure applications and libraries on the targeted computing platform.
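
As a rough illustration of the empirical approach described above, the following Python sketch times a small set of candidate configurations and keeps the fastest one. It is a minimal sketch and not the method of any particular auto-tuning system: the names `autotune`, `measure_seconds`, and `blocked_sum`, as well as the candidate block sizes, are hypothetical and chosen only for this example.

```python
import time
from typing import Callable, Iterable, List, Tuple, TypeVar

C = TypeVar("C")


def measure_seconds(run: Callable[[], object], repeats: int = 3) -> float:
    """Return the minimum wall-clock time of `run` over several repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        best = min(best, time.perf_counter() - start)
    return best


def autotune(candidates: Iterable[C], cost: Callable[[C], float]) -> Tuple[C, float]:
    """Exhaustive empirical search: evaluate every candidate, keep the cheapest."""
    best_config = None
    best_cost = float("inf")
    for config in candidates:
        observed = cost(config)
        if observed < best_cost:
            best_config, best_cost = config, observed
    if best_config is None:
        raise ValueError("no usable candidate configuration")
    return best_config, best_cost


def blocked_sum(data: List[int], block: int) -> int:
    """Hypothetical tunable kernel: sum a list in chunks of `block` elements."""
    total = 0
    for i in range(0, len(data), block):
        total += sum(data[i:i + block])
    return total


if __name__ == "__main__":
    data = list(range(200_000))
    # Assumed candidate block sizes; a real tuner would derive these from the platform.
    block, seconds = autotune(
        [64, 1024, 16384],
        lambda b: measure_seconds(lambda: blocked_sum(data, b)),
    )
    print(f"fastest block size on this machine: {block} ({seconds:.6f} s)")
```

The cost function is passed in as a parameter so that the search strategy stays separate from the measurement; the selected configuration depends on the machine the measurements are taken on.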

The Sixth International Workshop on Automatic Performance Tuning (iWAPT2011) provides opportunities for researchers and practitioners in all fields related to automatic performance tuning to exchange ideas and experiences on algorithms, libraries, and applications tuned for recent computing platforms. The workshop will consist of a few invited presentations by leading researchers from academia or industry, and several presentations of peer-reviewed papers that report the latest results in auto-tuning research.

Topics of Interest

Papers are solicited in the following areas of automatic performance tuning, including but not limited to:

To be announced.

In addition to normal technical papers, please consider submitting a position paper on any of the topics. For example, a position paper could include your thoughts on future auto-tuning features you consider important, drawbacks of current application tuning techniques or tools, or constructive suggestions on how to improve state-of-the-art application tuning methods. The maximum length of a position paper is the same as that of a technical paper.


Invited Speakers

"Towards Automating Black Belt Programming"
Dr. Franz Franchetti (Carnegie Mellon University, USA)

Only a select few performance programmers in major processor vendors' software divisions, universities, national laboratories, and specialized companies have the skill set to achieve high efficiency on today's processors. Due to the labor-intensive nature of the work, the fast rate of newly arriving platforms, and the fact that performance tuning is more a black art than science, only the most important fundamental computation functions can be fully optimized. Today we are farther away than ever from John Backus' design criterion for the first Fortran compiler to automatically achieve close-to-human performance. Automatic performance tuning has bridged this gap for a few well-understood computational kernels like matrix multiplication, fast Fourier transform, and sparse matrix-vector multiplication, where systems like ATLAS, SPIRAL, FFTW, and OSKI showed that it is possible for automatic systems to compete with code that has been hand-tuned by "black belt programmers" who extracted the full performance potential from the target machines.

In this talk we investigate how black belt performance levels can be obtained automatically for irregular kernels with highly data-dependent control flow that operate on dynamic data structures, which makes the usual optimization methods impossible to apply. We focus on three illustrative examples. The first example is evaluating a logical equation in conjunctive normal form, which is expressed as a reduction across nested linked lists. The second example is the evolution of an interface surface inside a volume (e.g., a shock wave of an explosion), which translates into stencil operations on a contiguous sparse subset of pixels of a dense regular grid. The third example is a Monte Carlo simulation-based probabilistic power flow computation for distribution networks. In all cases we achieve a high fraction of machine peak, at the cost of applying aggressive optimization techniques that so far are beyond the capabilities of automatic tools. Autotuning and program generation played a major role in obtaining the final optimized implementation, as the necessary optimization techniques have a large parameter space and require extensive code specialization. In all cases the original code consisted of a few lines of C code, while the final code comprises hundreds to thousands of lines of architecture-specific SIMD intrinsic code and OpenMP or PThreads parallelization with custom synchronization. We extract lessons from the three examples on how to design future autotuning and program generation systems that will automate the optimization of such kernels.

Short Bio:

Franz Franchetti is an Assistant Research Professor with the Department of Electrical and Computer Engineering at Carnegie Mellon University. He received the Dipl.-Ing. (M.Sc.) degree in Technical Mathematics and the Dr. techn. (Ph.D.) degree in Computational Mathematics from the Vienna University of Technology in 2000 and 2003, respectively. In 2006 he was a member of the team winning the Gordon Bell Prize (Peak Performance Award) and in 2010 he was a member of the team winning the HPC Challenge Class II Award (most productive system). Dr. Franchetti's research focuses on automatic performance tuning and program generation for emerging parallel platforms, including multicore CPUs, clusters and high-performance systems (HPC), graphics processors (GPUs), field programmable gate arrays (FPGAs), and FPGA acceleration for CPUs. As a member of the Spiral research team, his research goal is to enable automatic generation of highly optimized software libraries for important kernel functionality. Moreover, he is investigating the applicability of these techniques across domains: signal and image processing, numerical linear algebra, communication and compression, and most recently the stability of power grids. In other collaborative research threads Dr. Franchetti is investigating the applicability of domain-specific transformations within standard compilers, and hardware/software co-design based on high-level hardware and algorithm descriptions, as well as the possibility of application-specific logic within memory. Dr. Franchetti is Thrust Leader of the Security Thrust in Carnegie Mellon's SRC Smart Grid Research Center and co-founder of SpiralGen, a Pittsburgh, PA company commercializing the technology developed in the Spiral project.

"The Future of Auto-Tuning"
Dr. Victor Pankratius (Karlsruhe Institute of Technology, Germany)

Multicore processors are standard, and software developers are now required to parallelize all sorts of performance-critical applications on desktops, servers, and embedded devices. However, the increasing variety of platforms with different hardware and software configurations complicates performance tuning. In this context, the keynote discusses several new directions for auto-tuning from a software engineer's perspective. It shows that auto-tuning is not only useful for performance optimization, but that it is also a key approach to simplify the development of complex multicore applications and to achieve portability. The talk presents new opportunities to extend the concepts of auto-tuning well beyond the numerical approaches proposed in the past. In particular, new work is sketched on how auto-tuners can be employed to optimize software architectures, database queries, and multicore applications that execute simultaneously at run-time. Finally, the talk will describe what it takes to realize the vision of making every performance-critical application auto-tunable by default.


Short Bio:

Dr. Pankratius heads the "Software Engineering for Multicore Systems" group at the Karlsruhe Institute of Technology, Germany. He also serves as the elected chairman of the "Software Engineering for Parallel Systems" (SEPARS) international working group. Dr. Pankratius' current research concentrates on how to make parallel programming easier and covers a range of research topics including auto-tuning, language design, debugging, and empirical studies.

Paper Submission Guidelines

Technical or position-style paper length - maximum 10 pages

The submitted paper must be camera-ready and formatted according to the rules of Procedia Computer Science.

Please use the LaTeX template (with instructions) or the MS Word template file.

Submission implies the willingness of at least one of the authors to register and present the paper. PostScript and source versions of your paper must be submitted electronically through the paper submission system.

Please note that regular papers must not exceed ten pages in length when typeset using the Procedia format.


Submissions must be made through the web form. Abstracts and full papers are both submitted through the same web page.

Important Dates

Abstract submission due (on the web system): January 10, 2011, 11:59pm (Japan Standard Time)
Extended to January 17, 2011, 11:59pm (Japan Standard Time)
Full paper submission due: January 17, 2011, 11:59pm (Japan Standard Time)
Notification of acceptance of papers: February 16, 2011, 11:59pm (Japan Standard Time)
Camera-ready papers: March 7, 2011
Early registration opens: February 15, 2011
Early registration closes: March 31, 2011
Conference sessions: Wednesday, June 1 - Friday, June 3, 2011


Registration Fee

To join the workshop, you must register for ICCS2011.
Registration is handled through the ICCS2011 registration page.


Program Chair

Takahiro Katagiri, The University of Tokyo, Japan

Program Vice-chair

Richard Vuduc, Georgia Institute of Technology, USA

Program Committee

John Cavazos, University of Delaware, USA
Domingo Jimenez Canovas, University of Murcia, Spain
Toshiyuki Imamura, The University of Electro-Communications, Japan
Jakub Kurzak, University of Tennessee, USA
Julien Langou, University of Colorado Denver, USA
Osni Marques, Lawrence Berkeley National Laboratory, USA
Akira Naruse, Fujitsu Laboratories Ltd., Japan
Serge G. Petiton, Laboratoire d'Informatique Fondamentale de Lille, France
Markus Puschel, ETH Zurich, Switzerland
Daisuke Takahashi, University of Tsukuba, Japan
Keita Teranishi, Cray, Inc., USA
Yusaku Yamamoto, Kobe University, Japan
Masahiro Yasugi, Kyoto University, Japan
Qing Yi, University of Texas at San Antonio, USA

Organizing Committee

General Chair: Toshiyuki Imamura, The University of Electro-Communications, Japan
Publicity Chair: Shoichi Hirasawa, The University of Electro-Communications, Japan
Web Chair: Hisayasu Kuroda, Ehime University, Japan
PC Chair: Takahiro Katagiri, The University of Tokyo, Japan

Steering Committee

Victor Eijkhout, Texas Advanced Computing Center, University of Texas, USA
Toshiyuki Imamura, The University of Electro-Communications, Japan
Domingo Jimenez Canovas, University of Murcia, Spain
Takahiro Katagiri, The University of Tokyo, Japan
Ken Naono, Hitachi Ltd., Japan
Markus Puschel, Carnegie Mellon University, USA
Reiji Suda, The University of Tokyo, Japan
Richard Vuduc, Georgia Institute of Technology, USA
R. Clint Whaley, University of Texas at San Antonio, USA
Yusaku Yamamoto, Kobe University, Japan
Jonathan T. Carter, NERSC/Lawrence Berkeley National Laboratory, USA
John Cavazos, University of Delaware, USA

Financial Support


iWAPT2011 Organizing Committee
E-mail: (replace "_at_" by "@" in the email address)