Salon I & Salon II
Thursday, October 29

10:00am PDT

Input Space Splitting for OpenCL

OpenCL programs are prone to memory and control flow divergence. When implementing OpenCL for machines with explicit SIMD instructions, compilers can usually generate more efficient code if they can prove non-divergence of memory and branch instructions. To this end, they leverage a so-called divergence analysis. However, in practice divergence is often input-dependent and exhibited for some, but not all, inputs. Hence, static analyses fail to prove non-divergence. To obtain good performance, developers can manually split the input space, but this is a tedious and error-prone task.

In this talk we present a new OpenCL-to-CPU compiler pipeline that addresses this problem by automatically ensuring divergence-free control flow through program specialization. To this end, we represent the full kernel as well as the implicit work-item dimensions in the polyhedral model. For data-dependent control flow and non-affine expressions, overapproximation is used. From the polyhedral iteration domains and memory access functions we can then derive conditions for the absence of memory as well as control divergence. Based on these conditions, the input space is split in order to generate specialized kernel versions with beneficial divergence characteristics. Commonly, large parts of the input exhibit regular access and control patterns and only a fixed-size boundary of the input space does not. In such cases we can achieve speedups almost as high as the vectorization width used. However, our technique can also improve the performance of non-diverging kernels due to simplifications in the polyhedral model.
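As a rough illustration of what splitting the input space means, the sketch below is hand-written C++ and not output of the presented pipeline. It emulates work items with a loop over a hypothetical 1D three-point blur kernel. The names `blurItem`, `blurGeneric` and `blurSplit` are assumptions made for this example. The generic version branches on the work-item position, while the split version handles the fixed-size boundary separately and leaves a branch-free interior loop.

```cpp
#include <cstddef>
#include <vector>

// Generic kernel body for one work item: the branch depends on the position,
// so neighbouring work items can take different paths (control divergence).
void blurItem(const std::vector<float>& in, std::vector<float>& out, std::size_t i) {
  const std::size_t n = in.size();
  if (i > 0 && i + 1 < n)
    out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
  else
    out[i] = in[i];  // boundary items are copied unchanged
}

// Unsplit version: every work item runs the generic body.
void blurGeneric(const std::vector<float>& in, std::vector<float>& out) {
  for (std::size_t i = 0; i < in.size(); ++i)
    blurItem(in, out, i);
}

// Split version: the two boundary items are handled on their own and the
// interior range runs a specialized body without any branch.
void blurSplit(const std::vector<float>& in, std::vector<float>& out) {
  const std::size_t n = in.size();
  if (n == 0) return;
  out[0] = in[0];
  out[n - 1] = in[n - 1];
  for (std::size_t i = 1; i + 1 < n; ++i)
    out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
}
```

Both versions compute the same result. The interior loop of `blurSplit` is the part a vectorizer can handle without masking, which is where the speedup described in the abstract would come from.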

Johannes Doerfert

Researcher/PhD Student, Saarland University

Thursday October 29, 2015 10:00am - 10:45am PDT
Salon I & Salon II

11:00am PDT

Beyond Sanitizers: guided fuzzing and security hardening

The Sanitizers (AddressSanitizer & friends) allow you to find many stability and security bugs in C++ code, but they are only as good as your tests are. In this talk we will show how to improve your test coverage with guided fuzzing (libFuzzer) and how to protect your applications in production even if some bugs are still there (Control Flow Integrity and SafeStack).
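For readers unfamiliar with guided fuzzing, a minimal libFuzzer target might look like the sketch below. `LLVMFuzzerTestOneInput` is the entry point libFuzzer calls with each generated input. The function under test, `StartsWithMagic`, is a hypothetical stand-in invented for this example and is not taken from the talk.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical code under test: reports whether the buffer starts with "LLVM".
bool StartsWithMagic(const std::uint8_t* data, std::size_t size) {
  return size >= 4 && data[0] == 'L' && data[1] == 'L' &&
         data[2] == 'V' && data[3] == 'M';
}

// libFuzzer entry point: called repeatedly with inputs that the fuzzer
// mutates based on coverage feedback. It must tolerate any byte sequence.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* Data, std::size_t Size) {
  StartsWithMagic(Data, Size);
  return 0;  // the return value 0 means the input was processed normally
}
```

With a recent Clang, a target like this is typically built with something like `clang++ -g -fsanitize=fuzzer,address target.cpp`, so that the sanitizer reports any memory error the fuzzer manages to trigger. The exact flags depend on the toolchain version.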

Kostya Serebryany

Software Engineer, Google
Konstantin (Kostya) Serebryany is a Software Engineer at Google. His team develops and deploys dynamic testing tools, such as AddressSanitizer and ThreadSanitizer. Prior to joining Google in 2007, Konstantin spent 4 years at Elbrus/MCST working for Sun compiler lab and then 3 years...

Thursday October 29, 2015 11:00am - 11:45am PDT
Salon I & Salon II

11:45am PDT

A Heterogeneous Execution Engine for LLVM

Hexe, which stands for Heterogeneous Execution Engine, is a new compiler component that integrates with the LLVM infrastructure. It targets efficient computation on heterogeneous platforms by allowing the automatic offloading of workloads to computational accelerators, such as Graphics Processing Units (GPUs) or Digital Signal Processors (DSPs).


The workloads we consider for offloading are either explicitly annotated by the programmer or automatically detected by static compiler analysis and runtime checks. Our infrastructure operates at the level of LLVM intermediate representation and effectively supports multiple source languages.


Hexe consists of a set of compiler passes and a runtime environment. The compiler passes perform the required code analysis and transformations to enable workload offloading. The runtime environment manages data transfers and synchronization operations, and performs dynamic workload scheduling.


We consider a diverse set of heterogeneous systems ranging from mobile devices equipped with ARM-based multi-core CPUs, embedded GPUs and DSPs to data center nodes consisting of x86 multi-cores and high-end GPUs. Hexe has a modular design where new accelerator types and programming environments can be supported via a plugin interface. We also consider interoperability between Hexe and modern JIT technologies, such as LLVM MCJIT.

Christos Margiolas

The University of Edinburgh

Thursday October 29, 2015 11:45am - 12:30pm PDT
Salon I & Salon II

2:00pm PDT

Building, Testing and Debugging a Simple out-of-tree LLVM Pass

This tutorial aims at providing solid ground to develop out-of-tree LLVM passes. It presents all the required building blocks, starting from scratch: CMake integration, LLVM pass management, opt / clang integration. It presents the core IR concepts through two simple obfuscating passes: the SSA form, the CFG, PHI nodes, IRBuilder etc. We also take a quick tour of analysis integration through dominators. Finally, it showcases how to use cl and lit to parametrize and test the toy passes developed in the tutorial.


Serge Guelton


Adrien Guinet


Thursday October 29, 2015 2:00pm - 3:00pm PDT
Salon I & Salon II

3:00pm PDT

Creating an SPMD Vectorizer for OpenCL with LLVM

Processors such as CPUs or DSPs often feature SIMD instructions, but are not designed to efficiently support Single Program Multiple Data (SPMD) execution models such as OpenCL. The design of a compiler for such a target therefore needs some form of vectorization to generate optimal code for this kind of data-parallel execution model. This is because SPMD programs are most often written in scalar form with the implicit assumption that many instances of the program are executed in parallel. On CPU-like architectures, SIMD vector units can be leveraged for parallelism, such that each SIMD lane is loosely mapped to a program instance.


This tutorial looks at how to create an SPMD vectorizer that targets CPU-like architectures for use with heterogeneous compute frameworks. OpenCL is used as an example but the concepts should translate to other frameworks such as CUDA, RenderScript or Vulkan Compute. While there are other possible approaches, we have chosen to present one that works at the LLVM IR level and that is essentially an IR pass that creates vectorized functions from the original scalar SPMD function. This allows targeting multiple architectures with very little architecture-specific code.


We will start by briefly introducing the SPMD execution model, describing how it is used in OpenCL and giving an overview of what an SPMD vectorizer should do and how it differs from other kinds such as LLVM's loop vectorizer and SLP vectorizer. Then we will look at a possible vectorizer design, including the different vectorization stages (analysis, control-flow to data-flow, scalarization, packetization/instantiation and optimization/cleanup). Finally, we will look at some possible optimizations as well as other aspects that do not fit the 'stage-by-stage' presentation (e.g. vectorizing and scalarizing calls to builtin functions, SIMD width detection, interleaved memory access optimizations, SoA to AoS conversions, etc).
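The control-flow to data-flow and packetization stages can be pictured with a small hand-written C++ sketch. It is not the tutorial's implementation. It assumes a SIMD width of 4 (`kWidth`) and a hypothetical scalar kernel, and it emulates vector lanes with `std::array`. The branch of the scalar kernel becomes a per-lane mask, both sides are computed for all lanes, and a select picks the result.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kWidth = 4;  // assumed SIMD width for this sketch
using Packet = std::array<float, kWidth>;

// Scalar SPMD-style kernel: describes what one program instance computes.
float kernelScalar(float x) {
  if (x < 0.0f)
    return -x;
  return x * 2.0f;
}

// Packetized version: one call handles kWidth program instances.
// The branch has been turned into a mask plus a per-lane select.
Packet kernelPacket(const Packet& x) {
  std::array<bool, kWidth> mask{};
  Packet thenVal{}, elseVal{}, result{};
  for (std::size_t lane = 0; lane < kWidth; ++lane)
    mask[lane] = x[lane] < 0.0f;      // vector compare
  for (std::size_t lane = 0; lane < kWidth; ++lane)
    thenVal[lane] = -x[lane];         // "then" side, computed for all lanes
  for (std::size_t lane = 0; lane < kWidth; ++lane)
    elseVal[lane] = x[lane] * 2.0f;   // "else" side, computed for all lanes
  for (std::size_t lane = 0; lane < kWidth; ++lane)
    result[lane] = mask[lane] ? thenVal[lane] : elseVal[lane];  // select
  return result;
}
```

Each of the inner loops stands in for a single vector instruction. Computing both sides is only valid here because neither side has side effects. Stores and calls inside a branch would need masked forms.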


Pierre-Andre Saulais

Senior Principal Software Engineer, Codeplay Software
Pierre-Andre is a Senior Principal Software Engineer at Codeplay Software.

Thursday October 29, 2015 3:00pm - 4:00pm PDT
Salon I & Salon II

4:00pm PDT

Polly - Optimistic Loop Nest Optimizations with Schedule Trees

Polly is an advanced LLVM loop nest optimizer that provides precise memory access analyses and implements on top of them advanced loop optimizations based on a memory-access focused program model.

In the first part of this tutorial we introduce the audience to integer-set-based schedule trees as a way to model loop programs. We explain how we statically model program behavior on the granularity of individual dynamic computations and discuss different program analyses (memory accesses, data-dependences, computational complexity).

We then learn how to perform complex loop transformations using simple per-node operations on an abstract program schedule tree. Such transformations include most classical loop transformations, but also full/partial tile separation, outer-loop vectorization and other more complex transformations. At the end of the first part of this tutorial, the audience understands the general concepts used in Polly.

The second part of this tutorial is focused on Polly's new optimistic optimization infrastructure that enables non-statically provable transformations to be performed optimistically. Discussing optimization-blocking issues such as exception handling code, infinite loops, integer wrapping or out-of-bound memory accesses, we introduce the concept of optimistic assumptions. We then discuss how such assumptions can be described in general, how Polly can collect assumptions, how redundant assumptions are eliminated and how (close to) minimal run-time checks to verify them are generated. At the end of the second part of this tutorial the audience will be able to create optimistic loop optimizations even for cases that lack sufficient static information.
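The idea of an optimistic assumption guarded by a run-time check can be sketched by hand as follows. This is an illustration written for this listing, not code generated by Polly, and the function names are hypothetical. The original loop uses 32-bit index arithmetic that may wrap and skips out-of-range indices. The versioned function assumes "no wrapping" and "all accesses in bounds", checks both once before the loop, and falls back to the original code when the check fails.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Original loop: the 32-bit index may wrap around and out-of-range
// indices are skipped, which blocks a simple affine model of the access.
std::uint64_t sumWindowOriginal(const std::vector<std::uint32_t>& a,
                                std::uint32_t start, std::uint32_t n) {
  std::uint64_t sum = 0;
  for (std::uint32_t i = 0; i < n; ++i) {
    std::uint32_t idx = start + i;  // wraps modulo 2^32
    if (idx < a.size())
      sum += a[idx];
  }
  return sum;
}

// Versioned loop: one run-time check verifies the optimistic assumptions.
std::uint64_t sumWindowVersioned(const std::vector<std::uint32_t>& a,
                                 std::uint32_t start, std::uint32_t n) {
  const std::uint64_t end = std::uint64_t{start} + n;  // computed without wrapping
  const bool noWrap = end <= (std::uint64_t{1} << 32);
  const bool inBounds = end <= a.size();
  if (noWrap && inBounds) {
    // Optimized version: plain contiguous access, no per-iteration test.
    std::uint64_t sum = 0;
    for (std::uint64_t i = start; i < end; ++i)
      sum += a[i];
    return sum;
  }
  return sumWindowOriginal(a, start, n);  // assumptions violated: original code
}
```

The check costs a few comparisons per call, while the fast path is free to be vectorized or otherwise transformed under the assumptions it has established.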

Johannes Doerfert

Researcher/PhD Student, Saarland University

Tobias Grosser

ETH Zurich

Thursday October 29, 2015 4:00pm - 5:00pm PDT
Salon I & Salon II

5:00pm PDT

Living Downstream Without Drowning

Have you made changes to your copy of an llvm.org project? Not planning to contribute them back to the open-source project right away? 


Have you noticed that there are actually quite a lot of changes made to the upstream projects? Clang + LLVM together see an average of 50 commits every day. This is a FLOOD. Are you seeing lots of conflicts or test failures when you merge from upstream? Spending too much time patching things back together before you can make any progress on your project?

Then you are DROWNING! 

On a project with lots of local changes, managing the flood can be a half-time job all by itself. It's not _exactly_ unproductive time, but it's time you do not spend on your unique project and customizations. At Sony Computer Entertainment, we were drowning... but we've learned to swim with the current, and we are building a lifeboat.

In this combined tech-talk/BOF session, Paul and Mike will talk about SCE's practices and plans for reducing our merge overhead, including source-patch practices and merge/build/test automation. Then, it becomes a BOF where everyone can share their ideas, suggestions and practices for Living Downstream Without Drowning!


Michael Edwards

Sony Computer Entertainment

Paul Robinson

Sony Computer Entertainment

Thursday October 29, 2015 5:00pm - 6:00pm PDT
Salon I & Salon II
Friday, October 30

10:00am PDT

LLVM Performance Improvements and Headroom

While LLVM is known for very fast compile times, many developers in the community also push for improving run-time performance of generated code. This talk highlights this year’s performance gains on AArch64 in key benchmarks like SPEC2006, Kernels and also the LLVM test suite. While progress has been impressive, more work needs to be done. Therefore we will discuss future performance headroom, which involves both expanding existing optimizations and architecting new ones.


Friday October 30, 2015 10:00am - 10:45am PDT
Salon I & Salon II

11:15am PDT

Exception handling in LLVM, from Itanium to MSVC

This talk covers the design and implementation of MSVC-compatible exception handling in Clang and LLVM. Unlike the Itanium C++ exception handling model, the Windows exception handling model is not designed around successive unwinding. As a result, the existing LLVM landingpad instruction is insufficient for expressing how Windows exceptions should be handled. To support Windows exceptions, we added the new token type and a family of new EH pad instructions to LLVM. This talk describes the final design of the new representation and the tradeoffs we made along the way.


Reid Kleckner

Software Engineer, Google
I work on Clang, the C++ compiler. I specifically work on C++ ABI compatibility with MSVC, and other Windows-related issues in Clang.

Friday October 30, 2015 11:15am - 12:00pm PDT
Salon I & Salon II

12:00pm PDT

An update on Clang-based C++ Tooling

This talk is going to give an update on the C++ tooling we are building on top of Clang. Among others, it will focus on clang-tidy, a tool to statically analyze source code to diagnose and fix typical programming errors like style violations, interface misuse, or bugs. We'll give an update on the direction this project is taking, new checks that are being integrated and challenges we are facing.

In a live demo, we'll show how we can fix specific problems throughout LLVM's own codebase. We'll also show how a new check can be added in a matter of minutes and how other Clang-based tools can help with its development.


Manuel Klimek

Software Engineer, Google
Manuel Klimek is a software engineer at Google since 2008 and a professional code monkey since 2003. After developing embedded Linux terminals for the payment industry and distributed storage technology at Google in C++, he decided that C++ productivity lags significantly behind other...

Friday October 30, 2015 12:00pm - 12:45pm PDT
Salon I & Salon II

2:00pm PDT

Throttling Automatic Vectorization: When Less Is More

SIMD vectors are widely adopted in modern general purpose processors as they can boost performance and energy efficiency for certain applications. 

Compiler-based automatic vectorization is one approach for generating code that makes efficient use of the SIMD units, and has the benefit of avoiding hand development and platform-specific optimizations. 

The Superword-Level Parallelism (SLP) vectorization algorithm is the best-known implementation of automatic vectorization when starting from straight-line scalar code, and is implemented in several major compilers.


The existing SLP algorithm greedily packs scalar instructions into vectors starting from stores and traversing the data dependence graph upwards until it reaches loads or non-vectorizable instructions. 

Choosing whether to vectorize is a one-off decision for the whole graph that has been generated. 

This, however, is suboptimal because the graph may contain code that is harmful to vectorization due to the need to move data from scalar registers into vectors. 

The decision does not consider the potential benefits of throttling the graph by removing this harmful code. 

In this work we propose a solution to overcome this limitation by introducing Throttled SLP (TSLP), a novel vectorization algorithm that finds the optimal graph to vectorize, forcing vectorization to stop earlier whenever this is beneficial. 

Our experiments show that TSLP improves performance across a number of kernels extracted from widely-used benchmark suites, decreasing execution time compared to SLP by 9% on average and up to 14% in the best case. 


Vasileios Porpodas

University of Cambridge

Friday October 30, 2015 2:00pm - 2:45pm PDT
Salon I & Salon II

2:45pm PDT

LoopVersioning LICM

Loop-invariant code motion (LICM) is an important compiler optimization that moves invariant instructions out of a loop without affecting the semantics of the program.

For safety, it checks alias dependencies before moving invariant instructions out of the loop.

In some cases memory aliasing may make this optimization ineffective. This results in possible missed opportunities in speeding up applications. 


LoopVersioning LICM is a step toward exploiting those missed opportunities where memory aliasing may make the LICM optimization ineffective.
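The general idea of versioning a loop on a run-time alias check can be sketched by hand as below. This is an illustration of the technique, not the pass presented in the talk, and the function names are hypothetical. In the original loop the load of `*scale` cannot be hoisted because `scale` might point into `dst`. The versioned function tests for that overlap once and runs a copy of the loop with the load hoisted when there is none.

```cpp
#include <cstddef>
#include <functional>

// Original loop: *scale is reloaded every iteration because a store to
// dst[i] could change it if scale points into dst.
void scaleOriginal(int* dst, const int* src, const int* scale, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = src[i] * (*scale);
}

// Versioned loop: a run-time check selects between a copy with the
// invariant load hoisted and the unmodified original loop.
void scaleVersioned(int* dst, const int* src, const int* scale, std::size_t n) {
  // std::less gives a well-defined ordering even for unrelated pointers.
  const std::less<const int*> before{};
  const bool scaleInsideDst = !before(scale, dst) && before(scale, dst + n);
  // The n > 0 guard avoids loading *scale when the loop would not run at all.
  if (n > 0 && !scaleInsideDst) {
    const int s = *scale;  // hoisted: loaded once, outside the loop
    for (std::size_t i = 0; i < n; ++i)
      dst[i] = src[i] * s;
  } else {
    scaleOriginal(dst, src, scale, n);  // possible aliasing: keep the original
  }
}
```

When `scale` does point into `dst`, hoisting would change the result, which is why the fallback version is kept.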

Ashutosh Nema

Compiler Engineer, AMD

Friday October 30, 2015 2:45pm - 3:30pm PDT
Salon I & Salon II

4:30pm PDT

Debug Info: From Metadata to Modules

The efficiency of debug info in LLVM and Clang improved dramatically this year.  This talk is about what it took to get here and what work remains.


We'll talk about how Metadata was redesigned to make the debug info IR memory-efficient (with a human-readable assembly syntax).  We'll go into the implications for other Metadata graphs, and what a more expressive Metadata future could look like.  We'll also include an overview of what's left to scale debug info for LTO.


We'll also talk about Clang's new module debugging feature, which reduces the size of debug info on disk, improves compile time, and makes full type information available to debuggers.  We'll highlight how Clang-based debuggers like LLDB can use module debug information to enhance expression evaluation.

Adrian Prantl

Ask me about debug information in LLVM, Clang and Swift!

Friday October 30, 2015 4:30pm - 5:15pm PDT
Salon I & Salon II

5:15pm PDT

Lightning Talks
This session consists of a series of talks, each 5 minutes long. Here is the lightning talk schedule:

The recent switch lowering improvements - Hans Wennborg, Google

Earlier this year, the DAG switch lowering was rewritten to improve the performance of code generated for switches. The new algorithm always generates balanced trees, is better at finding jump tables, and can exploit profile information. This lightning talk will give a walk-through of the new switch lowering.

ds2, a tiny debug server used with lldb - Stephane Sezer, Facebook

This talk will present ds2, a debug server that we use in conjunction with lldb to do remote debugging at Facebook. It currently supports remote debugging on Linux, Android and Tizen; Windows and FreeBSD support is under development.

This debug server's small size and the fact that it depends only on libc++ make it an ideal candidate to include in embedded platforms where space is limited. Source is available here: https://github.com/facebook/ds2

Accelerating Stateflow with LLVM - Dale Martin, MathWorks, Inc., Ramkumar Ramachandra, MathWorks, Inc.

Learn how MathWorks has improved the customer experience for Stateflow users using LLVM for JIT-based simulation. We will discuss how we translate from our high-level IR to LLVM's low-level IR and use it for fast-starting, high-performance simulation. We will also discuss a number of challenges and shortcomings we faced with the LLVM infrastructure.

Putting Debug Info on a Diet - David Blaikie, Google Inc

Debug info size is... sizable. Two years ago, Clang's debug info was up to twice as large as GCC's; after 9 months, it was nearly half the size. Where and how did we cut the fat?

Large scale libc++ deployment - Evgenii Stepanov & Ivan Krasin, Google

This talk presents the authors' experience switching a large, quickly changing codebase from libstdc++ to libc++. We list common problems, solutions and ideas for future libc++ improvements.

To LLVM Bytecode Obfuscation and Beyond - Serge Guelton, Quarkslab, Adrien Guinet, Quarkslab

An introduction to LLVM pass building when working out-of-tree, through 3 simple obfuscating passes. Featuring building and using a custom analysis, tracing your passes, and testing them with lit!

An Implementation of Swing Modulo Scheduling in a Production Compiler - Brendon Cahoon, Qualcomm

In this talk, we present the implementation and evaluation of a machine level software pipelining optimization pass based on Swing Modulo Scheduling. Our software pipelining implementation improves performance by 20% on a set of image processing kernels.

Integer Vector Optimizations and "Usual Arithmetic Conversions" - Stephen Rogers, Movidius


David Blaikie

Software Engineer, Google Inc.

Serge Guelton


Dale Martin

Stephane Sezer

Software Engineer, Facebook

Friday October 30, 2015 5:15pm - 6:00pm PDT
Salon I & Salon II