Tech Report ASPLOS-19 #208

Digital Methods for Closing Analog Side Channels
Tech Report for ASPLOS Submission #208 (6th August, 2018) – Confidential Draft – Do Not Distribute!
Abstract

In the face of strong encryption, side channels have become an important attack vector, but analog side channels, such as the power channel, are particularly difficult to close. Prior defenses are limited to tiny cryptographic kernels running on specially designed hardware that precludes most modern optimizations such as caches and out-of-order execution.

This paper presents VANTAGE, a customizable compiler that extends the capabilities of power channel defenses to a broad class of applications running on modern microprocessors. The key idea is to use power models, which provide digital knobs that can be manipulated in software to force the program’s power profile to be independent of its secrets.

Using two distinct power models, we evaluate VANTAGE for x64, ARM32, and ARM64 microprocessors. Our results on 12 diverse programs show that VANTAGE successfully foils power channel attacks, while achieving performance that is 33× to 137× faster than a baseline hardware solution that provides the same level of security. When compared against non-secure hardware, VANTAGE incurs a mean slowdown that varies from a few percent to 7×, depending on the target processor and the power model.

1. Introduction

To protect the massive volumes of sensitive data that are processed by today’s computers, vendors have introduced mechanisms such as Intel’s SGX enclaves [46], which use strong encryption to prevent malicious operating systems or co-resident attackers from directly reading secrets. In the face of these privacy-preserving facilities, attackers are forced to learn secret values from computational side effects, which could be anything from instruction counts to cache misses or power consumption. These so-called side-channel attacks have been successful in attacking programs by leaking intellectual property [36], database queries [14], financial information [16], document contents [31], etc.

While mechanisms for closing digital side channels (those that carry information through discrete bits) have been well studied for a variety of programs and architectures [21, 66, 72, 35, 58, 42, 54, 55, 10, 47, 41, 3], there are far fewer techniques for closing analog side channels, such as power draw and temperature. Instead, most analog side-channel defenses have focused on protecting small cryptographic programs [51, 50, 48, 6, 18, 25, 20, 33, 40, 11, 32] running on custom hardware that has value-independent power draw [22, 62, 63, 53, 52] or that has capacitors that hide power consumption for short bursts of time [64, 70, 1].

Unfortunately, these solutions are not compatible with microarchitectural optimizations (such as out-of-order execution, caches, or branch predictors) and are not compatible with complex control- and data-flow, so they can eliminate fine-grain power variations (which arise from differences in the Hamming Weight or the Hamming Distance of secrets in the processor’s Arithmetic Unit) but cannot eliminate coarse-grain variations (which arise from microarchitectural optimizations and important programming features, such as loops and floating-point operations). Thus, existing techniques do not provide defenses for important programs, such as databases, social media platforms, and machine learning codes, nor for codes that run on modern commercial microprocessors.

These limitations suggest a role for compilers, which offer several benefits. First, because they can reason about and close whole-program information flow, compilers can eliminate coarse-grained power variations, thereby complementing existing fine-grained defenses. Second, compilers offer lower performance overheads than a possible hardware solution, because they can selectively apply defenses to just those parts of the program that manipulate sensitive data. Finally, compilers can protect existing hardware and all of its power and performance optimizations. The key barrier that prevents compilers from closing analog side channels is the ISA, the functional interface between the compiler and the hardware, which hides from the compiler the very implementation details (the analog channels) that we wish to regulate.

In this paper, we introduce VANTAGE, the first digital (that is, compiler-based) defense that closes coarse-grain analog side channels; in particular, we focus on the power side channel on x64, ARM32, and ARM64 processors. Our key idea is to use a power model to relate program execution to the processor’s power consumption. The power model essentially helps us selectively break the ISA barrier, allowing the VANTAGE compiler to deliberately change the program’s power consumption for the sake of closing the coarse-grained power channel. Our compiler is not tied to any particular power model, so it can be customized to use more precise and accurate power models as they become available. To illustrate the flexibility of our approach, this paper evaluates versions of the VANTAGE compiler that use two very different power models: the open-source McPAT [39] model and the closed-source Intel RAPL [23] model.

While VANTAGE is the first solution for eliminating coarse-grain power variations for a broad class of programs running on modern processors, it does not represent the endpoint of this research. It is expensive to eliminate coarse-grain power
variations, because the defense needs to ensure that secret values do not affect the running time of the program, but we believe that aggressive static analysis and simple hardware optimizations have the potential to significantly reduce performance overheads.

This paper makes the following contributions:

1. Power Channel Attacks on Diverse Programs. We demonstrate that a broad class of applications, including graph kernels, font renderers, and machine-learning libraries, are vulnerable to power channel attacks.

2. Coarse-Grain Power Channel Defense. We describe our customizable compiler, VANTAGE, which extends power channel defenses beyond small cryptographic programs running on simple processors to include a broad class of sensitive programs running on modern highly-optimized microprocessors.

3. Performance Comparison. We show that, depending on the underlying power model and the target microprocessor, VANTAGE imposes a mean slowdown that ranges from a few percent to about 7×, which is 33× to 137× more efficient than a baseline hardware-only solution.

2. Overview of This Paper

Before we explain the technical details of this paper, we first demonstrate how the coarse-grain power channel leaks information in a sample non-cryptographic application running on optimized processors. We then describe the high-level ideas behind our solution.

2.1. Problem Demonstration

Power side channels leak secret information through not just arithmetic and logic operations, but also through branches, memory accesses, and microarchitectural optimizations. Table 1 shows, for two secret inputs passed to the LibSVM classifier [13], the total energy consumption measured using Intel’s Running Average Power Limit (RAPL) [23]. We see that when running the classifier on a reference network intrusion dataset [5], there exists a correlation between the processor’s energy consumption and the classification of network packets as either benign or malicious. Thus, we see that an adversary can infer the packet label by simply observing differences in the processor’s energy consumption.

          Label #1              Label #2
          (Malicious Packet)    (Benign Packet)
Mean      2003.0                1893.1
Stdev     29.8                  30.8

Table 1: Energy consumption (measured using Intel RAPL) while running the LibSVM classifier [13] that labels data from the KDD Cup dataset [5]. We observe that energy consumption is a reasonable indicator of the label of the input data.

In addition to the total energy consumption, the profile of the energy consumption depends on the inputs as well, enabling the adversary to infer the secret through just a partial observation of the program execution. Figure 1 shows that the power consumption (measured using the McPAT power modeling framework [39]) during a 140µs time window differs between the two non-secure executions (shown in red and orange), where one execution produces more peaks than the other. Thus, we see that the power channel is dangerous and easy to exploit in applications that may operate on private or confidential information.

[Figure 1 plot: Dynamic Power (Watts) versus Time (microseconds). Non-secure execution #1 is at about 2.1 W, non-secure execution #2 is at about 1.9 W, and secure executions #1 and #2 are at about 2.1 W.]

Figure 1: Power consumption of the LibSVM classifier before and after using our solution (measured using McPAT). The non-secure executions (shown in red and orange) produce a visually distinct profile of power consumption, whereas after using our solution, for all secrets, the power profiles are identical (shown as a single line in blue), effectively mitigating the power channel attack.

Existing software solutions only apply to straight-line code that references a fixed set of memory locations in a static pattern, while existing hardware solutions enforce peak power consumption for every operation. Effectively, these solutions are unable to eliminate power variations in non-cryptographic applications running on modern processors. Such power variations arise from microarchitectural optimizations (such as out-of-order execution, branch prediction, caches, etc.), from useful programming features (such as loops and memory accesses), and even from the use of computation units such as floating-point units.

2.2. Our Solution

In this work, we present an approach that produces a customizable compiler, VANTAGE, that can close power variations arising from complex control- and data-flow, microarchitectural optimizations, and floating-point execution on different microarchitectures. The key insight behind constructing the VANTAGE compiler is that we can enable the compiler to understand the program’s power consumption by augmenting the compiler with a power model, which captures the link between digital events and the processor’s power trace. We use this link to find instructions that can leak information through
their power consumption, and our compiler then automatically replaces these instructions with functionally equivalent instruction sequences that do not leak information through coarse-grain power variations. VANTAGE can also use principled techniques to improve the performance of the transformed code without compromising security.

2.2.1. Flexibility of Our Approach

Our approach is not restricted to any specific power model or processor, so we illustrate the flexibility of our approach by building upon (1) the (open source) McPAT power modeling framework for four different targets (64-bit x86, 32-bit ARM with software floating-point ABI, 32-bit ARM with hardware floating-point ABI, and 64-bit ARM) and (2) the (closed source) Intel RAPL for the x86 platform.

McPAT and Intel RAPL represent two distinct power models. The McPAT power model precisely documents all its inputs (i.e. microarchitectural events) and output (i.e. power consumption), enabling us to design a comprehensive power channel defense. On the other hand, models such as Intel RAPL are closed source systems whose exact details are unknown. For such closed source systems, we demonstrate our approach by creating a new, flexible regression model that relates the energy consumption with microarchitectural events, where the regression model enables a tradeoff between precision of the model and performance overhead of the defense.

McPAT and RAPL also differ in their measurement granularity, since McPAT operates within a microarchitectural simulator, which enables detailed power measurements without perturbing program execution, whereas RAPL’s energy events are updated roughly once every millisecond, placing a hard limit on the measurement granularity. For example, RAPL measurements gathered more frequently than once every 100 ms result in a noticeable energy overhead (see Table 2).

Interval    5 ms    10 ms    30 ms    50 ms    100 ms
Error       4.7%    3.5%     1.4%     1.0%     0.8%

Table 2: Observer effect in energy measurements of a synthetic benchmark (hackbench) for different intervals of RAPL measurement granularity. The baseline execution measures energy only once at program termination.

2.2.2. Summary of Our Analysis and Results

Analysis Steps. For the open-source power model (McPAT), we trace the source code of McPAT, the Gem5 microarchitecture simulator [8], and the LLVM compiler [37], so that we can determine the impact of LLVM IR instructions on the processor’s power consumption. For the closed-source power model (Intel RAPL), since no source code is available, we use regression analysis to determine the microarchitecture events (such as cache misses) that affect the power consumption, before finding the LLVM IR instructions that affect the microarchitecture events.

Our Findings. Across all analyzed microprocessors and power models, we find that branch instructions and memory accesses can leak operands through power consumption. We also find microprocessor-specific sources of power channel leakage. Specifically, on Gem5’s x64 implementation, we find that 22 instructions (see Appendix Table 6) leak information about their operands, while on Gem5’s ARM32 implementation, integer and floating-point comparisons as well as various software runtime functions (e.g. integer division and floating-point operations) also leak information about their operands through power consumption.

Validation of Our Analysis. We validate the results of our analysis through random testing and by comparing the predicted versus actual energy consumption. For the open-source power model, we generate multiple random instruction sequences chosen from a pool of 277 different integer and SSE x64 instructions and five different addressing modes, and we check the power consumption from executing these instructions. For the closed-source power model, we use programs from the PARSEC benchmark suite [7] to compare the predicted versus actual energy consumption. In both cases, we find our analysis results to be correct.

2.2.3. Our Compiler Implementation

The VANTAGE compiler uses the results from our previous analysis to generate code that is functionally equivalent to the original (non-secure) code, but which does not leak information through the power consumption. Since the power consumption is closely tied to both the power model and the microarchitecture, we make VANTAGE a customizable compiler whose transformations are chosen based on the target microarchitecture.

3. Threat Model

VANTAGE prevents an attacker from correlating secrets with the processor’s coarse-grain power consumption. We assume that the power model analyzed using VANTAGE is at least as accurate as the power model that will be used by the adversary, but VANTAGE is flexible enough to permit richer power models as they get developed. The user of VANTAGE can select the relevant power model depending on the measurement techniques available to the adversary, such as physical oscilloscope probes used in the vicinity of the processor or remotely measured energy-related performance events such as Intel Running Average Power Limit (RAPL). VANTAGE prevents power and energy variations at the level of instructions and microcode operations but not cycle-level (i.e. fine-grain) power variations; we assume that cycle-level power variations are eliminated using existing techniques such as blinding or masking [51, 50, 48, 6], Converter Gating [64], Converter Reshuffling [70], or using transistors that enforce peak power consumption [22, 62, 63, 53, 52]. VANTAGE then removes variations in the power consumption at the level of instructions and microcode operations, effectively complementing (instead of replacing) existing power channel defenses. VANTAGE does not protect the DRAM from power channel attacks.
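To make the adversary in this threat model concrete, the following is a minimal sketch (not taken from the paper) of a single-observation attack on the measurements in Table 1. It assumes a decision rule that places a threshold midway between the two reported means; the function names and the midpoint rule are hypothetical, and the numbers are the ones reported in Table 1, in whatever unit that table uses.

```c
/* Statistics copied from Table 1 (unit as reported there). */
static const double MEAN_MALICIOUS  = 2003.0;
static const double STDEV_MALICIOUS = 29.8;
static const double MEAN_BENIGN     = 1893.1;
static const double STDEV_BENIGN    = 30.8;

/* Assumed decision rule: midpoint between the two class means. */
double threshold(void) {
    return 0.5 * (MEAN_MALICIOUS + MEAN_BENIGN);
}

/* Guess the packet label from one energy reading:
   1 means "malicious", 0 means "benign". */
int guess_label(double energy) {
    return energy > threshold();
}

/* Distance of each class mean from the threshold, expressed in units of
   that class's standard deviation. */
double margin_malicious(void) {
    return (MEAN_MALICIOUS - threshold()) / STDEV_MALICIOUS;
}

double margin_benign(void) {
    return (threshold() - MEAN_BENIGN) / STDEV_BENIGN;
}
```

Under these assumptions both class means sit roughly 1.8 standard deviations away from the threshold, which is one way to read the observation that energy consumption is a reasonable indicator of the label.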
Other Side Channels. VANTAGE does not close digital side channels since they can be closed using orthogonal solutions such as resource partitioning [66, 57], anomaly detection [15, 71, 28], Oblivious RAM [58, 42], or compiler transformations [47, 54, 55]. VANTAGE uses new compiler transformations that are specifically aimed at closing the coarse-grain power channel. We do not claim that VANTAGE closes other analog side channels such as heat, electromagnetic radiation, and sound, but some of these side channels may be incidentally related to the power consumption.

Input Programs. The VANTAGE compiler accepts programs that can be compiled to the LLVM Intermediate Representation. The VANTAGE compiler cannot transform recursive programs, programs that contain system calls, or programs that contain irreducible control flow graphs (CFGs); such programs are rejected by the compiler. Nevertheless, standard compiler techniques [49] can convert irreducible CFGs into reducible CFGs and recursive programs into non-recursive programs. Similarly, Library OSes [4] eliminate or reduce the use of system calls. The VANTAGE compiler assumes that programs do not contain inline assembly instructions. VANTAGE also assumes that orthogonal debugging techniques have been used so that the program does not contain undefined behavior, data races, or errors that throw exceptions.

(a) Naive (non-secure) execution.

    void foo() {
      // conditional branch depends on secret,
      // hence power differences reveal secret value.
      if (secret != 0) {
        incr_count();
      } else {
        count = 0;
      }
    }

    void incr_count() {
      count = count + 1;
    }

(b) Transformed code, which converts conditional branches into unconditional branches, with predication using cmov() (see Appendix Figure 7).

    void foo() {
      // save branch predicate
      int pred = secret != 0;
      // execute both paths
      incr_count(pred);
      count = cmov(pred == 0, 0, count);
    }

    void incr_count(int pred) {
      count = cmov(pred, count + 1, count);
    }

Figure 2: VANTAGE’s transformation of conditional branches whose predicate depends on secrets.
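The branch rewrite of Figure 2 can be exercised as a small C program. The sketch below is an illustration under stated assumptions, not VANTAGE’s implementation: `cmov()` here is a hypothetical mask-based stand-in for the conditional-move helper the paper refers to (Appendix Figure 7), the `_naive` and `_transformed` suffixes and the `run()` helper are added only so both versions can coexist, and a C compiler is free to lower this source differently than hand-written assembly would.

```c
#include <stdint.h>

/* Hypothetical stand-in for cmov(): returns a when pred is non-zero and b
   otherwise, selecting with a bit mask rather than a branch. */
static int32_t cmov(int32_t pred, int32_t a, int32_t b) {
    uint32_t mask = (uint32_t)0 - (uint32_t)(pred != 0); /* all ones or all zeros */
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}

static int32_t count = 0;

/* Naive version: which path runs depends on the secret. */
static void incr_count_naive(void) {
    count = count + 1;
}

void foo_naive(int32_t secret) {
    if (secret != 0) {
        incr_count_naive();
    } else {
        count = 0;
    }
}

/* Transformed version: both paths always execute, and the predicate only
   decides which result is kept. */
static void incr_count_pred(int32_t pred) {
    count = cmov(pred, count + 1, count);
}

void foo_transformed(int32_t secret) {
    int32_t pred = secret != 0;        /* save branch predicate */
    incr_count_pred(pred);             /* "then" path, predicated on pred */
    count = cmov(pred == 0, 0, count); /* "else" path, predicated on !pred */
}

/* Test helper (an assumption of this sketch): run f from a known count. */
int32_t run(void (*f)(int32_t), int32_t start, int32_t secret) {
    count = start;
    f(secret);
    return count;
}
```

For example, `run(foo_naive, 5, 1)` and `run(foo_transformed, 5, 1)` both leave `count` at 6, and with a secret of 0 both leave it at 0, so the two versions agree on results while the transformed one performs the same sequence of operations for every secret.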
4. Design of the VANTAGE Compiler

We first describe the portions of the VANTAGE compiler that apply to all power models and architectures, before describing parts that apply specifically to the given power model and architecture.

Identifying Candidate Instructions. Our VANTAGE compiler requires the user to annotate only those variables that store secret information. Our compiler then uses an inter-procedural, flow-sensitive, and context-insensitive taint tracking analysis to identify the relevant portions of the program that should be transformed.

Microprocessor-Agnostic Transformations. Since branch instructions and memory accesses leak information for all analyzed processors, our compiler transforms them to avoid leaking secrets. Specifically, VANTAGE transforms conditional branches into unconditional branches, while also preserving correctness. Figure 2 shows an example of the original and the transformed code, when the branch predicate of an if-statement is secret. VANTAGE transforms loops differently, by padding the iteration count to the maximum possible value. Since maximum iteration counts cannot be determined automatically (see Halting Problem), VANTAGE relies on annotations provided by the user. For memory accesses using secret pointers, the VANTAGE compiler replaces the pointer dereference with accesses to all cache lines (i.e. strided, 64-byte accesses) of the array, while saving the value from only the desired location. Thus, VANTAGE accesses the same cache lines and TLB entries, regardless of the secret addresses.

Figure 3: Dependences among runtime components for compiling programs using VANTAGE.

Microprocessor-Specific Transformations. VANTAGE transforms various microprocessor-specific operations, which we explain in the design of VANTAGE-McPAT (Section 4.1.4).

Automatically Generating Transformed Instructions. Fortunately, many transformed instructions can be generated incrementally using the VANTAGE compiler itself, thus eliminating the need to manually write all transformed instructions in assembly code. As Figure 3 shows, we only need a handful of operations in assembly code, while the VANTAGE compiler generates the remaining operations from C code and the handwritten assembly code. Using the VANTAGE compiler, we also transform the Berkeley SoftFloat library¹ for elementary floating-point operations (32- and 64-bit floating-point addition, subtraction, multiplication, division, and square root) and the Musl C library² for higher-level floating-point operations (sine, cosine, logarithm, and other similar operations).

¹ http://www.jhauser.us/arithmetic/SoftFloat.html
² https://www.musl-libc.org

Finally, these transformed components are linked with the transformed version of the benchmarks to
produce the executable file.

4.1. VANTAGE for Open Source Power Model: McPAT

We now describe the details of VANTAGE that are pertinent to our open-source power model, McPAT.

4.1.1. Code Analysis of LLVM, Gem5, and McPAT

Through our analysis of McPAT, we find that the McPAT power modeling framework estimates the processor’s power consumption based on the processor’s physical characteristics (e.g. supply voltage), its features (e.g. cache sizes), and its microarchitectural events (e.g. cache hits and misses). Since the processor’s physical characteristics and its features are constant for the program lifetime, we focus our effort on the impact of microarchitectural events on power consumption. We then find the assembly instructions whose operands may cause variations in the microarchitectural events produced by the Gem5 simulator, by analyzing the microcode and exceptions for the x64, ARM32, and ARM64 microprocessors and also the common portions of the microarchitectures such as out-of-order execution, caches, and TLBs. Finally, we analyze the translation of LLVM IR instructions into assembly programs to understand the impact of IR instruction operands on power consumption. Although we perform this analysis manually, existing static analysis techniques such as Information Flow Analysis [24] can be used to derive the same results. Besides branch instructions and memory accesses, we find that the x64 and ARM32 targets leak information through specific instructions that we describe next.

4.1.2. Our Findings

Vulnerable Instructions for x64 Target. As mentioned in Section 2.2.2, we find that 22 instructions (see Appendix Table 6) execute a varying number of microcode operations depending on their operands, thus affecting the power consumption. Among these, 19 instructions are either used only in kernel mode (thus being outside of VANTAGE’s threat model) or are rarely used in code generated by modern compilers (e.g. string instructions). Hence, we focus our attention on the remaining three instructions (IDIV, BSF, and BSR).

Vulnerable Instructions for ARM32 Target. As we alluded to earlier, we also find that predicated instructions on ARM32 (generated by LLVM for compare and select IR instructions) can cause variations in power consumption depending on their operands, and that various software runtime library functions (e.g. integer division) cause variations in power consumption due to their use of conditional branches and predicated instructions. For processors with software floating-point ABI, we find that the elementary floating-point operations (i.e. add, sub, mul, div, and sqrt) and conversion operations between integer and floating-point numbers use conditional branches and predicated instructions, causing the inputs to create variations in power consumption.

4.1.3. Validation of Analysis Results

We validate the results of our manual analysis using randomized testing, where our goal is to check whether there exist other sources of variations in the power consumption, beyond the ones that we discovered through our manual analysis. We thus use several different inputs to execute randomly-generated instruction sequences that do not exhibit variation among the factors listed above (for instance, the generated instructions always access a fixed set of memory locations), and we observe the power profile of the instruction sequences across the different inputs. If our manual analysis is incorrect (i.e. if additional factors exist that affect power consumption), we expect to see differences in the power profile when the input values differ.

Validation for x64 Target. We implement this experiment for the x64 platform using the Intel XED (X86 Encoder Decoder) library³. From the roughly 400 instructions supported by the Gem5 simulator, our test generator emits 277 instructions (187 SSE instructions and 90 integer instructions). The instructions not emitted by our randomized tests include 71 x87 (coprocessor) instructions, 38 ring 0 (or system management) instructions, 17 branch instructions, and 7 string instructions; we do not include these instructions in our randomized test because they are either removed by our compiler during the code transformation or are not used or rarely used in modern user-level code. Our randomized test generates instructions using five addressing modes, whenever supported by the instructions: immediate mode, register mode, indirect mode, base-relative mode, and offset-scaled-base-relative mode. Each instruction sequence is 1,000 instructions long, and in each execution we iterate 10,000 times over the generated instruction sequence. Every 1µs, we measure the power consumption, and we find that regardless of the input to the instruction sequence, all executions produce an exactly identical power profile.

Since the Intel XED library does not support encoding instructions for ARM32 and ARM64 microprocessors, we are currently in the process of manually encoding instruction mnemonics into bits to port this experiment to the ARM platforms.

³ https://intelxed.github.io/

4.1.4. Microprocessor-Specific Transformations

The following microprocessor-specific transformations of VANTAGE-McPAT apply in addition to the microprocessor-agnostic transformations (control flow and memory access transformations) described earlier in Section 2.2.3. To design VANTAGE-McPAT’s microprocessor-specific transformations, we devise alternate instruction sequences in software, which are functionally equivalent to the original instructions but which do not leak information through the power consumption.

Alternate Instruction Sequences for x64. We implement software versions of BSF and BSR (see Appendix Figure 8) and a binary version of long division IDIV (see Appendix Figure 11), all of which execute using a fixed number of microcode operations regardless of the operands. VANTAGE-McPAT transforms division operations at run time so that a
zero divisor is replaced with a non-zero value. Our threat 4.2.1. Constructing A Flexible Regression Model We con-
model assumes that the input program does not divide by zero, struct a flexible regression model whose precision can be
hence VANTAGE-McPAT’s transformation of the zero divisor controlled using a tuning parameter. Unlike previous regres-
does not change the semantics of the user’s program. sion models for power consumption [9, 59, 29, 19, 30, 68],
the coefficients in our regression model (based on the Elas-
Alternate Instruction Sequences for ARM32. We use the tic Net regression technique [75]) depend on a parameter λ ,
GNU Superoptimizer4 for discovering alternate instruction which enables a tradeoff between the precision and the sim-
sequences that are functionally equivalent to the integer com- plicity of the regression model. More precisely, the value
parison operations, and we port the results of the GNU Su- of λ affects the number of zero coefficients (i.e. coefficients
peroptimizer from PowerPC to ARM32 assembly code (see whose value is zero), which indicates that the corresponding
Appendix Figures 9 and 10 for examples). For floating-point microarchitectural events have no impact on the estimated
comparisons, we use transformed code from the Berkeley Soft- power consumption. So a model with more zero coefficients
Float library. Finally, VANTAGE-McPAT replaces the select reduces the number of microarchitectural events that need to
IR instruction with a transformed comparison operation fol- be controlled using our compiler, effectively reducing the per-
lowed by the conditional move operation. For the ARM32 formance overhead of our defense. At the same time, however,
microprocessor with a software floating-point ABI, we use a model with more zero coefficients can also be imprecise,
transformed versions of 32- and 64-bit floating-point opera- since fewer microarchitectural events are used to estimate
tions from the Berkeley SoftFloat library. power. This tradeoff between precision and the number of
4.1.5. Optimizing Memory Accesses We find that our trans- zero coefficients is crucial since it allows us to construct a
formation of secret pointer dereferences introduces large per- defense that is tailored to the desired threat model and to the
formance overheads, since our compiler replaces such accesses performance envelope.
with instructions that touch every cache line occupied by the We compute the linear regression between microarchitec-
array. Instead of iterating over all possible cache lines, we re- tural events and power consumption using the Elastic Net
duce the performance impact of our compiler’s memory access regression technique using the glmnet R package, with α set
transformation by accessing only the desired memory location to 0.5, so that the regression model combines the benefits of
within the array while forcibly bypassing the cache during such memory accesses, thus preventing the cache events from being influenced by the secret address. We implement such accesses by modifying the Gem5 simulator, and our implementation adds the correct latency from such memory accesses to the program's execution time. Such accesses can also be implemented using uncacheable memory.

4.2. VANTAGE for Closed Source Power Model: RAPL

We now turn our attention to using a closed source power model (Intel RAPL) for discovering the instructions that may leak information through the power consumption. We create a regression model between microarchitectural events and power consumption5, and we use the model to devise code transformations that prevent the program from leaking information through the power consumption. Our regression model relates power consumption to microarchitectural events instead of instructions because multiple instructions may map to the same subset of microarchitectural events, making our analysis tractable for large instruction sets like x64. Moreover, individual instructions often result in multiple microarchitectural events, making a model based on microarchitectural events more accurate than one based on instructions.

4 https://github.com/embecosm/gnu-superopt
5 Indeed, correlation identified by the regression model does not imply causation. However, constructing our compiler-based defense using correlation (instead of causation) does not alter our solution's security guarantees, although it worsens the performance impact since the compiler is forced to control a larger set of microarchitectural events.

both the LASSO [61] and the Ridge regression [67] methods. For every 100 ms of program execution, we measure performance events and energy consumption using the perf_events tool. For our test platform (an Intel x86 Haswell processor) we choose 21 raw performance events (see Table 3) that form a superset of the microarchitectural events that can be controlled using the x86 ISA. We use programs from the SPEC CPU 2006 benchmark suite with their training inputs, and we run them to completion for measuring performance and power, resulting in approximately 420,000 power and performance measurements. Since our x86 processor supports the measurement of only four simultaneous performance events, we run each SPEC program 21 times, measuring one performance event in every instance of the execution.

The Elastic Net regression technique offers the choice of the λ parameter, whose value affects the number of non-zero coefficients of the regression model. We select the value of λ after measuring its impact on the prediction error, which is strictly convex, thus permitting an iterative approach to discover the value of λ that results in the smallest prediction error. Figure 4 shows the prediction error after using cross-validation on the SPEC CPU 2006 measurements. The digits on the curve indicate the number of non-zero coefficients. We observe that when λ = λmin = 0.00013, the Root Mean Square (RMS) prediction error is close to 25 mJ, and that the cross-validation error continues to be under the 25 mJ threshold until λ = λthr = 0.00413.

4.2.2. Validating the Regression Model We validate the accuracy of the regression coefficients for both λ = λmin and
λ = λthr regression models by predicting the energy consumption of the processor while it executes the PARSEC benchmark applications [7]. Table 4 shows the prediction accuracy, computed using the RMS error for every 100 ms of program execution. On average (geometric mean in Table 4), the prediction accuracy of our regression models is higher than 96%, so we believe that our models are sufficiently accurate.

Type                     Performance Event        Coeff. for λmin   Coeff. for λthr
Instruction              Branch Inst.             1.3×10^-09        1.3×10^-09
                         Mispred. Br.             8.9×10^-09        3.0×10^-09
                         Executed u-ops           4.8×10^-11        5.3×10^-11
                         Issued u-ops             7.5×10^-11        1.0×10^-10
                         CPU Clock                8.3×10^-11        2.5×10^-11
                         Retired Inst.            1.2×10^-10        7.0×10^-11
Mem. Access              DTLB LD Miss             4.3×10^-08        0
                         DTLB ST Miss             0                 0
                         ITLB Miss                1.5×10^-06        8.6×10^-07
                         ICache Read              2.4×10^-10        2.0×10^-10
                         ICache Miss              9.2×10^-09        1.1×10^-08
                         L1 Cache Hit             8.4×10^-11        2.8×10^-11
                         L2 Cache Hit             1.2×10^-08        8.0×10^-09
                         L3 Cache Hit             0                 0
                         L1 Cache Miss            0                 0
                         L2 Cache Miss            0                 0
                         L3 Cache Miss            0                 0
Math Op.                 Arith. u-ops             3.0×10^-10        0
                         AVX Inst                 0                 0
                         AVX to SSE Transitions   0                 0
                         SSE to AVX Transitions   0                 0
Intercept (Avg Energy)   -                        1.1               1.2

Table 3: The 21 chosen performance events for computing the regression between microarchitectural events and energy consumption and their corresponding coefficients (after rounding to one decimal).

[Figure 4 (plot not reproduced): Root Mean Square Error (RMSE) in Prediction (Joules) on the vertical axis against log(λ) on the horizontal axis, with log(λmin) and log(λthr) marked.]
Figure 4: The prediction error increases as λ increases beyond λmin, which produces the smallest prediction error. We are interested in the value of λ for which the prediction error is close to the smallest prediction error. Digits on the curve indicate the number of non-zero coefficients for the corresponding values of λ.

4.2.3. Transforming Programs using VANTAGE-RAPL By leveraging the flexibility of our regression model, we create a power channel defense for λ = λthr, and we call the corresponding solution VANTAGE-RAPL. The VANTAGE-RAPL compiler extends the microprocessor-agnostic transformations of VANTAGE by ignoring the transformation of TLB events, since the data TLB coefficients are equal to zero. Specifically, VANTAGE-RAPL forces cache misses on every access to a secret address using the clflush instruction. VANTAGE-RAPL's use of clflush is limited to only the secret addresses in the program, thus mitigating the chances of malicious use of the instruction.

The performance penalty from forcing cache misses for every secret array access can be substantial. Hence, we optimize accesses to small arrays by reading all cache lines of the array, which achieves the same effect of making data cache hits and misses independent of the secret pointer. Appendix Figure 12 compares the performance overhead of reading all bytes, reading all cache lines, and forcing cache misses for different array sizes. For secret arrays smaller than 16 KB, VANTAGE-RAPL reads all cache lines of the array, further improving performance.

5. Evaluation

We now evaluate the performance of our code transformations using microbenchmarks (which test integer division, bit scan operations, integer and floating-point comparisons, and floating-point arithmetic operations), and we also evaluate the security and performance of our full benchmarks.

5.1. Performance of Microbenchmarks

Appendix Figures 13, 14, and 15 evaluate the performance impact of VANTAGE's microprocessor-specific code transformations. We observe that the slowdowns are substantial, especially for the elementary floating-point operations on the ARM32 target. However, software elementary floating-point operations are only used on ARM32 targets without a hardware floating-point unit. We also observe that floating-point instructions do not dominate the dynamic instruction count in our
benchmark applications, so their impact on the performance of the full applications is relatively low.

Benchmark        Accuracy of Model   Accuracy of Model
                 when λ = λmin       when λ = λthr
blackscholes     98.96 %             97.19 %
bodytrack        94.92 %             95.28 %
canneal          96.72 %             96.15 %
facesim          97.72 %             98.21 %
ferret           98.06 %             95.52 %
fluidanimate     97.03 %             97.30 %
freqmine         94.61 %             95.85 %
raytrace         96.23 %             97.14 %
streamcluster    96.46 %             95.74 %
vips             98.84 %             96.72 %
Geo. Mean        96.75 %             96.51 %

Table 4: Accuracy of the new regression models based on 100 ms measurements of a subset of the PARSEC benchmarks using Intel RAPL. The remaining benchmarks failed to either compile or run on our platform.

5.2. Benchmark Applications

We now evaluate the security and performance of VANTAGE on x64, ARM32, and ARM64 targets with McPAT-based and RAPL-based measurements using 12 benchmarks. Since there are no standardized benchmarks for evaluating side channel defenses, we use commonly used programs whose inputs represent private or confidential information. These benchmarks represent applications from four diverse categories: (1) general user applications (comprising a Font Renderer6, a Hash Table implementation7, and a Bloom Filter implementation8), (2) machine-learning kernels (comprising the Disparity Map computer vision benchmark [65], the LibSVM Support Vector Machine Classifier9, and an implementation of the K-Means clustering algorithm10), (3) graph kernels (comprising an implementation of Top-k Search, the Bellman-Ford shortest path algorithm, and the Pagerank algorithm), and (4) cryptographic kernels (the Microsoft Lattice Cryptography Library11, a Curve25519 elliptic curve implementation12, and a Poly1305 message authentication code implementation13). Nine of the total 12 benchmarks are written by third party developers. For each application, we mark its inputs as secret, and we use at least three distinct inputs for each application. We use smaller data sizes for McPAT-based measurements since McPAT-based simulations run many times slower than RAPL-based executions. Our compiler detects vulnerabilities in, and accordingly transforms, all applications except the cryptographic kernels, since, as per our power models, the cryptographic kernels do not include instructions that leak information through the coarse-grain power channel.

We emphasize that the evaluation results are closely tied to each power model, so results from one power model are not directly comparable with results from the other power model.

6 https://github.com/nothings/stb/blob/master/stb_easy_font.h
7 https://github.com/watmough/jwHash
8 https://github.com/bitly/dablooms
9 https://github.com/cjlin1/libsvm
10 https://wikicoding.org/wiki/c/k-means_clustering_algorithm/
11 https://www.microsoft.com/en-us/research/project/lattice-cryptography-library
12 https://github.com/agl/curve25519-donna
13 https://github.com/floodyberry/poly1305-donna

5.2.1. Experimental Setup
VANTAGE uses the LLVM compiler [37] version 4.0 for transforming programs. For Intel RAPL measurements, we gather performance and energy measurements every 100 ms on an 8-core Intel Haswell processor clocked at 3.4 GHz. The processor contains 32 KB private L1 instruction and data caches, 256 KB private L2 unified caches, and a shared 8 MB unified L3 cache. The processor runs Ubuntu 16.04 with Linux kernel version 4.4.0. For measurements based on McPAT, we gather performance and power measurements every 1 µs using the Gem5 microarchitectural simulator that models 1 GHz out-of-order x64, ARM32, and ARM64 processors with 32 KB L1 instruction and data caches, a 256 KB L2 cache, and an 8 MB L3 cache. As a reference for performance comparison of VANTAGE-McPAT, we use a hypothetical single-issue in-order fixed-function processor that does not use contextual information from a compiler like VANTAGE, forcing the processor to consume worst-case power and execution time for every operation. Our fixed-function processor consumes 1 cycle for arithmetic operations, 2 cycles for branches, 3 cycles for CALL instructions, and 400 cycles for memory references14.

14 https://www.agner.org/optimize/instruction_tables.pdf

5.2.2. Security Evaluation

Methodology for McPAT-Based Measurements. Our McPAT-based measurements are obtained using the Gem5 simulator, enabling precisely reproducible results on every execution. To determine whether the power trace of the transformed programs is independent of the secrets, we compute the SHA1 checksum for the power trace obtained using Gem5 and McPAT, and we check whether the checksums are exactly identical even when the secret inputs differ. If identical, we conclude that the power trace is independent of the secrets.

Results for McPAT-Based Measurements. For all but the cryptographic benchmarks, we observe that the non-secure execution produces a different SHA1 checksum (elided for space) for each secret input, whereas programs transformed using VANTAGE-McPAT produce exactly identical SHA1 checksums of the power traces regardless of the programs' secret inputs. Indeed, as per our power model analysis, the cryptographic kernels do not use vulnerable instructions that leak secrets through the coarse-grain power channel.
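The checksum comparison used for the McPAT-based security evaluation can be illustrated with a minimal Python sketch. The paper does not say how power traces are serialized before hashing, so the text encoding below, the helper names (trace_checksum, is_secret_independent), and the sample values are all assumptions made for illustration; only the idea of comparing SHA1 checksums across secret inputs comes from the text.

```python
import hashlib


def trace_checksum(power_trace):
    """Return the SHA-1 hex digest of a power trace (a sequence of samples)."""
    digest = hashlib.sha1()
    for sample in power_trace:
        # repr() of a float is deterministic, so equal traces hash equally.
        digest.update(repr(float(sample)).encode("ascii"))
        digest.update(b"\n")
    return digest.hexdigest()


def is_secret_independent(traces_by_secret):
    """True when every secret input produced an identical trace checksum."""
    checksums = {trace_checksum(trace) for trace in traces_by_secret.values()}
    return len(checksums) == 1


# Hypothetical per-interval power samples, invented for illustration only.
original_traces = {
    "secret_a": [1.10, 1.32, 1.10, 1.25],
    "secret_b": [1.10, 1.47, 1.10, 1.25],
    "secret_c": [1.10, 1.32, 1.18, 1.25],
}
transformed_traces = {
    "secret_a": [1.47, 1.47, 1.47, 1.47],
    "secret_b": [1.47, 1.47, 1.47, 1.47],
    "secret_c": [1.47, 1.47, 1.47, 1.47],
}

print(is_secret_independent(original_traces))     # False
print(is_secret_independent(transformed_traces))  # True
```

An exact-match test of this kind is only meaningful because simulator output is reproducible; measurements on real hardware are noisy, which is why the RAPL evaluation relies on a statistical classifier instead.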
[Figure 5 (bar charts not reproduced): Slowdown (X) of Fixed-Function Hardware and Vantage-McPAT for Font Renderer, Hash Table, Bloom Filter, Disparity Map, LibSVM, K-means, Top-K, Bellman Ford, Pagerank, Lattice Crypto, Curve25519, Poly1305, and their geometric mean. Panels: x64 Target; arm64 Target; arm32 with Hardware Floating-Point Support; arm32 with Software Floating-Point Support.]
Figure 5: Performance overhead of VANTAGE-McPAT on x64, ARM32, and ARM64 targets.
[Figure 6 (bar chart not reproduced): Slowdown (X) for Font Renderer, Hash Table, Bloom Filter, Disparity Map, LibSVM, K-means, Top-K, Bellman Ford, Pagerank, Lattice Crypto, Curve25519, Poly1305, and their geometric mean.]
Figure 6: Performance overhead of VANTAGE-RAPL.

Methodology for RAPL Measurements. We collect 50 power profiles for every combination of the application and the secret input, and we feed these profiles to an Extreme Gradient Boosting classifier, which we implement using the xgboost [17] R package. We first randomly shuffle the power profiles, before using one-third of the profiles for training the classifier. We measure the accuracy of the classification on the remaining two-thirds of the profiles using the Area Under the Curve (AUC) metric of the Receiver Operating Characteristic (ROC) Curve. The random shuffling step perturbs the classification accuracy on each execution, so if AUC is close to 0.5, then we conclude that the adversary is unsuccessful at launching a power channel attack. Consequently, if the AUC for programs transformed using VANTAGE drops close to 0.5, then we conclude that VANTAGE successfully defeats the power channel attack.
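The AUC criterion in this methodology can be sketched in a few lines of Python. This is not the evaluation code: the classifier in the paper is built with the xgboost R package, whereas the sketch below covers only the scoring step, assumes a two-class setting (two secret inputs), and computes the AUC through the rank-sum identity. The function names, the example scores, and the 0.1 tolerance used to decide that an AUC is "close to 0.5" are assumptions, since the text does not give a numeric threshold.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 or 0).

    Uses the rank-sum identity: the AUC equals the probability that a
    randomly chosen positive sample scores higher than a randomly chosen
    negative sample, counting ties as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def attack_defeated(auc, tolerance=0.1):
    """Treat an AUC within `tolerance` of 0.5 as a failed attack.

    The tolerance is an assumed value chosen for this sketch.
    """
    return abs(auc - 0.5) <= tolerance


# Hypothetical classifier scores for held-out power profiles.
labels = [1, 1, 1, 0, 0, 0]
leaky_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]      # secrets are separable
uninformative_scores = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]

print(roc_auc(labels, leaky_scores))                           # 1.0
print(attack_defeated(roc_auc(labels, leaky_scores)))          # False
print(roc_auc(labels, uninformative_scores))                   # 0.5
print(attack_defeated(roc_auc(labels, uninformative_scores)))  # True
```

The pairwise loop is quadratic in the number of profiles, which is acceptable for small held-out sets of the size described here; a sort-based rank computation would be the usual choice for larger ones.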
Results for RAPL Measurements. Table 5 shows the Area Under the Curve (AUC) metric for ROC curves for the original and transformed programs using Intel RAPL measurements. We find that 6 of the 12 benchmarks (shown in the top half of the table) are vulnerable to power channel attacks. In particular, we observe that benchmarks whose dynamic instruction count depends on the secrets are more susceptible to power channel attacks using RAPL, while benchmarks whose memory address trace depends on the secrets are harder to attack. We observe that programs transformed using VANTAGE-RAPL thwart the power channel attack.

5.2.3. Performance Evaluation

Results for McPAT Measurements. Figure 5 compares the performance overhead of programs running on the fixed-function processor versus that of programs transformed by VANTAGE-McPAT. Since the fixed-function processor lacks contextual information about the program, it needs to treat every operation as secret, thus forcing the worst-case execu-
tion time for every operation. In contrast, the VANTAGE-McPAT compiler transforms only those sections of the code that may leak secrets, so the transformed programs execute two to three orders of magnitude faster than on the fixed-function hardware. Among programs transformed by VANTAGE-McPAT, we observe that benchmarks that access memory using secret pointers (Font Renderer and Top-k) incur high overhead. We also find that on the ARM32 target with software floating-point ABI, benchmarks that use floating-point arithmetic (LibSVM, K-Means, and Pagerank) experience substantial overheads due to their use of software floating-point arithmetic. The Top-k application uses several comparisons (again, implemented in VANTAGE-McPAT in software), which results in higher overhead on the ARM32 targets compared to the ARM64 target. Across all analyzed targets, we find that the mean overhead from using VANTAGE-McPAT is at most 3×, while that from using fixed-function hardware ranges between 99× and 141×.

Results for RAPL Measurements. Figure 6 shows the performance overhead of programs transformed by VANTAGE-RAPL, where we observe that benchmarks which access memory using secret pointers (Font Renderer and Top-k) or which perform floating-point arithmetic on secret values (LibSVM, K-Means, and Pagerank) experience the largest slowdowns. These results validate our understanding that VANTAGE's transformation of control flow is substantially cheaper than its transformation of memory references and floating-point computation, since it is more expensive to force cache misses or to perform dummy subnormal floating-point computation compared to executing dummy instructions. On average, we see a 6× slowdown from the use of VANTAGE-RAPL.

Benchmark Application      Non-Secure Execution AUC   VANTAGE Execution AUC
Hash Table                 0.94                       0.51
Disparity Map              0.86                       0.46
LibSVM Classifier          0.92                       0.48
Top-k Search               0.95                       0.55
Page Rank                  0.90                       0.56
Bellman Ford               1.00                       0.46
---------------------------------------------------------------------------
Font Renderer              0.57                       0.50
Bloom Filter               0.53                       0.50
K-Means Clustering         0.56                       0.54
Lattice Crypto Key Exch.   0.44                       0.53
Curve25519 ECC             0.55                       0.42
Poly1305 MAC               0.44                       0.44

Table 5: Area Under the Curve (AUC) metric for ROC curves corresponding to non-secure and secure (VANTAGE) execution. We observe that six benchmark applications are vulnerable to power channel attacks, and VANTAGE thwarts the attack in the transformed (VANTAGE) execution. Appendix Figure 16 shows the full ROC plots.

6. Discussion

Performance Overhead of VANTAGE. The performance overhead of VANTAGE stems from its strong security property of making the running time of the application independent of the secrets, so as to not leak information through the total energy consumption. Like VANTAGE, any solution that enforces a fixed energy consumption will need to enforce a worst-case execution time. However, we believe that VANTAGE's performance overhead can be reduced using aggressive compiler optimizations combined with modest microarchitectural changes.

Accuracy of Power Models. VANTAGE's defense relies crucially on the accuracy of the power model. However, many power models exist whose predictions are close to the actual power consumption [12, 39, 56, 60, 34, 73]. For those power models that do not compute cycle-level and sub-cycle power variations, existing hardware techniques such as Dual-Rail Precharge Logic [22, 62, 63, 53, 52] or capacitors [64, 70, 1] can be used to complement VANTAGE's defense for closing the power channel. Without VANTAGE's defense, existing techniques for power channel defenses fail to protect a large majority of applications that operate on secret or private information, and which run on modern processors.

Other Physical Side Channels. Beyond power, there exist systems that model other aspects of the program execution such as heat [26, 27] and electromagnetic radiation [74, 38], so our approach could be useful for mitigating other analog side channel attacks besides power channel attacks.

7. Related Work

We now compare our work to prior attempts at closing or mitigating the power channel.

Transistor-Level Solutions. Fine-grain power variations in any circuit can be eliminated at the level of transistors by either normalizing [22, 62, 63, 53] or randomizing [52] power consumption. Unfortunately, these solutions do not protect from coarse-grain power variations occurring due to microarchitectural optimizations including caching, prefetching, variable-latency instructions, and predication. These solutions also cannot be selectively turned off, so non-secret programs execute with the same overhead as secret programs. By contrast, VANTAGE is a compiler-based solution for eliminating coarse-grain power variations that selectively transforms portions of programs for execution on modern processors based on the programs' security requirements. VANTAGE complements existing transistor-level solutions so that a broad class of programs can be protected from power-channel attacks.

Code Modifications. Several solutions [25, 20, 33, 40, 11, 32] eliminate fine-grain power variations by manually changing the source code of vulnerable programs, but they only
support programs that execute a fixed sequence of instructions and which access a fixed set of memory locations in a pre-defined sequence. As we explain in our Threat Model (Section 3), VANTAGE imposes far fewer fundamental restrictions on the program.

Virtual Secure Circuits (VSCs) [18] execute the original program concurrently with a so-called shadow program comprising instructions that are complementary to the original program in such a way that the original and complementary instructions exercise different paths of logic circuits. Unfortunately, VSCs assume tight synchronization between programs running on separate cores, they require caches and branch predictors to be disabled, and VSCs are difficult to use with programs containing branches, function calls, and floating-point arithmetic. Due to these limitations, unlike VANTAGE, VSCs cannot be used on modern processors or with a broad class of programs.

Microarchitectural Solutions. Yang [69], Ambrose [2], and May [43, 44] suggest closing the power channel by adding noise using dynamic voltage and frequency scaling, out-of-order execution, register renaming, or randomly inserted instructions. Unfortunately, these approaches merely fix the symptoms of the problem, and they provide weak guarantees for closing the power channel, since the running time of programs can vary more than the noise introduced by perturbations. In comparison, our approach fixes the root of the problem (i.e., variations in the source code) and our evaluation is stronger than the evaluation used in the above approaches.

Masking or Blinding Secrets. Secrets can be hidden or "blinded", either manually [51, 50] or semi-automatically using a compiler [48, 6], by XOR-ing them with a randomly-generated bit mask. The blinding process effectively randomizes the power consumption. However, blinding schemes are inherently limited to a small class of application programs, since instructions such as floating-point operations and conditional branch operations cannot be transformed using blinding. Our solution protects applications that use conditional branches and floating-point operations using code transformations. In our approach, we rely on the processor's ability to encrypt off-chip communication, thus obviating the need for blinding off-chip communication.

Analysis of Power Models. Several models [12, 39, 56, 60, 34, 73] of the processor's power consumption exist, but these models focus on characterizing the execution of programs for efficiency, whereas in this work, we devise a technique to eliminate variations in power consumption. More specifically, we augment the compiler with information about power consumption, so that the compiler can generate code that consumes constant power regardless of application secrets. Unlike the regression model by McCann et al. [45], which models power consumption of instructions, VANTAGE's regression model is based on microarchitectural events, which are more generic than instructions.

8. Conclusion and Future Work

Until now, power channel defenses have only protected a limited class of applications, mostly cryptographic applications, on processors that disabled common microarchitectural optimizations. This paper shows how compiler-based techniques, which have been effective in closing digital side-channel attacks, can, in combination with existing defenses, be used to mitigate power side-channel attacks for a broad class of applications running on modern processors. The key observation is that to bridge the gap between the program execution and the power consumption, we need a mapping from software events to power consumption, which can be provided by existing power models.

By mapping software events to power consumption, VANTAGE can eliminate variations in all software events (and only those software events) that affect power consumption. At the same time, by reasoning about power consumption at the program level, VANTAGE can selectively apply mitigation techniques only where needed.

Looking to the future, we can combine our compiler-based code transformations with existing cycle-level software power side-channel defenses such as masking to provide a comprehensive power channel defense. We also plan to reduce the performance impact of our solutions through microarchitectural enhancements.

References

[1] Alric Althoff, Joseph McMahan, Luis Vega, Scott Davidson, Timothy Sherwood, Michael Taylor, and Ryan Kastner. Hiding Intermittent Information Leakage with Architectural Support for Blinking. In International Symposium on Computer Architecture (ISCA), 2018.
[2] J. A. Ambrose, R. G. Ragel, and Parameswaran. RIJID: Random Code Injection to Mask Power Analysis Based Side Channel Attacks. In ACM/IEEE Design Automation Conference, pages 489–492, 2007.
[3] Marc Andrysco, David Kohlbrenner, Keaton Mowery, Ranjit Jhala, Sorin Lerner, and Hovav Shacham. On Subnormal Floating Point and Abnormal Timing. In Symposium on Security and Privacy (Oakland), pages 623–639, 2015.
[4] Andrew Baumann, Marcus Peinado, and Galen Hunt. Shielding Applications from an Untrusted Cloud with Haven. In Operating Systems Design and Implementation (OSDI), pages 267–283, 2014.
[5] Stephen D. Bay, Dennis Kibler, Michael J. Pazzani, and Padhraic Smyth. The UCI KDD Archive of Large Data Sets for Data Mining Research and Experimentation. SIGKDD Explorations Newsletter, 2(2):81–85, December 2000.
[6] A. G. Bayrak, F. Regazzoni, D. Novo, P. Brisk, F. X. Standaert, and P. Ienne. Automatic Application of Power Analysis Countermeasures. IEEE Transactions on Computers, 64(2):329–341, 2015.
[7] Christian Bienia. Benchmarking Modern Multiprocessors. PhD thesis, Princeton University, January 2011.
[8] Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven K. Reinhardt, Ali Saidi, Arkaprava Basu, Joel Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sardashti, Rathijit Sen, Korey Sewell, Muhammad Shoaib, Nilay Vaish, Mark D. Hill, and David A. Wood. The Gem5 Simulator. SIGARCH Computer Architecture News, 39(2):1–7, August 2011.
[9] William Lloyd Bircher and Lizy K. John. Complete System Power Estimation Using Processor Performance Events. IEEE Transactions on Computers, 61(4):563–577, April 2012.
[10] Barry Bond, Chris Hawblitzel, Manos Kapritsos, K. Rustan M. Leino, Jacob R. Lorch, Bryan Parno, Ashay Rane, Srinath Setty, and Laure Thompson. Vale: Verifying High-Performance Cryptographic Assembly Code. In USENIX Security Symposium (SEC), pages 917–934, 2017.
[11] Eric Brier and Marc Joye. Weierstrass Elliptic Curves and Side-Channel Attacks. In Practice and Theory in Public Key Cryptosystems: Public Key Cryptography (PKC), pages 335–345, 2002.
[12] David Brooks, Vivek Tiwari, and Margaret Martonosi. Wattch: A Framework for Architectural-Level Power Analysis and Optimizations. In International Symposium on Computer Architecture (ISCA), pages 83–94, 2000.
[13] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A Library for Support Vector Machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011.
[14] J. Chen, O. Olivo, I. Dillig, and C. Lin. Static Detection of Asymptotic Resource Side-Channel Vulnerabilities in Web Applications. In International Conference on Automated Software Engineering (ASE), pages 229–239, 2017.
[15] Jie Chen and Guru Venkataramani. CC-Hunter: Uncovering Covert Timing Channels on Shared Processor Hardware. In International Symposium on Microarchitecture (MICRO), pages 216–228, 2014.
[16] S. Chen, R. Wang, X. Wang, and K. Zhang. Side-Channel Leaks in Web Applications: A Reality Today, a Challenge Tomorrow. In IEEE Symposium on Security and Privacy (Oakland), pages 191–206, 2010.
[17] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In International Conference on Knowledge Discovery and Data Mining (KDD), pages 785–794, 2016.
[18] Zhimin Chen and Patrick Schaumont. Virtual Secure Circuit: Porting Dual-Rail Pre-charge Technique into Software on Multicore. IACR Cryptology ePrint Archive, page 272, 2010.
[19] Gilberto Contreras and Margaret Martonosi. Power Prediction for Intel XScale Processors Using Performance Monitoring Unit Events. In International Symposium on Low Power Electronics and Design (ISLPED), pages 221–226, 2005.
[20] Jean-Sébastien Coron. Resistance Against Differential Power Analysis for Elliptic Curve Cryptosystems. In Cryptographic Hardware and Embedded Systems (CHES), pages 292–302, 1999.
[21] Stephen Crane, Andrei Homescu, Stefan Brunthaler, Per Larsen, and Michael Franz. Thwarting Cache Side-Channel Attacks Through Dynamic Software Diversity. In Network and Distributed System Security Symposium (NDSS), 2015.
[22] J. L. Danger, S. Guilley, S. Bhasin, and M. Nassar. Overview of Dual Rail with Precharge Logic Styles to Thwart Implementation-Level Attacks on Hardware Cryptoprocessors. In Signals, Circuits and Systems (SCS), pages 1–8, 2009.
[23] Howard David, Eugene Gorbatov, Ulf R. Hanebutte, Rahul Khanna, and Christian Le. RAPL: Memory Power Estimation and Capping. In International Symposium on Low Power Electronics and Design (ISLPED), pages 189–194, 2010.
[24] Dorothy E Denning and Peter J Denning. Certification of Programs for Secure Information Flow. Communications of the ACM, 20(7):504–513, 1977.
[25] Louis Goubin and Jacques Patarin. DES and Differential Power Analysis (The "Duplication" Method). In Cryptographic Hardware and Embedded Systems (CHES), pages 158–172, 1999.
[26] Taliver Heath, Ana Paula Centeno, Pradeep George, Luiz Ramos, Yogesh Jaluria, and Ricardo Bianchini. Mercury and Freon: Temperature Emulation and Management for Server Systems. In Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 106–116, 2006.
[27] Wei Huang, S. Ghosh, S. Velusamy, K. Sankaranarayanan, K. Skadron, and M. R. Stan. HotSpot: A Compact Thermal Modeling Methodology for Early-Stage VLSI Design. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 14(5):501–513, 2006.
[28] C. Hunger, M. Kazdagli, A. Rawat, A. Dimakis, S. Vishwanath, and M. Tiwari. Understanding Contention-Based Channels and Using Them for Defense. In International Symposium on High Performance Computer Architecture (HPCA), pages 639–650, 2015.
[29] C. Isci and M. Martonosi. Phase Characterization for Power: Evaluating Control-Flow-Based and Event-Counter-Based Techniques. In High-Performance Computer Architecture (HPCA), pages 121–132, Feb 2006.
[32] Marc Joye and Jean-Jacques Quisquater. Hessian Elliptic Curves and Side-Channel Attacks. In Cryptographic Hardware and Embedded Systems (CHES), pages 402–410, 2001.
[33] Marc Joye and Christophe Tymen. Protections Against Differential Analysis for Elliptic Curve Cryptography. In Cryptographic Hardware and Embedded Systems (CHES), pages 377–390, 2001.
[34] Andrew B. Kahng, Bin Li, Li-Shiuan Peh, and Kambiz Samadi. ORION 2.0: A Fast and Accurate NoC Power and Area Model for Early-Stage Design Space Exploration. In Design, Automation and Test in Europe (DATE), pages 423–428, 2009.
[35] J. Kong, O. Aciicmez, J. Seifert, and H. Zhou. Hardware-Software Integrated Approaches to Defend Against Software Cache-Based Side Channel Attacks. In International Symposium on High Performance Computer Architecture (HPCA), pages 393–404, 2009.
[36] Markus G. Kuhn. Cipher Instruction Search Attack on the Bus-Encryption Security Microcontroller DS5002FP. IEEE Transactions on Computers, 47(10):1153–1157, October 1998.
[37] Chris Lattner and Vikram Adve. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Code Generation and Optimization (CGO), pages 75–86, 2004.
[38] Bing Li, Mingzhu Lei, Meiyuan Chen, and Lanyong Zhang. Electro-Magnetic Analysis of High-Frequency Digital Signal Processors. SpringerPlus, 5(1):1313, 2016.
[39] Sheng Li, Jung Ho Ahn, Richard D. Strong, Jay B. Brockman, Dean M. Tullsen, and Norman P. Jouppi. McPAT: An Integrated Power, Area, and Timing Modeling Framework for Multicore and Manycore Architectures. In International Symposium on Microarchitecture (MICRO), pages 469–480, 2009.
[40] Pierre-Yvan Liardet and Nigel P. Smart. Preventing SPA/DPA in ECC Systems Using the Jacobi Form. In Cryptographic Hardware and Embedded Systems (CHES), pages 391–401, 2001.
[41] Chang Liu, Austin Harris, Martin Maas, Michael Hicks, Mohit Tiwari, and Elaine Shi. GhostRider: A Hardware-Software System for Memory Trace Oblivious Computation. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 87–101, 2015.
[42] Martin Maas, Eric Love, Emil Stefanov, Mohit Tiwari, Elaine Shi, Krste Asanovic, John Kubiatowicz, and Dawn Song. PHANTOM: Practical Oblivious Computation in a Secure Processor. In Computer and Communications Security (CCS), pages 311–324, 2013.
[43] David May, Henk L. Muller, and Nigel P. Smart. Non-Deterministic Processors. In Australasian Conference on Information Security and Privacy (ACISP), pages 115–129, 2001.
[44] David May, Henk L. Muller, and Nigel P. Smart. Random Register Renaming to Foil DPA. In Cryptographic Hardware and Embedded Systems (CHES), pages 28–38, 2001.
[45] David McCann, Elisabeth Oswald, and Carolyn Whitnall. Towards Practical Tools for Side Channel Aware Software Engineering: 'Grey Box' Modelling for Instruction Leakages. In USENIX Security Symposium, pages 199–216, 2017.
[46] Frank McKeen, Ilya Alexandrovich, Alex Berenzon, Carlos V. Rozas, Hisham Shafi, Vedvyas Shanbhogue, and Uday R. Savagaonkar. Innovative Instructions and Software Model for Isolated Execution. In International Workshop on Hardware and Architectural Support for Security and Privacy (HASP), pages 10:1–10:1, 2013.
[47] David Molnar, Matt Piotrowski, David Schultz, and David Wagner. The Program Counter Security Model: Automatic Detection and Removal of Control-Flow Side Channel Attacks. In International Conference on Information Security and Cryptology, pages 156–168, 2005.
[48] Andrew Moss, Elisabeth Oswald, Dan Page, and Michael Tunstall. Compiler Assisted Masking. In Cryptographic Hardware and Embedded Systems (CHES), pages 58–75, 2012.
[49] Steven Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers Inc., 1997.
[50] Svetla Nikova, Christian Rechberger, and Vincent Rijmen. Threshold Implementations Against Side-Channel Attacks and Glitches. In International Conference on Information and Communications Security (ICICS), pages 529–545, 2006.
[30] Canturk Isci and Margaret Martonosi. Runtime Power Monitoring
in High-End Processors: Methodology and Empirical Data. In Inter- [51] Elisabeth Oswald, Stefan Mangard, Norbert Pramstaller, and Vincent
national Symposium on Microarchitecture (MICRO), pages 93–104, Rijmen. A Side-Channel Analysis Resistant Description of the AES
2003. S-Box. In Fast Software Encryption (FSE), pages 413–423, 2005.
[31] Mohammad Saiful Islam, Mehmet Kuzu, and Murat Kantarcioglu. [52] Thomas Popp and Stefan Mangard. Masked Dual-Rail Pre-charge
Access Pattern Disclosure on Searchable Encryption: Ramification, Logic: DPA-Resistance Without Routing Constraints. In Crypto-
Attack and Mitigation. In Network and Distributed System Security graphic Hardware and Embedded Systems (CHES), pages 172–186,
Symposium (NDSS), 2012. 2005.
[53] S. Rammohan, V. Sundaresan, and R. Vemuri. Reduced Complementary Dynamic and Differential Logic: A CMOS Logic Style for DPA-Resistant Secure IC Design. In VLSI Design (VLSID), pages 699–705, 2008.
[54] Ashay Rane, Calvin Lin, and Mohit Tiwari. Raccoon: Closing Digital Side-Channels Through Obfuscated Execution. In USENIX Conference on Security Symposium, pages 431–446, 2015.
[55] Ashay Rane, Calvin Lin, and Mohit Tiwari. Secure, Precise, and Fast Floating-Point Operations on x86 Processors. In USENIX Security Symposium, pages 71–86, 2016.
[56] Paul Rosenfeld, Elliott Cooper-Balis, and Bruce Jacob. DRAMSim2: A Cycle Accurate Memory System Simulator. IEEE Computer Architecture Letters, 10(1):16–19, January 2011.
[57] Daniel Sanchez and Christos Kozyrakis. Vantage: Scalable and Efficient Fine-grain Cache Partitioning. In International Symposium on Computer Architecture (ISCA), pages 57–68, 2011.
[58] Emil Stefanov, Marten van Dijk, Elaine Shi, Christopher W. Fletcher, Ling Ren, Xiangyao Yu, and Srinivas Devadas. Path ORAM: An Extremely Simple Oblivious RAM Protocol. In Conference on Computer and Communications Security (CCS), pages 299–310, 2013.
[59] Bo Su, Junli Gu, Li Shen, Wei Huang, Joseph L. Greathouse, and Zhiying Wang. PPEP: Online Performance, Power, and Energy Prediction Framework and DVFS Space Exploration. In International Symposium on Microarchitecture (MICRO), pages 445–457, 2014.
[60] Shyamkumar Thoziyoor, Jung Ho Ahn, Matteo Monchiero, Jay B. Brockman, and Norman P. Jouppi. A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies. In International Symposium on Computer Architecture (ISCA), pages 51–62, 2008.
[61] Robert Tibshirani. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society, 58:267–288, 1994.
[62] K. Tiri, M. Akmal, and I. Verbauwhede. A Dynamic and Differential CMOS Logic with Signal Independent Power Consumption to Withstand Differential Power Analysis on Smart Cards. In European Solid-State Circuits Conference, pages 403–406, 2002.
[63] Kris Tiri and Ingrid Verbauwhede. A Logic Level Design Methodology for a Secure DPA Resistant ASIC or FPGA Implementation. In Design, Automation and Test in Europe (DATE), pages 246–251, 2004.
[64] O. A. Uzun and S. Köse. Converter-Gating: A Power Efficient and Secure On-Chip Power Delivery System. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 4(2):169–179, June 2014.
[65] Sravanthi Kota Venkata, Ikkjin Ahn, Donghwan Jeon, Anshuman Gupta, Christopher Louie, Saturnino Garcia, Serge Belongie, and Michael Bedford Taylor. SD-VBS: The San Diego Vision Benchmark Suite. In IEEE International Symposium on Workload Characterization (IISWC), pages 55–64, 2009.
[66] Zhenghong Wang and Ruby B. Lee. New Cache Designs for Thwarting Software Cache-Based Side Channel Attacks. In International Symposium on Computer Architecture (ISCA), pages 494–505, 2007.
[67] Ralph A. Willoughby. Solutions of Ill-Posed Problems (A. N. Tikhonov and V. Y. Arsenin). Society for Industrial and Applied Mathematics (SIAM) Review, 21(2):266–267, 1979.
[68] Wei Wu, Lingling Jin, Jun Yang, Pu Liu, and Sheldon X.-D. Tan. A Systematic Method for Functional Unit Power Estimation in Microprocessors. In Design Automation Conference (DAC), pages 554–557, 2006.
[69] Shengqi Yang, Wayne Wolf, N. Vijaykrishnan, D. N. Serpanos, and Yuan Xie. Power Attack Resistant Cryptosystem Design: A Dynamic Voltage and Frequency Switching Approach. In Design, Automation and Test in Europe (DATE), pages 64–69, 2005.
[70] W. Yu and S. Köse. Time-Delayed Converter-Reshuffling: An Efficient and Secure Power Delivery Architecture. IEEE Embedded Systems Letters, 7(3):73–76, Sept 2015.
[71] T. Zhang and R. B. Lee. CloudMonatt: An Architecture for Security Health Monitoring and Attestation of Virtual Machines in Cloud Computing. In International Symposium on Computer Architecture (ISCA), pages 362–374, 2015.
[72] Yinqian Zhang and Michael K. Reiter. Düppel: Retrofitting commodity operating systems to mitigate cache side channels in the cloud. In ACM Conference on Computer and Communications Security (CCS), pages 827–838, 2013.
[73] Xinnian Zheng, Lizy K. John, and Andreas Gerstlauer. Accurate Phase-Level Cross-Platform Power and Performance Estimation. In Design Automation Conference (DAC), pages 1–6, 2016.
[74] Boyuan Zhu, Junwei Lu, and Erping Li. Electromagnetic Radiation Study of Intel Dual Die CPU with Heatsink. In Symposium on Antennas, Propagation and EM Theory, pages 949–952, 2008.
[75] Hui Zou and Trevor Hastie. Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society, Series B, 67:301–320, 2005.

Appendix

    uint32_t cmov(uint8_t cond, uint32_t __t, uint32_t __f) {
      uint32_t ret, tmp;
      __asm__ volatile (
    #if defined(__aarch64__)
        /* ret = (cond == 0) ? f : t */
        "tst %w[con], #0xff;"
        "csel %w[ret], %w[f], %w[t], eq;"
    #elif defined(__arm__)
        /* tmp = -cond is an all-ones mask when cond == 1 */
        "neg %[tmp], %[con];"
        "and %[ret], %[tmp], %[t];"
        "mvn %[tmp], %[tmp];"
        "and %[tmp], %[tmp], %[f];"
        "orr %[ret], %[ret], %[tmp];"
    #elif defined(__x86_64__)
        /* AT&T syntax: ret = t, then ret = f if cond == 0 */
        "mov %[t], %[ret];"
        "test %[con], %[con];"
        "cmove %[f], %[ret];"
    #endif
        : [ret] "=&r" (ret), [tmp] "=&r" (tmp)
        : [con] "r" (cond), [t] "r" (__t), [f] "r" (__f)
        : "cc" );
      return ret;
    }

Figure 7: Conditional move operation for x64, ARM32, and ARM64. The code does not leak the secret condition through power consumption.

    uint32_t bit_scan_forward(uint32_t input) {
      uint8_t n = 1;
      if ((input & 0xffff) == 0) {
        n += 16; input >>= 16; }
      if ((input & 0x00ff) == 0) {
        n += 8; input >>= 8; }
      if ((input & 0x000f) == 0) {
        n += 4; input >>= 4; }
      if ((input & 0x0003) == 0) {
        n += 2; input >>= 2; }
      if (input == 0) {
        return 0; }
      return n + ((input + 1) & 0x01);
    }

Figure 8: C code for the bit scan forward operation that is later transformed using the VANTAGE compiler.
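To make the relationship between Figures 7 and 8 concrete, the following is a minimal sketch of how the branches of Figure 8 could be rewritten as selects. It is not the output of the VANTAGE compiler. The helper names `ct_select`, `ct_is_zero`, and `bit_scan_forward_ct` are hypothetical, and `ct_select` is a portable C restatement of the mask arithmetic in the ARM32 path of Figure 7, assuming `cond` is 0 or 1.

```c
#include <stdint.h>

/* Hypothetical helper: returns t when cond == 1 and f when cond == 0.
   Mirrors the neg/and/mvn/and/orr sequence of Figure 7 (ARM32 path). */
static uint32_t ct_select(uint8_t cond, uint32_t t, uint32_t f) {
    uint32_t mask = (uint32_t)0 - (uint32_t)cond; /* 0x00000000 or 0xFFFFFFFF */
    return (t & mask) | (f & ~mask);
}

/* Hypothetical helper: 1 if v == 0, else 0. For nonzero v, either v or -v
   has its top bit set, so (v | -v) >> 31 is 1. */
static uint8_t ct_is_zero(uint32_t v) {
    return (uint8_t)(1u ^ ((v | ((uint32_t)0 - v)) >> 31));
}

/* Branch-free restatement of Figure 8: every step always executes and the
   results are merged with selects instead of if statements. */
static uint32_t bit_scan_forward_ct(uint32_t input) {
    uint32_t n = 1;
    uint8_t z;

    z = ct_is_zero(input & 0xffffu);
    n = ct_select(z, n + 16, n);
    input = ct_select(z, input >> 16, input);

    z = ct_is_zero(input & 0x00ffu);
    n = ct_select(z, n + 8, n);
    input = ct_select(z, input >> 8, input);

    z = ct_is_zero(input & 0x000fu);
    n = ct_select(z, n + 4, n);
    input = ct_select(z, input >> 4, input);

    z = ct_is_zero(input & 0x0003u);
    n = ct_select(z, n + 2, n);
    input = ct_select(z, input >> 2, input);

    n += (input + 1) & 0x01u;
    /* Figure 8 returns 0 for a zero input. */
    return ct_select(ct_is_zero(input), 0, n);
}
```

A C compiler is free to turn these expressions back into branches, which is presumably why Figure 7 uses inline assembly rather than portable C. The sketch only illustrates the data flow.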
Table 6: x64 instructions whose operand values trigger a variable number of microcode operations.

    Instruction                   Reason for executing variable number of micro operations
    ENTER                         Value of the nesting depth
    MOV                           If the register operand is a segment selector register
    LODS, STOS, SCAS, CMPS,       If instructions are prefixed with REP
    MOVS, INS, OUTS
    IDIV                          Value of dividend (RAX)
    JMP, RET                      If processor executes far jump or far return
    IRET                          If processor returns to virtual 8086 mode
    INT, INT3                     If processor is in long mode
    BSR, BSF                      Early termination if input is zero
    CMPXCHG8B, CMPXCHG16B         Early termination if EAX or RAX does not match with m64 or m128 respectively
    LLDT                          Early termination if invalid operand
    PSRLDQ, PSLLDQ                Value of immediate (constant) operand

    uint8_t cmp_ne(uint32_t x, uint32_t y) {
    #if defined(__arm__)
      register uint32_t ret, tmp;
      __asm__ volatile (
        "sub %[t], %[x], %[y];"
        "sub %[r], %[y], %[x];"
        "orr %[r], %[r], %[t];"
        "lsr %[r], %[r], #31;"
        : [r] "=r" (ret), [t] "=&r" (tmp)
        : [x] "r" (x), [y] "r" (y)
        : "cc" );
      return ret & 1;
    #else
      return x != y;
    #endif
    }

Figure 9: Not-equals comparison without predication.

    uint8_t cmp_ugt(uint32_t x, uint32_t y) {
    #if defined(__arm__)
      register uint32_t ret;
      __asm__ volatile (
        "subs %[r], %[y], %[x];"
        "sbc %[r], %[r], %[r];"
        "neg %[r], %[r];"
        : [r] "=r" (ret)
        : [x] "r" (x), [y] "r" (y)
        : "cc" );
      return ret;
    #else
      return x > y;
    #endif
    }

Figure 10: Greater-than comparison without predication.
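The arithmetic behind the ARM32 sequences in Figures 9 and 10 can be checked in portable C. The sketch below is an illustration under our own assumptions, not code from VANTAGE: the names `cmp_ne_portable` and `cmp_ugt_portable` are hypothetical, and the borrow that `subs`/`sbc` read from the carry flag is emulated here by widening the subtraction to 64 bits.

```c
#include <stdint.h>

/* x != y: if x and y differ, either (x - y) or (y - x) has its top bit set,
   so the OR of the two differences has bit 31 set exactly when x != y. */
static uint8_t cmp_ne_portable(uint32_t x, uint32_t y) {
    uint32_t d = (x - y) | (y - x);
    return (uint8_t)(d >> 31);
}

/* x > y (unsigned): y - x borrows exactly when x > y. In 64-bit arithmetic
   the borrow shows up as bit 63 of the wrapped difference. */
static uint8_t cmp_ugt_portable(uint32_t x, uint32_t y) {
    uint64_t diff = (uint64_t)y - (uint64_t)x;
    return (uint8_t)(diff >> 63);
}
```

As with any source-level formulation, a compiler may still choose a different instruction sequence, so these functions only document the logic that the assembly encodes.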
    void udivrem_32(uint32_t numerator, uint32_t denominator,
                    uint32_t* quotient, uint32_t* remainder) {
      uint32_t __quo = 0, __rem = 0;
      int32_t i;
      for (i = sizeof(uint32_t) * 8 - 1; i >= 0; i--) {
        __rem <<= 1;
        uint8_t q_bit = 1, num_i = (numerator >> i) & 1;
        __rem |= num_i;
        if (__rem >= denominator) {
          __rem -= denominator;
        } else {
          /* bit i of __quo is still 0 here, so this clears q_bit */
          q_bit = (__quo >> i) & 1;
        }
        /* cast avoids shifting a promoted int into its sign bit when i == 31 */
        __quo |= ((uint32_t)q_bit << i);
      }
      if (quotient != NULL) {
        *quotient = __quo; }
      if (remainder != NULL) {
        *remainder = __rem; }
    }

Figure 11: C code for unsigned integer division that is later transformed using the VANTAGE compiler. We mark the numerator and the denominator inputs as secret.

[Figure 12 plot: cycles per 100 accesses (log scale, 10^1 to 10^5) versus array size (1 KB to 8192 KB) for four series: Stream over all bytes, Stream over cache lines, Force cache misses, and Combination.]

Figure 12: Performance comparison of different memory access options. ‘Combination’ represents the approach that selects the best optimization based on data size.
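For comparison with Figure 11, the sketch below shows one way the secret-dependent `if`/`else` in the division loop could be replaced by mask arithmetic. It is an illustration under our own assumptions and not the code that VANTAGE generates. The name `udivrem_32_ct` is hypothetical, and the remainder is held in 64 bits so that the comparison can be read from the borrow of a single subtraction.

```c
#include <stddef.h>
#include <stdint.h>

/* Restoring division, one quotient bit per iteration, with no branch on the
   (secret) numerator or denominator. Only the NULL checks on the output
   pointers remain as branches. */
static void udivrem_32_ct(uint32_t numerator, uint32_t denominator,
                          uint32_t *quotient, uint32_t *remainder) {
    uint32_t quo = 0;
    uint64_t rem = 0; /* 64-bit so the left shift cannot drop a bit */

    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((numerator >> i) & 1u);

        /* diff wraps (bit 63 set) exactly when rem < denominator */
        uint64_t diff = rem - (uint64_t)denominator;
        uint32_t ge = (uint32_t)(1u ^ (diff >> 63)); /* 1 if rem >= denominator */
        uint64_t mask = (uint64_t)0 - ge;            /* all ones or all zeros */

        rem = (diff & mask) | (rem & ~mask);         /* subtract only when ge == 1 */
        quo |= ge << i;
    }

    if (quotient != NULL) {
        *quotient = quo;
    }
    if (remainder != NULL) {
        *remainder = (uint32_t)rem;
    }
}
```

A possible use is `udivrem_32_ct(100, 7, &q, &r)`, which leaves 14 in `q` and 2 in `r`. Whether the compiled code is actually free of secret-dependent branches depends on the compiler and target, so the generated assembly would still need to be inspected.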
[Figure 13 bar chart: slowdown on x64 (×) for Int Division, Int Remainder, Bit Scan Fwd, Bit Scan Rev, and GEO-MEAN. Bar labels: 32.3, 23.9, 12.7, 7.2, 4.8.]

Figure 13: Performance overhead of transformed 32-bit division, remainder, and bit scan code on the x64 target.
[Figure 14 bar chart: slowdown on ARM32 (×, log scale) for add-32, sub-32, mul-32, div-32, sqrt-32, and GEO-MEAN, with two series: slowdown relative to software ABI and slowdown relative to hardware ABI. Bar labels: 773, 774, 865, 692, 639, 298, 118, 105, 108, 59, 76, 31.]

Figure 14: Performance overhead of transformed 32-bit elementary floating-point operations on ARM32 software ABI target.
[Figure 15 bar chart: slowdown on ARM32 for Int Div, Int Rem, Int Not Eq., Int Greater Than, Int Greater or Eq., FP Not Eq., FP Greater Than, FP Greater or Eq., and GEO-MEAN. Bar labels: 16.2, 11.9, 9.2, 9.1, 7.2, 5.7, 2.6, 1.7, 2.1.]

Figure 15: Performance overhead of transformed 32-bit division, remainder, and comparison on ARM32 target.
[Figure 16 plots: twelve ROC panels, each showing true positive rate versus false positive rate on axes from 0.00 to 1.00. AUC values in the panel legends: Font Renderer (Non-Secure: 0.57, Vantage-RAPL: 0.5); Hash Table (Non-Secure: 0.94, Vantage-RAPL: 0.51); Bloom Filter (Non-Secure: 0.53, Vantage-RAPL: 0.5); Disparity Map (Non-Secure: 0.86); LibSVM Classifier (Non-Secure: 0.92); K-means Clustering (Non-Secure: 0.56); Top-K Search (Non-Secure: 0.95); Page Rank (Non-Secure: 0.9); Bellman Ford (Non-Secure: 1); Lattice Crypto Key Exchange (Non-Secure: 0.44); Curve25519 ECC (Non-Secure: 0.55); Poly1305 MAC (Non-Secure: 0.44).]

Figure 16: ROC curves with the AUC metric for non-secure execution and VANTAGE-RAPL execution.
