Endurance-Aware Security Enhancement in Non-Volatile Memories Using Compression and Selective Encryption

This article has been accepted for publication in a future issue of this journal, but has not been
fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2016.2642180, IEEE
Transactions on Computers
IEEE TRANSACTIONS ON COMPUTERS
Majid Jalili, Student Member, IEEE, Hamid Sarbazi-Azad
Abstract: Emerging non-volatile memories (NVMs) are notable candidates for replacing traditional DRAMs. Although NVMs are scalable, dissipate less power, and do not require refreshes, they face new challenges, including a shorter lifetime and security issues. Efforts toward securing NVMs against probe attacks pose a serious downside in terms of lifetime. Cryptography algorithms increase the information density of data blocks and consequently handicap existing lifetime enhancement solutions such as Flip-N-Write. In this paper, based on the insight that compression can relax the constraints of the lifetime-security trade-off, we propose CryptoComp. This architecture takes advantage of the block size reduction after compression to enhance both memory system lifetime and security. Our idea is to confine the avalanche effect caused by encryption algorithms to a smaller space through compression and selective encryption. In this way, highly compressible data blocks are fully encrypted, while poorly compressible data blocks rely on a non-deterministic, fine-grain selective-encryption mechanism. Additionally, we present a simple, block-oriented wear-leveling scheme that distributes the bit flips fairly over memory cells. Our experimental results show 3.59× and 3.66× lifetime improvements over two state-of-the-art schemes, DEUCE and i-NVMM. These gains come with a negligible performance degradation of 2.1%, on average.
Index Terms: Non-volatile memory, main memory, phase change memory, hard error, lifetime, security.
1 INTRODUCTION
DRAM has been the popular choice of architects for designing computer systems over the past decades. In recent years, other memory technologies, including emerging non-volatile memories (NVMs), have attracted much attention. They aim to overcome the problems of traditional memory systems, such as limited scalability and high static power consumption. Phase change memory (PCM), flash memory, spin-transfer torque RAM (STT-RAM) and ferroelectric RAM (FeRAM) have been employed at various levels of the memory hierarchy [1]–[6]. These memories have both advantages and disadvantages. The most prominent features of NVMs are low static power consumption, non-volatility and good scalability. Their shortcomings are long access latency, complicated peripheral circuitry, short lifetime and security vulnerabilities [2], [7]–[12].
Among these problems, short lifetime is a major concern that needs to be addressed properly [13]–[18]. In this regard, NVM designers proposed the read-before-write (RBW) technique, also known as Data-Comparison Write [19]. RBW masks the unchanged bits during a write operation in order to reduce the bit flips per write [7]. It is a simple and effective solution that reduces the cell update rate to 15%, on average [7], [17], [19], [20].

M. Jalili is with the Department of Computer Engineering, Sharif University of Technology, Tehran, Iran. E-mail: majalili@ce.sharif.edu.
H. Sarbazi-Azad is with the Department of Computer Engineering, Sharif University of Technology, and the School of Computer Science, Institute for Research in Fundamental Sciences (IPM).
Manuscript received June 19, 2016; revised September 29, 2016.
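The sketch below makes the RBW idea concrete. It counts how many cells a 64 B line write must program with and without read-before-write. It is a minimal Python illustration that assumes one SLC cell per bit; the function names are hypothetical and are not taken from any cited design.

```python
LINE_BYTES = 64  # one memory line, one SLC cell per bit (assumption)

def cells_programmed_blind(new_line: bytes) -> int:
    """Without RBW, every cell of the line is rewritten."""
    return 8 * len(new_line)

def cells_programmed_rbw(old_line: bytes, new_line: bytes) -> int:
    """With RBW (data-comparison write), the stored line is read first and
    only the bits that differ from the incoming data are programmed."""
    if len(old_line) != len(new_line):
        raise ValueError("lines must have the same length")
    return sum(bin(a ^ b).count("1") for a, b in zip(old_line, new_line))

# A writeback that modifies a single 8-byte word of an otherwise unchanged line.
old = bytes(range(LINE_BYTES))
new = bytearray(old)
new[8:16] = (0xFFFFFFFFFFFFFFFF).to_bytes(8, "little")
print(cells_programmed_blind(bytes(new)), cells_programmed_rbw(old, bytes(new)))
```

Only the cells of the modified word can flip, which is why RBW works well for unencrypted data: a typical writeback changes only a small part of the line.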
Moreover, advances in the Full Disk Encryption paradigm [21]–[25] have reduced the probability of information leakage at the disk level. As a result, memory-centric attacks have been pushed to higher levels of the memory hierarchy, where security has traditionally not been a concern. Hence, higher levels of memory need to be secured, especially when NVMs are used [26]–[31]. Two factors together severely damage the endurance of the memory system [20], [31]. One is the short lifetime of NVMs. The other is the cost of encryption, which can increase the cell update rate to more than 50% [20]. In this paper, we attempt to improve the security level of a non-volatile main memory without reducing its lifetime. The idea is to use compression algorithms to reduce the size of memory data blocks, which protects most of the cells from the updating overheads caused by encryption. Then, using a selective encryption scheme, we convert the content of the compressed data blocks to cipher-text. The proposed selective encryption algorithm exploits the data block size reduction and a random scheme. Throughout the paper, we show that it converts the plain-text to cipher-text with fine granularity and high coverage. Furthermore, some cells face more bit flips than others. To relieve the pressure on them, we adopt a shift-based mechanism that distributes bit flips uniformly over the cells of a memory line. The main contributions of this research can be summarized as follows:
Exploiting compression to bound the avalanche effect. In cryptography, a well-built encryption algorithm should change at least half of the cipher-text bits for a slight change in the source text. This desirable feature of an encryption algorithm is called the avalanche effect. It increases the bit-flip rate of memory blocks and thus reduces NVM lifetime. We show that conventional compression algorithms can control the avalanche effect and provide a lifetime-security trade-off.

Proposing the Compressed Block Selective Encryption (CBSE) scheme. We show that a large fraction of data blocks can follow a full-encryption approach. For the remaining blocks, we can selectively encrypt a proper amount of data to ensure an acceptable security level for the main memory. A more aggressive scheme, named CBSE+, can also fully encrypt high-entropy data blocks that are not compressed efficiently. These blocks would otherwise enter the selective encryption process of the proposed algorithm.

Employing the Rotational Block Starting Segment (RBSS) scheme. Bit flips are unevenly distributed across cell locations, so we propose a technique to remove the stress from certain cells. This scheme partitions the block locations into segments and tries to make the writes uniform over them.

0018-9340 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

TABLE 1
Lifetime of different memory technology prototypes fabricated during 2000-2014 [2].
Based on the above techniques, we propose CryptoComp, an architecture that addresses the security and lifetime concerns of a non-volatile main memory. Our evaluation results, obtained from a full-system simulator, reveal that CryptoComp outperforms previous state-of-the-art architectures in terms of lifetime. More precisely, CryptoComp improves the lifetime by 3.59× and 3.66× over DEUCE [20] and i-NVMM [32], respectively. Additionally, CryptoComp covers 99% of memory blocks, i.e., 99% of memory blocks are encrypted when a power failure occurs. This is 11 percentage points higher than the most secure configuration of i-NVMM, which has 88% coverage.
The rest of the paper is organized as follows. In Section 2, we discuss the lifetime problem of NVMs and the effects of encryption. Section 3 describes previous works and highlights their pros and cons. Section 4 presents the evaluation methodology. Section 5 explains our proposal. Section 6 reports the evaluation results and compares them with the state-of-the-art architectures. Finally, we conclude the paper in Section 10.
2 PROBLEM DEFINITION
The primary objective of this work is to address the short lifetime of NVMs when encryption is employed. Here, we clarify the problem by answering two questions: 1) why does a non-volatile main memory need to be secured? and 2) why do efforts toward a secure NVM degrade the lifetime?

Table 1 data (endurance in write cycles; the number after each technology is the number of measured chips):
Memory Technology: #Chips | Max | Min | Average | Average / 10^16
DRAM | 10^16 | 10^16 | 10^16 | 1
SRAM | 10^16 | 10^16 | 10^16 | 1
Flash (NAND): 18 | 10^8 | 10^3 | 6.16×10^6 | 6.16×10^-10
Flash (NOR): 35 | 10^7 | 10^3 | 5.32×10^5 | 5.32×10^-11
Phase Change Memory (PCM): 42 | 10^11 | 10^3 | 5.48×10^9 | 5.48×10^-7
Spin-Transfer Torque Magnetic RAM (STT-MRAM): 9 | 10^16 | 10^5 | 1.25×10^15 | 1.25×10^-1
Resistive Random-Access Memory (ReRAM): 65 | 10^12 | 10^3 | 1.92×10^10 | 1.92×10^-6

[Figure 1: left, writes per block (billion) vs. bit flips per write (%) for Baseline and Full-Encryption (3.31× gap); right, writes per block (billion) vs. bit flips per write (%) for 128-, 256-, 384- and 512-bit blocks.]
Fig. 1. Bit-flip rate vs. lifetime. Left: when an encryption algorithm is employed, bit flips per write increase from 12.5% to 50%, resulting in a 3.31× lifetime reduction. Right: effect of restricting the avalanche effect on the number of tolerable writes. Limiting the damage caused by the avalanche effect can effectively improve the lifetime under different bit-flip rates.
Adopting NVMs imposes new security challenges that did not exist before. For example, with traditional technologies such as SRAM and DRAM, the whole memory content is lost during a power event [32] because of the power outage. Hence, if either the memory DIMM or the CPU chip is stolen, it holds no information that attackers can use [20]. For NVMs, however, losing power is harmless to the data. The whole memory content persists, which opens a door for an attacker to access the stored data. Besides, recent efforts toward securing lower-level storage devices (e.g., disks) push memory-centric attacks to higher levels (e.g., main memory) [21]–[25]. Therefore, securing the content of non-volatile main memories is essential to protect them against memory-centric attacks [30], [31], [33], [34].
To answer the second question, note that NVMs naturally tolerate a limited number of write operations [1]. Table 1 compares the lifetime of 7 different volatile and non-volatile memories, based on real measurements of prototypes fabricated during 2000-2014 [2]. For volatile memories such as SRAM and DRAM, each cell tolerates at least 10^16 write operations. For NVMs such as PCM, ReRAM and Flash, however, the number of tolerable write operations is relatively low. For example, the average endurance of PCM is only 5.48×10^-7 of that of a typical DRAM [2].
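The cost of a higher bit-flip rate (Fig. 1, left) can be reproduced qualitatively with a scaled-down Monte Carlo. It follows the spirit of the statistical experiment described later in this section. This Python sketch assumes each write targets a uniformly random block and flips every cell of that block independently with probability `flip_rate`. The block count, cell count and endurance are tiny placeholder values; the paper's experiment uses 10K blocks of 512 cells and 10^9 writes per cell. Only the trend is meaningful, not the absolute numbers.

```python
import random

def writes_until_first_failure(flip_rate: float, num_blocks: int = 20,
                               cells_per_block: int = 64, endurance: int = 50,
                               seed: int = 1) -> float:
    """Average writes per block issued before the first cell wears out."""
    if not 0.0 < flip_rate <= 1.0:
        raise ValueError("flip_rate must be in (0, 1]")
    rng = random.Random(seed)
    remaining = [[endurance] * cells_per_block for _ in range(num_blocks)]
    total_writes = 0
    while True:
        total_writes += 1
        block = remaining[rng.randrange(num_blocks)]   # uniformly random target block
        for cell in range(cells_per_block):
            if rng.random() < flip_rate:               # this cell flips on this write
                block[cell] -= 1
                if block[cell] == 0:                   # first worn-out cell ends the run
                    return total_writes / num_blocks

for rate in (0.125, 0.25, 0.5):
    print(f"bit-flip rate {rate:.1%}: {writes_until_first_failure(rate):.1f} writes per block")
```

Quadrupling the flip rate from 12.5% to 50% cuts the writes survived by roughly the same factor. This mirrors the unencrypted vs. fully-encrypted gap discussed in this section.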
Lifetime degrades after encryption because of a defining feature of encryption algorithms [20], [31]. According to [35], a good encryption algorithm should change at least half of the bits of the encrypted data in response to a slight modification of the source data. This phenomenon, first discovered by Shannon [35], is referred to as the avalanche effect in the cryptography literature. The avalanche effect is a desirable property of an encryption algorithm, but it is very harmful to NVMs. Changing at least half of the bits in the cipher-text means higher activity of the non-volatile storage cells. This makes the read-before-write (RBW) technique inefficient and consequently degrades the memory lifetime [20].

To observe the extent of the damage caused by the avalanche effect, we conducted a statistical experiment based on the well-known methodology presented in [13]. In this experiment, we evaluate a main memory system with a small number of blocks (10K blocks), each of 512 cells. Process variation is not modeled, and each cell can tolerate exactly 1 billion writes. A uniform random generator decides which cells flip on each write, and the lifetime counter of each flipped cell is decremented. We continue to issue write operations to the memory until the first cell error appears. The number of the last write is then reported as the lifetime indicator. Fig. 1 (left) shows the results of this experiment, where the bit-flip rate is varied from 10% to 60%. As can be seen in the figure, increasing the bit flips per write decreases the number of tolerable write operations. Typically, the bit-flip rate is 12.5% in an unencrypted memory system and 50% in a fully-encrypted one [20]. This growth in the bit-flip rate decreases the system lifetime by a factor of about 3 (3.31× in Fig. 1, left). Hence, securing a non-volatile memory comes at the cost of lifetime reduction.

If the damage caused by the avalanche effect is limited, we can enjoy the advantages of encryption without a lifetime overhead. One way to do so is to use compression, which reduces the size of each data block and therefore lowers the number of cells involved in encryption. To examine how limiting the avalanche effect affects the lifetime, we modified the statistical simulator. In each experiment, the size of data blocks is reduced below 512 bits, i.e., to 384, 256 and 128 bits, and bit flips occur uniformly among them. Fig. 1 (right) reports, for various bit-flip rates, the write number at which the first cell error happens. As can be seen, suppose we restrict the avalanche effect window from 512 to 128 bits, e.g., using a compression method with a compression ratio of 4. For a bit-flip rate of 40% per write, the lifetime can then ideally be improved from 0.225 to 2.25 billion writes, i.e., a 10× lifetime boost.

[Figure 2: block layout of a memory rank under Start-Gap and HWL, showing blocks B0-B2, the Start and Gap locations and per-block rotation amounts (Rot 0, Rot i % 512, Rot (i+1) % 512); panel (c) shows Banks 0-7, each with its own Start and Gap.]
Fig. 2. Illustration of the HWL mechanism: (a) initially, all blocks are written to the memory system with no rotation; (b) the state of the blocks and their corresponding RotationAmount after some Start-Gap rounds; (c) the agility of HWL can be increased using a per-bank scheme.

3 STATE-OF-THE-ART ARCHITECTURES
Reads are 4-5× faster than writes [1], and a read operation is non-destructive. It is therefore advantageous to read the content of a block before the actual write and mask the unchanged bits. Based on this fact, several works tried to increase the number of masked cells by manipulating the block content through mapping and coding [7], [17], [36], [37]. Unfortunately, the content of a block fundamentally changes after encryption is applied, so the effectiveness of these methods is reduced significantly. In fact, [20] showed that after encryption, Flip-N-Write [7] decreases the bit flips per write by only 5%, even though it is very effective for an unencrypted baseline.
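The sketch below illustrates why RBW and Flip-N-Write lose their effect after encryption. Python's standard library has no AES, so a keyed SHA-256 construction stands in for the cipher purely to exhibit the avalanche effect. It is not AES and is not meant as a real encryption scheme. Flipping one plaintext bit changes about half of the 512 output bits. A read-before-write would therefore still have to program roughly 256 cells instead of one.

```python
import hashlib

LINE_BITS = 512

def keyed_mix(key: bytes, line: bytes) -> bytes:
    """64-byte keyed pseudorandom output for a 64-byte line (stand-in for a cipher)."""
    return (hashlib.sha256(key + b"\x00" + line).digest()
            + hashlib.sha256(key + b"\x01" + line).digest())

def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits, i.e. cells RBW would still have to program."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def avalanche_fraction(key: bytes, trials: int = 200) -> float:
    """Average fraction of output bits that change when one input bit flips."""
    changed = 0
    for t in range(trials):
        plain = t.to_bytes(2, "little") * 32          # 64-byte plaintext line
        tweaked = bytearray(plain)
        tweaked[t % 64] ^= 1 << (t % 8)                # flip a single plaintext bit
        changed += hamming(keyed_mix(key, plain), keyed_mix(key, bytes(tweaked)))
    return changed / (trials * LINE_BITS)

print(f"plaintext bits changed per write: 1 of {LINE_BITS}")
print(f"cipher-side bits changed per write: {avalanche_fraction(b'demo-key'):.1%}")
```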
DEUCE [20] is the most recent work that tries to solve the lifetime/security problem. It builds on the observation that a typical writeback changes only a few words of the written block, so DEUCE re-encrypts only the words that have changed. Following prior works [38], [39], DEUCE removes the encryption process from the critical path using counter-mode AES (Advanced Encryption Standard) and a one-time pad (OTP). It then reduces the bit flips per write by bounding the avalanche effect to the modified bytes, using the same key with different counters for a certain number of writes. Additionally, Start-Gap [40] evens out wear at the block level and Horizontal Wear-Leveling (HWL) does so at the cell level. When DEUCE, Start-Gap and HWL are all employed, the system lifetime is doubled compared to the fully-encrypted baseline.
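A simplified sketch of the word-level re-encryption idea behind DEUCE follows. It is not DEUCE itself. The counter-mode pad is derived with SHA-256 as a stand-in for AES, the 8-byte word size is an illustrative assumption, and a per-word counter replaces the counter bookkeeping of [20] for brevity. The sketch shows that, with a one-time pad, only the modified words change in the stored cipher-text, so RBW can still mask the rest of the line.

```python
import hashlib

LINE = 64  # bytes per memory line
WORD = 8   # bytes per word (illustrative assumption)

def pad(key: bytes, addr: int, counter: int, word: int) -> bytes:
    """One-time pad for one word; SHA-256 stands in for AES in counter mode."""
    seed = (key + addr.to_bytes(8, "little") + counter.to_bytes(8, "little")
            + word.to_bytes(2, "little"))
    return hashlib.sha256(seed).digest()[:WORD]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class WordReencryptedLine:
    def __init__(self, key: bytes, addr: int, plain: bytes):
        self.key, self.addr, self.counter = key, addr, 0
        self.word_counter = [0] * (LINE // WORD)   # counter last used for each word
        self.plain = bytearray(plain)
        self.cipher = bytearray(LINE)
        for w in range(LINE // WORD):
            self._encrypt_word(w)

    @staticmethod
    def _span(w: int) -> slice:
        return slice(w * WORD, (w + 1) * WORD)

    def _encrypt_word(self, w: int) -> None:
        s = self._span(w)
        self.cipher[s] = xor(self.plain[s], pad(self.key, self.addr, self.word_counter[w], w))

    def write(self, new_plain: bytes) -> int:
        """Re-encrypt only modified words; return the bit flips RBW must program."""
        before = bytes(self.cipher)
        self.counter += 1                          # fresh counter, so no pad is reused
        for w in range(LINE // WORD):
            s = self._span(w)
            if new_plain[s] != self.plain[s]:
                self.plain[s] = new_plain[s]
                self.word_counter[w] = self.counter
                self._encrypt_word(w)
        return sum(bin(x ^ y).count("1") for x, y in zip(before, self.cipher))

    def read(self) -> bytes:
        out = bytearray(LINE)
        for w in range(LINE // WORD):
            s = self._span(w)
            out[s] = xor(self.cipher[s], pad(self.key, self.addr, self.word_counter[w], w))
        return bytes(out)

line = WordReencryptedLine(b"key", 0x40, bytes(LINE))
modified = bytearray(LINE)
modified[9] = 0xFF                                 # only word 1 changes
print(line.write(bytes(modified)), "bit flips, all inside one 8-byte word")
```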
HWL rotates the content of each block over time according to an algebraic equation. This equation determines the amount of rotation from the behavior of the external wear-leveling algorithm, i.e., Start-Gap. Start-Gap gradually moves a gap over the block locations in the memory and thereby spreads hot blocks across the whole memory space.
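The following Python sketch shows the Start-Gap address mapping that HWL relies on, as we understand it from [40]. N logical lines live in N+1 physical lines, and the mapping depends only on the Start and Gap registers. The gap-movement interval `psi` and the class and method names are illustrative assumptions.

```python
class StartGap:
    """N logical lines mapped onto N + 1 physical lines (one spare 'gap' line)."""

    def __init__(self, n_lines: int, psi: int = 4):
        self.n = n_lines
        self.psi = psi                # move the gap once every psi writes (illustrative)
        self.start = 0
        self.gap = n_lines            # the spare line starts at the end
        self.writes = 0
        self.mem = [None] * (n_lines + 1)

    def physical(self, la: int) -> int:
        """Physical location of logical line `la` for the current Start and Gap."""
        pa = (la + self.start) % self.n
        return pa + 1 if pa >= self.gap else pa

    def _move_gap(self) -> None:
        if self.gap == 0:
            # Wrap-around: the line in location N moves to location 0 and Start advances.
            self.mem[0] = self.mem[self.n]
            self.gap = self.n
            self.start = (self.start + 1) % self.n
        else:
            # The line at location Gap-1 moves into the gap; Gap-1 becomes the new gap.
            self.mem[self.gap] = self.mem[self.gap - 1]
            self.gap -= 1

    def write(self, la: int, value) -> None:
        self.mem[self.physical(la)] = value
        self.writes += 1
        if self.writes % self.psi == 0:
            self._move_gap()

    def read(self, la: int):
        return self.mem[self.physical(la)]

sg = StartGap(8, psi=3)
for step in range(200):
    sg.write(step % 8, (step % 8, step))
print("Start =", sg.start, "Gap =", sg.gap)
```

Because data only ever moves one location per gap step, the whole rank is traversed slowly but uniformly. This is also why a rotation amount derived from Start changes only at the pace of the gap.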
Fig. 2(a) and (b) show a top-level overview of Start-Gap. In this scheme, every N writes the content of the block at location Gap - 1 moves to location Gap. Location Gap - 1 then becomes the new, invalid gap. Hence, a hot block cannot stay at a fixed location for a long period of time, and the whole memory space is used almost uniformly. Besides, the location of each block in the memory can easily be determined from the values of Start and Gap. The key insight behind HWL is to rotate the content of each memory block according to Start and Gap. The number of bit rotations is given by:

$$\mathrm{RotationAmount} = \mathrm{Start} \bmod \mathrm{BitsInLine} \tag{1}$$

$$\mathrm{Start} = \begin{cases} \mathrm{Start}, & \text{if Gap has crossed the line} \\ \mathrm{Start} + 1, & \text{otherwise} \end{cases} \tag{2}$$

i-NVMM [32] postpones the encryption process to a proper time by identifying hot blocks. The proper time is the moment when i-NVMM predicts that a page will not be accessed soon. i-NVMM scrubs the memory system periodically and incrementally encrypts the memory content as it predicts cold pages. A drawback of i-NVMM is that it does not fully secure the memory at the time of a power event. However, compared to a fully-encrypted system, it reduces the lifetime degradation, since highly reused blocks are not involved in the encryption and decryption processes. By scrubbing the memory system every 5 billion cycles, i-NVMM keeps 76% of the memory content encrypted. This technique helps system performance and lifetime, but on average about 24% of the memory content remains unencrypted. Consequently, it opens a 25-minute window for an attacker to obtain the plain-text. Worse, i-NVMM is a page-level technique, so the unprotected portions have 4KB granularity (the size of a typical page in a memory system). This means more security vulnerabilities.

Exploiting compression for energy and performance goals in NVMs was originally proposed in [41]. There, the authors used compression to switch the storage type of an MLC PCM system to SLC, reducing the heavy latency and energy cost of MLC programming. Using compression to improve different aspects of PCM reliability was proposed in [42]. Cache management policies based on the compression ratio were introduced in [43]–[45]. Using compression to remove the memory bottleneck in GPUs was suggested in [46]. LCP [47] tried to solve the misalignment problem in a compressed cache by modifying the compression engine to generate same-size blocks. In [48], a hybrid DRAM/PCM architecture was proposed that exploits compression to enhance lifetime and performance.

TABLE 2
Baseline configurations.
Processor: 4-core ALPHA 21264, 2.0GHz.
L1 Cache: split I and D cache; 32KB private; 4-way; 64B line size; LRU; write-back; 1 port; 2ns latency.
L1 Coherency: MOESI directory; 4×2 grid packet-switched NoC; XY routing; 3-cycle router; 1-cycle link.
L2 Cache: 4MB; UCA shared; 16-way; 64B line size; LRU; write-back; 8 ports; 4ns latency.
DRAM Cache: 16MB; 4-way; 64B line size; LRU; write-back; 8 ports; 26ns latency.
Main Memory: 8GB; 16 banks; 64B lines; open page; SLC; read latency 80ns (6ns tPRE + 69ns tSENSE + 5ns tBUS); write latency 250ns.
Flash SSD: 25µs latency.

TABLE 3
Workload characteristics under the baseline configuration.
Application | Comp. Ratio | Read traffic (GB/s) | Write traffic (GB/s)
Memory Intensive-Highly Compressible
milc | 5.56 | 3.17 | 1.56
mcf | 5.26 | 18.64 | 6.22
Memory Intensive-Moderately Compressible
gems | 3.66 | 9.34 | 3.82
libq | 2.91 | 7.49 | 3.04
Memory Intensive-Poorly Compressible
leslie | 1.37 | 10.01 | 3.32
lbm | 1.04 | 13.23 | 9.7
Not Memory Intensive-Highly Compressible
sjeng | 8.44 | 0.76 | 0.73
go | 6.47 | 0.23 | 0.19
astar | 4.54 | 0.26 | 0.23
gcc | 4.01 | 1.49 | 0.74
Not Memory Intensive-Moderately Compressible
zeusmp | 2.97 | 0.01 | 0.01
deal | 2.89 | 0.24 | 0.1
gamess | 2.82 | 1.02 | 0.55
omnetpp | 2.82 | 2.57 | 0.1
sphinx | 2.72 | 2.16 | 0.24
Not Memory Intensive-Poorly Compressible
bzip2 | 1.24 | 0.95 | 0.69
perl | 1.87 | 1.26 | 0.25
namd | 1.75 | 0.07 | 0.06
gromacs | 1.6 | 0.21 | 0.09
hmmer | 1.02 | 0.94 | 0.88
[Figure 3: (a) bit-flip percentage and (b) average entropy per application and on average, for Baseline, Full-Encryption, Comp.-Encryp.(active) and Comp.-Encryp.(all).]
Fig. 3. Bit-flip rate and average information entropy in 4 systems: i) baseline, ii) baseline+encryption, iii) compression+encryption (only active cells are considered), and iv) compression+encryption (all cells are considered). Increasing the information entropy increases the bit-flip rate, except in systems iii and iv, which limit the avalanche effect.

[Figure 4: memory controller datapath between the LLC port and the DIMM port, with write and read buffers, compressor (Comp.), CBSE unit, AES engines, decompressor (Decomp.) and rotators.]
Fig. 4. Memory controller in the proposed scheme. Arrows 1-6 indicate the write path, and arrows a-e indicate the read path in the memory controller.

4 EVALUATION METHODOLOGY
We perform micro-architectural level simulation of an out-of-order processor model with the ALPHA ISA using the gem5 simulator [49]. NVSim [50] provides detailed area, power and timing models of the memory hierarchy. We consider a 4-core CMP system with 3 levels of caches and a PCM main memory. L1 is configured as a private cache, while L2 and L3 are shared among the 4 cores. To accommodate the long write latency of the PCM memory, a large DRAM cache (16MB) is employed. The lowest storage device is an SSD of unlimited size with a 25µs response time. The line size of all caches and the main memory is 64B. Table 2 summarizes the evaluated system. A set of multi-program applications from the SPEC CPU 2006 suite [51] is selected and characterized in Table 3. All workloads were compiled with the -O3 flag on an Ubuntu 64-bit Linux system using GCC and Fortran compilers. CryptoComp's performance depends on the memory access rate and the compression ratio. We therefore selected workloads with different combinations of memory access intensity and compression ratio. The workloads are categorized by their main memory Write Traffic (WT) and the Compression Ratio (CR) of the accessed data. Applications with WT > 1 GB/s are called Memory-Intensive (MI), and those with WT < 1 GB/s are Not-Memory-Intensive (NMI). Workloads with CR > 4 are categorized as Highly-Compressible (HC), and those with 2 < CR < 4 as Moderately-Compressible (MC). Finally, applications with CR < 2 are considered Poorly-Compressible (PC). Hence, we have 6 workload classes, namely MI-HC, MI-MC, MI-PC, NMI-HC, NMI-MC and NMI-PC. Simulations are run for 8 billion instructions, and the results of the first 4 billion instructions are ignored as warm-up.
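The categorization rule above can be written as a small function and checked against Table 3. This is a minimal Python sketch. The text leaves values exactly at WT = 1 GB/s or CR = 2 or 4 unspecified; here they fall into the lower class, which is an assumption. The sample rows are copied from Table 3 as (CR, write traffic, class).

```python
def classify(write_gbps: float, comp_ratio: float) -> str:
    """Workload class from main-memory write traffic (WT) and compression ratio (CR)."""
    intensity = "MI" if write_gbps > 1.0 else "NMI"
    if comp_ratio > 4:
        compressibility = "HC"
    elif comp_ratio > 2:
        compressibility = "MC"
    else:
        compressibility = "PC"
    return f"{intensity}-{compressibility}"

# (CR, write traffic in GB/s, class) for a few rows of Table 3.
TABLE3_SAMPLE = {
    "milc": (5.56, 1.56, "MI-HC"),
    "libq": (2.91, 3.04, "MI-MC"),
    "lbm": (1.04, 9.7, "MI-PC"),
    "gcc": (4.01, 0.74, "NMI-HC"),
    "omnetpp": (2.82, 0.1, "NMI-MC"),
    "hmmer": (1.02, 0.88, "NMI-PC"),
}

for app, (cr, wt, expected) in TABLE3_SAMPLE.items():
    assert classify(wt, cr) == expected, app
    print(f"{app}: {classify(wt, cr)}")
```

Note that the intensity split uses write traffic only: gcc reads 1.49 GB/s but writes 0.74 GB/s, so it is classed as NMI.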
5 THE PROPOSED SCHEME
In this section, we present the proposed scheme in three steps:
1) we investigate why the RBW technique is inefficient for an encrypted PCM main memory and show how RBW can be revived to work well when encryption is employed;
2) we introduce a detailed plan to progress toward our goal;
3) we show the modified architecture.
Toward these goals, we modify the memory controller to employ a selective encryption scheme, as shown in Fig. 4. More precisely, we devise a compression-based selective encryption scheme, together with a data rotation unit, to achieve both security and uniform cell usage. We first compress the data block and then apply our selective encryption. Finally, we rotate the content of the memory line using a shift register.
5.1 Reviving RBW
The uncertainty level of source data is mainly measured by information entropy, also known as Shannon entropy, which is given by:

$$E = \left| \sum_{i=1}^{Max} p_i \log \frac{1}{p_i} \right| \tag{3}$$

where p_i is the appearance probability of character i in the given input text. There is an inverse relation between information entropy and redundancy: higher information entropy implies lower redundancy and vice versa [35]. Redundancy is a desirable feature in NVMs, since it usually results in fewer bit flips per block and favors the RBW scheme [20].
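Eq. (3) can be evaluated per memory block as in the following Python sketch. It assumes that the "characters" are the bytes of a block and that the logarithm is base 2; the text does not fix either choice. Under these assumptions a 64 B line has at most log2(64) = 6 bits of entropy, reached when all 64 bytes are distinct.

```python
import math
from collections import Counter

def block_entropy(block: bytes) -> float:
    """Eq. (3) with bytes as characters and a base-2 logarithm (assumptions).
    The absolute value in Eq. (3) is redundant here because every term is >= 0."""
    n = len(block)
    return sum((c / n) * math.log2(n / c) for c in Counter(block).values())

print(block_entropy(bytes(64)))            # all-zero line: 0.0
print(block_entropy(bytes(range(64))))     # 64 distinct bytes: 6.0
```

Highly redundant plain-text lines sit near the low end of this range, while encrypted lines approach the maximum.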
However, applying a solid encryption algorithm to a plain-text is expected to increase its entropy (and decrease its redundancy) significantly [35]. In fact, without encryption the bit-flip rate is about 12.5% (measured over a rich set of workloads), but applying encryption increases it to almost 50% [20].
To further illustrate this phenomenon, Fig. 3 shows the average entropy of different applications, along with their corresponding bit-flip rates, for 3 systems. As can be seen, a system running an application with low-entropy content has a longer lifetime than one running higher-entropy content. The baseline system has a high amount of redundancy (average entropy = 0.95), and its bit-flip rate is less than 15%. However, when the entropy increases because an encryption algorithm is adopted (average entropy = 5), the bit-flip rate rises to about 50%. Hence, increasing the security level of a memory system is associated with bit-flip rate growth (e.g., RBW becomes 4× less effective).
Based on the above discussion, encryption increases information entropy, which makes RBW inefficient and hence shortens the lifetime. There are two ways to relax the security-lifetime trade-off:
i) using a weaker encryption algorithm with an acceptable information entropy;
ii) limiting the avalanche effect.
The former approach is obviously not desirable for a secure main memory, but the latter can be adopted using compression. Shannon indicated that a secure communication system should compress the source before encryption [35]. This first reduces the computation and storage requirements and second eliminates redundancies. Applying compression before encryption has another important benefit in NVMs. If the compression unit successfully reduces the size, fewer bits are encrypted and thus fewer cells in a cache line participate in the write operation. In other words, compression helps RBW exempt more cells from the write operation.

[Figure 5: percentage of blocks per application in the 0B-16B, 17B-32B, 33B-48B and 49B-64B size classes.]
Fig. 5. Block size distribution after compression with 16B resolution (minimum length for an AES block).

[Figure 7: consecutive writes (Write#) to one block location, showing Active and Inactive segments and the Counter value that selects the starting segment.]
Fig. 7. The timeline of activities on a block when RBSS is employed. The starting segment is changed based on the counter value to even out the bit-flip distribution.
To investigate how effectively compression relaxes the security-lifetime trade-off, we compress each data block using the FPC algorithm [52] during a write. We then encrypt the compressed block and write it to its location. Fig. 6 shows a typical block after compression and defines Active and Inactive cells.

[Figure 6: a compressed block consisting of Active bits (m) followed by Inactive bits (n).]
Fig. 6. A block of m+n bits is compressed to m bits (the active bits). The remaining n inactive bits are masked during the write.
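To make the Active/Inactive split concrete, the sketch below estimates the compressed size of a 64 B line with a simplified version of FPC [52]. Each 32-bit word gets a 3-bit prefix plus the data bits of the first matching pattern, and a run of up to eight zero words is encoded as a prefix plus a 3-bit run length. The pattern list follows our reading of the FPC pattern table. Three further choices are assumptions of this Python sketch: little-endian word order, falling back to an uncompressed 512-bit line, and rounding to 16 B AES blocks (the resolution of Fig. 5).

```python
import struct

LINE_BITS = 512
AES_BLOCK_BITS = 128

def _fits_signed(value: int, bits: int) -> bool:
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

def _signed(value: int, bits: int) -> int:
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def fpc_data_bits(word: int) -> int:
    """Data bits for one non-zero 32-bit word (the 3-bit prefix is added by the caller)."""
    s = _signed(word, 32)
    if _fits_signed(s, 4):
        return 4                               # 4-bit sign-extended
    if _fits_signed(s, 8):
        return 8                               # one byte sign-extended
    if _fits_signed(s, 16):
        return 16                              # halfword sign-extended
    if word & 0xFFFF == 0:
        return 16                              # halfword padded with a zero halfword
    hi, lo = _signed(word >> 16, 16), _signed(word & 0xFFFF, 16)
    if _fits_signed(hi, 8) and _fits_signed(lo, 8):
        return 16                              # two halfwords, each a sign-extended byte
    if word == (word & 0xFF) * 0x01010101:
        return 8                               # word of four repeated bytes
    return 32                                  # uncompressed word

def fpc_compressed_bits(line: bytes) -> int:
    words = struct.unpack("<16I", line)        # 64 B line as sixteen 32-bit words
    bits, i = 0, 0
    while i < len(words):
        if words[i] == 0:
            run = 1
            while run < 8 and i + run < len(words) and words[i + run] == 0:
                run += 1
            bits += 3 + 3                      # zero-run prefix + 3-bit run length
            i += run
        else:
            bits += 3 + fpc_data_bits(words[i])
            i += 1
    return min(bits, LINE_BITS)                # store uncompressed if it does not shrink

def active_bytes(line: bytes) -> int:
    """Compressed size rounded up to whole 16 B AES blocks; the rest of the line is Inactive."""
    blocks = -(-fpc_compressed_bits(line) // AES_BLOCK_BITS)
    return blocks * AES_BLOCK_BITS // 8

small_values = struct.pack("<16I", *([5] * 16))
print(fpc_compressed_bits(small_values), "bits ->", active_bytes(small_values), "active bytes")
```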
Fig. 3 shows the bit-flip rate and average entropy for 4 systems:
i) baseline;
ii) baseline+encryption;
iii) compression+encryption (only active cells are considered);
iv) compression+encryption (all cells are inspected when calculating the information entropy, regardless of their validity).
As shown in the figure, the bit-flip rates of the 4 systems are 12.5%, 50%, 24% and 24%, while their information entropies are 0.98, 5.5, 5.5 and 2.8, respectively. That is, a system with compression followed by encryption has almost the same security level as the baseline+encryption system, while it halves the bit-flip rate. Besides, the information entropy calculated along the whole cache line (system iv) is lower than that of system iii. It therefore gives attackers no significant hint for analyzing the memory content based on information entropy.
5.2 Making the cells usage more uniform
Compression reduces the data block size and, as shown in Fig. 6, leaves some Inactive cells in each write. If active cells were mapped to a fixed space along the memory line, cells would be used non-uniformly: the fixed space hosting the active cells would take more writes. Besides, as explained in Section 3, the HWL mechanism works on a per-bank basis and is not agile. Its amount of rotation is determined by the number of write operations arriving at each memory bank. Here, we need an agile wear-leveling scheme that spreads bit flips uniformly over the memory lines. To this end, we use a customized per-block shifting scheme.
Our wear-leveling scheme takes two important issues into account:
i) since the minimum size of an AES block is 16B, we do not partition a block into segments smaller than 16B;
ii) since encryption increases the bit-flip rate, cell usage becomes more uniform within each segment [20].
Fig. 5 shows the occurrence rate of different block sizes after compression, with 16B resolution. As can be seen, on average about 53%, 14%, 8% and 24% of data blocks are compressed to 0B-16B, 17B-32B, 33B-48B and 49B-64B, respectively. In other words, about 75% of cells are inactive in 53% of writes. Even in the worst case, which covers 24% of writes, up to 25% of cells are inactive. Hence, to make cell usage uniform, block locations are partitioned into segments, and on each write the content is written into different segments rotationally.
[Figure 8: four panels (ETH=64, ETH=48, ETH=32, ETH=16) marking the encrypted and un-encrypted 16B AES blocks of a line; panels (c) and (d) select byte SSC%2 or SSC%4 from each segment.]
Fig. 8. Selection procedure in the CBSE algorithm: (a) for ETH=64, each block is encrypted regardless of its compressed size; (b) for ETH=48 and a compressed-block size of 64, the 48 most-significant bytes are selected; (c), (d) for ETH=32 and 16, the block is divided into 32 and 16 segments, respectively, and byte number SSC%2 (resp. SSC%4) of each segment is selected.

Algorithm 1: Compressed block selective encryption
input : Compressed Block (CB), ETH, POISE
output: Encrypted Block
/* ETH: encryption threshold (16, 32, 48 or 64) */
/* POISE: percentage of ignored blocks in selective-encryption */
if CB.size() <= ETH then
    /* Encrypting whole memory block */
    EncryptedBlock ← AES(CB);
else
    if RAND() > POISE then
        /* Defining an array with ETH length */
        Char[ETH] ArraySE = NULL;
        /* Select ETH bytes for encryption */
        ArraySE ← SelectBytes(CB, ETH);
        EncryptedBlock ← AES(ArraySE);
    else
        /* Encrypting whole memory block */
        EncryptedBlock ← AES(CB);

Rotational Block Starting Segment (RBSS). To distribute bit flips uniformly over all cells of a block, each line is partitioned into N segments, and the starting segment is shifted periodically based on a counter. The fate of each block is not coupled with that of the others (as it is in HWL), so uniformity is reached quickly. In other words, hot blocks with higher write rates rotate their starting segments more frequently, which results in a more uniform bit-flip distribution. Keeping the number of partitions and the frequency of rotating the starting segment low yields better uniformity. Hence, the block content is written from the starting segment indicated by a counter called the Starting Segment Counter (SSC). The starting segment index increases at a rate called the Starting Segment Changing Rate (SSCR). The length of the SSC counter and the value of SSCR are determined by extensive experiments on real workloads in the next section. To illustrate our wear-leveling scheme, Fig. 7 shows the timeline of consecutive writes arriving at a specific block location. Setting SSC = 2 and SSCR = 1 means that the line is partitioned into 4 segments and the starting segment location increases on each write. For the first write, the block content is written starting from the first segment. Each subsequent write starts from the segment indicated by the counter.
5.3 Selective encryption
Unlike i-NVMM (which has a coverage of 76% at 4KB granularity), a good selective-encryption scheme should encrypt the plaintext with finer granularity and higher
coverage to increase the security level [53], [54]. To this end, the
Compressed Block Selective Encryption (CBSE) algorithm
adopts a byte-level, accurate, and non-deterministic selective-encryption approach that provides high coverage. The
details of the algorithm are as follows.
Compressed-block selective encryption (CBSE). To increase the coverage on one hand and extend the lifetime
on the other, we categorize the blocks into two sets:
fully- or partially-encrypted. This categorization is based
on the block size after compression and a random scheme.
Two thresholds are defined in the CBSE algorithm for this
categorization: the Encryption Threshold (ETH) and
the Percentage Of Ignored blocks in Selective Encryption (POISE). Using ETH, any block whose size is less
than or equal to ETH is fully encrypted. For the remaining
blocks (size > ETH), a random scheme based on POISE
decides whether the block, regardless of its size, is nevertheless fully
encrypted or only selectively encrypted. More precisely, by generating a random
number and comparing it with POISE, some blocks are
fully encrypted regardless of their size and the others are
picked for selective-encryption. In effect, we
use POISE to randomly fully encrypt some incompressible
data blocks and thereby keep the amount of un-encrypted data
low.
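The control flow of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: POISE is expressed as a fraction in [0, 1] instead of a percentage, RAND() is modeled as a uniform draw r in [0, 1) supplied by the caller, and the hypothetical function `cbse_decision` only reports which path a block takes instead of invoking AES.

```python
import random

VALID_ETH = (16, 32, 48, 64)  # a 16B-block AES produces ciphertext in multiples of 16B


def cbse_decision(cb_size: int, eth: int, poise: float, r: float) -> str:
    """Return "full" or "selective" for one compressed block (Algorithm 1 control flow).

    cb_size: compressed block size in bytes
    eth:     encryption threshold (ETH) in bytes
    poise:   POISE as a fraction in [0, 1] (assumption; the text uses percentages)
    r:       uniform random draw in [0, 1), standing in for RAND()
    """
    if eth not in VALID_ETH:
        raise ValueError("ETH must be 16, 32, 48 or 64")
    if cb_size <= eth:
        return "full"        # small after compression: encrypt the whole block
    if r > poise:
        return "selective"   # encrypt only ETH selected bytes
    return "full"            # POISE hit: encrypt the whole block regardless of size


def full_fraction(cb_size: int, eth: int, poise: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of how often a block of cb_size bytes is fully encrypted."""
    rng = random.Random(seed)
    hits = sum(cbse_decision(cb_size, eth, poise, rng.random()) == "full" for _ in range(trials))
    return hits / trials
```

For example, with ETH=32 and POISE=0.30, `full_fraction(50, 32, 0.30, 10000)` comes out close to 0.30, while any block of 32 bytes or fewer is always fully encrypted.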
If a block is not selected for full-encryption (size >
ETH and the POISE draw does not select it), we must select some
bytes of the data block for encryption. The selection policy
depends on ETH and the block size after compression, so all
possible values of ETH must be considered. Since we use a
16B-block AES, the ciphertext size is a multiple
of 16B. Therefore, the possible values of ETH are 16B, 32B, 48B
and 64B. The selection policy for each ETH is explained
separately.
For ETH = 64B (Fig.8 (a)), since every data block is at most 64B after
compression, no selection is
required and CBSE fully encrypts all data blocks. For ETH = 48 (Fig.8 (b)), we select the 48 most-significant bytes of
the block. For ETH
= 32 and ETH = 16 (Fig.8 (c), (d)), blocks are partitioned
into ETH segments, and in each segment byte number SSC%SegmentSize is selected for encryption. Since
SSC changes periodically, the selected bytes are not fixed
over time. Note, however, that we can
always correctly distinguish the encrypted bytes from the unencrypted bytes, since the SSC value is updated before each
write.
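The byte-selection rule behind Fig. 8 can be sketched as below. This is a hypothetical helper, not the authors' `SelectBytes` routine. It assumes that the compressed block occupies a 64-byte slot, that index 0 is the most-significant byte, and that `ssc` is the current SSC value. It returns the positions to encrypt rather than the bytes themselves.

```python
from typing import List

LINE_SIZE = 64  # bytes per memory block (512 bits)


def select_byte_indices(eth: int, ssc: int, line_size: int = LINE_SIZE) -> List[int]:
    """Byte positions CBSE encrypts in a block whose compressed size exceeds ETH.

    Assumptions (hypothetical helper): the compressed block sits in a line_size-byte
    slot, index 0 is the most-significant byte, and ssc is the current SSC value.
    """
    if eth == 64:
        return list(range(line_size))          # Fig. 8(a): the whole block
    if eth == 48:
        return list(range(48))                 # Fig. 8(b): 48 most-significant bytes
    if eth in (16, 32):
        seg_size = line_size // eth            # 2 bytes for ETH=32, 4 bytes for ETH=16
        offset = ssc % seg_size                # byte number SSC % SegmentSize
        return [s * seg_size + offset for s in range(eth)]
    raise ValueError("ETH must be 16, 32, 48 or 64")
```

For instance, `select_byte_indices(32, 1)[:4]` gives `[1, 3, 5, 7]` and `select_byte_indices(16, 2)[:3]` gives `[2, 6, 10]`. Incrementing SSC shifts the chosen byte inside every segment, which is why the encrypted positions move over time.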
5.4 Putting it all together
Write Operation. When a write arrives at the memory controller, it is pushed into the write queue. Then, for each write picked
from the queue, the following operations are performed in order: compression → selective encryption →
rotation. Finally, the data block is delivered to the write
circuit.
Read Operation. When a data block is pushed into the
read queue, its content is processed before it is passed
to the requester: after its content is rotated back, the AES
engine decrypts the data block, making it ready for decompression. The final result is passed to the requesting
unit.
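The rotation stage of the write path and its inverse on the read path can be sketched as follows. This is a minimal Python sketch, not the authors' hardware design. The functions are hypothetical, compression and AES are omitted, and the sketch assumes that SSCR counts the writes between successive advances of the starting segment.

```python
def starting_segment(write_count: int, num_segments: int, sscr: int) -> int:
    """Starting segment for the write numbered write_count (0-based).

    Assumption: the starting segment advances by one segment every `sscr` writes
    and wraps around after `num_segments` segments.
    """
    return (write_count // sscr) % num_segments


def rotate_for_write(block: bytes, start: int, num_segments: int) -> bytes:
    """Place logical byte 0 at the beginning of segment `start` (right rotation)."""
    shift = (start * (len(block) // num_segments)) % len(block)
    return block[-shift:] + block[:-shift] if shift else block


def rotate_for_read(stored: bytes, start: int, num_segments: int) -> bytes:
    """Undo rotate_for_write before decryption and decompression."""
    shift = (start * (len(stored) // num_segments)) % len(stored)
    return stored[shift:] + stored[:shift]
```

Because the read side recomputes the same `start` from the stored counter, the read path can rotate back exactly before handing the block to the AES engine.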
6 EVALUATION RESULTS
6.1 Metrics
Bit flips per write. We define the average bit-flip rate
per write as the average number of bits that are changed
during a write divided by the total number of bits in a
block. Since write latency is 4-5× the read latency, a read-before-write (RBW)
scheme is generally used to mask unchanged bits. Due
to information redundancy, only about 10-15% of all bits are
changed during writes. However, this value increases
when the security issues of NVMs are tackled.
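The bit-flip rate under read-before-write can be illustrated with a short sketch. The function names below are hypothetical, not taken from the simulator. The sketch XORs the old and new contents and counts the differing bits, since only those cells need to be programmed.

```python
def bit_flips(old: bytes, new: bytes) -> int:
    """Number of cells that must be programmed under read-before-write (RBW)."""
    if len(old) != len(new):
        raise ValueError("old and new blocks must have the same length")
    # XOR marks the differing bits; only those cells are written.
    return sum(bin(a ^ b).count("1") for a, b in zip(old, new))


def bit_flip_rate(old: bytes, new: bytes) -> float:
    """Bit-flip rate of one write: flipped bits / total bits in the block."""
    return bit_flips(old, new) / (8 * len(old))
```

For example, a write to a 64-byte block that fully inverts 8 of its bytes flips 64 of 512 bits, which is a rate of 12.5%.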
Bit flips count. Because the bit-flip rate is a proportional metric, it does not reflect the reduction in the absolute number of bit-flips. To resolve this issue, we count the total
number of flips in each memory system and report this
metric as Bit Flips Count to show the effectiveness of
compression in reducing the number of bit-flips.
Intra-Block Variation. It has been found experimentally that
memory space is not used uniformly at either the block-level
or the cell-level granularity. The block-level variation is
addressed by Start-Gap, and the cell-level variation is dealt
with by the RBSS technique. To estimate the effectiveness of the
proposed solution, we use the Intra-Block Variation metric introduced
in i2WAP [55] for NVMs:

$$\mathrm{IntraV} = \frac{1}{W_{aver} \cdot N} \sum_{i=1}^{N} \sqrt{\frac{\sum_{j=1}^{M} \left( w_{ij} - \sum_{j=1}^{M} w_{ij}/M \right)^{2}}{M-1}} \tag{4}$$

where w_ij is the write count of the cache line located at
set i and way j, and W_aver is the average write count. N
and M are the total number of cache sets and the number
of cache ways in one set, respectively. Considering 512-bit
blocks in the main memory, we rewrite this equation for
the memory system as:

$$\mathrm{IntraV} = \frac{1}{BF_{aver} \cdot N} \sum_{i=1}^{N} \sqrt{\frac{\sum_{j=1}^{512} \left( BF_{ij} - \sum_{j=1}^{512} BF_{ij}/512 \right)^{2}}{511}} \tag{5}$$

where BF_ij is the bit-flip count of cell j in block i, BF_aver
is the average bit-flip count, and N is the total number of
blocks.
Coverage. Since our proposal and i-NVMM partially encrypt the memory content, we use a metric previously defined
in [32], called coverage, which quantifies how much of the memory content is
encrypted at the time of a power event (when the simulation is terminated).
To compute it, we assume that all bytes in the memory are encrypted at the first cycle of simulation, and at simulation
termination we count the number of un-encrypted bytes
left by CryptoComp and i-NVMM; the remaining bytes are the covered ones.
6.2 Experimental setup
We compare various configurations of CryptoComp with
three systems: i) Baseline, which has the lowest bit-flip rate
among all systems (it is un-encrypted); ii) DEUCE, which
has the lowest bit-flip rate among secured systems (it
is fully protected); and iii) i-NVMM, which has the highest
coverage among partially-encrypted systems. In CBSE,
two configuration parameters (ETH and POISE) are used;
these two variables have a direct impact on coverage and
bit-flip rate. In all experiments, CryptoComp is evaluated with every
ETH value (16, 32, 48, and 64), 1% of
blocks are selected for full-encryption regardless of their
size, and the remaining 99% of blocks are considered for
selective-encryption (POISE=1%) based on their size. Each
memory line is partitioned into four segments (SSC = 4),
and on each write the block content is rotated by one
segment (SSCR = 1). A sensitivity analysis of all these
parameters (ETH, POISE, SSC and SSCR) is also performed to
find the best systems from different viewpoints. As i-NVMM is configured for high coverage, its memory
is scrubbed every 1 billion cycles to encrypt non-working-set content (bytes aged more than 100 million
cycles). Table 4 summarizes the simulated systems.

TABLE 4
Summary of evaluated systems.
System        Encryption   Degradation   Scheme
Baseline      No           No
DEUCE         Full         Yes           HWL
i-NVMM        Partial      Yes           HWL
CC(ETH=64)    Full         Yes           RBSS
CC(ETH=48)    Partial      Yes           RBSS
CC(ETH=32)    Partial      No            RBSS
CC(ETH=16)    Partial      No            RBSS

6.3 Evaluation results
Fig.9(a) presents the bit-flips per write for different applications. The baseline system has a 12.5% bit-flip rate, while all
secured systems increase the rate due to encryption
(e.g., DEUCE = 29%). CryptoComp, in
the best case (ETH = 16), decreases this metric to 19%:
ETH=16 and POISE=1% mean that CBSE, in nearly all cases,
picks 16 bytes (128 bits) for encryption, which, given the roughly 50% avalanche effect, translates to
about 64 bit-flips (8 bytes).
Fig.9(b) shows the bit-flips count for different systems.
CryptoComp with ETH=16 reduces bit-flips by 341% and
53% compared to DEUCE and i-NVMM, respectively. The
reduction in bit-flips count means lower activity in memory cells and therefore better control of the avalanche effect
when adopting CryptoComp.

Fig. 9. Comparing CryptoComp, baseline, DEUCE, and i-NVMM schemes. (a) Percentage of bit-flips per write: CryptoComp increases it by 8%
compared to the un-protected baseline while improving on DEUCE by 52%; (b) since encryption reduces the number of active cells, the normalized
bit-flips count (normalized to baseline) is used. CryptoComp reduces the bit-flips count by 53% compared to i-NVMM.

Fig. 10. Comparing CryptoComp, baseline, DEUCE, and i-NVMM schemes. Intra-block variation (IntraV, lower is better) measures the uniformity of cell usage.
CryptoComp with ETH=64 (IntraV=0.002) is close to the ideal case (IntraV=0).

Fig.10 investigates the bit-flip uniformity (IntraV) of
different applications in all systems. Since encryption
flips about 50% of the bits (avalanche effect), adopting encryption makes cell usage almost uniform. Nevertheless, it comes at the cost of lifetime because it significantly increases
the bit-flips per write. As mentioned before, all
works in the literature, including i-NVMM, DEUCE and
CryptoComp, confine this effect to a limited number of cells.
However, these approaches decrease uniformity, since some
specific locations face frequent changes. To fix this issue,
DEUCE introduced HWL. In our experiments,
we applied HWL to DEUCE and i-NVMM and used RBSS
in CryptoComp. Fig.10 shows that CryptoComp always
achieves higher uniformity than DEUCE and i-NVMM, regardless of its ETH configuration. The ETH=64
case shows better uniformity than the other configurations of CryptoComp, since it encrypts more
bits. In other words, since over 50% of blocks are smaller
than 32 bytes after compression, and we rotate their
content eagerly, more uniformity is achieved. In short, the
proposed wear-leveling technique pushes the uniformity
toward the ideal case (IntraV = 0), and the best uniformity is
achieved when ETH=64.
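The IntraV metric of Eq. (5) can be computed from a matrix of per-cell bit-flip counts, as in the sketch below. This is a minimal sketch, not the authors' evaluation script. The text only calls BF_aver "the average bit-flip count", so the sketch assumes it is the mean count per cell over all blocks, mirroring W_aver in Eq. (4). The function name `intra_v` is hypothetical.

```python
import math
from typing import List


def intra_v(bf: List[List[int]]) -> float:
    """IntraV of Eq. (5); bf[i][j] is the bit-flip count of cell j in block i.

    Assumption: BF_aver is the mean bit-flip count per cell over all blocks.
    """
    n = len(bf)            # N: number of blocks
    m = len(bf[0])         # cells per block (512 in the paper)
    bf_aver = sum(sum(row) for row in bf) / (n * m)
    acc = 0.0
    for row in bf:
        mean = sum(row) / m
        # Sample standard deviation of the cells inside one block.
        acc += math.sqrt(sum((x - mean) ** 2 for x in row) / (m - 1))
    return acc / (bf_aver * n)
```

A block whose cells all see the same number of flips contributes zero, so perfectly uniform wear gives IntraV = 0, the ideal case mentioned above.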
Lifetime directly reflects both the bit-flips per write
and IntraV, and is therefore the most tangible metric. Fig. 11 shows
the benefit of employing different schemes for improving
the lifetime. Although DEUCE and i-NVMM increase the
lifetime compared to a naive fully-encrypted system, they
degrade the lifetime in comparison to the un-protected
baseline, since both methods increase the bit-flip rate and
decrease uniformity. For CryptoComp, ETH=64
and 48 also degrade the lifetime, since their bit-flip rates are very
close to those of DEUCE and i-NVMM. Additionally, these two configurations
show almost identical results, both lower than the baseline.
ETH=32 and 16 are more durable because they have very
good uniformity and a bit-flip rate only slightly above the
baseline. This observation leads us to set aside
ETH=48 and ETH=64.
i-NVMM and CryptoComp are selective-encryption
schemes and keep some bytes un-encrypted, while DEUCE
and CryptoComp with ETH=64 keep all bytes in the
memory system encrypted. The amount of secured bytes
in i-NVMM and CryptoComp with ETH=16, 32, and 48 is
investigated using the coverage metric in Fig.12. CryptoComp (in all configurations) secures more
bytes than the best configuration of i-NVMM.
In short, from the security standpoint, ETH=48 is the
best option.
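The coverage metric can be stated as a one-line computation. Consistent with Fig. 12, where higher coverage is better, this sketch assumes coverage is the percentage of bytes that are still encrypted, derived from the count of un-encrypted bytes at the power event. The per-byte flag list and the function name are hypothetical simulation artifacts, not part of the described methodology.

```python
from typing import Sequence


def coverage_percent(encrypted: Sequence[bool]) -> float:
    """Coverage at a power event: share of memory bytes still stored encrypted.

    encrypted[k] is a hypothetical per-byte flag recorded when the simulation stops;
    all bytes are assumed encrypted at the first cycle, as in the methodology.
    """
    total = len(encrypted)
    unencrypted = sum(1 for e in encrypted if not e)
    return 100.0 * (total - unencrypted) / total
```

For example, if 2 of 100 bytes are left un-encrypted, the coverage is 98%.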
6.4 Sensitivity analysis
As previously discussed, CryptoComp uses several thresholds during its operation. In this section, we clarify the conditions under which these thresholds provide
higher durability and security. These thresholds are as
follows.
Encryption threshold (ETH): in the CBSE algorithm, if the
block size after compression is less than or equal to ETH,
the block is fully encrypted; otherwise, a
random scheme is followed for selective-encryption.
Percentage of ignored blocks in selective encryption
(POISE): among the blocks considered for selective-encryption based
on their size, some are randomly selected for full
encryption regardless of their size.
Starting segment counter (SSC): a block location is partitioned into segments, and writes start from
different starting segments.
Starting segment changing rate (SSCR): the SSC is advanced
based on SSCR.
Fig. 11. Comparing CryptoComp, baseline, DEUCE, and i-NVMM schemes (lifetime normalized to baseline). Lifetime is used to measure the effectiveness of intra-block wear-leveling and bit-flip reduction. Although DEUCE and i-NVMM improve the lifetime compared to the naive fully-encrypted system, both degrade
the lifetime compared to an un-protected system. Since CryptoComp uses the compression+rotation mechanism, each cell sees fewer bit-flips
(while all cells flip more uniformly
Fig. 12. Comparing CryptoComp, baseline, DEUCE, and i-NVMM schemes (coverage in %, higher is better). As i-NVMM and CryptoComp are selective-encryption schemes, their
coverage reflects the amount of un-encrypted bytes at the time of a power event.
Fig. 13. Effects of POISE and ETH on (a) bit-flips per write: for CryptoComp to be more effective than DEUCE, a POISE value of less than 35, 25
and 15 should be selected for ETH=16, 32 and 48, respectively; (b) IntraV: for CryptoComp to have a more uniform RBSS than HWL, the POISE
value should be less than 35 for ETH=48, while for the other ETH values CryptoComp always behaves better; (c) lifetime: for ETH=48, CryptoComp
has a lower lifetime, but for ETH=16 it shows a better lifetime; with POISE<30, CryptoComp improves the lifetime over the baseline; (d) coverage:
CryptoComp keeps more bytes secured than i-NVMM.
For all experiments, we report the arithmetic mean
of the corresponding metrics over all workloads. Experimentally, we found that ETH and POISE are more important
than SSC and SSCR. Hence, the impact of these
two parameters on the previously mentioned metrics is
explored first. Then, different values of SSC and SSCR are inspected using the best values for ETH and POISE.
Effects of ETH and POISE. Fig.13 shows the measured
metrics for ETH=16, 32 and 48 and different values of
POISE. For easier comparison, each panel also shows the average
value of the best state-of-the-art system.
Fig.13(a) shows that increasing
POISE increases the percentage of bit-flips for all systems.
This is because a larger POISE raises the chance that a block larger
than ETH is fully encrypted rather than selectively encrypted,
which increases the bit-flip rate per block. To
make CryptoComp more efficient than DEUCE, POISE
can be selected as follows: for ETH=16, 32 and 48, the
maximum value of POISE must be 37%, 26% and 17%,
respectively.
Fig.13(b) shows the bit-flip distribution in CryptoComp with ETH=48, 32 and 16. As can be seen, systems
with ETH=48 and 32 are always better than HWL, whereas
ETH = 16 needs a POISE value larger than 25% to
attain a more uniform distribution.
The most important metric in our results is the lifetime, illustrated in Fig.13(c). CryptoComp with ETH=48
is worse than the un-protected baseline (the most durable
system), so a system with ETH=48 should be dismissed.
ETH=16 is more durable than the baseline regardless of the POISE value, whereas for
ETH=32, POISE should be less than 30%.
Increasing POISE increases the coverage of CryptoComp. Fig.13(d) shows the amount of secured bytes as
a function of POISE. For all configurations, CryptoComp
covers more bytes than i-NVMM. For ETH=16, POISE
must be less than 35% but higher than 25%; over this range,
97.5% to 98.2% of memory blocks are ciphered. For ETH=32, POISE must be less than
30% but higher than 27%; over this range, 98.2% to 98.4% of memory blocks are secured.
In summary, CryptoComp with ETH=64 and 48 does
not satisfy our lifetime requirement, although it provides a
higher level of security. For ETH = 16 (32), POISE is
selected in the 25%-35% (27%-30%) range. In the following,
different values of SSC and SSCR are inspected
for ETH=16 and 32 with their best POISE values (35% and 30%, respectively).
Effects of SSC and SSCR. Fig.14 shows the effects of
different values of SSC and SSCR on lifetime
and IntraV. Since these two parameters have no impact
on coverage, the corresponding figure is not shown. For
SSCR, we considered only 1, 2, 4 and 8 writes in the experiment
(as higher values had no impact). Additionally, we chose
3 values for SSC (SSC=16, 8 and 4) and ignored other
possible values since they impose more than 5 bits of storage
overhead per block.
When SSCR is increased from 1 to 8 for SSC=16, 8 and
4, we observed better lifetime and more uniformity for
SSC=8 and SSCR=1 for both ETH=16 and 32. In other
words, partitioning the block into 8 segments (SSC=8) and advancing the starting segment on each write (SSCR=1)
achieves the highest levels of lifetime and uniformity. These selections lead to a
3-bit storage overhead per block.
Best selections. The aforementioned sensitivity analysis
reveals that we can pick two systems that outperform
all state-of-the-art schemes in terms of lifetime and coverage. SSCR and SSC must be set to 1 and 8, respectively. For ETH=16 (32), we must choose POISE=35%
(30%). With ETH=16, our system is more durable, while
with ETH=32, it is more secure. Table 5 summarizes
the lifetime improvement and coverage of CryptoComp
compared to other systems.

TABLE 5
Best systems characteristics.
System   Lifetime improvement (%) vs.     Coverage (%)
         DEUCE    i-NVMM    Base          i-NVMM    CC
MDC      359      366       94            87        98
MSC      218      224       18            87        99
CC: CryptoComp. MDC: most durable CryptoComp (ETH=16, POISE=35, SSCR=1, SSC=8). MSC: most secure CryptoComp (ETH=32, POISE=30, SSCR=1, SSC=8).

Fig. 14. Sensitivity analysis of SSCR and SSC (IntraV and normalized lifetime) for (top) ETH=16
and POISE=35%, and (bottom) ETH=32 and POISE=30%. Selecting
SSC=8 and SSCR=1 leads to more uniformity and longer lifetime.
7 COMPARISON TO OTHER SCHEMES
With two forms of CryptoComp (most-durable CryptoComp and most-secure CryptoComp), we can
compare the performance and total power consumption
of our proposal with DEUCE, i-NVMM and the un-protected
baseline. We assume that all systems (except the un-protected baseline) use counter-mode AES to
reduce the encryption/decryption latency. To obtain
the energy overhead estimates for AES, we use the power
overheads reported in [56] for CTR-AES. Fig.15 shows
the normalized IPC and total power consumption of these
systems. From the performance point of view,
i-NVMM and the un-protected baseline are similar when
counter-mode AES is employed by i-NVMM. This is because
1) just one XOR operation is needed for decryption,
and 2) the scrubbing mechanism of i-NVMM runs in
the background, off the critical path. DEUCE needs
additional operations to recover the original block by
concatenating two parts: one recovered using the current
content of the counter and the other recovered using
the previous content of the counter. Most-durable CryptoComp and most-secure CryptoComp must run the CBSE algorithm during write operations, and one XOR operation and a
shift during read operations. Hence, DEUCE and the two
CryptoComp architectures degrade system performance by 1%, 2.15% and 2.1%, respectively, on average. For
some memory-intensive applications such as libq, gamess and
milc, the performance degradation does not exceed
5%. Among all evaluated systems, i-NVMM consumes the least
power since it keeps the hot blocks un-encrypted (1%), whereas
DEUCE and the two CryptoComp configurations consume
15.3%, 8.4% and 8.5% extra power, respectively. Although
CryptoComp degrades system performance by 2.1%
and increases power consumption by 8.5%, it keeps more
than 99% of memory lines encrypted and extends the
lifetime by 3×. Note that we designed
this technique to work at the main-memory level. However,
it could be adapted to another level, such as the LLC.
For example, CBSE could be simplified by removing
POISE to reduce its latency overhead,
making it usable in the LLC.

Fig. 15. Normalized IPC (a) and normalized total power consumption (b) for different systems (Baseline, DEUCE, i-NVMM, MDC and MSC).

Fig. 16. (a) Percentage of blocks in different entropy ranges (from <1 to [5,6]); less than
15% of blocks have an entropy above 5. (b) Comparison of CBSE and CBSE+ in terms of lifetime, coverage, IPC and power.
8 PROTECTING SENSITIVE DATA
To make CryptoComp more secure and protect sensitive data, we present the CBSE+ algorithm, a scheme that
keeps high-entropy data blocks fully encrypted. To do
so, we first analyzed the percentage of high-entropy data
blocks selected for partial-encryption in CryptoComp with
ETH=32. Since the maximum observable entropy in a 64B block is 6 (according to Eq.1), we categorized the blocks
by entropy with a resolution of 1 in Fig.16(a). As
can be deduced from this figure, less than 15% of data
blocks (out of the 45% of all blocks selected for partial-encryption) have an entropy of 5 or higher; these are
the blocks that did not benefit from compression and
entered the selective-encryption process. Therefore, if
we cover this small fraction of data blocks with full-encryption, we can ensure that high-entropy data blocks
are protected. However, calculating the entropy of each data block during
the write operation is time- and power-consuming. To avoid it, we can use the block size
after compression to select high-entropy data blocks for
full-encryption. Note that, because compression algorithms are imperfect, not all large compressed
blocks have high entropy, but our observations show that
all high-entropy compressed data blocks are large.
Therefore, we can use the block size after compression to
predict sensitive data blocks.
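Eq. 1 is not reproduced in this section, so the sketch below assumes it is the Shannon entropy of the byte-value distribution within a block. That assumption is consistent with the stated maximum of 6 for a 64B block, since 64 distinct byte values give log2(64) = 6. The hypothetical `byte_entropy` function also shows the per-write cost that CBSE+ avoids: building a 256-bin histogram and evaluating logarithms for every block.

```python
import math
from collections import Counter


def byte_entropy(block: bytes) -> float:
    """Shannon entropy (bits) of the byte-value distribution inside one block."""
    n = len(block)
    counts = Counter(block)
    # Each distinct value with frequency c/n contributes (c/n) * log2(n/c).
    return sum((c / n) * math.log2(n / c) for c in counts.values())
```

An all-zero 64B block has entropy 0, a block alternating two values has entropy 1, and a block of 64 distinct bytes reaches the maximum of 6.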
Hence, we categorized the compressed data blocks into
3 categories: i) insensitive, for block size ≤ ETH; ii)
sensitive, for block size ≥ SDTH (sensitive data threshold);
and iii) ordinary, for all other sizes. We refine CBSE as follows: insensitive
data blocks (category I) are fully encrypted due to their
small size after compression, sensitive data blocks (category II) are fully encrypted to increase the resiliency of
CryptoComp, and ordinary data blocks (category III)
enter the random encryption process. We found
that setting SDTH=60 forces CryptoComp to pick all
data blocks with entropy>5. Fig.17 shows the categorization
strategy used in CBSE+. Fig.16(b) compares the behavior of
CBSE+ with that of CBSE. As can be seen, CBSE+,
with 1% performance and 2% power consumption
overheads, achieves 0.1% better coverage while keeping the
sensitive data blocks protected.

Fig. 17. Categorizing data blocks with different sizes after compression: size 1-32, insensitive data, lifetime-friendly, full-encryption; size 33-59, ordinary data, lifetime-friendly, partial-encryption; size 60-64, sensitive data, security-friendly, full-encryption.
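The CBSE+ categorization can be sketched as a size lookup followed by the same POISE test as Algorithm 1. This is a minimal sketch, not the authors' code. The boundaries follow Fig. 17 (ETH=32, SDTH=60, sizes 1-32, 33-59 and 60-64), so an inclusive "≥ SDTH" test is assumed. POISE is a fraction and r is a uniform draw in [0, 1), as in the earlier Algorithm 1 sketch. Both function names are hypothetical.

```python
def cbse_plus_category(cb_size: int, eth: int = 32, sdth: int = 60) -> str:
    """Map a compressed block size (bytes) to its CBSE+ category, following Fig. 17."""
    if not 1 <= cb_size <= 64:
        raise ValueError("compressed size must be between 1 and 64 bytes")
    if cb_size <= eth:
        return "insensitive"   # category I: fully encrypted, cheap after compression
    if cb_size >= sdth:
        return "sensitive"     # category II: fully encrypted, likely high entropy
    return "ordinary"          # category III: POISE-based random selective encryption


def cbse_plus_encrypts_fully(cb_size: int, r: float, poise: float,
                             eth: int = 32, sdth: int = 60) -> bool:
    """True if CBSE+ fully encrypts the block; r is a uniform draw in [0, 1)."""
    if cbse_plus_category(cb_size, eth, sdth) != "ordinary":
        return True
    return r <= poise          # same POISE test as Algorithm 1
```

The only change from CBSE is the extra upper threshold. A block of 60 bytes or more is never left to the random draw.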
Fig.18 shows the average coverage as a function of
compressed block size for CBSE and CBSE+ when ETH=32
and POISE=35%. As shown in the figure, both schemes
behave similarly for insensitive and ordinary data blocks.
The sensitive data blocks (those with high entropy, which are only
randomly encrypted in CBSE) are fully covered
by CBSE+, while in CBSE the coverage rate for such data
blocks drops below 90%.
9 OVERHEAD OF META-DATA
In the two CryptoComp structures with SSCR=1 and
SSC=8, 4 bits of meta-data storage are needed. In addition, FPC
requires 48 bits of meta-data. Hence, CryptoComp imposes
52 bits of meta-data per block, i.e., about 10% storage overhead. We store the meta-data in SLC mode in the main
memory and cache it on the memory-controller side. This
approach reduces the update rate of the meta-data cells and
eliminates the extra reads for fetching the meta-data. Our
evaluations show that the meta-data storage cells last much
longer than the actual data cells in the main memory.
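The roughly 10% figure follows from simple arithmetic on a 512-bit block. The decomposition of the 48 FPC bits into a 3-bit prefix for each of the sixteen 32-bit words of a 64B block is our reading of FPC, not a statement from the text above.

```latex
\text{overhead} = \frac{4 + 48}{512} = \frac{52}{512} \approx 10.2\%,
\qquad 48 = 16\ \text{words} \times 3\ \text{prefix bits (assumed FPC layout)}
```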
Fig. 18. Average coverage of CBSE and CBSE+ vs. compressed data block size (insensitive, ordinary, and sensitive data regions).
Additionally, one might consider MLC meta-data storage cells if a proper encoding is employed [36], [17].
10 CONCLUSION
In many respects, non-volatile memories (NVMs) are considered promising alternatives to traditional DRAM- and SRAM-based memory systems. However, due to the
short lifetime and security vulnerability of NVMs, replacing DRAM and SRAM with NVM raises several problems.
We proposed CryptoComp to improve the security and
lifetime of a phase change main memory by adopting
a selective-encryption scheme that works on compressed
memory blocks. Our method selects sufficient bytes from
proper locations in the block to increase coverage, using the
space freed by compression. Additionally, we adopted a
block-dependent rotation mechanism that makes cell
usage more uniform. Based on extensive simulations, two
configurations of CryptoComp were proposed for the
best lifetime and the best security. The most durable configuration enhanced the
memory system lifetime by 3.59× and 3.66× compared
to DEUCE and i-NVMM, respectively, while the two configurations provided 98% and 99% coverage ratios.
Recently, several technologies have been proposed to replace conventional memories at different levels of the memory
hierarchy, from CPU registers to external storage. Hence,
similar schemes for securing NV register files, NV caches
(L1, L2, …), and solid-state drives (SSDs) can be considered
as future work.
ACKNOWLEDGMENTS
The authors would like to thank Armin Ahmadzadeh,
Mohammad Sadrosadati, and Mahmood Naderan for their
assistance with setting up and maintaining the cluster
used to conduct the simulation experiments. This work
is supported in part by a grant from IPM.
REFERENCES
[1] B. C. Lee, E. Ipek, O. Mutlu, and D. Burger, "Architecting phase change memory as a scalable dram alternative," in Proceedings of the 36th Annual International Symposium on Computer Architecture, ISCA, 2009.
[2] K. Suzuki and S. Swanson, "A survey of trends in non-volatile memory technologies: 2000-2014," in 2015 IEEE International Memory Workshop (IMW), May 2015.
[3] B. C. Lee, E. Ipek, O. Mutlu, and D. Burger, "Phase change memory architecture and the quest for scalability," Commun. ACM.
[4] Q. Guo, X. Guo, Y. Bai, R. Patel, E. Ipek, and E. Friedman, "Resistive ternary content addressable memory systems for data-intensive computing," Micro, IEEE.
[5] J. Wang, X. Dong, and Y. Xie, "Enabling high-performance lpddrx-compatible mram," ISLPED '14, pp. 339-344, 2014.
[6] J. Wang, X. Dong, and Y. Xie, "Building and optimizing mram-based commodity memories," ACM Trans. Archit. Code Optim.
[7] S. Cho and H. Lee, "Flip-n-write: A simple deterministic technique to improve pram write performance, energy and endurance," MICRO, 2009.
[8] Q. Guo, X. Guo, R. Patel, E. Ipek, and E. G. Friedman, "Ac-dimm: Associative computing with stt-mram," in Proceedings of ISCA '13, 2013.
[9] P. Nair, C. Chou, B. Rajendran, and M. Qureshi, "Reducing read latency of phase change memory via early read and turbo read," in High Performance Computer Architecture (HPCA), 2015 IEEE 21st International Symposium on, 2015.
[10] M. K. Qureshi, M. M. Franceschini, A. Jagmohan, and L. A. Lastras, "Preset: Improving performance of phase change memories by exploiting asymmetry in write times," ISCA '12, 2012.
[11] M. Jalili, M. Arjomand, and H. Sarbazi-Azad, "A reliable 3d mlc pcm architecture with resistance drift predictor," DSN, 2014.
[12] M. Jalili and H. Sarbazi-Azad, "Captopril: Reducing the pressure of bit flips on hot locations in non-volatile main memories," in DATE, 2016.
[13] S. Schechter, G. H. Loh, K. Strauss, and D. Burger, "Use ecp, not ecc, for hard failures in resistive memories," in Proceedings of ISCA, 2010.
[14] R. Azevedo, J. D. Davis, K. Strauss, P. Gopalan, M. Manasse, and S. Yekhanin, "Zombie memory: Extending memory lifetime by reviving dead blocks," ISCA, 2013.
[15] J. Fan, S. Jiang, J. Shu, Y. Zhang, and W. Zhen, "Aegis: Partitioning data block for efficient recovery of stuck-at-faults in phase change memory," in Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO, 2013.
[16] M. Asadinia, M. Arjomand, and H. S. Azad, "Prolonging lifetime of pcm-based main memories through on-demand page pairing," ACM Trans. Des. Autom. Electron. Syst., 2015.
[17] A. N. Jacobvitz, R. Calderbank, and D. J. Sorin, "Coset coding to extend the lifetime of memory," in Proceedings of the 2013 IEEE 19th International Symposium on High Performance Computer Architecture, HPCA, 2013.
[18] M. Qureshi, A. Seznec, L. Lastras, and M. Franceschini, "Practical and secure pcm systems by online detection of malicious write streams," in High Performance Computer Architecture (HPCA), 2011 IEEE 17th International Symposium on, pp. 478-489, Feb 2011.
[19] B. D. Yang, J. E. Lee, J. S. Kim, J. Cho, S. Y. Lee, and B. G. Yu, "A low power phase-change random access memory using a data-comparison write scheme," in 2007 IEEE International Symposium on Circuits and Systems, pp. 3014-3017, 2007.
[20] V. Young, P. J. Nair, and M. K. Qureshi, "Deuce: Write-efficient encryption for non-volatile memories," in Proceedings of the 20th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS, 2015.
[21] S. Chhabra, B. Rogers, Y. Solihin, and M. Prvulovic, "Secureme: A hardware-software approach to full system security," in Proceedings of the International Conference on Supercomputing, ICS, 2011.
[22] P. Peterson, "Cryptkeeper: Improving security with encrypted ram," in 2010 IEEE International Conference on Technologies for Homeland Security (HST), 2010.
[23] M. Henson and S. Taylor, "Beyond full disk encryption: Protection on security-enhanced commodity processors," in Proceedings of the 11th International Conference on Applied Cryptography and Network Security, ACNS, 2013.
[24] G. Duc and R. Keryell, "Cryptopage: An efficient secure architecture with memory encryption, integrity and information leakage protection," in Computer Security Applications Conference, 2006. ACSAC '06. 22nd Annual, 2006.
[25] X. Zhuang, T. Zhang, and S. Pande, "Hide: An infrastructure for efficiently protecting information leakage on the address bus," in Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XI, 2004.
[26] N. Rathi, S. Ghosh, A. Iyengar, and H. Naeimi, "Data privacy in non-volatile cache: Challenges, attack models and solutions," in ASP-DAC, pp. 348–353, Jan 2016.
[27] H. Zhang, C. Zhang, X. Zhang, G. Sun, and J. Shu, "Pin tumbler lock: A shift based encryption mechanism for racetrack memory," in ASP-DAC, pp. 354–359, 2016.
[28] S. Kannan, N. Karimi, O. Sinanoglu, and R. Karri, "Security vulnerabilities of emerging nonvolatile main memories and countermeasures," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 34, no. 1, 2015.
[29] A. Awad, P. Manadhata, S. Haber, Y. Solihin, and W. Horne, "Silent shredder: Zero-cost shredding for secure non-volatile main memory controllers," ASPLOS '16, 2016.
[30] K. Shamsi and Y. Jin, "Security of emerging non-volatile memories: Attacks and defenses," in 2016 IEEE 34th VLSI Test Symposium (VTS), pp. 1–4, 2016.
[31] S. Swami, J. Rakshit, and K. Mohanram, "SECRET: Smartly encrypted energy efficient non-volatile memories," DAC '16, 2016.
[32] S. Chhabra and Y. Solihin, "i-NVMM: A secure non-volatile main memory system with incremental encryption," in Proceedings of ISCA, 2011.
[33] S. Kannan, N. Karimi, O. Sinanoglu, and R. Karri, "Security vulnerabilities of emerging nonvolatile main memories and countermeasures," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2015.
[34] M. Henson and S. Taylor, "Memory encryption: A survey of existing techniques," ACM Comput. Surv., 2014.
[35] C. Shannon, "Communication theory of secrecy systems," The Bell System Technical Journal, 1949.
[36] R. Maddah, S. Seyedzadeh, and R. Melhem, "CAFO: Cost aware flip optimization for asymmetric memories," in Proceedings of the 2015 IEEE 21st International Symposium on High Performance Computer Architecture, HPCA, 2015.
[37] T. Wang, D. Liu, Y. Wang, and Z. Shao, "Towards write-activity-aware page table management for non-volatile main memories," ACM Trans. Embed. Comput. Syst., 2015.
[38] G. E. Suh, D. Clarke, B. Gassend, M. v. Dijk, and S. Devadas, "Efficient memory integrity verification and encryption for secure processors," in Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO, 2003.
[39] C. Yan, D. Englender, M. Prvulovic, B. Rogers, and Y. Solihin, "Improving cost, performance, and security of memory encryption and authentication," in Proceedings of the 33rd Annual International Symposium on Computer Architecture, ISCA, 2006.
[40] M. K. Qureshi, J. Karidis, M. Franceschini, V. Srinivasan, L. Lastras, and B. Abali, "Enhancing lifetime and security of PCM-based main memory with start-gap wear leveling," in Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO, 2009.
[41] H. G. Lee, S. Baek, J. Kim, and C. Nicopoulos, "A compression-based hybrid MLC/SLC management technique for phase-change memory systems," in Proceedings of the 2012 IEEE Computer Society Annual Symposium on VLSI, ISVLSI, 2012.
[42] M. Jalili and H. Sarbazi-Azad, "A compression-based morphable PCM architecture for improving resistance drift tolerance," ASAP, 2014.
[43] G. Pekhimenko, T. Huberty, R. Cai, O. Mutlu, P. Gibbons, M. Kozuch, and T. Mowry, "Exploiting compressed block size as an indicator of future reuse," in High Performance Computer Architecture (HPCA), 2015 IEEE 21st International Symposium on, 2015.
[44] S. Baek, H. G. Lee, C. Nicopoulos, J. Lee, and J. Kim, "Size-aware cache management for compressed cache architectures," Computers, IEEE Transactions on.
[45] S. Baek, H. G. Lee, C. Nicopoulos, J. Lee, and J. Kim, "ECM: Effective capacity maximizer for high-performance compressed caching," in High Performance Computer Architecture (HPCA2013), 2013 IEEE 19th International Symposium on, pp. 131–142, 2013.
[46] N. Vijaykumar, G. Pekhimenko, A. Jog, A. Bhowmick, R. Ausavarungnirun, C. Das, M. Kandemir, T. C. Mowry, and O. Mutlu, "A case for core-assisted bottleneck acceleration in GPUs: Enabling flexible data compression with assist warps," ISCA '15, 2015.
[47] G. Pekhimenko, T. C. Mowry, and O. Mutlu, "Linearly compressed pages: A main memory compression framework with low complexity and low latency," PACT '12, 2012.
[48] S. Baek, H. G. Lee, C. Nicopoulos, and J. Kim, "Designing hybrid DRAM/PCM main memory systems utilizing dual-phase compression," ACM Trans. Des. Autom. Electron. Syst., 2014.
[49] N. Binkert et al., "The gem5 simulator," SIGARCH Comput. Archit. News, 2011.
[50] X. Dong, C. Xu, Y. Xie, and N. Jouppi, "NVSim: A circuit-level performance, energy, and area model for emerging nonvolatile memory," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2012.
[51] J. L. Henning, "SPEC CPU2006 benchmark descriptions," SIGARCH Comput. Archit. News, 2006.
[52] A. Alameldeen and D. Wood, "Frequent pattern compression: A significance-based compression scheme for L2 caches," tech. rep., The University of Wisconsin-Madison, 2004.
[53] T. Lookabaugh and D. Sicker, "Selective encryption for consumer applications," IEEE Communications Magazine, 2004.
[54] A. Massoudi, F. Lefebvre, C. De Vleeschouwer, B. Macq, and J.-J. Quisquater, "Overview on selective encryption of image and video: Challenges and perspectives," EURASIP J. Inf. Secur., vol. 2008, 2008.
[55] J. Wang, X. Dong, Y. Xie, and N. Jouppi, "i2WAP: Improving non-volatile cache lifetime by reducing inter- and intra-set write variations," in IEEE 19th International Symposium on High Performance Computer Architecture, HPCA, pp. 234–245, Feb 2013.
[56] C. Panait and D. Dragomir, "Measuring the performance and energy consumption of AES in wireless sensor networks," in Computer Science and Information Systems (FedCSIS), 2015 Federated Conference on, 2015.
Majid Jalili received the B.Sc. degree from Shahid Bahonar University of Kerman, Kerman, Iran, in 2010, and the M.Sc. degree from Sharif University of Technology, Tehran, Iran, in 2013. He is a member of the High-Performance Computing Architectures and Networks (HPCAN) Laboratory, Sharif University of Technology. His current research interests include memory systems, multicore and parallel computing, and heterogeneous architectures.
Hamid Sarbazi-Azad received the B.Sc. degree in electrical and computer engineering from Shahid Beheshti University, Tehran, Iran, in 1992, the M.Sc. degree in computer engineering from Sharif University of Technology, Tehran, in 1994, and the Ph.D. degree in computing science from the University of Glasgow, Glasgow, U.K., in 2002. He is currently a professor at the Department of Computer Engineering, Sharif University of Technology, and heads the School of Computer Science in the Institute for Research in Fundamental Sciences (IPM), Tehran, Iran. His research interests include high-performance computer architectures, networks-on-chip, systems-on-chip, memory/storage systems, and social networks, on which he has published over 300 refereed conference and journal papers. Prof. Sarbazi-Azad was a recipient of the Khwarizmi International Award in 2006, the TWAS Young Scientist Award in engineering sciences in 2007, and the Sharif University Distinguished Researcher Awards in 2004, 2007, 2008, 2010, and 2013. He has served as the Editor-in-Chief of the CSI Journal on Computer Science and Engineering, an Associate Editor of IEEE TC and ACM Computing Surveys, and an Editorial Board Member of Elsevier's Computers and Electrical Engineering journal, the International Journal of Computers and Their Applications, the Journal of Parallel and Distributed Computing and Networks, and the International Journal of High-Performance Systems Architecture.

