fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2016.2642180, IEEE
Transactions on Computers
IEEE TRANSACTIONS ON COMPUTERS
INTRODUCTION
DRAM has been the popular choice of architects for designing computer systems over the past
decades. In recent years, to overcome the problems of
traditional memory systems, such as poor scalability and high
static power consumption, other memory technologies,
including emerging non-volatile memories (NVMs), have
attracted much attention. Phase change memory (PCM),
flash memory, spin-transfer torque RAM (STT-RAM) and ferroelectric RAM (FeRAM) have been employed in various
levels of the memory hierarchy [1]-[6]. These memories have
both advantages and disadvantages. Low static power
consumption, non-volatility and good scalability are the
most prominent features of NVMs, while long access latency, complicated peripheral circuitry, short lifetime and
security vulnerabilities are considered their shortcomings [2], [7]-[12].
Among the problems associated with these
memories, short lifetime is a main concern that needs to
be addressed properly [13]-[18]. In this regard, NVM designers proposed the read-before-write (RBW) technique
(also known as Data-Comparison Write) [19], which masks the
unchanged bits during a write operation in order to reduce
bit flips per write [7]. RBW is a simple and effective
0018-9340 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
TABLE 1
Lifetime of different memory technology prototypes fabricated during 2000-2014 [2].
in the source text. Such a desirable feature of an encryption algorithm, called avalanche effect, increases the bit-flip
rate of memory blocks and thus reduces NVM lifetime.
We show that conventional compression algorithms can
control the avalanche effect and provide a lifetime-security
trade-off.
Proposing the Compressed Block Selective Encryption
(CBSE) scheme. We show that for a large fraction of data
blocks, we can follow a full-encryption approach, while
for the remaining blocks we can selectively encrypt a
proper amount of data to ensure an acceptable security
level of the main memory. A more aggressive scheme
(named CBSE+) can also fully encrypt the high-entropy
data blocks, which are not compressed efficiently
and thus enter the selective encryption process in the
proposed algorithm.
Employing the Rotational Block Starting Segment
(RBSS) scheme. Due to the variation in bit-flip locations,
we propose a technique to remove the stress from certain
cells. This scheme partitions the block location into
segments and tries to distribute the writes uniformly over them.
Based on the above techniques, we propose CryptoComp,
an architecture that addresses the security and lifetime
concerns in a non-volatile main memory. Our evaluation
results, taken from a full-system simulator, reveal that
CryptoComp outperforms all previous state-of-the-art architectures in terms of lifetime. More precisely, CryptoComp improves the lifetime by 3.59× and
3.66× over DEUCE [20] and i-NVMM [32], respectively.
Additionally, CryptoComp covers 99% of memory blocks
(i.e., 99% of memory blocks are encrypted when a power
failure occurs), which is 11% higher than the most secure
configuration of i-NVMM with 88% coverage.
The rest of the paper is organized as follows. In Section
2 we discuss the lifetime problem of NVMs and the effects
of encryption. Section 3 describes previous works and highlights
their pros and cons. Section 4 presents the evaluation methodology. Section 5 explains our proposal.
Section 6 reports the evaluation results and a comparison
to the state-of-the-art architectures. Finally, we conclude
the paper in Section 10.
PROBLEM DEFINITION
                                                     Endurance (cycles)
                                                     Max      Min      Average
DRAM                                                 10^16    10^16    10^16
SRAM                                                 10^16    10^16    10^16
Flash (NAND): 18                                     10^8     10^3     6.16 × 10^6
Flash (NOR): 35                                      10^7     10^3     5.32 × 10^5
Phase Change Memory (PCM): 42                        10^11    10^3     5.48 × 10^9
Spin-Transfer Torque Magnetic RAM (STT-MRAM): 9      10^16    10^5     1.25 × 10^15
Resistive Random-Access Memory (ReRAM): 65           10^12    10^3     1.92 × 10^10
[Fig. 1 plots. Left: Baseline versus Full-Encryption against Bit Flips Per Write (%), with a 3.31X gap annotated. Right: Write Per Block (billion) against Bit Flips Per Write (%) for 128 bits, 256 bits, 384 bits, and 512 bits.]
Fig. 1. Bit-flip rate vs. lifetime. When an encryption algorithm is employed, bit flips per write increase from 12.5% to 50% (left),
resulting in a 3.31× lifetime reduction. Effect of restricting the avalanche
effect on the number of tolerable writes (right). Limiting the damage
caused by the avalanche effect can effectively improve the lifetime
under different bit-flip rates.
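The relationship between RBW and the avalanche effect can be illustrated with a small sketch. It counts the bits that differ between the old and new content of a 64 B line, which is what RBW would actually program, before and after a scrambling step. The names `bit_flips` and `toy_cipher` are hypothetical, and SHA-512 is used only as a stand-in that exhibits avalanche behaviour; it is not AES and not the engine used in the paper.

```python
import hashlib

BLOCK_BYTES = 64  # 64 B memory line, as in the evaluated system


def bit_flips(old: bytes, new: bytes) -> int:
    """Count the bits that differ, i.e. the cells RBW would have to program."""
    assert len(old) == len(new)
    return sum(bin(a ^ b).count("1") for a, b in zip(old, new))


def toy_cipher(block: bytes) -> bytes:
    """Stand-in scrambler (NOT AES): SHA-512 maps 64 B in to 64 B out and
    shows the same avalanche behaviour as a block cipher."""
    return hashlib.sha512(block).digest()


old = bytes(BLOCK_BYTES)                     # all-zero line
new = bytes([1]) + bytes(BLOCK_BYTES - 1)    # exactly one bit changed

plain_rate = bit_flips(old, new) / (8 * BLOCK_BYTES)
cipher_rate = bit_flips(toy_cipher(old), toy_cipher(new)) / (8 * BLOCK_BYTES)
print(f"plain: {plain_rate:.2%}, after scrambling: {cipher_rate:.2%}")
```

With the plain data only one cell changes, whereas after scrambling roughly half of the 512 cells differ, which is the 50% bit-flip rate the caption refers to.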
[Fig. 2 diagram: a memory rank with eight banks (Bank0 to Bank7), each with its own Start and Gap pointers; blocks B0, B1, and B2 with rotation amounts such as Rot 0, Rot i % 512, and Rot (i+1) % 512; panels (a), (b), and (c).]
Fig. 2. Illustration of HWL mechanism: (a) Initially, all blocks are written
to the memory system with no rotation; (b) The state of blocks and
their corresponding RotationAmount after some Start-Gap rounds.
(c) The agility of HWL can be increased using a per-bank scheme.
TABLE 2
Baseline configurations.
Processor
L1 Cache
L1 Coherency
L2 Cache
DRAM Cache
Main Memory
Flash SSD
TABLE 3
Workloads characteristics under baseline configuration.
Comp. Ratio
EVALUATION METHODOLOGY
We perform micro-architectural level simulation of an out-of-order processor model with the ALPHA ISA using the gem5
simulator [49]. NVSim [50] is used for detailed area, power
and timing models of the memory hierarchy. A 4-core
CMP system with 3 levels of caches and a PCM main
memory is considered. L1 is configured as a private
cache, while L2 and L3 are shared among the 4 cores. To
accommodate the long write latency of the PCM memory,
a large DRAM cache (16 MB) and an SSD of unlimited size
with 25 µs response time as the lowest storage device are
employed. The line size of all caches and main memory is 64 B. Table 2 summarizes the evaluated
system. A set of multi-program applications from the SPEC CPU 2006 suite [51] is selected and characterized in
Table 3. All workloads were compiled with the -O3 flag on
an Ubuntu 64-bit Linux system using GCC and Fortran
compilers. Since CryptoComp's performance depends on
memory access rate and compression ratio, we selected
workloads with different combinations of memory access
intensity and compression ratio. Main Memory
Write Traffic (WT) and Compression Ratio (CR) of the
accessed data are used to categorize the workloads. Applications with WT > 1 GB/s are called Memory-Intensive
(MI); Not-Memory-Intensive (NMI) applications are those
with WT < 1 GB/s. Workloads with CR > 4 are categorized
as Highly-Compressible (HC), while applications with
2 < CR < 4 are known as Moderately-Compressible (CM);
[Fig. 3 charts: per-benchmark values and average for four systems: Baseline, Full-Encryption, Comp.-Encryp.(active), and Comp.-Encryp.(all).]
Fig. 3. Bit-flip rate and average information entropy in 4 systems: i) baseline, ii) baseline+encryption, iii) compression+encryption (only active
cells are considered), and iv) compression+encryption (all cells are considered). Increasing the information entropy increases the bit-flip rate,
except in systems iii and iv, which limit the avalanche effect.
[Fig. 4 diagram: memory controller with LLC Port, CBSE unit, AES engines, Decomp. unit, and Rotator.]
Fig. 4. Memory controller in the proposed scheme. Arrows 1-6 indicate the write path and arrows a-e indicate the read path in the memory
controller.
finally, applications with CR < 2 are considered Poorly-Compressible (PC). Hence, we have 6 workload classes,
namely MI-HC, MI-CM, MI-PC, NMI-HC, NMI-CM and
NMI-PC. Simulations are conducted for 8 billion instructions, and the results of the first 4 billion instructions are
ignored as warm-up.
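The workload classification above can be written down as a short sketch. The function name `classify_workload` is hypothetical, and the text does not say how values exactly on a boundary (WT = 1 GB/s, CR = 2, CR = 4) are treated, so the choices below for those cases are assumptions.

```python
def classify_workload(wt_gb_per_s: float, cr: float) -> str:
    """Map write traffic (GB/s) and compression ratio to one of the six classes."""
    # Assumption: WT exactly equal to 1 GB/s is treated as not memory-intensive.
    intensity = "MI" if wt_gb_per_s > 1.0 else "NMI"

    # Assumption: CR exactly 4 falls into CM and CR exactly 2 falls into PC.
    if cr > 4.0:
        compressibility = "HC"   # Highly-Compressible
    elif cr > 2.0:
        compressibility = "CM"   # Moderately-Compressible
    else:
        compressibility = "PC"   # Poorly-Compressible

    return f"{intensity}-{compressibility}"


print(classify_workload(1.8, 5.2))   # MI-HC
print(classify_workload(0.3, 1.4))   # NMI-PC
```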
THE PROPOSED SCHEME
In this section, we take three steps to present the proposed scheme: 1) we investigate the reason behind the
inefficiency of the RBW technique for an encrypted PCM main
memory and show how RBW can be revived
to work well when encryption is employed; 2) a detailed
plan to progress toward our goal is then introduced;
and finally 3) the modified architecture is shown. Toward
these goals, we modify the memory controller to
employ a selective encryption scheme, as shown in Fig.
4. More clearly, we devise a compression-based selective
encryption scheme along with a data rotation unit to achieve
both security and uniformity in cell usage. So, we
compress the data block first and then apply our selective
encryption. Then, using a shift register, we rotate the
content of the memory line.
5.1 Reviving RBW
is given by:
$$E = \left| \sum_{i=1}^{Max} p_i \log \frac{1}{p_i} \right| , \qquad (3)$$
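As an illustration of the entropy measure in Eq. (3), the following sketch computes it for one memory block. It assumes that the symbols are bytes, that p_i is the relative frequency of byte value i within the block, and that the logarithm is base 2; these assumptions are consistent with the later statement that the maximum entropy of a 64 B block is 6, but they are not spelled out in the surviving text. The name `block_entropy` is hypothetical.

```python
import math
from collections import Counter


def block_entropy(block: bytes) -> float:
    """Byte-level entropy of a block: |sum_i p_i * log2(1 / p_i)|."""
    n = len(block)
    counts = Counter(block)  # occurrences of each distinct byte value
    return abs(sum((c / n) * math.log2(n / c) for c in counts.values()))


print(block_entropy(bytes(64)))          # all bytes equal
print(block_entropy(bytes(range(64))))   # 64 distinct bytes
```

A block of identical bytes has zero entropy, while a 64 B block whose bytes are all distinct reaches log2(64) = 6.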
[Fig. 5 chart: % of Blocks per benchmark in four compressed-size bins: 0B-16B, 17B-32B, 33B-48B, and 49B-64B.]
Fig. 5. Block size distribution after compression with 16B resolution (minimum length for an AES block).
[Fig. 8 diagram: panels for ETH=64, ETH=48, ETH=32, and ETH=16 showing 16B AES units, encrypted and un-encrypted bytes, and byte selection by SSC%2 and SSC%4.]
Fig. 8. Selection procedure in CBSE algorithm: (a) for ETH=64, each block is encrypted regardless of its compressed size; (b) for ETH=48 and
compressed-block size = 64, 48 most-significant bytes are selected; (c),(d) for ETH=32 and 16, the block is divided into 32 and 16 segments and
from each segment byte#=SSC%2 and SSC%4 is selected, respectively.
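The selection procedure described in the Fig. 8 caption can be sketched as follows for a block that still occupies 64 bytes after compression. The function name `select_bytes` is hypothetical, and treating byte index 0 as the most significant byte is an assumption made only for this illustration.

```python
BLOCK_BYTES = 64


def select_bytes(eth: int, ssc: int) -> list[int]:
    """Return the byte indices chosen for encryption in a 64 B block."""
    if eth == 64:
        # (a) every byte is encrypted
        return list(range(BLOCK_BYTES))
    if eth == 48:
        # (b) the 48 most-significant bytes (assumption: index 0 is the MSB side)
        return list(range(48))
    if eth in (32, 16):
        # (c), (d) split the block into `eth` segments of 2 or 4 bytes and
        # pick byte number SSC % segment_length from each segment
        seg_len = BLOCK_BYTES // eth
        offset = ssc % seg_len
        return [seg * seg_len + offset for seg in range(eth)]
    raise ValueError("ETH must be one of 16, 32, 48, 64")


print(select_bytes(32, ssc=5)[:4])   # [1, 3, 5, 7]
print(select_bytes(16, ssc=6)[:4])   # [2, 6, 10, 14]
```

Because the offset depends on SSC, the selected positions move as the starting segment counter changes, so the encrypted bytes are not always the same cells.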
5.2 Selective encryption
Unlike i-NVMM (with a coverage of 76% for 4KB granularity), a good selective-encryption scheme should encrypt the plain-text with finer granularity and higher
coverage to increase the security level [53], [54]. To do so, the
Compressed Block Selective Encryption (CBSE) algorithm
adopts a byte-level, accurate and non-deterministic selective encryption approach that provides high coverage. The
details of the algorithm are as follows.
Compressed-block selective encryption (CBSE). To increase the coverage on one hand and extend the lifetime
on the other hand, we categorize the blocks into two sets:
fully- or partially-encrypted. This categorization is done
based on the block size after compression and a random scheme.
Two thresholds are defined in the CBSE algorithm for such
a categorization. One is the Encryption Threshold (ETH) and
the other is the Percentage Of Ignored blocks in Selective Encryption (POISE). Using ETH, any block whose size is less
than or equal to ETH is fully encrypted. For the remaining
blocks (size > ETH), a random scheme is applied based on
POISE to decide whether the block, regardless of its size, is
fully encrypted or not. More clearly, by generating a random
Write Operation. When a write arrives at the memory controller, it is pushed into the write queue. Then, the following operations are performed in order for each write picked
from the queue: compression, selective encryption, and
rotation. Finally, the data block is delivered to the write
circuit.
Read Operation. When a data block is pushed into the
read queue, its content is processed before it is passed
to the requester. After its content is rotated back, the AES
engine decrypts the data block, making it ready for decompression. The final result is passed to the requesting
unit.
6 EVALUATION RESULTS
6.1 Metrics
$$IntraV = \frac{1}{W_{aver} \cdot N} \sum_{i=1}^{N} \sqrt{\frac{\sum_{j=1}^{512} \left( BF_{ij} - \sum_{j=1}^{512} w_{ij}/512 \right)^{2}}{M-1}} \qquad (4)$$

System: Baseline, DEUCE, i-NVMM, CC(ETH=64), CC(ETH=48), CC(ETH=32), CC(ETH=16)
Encryption: No, Full, Partial, Full, Partial, Partial, Partial
Degradation: No, Yes, Yes, Yes, Yes, No, No
Scheme: HWL, HWL, RBSS, RBSS, RBSS, RBSS

6.3 Evaluation results
Fig. 9(a) presents bit-flips per write for different applications. The baseline system has a 12.5% bit-flip rate, while all
other secured systems increase the rate due to encryption
(e.g., DEUCE = 29%). On the other hand, CryptoComp, in
the best case (ETH = 16), decreases this metric to 19%.
ETH=16 and POISE=1% means CBSE under all circum
[Fig. 9 charts: per-benchmark results and average for Baseline, DEUCE, i-NVMM, ETH=64, ETH=48, ETH=32, and ETH=16; panel (b) is titled "Normalized Bit Flips (normalized to baseline)".]
Fig. 9. Comparing CryptoComp, baseline, DEUCE, and i-NVMM schemes. (a) Percentage of bit-flips per write: CryptoComp increases it by 8%
compared to the un-protected baseline while improving on DEUCE by 52%; (b) Since encryption reduces the number of active cells, the normalized
bit-flip count is used. CryptoComp reduces the bit-flip count by 53% compared to i-NVMM.
[Fig. 10 chart: per-benchmark intra-block variation and average for Baseline+HWL, DEUCE+HWL, i-NVMM+HWL, ETH=64, ETH=48, ETH=32, and ETH=16.]
Fig. 10. Comparing CryptoComp, baseline, DEUCE, and i-NVMM schemes. Intra-block variation measures the uniformity of cells usage.
CryptoComp with ETH=64 (IntraV=0.002) is close to the ideal case (IntraV=0).
Sensitivity analysis
As previously discussed, CryptoComp uses several thresholds during its operation. In this section, we clarify the conditions under which these thresholds provide
higher durability and security. These thresholds are as
follows.
Encryption threshold (ETH): in the CBSE algorithm, if the
block size after compression is less than or equal to ETH,
we fully encrypt the block; otherwise, we follow a
random scheme for selective encryption.
Percentage of ignored blocks in selective encryption
(POISE): when we decide to do selective encryption based
on block size, some blocks are occasionally selected for full
encryption regardless of their size.
Starting segment counter (SSC): a block location is partitioned into segments, and writes are started from
different starting segments.
Starting segment changing rate (SSCR): we increase the
SSC based on SSCR.
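The roles of ETH and POISE listed above can be illustrated with a small decision function. This is a sketch of one reading of the text: blocks no larger than ETH are always fully encrypted, and among the larger blocks roughly POISE percent are still fully encrypted at random while the rest are selectively encrypted. The function name, the use of Python's `random.Random`, and the exact way the random draw is compared against POISE are assumptions.

```python
import random


def encryption_mode(compressed_size: int, eth: int,
                    poise_percent: float, rng: random.Random) -> str:
    """Decide between full and selective encryption for one block."""
    if compressed_size <= eth:
        return "full"                      # small blocks: always full encryption
    # Larger blocks: with probability POISE% ignore selective encryption
    # and fully encrypt the block regardless of its size (assumed reading).
    if rng.random() * 100.0 < poise_percent:
        return "full"
    return "selective"


rng = random.Random(0)
modes = [encryption_mode(48, eth=32, poise_percent=30, rng=rng) for _ in range(10_000)]
print(modes.count("full") / len(modes))   # close to 0.30
```

Under this reading, raising POISE pushes more large blocks into full encryption, which is consistent with the sensitivity results where larger POISE values increase bit flips per write.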
[Fig. 11 chart: per-benchmark lifetime and average for Baseline, DEUCE, i-NVMM, ETH=64, ETH=48, ETH=32, and ETH=16.]
Fig. 11. Comparing CryptoComp, baseline, DEUCE, and i-NVMM schemes. Lifetime is used to measure the effectiveness of intra-block wear-leveling and bit-flip reduction. Although DEUCE and i-NVMM improve the lifetime compared to the naive fully-encrypted system, both degrade
the lifetime compared to an un-protected system. Since CryptoComp uses the compression+rotation mechanism, each cell sees fewer bit-flips
(while all cells flip more uniformly
[Fig. 12 chart: per-benchmark coverage (50 to 100) and average for Baseline, DEUCE, i-NVMM, ETH=64, ETH=48, ETH=32, and ETH=16.]
Fig. 12. Comparing CryptoComp, baseline, DEUCE, and i-NVMM schemes. As i-NVMM and CryptoComp are selective-encryption schemes, their
coverage shows the amount of un-encrypted bytes at the time of a power event.
[Fig. 13 charts: four panels plotted against POISE (10 to 50) for ETH=16, ETH=32, and ETH=48: bit-flips per write with DEUCE as reference, IntraV(*1E-3) with HWL as reference, Normalized Lifetime with Baseline as reference, and Coverage (%) with i-NVMM as reference.]
Fig. 13. Effects of POISE and ETH on (a) Bit-flips per write: for CryptoComp to be more effective than DEUCE, a POISE value of less than 35, 25
and 15 for ETH=16, 32 and 64 should be selected, respectively; (b) IntraV: for CryptoComp to have a more uniform RBSS than HWL, the POISE
value should be less than 35 for ETH=48. For other ETH values, CryptoComp behaves always better; (c) Lifetime: for ETH=48, CryptoComp
has a lower lifetime but for ETH=16 it shows better lifetime. With POISE<30, CryptoComp improves the lifetime over the baseline; (d) Coverage:
CryptoComp keeps more bytes secured than i-NVMM.
[Fig. 14 charts: IntraV and Normalized Lifetime plotted against SSCR (1 to 8) for SSC=4, SSC=8, and SSC=16.]
Fig. 14. Sensitivity analysis of SSCR and SSC for (up) ETH=16
and POISE=35%, and (down) ETH=32 and POISE=30%. Selecting
SSC=8 and SSCR=1 leads to more uniformity and longer lifetime.
TABLE 5
Best systems characteristics.
CC(a), MDC(b), MSC(c)
366 224 94 18
Coverage (%): i-NVMM, CC(a)
87 87 98 99
(a) CryptoComp
(b) Most durable CryptoComp (ETH=16, POISE=35, SSCR=1, SSC=8)
(c) Most secure CryptoComp (ETH=32, POISE=30, SSCR=1, SSC=8)
Having two forms of CryptoComp (most-durable CryptoComp and most-secure CryptoComp), we are able to
compare the performance and total power consumption
of our proposal with DEUCE, i-NVMM and the un-protected
baseline. We assume that all systems (except the un-protected baseline) use counter-mode AES in order to
reduce the encryption/decryption latency cost. To obtain
the energy overhead estimates for AES, we use the power
overheads reported in [56] for CTR-AES. Fig. 15 shows
the normalized IPC and total power consumption for the
mentioned systems. From the performance point of view,
i-NVMM and the un-protected baseline are similar when
counter-mode AES is employed by i-NVMM. This is because
1) just one XOR operation must be done for decryption,
and 2) the scrubbing mechanism of i-NVMM runs in
the background, off the critical path. DEUCE needs
some additional operations to recover the original block by
concatenating two parts: one recovered using the current
content of the counter and another recovered using
the previous content of the counter. Most-durable CryptoComp and most-secure CryptoComp must run the CBSE algorithm during a write operation and one XOR operation and
a shift during a read operation. Hence, DEUCE and the two
CryptoComp architectures degrade the system performance by 1%, 2.15% and 2.1% on average, respectively. For
some memory-intensive applications like libq, gamess and
milc, the degradation in performance is not higher than
5%. Among all evaluated systems, i-NVMM consumes the least
power since it keeps the hot blocks un-encrypted (1%), but
DEUCE and the two CryptoComp configurations consume
15.3%, 8.4% and 8.5% extra power, respectively. Although
CryptoComp degrades the system performance by 2.1%
and increases power consumption by 8.5%, it keeps more
than 99% of memory lines encrypted and extends the
lifetime by 3×. It must be considered that we devise
this technique to work at the main memory level. However,
[Fig. 15 charts: Normalized Values per benchmark and average for Baseline, DEUCE, i-NVMM, MDC, and MSC.]
Fig. 15. Normalized IPC and total power consumption for different systems.
[Fig. 16 charts: (a) % of Blocks per entropy level; (b) CBSE versus CBSE+.]
Fig. 16. Percentage of blocks with different entropies (a); less than
15% of blocks have a >5 entropy. Comparing CBSE and CBSE+ (b).
To have a more secure CryptoComp and protect the sensitive data, we present the CBSE+ algorithm, a scheme that
keeps high-entropy data blocks fully encrypted. To do
so, we first analyzed the percentage of high-entropy data
blocks selected for partial-encryption in CryptoComp with
ETH=32. Since the maximum observable entropy in a 64B block is 6 (according to Eq. 1), we categorized the blocks
in terms of entropy with a resolution of 1 in Fig. 16(a). As
can be deduced from this figure, less than 15% of data
blocks (out of the 45% of total blocks selected for partial-encryption) have an entropy of 5 or higher; these are
the blocks that have not benefited from compression and
have entered the selective-encryption process. Therefore, if
we cover such a small fraction of data blocks with full-encryption, we can ensure that high-entropy data blocks
are protected. However, calculating the entropy for each data block during
the write operation is a time- and power-consuming
process. To avoid it, we can use the block size
after compression to select high-entropy data blocks for
full-encryption. It must be considered that, due to the deficiency of compression algorithms, not all large compressed
blocks have high entropy, but our observation shows that
all high-entropy compressed data blocks are of large size.
Therefore, we can use the block size after compression to
predict sensitive data blocks.
Hence, we categorized the compressed data blocks into
3 categories: i) non-sensitive for block size < ETH, ii)
sensitive for block size > SDTH (sensitive data threshold),
Size=1-32: insensitive data/lifetime-friendly/full-encryption
Size=33-59: ordinary data/lifetime-friendly/partial-encryption
Size=60-64: sensitive data/security-friendly/full-encryption
Fig. 17. Categorizing data blocks with different sizes after compression.
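The three size ranges of Fig. 17 can be expressed as a small lookup. The function name `cbse_plus_category` is hypothetical. The defaults ETH=32 and SDTH=59 are read off the size ranges shown in the figure (1-32, 33-59, 60-64); SDTH=59 in particular is inferred from those ranges rather than stated in the text, and the lower boundary is treated as inclusive to match the 1-32 range.

```python
def cbse_plus_category(size: int, eth: int = 32, sdth: int = 59) -> tuple[str, str]:
    """Return (data class, encryption mode) for a compressed block size in bytes."""
    if not 1 <= size <= 64:
        raise ValueError("compressed block size must be between 1 and 64 bytes")
    if size <= eth:
        return ("insensitive", "full")     # lifetime-friendly
    if size > sdth:
        return ("sensitive", "full")       # security-friendly
    return ("ordinary", "partial")         # lifetime-friendly


for s in (16, 33, 59, 60):
    print(s, cbse_plus_category(s))
```

Using the compressed size as a proxy avoids computing the entropy of every block on the write path, at the cost of occasionally treating a large but low-entropy block as sensitive.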
[Chart: Average Coverage versus Block Size after Compression for CBSE and CBSE+, with Insensitive Data, Ordinary Data, and Sensitive Data regions.]
10 CONCLUSION
In many respects, non-volatile memories (NVMs) are considered promising alternatives to traditional DRAM- and SRAM-based memory systems. However, due to the
short lifetime and security vulnerability of NVMs, replacing DRAM and SRAM with NVM causes some problems.
We proposed CryptoComp to improve the security and
lifetime of a phase change main memory by adopting
a selective-encryption scheme working on compressed
memory blocks. Our method selects sufficient bytes from
proper locations in the block to increase coverage using the
space freed after compression. Additionally, we adopted a
block-dependent rotation mechanism that makes the cell
usage more uniform. Based on extensive simulations, the two
best configurations of CryptoComp were proposed for
best lifetime and security. These two systems enhanced the
memory system lifetime by 3.59× and 3.66×, compared
to DEUCE, while providing 98% and 99% coverage ratios.
Recently, several technologies have been proposed to replace conventional memories in different levels of the memory
hierarchy, from CPU registers to external storage. Hence,
similar schemes for securing NV-register files, NV-caches
(L1, L2, ...), and solid-state drives (SSDs) can be considered
as future work.
ACKNOWLEDGMENTS
The authors would like to thank Armin Ahmadzadeh,
Mohammad Sadrosadati, and Mahmood Naderan for their
assistance with setting up and maintaining the cluster
used to conduct the simulation experiments. This work
is supported in part by a grant from IPM.
REFERENCES
Q. Guo, X. Guo, Y. Bai, R. Patel, E. Ipek, and E. Friedman, "Resistive ternary content addressable memory systems for data-intensive computing," Micro, IEEE.
J. Wang, X. Dong, and Y. Xie, "Enabling high-performance lpddrx-compatible mram," ISLPED '14, pp. 339-344, 2014.
J. Wang, X. Dong, and Y. Xie, "Building and optimizing mram-based commodity memories," ACM Trans. Archit. Code Optim.
S. Cho and H. Lee, "Flip-n-write: A simple deterministic technique to improve pram write performance, energy and endurance," MICRO, 2009.
Q. Guo, X. Guo, R. Patel, E. Ipek, and E. G. Friedman, "Ac-dimm: Associative computing with stt-mram," in Proceedings of the 40th Annual International Symposium on Computer Architecture, ISCA '13, 2013.
P. Nair, C. Chou, B. Rajendran, and M. Qureshi, "Reducing read latency of phase change memory via early read and turbo read," in High Performance Computer Architecture (HPCA), 2015 IEEE 21st International Symposium on, 2015.
M. K. Qureshi, M. M. Franceschini, A. Jagmohan, and L. A. Lastras, "Preset: Improving performance of phase change memories by exploiting asymmetry in write times," ISCA '12, 2012.
M. Jalili, M. Arjomand, and H. Sarbazi-Azad, "A reliable 3d mlc pcm architecture with resistance drift predictor," DSN, 2014.
M. Jalili and H. Sarbazi-Azad, "Captopril: Reducing the pressure of bit flips on hot locations in non-volatile main memories," in DATE, 2016.
S. Schechter, G. H. Loh, K. Straus, and D. Burger, "Use ecp, not ecc, for hard failures in resistive memories," in Proceedings of the 37th Annual International Symposium on Computer Architecture, ISCA, 2010.
R. Azevedo, J. D. Davis, K. Strauss, P. Gopalan, M. Manasse, and S. Yekhanin, "Zombie memory: Extending memory lifetime by reviving dead blocks," ISCA, 2013.
J. Fan, S. Jiang, J. Shu, Y. Zhang, and W. Zhen, "Aegis: Partitioning data block for efficient recovery of stuck-at-faults in phase change memory," in Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO, 2013.
M. Asadinia, M. Arjomand, and H. S. Azad, "Prolonging lifetime of pcm-based main memories through on-demand page pairing," ACM Trans. Des. Autom. Electron. Syst., 2015.
A. N. Jacobvitz, R. Calderbank, and D. J. Sorin, "Coset coding to extend the lifetime of memory," in Proceedings of the 2013 IEEE 19th International Symposium on High Performance Computer Architecture, HPCA, 2013.
M. Qureshi, A. Seznec, L. Lastras, and M. Franceschini, "Practical and secure pcm systems by online detection of malicious write streams," in High Performance Computer Architecture (HPCA), 2011 IEEE 17th International Symposium on, pp. 478-489, Feb 2011.
B. D. Yang, J. E. Lee, J. S. Kim, J. Cho, S. Y. Lee, and B. G. Yu, "A low power phase-change random access memory using a data-comparison write scheme," in 2007 IEEE International Symposium on Circuits and Systems, pp. 3014-3017, 2007.
V. Young, P. J. Nair, and M. K. Qureshi, "Deuce: Write-efficient encryption for non-volatile memories," in Proceedings of the 20th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS, 2015.
S. Chhabra, B. Rogers, Y. Solihin, and M. Prvulovic, "Secureme: A hardware-software approach to full system security," in Proceedings of the International Conference on Supercomputing, ICS, 2011.
P. Peterson, "Cryptkeeper: Improving security with encrypted ram," in 2010 IEEE International Conference on Technologies for Homeland Security (HST), 2010.
M. Henson and S. Taylor, "Beyond full disk encryption: Protection on security-enhanced commodity processors," in Proceedings of the 11th International Conference on Applied Cryptography and Network Security, ACNS, 2013.
G. Duc and R. Keryell, "Cryptopage: An efficient secure architecture with memory encryption, integrity and information leakage protection," in Computer Security Applications Conference, 2006. ACSAC '06. 22nd Annual, 2006.
X. Zhuang, T. Zhang, and S. Pande, "Hide: An infrastructure for efficiently protecting information leakage on the address bus," in Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XI, 2004.
Hamid Sarbazi-Azad received the B.Sc. degree in electrical and computer engineering
from Shahid Beheshti University, Tehran, Iran,
in 1992, the M.Sc. degree in computer engineering from Sharif University of Technology, Tehran, in 1994, and the Ph.D. degree
in computing science from University of Glasgow, Glasgow, U.K., in 2002. He is currently
a professor at the Department of Computer
Engineering, Sharif University of Technology,
and heads the School of Computer Science in
the Institute for Research in Fundamental Sciences (IPM), Tehran,
Iran. His research interests include high-performance computer architectures, networks-on-chip, and systems-on-chip, memory/storage
systems, and social networks, on which he has published over 300
refereed conference and journal papers. Prof. Sarbazi-Azad was a
recipient of the Khwarizmi International Award in 2006, the TWAS
Young Scientist Award in engineering sciences in 2007, and the
Sharif University Distinguished Researcher Awards in 2004, 2007,
2008, 2010, and 2013. He has served as the Editor-in-Chief of the
CSI Journal on Computer Science and Engineering, an Associate
Editor of IEEE TC and ACM Computing Surveys, and an Editorial
Board Member of Elsevier's Computers and Electrical Engineering
journal, the International Journal of Computers and Their Applications,
the Journal of Parallel and Distributed Computing and Networks, and
the International Journal of High-Performance Systems Architecture.