NEC: the largest Japanese IT company. $43 billion in annual revenue, 143,000 staff. www.nec.com
Owns & sells
- Scalable disk-based storage for backup with global deduplication
- Started in 2003 in NEC Labs by Cezary Dubnicki
- 2007: Product of the Year award from SearchStorage.com
- 2008: Product Innovation award from Network Products Guide
- 2009/2010: FAST conference publications in San Jose
- Sold in the US and Japan since 2007
- Will be sold in Poland in 2011 by 9LivesData in cooperation with NEC
Backup storage
- Sensitive environment requirements
- Unreliable restore
- Low performance
- Manual labor or expensive robots
- Problematic replication
- 4-12+ full backups, 7-30+ incremental
- Majority of data does not change
- Data compression 2:1
- 5x-20x more data than primary storage
- Includes many copies of the same data
- Each data chunk stored 5-10+ times
Deduplication
- Save disk space by eliminating duplicates
- Sample reduction ratio: 10:1
- Lowers the price per gigabyte
Sub-file level deduplication
File A: A B C
File B: A D E
File A: A B C

Only unique blocks are stored. Stored blocks: A B C D E
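The sub-file deduplication above can be sketched in a few lines of Python (a toy illustration, not HYDRAstor code; SHA-256 and the `DedupStore` name are assumptions):

```python
import hashlib

class DedupStore:
    """Toy block-level deduplicating store: each unique block is kept once;
    a file becomes a recipe, i.e. a list of block hashes."""

    def __init__(self):
        self.blocks = {}  # hash -> block payload

    def write_file(self, payloads):
        """Store a file given as a list of block payloads; return its recipe."""
        recipe = []
        for data in payloads:
            h = hashlib.sha256(data).hexdigest()
            self.blocks.setdefault(h, data)  # a duplicate block is not stored again
            recipe.append(h)
        return recipe

    def read_file(self, recipe):
        return b"".join(self.blocks[h] for h in recipe)

store = DedupStore()
file_a = store.write_file([b"A", b"B", b"C"])  # File A: blocks A B C
store.write_file([b"A", b"D", b"E"])           # File B: blocks A D E
store.write_file([b"A", b"B", b"C"])           # File A again
print(len(store.blocks))                       # 5 unique blocks: A B C D E
```

Nine blocks were written, but only the five unique ones occupy space, matching the 10:1-style reduction ratios seen on real backup streams.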
Global deduplication
HYDRAstor product
Provides:
- global deduplication, using DataRedux
- performance, storage scalability and data resiliency, using Distributed Resilient Data
HYDRAstor deployment
- Interfaces: CIFS, NFS, Symantec OST
- Marker filtering for: Tivoli, NetBackup, Networker, CommVault
HYDRAstor architecture
HYDRAstor scalability
- Storage: 12 TB (240 TB*), performance: 1.3 TB/hour
- Storage: 48 TB (960 TB*), performance: 3.6 TB/hour
- Storage: 480 TB (9600 TB*), performance: 36 TB/hour
- Example configuration: 2AN 4SN (accelerator and storage nodes)
- Recovery of lost data resiliency
- Periodic data scrubbing
- Machine and disk failure recovery
- Erasure coding, better than RAID6
Programming Model
Repository of blocks:
- Blocks are immutable and content-addressed: a block is identified by the hash of its content (e.g. hash=011..0)
- A block may store the hashes of other blocks, so blocks form trees
- Trees are anchored by named retention roots (Root1 at hash=010..1, Root2 at hash=110..0)
- Deleting a retention root (e.g. Root1) leaves alive only the blocks still reachable from the remaining roots (Root2)

Trees of blocks
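A minimal sketch of this programming model, assuming a single-process store with SHA-256 addresses (class and method names are hypothetical, not the real HYDRAstor API):

```python
import hashlib

class BlockStore:
    """Content-addressed repository: blocks may embed hashes of other blocks,
    forming trees; named retention roots keep trees alive."""

    def __init__(self):
        self.blocks = {}  # hash -> (payload, child hashes)
        self.roots = {}   # root name -> hash of a tree top

    def write(self, data, children=()):
        m = hashlib.sha256(data)
        for c in children:
            m.update(c.encode())  # child hashes are part of the content
        h = m.hexdigest()
        self.blocks[h] = (data, tuple(children))  # duplicate write is a no-op
        return h

    def write_root(self, name, top_hash):
        self.roots[name] = top_hash

    def reachable(self):
        """Hashes reachable from any retention root."""
        seen, stack = set(), list(self.roots.values())
        while stack:
            h = stack.pop()
            if h not in seen:
                seen.add(h)
                stack.extend(self.blocks[h][1])
        return seen

    def collect_garbage(self):
        live = self.reachable()
        self.blocks = {h: b for h, b in self.blocks.items() if h in live}

s = BlockStore()
leaf = s.write(b"data")
top1 = s.write(b"tree1", [leaf])
top2 = s.write(b"tree2", [leaf])
s.write_root("Root1", top1)
s.write_root("Root2", top2)
del s.roots["Root1"]          # drop Root1: tree1's top becomes garbage
s.collect_garbage()
print(sorted(s.blocks) == sorted([leaf, top2]))  # True: leaf survives via Root2
```

The shared leaf block is written once and stays alive as long as any root still reaches it, which is exactly why deletion needs reachability (or reference counting) rather than per-file bookkeeping.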
Erasure coding:
- Encode: the original block is encoded into fragments (the original fragments plus redundant ones)
- Decode: the block is rebuilt from a sufficient subset of the fragments
- Mirroring for comparison: resiliency 1, overhead 100%
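A toy erasure code along these lines, using a single XOR parity fragment (so it survives only one lost fragment, with overhead 1/k instead of mirroring's 100%; the product uses a stronger code, better than RAID6):

```python
from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(block, k=3):
    """Split a block into k equal data fragments plus one XOR parity fragment."""
    n = -(-len(block) // k)  # fragment length, rounded up
    frags = [block[i * n:(i + 1) * n].ljust(n, b"\0") for i in range(k)]
    frags.append(reduce(xor, frags))  # parity fragment
    return frags

def decode(frags, length):
    """Rebuild the original block; at most one entry of frags may be None
    (a lost fragment), which is recovered from the others via XOR."""
    frags = list(frags)
    if None in frags:
        i = frags.index(None)
        frags[i] = reduce(xor, [f for f in frags if f is not None])
    return b"".join(frags[:-1])[:length]

block = b"hello erasure coding"
frags = encode(block)
frags[1] = None                            # one fragment lost
print(decode(frags, len(block)) == block)  # True
```

Generalizing from one parity fragment to m redundant fragments (e.g. Reed-Solomon) gives survival of any m losses at overhead m/k, which is the trade-off the slides compare against mirroring.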
Block location: DHT with prefix routing
- Each block is mapped to a prefix of its hash (hash=011..0 maps to prefix 01)
- The prefix space grows by splitting: empty prefix, then 0 and 1, then 00, 01, 10, 11, and so on
- Each prefix is served by N components (here N=4) spread over the machines
- The components of a prefix coordinate through distributed consensus
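The prefix routing can be sketched as follows (SHA-256, the `PrefixDHT` class and component ids like `01:2` are illustrative assumptions, not the product's internals):

```python
import hashlib

def hash_bits(key, nbits=16):
    """Leading bits of the block's content hash."""
    digest = hashlib.sha256(key).digest()
    return "".join(f"{b:08b}" for b in digest)[:nbits]

class PrefixDHT:
    """Toy prefix-routing table: the hash space is partitioned into disjoint
    prefixes, each served by N components (N=4 as on the slides)."""

    def __init__(self, prefixes, n=4):
        # e.g. ["00", "01", "10", "11"]: every hash matches exactly one prefix
        self.table = {p: [f"{p}:{i}" for i in range(n)] for p in prefixes}

    def route(self, key):
        """Return the prefix owning this block and its component ids."""
        bits = hash_bits(key)
        for prefix, components in self.table.items():
            if bits.startswith(prefix):
                return prefix, components
        raise KeyError("prefix space does not cover " + bits)

dht = PrefixDHT(["00", "01", "10", "11"])
prefix, components = dht.route(b"some block")
print(prefix, components)  # the block's prefix and its N=4 components
```

Because the prefixes partition the hash space, every block has exactly one owning prefix; splitting a prefix (01 into 010 and 011) grows the table without remapping blocks outside it.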
Writing a stream of blocks:
- Data stream split into blocks: B (hash 101), C (hash 110), D (hash 011), E (hash 000), F (hash 011), G (hash 100)
- Hashes of blocks computed
- Each block routed through the DHT to the components of its hash prefix (e.g. prefix 01)
- Erasure-coded fragments stored by the components (components 0-3 of prefix 01 each hold fragments of blocks A, D, F)
- Fragments grouped into synchruns; synchruns packed into containers stored on disks
- Fragment metadata kept separately from data; order and locality preserved; chains stay manageable
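The steps above can be sketched as follows (naive byte striping stands in for real erasure coding, and `SYNCHRUN_SIZE` plus all data structures are illustrative assumptions):

```python
import hashlib
from collections import defaultdict

SYNCHRUN_SIZE = 2  # fragments per synchrun (tiny, for illustration)

def write_stream(blocks, n_components=4):
    """Toy write path: hash each block, give every component of the prefix
    one fragment, and append fragments to the component's current synchrun;
    full synchruns would later be packed into on-disk containers."""
    synchruns = defaultdict(lambda: [[]])  # component id -> list of synchruns
    for data in blocks:
        block_id = hashlib.sha256(data).hexdigest()[:6]
        for comp in range(n_components):
            run = synchruns[comp][-1]
            if len(run) == SYNCHRUN_SIZE:     # current synchrun is full:
                run = []                      # start a new one
                synchruns[comp].append(run)
            run.append((block_id, data[comp::n_components]))  # (id, fragment)
    return synchruns

runs = write_stream([b"blockA", b"blockD", b"blockF"])
print(len(runs[0]), [len(r) for r in runs[0]])  # 2 synchruns of sizes [2, 1]
```

Because every component appends fragments in the same stream order, peer components end up with parallel chains of synchruns, which is what makes the chain comparison on the following slides possible.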
Components of prefix 01 (01:0, 01:1, 01:2, 01:3):
- Chain scanning: peer components scan their synchrun chains against one another to keep them consistent
- Data transfer: when a component moves to another machine, its chains are transferred from the old component 01:3
Duplicate detection on write:
- Query: the written block is checked against a component of its prefix (e.g. 01:2)
- Completeness: only a component with a complete chain can answer "definitely not a duplicate"
- Deletion interaction: wasn't the block scheduled for deletion?
- Candidate verification resolves the remaining uncertain cases
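A hedged sketch of that duplicate-query logic (the class, field names, and the three-valued return convention are invented for illustration, not the real protocol):

```python
class Component:
    """Toy peer component answering duplicate queries: 'definitely not a
    duplicate' (False) is only a valid answer when its chain is complete."""

    def __init__(self, hashes, complete=True, scheduled_for_deletion=()):
        self.hashes = set(hashes)               # block hashes in its chain
        self.complete = complete                # chain completeness flag
        self.deleting = set(scheduled_for_deletion)

    def is_duplicate(self, h):
        if h in self.hashes:
            # deletion interaction: a block scheduled for deletion must not
            # absorb the new write as a duplicate
            return h not in self.deleting
        # absence proves nothing unless the chain is complete
        return False if self.complete else None  # None -> verify candidates

peer = Component({"h1", "h2"}, complete=False)
print(peer.is_duplicate("h1"), peer.is_duplicate("h3"))  # True None
```

The three outcomes mirror the bullets above: a confirmed duplicate, a definite non-duplicate (complete chain only), and an uncertain case handed to candidate verification.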
- Block reference counter calculated independently on peer container chains
- Duplicate resurrection after garbage collection
- Space reclamation in background
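The counting idea can be illustrated as a single scan (a sketch of the principle only, not the distributed protocol; the block names are made up):

```python
def compute_refcounts(blocks, roots):
    """Reference counters computed by one scan over the block graph, the way
    each peer could compute them independently over its container chains."""
    counts = {h: 0 for h in blocks}
    for h in roots:
        counts[h] += 1                 # a retention root is one reference
    for children in blocks.values():
        for c in children:
            counts[c] += 1             # a pointer from a block is another
    return counts

# blocks: hash -> hashes it points to; Root1 has already been deleted
blocks = {"leaf": [], "top1": ["leaf"], "top2": ["leaf"]}
counts = compute_refcounts(blocks, roots=["top2"])
dead = sorted(h for h, c in counts.items() if c == 0)
print(dead)  # ['top1'] -> its space is reclaimed in the background
```

Note that the dead block top1 still contributes a reference to leaf; a real collector would iterate (after reclaiming top1, leaf's counter drops), but one pass is enough to show why the counting can run independently on each peer's chains.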
Resource management
- Features and technical details of HYDRAstor
- Sales of HYDRAstor in Poland
- Cooperation with 9LivesData on other projects
Questions?