Page 1
Copyright 2001 Hewlett-Packard Company
Learning objectives
At the end of this session you will be able to:
- Recognize and explain the correct usage of threads in Java
- Recognize contention, and know the best ways to avoid it
- Discuss the different data structures available
- Determine the optimal number of threads required
Page 2
Scheduling
[Figure: threads alternating between I/O and CPU]
Page 3
[Figure: execution timeline for Thread 0 — CPU bursts interleaved with GC pauses]
Page 4
Page 5
Generate with: kill -s SIGQUIT <pid>
Dumps stack trace of every Java thread
Includes monitor information ("waiting on", "locked")
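The same information is available programmatically on later JVMs. A minimal sketch, assuming a JDK 5+ runtime (newer than the JVMs these slides describe, which only produced this output via the SIGQUIT handler); the class name is illustrative:

```java
import java.util.Map;

public class ThreadDumpDemo {
    // Render a SIGQUIT-style dump: every live thread with its stack frames.
    public static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append('"')
              .append(" daemon=").append(t.isDaemon())
              .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

Unlike the signal-based dump, this does not include per-monitor ownership details; it is only the stack-trace portion.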
Page 6
Page 7
"Reader Thread 12" prio=10 tid=0x135c290 nid=85 lwp_id=14212 waiting on monitor [0xed1c000..0xed1c478]
at java.lang.Object.wait(Native Method)
Page 8
Finalizer daemon
- Runs Java class finalizers
VM Thread
- Does all the hard work of the runtime, such as compiling, doing GCs, and so on
VM Periodic Task
- Simulates timer interrupts; some tasks are scheduled to execute when the timer fires
Page 9
Java Programs
[Figure: multiple CPUs sharing memory, cache, and disk]
- Easy to write
Consequences:
- Contention for shared resources
- Java Sockets API requires many threads
Page 10
Contention
Page 11
Page 12
Efficient and low overhead
[Figure: object layout — object header plus non-static fields; class layout — class header, method table pointer (C++ vtable), static fields, GC maps]
Page 13
Page 14
[Figure: object header containing the lock word]
If another thread tries to obtain the same monitor: lock inflation (uses a pthread mutex)
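The contended case can be provoked deliberately. A minimal sketch, assuming a modern JDK (the class name is illustrative): two threads repeatedly enter the same monitor, so one regularly finds it held by the other, which is exactly the situation that forces the JVM to inflate the lightweight lock:

```java
public class LockInflationDemo {
    static int counter = 0;
    private static final Object monitor = new Object();

    public static void main(String[] args) throws InterruptedException {
        // Two threads hammering one monitor. When a thread finds the
        // monitor already held, the JVM falls back from the lightweight
        // lock word to a heavier OS-level mutex ("lock inflation").
        Runnable work = () -> {
            for (int i = 0; i < 100000; i++) {
                synchronized (monitor) {
                    counter++;
                }
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counter); // 200000: the monitor made the updates atomic
    }
}
```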
Page 15
Monitor/Mutex
- Enter queue and wait
- Repeat: spin, sched_yield
- Next waiter given lock
[Figure: threads 1 and 3 queuing on a monitor/mutex; when the holder releases, the next waiter is given the lock]
Page 16
Monitors
Implemented using a single word in memory that serves as the lock word
Consequences:
- Threads spend their time spinning on accesses to that one word
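The idea of spinning on a single lock word can be sketched in a few lines. This is an illustrative simplification, not the JVM's actual implementation (real monitors also queue waiters and inflate to OS mutexes, as described above); the class name is made up:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLockDemo {
    // The "lock word": one word in memory that every acquiring
    // thread hits with compare-and-swap.
    private final AtomicBoolean lockWord = new AtomicBoolean(false);

    public void lock() {
        while (!lockWord.compareAndSet(false, true)) {
            Thread.yield(); // back off between attempts, like sched_yield
        }
    }

    public void unlock() {
        lockWord.set(false);
    }

    static int counter = 0;

    public static void main(String[] args) throws InterruptedException {
        final SpinLockDemo lock = new SpinLockDemo();
        Runnable work = () -> {
            for (int i = 0; i < 50000; i++) {
                lock.lock();
                counter++;
                lock.unlock();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(counter); // 100000
    }
}
```

With many threads, all of them beat on that one memory word, which is precisely the contention cost the slide is pointing at.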
[Figure: many threads (1 through 8) contending for a single monitor/mutex lock word in memory]
Page 18
pthread mutex contention can be detected using specially instrumented libraries
Using these libraries with the Classic JVM we were able to:
- Identify monitor contention
- Reduce the contention for monitors
The following are the steps that we took to accomplish this work
[Chart: Number of Requests for contended locks, including the Heap Lock and libc._mem_rmutex (malloc) — values 1,602,025; 1,468,817; 325,053]
Page 20
[Chart: Number of Waiters — Heap Lock: 48,872; average waiters per contended lock on a 0-10 scale: 8.7, 3.2, 2.7]
Page 21
Page 22
Page 23
Implementation of Monitors
System Monitors: Heap, Cls Ldr, JNI Pin, Thr Cr, Comp
[Figure: a single global Monitor Cache mapping object headers to monitors]
Page 24
Page 25
[Figure: threads 1, 2, and 3 all contending for the single global Monitor Cache of object headers]
Page 26
-montlscache=N
N is the number of thread-local cache entries; the default is 8
Page 27
Page 28
[Figure: threads 1, 2, and 3 each with their own thread-local monitor cache (Monitor Cache 1, 2, 3)]
Page 29
Page 30
Page 31
Page 32
Page 33
[Figure: JVM process with dozens of threads blocked waiting on I/O across many Ethernet connections]
Page 34
Solutions
Page 35
Page 36
[Figure (repeated): JVM process with dozens of threads blocked waiting on I/O across many Ethernet connections]
Page 37
[Figure: JVM process with a single Poll Thread dispatching ready Ethernet connections to a small pool of Worker Threads]
Page 38
[Chart: CPU time split between user and system — the Poll API uses system CPU]
Page 39
Page 40
Page 41
Page 42
Thread number
- Reduced from 100s or 1000s to dozens
Poll thread
- Delegates work to worker threads
- Constantly busy
Fewer threads
- Increase throughput
Page 43
import com.hp.io.Poll;

static Poll fds[]; // array of file descriptors
int nfds = 0;      // number of file descriptors in array
Page 44
- Buffers for data of primitive types
- Character sets (charsets) with decoders and encoders to translate between bytes and Unicode characters
- Pattern-matching facility based on Perl-style regular expressions
- Channels: a new primitive I/O abstraction
- File interface that supports locks and memory mapping
- Multiplexed, non-blocking I/O facility for writing scalable servers
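The buffer and charset pieces above fit together in a few lines. A minimal sketch using the final java.nio API (the slides predate it, so details differed in the beta; the class name is illustrative and StandardCharsets is a later JDK 7+ convenience):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BufferDemo {
    // Encode a string into a ByteBuffer, then flip and drain the buffer.
    public static String roundTrip(String text) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.put(text.getBytes(StandardCharsets.UTF_8));
        buf.flip(); // limit = current position, position = 0: ready to read
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello nio")); // prints "hello nio"
    }
}
```

The flip() step is the characteristic buffer idiom: the same object is filled, flipped, then drained.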
Page 45
Buffers
Containers for data
Page 46
Channel - A nexus for I/O operations
ReadableByteChannel - Read into a buffer
ScatteringByteChannel - Read into a sequence of buffers
WritableByteChannel - Write from a buffer
GatheringByteChannel - Write from a sequence of buffers
ByteChannel - Read/write to/from a buffer
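A FileChannel implements both the readable and writable sides of this hierarchy. A minimal sketch; FileChannel.open and java.nio.file are JDK 7+ conveniences that postdate the slides, and the class name is illustrative:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelDemo {
    public static String writeAndReadBack(byte[] data) throws IOException {
        Path tmp = Files.createTempFile("channel-demo", ".bin");
        // WritableByteChannel side: write from a buffer
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap(data));
        }
        // ReadableByteChannel side: read into a buffer
        ByteBuffer buf = ByteBuffer.allocate(data.length);
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            while (buf.hasRemaining() && ch.read(buf) != -1) { }
        }
        Files.delete(tmp);
        return new String(buf.array());
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAndReadBack("channel data".getBytes()));
    }
}
```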
Page 47
Selector - Multiplexor of selectable channels
SelectionKey - Token representing the registration of a channel with a selector
Pipe - Two channels that form a unidirectional pipe
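Pipe is the smallest of these to demonstrate end to end: open one, write through the sink channel, read back from the source channel. A minimal sketch against the final java.nio API (illustrative class name):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

public class PipeDemo {
    public static String passThrough(String msg) throws IOException {
        // A Pipe is a unidirectional pair: writes to the sink channel
        // become available for reading on the source channel.
        Pipe pipe = Pipe.open();
        byte[] data = msg.getBytes();
        pipe.sink().write(ByteBuffer.wrap(data));
        ByteBuffer buf = ByteBuffer.allocate(data.length);
        while (buf.hasRemaining()) {
            pipe.source().read(buf); // blocking mode by default
        }
        pipe.sink().close();
        pipe.source().close();
        return new String(buf.array());
    }

    public static void main(String[] args) throws IOException {
        System.out.println(passThrough("through the pipe"));
    }
}
```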
Page 48
Page 49
Page 50
// The select method will return when the operations registered
// have occurred, the thread has been interrupted, or other ...
int keysAdded;
while ((keysAdded = acceptSelector.select()) > 0) {
    // Key or keys have been returned, so process it (or them)
    Set readyKeys = acceptSelector.selectedKeys();
    Iterator iter = readyKeys.iterator();
    // Walk through the set of keys and process the requests
    while (iter.hasNext()) {
        SelectionKey sk = (SelectionKey) iter.next();
        iter.remove();
        // SelectionKey indexes into the Selector to obtain the
        // server socket channel
        ServerSocketChannel nextReady = (ServerSocketChannel) sk.channel();
        SocketChannel s = nextReady.accept(); // Accept request
        // Perform the designated operation on the socket
    }
}
Page 51
AbstractList
- Skeletal implementation of the List interface
- Backed by a "random access" data store (such as an array)
ArrayList
- Resizable-array implementation of the List interface
- An unsynchronized Vector class
LinkedList
- Doubly linked list implementation of the List interface
- get() and set() are less efficient than ArrayList
- getFirst()/getLast() and addFirst()/addLast() are efficient; use for a stack or queue
Vector
- Implements a growable array of objects
Stack
- Last-in-first-out (LIFO) stack of objects
- Use LinkedList instead
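The trade-off between the two main implementations can be shown directly. A minimal sketch, using modern generics that postdate these slides (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ListChoiceDemo {
    // ArrayList: O(1) direct indexed access.
    public static int lastByIndex(int n) {
        List<Integer> array = new ArrayList<>();
        for (int i = 0; i < n; i++) array.add(i);
        return array.get(n - 1); // random access is cheap
    }

    // LinkedList: O(1) at either end, so it works well as a stack or
    // queue (which is why the slide suggests it over Stack).
    public static int lastPushed(int n) {
        LinkedList<Integer> linked = new LinkedList<>();
        for (int i = 0; i < n; i++) linked.addFirst(i); // push
        return linked.getFirst();                        // peek
    }

    public static void main(String[] args) {
        System.out.println(lastByIndex(5)); // 4
        System.out.println(lastPushed(5));  // 4
    }
}
```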
Page 53
AbstractMap
- Skeletal implementation of the Map interface
- Mapping between key objects and value objects
- Unique keys
HashMap
- Hash table based implementation of the Map interface
- Unsynchronized version of Hashtable
TreeMap
- Red-black tree based implementation of the SortedMap interface
Hashtable
- Implements a hash table, which maps key objects to values
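The HashMap/TreeMap distinction is easy to see in code. A minimal sketch with modern generics (illustrative class name): both maps hold the same entries, but only the TreeMap keeps its keys sorted:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapChoiceDemo {
    // TreeMap (a SortedMap) keeps keys in sort order; HashMap trades
    // ordering for faster access. Both are unsynchronized, unlike the
    // legacy Hashtable.
    public static String firstSortedKey(String[] keys) {
        Map<String, Integer> hash = new HashMap<>();
        TreeMap<String, Integer> tree = new TreeMap<>();
        for (String k : keys) {
            hash.put(k, k.length());
            tree.put(k, k.length());
        }
        return tree.firstKey(); // smallest key in sort order
    }

    public static void main(String[] args) {
        System.out.println(firstSortedKey(
                new String[] {"banana", "apple", "cherry"})); // prints "apple"
    }
}
```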
Page 54
AbstractSet
- Skeletal implementation of the Set interface
HashSet
- Implements the Set interface (unique elements only)
- Uses a HashMap instance
TreeSet
- Implements SortedSet
- Uses a TreeMap instance
Page 55
Interface   Class        Synchronized
List        ArrayList    No
            LinkedList   No
            Vector       Yes
            Stack        Yes
Map         HashMap      No
            TreeMap      No
            Hashtable    Yes
Set         HashSet      No
            TreeSet      No
Element access is direct, avoiding accessor-method overhead
Create customized implementations of Hashtable
Page 57
Thread notification
- Use notify() rather than notifyAll()
- notifyAll() wakes up all waiting threads and causes much unneeded thrashing
Servlets
- Minimize synchronization
JDBC
- Size the JDBC connection pool correctly
- Release JDBC resources when done
- Reuse datasources for JDBC connections
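The notify()-over-notifyAll() advice above fits the classic single-slot handoff. A minimal sketch (illustrative class name, modern lambda syntax): with one producer and one consumer, notify() wakes exactly the thread that can make progress, whereas notifyAll() would also wake threads that must immediately sleep again:

```java
public class NotifyDemo {
    private final Object lock = new Object();
    private Integer slot = null;

    public void put(int value) throws InterruptedException {
        synchronized (lock) {
            while (slot != null) lock.wait(); // slot occupied: wait
            slot = value;
            lock.notify(); // wake the one waiting consumer
        }
    }

    public int take() throws InterruptedException {
        synchronized (lock) {
            while (slot == null) lock.wait(); // slot empty: wait
            int v = slot;
            slot = null;
            lock.notify(); // wake the one waiting producer
            return v;
        }
    }

    public static int runDemo() throws InterruptedException {
        final NotifyDemo q = new NotifyDemo();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) q.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        for (int i = 0; i < 5; i++) sum += q.take();
        producer.join();
        return sum; // 0+1+2+3+4 = 10
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo());
    }
}
```

Note that notify() is only safe when any woken waiter can handle the condition; with multiple producer and consumer classes of waiters on one monitor, notifyAll() may be required for correctness.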
Page 58
Consider:
n = number of active threads, k = number of CPUs
- n < k: CPUs are underutilized
- n == k: ideal conditions, but each CPU will probably be under-utilized (I/O and other considerations)
- n > k: greater such that CPUs are always busy is ideal; greater by a large amount leads to significant performance degradation from contention, starvation, and context switching
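One common way to pick "greater such that CPUs are always busy" is to scale the CPU count by how much of each task is spent waiting rather than computing. The heuristic below is a widely used sizing rule, not something stated in these slides, and the class and method names are illustrative:

```java
public class ThreadCountDemo {
    // Sizing heuristic (an assumption, not from the slides):
    // threads = cpus * (1 + wait/compute). A pure CPU-bound workload
    // (ratio 0) gets one thread per CPU; an I/O-heavy workload gets more
    // so the CPUs stay busy while other threads block.
    static int suggestedThreads(int cpus, double waitToComputeRatio) {
        return Math.max(1, (int) (cpus * (1 + waitToComputeRatio)));
    }

    public static void main(String[] args) {
        int k = Runtime.getRuntime().availableProcessors();
        System.out.println("CPUs (k): " + k);
        System.out.println("CPU-bound pool: " + suggestedThreads(k, 0.0));
        System.out.println("I/O-heavy pool: " + suggestedThreads(k, 4.0));
    }
}
```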
Page 59
Java Monitors
http://developer.java.sun.com/developer/Books/performance2/chap4.pdf
Chapter 4 of "High Performance Java Computing: Multi-Threaded and Networked Programming" (page last updated January 2001; authors George Thiruvathukal, Thomas Christopher)
Good summary of policies for synchronizing threads trying to read from or write to shared resources:
- One thread at a time
- Readers-preferred (readers have priority)
- Writers-preferred (writers have priority)
- Alternating readers-writers (alternates between a single writer and a batch of readers)
- Take-a-number (first-come, first-served)
Page 62
Paradox:
- Monitor contention problems can be masked by specifically binding a process with contention problems to a single CPU: all accesses to the lock word are then local to that CPU
- High system time (more threads degrade performance)
- Frequent calls to sched_yield, ksleep, kwakeup
- SIGQUIT will tell you immediately if you have a problem
- HPjmeter for JDK 1.1.5+ and 1.2 uses heuristics to estimate monitor contention
- JDK 1.3.1 -Xeprof collects exact information on monitor contention, and HPjmeter 1.2 displays the information
Page 63
Page 64
Page 65
http://www.weblogic.com/docs51/techdeploy/jdbcperf.html
WebLogic JDBC tuning (last updated April 1999, BEA Systems)
Page 66
Page 67
Counting entries in a table (e.g. using SELECT count(*) FROM myTable, yourTable WHERE ...) is resource intensive
- Try first selecting into a temporary table, returning only the count, and then sending a refined second query to return only a subset of the rows in the temporary table
Never let a DBMS transaction span user input
Consider using optimistic locking
- Optimistic locking employs timestamps to verify that data has not been changed by another user; otherwise the transaction fails
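The optimistic-locking check can be sketched without a database. In this simplification (all names hypothetical), a version counter stands in for the timestamp column: a writer only applies its update if the version it originally read is still current, otherwise it must re-read and retry:

```java
public class OptimisticLockDemo {
    // Stand-in for a database row with a version (timestamp) column.
    static class Row {
        int value;
        long version;
    }

    // Returns true if the update won; false means another writer changed
    // the row first, so this "transaction" fails and must be retried.
    static boolean update(Row row, int newValue, long expectedVersion) {
        synchronized (row) {
            if (row.version != expectedVersion) return false; // stale read
            row.value = newValue;
            row.version++;
            return true;
        }
    }

    public static void main(String[] args) {
        Row row = new Row();
        long seen = row.version;               // two writers both read version 0
        boolean first = update(row, 42, seen); // first writer wins, version -> 1
        boolean second = update(row, 7, seen); // second writer fails: stale version
        System.out.println(first + " " + second + " " + row.value); // true false 42
    }
}
```

No lock is held between the read and the update attempt, which is why the transaction never spans user input.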
Page 70