GemStone Collection Views

for Instant Display of GS Collections

by Paul Baumann
IntercontinentalExchange, Inc.
2011.03.15
Just because...

• To share ongoing work toward addressing common performance problems.
• To show techniques that others may find interesting.
• To encourage developers to think outside the box.
• It was fun to develop.
Installation Sources

Public Store Repository

http://www.cincomsmalltalk.com/CincomSmalltalkWiki/PostgreSQL+Access+Page

http://techsupport.gemstone.com/forums

“Efficient GemStone Enumeration”


Prerequisites

“Efficient GemStone Enumeration” base code


- FROM -

http://www.visualworksforums.org/index.php?topic=35.0

- OR -

http://techsupport.gemstone.com/forums/37605-gss-tips-tricks

- OR -

Store Public Repository package


GS_CollectionPerformanceBase
Installation Option 1

• Download files from the “GSS Tips & Tricks”


section of the GemStone forums website:
http://techsupport.gemstone.com/forums

• Load the parcel named VW_CollectionViews.

• Use a topaz session to file-in


GS_CollectionViews.gs as SystemUser.

• Detailed instructions in the method


BcvCollectionViews class>> #setupInstructions.
Installation Option 2: Public Store Repository

• Use a VisualWorks image to connect to


the Cincom Public Store Repository
• Load “GemKit GemStone Source Code Management”.

• Load PlbCollectionViews.

• Detailed instructions in the method


BcvCollectionViews class>> #setupInstructions.
Session Configuration

Session login parameters must be configured


to connect the Collection View classes.

CollectionViews.BcvCollectionView class
>>setupInstructions_classConnectors
Collection>>view:
aCollection view: #([attributeAlias @] attributeGetter ...)

attributeAlias is a symbol that a row attribute can be known as; the default is the last symbol of the attributeGetter.

attributeGetter is either a symbol or an array of symbols that will be sent to each object.

(Globals select: [:ea | ea isBehavior ])
    view: #(
        name
        instVarNames
        instSelectors @ selectors
        classVarNames
        classSelectors @ (class selectors)
    )

Customers view: #(
name
(account money_received)
(account money_sent)
(account balance)
address
)
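
The alias and getter rules above can be modeled outside GemStone. The following is a minimal Python sketch, not the GS implementation: the spec is assumed to be a flat token list (as in a Smalltalk literal array), unary messages are modeled as attribute reads, and the names `parse_view_spec`, `gather`, and the alias `owed` are hypothetical.

```python
from types import SimpleNamespace

def parse_view_spec(tokens):
    """Turn a flat token list into (alias, getter_chain) pairs."""
    attrs, i = [], 0
    while i < len(tokens):
        alias = None
        if i + 1 < len(tokens) and tokens[i + 1] == "@":
            alias, i = tokens[i], i + 2          # skip the alias and the "@"
        getter = tokens[i]
        chain = (getter,) if isinstance(getter, str) else tuple(getter)
        attrs.append((alias or chain[-1], chain))  # default alias: last symbol
        i += 1
    return attrs

def gather(collection, attrs):
    """Send each getter chain to each object and collect one dict per row."""
    rows = []
    for obj in collection:
        row = {}
        for alias, chain in attrs:
            value = obj
            for name in chain:                   # a chain of "unary messages"
                value = getattr(value, name)
            row[alias] = value
        rows.append(row)
    return rows

class Account:
    def __init__(self, received, sent):
        self.money_received, self.money_sent = received, sent

    @property
    def balance(self):
        return self.money_received - self.money_sent

customers = [SimpleNamespace(name="Adam", account=Account(300, 200),
                             address="anAddress")]
spec = ["name", ("account", "money_received"), ("account", "money_sent"),
        "owed", "@", ("account", "balance"), "address"]
rows = gather(customers, parse_view_spec(spec))
print(rows[0])
```

Note that this model builds one dict per row for readability; the views described in this deck avoid exactly that cost on the server.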
Why Use Collection Views?

• An efficient way to display attribute values


from a large GS collection.
• An efficient way to #collect: a set of result
objects from a large GS collection.
• An efficient way to replicate results to a
GBS client image.
• Similar in use and benefit to a table view in a relational database.
Relational Database View
In database theory, a view consists of a stored query accessible as a virtual
table composed of the result set of a query. Unlike ordinary tables (base
tables) in a relational database, a view does not form part of the physical
schema: it is a dynamic, virtual table computed or collated from data in the
database. Changing the data in a table alters the data shown in subsequent
invocations of the view.

Views can provide advantages over tables:

* Views can represent a subset of the data contained in a table
* Views can join and simplify multiple tables into a single virtual table
* Views can act as aggregated tables, where the database engine aggregates
data (sum, average etc) and presents the calculated results as part of the
data
* Views can hide the complexity of data; for example a view could appear as
Sales2000 or Sales2001, transparently partitioning the actual underlying
table
* Views take very little space to store; the database contains only the
definition of a view, not a copy of all the data it presents
* Depending on the SQL engine used, views can provide extra security
* Views can limit exposure of a table or tables to the outer world
SQL Comparison

SQL View

SELECT
    name,
    money_received,
    money_sent,
    (money_received - money_sent) AS balance,
    address
FROM table_customers c
JOIN accounts_table a
    ON a.customerid = c.customer_id

GS View

Account>>balance
    ^money_received - money_sent

Customer>>account
    "^Accounts at: customer_id"
    ^account

Customers view: #(
    name
    (account money_received)
    (account money_sent)
    (account balance)
    address
)
SQL View vs. 1.0 GS View
• JOIN is achieved through unary messages through the
object graph.
• OUTER JOIN is not part of 1.0.
• GROUP BY support is not complete in 1.0.
• WHERE is achieved with a separate #select:
statement.
• GS Views can allow updates, but SQL views also allow
INSERT and DELETE.
• SQL result functions like AVG, COUNT, MAX, MIN, SUM
are not part of 1.0.
• 1.0 GS Views can only retrieve attribute values from a
sequence of unary messages (not binary or
keyword).
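
To make the JOIN and WHERE points concrete, here is a hedged Python sketch that runs the SQL from the comparison slide against an in-memory SQLite database and then produces the same rows by navigating objects. The table contents, the `> 60` filter, and the Python classes are assumptions made for illustration; only the table and column names come from the slide.

```python
import sqlite3
from dataclasses import dataclass

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table_customers (customer_id INTEGER, name TEXT, address TEXT);
CREATE TABLE accounts_table (customerid INTEGER, money_received INTEGER,
                             money_sent INTEGER);
INSERT INTO table_customers VALUES (1, 'Adam', 'anAddress');
INSERT INTO table_customers VALUES (2, 'Fred', 'anAddress');
INSERT INTO accounts_table VALUES (1, 300, 200);
INSERT INTO accounts_table VALUES (2, 600, 550);
""")
sql_rows = con.execute("""
SELECT c.name, a.money_received, a.money_sent,
       (a.money_received - a.money_sent) AS balance, c.address
FROM table_customers c
JOIN accounts_table a ON a.customerid = c.customer_id
WHERE (a.money_received - a.money_sent) > 60
ORDER BY c.customer_id
""").fetchall()

@dataclass
class Account:
    money_received: int
    money_sent: int

    @property
    def balance(self):
        return self.money_received - self.money_sent

@dataclass
class Customer:
    name: str
    account: Account
    address: str

customers = [Customer("Adam", Account(300, 200), "anAddress"),
             Customer("Fred", Account(600, 550), "anAddress")]

# WHERE becomes a separate selection step before the view is built.
selected = [c for c in customers if c.account.balance > 60]
# JOIN becomes navigation through the object graph (customer -> account).
obj_rows = [(c.name, c.account.money_received, c.account.money_sent,
             c.account.balance, c.address) for c in selected]

print(sql_rows == obj_rows)
```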
1.0 Objectives
• Quickly replicate some attributes of a large GS
collection to a client image for display.
• Minimize the time spent gathering attribute values.
• Reduce server paging costs.
• Avoid creating an object in GS for each result row.
• Return data to client as needed (in chunks of rows).
• Avoid replicating entire graphs of objects just to return
some attribute values.
• Easily control replication without replication specs.
GS View 1.0 Syntax
aCollection
    [by: chunkSize]
    view: #([replicationLevels] [attributeAlias @] attributeGetter ...)

chunkSize declares the number of rows that are replicated in each client traversal. 0 disables chunking.

replicationLevels is an integer 0-6 (default is 1) that controls the depth that an attribute is replicated to a client image. 0 avoids replicating the object.

attributeAlias is a symbol that a row attribute can be known as; the default is the last symbol of the attributeGetter.

attributeGetter is either a symbol or an array of symbols that will be sent to each object.

EXAMPLE query from GBS client image:

GBSM evaluate: '
    (Globals select: [:ea | ea isBehavior ])
        by: 50 view: #(
            name
            2 instVarNames
            2 instSelectors @ selectors
            2 classVarNames
            2 classSelectors @ (class selectors)
            0 yourself
        )'.
Traditional Result Sets
Collect to indexed arrays

GBSM evaluate: '
    (Globals select: [:ea | ea isBehavior])
        collect: [:ea |
            #[ea name,
                ea instVarNames,
                ea selectors,
                ea classVarNames,
                ea class selectors,
                ea]
        ].'.

Collect to keyed collection

GBSM evaluate: '
    (Globals select: [:ea | ea isBehavior ])
        collect: [:ea |
            Dictionary new
                at: #name put: (ea name);
                at: #instVarNames
                    put: (ea instVarNames);
                at: #selectors put: (ea selectors);
                at: #classVarNames
                    put: (ea classVarNames);
                at: #classSelectors
                    put: (ea class selectors);
                at: #yourself put: ea;
                yourself
        ].'.

Collect to row objects

GBSM evaluate: '
    (Globals select: [:ea | ea isBehavior ])
        collect: [:ea |
            ClassQueryRow newForClass: ea
        ].'.
Traditional Result Sets
• All three styles incur the cost of creating (and later disposing of) at
least one object for each row. The ‘keyed collection’ style also has
hash costs.
• Only the ‘row objects’ style can be extended to apply changes
back to the original object model.
• Attribute values are not easily refreshed to reflect object changes.
• Have maintenance costs:
– ‘indexed arrays’ style seems dynamic but requires application
code to know field position.
– Row objects are not dynamic and require class connectors.
• Replication control
– None have chunked replication.
– It is difficult to control replication depth of indexed and keyed
rows.
GS View Row

• Has all the advantages of the three Traditional


Result Set styles.
• Has distributed workload and chunked replication.
• Avoids cost of creating a GS object for each row
attribute, yet still preserves row identity in
client image.
• Reduced client memory footprint (only one chunk
at a time plus any individual rows strongly
referenced).
• Row attributes can be refreshed to reflect the
current object state.
Rows are Special

• Views are composed of an index for


each attribute--not row instances.
• Row instances are only created on-demand
in GS or Client.
• Identity is preserved for rows created in
the client.
• Equality is preserved for rows created in
GS.
• This is the most efficient design for how
views are intended to be used.
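
The design described above (one index per attribute, rows created only on demand, identity preserved) can be sketched in Python. This is a model under stated assumptions, not the GS or GBS code: `ColumnView` and `Row` are hypothetical names, and a weak-value dictionary stands in for "only rows that are strongly referenced are kept".

```python
import weakref
from types import SimpleNamespace

class Row:
    """A lightweight handle: the values stay in the view's columns."""
    def __init__(self, view, index):
        self._view = view
        self._index = index

    def __getattr__(self, name):
        if name.startswith("_"):          # avoid recursion on private names
            raise AttributeError(name)
        try:
            return self._view.columns[name][self._index]
        except KeyError:
            raise AttributeError(name) from None

class ColumnView:
    """Stores one list per attribute instead of one object per row."""
    def __init__(self, objects, attribute_names):
        objects = list(objects)
        self.columns = {n: [getattr(o, n) for o in objects]
                        for n in attribute_names}
        self._size = len(objects)
        self._rows = weakref.WeakValueDictionary()  # rows still referenced

    def row(self, index):
        if not 0 <= index < self._size:
            raise IndexError(index)
        row = self._rows.get(index)
        if row is None:                   # create the row only when asked for
            row = Row(self, index)
            self._rows[index] = row
        return row

people = [SimpleNamespace(name="Adam", address="anAddress"),
          SimpleNamespace(name="Fred", address="anAddress")]
view = ColumnView(people, ["name", "address"])
fred = view.row(1)
print(fred.name, view.row(1) is fred)
```

Asking twice for the same row returns the same object while it is referenced, which is the identity guarantee described for client rows.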
Better Replication

• You can specify a view-specific fault


level for each attribute.
• Fewer class connectors are needed
• You can return attributes through a
graph of GS objects that lack a
client representation (like class and
connector).
Attribute Replication Levels
Without explicit levels:

Customers view: #(
    name
    (account money_received)
    (account money_sent)
    (account balance)
    address
)

With explicit levels:

Customers view: #(
    1 name
    1 (account money_received)
    1 (account money_sent)
    1 (account balance)
    0 address
)

• 0 replicationLevels leaves the object in GS—


typically a stub in VW.
• Used to keep a row associated with objects
without replicating them.
• If an attribute value is already client-replicated
then it stays replicated.
Attribute Values Are a Snapshot

• Attribute values do not automatically
refresh when the GS object they came
from changes.
• Can be useful if you want to see and edit a
consistent view of data regardless of
subsequent GS changes.
• You can refresh attribute values.
• You can compare attribute values with
current object state.
Views Can Be Refreshed
aView refresh
aView refreshAttributes
• Requires #yourself as attribute.
• #refresh only does something if
#yourself is an attribute.
• #refreshAttributes warns if #yourself
is not an attribute.
• Number of rows does not (normally)
change.
Rows Can Apply Changes
row name: 'Fred'
row asForwarder name: 'Fred'

Requires #yourself as attribute.

Row change goes to both row
attributes and to object.

If you abort object changes then
attribute changes endure (until you
#refresh).
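
The snapshot, refresh, and apply-changes behavior can be modeled with a short Python sketch. It assumes that a forwarded change writes to both the row attribute and the underlying object, and it simulates an abort by resetting the object by hand; `SnapshotView`, `refresh_attributes`, and `apply_change` are hypothetical names, not the GS View API.

```python
from types import SimpleNamespace

class SnapshotView:
    def __init__(self, objects, attribute_names):
        self.names = list(attribute_names)
        self.yourself = list(objects)      # models the `yourself` attribute
        self.refresh_attributes()

    def refresh_attributes(self):
        """Re-read every attribute value from the underlying objects."""
        self.columns = {n: [getattr(o, n) for o in self.yourself]
                        for n in self.names}

    def value(self, index, name):
        return self.columns[name][index]

    def apply_change(self, index, name, value):
        """Write to the row attribute and to the object it came from."""
        self.columns[name][index] = value
        setattr(self.yourself[index], name, value)

customer = SimpleNamespace(name="Adam")
view = SnapshotView([customer], ["name"])

customer.name = "Bob"                      # the object changes afterwards
before_refresh = view.value(0, "name")     # the snapshot still holds "Adam"
view.refresh_attributes()
after_refresh = view.value(0, "name")

view.apply_change(0, "name", "Fred")       # goes to both row and object
customer.name = "Bob"                      # simulated abort of the object change
after_abort = view.value(0, "name")        # the row attribute endures
view.refresh_attributes()
after_second_refresh = view.value(0, "name")
print(before_refresh, after_refresh, after_abort, after_second_refresh)
```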
Chunked Replication
aCollection
by: chunkSize
view: #([replicationLevels] [attributeAlias @] attributeGetter ...)

chunkSize declares how many rows are replicated in each client traversal. A
chunk size of zero disables chunking.

• Best for information displayed to users.


• Can achieve amazing response time and efficiency.
• Client image only references one chunk of rows at a time.
• Good for attributes that GS is slow at gathering.
• Slower if client image immediately requests other chunks.
• Rows referenced in client maintain identity, and cache attribute
values from prior chunks.
• Be careful with sequenceable collections containing objects that
change position.
View-Attribute-Chunk Structure
name $received $sent $balance address
Adam $300 $200 $100 anAddress
Fred $600 $550 $50 anAddress
Chelsea $1002 $902 $100 anAddress
Susan $192 $192 $0 anAddress
Trish $4711 $4711 $0 anAddress
Rico $192 $192 $0 anAddress
Warren $5992 $4992 $1000 anAddress
Helter $1843 $1843 $0 anAddress
Haley $1119 $1119 $0 anAddress
Mattox $691 $691 $0 anAddress
Troy $422 $400 $22 anAddress
Edmund $1919 $1919 $0 anAddress
Tyrel $6114 $6114 $0 anAddress
Oliver $1201 $1201 $1201 anAddress
Bret $592 $500 $92 anAddress
Chunking – How it Works

1. A view that is created will pause after the first chunk of
rows is processed. A suspended GS process waits to
resume chunk gathering.
2. First chunk is replicated to client along with view.
3. Client automatically forks a process that asks GS to
resume chunk gathering.
4. Client is able to use the first chunk while GS operates in
parallel to gather remaining chunks.
5. When client asks for a row that is not replicated (or
cached) then it asks GS for the chunk for that row.
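
The five steps above can be approximated in a single Python process, with a background thread standing in for the GS gathering process. This is a sketch under simplifying assumptions: there is no client/server split, every gathered chunk is kept (the real client references one chunk at a time), and `chunk_size` must be at least 1. `ChunkedView` and `wait_for_remaining` are hypothetical names.

```python
import threading

class ChunkedView:
    def __init__(self, source, chunk_size, getter):
        self._source = list(source)
        self._size = chunk_size
        self._getter = getter
        self._chunks = {}
        self._lock = threading.Lock()
        self._gather(0)                      # steps 1-2: first chunk up front
        self._worker = threading.Thread(target=self._gather_rest, daemon=True)
        self._worker.start()                 # step 3: resume in the background

    def _gather(self, n):
        """Gather chunk n once; later callers reuse the stored result."""
        with self._lock:
            if n not in self._chunks:
                start = n * self._size
                items = self._source[start:start + self._size]
                self._chunks[n] = [self._getter(o) for o in items]
            return self._chunks[n]

    def _gather_rest(self):
        count = -(-len(self._source) // self._size)   # ceiling division
        for n in range(1, count):            # step 4: runs while client works
            self._gather(n)

    def __getitem__(self, index):
        # Step 5: a row outside the gathered chunks triggers that chunk.
        return self._gather(index // self._size)[index % self._size]

    def wait_for_remaining(self):
        self._worker.join()

view = ChunkedView(range(25), 10, lambda x: x * x)
first = view[0]          # available immediately from the first chunk
last = view[24]          # gathered on demand if the worker has not reached it
view.wait_for_remaining()
print(first, last, sorted(view._chunks))
```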
[Diagram: chunk gathering across four processes: Client Window Process, Client Gathering Process, Server/Gem Gathering Process, Server/Gem Default Process. Steps shown: Request; Create View; Gather chunk then wait; Fork Process; Show View; Next chunk until complete; Use active chunk; Needs different chunk; Chunk Gathered?; Gather specific chunk then wait; Activate chunk.]
Chunking – Replication Efficiency

• Client image transports one chunk of row


attributes at a time.
• Fewer objects are contained in the GBS
cache and the GS export set.
• Less memory is used.
• User experiences less of a delay.
• Faster when GS takes time gathering
attribute values AND the client has some
delay in using all the attribute values.
Chunking – Feels Faster
Chunking provides a quick first response and processes
remaining items in the background. It is best for information
that is displayed.
Without chunking (by: 0):

| t1 t2 t3 delegate view |
t1 := Time millisecondsToRun: [
    delegate := GBSM execute: '
        AllUsers by: 0 view: #(
            userId
            2 symbolListNames @ (symbolList names)
            0 yourself
        )'].
t2 := Time millisecondsToRun:
    [ view := delegate asLocalObjectToLevel: 1 ].
t3 := Time millisecondsToRun:
    [ view inspect ].
^Array with: t1 with: t2 with: t3 with: (t1 + t2 + t3)

With chunking (by: 10): the same code, except for
        AllUsers by: 10 view: #(

                      by: 0       by: 10
GS execution          9531 ms     174 ms
replication           21 ms       4 ms
client processing     93 ms       91 ms
total                 9645 ms     269 ms (97% faster)
Chunking – Can be Slower
Chunking uses more traversals. Chunking can be
slower when client immediately uses row attributes.
Without chunking (by: 0):

| t1 t2 t3 delegate view |
t1 := Time millisecondsToRun: [
    delegate := GBSM execute: '
        AllUsers by: 0 view: #(
            userId
            2 symbolListNames @ (symbolList names)
            0 yourself
        )'].
t2 := Time millisecondsToRun:
    [ view := delegate asLocalObjectToLevel: 1 ].
t3 := Time millisecondsToRun:
    [view collect: [:row | row symbolListNames] ].
^Array with: t1 with: t2 with: t3 with: (t1 + t2 + t3)

With chunking (by: 10): the same code, except for
        AllUsers by: 10 view: #(

                      by: 0       by: 10
GS execution          9698 ms     219 ms
replication           22 ms       4 ms
client processing     0 ms        10031 ms
total                 9720 ms     10254 ms (6% slower)
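
The totals and percentages in the two timing comparisons can be rechecked with a few lines of Python. The millisecond figures are taken from the slides; the helper names are hypothetical. The second comparison works out to roughly 5.5% relative to the unchunked total, which the slide reports rounded up as 6%.

```python
def total(gs_ms, replication_ms, client_ms):
    """Sum the three measured phases of one run."""
    return gs_ms + replication_ms + client_ms

def percent_change(baseline_ms, other_ms):
    """Positive means slower than the baseline, negative means faster."""
    return (other_ms - baseline_ms) / baseline_ms * 100

# "Feels Faster": the client only inspects the view.
inspect_unchunked = total(9531, 21, 93)
inspect_chunked = total(174, 4, 91)

# "Can be Slower": the client immediately collects from every row.
collect_unchunked = total(9698, 22, 0)
collect_chunked = total(219, 4, 10031)

inspect_change = percent_change(inspect_unchunked, inspect_chunked)
collect_change = percent_change(collect_unchunked, collect_chunked)
print(round(inspect_change, 1), round(collect_change, 1))
```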
Chunking – Not Always Good

• Client references only one chunk at a
time.
• Not good when client immediately
needs other chunks.
• Not good for rapid random row
access on client—like sorting.
• Has more traversal costs (if you use
all rows).
Views on Non-Sequenceable Collections

• Contents are copied to a sequenceable
collection as the view is created.
• View retains no knowledge of the
original collection.
• The copying of contents to a
sequenceable collection is the
biggest performance hit that views
can experience (entirely due to disk
page reads).
Views on Sequenceable Collections

• Much faster than non-sequenceable collections.


• A view on a sequenceable collection avoids the
page read cost of copying contents.
• The collection you created the view for is
enumerated to gather attribute values.
• Do not create a chunked view of a sequenceable
collection that contains objects that change
position. Changes could affect background
chunk gathering.
• Workaround options:
– Avoid chunking for that view.
– Create a view for a copy of the collection.
– Send #gathering_waitForRemaining before
changing the collection.
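
The position-change hazard and the "view of a copy" workaround can be shown with a deterministic Python sketch. It is an illustration only: gathering is done in one thread, and the collection is changed by a callback between the first and second chunk to imitate a change during background gathering. All names are hypothetical.

```python
def gather_in_chunks(items, chunk_size, after_first_chunk=None):
    """Read items chunk by chunk, by position, as a chunked view would."""
    gathered = []
    for start in range(0, len(items), chunk_size):
        gathered.extend(items[start:start + chunk_size])
        if start == 0 and after_first_chunk is not None:
            after_first_chunk()            # the collection changes mid-gather
    return gathered

live = ["a", "b", "c", "d"]

def move_first_to_end():
    live.append(live.pop(0))               # "a" changes position

# Gathering from the live collection: "a" is seen twice and "c" is missed.
from_live = gather_in_chunks(live, 2, move_first_to_end)

# Workaround: gather from a copy taken before any change happens.
live[:] = ["a", "b", "c", "d"]
snapshot = list(live)
from_copy = gather_in_chunks(snapshot, 2, move_first_to_end)

print(from_live, from_copy)
```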
Ideal when...
• Client needs to display some attributes of a large
collection in GS.
• Client does not attempt to sort the results.
• Chunk size is greater than the number of attributes but
less than 2035.
• Query is dynamic and not easily maintained with
replication specs or class connectors.
• Server has gathering cost and client can use some
results before all are available.
• Collection is sequenceable.
• A snapshot of attribute values is desired.
Future Directions
• Attribute change reporting.
• View sorting.
• Outer Joins.
– *companies
• Attributes from functions.
– balance $(money_received - money_sent)
• Persistent managed shared views.
– A new kind of multi-indexed RC collection
• Use of attribute relation grouping.
– row group: #(marketType product)
