
R-TREES

E0 261
Jayant Haritsa
Computer Science and Automation
Indian Institute of Science
JAN 2010 R-TREES Slide 1

Multidimensional Access Methods


n-D point data: PAM (point access method)
n-D volume data: SAM (spatial access method)

Data Geometry to Index Mappings


1-D point index : B-tree
n-D point index : Grid File
n-D spatial object index : R-tree
  polygons, polylines

Spatial Applications

Computer Aided Design (CAD)
  p-wells that are within 1 micron of a clock line

Geographic applications (GIS)
  Find all rivers that go through Karnataka (Line-Region intersection)
  Find all forests that lie within Karnataka (Region-Region Overlap)
  Find the ten nearest cities to Bangalore (Nearest Neighbor)
  Find all villages within 10 miles of a metropolis (Spatial Join)

Low Dim

Spatial Applications (contd)

Multimedia Applications
  feature vectors of images, e.g. X-rays
  High Dim point

Biological Databases
  Protein Folding
  Sequence clustering

Solution Techniques

Multi-dimensional SAM index
  R-tree, R*-tree, R+-tree, P-tree, s-k-d tree, ...

or, map everything to points by

Space-filling curves
  Hilbert, Peano, Z, Gray, [TIDS course]

Raise to higher-dimensional space
  Rectangle is a point in 4-D space
  Problem: Spatial Operations become difficult
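As a small illustration of the space-filling-curve idea, the sketch below computes a Z-order (Morton) key by interleaving the bits of two non-negative integer coordinates, so that a 2-D point maps to a single 1-D key. The function name `z_order` and the 16-bit default are assumptions made for this example; they do not come from the slides.

```python
def z_order(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order key.

    Bit i of x lands at position 2*i, bit i of y at position 2*i + 1.
    Assumes x and y are non-negative integers (an assumption of this sketch).
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


if __name__ == "__main__":
    # Walk a 2x2 grid: the keys follow the familiar "Z" shape.
    for y in range(2):
        for x in range(2):
            print((x, y), z_order(x, y))
```

The resulting keys can be stored in an ordinary 1-D index such as a B-tree, which is the point of mapping everything to points.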

R(egion)-Tree
Balanced (similar to B+ tree)
I is an n-dimensional rectangle of the form (I0, I1, ..., In-1) where Ii is a range [a,b]
Leaf node index entries: (I, tuple_id)
Non-leaf node entry: (I, child_ptr)
M is maximum entries per node.
m ≤ M/2 is the minimum entries per node.
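The entry layout above can be sketched as plain Python data classes. This is only one possible in-memory representation: the class names `Entry` and `Node`, the `is_leaf` flag, and the helper `mbr` are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Interval = Tuple[float, float]   # one range [a, b]
Rect = Tuple[Interval, ...]      # (I0, I1, ..., In-1)


@dataclass
class Entry:
    rect: Rect                        # the rectangle I
    tuple_id: Optional[int] = None    # used by leaf entries: (I, tuple_id)
    child: Optional["Node"] = None    # used by non-leaf entries: (I, child_ptr)


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)


def mbr(rects: List[Rect]) -> Rect:
    """Smallest rectangle that encloses every rectangle in `rects`."""
    dims = len(rects[0])
    return tuple(
        (min(r[d][0] for r in rects), max(r[d][1] for r in rects))
        for d in range(dims)
    )


if __name__ == "__main__":
    leaf = Node(is_leaf=True, entries=[
        Entry(((0, 1), (0, 2)), tuple_id=7),
        Entry(((2, 3), (1, 4)), tuple_id=8),
    ])
    root = Node(is_leaf=False,
                entries=[Entry(mbr([e.rect for e in leaf.entries]), child=leaf)])
    print(root.entries[0].rect)
```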

Invariants
Every leaf (non-leaf) has between m and M records (children) except for the root.
Root has at least two children unless it is a leaf.
For each leaf (non-leaf) entry, I is the smallest rectangle that contains the data objects (children).
All leaves appear at the same level.
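The invariants can be turned into a small checker, which is handy when testing an implementation. The sketch below assumes 2-D rectangles stored as `(xlo, ylo, xhi, yhi)` tuples and nodes holding `(rect, payload)` pairs; the names `Node`, `cover` and `check_invariants` are hypothetical. The tightness of leaf rectangles around the actual data objects is not checked, because the objects themselves are not modelled here.

```python
class Node:
    def __init__(self, is_leaf, entries):
        self.is_leaf = is_leaf
        # (rect, payload) pairs; payload is a tuple id in a leaf, a child Node otherwise
        self.entries = entries


def cover(rects):
    """Tight bounding rectangle of a non-empty list of 2-D rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def check_invariants(node, m, M, is_root=True):
    """Return the height below `node`; raise ValueError if an invariant fails."""
    n = len(node.entries)
    if is_root:
        # Root: at most M entries, and at least two children unless it is a leaf.
        if n > M or (not node.is_leaf and n < 2):
            raise ValueError("root violates the fan-out rule")
    elif not (m <= n <= M):
        raise ValueError("node has fewer than m or more than M entries")
    if node.is_leaf:
        return 0
    depths = set()
    for rect, child in node.entries:
        if rect != cover([r for r, _ in child.entries]):
            raise ValueError("entry rectangle is not the tight MBR of its child")
        depths.add(check_invariants(child, m, M, is_root=False))
    if len(depths) != 1:
        raise ValueError("leaves are not all at the same level")
    return depths.pop() + 1
```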

Example: Figure 3.1a

Example: Figure 3.1b

Differences with B-trees


Entries reflect containment, not < or >
Keys are not disjoint - they may themselves overlap
No notion of adjacency between nodes
Number of keys per node is typically small because size of each key is 2d values
Permit m < M/2
  this could be done in B-tree as well

Differences with B-trees (contd)

Searching:
  requires traversal of multiple paths
  need to compare with all entries in a node
  no right-pointers at leaf
  early termination possible
  Not storing real values but MBR approximations
  Only gives candidate answers, not results
    overlap between containers distinct from overlap between objects

AdjustTree during Insert has to go from leaf to root, even if not split case

Searching (Intersection)
Given search rectangle S, find objects in DB that intersect S.
Given search rectangle S, find index records whose MBRs intersect S.
Start at root and locate all child nodes whose rectangle I intersects S (via linear search).
Search the subtrees of those child nodes.
  Search strategy?
When you get to the leaves, return entries whose rectangles intersect S.
Search may require inspecting several paths.
Worst case running time is not so good ...
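The search procedure can be sketched as a short recursive function. It assumes 2-D rectangles stored as `(xlo, ylo, xhi, yhi)` tuples, treats rectangles that merely touch as intersecting, and uses a depth-first traversal; all three are choices made for this example, and the names `Node`, `intersects` and `search` are hypothetical.

```python
class Node:
    def __init__(self, is_leaf, entries):
        self.is_leaf = is_leaf
        # (rect, payload) pairs; payload is a tuple id in a leaf, a child Node otherwise
        self.entries = entries


def intersects(a, b):
    """True if rectangles a and b overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node, s):
    """Return the tuple ids of all leaf entries whose MBR intersects s."""
    hits = []
    for rect, payload in node.entries:      # linear scan over every entry
        if intersects(rect, s):
            if node.is_leaf:
                hits.append(payload)        # candidate answer
            else:
                hits.extend(search(payload, s))  # may follow several paths
    return hits
```

The ids returned are only candidates: an MBR can intersect S even when the object inside it does not, so a refinement step against the real geometry is still needed.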

S = R16 (Intersection)

Insertion

Insertion is done at the leaves.
Where to put new index entry E with rectangle R?
  Start at root.
  Go down the tree by choosing child whose rectangle needs the least enlargement to include R.
  If there is room in the correct leaf node, insert it.
  Otherwise split the node.
  Adjust the tree.
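The descent step, choosing the child that needs the least enlargement, can be sketched as follows. Rectangles are assumed to be 2-D `(xlo, ylo, xhi, yhi)` tuples and the names are hypothetical. Breaking ties by the smaller rectangle area follows Guttman's original ChooseLeaf description; the slides do not mention a tie-break rule.

```python
class Node:
    def __init__(self, is_leaf, entries):
        self.is_leaf = is_leaf
        # (rect, payload) pairs; payload is a tuple id in a leaf, a child Node otherwise
        self.entries = entries


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(rect, new_rect):
    """Extra area rect would need in order to include new_rect."""
    return area(union(rect, new_rect)) - area(rect)


def choose_leaf(root, new_rect):
    """Walk from the root to the leaf that should receive new_rect."""
    node = root
    while not node.is_leaf:
        # Least enlargement first; ties go to the entry with the smaller area.
        _, node = min(node.entries,
                      key=lambda e: (enlargement(e[0], new_rect), area(e[0])))
    return node
```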

Adjusting the tree

1. N = leaf node. If there was a split, then NN is the other node.
2. If N is root, stop. Otherwise P = N's parent and EN is its entry for N. Adjust the rectangle for EN to tightly enclose N.
3. If NN exists, add entry ENN to P. ENN points to NN and its rectangle tightly encloses NN. If necessary, split P.
4. Set N=P and go to step 2.
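A minimal sketch of the no-split path of AdjustTree is shown below: after a leaf changes, the covering rectangle in each ancestor is recomputed on the way up to the root. The parent pointer, the mutable `[rect, payload]` entry layout and the function names are assumptions of this example, and the handling of NN (the split case) is deliberately left out.

```python
class Node:
    def __init__(self, is_leaf, parent=None):
        self.is_leaf = is_leaf
        self.parent = parent
        self.entries = []   # [rect, payload] lists, so a rect can be updated in place


def cover(node):
    """Tight bounding rectangle of all entries in node (2-D tuples)."""
    rects = [e[0] for e in node.entries]
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def adjust_tree(n):
    """Propagate tightened rectangles from node n up to the root (no splits)."""
    while n.parent is not None:          # stop once N is the root
        p = n.parent
        for entry in p.entries:
            if entry[1] is n:            # EN: the parent's entry for N
                entry[0] = cover(n)
        n = p                            # set N = P and repeat
```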

Deletion
1. Find entry to delete and remove from leaf L.
2. Set N=L and Q = ∅. (Q is set of eliminated nodes)
3. Let P be N's parent and EN be the entry that points to N. If N has fewer than m entries, delete EN from P and add N to Q.
4. If N has at least m entries then set the rectangle of EN to tightly enclose N.
5. Set N=P and repeat from step 3.
6. *Reinsert entries from eliminated leaves. Insert non-leaf entries higher up so that all leaves are at the same level.
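The upward pass of deletion (steps 3 to 5) can be sketched as below. It returns the set Q of eliminated nodes and leaves the reinsertion itself (step 6) to the caller. The parent pointers, the `[rect, payload]` entry layout and the name `condense_tree` are assumptions of this example.

```python
class Node:
    def __init__(self, is_leaf, parent=None):
        self.is_leaf = is_leaf
        self.parent = parent
        self.entries = []   # [rect, payload] lists, so a rect can be updated in place


def cover(node):
    """Tight bounding rectangle of all entries in node (2-D tuples)."""
    rects = [e[0] for e in node.entries]
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def condense_tree(leaf, m):
    """Walk from leaf to root after a delete; return the eliminated nodes Q."""
    q = []
    n = leaf
    while n.parent is not None:
        p = n.parent
        idx = next(i for i, e in enumerate(p.entries) if e[1] is n)
        if len(n.entries) < m:
            del p.entries[idx]            # underfull: drop EN and remember N
            q.append(n)
        else:
            p.entries[idx][0] = cover(n)  # otherwise just tighten EN
        n = p
    return q
```

A caller would then reinsert the entries of every node in Q, placing entries from eliminated non-leaf nodes at their original level.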

Why Reinsert?
Nodes can be merged with sibling whose area will increase the least, or entries can be redistributed.
In any case, nodes may need to be split.
Reinsertion is easier to implement.
Reinsertion refines the spatial structure of the tree and reduces the effect of skew.
Entries to be reinserted are likely to be in memory because their pages are visited during the search to find the index to delete.

Splitting Nodes
Problem: Divide M+1 entries among two nodes so that it is unlikely that the nodes are needlessly examined during a search.
Solution: Minimize total area of the covering rectangles for both nodes.
  Exponential algorithm.
  Quadratic algorithm.
  Linear time algorithm.

Exhaustive Search
Try all possible combinations
  $\binom{M+1}{m} \cdot \binom{M+1-m}{m} \cdot 2^{M+1-2m}$
  Includes repetitions: can you come up with the correct formula?

Optimal results! Bad running time!
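For small nodes the exhaustive split is easy to write down directly. The sketch below enumerates every way to divide the entries into two groups of at least m entries each and keeps the division with the smallest total covering area. Rectangles are assumed to be 2-D `(xlo, ylo, xhi, yhi)` tuples and the function names are hypothetical. Entry 0 is always placed in the first group, a device of this sketch to avoid visiting each split twice with the groups swapped.

```python
from itertools import combinations


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def cover(rects):
    """Tight bounding rectangle of a non-empty list of 2-D rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def exhaustive_split(rects, m):
    """Return (total_area, group1, group2) with minimal total covering area."""
    n = len(rects)
    best = None
    for k in range(m, n - m + 1):                  # size of the group holding entry 0
        for rest in combinations(range(1, n), k - 1):
            g1 = (0,) + rest
            g2 = tuple(i for i in range(n) if i not in g1)
            cost = (area(cover([rects[i] for i in g1])) +
                    area(cover([rects[i] for i in g2])))
            if best is None or cost < best[0]:
                best = (cost, g1, g2)
    return best
```

The number of candidate splits grows exponentially with the node size, which is why the quadratic and linear heuristics exist.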

Quadratic Algorithm
1. Find pair of entries E1 and E2 that maximizes area(J) - area(E1) - area(E2) where J is covering rectangle of E1 and E2 (i.e. maximizes wasted area).
2. Put E1 in one group, E2 in the other.
3. If one group has M-m+1 entries, put the remaining entries into the other group and stop. If all entries have been distributed then stop.
4. For each entry E, calculate d1 and d2 where di is the area increase in covering rectangle of Group i when E is added.
5. Find E with maximum |d1 - d2| and add E to the group whose area will increase the least (i.e. maximum affinity).
6. Repeat starting with step 3.
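The quadratic split can be sketched as below, working on indices into a list of M+1 rectangles. Rectangles are assumed to be 2-D `(xlo, ylo, xhi, yhi)` tuples, the function names are hypothetical, and ties (equal waste, equal |d1 - d2|, or d1 = d2) are resolved by simply taking the first candidate, which is a simplification made for this example.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def quadratic_split(rects, m):
    """Split M+1 rectangles into two groups of indices, each with >= m members."""
    M = len(rects) - 1
    pairs = [(i, j) for i in range(len(rects)) for j in range(i + 1, len(rects))]
    # Seeds: the pair that would waste the most area if kept together.
    s1, s2 = max(pairs, key=lambda p: area(union(rects[p[0]], rects[p[1]]))
                 - area(rects[p[0]]) - area(rects[p[1]]))
    groups = [[s1], [s2]]
    covers = [rects[s1], rects[s2]]
    rest = [i for i in range(len(rects)) if i not in (s1, s2)]

    def growth(i):
        # [d1, d2]: area increase of each group's cover if entry i joins it.
        return [area(union(covers[g], rects[i])) - area(covers[g]) for g in (0, 1)]

    while rest:
        full = [g for g in (0, 1) if len(groups[g]) == M - m + 1]
        if full:                          # the other group needs everything left
            groups[1 - full[0]].extend(rest)
            break
        pick = max(rest, key=lambda i: abs(growth(i)[0] - growth(i)[1]))
        d = growth(pick)
        g = 0 if d[0] <= d[1] else 1      # group whose area increases the least
        groups[g].append(pick)
        covers[g] = union(covers[g], rects[pick])
        rest.remove(pick)
    return groups
```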

Quadratic (contd)
Algorithm is quadratic in M.
Linear in number of dimensions.
But not optimal.

Linear Algorithm
For each dimension, choose the pair of rectangles that have the maximum distance between them (w.r.t. edges).
Normalize by dividing the distance by the width of entire set along that dimension.
Put the two entries with largest normalized separation (along any dimension) into different groups.
Randomly choose next non-assigned point and then put it in group with lesser area increase.
Algorithm is linear, almost no attempt at optimality.
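The seed-picking part of the linear split can be sketched as follows. Along each dimension it takes the rectangle with the highest low edge and the one with the lowest high edge, normalizes their separation by the extent of the whole set, and keeps the best pair. Rectangles are assumed to be 2-D `(xlo, ylo, xhi, yhi)` tuples; the name `linear_pick_seeds`, the fallback to entries 0 and 1, and the skipping of degenerate dimensions are assumptions of this example. Assigning the remaining entries is not shown.

```python
def linear_pick_seeds(rects):
    """Return indices of two seed rectangles with the largest normalized separation."""
    best = (float("-inf"), 0, 1)          # fallback pair if every dimension is degenerate
    idx = range(len(rects))
    for d in range(2):                    # dimension 0 is x, dimension 1 is y
        lo, hi = d, d + 2
        highest_low = max(idx, key=lambda i: rects[i][lo])
        lowest_high = min(idx, key=lambda i: rects[i][hi])
        width = max(r[hi] for r in rects) - min(r[lo] for r in rects)
        if width == 0 or highest_low == lowest_high:
            continue                      # no usable pair along this dimension
        sep = (rects[highest_low][lo] - rects[lowest_high][hi]) / width
        if sep > best[0]:
            best = (sep, lowest_high, highest_low)
    return best[1], best[2]
```

Each dimension needs only a single pass over the entries, which is where the linear running time comes from.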

Performance Tests
CENTRAL circuit cell (1057 rectangles)
Measure performance on last 10% inserts.
Search used randomly generated rectangles that match about 5% of the data.
Delete every 10th data item.

Performance

Conclusions
Linear time splitting algorithm is almost as good as the others.
Low node-fill requirement reduces space utilization but is not significantly worse than stricter node-fill requirements.
R-tree can be added to relational databases.

Questions
Why choose rectangle as bounding structure? Why not some other object, for example, sphere?
Why isn't E always the best in Figures 4.4-4.6 (search performance)?

END R-TREES

