This set of software implements the rectangular object manipulation algorithm
described in the paper "Object Manipulation for Document Conversion" (Romero
and Thibadeau, Proceedings of ICIP-95, or see
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/pcvision/RATlib/www/ratlib.html).
This software is being distributed under the GNU General Public License.


Overview
========
The routines given here implement a general method of accessing objects based
on 2-dimensional rectangles. Rectangles can be represented in two equivalent
ways. For algorithmic purposes, it is useful to visualize the rectangle using
the center point (x, y) and the half-width and half-height (w_2, h_2). This is
what is used in the paper referenced above.

However, because of integer round-off, it is easier from a programmer's
perspective to use instead the top-left and bottom-right corners of the
rectangle, (x_1, y_1) and (x_2, y_2), where x_2 >= x_1 and y_2 >= y_1. The two
representations are related by

	2 * (x, y) = (x_1 + x_2, y_1 + y_2)
	2 * (w_2, h_2) = (x_2 - x_1, y_2 - y_1)

I suggest storing the corners of the rectangle yourself and passing these
routines the 4-tuple (x_1 + x_2, y_1 + y_2, x_2 - x_1, y_2 - y_1), that is,
twice the center and twice the half-size. It's not too hard to go back and
forth between the different representations, and this one never divides by
two, so it won't have any aliasing or round-off problems.
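
As a rough sketch of going back and forth between the two forms, the helpers
below convert corners to the doubled 4-tuple and back. The type and function
names (Corners, Doubled, corners_to_doubled, doubled_to_corners) are made up
for this illustration and are not part of ratlib.h; int32_t stands in for the
library's int32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper types for illustration only; not part of ratlib.h. */
typedef struct { int32_t x1, y1, x2, y2; } Corners;
typedef struct { int32_t x, y, w, h; } Doubled; /* (x1+x2, y1+y2, x2-x1, y2-y1) */

/* Corners -> doubled center/size. Only additions and subtractions,
 * so nothing is lost to integer division. */
static Doubled corners_to_doubled(Corners c)
{
	Doubled d;

	assert(c.x2 >= c.x1 && c.y2 >= c.y1);
	d.x = c.x1 + c.x2;
	d.y = c.y1 + c.y2;
	d.w = c.x2 - c.x1;
	d.h = c.y2 - c.y1;
	return d;
}

/* Doubled center/size -> corners. Since x - w = 2*x1 and x + w = 2*x2,
 * both divisions are exact whenever d came from corners_to_doubled(). */
static Corners doubled_to_corners(Doubled d)
{
	Corners c;

	c.x1 = (d.x - d.w) / 2;
	c.x2 = (d.x + d.w) / 2;
	c.y1 = (d.y - d.h) / 2;
	c.y2 = (d.y + d.h) / 2;
	return c;
}
```

The sums can overflow for coordinates near the limits of a 32-bit integer, so
this sketch assumes coordinates comfortably inside that range.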

Given all that, this code will give you an interface to dynamically insert and
delete rectangular objects into a tree structure. In addition, it will allow
you to perform queries asking for rectangular regions that intersect or are
contained within the query region.

Objects are all represented as (void *)'s, so for portability's sake, try not
to shove anything other than a pointer into that (void *). (You're probably
safe as long as the object is under 32 bits, but weird things may happen if
you later try to port to a 64-bit machine.) The protocol goes something like
this:

#include "ratlib.h"

/* NewObject(), NewQuery() and DoSomething() are supplied by the application. */
void func(void) {
	RATTree rTree;
	RATRegionList regionList;
	int32 x1, y1, x2, y2;
	void *object;
	int done;

	rTree = RATCreateTree();

	/* Insert every object under its two-corner rectangle. */
	done = NewObject(&object, &x1, &y1, &x2, &y2);
	while (!done) {
	   RATInsert(rTree, x1 + x2, y1 + y2, x2 - x1, y2 - y1, object);
	   done = NewObject(&object, &x1, &y1, &x2, &y2);
	}

	done = NewQuery(&x1, &y1, &x2, &y2);
	while (!done) {
	   regionList = RATContainedWithin(rTree, x1 + x2, y1 + y2, x2 - x1,
					   y2 - y1);
	   while(regionList) {
		DoSomething(RATListFirst(regionList));
		regionList = RATListDestructiveRest(regionList);
	   }

	   done = NewQuery(&x1, &y1, &x2, &y2);
	}

	RATDestroyTree(rTree);

}



The Structures
==============

RATTree		A pointer to a tree object. You shouldn't need the internals,
		but here are a few useful fields:
	tree->points	The number of distinct rectangles currently in the tree
	tree->inserts	Number of inserts since the last tree rebuild
	tree->deletes	Number of deletes since the last tree rebuild


typedef int32 RATVector[4];
	An array of four 32-bit integers, used to represent the internal
	structure for the rectangles.  Note that this vector is **not** the
	same vector as the 4-tuple that is passed in to these routines.


typedef struct _rat_obj_list_ {
	void *object;
	struct _rat_obj_list_ *next;
} RATObjList_t, *RATObjList;
	This structure is used to store lists of objects. I decided that the
	easiest way of dealing with duplicate rectangles in the same tree was
	to not deal with them at all. If you have objects that occupy the
	exact same rectangle, they are just bucketed together and stored in
	a linked-list of objects. This structure is the linked list used.


typedef struct {
	RATVector location;
	RATObjList_t obj_list;
} *RATRegion;
	This is the basic building block structure for regions. It contains
	one vector for the location (again, the actual numbers are not what
	was originally passed in) and the first element of the linked list
	of objects. Note that the space for the first object is included in
	the RATRegion (i.e., it is not a pointer to a RATObjList_t, it
	is the actual first element). This is done since no rectangle should
	ever really be in the tree without an object.

	RATRegions are generally the structure used to return the results
	of point queries.
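
	Because the first list element lives inside the region, a walk over
	a region's objects starts at the address of that embedded element.
	The sketch below uses local copies of the types so that it compiles
	on its own. The struct name RATRegion_t and the function
	region_object_count are invented for this example, and it assumes
	the list is terminated by a NULL next pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of the documented types (int32_t stands in for int32). */
typedef int32_t RATVector[4];

typedef struct _rat_obj_list_ {
	void *object;
	struct _rat_obj_list_ *next;
} RATObjList_t, *RATObjList;

/* RATRegion_t is a name added here so a region can be declared directly. */
typedef struct {
	RATVector location;
	RATObjList_t obj_list;	/* first element stored inline, not a pointer */
} RATRegion_t, *RATRegion;

/* Hypothetical helper: count the objects bucketed in one region.
 * Assumes the last element's next pointer is NULL. */
static size_t region_object_count(RATRegion region)
{
	size_t n = 0;
	RATObjList p;

	for (p = &region->obj_list; p != NULL; p = p->next)
		n++;
	return n;
}
```

	A region always holds at least one object in this layout, so the
	count is never zero.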


typedef struct _rat_region_list_ {
	RATRegion region;
	struct _rat_region_list_ *next;
} RATRegionList_t, *RATRegionList;
	This is simply a linked list of RATRegions. These are returned from
	intersection and containment queries.



typedef struct RATQualifiedOperation {
	int (*qualifier)(void *qualification, void *obj);
	void *qualification;
	int (*operation)(void *operand, void *obj);
	void *operand;
} RATQualifiedOperation;
	This looks like one of the stranger ones here. It is a structure that
	is used by "Tree Walkers." For instance, suppose you needed to do
	a linear search over every object in the tree, using a test that has
	nothing to do with the objects' associated rectangles. You would
	build a RATQualifiedOperation structure to do this as follows:
		Create a function F1 that takes two (void *) arguments.
		F1's purpose is to tell the tree walker whether or not this
		object "qualifies" for the operation to be performed.
		Argument 1, the qualification, is an arbitrary (void *) that
		the programmer places into the structure. Use this for
		whatever you need, and if you don't need it, you can just
		put in a NULL.
		Argument 2, the object, is the (void *) that was originally
		passed in to RATInsert(t,x,y,w,h,object).
		If F1 returns non-zero, the function F2 is called. F2 also has
		two (void *) arguments, the first of which is the operand,
		which is arbitrarily set by the programmer. The second is
		again the (void *) object itself.
	If you don't think you need this structure to begin with, you are
	probably right. When rectangles somehow are no longer enough, you
	should probably take a look at this structure and the related
	functions.
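
	To make the F1/F2 protocol concrete, here is a small sketch. The
	Block type, the two callbacks, and walk_objects() are all invented
	for illustration. In particular, walk_objects() is only a stand-in
	that runs over a plain array; it is not one of the library's tree
	walkers, and the meaning of the operation's return value is not
	specified here, so the sketch ignores it.

```c
#include <stddef.h>

/* Local copy of the documented structure so the sketch compiles alone. */
typedef struct RATQualifiedOperation {
	int (*qualifier)(void *qualification, void *obj);
	void *qualification;
	int (*operation)(void *operand, void *obj);
	void *operand;
} RATQualifiedOperation;

/* Hypothetical application object that would be stored in the tree. */
typedef struct { int page; int is_text; } Block;

/* F1: the object qualifies when it sits on the requested page. */
static int on_page(void *qualification, void *obj)
{
	const int *page = qualification;
	const Block *b = obj;

	return b->page == *page;
}

/* F2: bump the counter passed as the operand for each text block. */
static int count_text(void *operand, void *obj)
{
	int *count = operand;
	const Block *b = obj;

	if (b->is_text)
		(*count)++;
	return 0;
}

/* Stand-in for a tree walker: call F2 on every object that F1 accepts. */
static void walk_objects(void **objs, size_t n, RATQualifiedOperation *qop)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (qop->qualifier(qop->qualification, objs[i]))
			qop->operation(qop->operand, objs[i]);
}
```

	With page set to 1 and the counter as the operand, the walk leaves
	the counter holding the number of text blocks on page 1.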

		

The Macros
==========

RATComputeTransform(RATVector vec, int32 x, int32 y, int32 w, int32 h)
	This macro stores a transformed vector into vec, based on the given
	arguments.

RATComputeTempTransform(int32 x, int32 y, int32 w, int32 h)
	This macro stores a transformed vector into a global temporary 
	vector, based on the given arguments. The global temporary variable
	is RATGlobalTmpRegion_t.


RATObjList RATGetRegionObjList(RATRegion region)
	This returns the object list for a given region.


RATRegion     RATListFirst(RATRegionList regionList)
RATRegionList RATListRest(RATRegionList regionList)
RATRegionList RATListObject(RATRegionList regionList)
	These are used to access the various parts of the RATRegionList
	structure.


RATRegionInvert(RATRegion region, int32 x, int32 y, int32 w, int32 h)
	This macro will compute an inverse transformation on a region
	that was returned as part of a query. It fills in the values
	x, y, w, and h based on the values found in the RATRegion structure.



The following functions can be specified as inline macros or as function
calls at compile time. If you want to change them, change the line in ratlib.h
that reads
	#define RAT_NO_INLINE 1
When it's defined to 1, then actual function calls are used. When defined as 0,
inline code is generated via macros. The macros make use of the temporary
global region RATGlobalTmpRegion_t.

RATRegion RATNearestMatch();
void      RATInsert();
int       RATDelete();



The Routines
============

RATTree RATCreateTree(void)
void    RATDestroyTree(RATTree tree)
	These two functions are used to create and destroy the trees.
	Note that RATDestroyTree does nothing to free the space pointed
	to by the (void *) objects which were passed in as arguments during
	tree insertions.

void RATInsert(RATTree tree, int32 x, int32 y, int32 w, int32 h, void *object)
	Given the rectangle described by the 4-tuple (x, y, w, h), insert
	the (void *) object into the tree structure.
	As mentioned above, there are two useful rectangle representations.
	The "native" algorithm implementation is one where (x, y) is the
	center of the rectangle, and (w, h) are the width and height from
	the center of the rectangle out to an edge. (Or, the half-width and
	half-height.)
	Most applications probably don't represent rectangles in that fashion.
	The most common is probably either a two-corner representation,
	(x1, y1, x2, y2), where x1 <= x2 and y1 <= y2, or a top-left corner
	and full width/height (x, y, w, h). To use the first representation,
	call insert using
		RATInsert(tree, x1 + x2, y1 + y2, x2 - x1, y2 - y1, object);
	To use the second form, substitute x1 = x, y1 = y, x2 = x + w and
	y2 = y + h into the call above, which gives
		RATInsert(tree, (x << 1) + w, (y << 1) + h, w, h, object);
	

int RATDelete(RATTree tree, int32 x, int32 y, int32 w, int32 h,
              RATObjList objs)
	This does just the opposite of insertion, deleting the entire region
	if necessary, and possibly rebalancing the tree. There are two ways to
	call this function. First, if (objs == NULL), then RATDelete will
	delete all of the objects that happen to be in that region. If the
	region you passed in does not match one of the stored regions exactly,
	then nothing is deleted.
	In the second form, if you have more than one object in a particular
	region and would only like to delete some of them, you can instruct
	the deletion to only be applied to the objects in the passed-in
	RATObjList objs. The test to see whether two objects are the same
	is an equality check on the (void *) itself, not on the contents
	of whatever is being pointed to. I know this is a bit of a pain, but
	in practice, it's not that big of a deal. If the list of objects to
	delete consists of the entire set of objects at that region, the
	region itself is deleted from the tree. Otherwise, it is left in
	place to hold the remaining objects.
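
	The pointer-equality rule is easy to trip over, so here is a sketch
	of what such a test looks like. list_has_pointer() is a made-up
	helper written against a local copy of the list type; it is not
	RATDelete's actual code, only an illustration of comparing the
	(void *) values rather than what they point to.

```c
#include <stddef.h>

/* Local copy of the documented list type so the sketch compiles alone. */
typedef struct _rat_obj_list_ {
	void *object;
	struct _rat_obj_list_ *next;
} RATObjList_t, *RATObjList;

/* Hypothetical helper: nonzero when `object` itself (the same address)
 * appears in the list. The pointed-to contents are never inspected. */
static int list_has_pointer(RATObjList list, void *object)
{
	for (; list != NULL; list = list->next)
		if (list->object == object)
			return 1;
	return 0;
}
```

	Two separately allocated strings with identical text are therefore
	different objects under this test. To delete an object, hand back
	the same pointer that was originally inserted.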


RATRegion RATNearestMatch(RATTree tree, int32 x, int32 y, int32 w, int32 h);
	This is a *dangerous* function to use unless you are using it to
	find exact matches. The "nearest" part of the name is very misleading.
	It does not really find the "nearest", as nearest in this context has
	multiple definitions. Suffice it to say, if an exact match exists, it
	will return that region. If an exact match doesn't exist, it will
	return one near where it would have been in the tree structure, but
	that in no way implies that they will be similar regions.


RATRegionList RATContainedWithin(RATTree tree, int32 x, int32 y, int32 w,
                                 int32 h);
	Return a list of regions/rectangles which are completely contained
	within the query region/rectangle.


RATRegionList RATIntersectsWith(RATTree tree, int32 x, int32 y, int32 w,
                                int32 h);
	Return a list of regions/rectangles which intersect in any way the
	query region/rectangle. Note that this query returns a super-set of
	the regions returned for the equivalent containment query.
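
	The relationship between the two queries can be seen from the
	underlying predicates. The sketch below states them for two-corner
	rectangles. The Rect type and both function names are invented for
	this example, and it assumes rectangles include their edges, so
	rectangles that merely touch count as intersecting; the library's
	handling of that boundary case is not documented here.

```c
#include <stdint.h>

/* Hypothetical two-corner rectangle with x1 <= x2 and y1 <= y2. */
typedef struct { int32_t x1, y1, x2, y2; } Rect;

/* Nonzero when r lies completely inside q (edges may coincide). */
static int rect_contained_within(Rect r, Rect q)
{
	return r.x1 >= q.x1 && r.x2 <= q.x2 &&
	       r.y1 >= q.y1 && r.y2 <= q.y2;
}

/* Nonzero when r and q share at least one point. */
static int rect_intersects(Rect r, Rect q)
{
	return r.x1 <= q.x2 && q.x1 <= r.x2 &&
	       r.y1 <= q.y2 && q.y1 <= r.y2;
}
```

	Any rectangle that passes the containment test also passes the
	intersection test, which is why the intersection query returns a
	super-set of the containment query.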


RATRegionList RATListDestructiveRest(RATRegionList list);
	This is used to cycle through the various components returned by the
	intersection and containment query operators. The code generally looks
	something like this:
		
		regionList = RATIntersectsWith(tree, x, y, w, h);
		while(regionList) {
			OperateOnRegion(RATListFirst(regionList));
			regionList = RATListDestructiveRest(regionList);
		}

	This code will pass to OperateOnRegion each of the regions returned by
	the query (i.e., "void OperateOnRegion(RATRegion region);" should be
	the prototype for this). In turn, RATListDestructiveRest frees up the
	space that was allocated for the list elements, one at a time.
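
	For readers unfamiliar with the "destructive rest" idiom, here is a
	generic sketch on a stand-alone list type. Node, list_push() and
	list_destructive_rest() are invented names, and the sketch releases
	nodes with free(); the library itself may recycle list elements
	some other way.

```c
#include <stdlib.h>

/* Hypothetical singly linked list node, unrelated to the library's types. */
typedef struct Node {
	void *item;
	struct Node *next;
} Node;

/* Prepend an item; aborts if memory runs out, to keep the sketch short. */
static Node *list_push(Node *rest, void *item)
{
	Node *n = malloc(sizeof *n);

	if (n == NULL)
		abort();
	n->item = item;
	n->next = rest;
	return n;
}

/* Release the head element and return the remainder of the list. */
static Node *list_destructive_rest(Node *list)
{
	Node *rest;

	if (list == NULL)
		return NULL;
	rest = list->next;
	free(list);
	return rest;
}
```

	After the loop finishes, the list pointer is NULL and every element
	has been released, so the caller must not keep any other pointer
	into the list.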




Efficiency and Errata
=====================

	The reason that these functions are also provided as inlines is
	simply to allow a re-munging of whatever form your rectangle may be.
	For each routine that takes in an (x, y, w, h), it computes a
	transform on that before shoving it into the data structure. By using
	the inline versions, you can allow the compiler to re-order both
	your transform and the data structure's transform. In other words,
	if you use the two-corners method of storing rectangles, then you
	will be passing all of these routines something like

		(x1 + x2, y1 + y2, x2 - x1, y2 - y1)

	The internal transform done is:

		(w + x, h + y, w - x, h - y)

	For the two-corner input above, this simplifies to:

		(2 * x2, 2 * y2, -2 * x1, -2 * y1)

	It's a minor point, but something to be aware of.
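
	The algebra above can be checked mechanically. The sketch below
	applies the stated transform (w + x, h + y, w - x, h - y) and its
	inverse to plain arrays. The function names are invented, and this
	is only a restatement of the formula given here, not the text of
	the RATComputeTransform or RATRegionInvert macros.

```c
#include <stdint.h>

/* (x, y, w, h) -> (w + x, h + y, w - x, h - y), as stated above. */
static void internal_transform(const int32_t in[4], int32_t out[4])
{
	out[0] = in[2] + in[0];
	out[1] = in[3] + in[1];
	out[2] = in[2] - in[0];
	out[3] = in[3] - in[1];
}

/* Inverse of the transform above. t[0] - t[2] = 2x and t[0] + t[2] = 2w
 * (likewise for y and h), so every division by two is exact. */
static void invert_transform(const int32_t t[4], int32_t out[4])
{
	out[0] = (t[0] - t[2]) / 2;	/* x */
	out[1] = (t[1] - t[3]) / 2;	/* y */
	out[2] = (t[0] + t[2]) / 2;	/* w */
	out[3] = (t[1] + t[3]) / 2;	/* h */
}
```

	For corners (3, 4) and (8, 9), the input tuple is (11, 13, 5, 5)
	and the transform yields (16, 18, -6, -8), which is
	(2 * x2, 2 * y2, -2 * x1, -2 * y1) as claimed.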


	Also, I use Splay trees as my dynamic sorted tree structure. I'm not
	sure if they are in vogue now or not, but Danny Sleator is a smart
	guy and I doubt anybody has come up with anything significantly
	better in practice. (I've found it to be one of those severely
	flexible data structures that is almost impossible to go wrong
	with if you are in need of a dynamically modifiable sorted tree.)
	Anyways, that code should be pretty solid--I've banged on it for
	a couple of years now--and should probably be documented as well.
	It's not that it's complex, but it is useful.


	Finally, in a lot of my code, I don't actually free the data structures
	that I allocate. Instead, I stick them onto a local free list, and
	re-use them in later creations. I do this in both the splay tree code
	and in the divided kd-tree code. Also, when allocating these objects,
	I do it in hunks of like 100 at a time, throwing the unused 99 onto
	the free list. Hopefully, this is a good thing for you, as you don't
	have to worry so much about the efficiency of your particular malloc,
	how much memory is fragmenting, etc. However, I guess I could see the
	rare individual who needs to be able to give back to the OS that
	memory. If you are that individual, and you happen to modify my code
	to be portable, non-malloc dependent, and still fast, I'd
	appreciate the patches.
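
	For anyone thinking about such a patch, the general technique looks
	roughly like the sketch below. Everything in it (Cell, HUNK,
	cell_alloc, cell_free, free_count) is invented for illustration and
	is not taken from the splay tree or kd-tree sources; it only shows
	the idea of allocating in hunks and recycling through a free list.

```c
#include <stdlib.h>

#define HUNK 100	/* cells obtained from malloc() at a time */

/* A cell is either in use (payload) or a link on the free list (next). */
typedef union Cell {
	union Cell *next;
	char payload[32];
} Cell;

static Cell *free_list = NULL;

/* Take a cell from the free list, refilling it with a hunk when empty. */
static void *cell_alloc(void)
{
	Cell *c;

	if (free_list == NULL) {
		Cell *hunk = malloc(HUNK * sizeof *hunk);
		size_t i;

		if (hunk == NULL)
			return NULL;
		for (i = 0; i < HUNK; i++) {
			hunk[i].next = free_list;
			free_list = &hunk[i];
		}
	}
	c = free_list;
	free_list = c->next;
	return c;
}

/* Return a cell to the free list; memory is never given back to the OS. */
static void cell_free(void *p)
{
	Cell *c = p;

	c->next = free_list;
	free_list = c;
}

/* Number of cells currently waiting on the free list. */
static size_t free_count(void)
{
	size_t n = 0;
	const Cell *c;

	for (c = free_list; c != NULL; c = c->next)
		n++;
	return n;
}
```

	The first allocation leaves 99 cells on the free list, and a freed
	cell is the next one handed out. The hunks themselves are never
	released, which is exactly the trade-off described above.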



Compiling
=========

	I've managed to get this to compile on just about every Unix box I've
	tried, as long as a good ANSI compiler was available. That includes
	Ultrix, OSF 1.2 (from DEC), Digital Unix 3.2, SunOS 4.1.3,
	and Solaris 2.4 or 2.5 (I don't remember which). I have no reason to
	believe that it wouldn't compile on an HP or an RS/6000, but let me
	know if it complains.

	In general, look at make.conf, choose your compiler and compiler
	options, then type:

		make clean depend all

	To create and run a test program, type

		make testtrees
		./testtrees 20000 2 1000

	This will create 20000 random vectors, insert them into the tree,
	force the tree to rebuild itself twice, then delete 1000 of those
	vectors from the tree. (On a Sparc 5/85, this takes about 25 seconds.)

	As far as I know, I've gotten all of the memory leaks out of my code.
	I don't guarantee that I've fully tested all of these functions, and
	there may be some bugs lurking around. (Especially in the cases where
	you have more than one object occupying the exact same region.)
	Any questions can be mailed to:

		rickr+@cmu.edu


Copyrights
==========
dkdtree.c
dkdtree.h
ratlib.h
ratlib.c
testtrees.c
	All of these programs are Copyright (C) 1995-1996 Rick Romero and
	Carnegie Mellon University.

splaymin.c
splaylib.h
	These programs are Copyright (C) 1995-1996 Rick Romero.

	All of these programs are free software; you can redistribute it and/or
	modify it under the terms of the GNU General Public License as
	published by the Free Software Foundation; either version 1, or (at
	your option) any later version.

	This program is distributed in the hope that it will be useful,
	but WITHOUT ANY WARRANTY; without even the implied warranty of
	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
	GNU General Public License for more details.
     



