Galaxy_TOC
Bigintreeindb_t1 Encoding Method
WARNING: This is an untested specification, i.e. it lacks implementations.
This document describes the core of the method, how to encode arbitrary precision integers (hereafter:
bigint-s) to a plain, relational, database in a format, that allows to execute queries
greater-than and
smaller-than
efficiently.
General scheme:
BigInts are placed to a binary tree. An example of the tree and the root-originating vertex
paths:
Bigint Value
as a String
|
The Number of columns that
list the X,0,1 equals with the
number of levels in the tree minus 1.
In the case of this example: 4 - 1 = 3 .
|
73
|
0
|
1
|
X
|
74
|
X
|
X
|
X
|
75
|
1
|
0
|
X
|
76
|
1
|
0
|
1
|
42
|
0
|
X
|
X
|
50
|
0
|
1
|
0
|
34
|
0
|
0
|
X
|
22
|
0
|
0
|
0
|
1984
|
1
|
X
|
X
|
|
|
If the
X were to be replaced with
0, then the path
encoding of 34 (0,0,
X) would not differ from 22 (0,0,0). On the other hand, if
prior to the replacement
X -> 0, the 0 were to be replaced
with a "digit" that is smaller than the 0, for example,
-1, then the 22 would have one
insignificant "negative digit" more than the 34 has.
22==(-1)(-1)(-1) < (-1)(-1)(0)==34
|
|
That encoding has 3 sequential integers as its digits:
(-1),0,1.
The traditional ternary system uses slightly different digits, the
0,1,2, but all the rest
of the contemplation stays the same.
{(-1),0,1} + 1 = {0,1,2}
73
|
0
|
2
|
1
|
74
|
1
|
1
|
1
|
75
|
2
|
0
|
1
|
76
|
2
|
0
|
2
|
42
|
0
|
1
|
1
|
50
|
0
|
2
|
0
|
34
|
0
|
0
|
1
|
22
|
0
|
0
|
0
|
1984
|
2
|
1
|
1
|
|
|
The order of the numbers that are formed from the series of digits that form the path matches that of the
bigints that have been inserted to the tree, regardless of the value of the bigint. As the traditional
relational database engines use decimal systems, the base 3 numbers can be converted to base 10 and stored as
classical C/C++
unsigned int values, which in turn can be queried by using the traditional
SQL.
As the number of levels in the tree, the maximum path length, grows, new C/C++ unsigned int columns can be
added without rewriting any of the previously written data, unless a tree balancing operation is performed.
All of the new
unsigned int columns must be filled with a default value of
1.