I.Kashirin

Decimal Hierarchical Numbers

The Algebra of Decimal Hierarchical Numbers

Briefly, the idea of hierarchical numbers is as follows. Let N be the set of integers, including 0 and negative elements.

N = {…, -3, -2, -1, 0, 1, 2, 3, …}

Let there also be a distinct symbol ".". The set A = N ∪ {"."} is defined as the alphabet, where n ∈ N are the integers, "∪" is the set union operation, "." is the level separator symbol, and "∈" is the set-theoretic relation of element membership in a set.

Then the grammar:

h → < n >, h → < n > . <h>

describes the set H of decimal hierarchical numbers with elements h.

Examples of hierarchical numbers:

-42.0.1.0.4, 4, -12, 0.1.1.0.
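The grammar above can be checked mechanically. The following is a minimal sketch in Python, assuming that a hierarchical number is written as optionally negative decimal integers joined by the "." separator and is stored as a tuple of integers. The names parse_hnum and format_hnum are illustrative and do not come from the original text.

```python
import re

# Assumed token shape: an optionally negative decimal integer (an element of N).
_INT = re.compile(r"-?\d+\Z")

def parse_hnum(text: str) -> tuple[int, ...]:
    """Parse text of the form n or n.h into a tuple of integers, one per level."""
    parts = text.split(".")
    for part in parts:
        if not _INT.match(part):
            raise ValueError(f"not a hierarchical number: {text!r}")
    return tuple(int(part) for part in parts)

def format_hnum(h: tuple[int, ...]) -> str:
    """Write a tuple of levels back in dotted notation."""
    return ".".join(str(n) for n in h)

print(parse_hnum("-42.0.1.0.4"))   # (-42, 0, 1, 0, 4)
print(format_hnum((0, 1, 1, 0)))   # 0.1.1.0
```

In this sketch, an empty level such as the one in "1..2" is rejected with a ValueError, because the grammar requires an integer on both sides of every separator.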

Hierarchical numbers have a graphical interpretation, in which they correspond to the numerical indices of the vertices of trees, each with a single root vertex n ∈ N. The set of such trees is in one-to-one correspondence with the set N, since the root vertex of any such tree is a vertex n whose notation does not contain the symbol ".".

Descent from vertex nx one level down to vertex ny is performed by the binary operation "+" (nx + ny = nx . ny); ascent one level up from vertex n1 . n2 is performed by the unary operation "--" (n1 . n2 -- = n1, 0-- = 0). Thus, any hierarchical number is either a single integer or a sequence of integers separated by periods.

Let's give some examples.

Let { n1 , n2 , n3, n4, nx } ⊂ N.

n1 . n2 + n3 . n4 = n1 . n2 . n3 . n4 , n1 . n2 . n3 -- = n1 . n2,

0.nx + 0 = 0.nx.0, 0.0 + 0 = 0.0.0, 0 + -4 = 0.-4,

nx -- = nx, 41-- = 41, -33 + 0 = -33.0, -33-- = -33.
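The descent and ascent examples above can be reproduced with a short sketch. It assumes that a hierarchical number is stored as a tuple of integers, one per level; the function names descend and ascend are hypothetical labels for the "+" and "--" operations.

```python
HNum = tuple[int, ...]  # assumed representation: one integer per tree level

def descend(a: HNum, b: HNum) -> HNum:
    """The "+" operation: append the levels of b below the vertex a."""
    return a + b

def ascend(h: HNum) -> HNum:
    """The "--" operation: drop the last level; a root vertex stays unchanged."""
    return h[:-1] if len(h) > 1 else h

print(descend((-33,), (0,)))   # (-33, 0)
print(ascend((41,)))           # (41,)
```

Keeping a root unchanged under ascent mirrors the examples 0-- = 0 and -33-- = -33.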

More complex are the operations {º, ^, ||}.

The "º" operation calculates the common ancestor of two vertices, for example:

45.-1.0.0.12 º 45.-1.1.0.12 = 45.-1, 1.-2.14 º -5.3.3.3 = 0.

That is, if there is no common ancestor, the result of the operation is 0.
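One way to read the "º" operation is as the longest common prefix of two level sequences. The sketch below assumes a tuple-of-integers representation; the name common_ancestor is illustrative.

```python
def common_ancestor(a: tuple[int, ...], b: tuple[int, ...]) -> tuple[int, ...]:
    """The "º" operation, read as the longest common prefix of a and b."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    # Following the text, the absence of a common ancestor is reported as 0.
    return tuple(prefix) if prefix else (0,)

print(common_ancestor((45, -1, 0, 0, 12), (45, -1, 1, 0, 12)))  # (45, -1)
print(common_ancestor((1, -2, 14), (-5, 3, 3, 3)))              # (0,)
```

Note that in this sketch the result 0 for "no common ancestor" cannot be distinguished from a genuine common root with index 0; a caller that needs the distinction would have to compare the first levels separately.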

The "^" operation calculates a path from one node to another through a common ancestor. The hierarchical number identifying the common ancestor is omitted, for example:

45.-1.0.0.14 ^ 45.-1.1.0.12 = [(45.-1.0.0.14 → 45.-1.0.0 → 45.-1.0 → 45.-1) → (45.-1.1 → 45.-1.1.0 → 45.-1.1.0.12)] = 14.0.0.1.0.12,

where the "→" symbol denotes one step of the tree transition from node to node.

Finally, the unary operator "||" calculates the length of the hierarchical number, for example:

|14.0.0.1.0.12| = 6 .
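The path and length examples can be traced with the following sketch. It assumes a tuple-of-integers representation and reads the "^" result as the levels climbed from the first node (in reverse order) followed by the levels descended to the second node, with the common ancestor omitted. The name path is illustrative, and the behaviour for nodes without a common ancestor is an assumption of this sketch, since the text does not define it.

```python
def path(a: tuple[int, ...], b: tuple[int, ...]) -> tuple[int, ...]:
    """The "^" operation: climb from a to the common ancestor, then descend to b."""
    k = 0  # number of leading levels shared by a and b
    while k < min(len(a), len(b)) and a[k] == b[k]:
        k += 1
    up = tuple(reversed(a[k:]))   # levels left behind while ascending from a
    down = b[k:]                  # levels entered while descending to b
    # Assumption: with no shared levels (k == 0) the whole of a and b is used.
    return up + down

p = path((45, -1, 0, 0, 14), (45, -1, 1, 0, 12))
print(p)        # (14, 0, 0, 1, 0, 12)
print(len(p))   # 6, the "||" length of the path
```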

Here we rely on a property of the algebra of hierarchical numbers H: it encompasses the formal arithmetic of integers. Indeed, the algebra of integers (arithmetic) is a special case of the algebra H once H is extended with the arithmetic operations of addition, subtraction, and multiplication. After considering the semantics of these operations, we can define the universal arithmetic algebra H of hierarchical numbers:

H = < H, Ω >, Ω = {+ (2), --(1), º(2) , ^(2), ||(1)},

where H is the carrier set, and Ω is the signature of the algebra, i.e., the set of operations. The operations {+, º , ^} are binary, while the operations {--, ||} are defined as unary.

The algebra considered can be supplemented to an algebraic system H = < H, Ω, R > by introducing a set of relations R = {<, >, ~, =}, where the relations "a > b" and "b < a" mean, respectively, "the number a is longer (more complex) than the number b" and "the number b is shorter than the number a". The symbols "~" (a ~ b) and "=" (a = b) denote, respectively, the relations "the numbers a and b have equal lengths" and "the numbers a and b coincide completely".
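The relations in R can be sketched as simple predicates. This sketch assumes that "more complex" means "has more levels", so that ">" and "<" compare lengths; the function names are hypothetical.

```python
HNum = tuple[int, ...]  # assumed representation: one integer per level

def longer(a: HNum, b: HNum) -> bool:
    """a > b: a has more levels than b (assumed reading of "more complex")."""
    return len(a) > len(b)

def shorter(a: HNum, b: HNum) -> bool:
    """a < b: a has fewer levels than b."""
    return len(a) < len(b)

def same_length(a: HNum, b: HNum) -> bool:
    """a ~ b: a and b have the same number of levels."""
    return len(a) == len(b)

def identical(a: HNum, b: HNum) -> bool:
    """a = b: a and b coincide level by level."""
    return a == b

print(longer((1, 3, 7), (1, 3)))     # True
print(same_length((3, 5), (8, 1)))   # True
```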

The Idea of Using Decimal Hierarchical Numbers

The use of hierarchical numbers in designing ontological taxonomies for LLMs is based on the calculation of hierarchical embeddings. Semantic multidimensional spaces are defined by thousands of dimensions, each corresponding to a single vocabulary unit (lexeme). Each word in a natural language sentence is a vector in the semantic space, which can be written as a corresponding multidimensional tuple. To a first approximation, each element of such a tuple is a number calculated as the semantic proximity of the word to another word defining the corresponding dimension of the space. This representation makes it possible to calculate the relative importance of each word in the sentence. The mechanism for this calculation is called "attention".

At the same time, LLMs designed on this basis (GPT 5, RoBERTa-transformers 4.3.0, Claude 3.2, LLaMA 3.2, Yandex/YaLM-100B, Gemini 3, GigaChat, BrainBot, etc.) have significant drawbacks:

− Requirement for enormous computing resources;

− Inability to fully generate creative ideas;

− Lack of self-improvement mechanisms (meta-analysis);

− Lack of knowledge in highly specialized subject areas;

− Errors in responses to customer questions;

− Risk of compromising the functional integrity of neural network weights during model retraining;

− Lack of knowledge of current operational information that has become available since the release of the LLM.

The functionality of creativity and meta-analysis requires the use of additional mathematical formalisms that take into account the internal structure of knowledge, which includes the use of conscious ontological taxonomies. The paradox of the functional principles of LLMs is that, in response to a user request, they can quite reliably generate individual fragments of such taxonomies and return the result as a response, but they cannot yet integrate them into the basic schemas of their internal organization.

Thus, the "generative LLM tower" problem can be formulated similarly to the "algorithmic language tower" problem: "to construct a higher-level internal knowledge structure using the generative capabilities of existing LLMs." An additional positive effect of such a high-level construction is the simplification of generative language models. This simplification not only enables the use of meta-level LLM architectures but also reduces the computational complexity of using RAG (Retrieval-Augmented Generation), which solves the problem of using operational information.

At this stage, the author of this article proposes using the mathematical formalism of decimal hierarchical numbers. This can be used to calculate hierarchical embeddings and significantly simplifies the calculation of the semantic similarity of vocabulary constructions at different levels in natural language.

Optimization of word vectorization processes in semantic space to obtain improved embeddings is achieved through the following factors.

Reducing Working Memory Usage

Hierarchical numbers are semantic indices of concepts and relations that make up the meaning of a natural language sentence (vocabulary construction, text, phrase). These indices have already been obtained in basic LLMs using the "Tower of Generative LLMs" principle. The indices reflect the position of a concept or relation in genus-species, meronymic, cause-and-effect, and other semantic taxonomies. As a result, the vectors of semantic space (embeddings) become significantly more compact.

For example, "attack (1.25.3.1)" → "attack (1.25.3.1.1)" → "bombardment (1.25.3.1.1.1)", "shooting (1.25.3.1.1.2)" correspond to concepts linked by the transitive generic relation "→" and, therefore, have all the semantic connections belonging to the concept of "attack". It is easy to see that the hierarchical indices corresponding to the concepts clearly reflect this property: the index of each narrower concept begins with the index of the broader one.

The same can be observed in the causal taxonomy of events: "war (2.12)" => "attack (2.25.3.1.1)" => "murder (2.25.3.1.1.3)".

This demonstrates that hierarchical numbers allow for a reduction in semantic space by dividing concept embeddings into levels with significantly fewer dimensions than existing LLMs. This makes it possible to use smaller vectors or even parts of them.
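The generic relation in the "attack" example can be tested directly on the indices. The sketch below assumes a tuple-of-integers representation and treats "is a narrower concept of" as "has the broader index as a proper prefix"; the name is_descendant and the small concepts table are illustrative only.

```python
def is_descendant(index: tuple[int, ...], ancestor: tuple[int, ...]) -> bool:
    """True if ancestor is a proper prefix of index (assumed reading of "→")."""
    return len(index) > len(ancestor) and index[:len(ancestor)] == ancestor

# Indices taken from the "attack" example in the text.
attack = (1, 25, 3, 1)
concepts = {
    "bombardment": (1, 25, 3, 1, 1, 1),
    "shooting": (1, 25, 3, 1, 1, 2),
}

for name, index in concepts.items():
    print(name, is_descendant(index, attack))   # both lines print True
```

Because the check only inspects the leading levels, it does not require the full index of either concept to be compared.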

Computation Speedup

Fewer numbers in vectors make matrix multiplications and similarity calculations significantly less computationally complex.

Regularization/Interpretability

Explicitly incorporating hierarchy information can help the model generalize better and make more predictable inferences, especially if the task requires understanding of ancestral relationships.

Calculating Semantic Similarity Based on Decimal Hierarchical Numbers

The key property of decimal hierarchical numbers is that their special case (subalgebra) is the formal arithmetic of integers. To effectively utilize this property for calculating semantic similarity, the algebra of decimal hierarchical numbers should be extended as follows.

The signature of Ω must be supplemented with the arithmetic operations of addition "⊕", subtraction "⊖", multiplication "⊗", and integer division "/":

ΩA = {⊕(2), ⊖(2), ⊗(2), /(2)}, Ω0 = {+ (2), --(1), º(2) , ^(2), | |(1)},

Ω = ΩA ∪ Ω0 ={+ (2), --(1), º(2) , ^(2), | |(1), ⊕(2), ⊖(2), ⊗(2), /(2)},

where "∪" is the set-theoretic operation of union of sets.

In fact, when supplementing the signature of the previously discussed algebra with the formal arithmetic ΩA, the signature Ω had to be split into two subsets.

The algebraic semantics of arithmetic operations in ΩA is described by simple algorithms.

The operation "⊕" is commutative: a ⊕ b = b ⊕ a. Let a contain n digits (integers separated by periods in the hierarchical number) and b contain m digits, with m greater than or equal to n. Then the result of the operation is the digit-wise addition of the first n digits of the hierarchical numbers a and b. The remaining m-n digits are copied from the corresponding digits of the longer number.

For example, for a = 12.-1.0.2.4, b = 3.5.-1.0.0.115, the following equalities hold:

12.-1.0.2.4 ⊕ 3.5.-1.0.0.115 = 3.5.-1.0.0.115 ⊕ 12.-1.0.2.4 = 15.4.-1.2.4.115
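The worked example of "⊕" can be verified with a short sketch. It assumes a tuple-of-integers representation; the name h_add is illustrative. Padding the shorter number with zeros is used here only as an implementation shortcut that has the same effect as copying the remaining digits of the longer number.

```python
from itertools import zip_longest

def h_add(a: tuple[int, ...], b: tuple[int, ...]) -> tuple[int, ...]:
    """The "⊕" operation: digit-wise sum; extra digits of the longer number are kept."""
    # fillvalue=0 leaves the unmatched digits of the longer number unchanged.
    return tuple(x + y for x, y in zip_longest(a, b, fillvalue=0))

a = (12, -1, 0, 2, 4)
b = (3, 5, -1, 0, 0, 115)
print(h_add(a, b))   # (15, 4, -1, 2, 4, 115)
print(h_add(b, a))   # (15, 4, -1, 2, 4, 115)
```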

The meaning of other arithmetic operations is completely analogous, with the only difference being that the corresponding digits of the hierarchical argument numbers are subject to the corresponding arithmetic operations of subtraction, multiplication, and integer division (an example of integer division in classical arithmetic: 3/5 = 0; 5/2 = 2).
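The remaining arithmetic operations can be sketched with one generic helper. The assumptions here go beyond the text and are labeled as such: the unmatched digits of the longer number are copied unchanged regardless of operand order, a zero divisor digit is left to raise Python's ZeroDivisionError, and integer division uses Python's "//", which rounds toward negative infinity for negative operands (the text only gives nonnegative examples). The name digitwise is hypothetical.

```python
import operator
from typing import Callable

HNum = tuple[int, ...]  # assumed representation: one integer per level

def digitwise(op: Callable[[int, int], int], a: HNum, b: HNum) -> HNum:
    """Apply op to the first min(n, m) digit pairs; copy the rest of the longer number."""
    n = min(len(a), len(b))
    head = tuple(op(x, y) for x, y in zip(a, b))
    # Assumption: the tail is copied as-is, even for subtraction and division.
    tail = a[n:] if len(a) > len(b) else b[n:]
    return head + tail

print(digitwise(operator.sub, (5, 7, 9), (1, 2)))       # (4, 5, 9)
print(digitwise(operator.mul, (2, 3), (4, 5, 6)))       # (8, 15, 6)
print(digitwise(operator.floordiv, (3, 5), (5, 2)))     # (0, 2)
```

The last line reproduces the classical integer divisions 3/5 = 0 and 5/2 = 2 digit by digit.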

In the definitions provided, the method for calculating semantic similarity for natural language vocabulary constructions at different levels is as follows.

Let's assume two dictionary constructions, a and b. These can be texts, sentences, phrases, or individual word forms. Each construction consists of a sequence of words (word forms):

a = a1 a2 a3 … ai… an ; b = b1 b2 b3 … bj … bm .

Each word { a1 a2 a3 … ai… an b1 b2 b3 … bj … bm } can be represented in the genus-species taxonomy of its part of speech or lexical category by a decimal hierarchical number. The first digit of the number is the index of the lexical category.

Examples

<adjective> (3) => <cruel> (3.5) => <crushing> (3.5.1) => <deadly> (3.5.1.0)

<adjective> (3) => <colored> (3.12) => <black and white> (3.12.0) => <black> (3.12.0.0)

<adjective> (3) => <colored> (3.12) => <colored_> (3.12.1) => <green> (3.12.1.1)

<verb> (1) => <act> (1.3) => <attack> (1.3.7) => <attack> (1.3.7.0) => <bomb> (1.3.7.0.0)

For lexical units that do not have a clearly defined genus-species taxonomy, hierarchical numbers with a small number of digits can be used.

Examples

<question> (8) => [qualitative] (8.1) => <how> (8.1.0)

<question> (8) => [objective] (8.2) => <who> (8.2.0)

<question> (8) => [objective] (8.2) => <what> (8.2.1)

For complex verbal constructions, a cause-and-effect taxonomy can be encoded.

Examples

<event> (10) => <conflict> (10.5) => <attack> (10.5.1) => <get victims> (10.5.1.1) => <suffer defeat> (10.5.1.1.4)

<event> (10) => <conflict> (10.5) => <attack> (10.5.1) => <get victims> (10.5.1.1) => <receive resistance> (10.5.1.1.2)

In software implementation, complex vocabulary constructions are identified using a pattern mechanism or complex regular expressions. In this case, an additional hierarchical index can be assigned only to the verb of the vocabulary construction.

Supplementary indices are hierarchical numbers assigned to elements of dictionary constructions (words) in addition to their existing indices, such as genus-species indices.

Thus, each element { a1 a2 a3 … ai… an b1 b2 b3 … bj … bm } of the input dictionary constructions a and b corresponds to a list of its own hierarchical indices I, J. However, very often this is a single-element list. This fact can be expressed formally as follows.

{ a1 (i11, i12, i13, …) a2 (i21, i22, i23, …) … ai … an (in1, in2, in3, …) b1 (j11, j12, …) … bm (jm1, jm2, …) }

Next, the essence of calculating the semantic similarity of a and b consists of pairwise comparison of all indices of all words in a with all indices of all words in b, i.e., each with each. After this comparison, for each matched pair of words, the longest common part of their hierarchical indices is selected and its length (number of digits) is calculated. This length is a single-digit hierarchical number, since by the definition of decimal hierarchical numbers every integer is also a hierarchical number.

Let's introduce the operations ind(ax) and ind(by), which have the following meaning.

ind(ax) = ind(ax (ix1 ,ix2,…, ix-end)) = { ix1 ,ix2,…, ix-end }

ind(by) = ind(by(jy1, jy2,…, jy-end)) = { jy1, jy2,…, jy-end }

The set of all pairs of indices for the words ax and by of the constructions a and b is calculated by the following expression.

$$
D(a_x, b_y) = \operatorname{ind}(a_x) \times \operatorname{ind}(b_y) = \bigcup_{i = i_{x1}}^{i_{x\text{-end}}} \; \bigcup_{j = j_{y1}}^{j_{y\text{-end}}} \{(i, j)\},
$$

where "×" is the Cartesian product of sets and "∪" is the union of sets.

Next, we introduce the function O:

O: (ix, jy) → ix º jy, where "O" is the function that calculates the common part of the hierarchical indices for each pair of indices from the set of pairs D(ax, by). Its basis is the "º" operation from the algebra of hierarchical numbers (for one pair).

The function O returns the set of common parts of pairs of hierarchical numbers D(ax, by), i.e., a set of indices, but not pairs. These indices are the results of matching all the hierarchical indices of the word ax with the word by for the dictionary constructions a and b.

The unary operator "| |" of hierarchical number algebra transforms each index in the set calculated by the function O into a single-digit hierarchical number whose value equals the number of digits of that index.

Finally, the Max function selects the largest number from the entire set of numbers after the "| |" operation.

For simplicity, we denote the expression Max(|O(D(ax, by))|) as L(ax, by):

L(ax, by) = Max(|O(D(ax, by))|) .

In other words, L(ax, by) is a function that calculates the length of the maximum common index for all pairs of word indices ax and by . The larger this value, the closer in meaning the matched words are.
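The function L can be sketched as follows, assuming each word carries a list of indices stored as tuples of integers. One deviation from the text is deliberate and should be read as an assumption of this sketch: a pair of indices with no common part contributes length 0 rather than the length of the number 0, so that unrelated words do not score the same as words sharing only a lexical category. The names common_part and max_common_length are hypothetical.

```python
from itertools import product

HNum = tuple[int, ...]  # assumed representation: one integer per level

def common_part(i: HNum, j: HNum) -> HNum:
    """Common leading part of two indices; empty if they share nothing (assumption)."""
    k = 0
    for x, y in zip(i, j):
        if x != y:
            break
        k += 1
    return i[:k]

def max_common_length(ind_a: list[HNum], ind_b: list[HNum]) -> int:
    """L(ax, by): the longest common part over all pairs in ind(ax) × ind(by)."""
    pairs = product(ind_a, ind_b)                      # the set D(ax, by)
    return max((len(common_part(i, j)) for i, j in pairs), default=0)

bomb = [(1, 3, 7, 0, 0), (10, 5, 1)]   # a word with two illustrative indices
attack = [(1, 3, 7)]
print(max_common_length(bomb, attack))   # 3
```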

To calculate the semantic similarity S of two dictionary constructions a and b as a whole, the following expression is used.

$$
S = 2 \cdot \frac{\displaystyle\sum_{i=1}^{n} \sum_{j=1}^{m} \frac{2 \, L(a_i, b_j)}{|a_i| + |b_j|}}{|a| + |b|} .
$$

This expression has been used previously for simpler calculations, in which each word in a dictionary construction had a single binary hierarchical index. In this paper, this case is extended to polymorphic (multiple) taxonomies indexed by decimal hierarchical numbers. In the software implementation of the considered method, the softmax normalization function can be applied when calculating semantic similarity, after the set O(D(ax, by)) has been calculated.
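The expression for S can be evaluated with a small sketch under simplifying assumptions that are not stated in the text: each word carries exactly one index (the common single-element case), |a_i| and |b_j| are taken as the lengths of those indices, |a| and |b| are the numbers of words, index pairs with no common part contribute 0, and both constructions are non-empty. The function names are illustrative.

```python
HNum = tuple[int, ...]  # assumed representation: one index per word

def common_len(i: HNum, j: HNum) -> int:
    """Number of leading levels shared by two indices (0 if none, by assumption)."""
    k = 0
    for x, y in zip(i, j):
        if x != y:
            break
        k += 1
    return k

def similarity(a: list[HNum], b: list[HNum]) -> float:
    """S for two non-empty constructions, each given as a list of word indices."""
    total = sum(
        2 * common_len(i, j) / (len(i) + len(j))   # the term 2 * L / (|a_i| + |b_j|)
        for i in a
        for j in b
    )
    return 2 * total / (len(a) + len(b))

print(similarity([(1, 3, 7, 0, 0)], [(1, 3, 7)]))   # 0.75
print(similarity([(3, 5, 1, 0)], [(3, 5, 1, 0)]))   # 1.0
```

For single-word constructions the value is 1.0 for identical indices and 0.0 for indices with no common part; for longer constructions the double sum runs over all word pairs, so this sketch makes no claim about an upper bound.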