# Trie Tree Implementation

Hello people…! In this post we will implement an amazing data structure, the Trie Tree. Trie Trees are used to search for all occurrences of a word in a given text very quickly. To be precise, if the length of the word is “L”, the trie tree finds all occurrences of this word in O(L) time, which is very fast in comparison to many pattern matching algorithms.

But I must mention, this data structure is not exactly used for “pattern matching”; it is used to search for the occurrences of a word in the given text. How do these two functionalities differ…? We’ll get to know that shortly. The Trie Tree has many applications. Your browser could be using a trie tree internally to search for words when you press Ctrl + F. So, let’s get started with this data structure…!

The Trie Tree is a very straightforward data structure. It is a simple tree where each node has a letter of the alphabet associated with it. And the way these nodes are arranged is the best part of the Trie Tree. To get an idea, take a close look at the sketch below –

Structure of Trie Tree

The arrows in the sketch above indicate how to traverse the trie tree in order to tell if a word exists or not. We start at the root node and travel down the tree until we hit a leaf. Picking up a character at every edge, we construct our word. Now, you can easily tell why the time to search for a word in the text will be in the order of the length of the word to be searched. It’s obvious…! One would have to go down till the leaf. In the worst case, one would have to travel the height of the tree, which would be the length of the longest word in the whole text..! 😉

Now that we have a little idea about working with a Trie Tree, let us move on to the serious part. Let us talk about how this is implemented, and then we will talk about three fundamental operations done on the Trie Tree –

• Insert
• Delete
• Search

In C, the Trie Tree is implemented using structures. But the C implementation of this data structure gets a little dirty. So we will switch to C++, but we will keep it as C-ish as possible. As we discuss the implementation, you will notice how many advantages we have when we use C++. This will be the structure of each node in the tree –

```cpp
struct node
{
    struct node * parent;
    struct node * children[ALPHABETS];
    vector<int> occurrences;
};
```
• parent – This points to the node which is the parent of the current node in the tree hierarchy. It may seem useless at first, but we need this to travel up the tree when we want to delete any word. We’ll get to the deletion operation in a moment.
• children – It points to the child nodes. It is made an array so that we can have O(1) access time. But why is the size 26…? It’s simple. Consider a word, “th”. Now, what could the third letter possibly be, if it had one…? One among the 26 letters of the English alphabet…! So that’s why the size is made 26 (the constant ALPHABETS in the code). If you want to make a trie tree for another language, replace the number 26 with the number of letters in your language.
• occurrences – This will store the starting indices of the word occurring in the given text. Why is this made a vector? Because a vector is as good as a Linked List with random access. It is one of the handiest ready-made data structures available in the C++ STL. If you are not familiar with vectors this is a good place to start.
If this were C, we would have to give a fixed array size, and we would have a problem if a particular node had more occurrences than that. We could avoid this by putting a Linked List there, but then we sacrifice random access and a whole lot of operations become time-consuming. Moreover, the code gets really cumbersome to manage if you have a tree and a linked list.
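To make the node layout a little more concrete, here is a minimal sketch of creating a node and of the letter-to-slot mapping that gives the O(1) child access. It is only an illustration, not the final code of this post: the names `TrieNode`, `LetterIndex` and `MakeNode` are made up for this example, and it assumes words made of lower case English letters only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int ALPHABETS = 26;    // assumption: lower case English letters only

struct TrieNode
{
    TrieNode * parent;                  // lets us walk up the tree while deleting
    TrieNode * children[ALPHABETS];     // one slot per possible next letter
    std::vector<int> occurrences;       // starting indices of the word ending here
};

// Maps 'a'..'z' to 0..25. Indexing the array with this value
// is what makes finding a child a constant time operation.
int LetterIndex(char letter)
{
    assert(letter >= 'a' && letter <= 'z');
    return letter - 'a';
}

// 'new TrieNode()' value-initialises the node: 'parent' and every
// entry of 'children' start as NULL and 'occurrences' starts empty.
TrieNode * MakeNode(TrieNode * parent)
{
    TrieNode * node = new TrieNode();
    node->parent = parent;
    return node;
}
```

With this, the child for the letter ‘t’ of some node would be `node->children[LetterIndex('t')]`, and a NULL there means no word continues with ‘t’ from that node.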

Having got a picture of the implementation, let us look at how the operations are done in a Trie Tree.

### Insert Operation

When we do an insert operation, there are a few cases –

1. The word to be inserted does not exist.
2. The word to be inserted already exists in the tree.
3. The word to be inserted does not exist as a word of its own, but it is a prefix of an existing word.

The first case is simple. We traverse down the tree for as long as the letters of the word already have nodes, and from there on we create new nodes one after the other. At the end of the word, i.e., at the node for the last letter, we mark the end of the word by pushing the starting index into the vector, indicating the occurrence of the newly inserted word.

During this traversal, we cut off the characters of the word one by one as they are processed. This can be done by putting the word into a vector of characters and popping off one character after the other, using the vector like a queue. The code further below does the same job with even less code: it simply advances a character pointer through the word.

After having learnt what to do with the first case, you can guess what we would have to do in the second case. We simply have to add a new value to the occurrences vector at the node corresponding to the last letter of the word. We can also know the number of occurrences in constant time: we simply return the size of the vector. This is another advantage of using C++.

To understand the challenge in the third case, let’s take a simple example. What would you do with your trie tree if you wanted to insert the word “face” if the word “facebook” is already there in your tree…? This is the third case. The answer to this is the occurrences vector itself. We simply push the starting index of the word into the vector of the node which corresponds to the last letter of the word to be inserted; in the above example, this would be the node of “e”. So, what really tells us whether a word ends at the current node is the size of its vector.

So I hope you understand how important our vector is. A lot depends on it…!
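The three cases above can be sketched in a few lines. This is a small illustration under my own assumptions, not the final code of this post: `TrieNode` and `Insert` are names made up for this example, the word is taken as a `std::string` for brevity, and only the letters ‘a’ to ‘z’ are allowed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

const int ALPHABETS = 26;    // assumption: words use 'a'..'z' only

struct TrieNode
{
    TrieNode * parent;
    TrieNode * children[ALPHABETS];
    std::vector<int> occurrences;
};

// Walks down from 'root', creating the missing nodes on the way, and
// records 'index' at the node of the last letter. Returns that node.
TrieNode * Insert(TrieNode * root, const std::string & word, int index)
{
    TrieNode * current = root;

    for (std::size_t i = 0; i < word.size(); ++i) {
        assert(word[i] >= 'a' && word[i] <= 'z');
        int slot = word[i] - 'a';

        if (current->children[slot] == NULL) {
            // Case 1: the existing path ends here, so we extend it
            current->children[slot] = new TrieNode();   // all pointers NULL
            current->children[slot]->parent = current;
        }

        current = current->children[slot];
    }

    // Cases 2 and 3: the whole path was already there,
    // so the only thing that changes is the vector
    current->occurrences.push_back(index);
    return current;
}
```

If we insert “facebook” at index 0 and then “face” at indices 9 and 14, no new nodes are created for “face”. The node of “e” simply ends up holding both 9 and 14, and it still leads on to the rest of “facebook”.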

### Delete Operation

The deletion of a word in the trie tree is similar to the insertion, we have a few cases –

• Word does not exist in the tree.
• Word exists as a stand-alone word.
• Word exists as a prefix of another word.

If the word is not there at all, we don’t have to do anything. We just have to make sure that we don’t mess up the existing data structure…!

The stand-alone case is a little tricky. We have to delete the word bottom-up. That is, we delete only that part of the word which is not a part of any other word. For example, consider the sketch above. If we were to delete “this”, we would delete the letters ‘i’ and ‘s’, as ‘h’ is a part of another word. This keeps us from distorting the data structure. If the word exists multiple times, we simply remove one occurrence from the vector of the concerned node.

In the prefix case too, we simply delete the occurrence of the word from the vector, and we leave the nodes alone as they are still a part of the longer word. We needn’t write a lot of code, as we can use the vector’s own member functions (the code below uses pop_back()). This is another advantage of using C++.

Note – When we delete the occurrence of a word, we are not concerned about the validity of the indices stored as occurrences of other words. What I mean to say is, suppose we have 10 words. If we delete the 3rd word, the 5th and the 9th words are supposed to become the 4th and the 8th words as far as the original text is concerned. But we will not consider this. The data stored in the trie tree is not meant to be deleted or inserted. The Trie Tree is meant for processing the given text, not for manipulating it.
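Here is a minimal sketch of the bottom-up deletion described above. The names `TrieNode`, `Insert`, `FindPath`, `HasChildren` and `Remove` are made up for this illustration, the word “the” is my own assumed example of another word sharing ‘t’ and ‘h’, and removing the most recently added index is also just an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

const int ALPHABETS = 26;    // assumption: words use 'a'..'z' only

struct TrieNode
{
    TrieNode * parent;
    TrieNode * children[ALPHABETS];
    std::vector<int> occurrences;
};

// Helper for the sketch: inserts 'word' and records 'index' at its last node
void Insert(TrieNode * root, const std::string & word, int index)
{
    TrieNode * current = root;
    for (std::size_t i = 0; i < word.size(); ++i) {
        assert(word[i] >= 'a' && word[i] <= 'z');
        TrieNode * & child = current->children[word[i] - 'a'];
        if (child == NULL) {
            child = new TrieNode();     // value-initialised: all pointers NULL
            child->parent = current;
        }
        current = child;
    }
    current->occurrences.push_back(index);
}

// Returns the node at the end of the path spelled by 'word', or NULL
TrieNode * FindPath(TrieNode * root, const std::string & word)
{
    TrieNode * current = root;
    for (std::size_t i = 0; i < word.size() && current != NULL; ++i) {
        assert(word[i] >= 'a' && word[i] <= 'z');
        current = current->children[word[i] - 'a'];
    }
    return current;
}

bool HasChildren(const TrieNode * node)
{
    for (int i = 0; i < ALPHABETS; ++i) {
        if (node->children[i] != NULL) {
            return true;
        }
    }
    return false;
}

// Removes one occurrence of 'word'. Returns false if the word is absent.
bool Remove(TrieNode * root, const std::string & word)
{
    TrieNode * node = FindPath(root, word);
    if (node == NULL || node->occurrences.empty()) {
        return false;                       // nothing to delete
    }
    node->occurrences.pop_back();           // assumption: drop the latest index

    // Climb towards the root, freeing nodes that no word needs any more
    while (node->parent != NULL && node->occurrences.empty() && !HasChildren(node)) {
        TrieNode * parent = node->parent;
        for (int i = 0; i < ALPHABETS; ++i) {
            if (parent->children[i] == node) {
                parent->children[i] = NULL; // detach before freeing
            }
        }
        delete node;
        node = parent;
    }
    return true;
}
```

With “the” and “this” in the tree, removing “this” frees the nodes of ‘s’ and ‘i’ and stops at ‘h’, because ‘h’ still has the child ‘e’. If “this” had been inserted twice, the first removal would only shrink the vector and no node would be freed.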

### Search Operation

The search operation is simple and works as we discussed when we began our discussion about the Trie Tree. We go down the tree traversing the nodes and keep “picking up characters” as we go. The occurrences vector tells us if a word exists that ends with the letter associated with the current node, and if so, it gives us the indices of the occurrences and also the number of occurrences.
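As a rough sketch of the search, the function below takes one step per letter, so its cost is in the order of the length of the word. Again, this is an illustration under my own assumptions and not the final code of this post: `TrieNode`, `Insert` and `Search` are names made up for this example, and only the letters ‘a’ to ‘z’ are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

const int ALPHABETS = 26;    // assumption: words use 'a'..'z' only

struct TrieNode
{
    TrieNode * parent;
    TrieNode * children[ALPHABETS];
    std::vector<int> occurrences;
};

// Helper for the sketch: inserts 'word' and records 'index' at its last node
void Insert(TrieNode * root, const std::string & word, int index)
{
    TrieNode * current = root;
    for (std::size_t i = 0; i < word.size(); ++i) {
        assert(word[i] >= 'a' && word[i] <= 'z');
        TrieNode * & child = current->children[word[i] - 'a'];
        if (child == NULL) {
            child = new TrieNode();     // value-initialised: all pointers NULL
            child->parent = current;
        }
        current = child;
    }
    current->occurrences.push_back(index);
}

// Returns the occurrences of 'word', or NULL if the word is not in the tree
const std::vector<int> * Search(const TrieNode * root, const std::string & word)
{
    const TrieNode * current = root;

    for (std::size_t i = 0; i < word.size(); ++i) {
        if (word[i] < 'a' || word[i] > 'z') {
            return NULL;                // outside the assumed alphabet
        }
        current = current->children[word[i] - 'a'];
        if (current == NULL) {
            return NULL;                // the path breaks off, no such word
        }
    }

    // The path exists, but it is a word only if something ends here
    return current->occurrences.empty() ? NULL : &current->occurrences;
}
```

Note the last line: with “facebook” and “face” in the tree, the path for “fac” exists, but its vector is empty, so “fac” is correctly reported as not found.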

Besides these basic operations, there is another very interesting operation that is done with the Trie Tree –

• Lexicographical Sort – If we want to print all the words processed into the trie tree lexicographically, all we have to do is a Preorder Walk on the tree, visiting the children of every node in alphabetical order. This will automatically print all the words in lexicographical order, i.e., the dictionary order. This is due to the very structure and arrangement of nodes in a Trie Tree. Now, I’d like to share another interesting thing about the pre-order walk in trees… The Pre-order walk works exactly like a Depth First Search (DFS) in graphs, i.e., the sequence in which both the algorithms visit the nodes is the same. Think about this for a while and work out the two algorithms on an example (you could take mine in the sketch above), and you can see why it is so. You will also notice why the printed words would be lexicographically sorted.
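The pre-order walk can be sketched as below. It collects the words into a vector instead of printing them, which is just a choice made for this illustration. The names `TrieNode`, `Insert` and `CollectWords` are made up for this example, and only the letters ‘a’ to ‘z’ are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

const int ALPHABETS = 26;    // assumption: words use 'a'..'z' only

struct TrieNode
{
    TrieNode * parent;
    TrieNode * children[ALPHABETS];
    std::vector<int> occurrences;
};

// Helper for the sketch: inserts 'word' and records 'index' at its last node
void Insert(TrieNode * root, const std::string & word, int index)
{
    TrieNode * current = root;
    for (std::size_t i = 0; i < word.size(); ++i) {
        assert(word[i] >= 'a' && word[i] <= 'z');
        TrieNode * & child = current->children[word[i] - 'a'];
        if (child == NULL) {
            child = new TrieNode();     // value-initialised: all pointers NULL
            child->parent = current;
        }
        current = child;
    }
    current->occurrences.push_back(index);
}

// Pre-order walk: the node itself is handled first, then its children
// from 'a' to 'z'. 'prefix' holds the letters picked up on the way down.
void CollectWords(const TrieNode * node, std::string & prefix,
                  std::vector<std::string> & words)
{
    if (!node->occurrences.empty()) {
        words.push_back(prefix);        // a word ends at this node
    }

    for (int i = 0; i < ALPHABETS; ++i) {
        if (node->children[i] != NULL) {
            prefix.push_back(static_cast<char>('a' + i));   // go down
            CollectWords(node->children[i], prefix, words);
            prefix.resize(prefix.size() - 1);               // come back up
        }
    }
}
```

Handling the node before its children is what puts “a” ahead of “ab”. If the words “b”, “ab”, “a” and “ba” are inserted in that order, the walk collects them as “a”, “ab”, “b”, “ba”.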

Now, having learned a lot about the trie tree, try coding it in C++. If you are uneasy with C++, you can try it in C, but make sure you try at least 3 times. Trying is very important. I don’t know if you are new to reading my posts, but I insist a lot on trying in every post of mine…! If you have succeeded, you’re fabulous…! If not, check out my code below and try figuring out just how close you were…!!

```cpp
/* ========== ========== ========== ========== */
//          Trie Tree Data Structure           //
//                using C++ STL                //
//                                             //
//         Functions follow Pascal Case        //
//           Convention and Variables          //
//         follow Camel Case Convention        //
//                                             //
//            Author - Vamsi Sangam            //
//            Theory of Programming            //
/* ========== ========== ========== ========== */

#include <cstdio>
#include <vector>

#define ALPHABETS 26
#define CASE 'a'
#define MAX_WORD_SIZE 25

using namespace std;

struct Node
{
    struct Node * parent;
    struct Node * children[ALPHABETS];
    vector<int> occurrences;
};

// Inserts the word 'word' into the Trie Tree 'trieTree'
// and marks its occurrence as 'index'.
// The word must consist only of the letters CASE to CASE + 25.
void InsertWord(struct Node * trieTree, char * word, int index)
{
    struct Node * traverse = trieTree;

    while (*word != '\0') {     // Until there is something to process
        if (traverse->children[*word - CASE] == NULL) {
            // There is no node in 'trieTree' corresponding to this letter

            // Allocate using 'new Node()', so that the pointers are
            // initialised to NULL and the vector is properly constructed
            traverse->children[*word - CASE] = new Node();
            traverse->children[*word - CASE]->parent = traverse;  // Assigning parent
        }

        traverse = traverse->children[*word - CASE];
        ++word; // The next letter
    }

    traverse->occurrences.push_back(index);      // Mark the occurrence of the word
}

// Searches for the occurrence of a word in 'trieTree',
// if found, returns a pointer pointing to the
// last node of the word in the 'trieTree'
// Complexity -> O(length_of_word_to_be_searched)
struct Node * SearchWord(struct Node * treeNode, char * word)
{
    // Function is very similar to the InsertWord() function
    while (*word != '\0') {
        if (treeNode->children[*word - CASE] != NULL) {
            treeNode = treeNode->children[*word - CASE];
            ++word;
        } else {
            break;
        }
    }

    if (*word == '\0' && treeNode->occurrences.size() != 0) {
        // Word found
        return treeNode;
    } else {
        return NULL;
    }
}

// Returns true if 'trieNode' has at least one child, which
// means that the node is a part of some other, longer word
bool HasChildren(struct Node * trieNode)
{
    int i;

    for (i = 0; i < ALPHABETS; ++i) {
        if (trieNode->children[i] != NULL) {
            return true;
        }
    }

    return false;
}

// If found, removes one occurrence of the word and deletes
// the nodes which are no longer a part of any word
void RemoveWord(struct Node * trieTree, char * word)
{
    struct Node * trieNode = SearchWord(trieTree, word);

    if (trieNode == NULL) {
        // Word not found, nothing to delete
        return;
    }

    trieNode->occurrences.pop_back();    // Deleting the latest occurrence

    struct Node * parentNode;     // variable to assist in traversal
    int i;

    while (trieNode->occurrences.size() == 0 && trieNode->parent != NULL
            && !HasChildren(trieNode)) {
        // trieNode->occurrences.size() -> tells if a word ends at this node
        //
        // trieNode->parent != NULL -> is the base case sort-of condition, we simply ran
        // out of nodes to be deleted, as we reached the root
        //
        // HasChildren(trieNode) -> tells if the node is a part of another word, as
        // happens when the removed word is a prefix of a longer word. It is
        // checked again for every node we reach on the way up

        parentNode = trieNode->parent;

        for (i = 0; i < ALPHABETS; ++i) {
            if (parentNode->children[i] == trieNode) {
                parentNode->children[i] = NULL;     // Detach from the parent
                break;
            }
        }

        delete trieNode;
        trieNode = parentNode;
    }
}

// Prints the 'trieTree' in a Pre-Order or a DFS manner
// which automatically results in a Lexicographical Order
void LexicographicalPrint(struct Node * trieTree, vector<char> & word)
{
    int i;

    if (trieTree->occurrences.size() != 0) {
        // Condition trieTree->occurrences.size() != 0,
        // is a necessary and sufficient condition to
        // tell if a node is associated with a word or not

        vector<char>::iterator charItr = word.begin();

        while (charItr != word.end()) {
            printf("%c", *charItr);
            ++charItr;
        }
        printf(" -> @ index -> ");

        vector<int>::iterator counter = trieTree->occurrences.begin();
        // This is to print the occurrences of the word

        while (counter != trieTree->occurrences.end()) {
            printf("%d, ", *counter);
            ++counter;
        }

        printf("\n");
    }

    for (i = 0; i < ALPHABETS; ++i) {
        if (trieTree->children[i] != NULL) {
            word.push_back(CASE + i);   // Select a child

            // and explore everything associated with the child
            LexicographicalPrint(trieTree->children[i], word);
            word.pop_back();
            // Remove the letter as we dealt
            // with everything associated with it
        }
    }
}

int main()
{
    int n, i;
    vector<char> printUtil;		// Utility variable to print tree

    // Creating the Trie Tree using 'new Node()'
    // so that the components are initialised
    struct Node * trieTree = new Node();
    char word[MAX_WORD_SIZE];

    printf("Enter the number of words-\n");
    scanf("%d", &n);

    for (i = 1; i <= n; ++i) {
        scanf("%24s", word);    // At most MAX_WORD_SIZE - 1 characters
        InsertWord(trieTree, word, i);
    }

    printf("\n");   // Just to make the output more readable
    LexicographicalPrint(trieTree, printUtil);

    printf("\nEnter the Word to be removed - ");
    scanf("%24s", word);
    RemoveWord(trieTree, word);

    printf("\n");   // Just to make the output more readable
    LexicographicalPrint(trieTree, printUtil);

    return 0;
}
```

The code is heavily commented with explanations. If you still have any doubts regarding the data structure or the code, feel free to comment them. I have used a few macros. The macro CASE indicates which letter case the Trie Tree works with. If we define CASE as ‘A’, the Trie Tree will work for upper case words only.

### Other Implementations

The code is well tested against fairly large input. You can download the test case file here – Trie Tree Input (PDF). You can just clock your code for the insert operations. My code took 1.236 seconds to execute that loop which reads a word and inserts it into the Trie Tree. There are 5000 words in total. The last word in the input file is the word to be deleted.

If you think you can give a suggestion to make my code better, please do comment it too. I appreciate your suggestions. For those out there who are struggling to get their code right, “Keep the effort going guys”…! Remember, you won’t learn anything if you keep writing the Hello World program again and again. So, keep practicing…! I hope my post has helped you in getting to know about the Trie Tree. If it did, let me know by commenting! Happy Coding…! 😀

## 16 thoughts on “Trie Tree Implementation”

1. Aashay

In lexicographical method , temp which is a empty vector sending to the function and
itr is pointing to the begining to that vector word and it printing the character till its end
but my question is its empty how it will print the character for suppose if root is a word whith occurence.size != 0; then how that line 158 -163 will print anything ????

• Nice q’..! Well… If the root is a word with occurences.size != 0, then it would mean that the word is an empty word → “”… Any non-empty word would definitely lead to an edge. I modified a small flaw in my trie tree diagram to support my argument… You see, an edge represents an alphabet in the word… If there is an empty word… then word.begin() would be equal to word.end(), and it would not print anything as it should not because the word is “”… And the occurrence is printed… Such an output is perfectly valid… I hope this clears your doubts… Let me know if you have anymore issues.. 🙂

2. Wowwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww………. 🙂 🙂 I am in love with your blog

• Haha..! 😀 … Thanks a lot..!! 😀

3. Juan Carlos Aravena Esparza

There’s an error in the lexicographicPrint function.

If you have strings like: “a”, “b”, “ab”, “ba” it wil print “ab”, “a”, “ba”, “b”, but the correct output should be “a”, “ab”, “b”, “ba”.

To fix it you just have to place the recursive call after the prints.

Saludos and thanks for this code.

• Oh yes… I guess I was a bit careless about the lexicographical order… 😛 … Thanks a lot for pointing out the error Carlos..! 🙂

4. Reza

Hi
Thank For This Post
I want to translate this c code to c++
i mean i want to change struct with class and …
can you help me ?

• Hi Reza..! I’d really like to help you… But, I don’t have much experience in OOP with C++… However, I tried to code it using a class which has a few variables and methods related to Trie Tree… Check this page… Trie Tree using C++ Class.

I guess you will still find it like a C program… 😛 … Do correct my style of coding by commenting them..! 🙂

5. Thank you for this wonderful explanation. Wanted to learn this since a long time.

• I am happy that my post helped you, thanks for letting me know Rupesh….! ☺
