-
Checking for string rotations →
Today, let us play with some examples using “strings”. Let us see if we can write a function which, given two strings, would tell you if the input strings are rotations of each other. For example,
anna
andnaan
are rotations of each other whileanna
andnana
are not. More simply, two strings are rotations if you can get from on to the other while moving the characters in order (without over-stepping):a-n-n-a --> 0. original string n-n-a-a --> 1. rotated by 1 ('a' moves to the end) n-a-a-n --> 2. rotated by 1
A very simple algorithm would just concatenate one of the input strings with itself and try to find the second string in the concatenation of the first. In the above example, this would be:
- Concatenate one of the strings with itself. Say, we choose
anna
to concatenate and form the concatenated oneannaanna
- Search for the occurence of the second string
naan
in the concatenation, - We see that
naan
is indeed a substring of the concatenation annaanna and a rotation ofanna
.
A caveat here is to check for the length of the two strings a priori. This is to avoid sub-strings of smaller length that would be identified as rotations by our algorithm. For example, without this check, we would deduce
ann
to be a rotation ofanna
. - Concatenate one of the strings with itself. Say, we choose
-
Reversing an array in place →
Today, let us write a small program to reverse the elements of an array - i.e. reverse the order of elements in an array. For example, if the input array is
[1, 3, 4, 11, 2, 6]
, in the output array, the elements would be in reverse order - i.e.[6, 2, 11, 4, 3, 1]
. One way would be to create a new (output) array and traverse the input array from the end and copy the elements to the output array. But, in this case, we would be creating an additional array (if the input array was of a million elements, we would have to create another (output) array big enough to hold a million elements). We could do it “in place” - i.e., without creating an additional array.We traverse the array from the beginning and swap the elements with those from the end of the array. This way, by the time we are at the mid-point of the array, we would have swapped all the elements and the array would be in reverse order.
where
swap(arr[a], arr[b])
would swap the elements at array locationsa
andb
andn
is the size of the input arrayarr
. -
Creating and searching for prefixes in a Trie →
In a previous post we created a Trie data structure in C++ and wrote some basic constructor and utility functions. Now, let us populate a Trie given a character string as input. Recall that Tries are efficient as they store common prefixes together.
If we want to store strings ‘Annapaola’, ‘Annalisa’, ‘Annamaria’ and ‘Anna’, we will store the common prefix ‘Anna’ and then fork off to children nodes that would contain the remaining parts of the strings. This indicates that we need to traverse the string and keep adding nodes for the new (unseen) characters. We already have our
newTrieNode()
to create a new node for such characters. And we also need to maintain a node to indicate the end (recall we haveisLeaf
field for that purpose).We also convert std::string to character string (char*) and iterate over the characters of the string. We iteratively look for the character we are currently processing to check if it has an entry in our hashmap
lookup
and if does not have an entry, it is an unencountered character and we create a new Trie node. And when we finish iterating over all the nodes, we setisLeaf
for the last node to betrue
.Now, search follows a similar approach. Given an input string, if we want to search for the occurence of the entire string in our Trie (not just a prefix), we traverse the Trie starting from the root checking our hashmap to look for characters in the string and check if our search ends in a node with
isLeaf
set totrue
which would indicate the end of the string. If one of the characters is not found in the hashmap, we would know that the string does not exist and if immediately returnfalse
. For example, if our Trie contains onlyhello
and if we search forhi
,h
would be found in the hashmap and when we move to the next node,i
wouldn’t be found and hencecurr = curr->lookup[i]
would return anullptr
.We can also search for string prefixes in which case, we do not have the luxury of returning
isLeaf
as prefixes would just be a part of a path in the Tree and need not end in a leaf node. -
The Trie →
In a previous post we started with some motivation for the need of a tree like data structure for efficient prefix matching. We will see how to implement Trie in C++ in this post.
Trie is a tree data structure made up of nodes that contain some data. So, we need to design a structure for the nodes and some structure fields for storing the data. Since we are talking about prefix strings and strings are made up of characters, we can think of a node as having a hashmap, storing the prefix character and a reference to the Trie node holding the remaining characters (children nodes). We can go about defining a structure like this:
The boolean variable
isLeaf
is to indicate the end (leaf node) of the string in the Trie. And when we start populating a new Trie, we need to allocate a new structure, something like the followingnewTrieNode()
. We set theisLeaf
tofalse
as we need to set this totrue
only for leaf nodes.We can also write a small function to check if a node has any children (by checking the value of
isLeaf
).We will look at populating a Trie by inserting characters from a string into the Trie, as well as searching for a prefix in a future post.
-
From spell check to routers to many more... →
Everyone has seen or even used a dictionary - may be this statement is more true of people from my generation than younger generations. In any case, a phonebook app in a smartphone also lists contact names in alphabetical order. Aakash would precede Amar which would precede Amit which preceeds Aqsa and so on. The common prefix between Amar and Amit is ‘Am’ - which tells us we could a tree data-structure, storing ‘Am’ in a node which has two children storing ‘i’, ‘a’ and so on. This also makes search easy and it follows the same technique we use to search for a word in a dictionary.
There are several other applications where identifying and storing prefixes of strings efficiently can be very useful. Think about auto-completions in a texting app or Google search or how a router could do address matching efficiently.
This data structure must be something very popular as it seems to have several everyday and important applications. And it is the “Trie”. Whenever we have a large number of strings (like protein sequences or words in a language or all possible strings one could type in a Google search box) that share some common prefix, we need to use think about efficiently matching shared prefixes. Trie is very simple and straight-forward to implement as we will see in a future post.