• Checking for string rotations →

    Today, let us play with some examples using “strings”. Let us see if we can write a function which, given two strings, would tell you if the input strings are rotations of each other. For example, anna and naan are rotations of each other while anna and nana are not. More simply, two strings are rotations if you can get from on to the other while moving the characters in order (without over-stepping):

    a-n-n-a   --> 0. original string
    n-n-a-a   --> 1. rotated by 1 ('a' moves to the end)
    n-a-a-n   --> 2. rotated by 1
    

    A very simple algorithm would just concatenate one of the input strings with itself and try to find the second string in the concatenation of the first. In the above example, this would be:

    1. Concatenate one of the strings with itself. Say, we choose anna to concatenate and form the concatenated one annaanna
    2. Search for the occurence of the second string naan in the concatenation,
    3. We see that naan is indeed a substring of the concatenation annaanna and a rotation of anna.
    bool str_rotation(std::string s1, std::string s2)
    {
      if (s1.length() != s2.length())
        return false;
      std::string concat_str;
      concat_str = s1 + s1;
      int res_pos = concat_str.find(s2);
      if (res_pos == std::string::npos)
        return false;
      else 
        return true;
    }

    A caveat here is to check for the length of the two strings a priori. This is to avoid sub-strings of smaller length that would be identified as rotations by our algorithm. For example, without this check, we would deduce ann to be a rotation of anna.

  • Reversing an array in place →

    Today, let us write a small program to reverse the elements of an array - i.e. reverse the order of elements in an array. For example, if the input array is [1, 3, 4, 11, 2, 6], in the output array, the elements would be in reverse order - i.e. [6, 2, 11, 4, 3, 1]. One way would be to create a new (output) array and traverse the input array from the end and copy the elements to the output array. But, in this case, we would be creating an additional array (if the input array was of a million elements, we would have to create another (output) array big enough to hold a million elements). We could do it “in place” - i.e., without creating an additional array.

    We traverse the array from the beginning and swap the elements with those from the end of the array. This way, by the time we are at the mid-point of the array, we would have swapped all the elements and the array would be in reverse order.

    void rev_arr(int arr[], int n)
    {
      for (int i = 0; i < n/2; i++)
        swap(&arr[i], &arr[n - i - 1]);
    }

    where swap(arr[a], arr[b]) would swap the elements at array locations a and b and n is the size of the input array arr.

  • Creating and searching for prefixes in a Trie →

    In a previous post we created a Trie data structure in C++ and wrote some basic constructor and utility functions. Now, let us populate a Trie given a character string as input. Recall that Tries are efficient as they store common prefixes together.

    If we want to store strings ‘Annapaola’, ‘Annalisa’, ‘Annamaria’ and ‘Anna’, we will store the common prefix ‘Anna’ and then fork off to children nodes that would contain the remaining parts of the strings. This indicates that we need to traverse the string and keep adding nodes for the new (unseen) characters. We already have our newTrieNode() to create a new node for such characters. And we also need to maintain a node to indicate the end (recall we have isLeaf field for that purpose).

    void insert(Trie*& root, std::string str)
    {
      char* cstr = &str[0];  
      // create a new trie node if not present
      if (root == nullptr)
        root = newTrieNode();
      // iterate over cstr and populate trie
      Trie* curr = root;
      while (*cstr)
      {
        // currently not in the trie, create a trie node
        if (curr->lookup.find(*cstr) == curr->lookup.end())
          curr->lookup[*cstr] = newTrieNode();
        // go to next node
        curr = curr->lookup[*cstr];
        cstr++;
      }
      curr->isLeaf = true;
    }

    We also convert std::string to character string (char*) and iterate over the characters of the string. We iteratively look for the character we are currently processing to check if it has an entry in our hashmap lookup and if does not have an entry, it is an unencountered character and we create a new Trie node. And when we finish iterating over all the nodes, we set isLeaf for the last node to be true.

    Now, search follows a similar approach. Given an input string, if we want to search for the occurence of the entire string in our Trie (not just a prefix), we traverse the Trie starting from the root checking our hashmap to look for characters in the string and check if our search ends in a node with isLeaf set to true which would indicate the end of the string. If one of the characters is not found in the hashmap, we would know that the string does not exist and if immediately return false. For example, if our Trie contains only hello and if we search for hi, h would be found in the hashmap and when we move to the next node, i wouldn’t be found and hence curr = curr->lookup[i] would return a nullptr.

    bool search(Trie* node, std::string str)
    {
      char* cstr = &str[0];
      if (node == nullptr)
        return false; 
      Trie* curr = node;
      while (*cstr)
      {
        curr = curr->lookup[*cstr];
        if (curr == nullptr)
          return false;
        cstr++;
      }
      return curr->isLeaf;
    }

    We can also search for string prefixes in which case, we do not have the luxury of returning isLeaf as prefixes would just be a part of a path in the Tree and need not end in a leaf node.

  • The Trie →

    In a previous post we started with some motivation for the need of a tree like data structure for efficient prefix matching. We will see how to implement Trie in C++ in this post.

    Trie is a tree data structure made up of nodes that contain some data. So, we need to design a structure for the nodes and some structure fields for storing the data. Since we are talking about prefix strings and strings are made up of characters, we can think of a node as having a hashmap, storing the prefix character and a reference to the Trie node holding the remaining characters (children nodes). We can go about defining a structure like this:

    struct Trie
    {
      std::unordered_map<char, Trie*> lookup;
      bool isLeaf;
    };

    The boolean variable isLeaf is to indicate the end (leaf node) of the string in the Trie. And when we start populating a new Trie, we need to allocate a new structure, something like the following newTrieNode(). We set the isLeaf to false as we need to set this to true only for leaf nodes.

    Trie* newTrieNode()
    {
      Trie* node = new Trie;
      node->isLeaf = false;
      return node;
    }

    We can also write a small function to check if a node has any children (by checking the value of isLeaf).

    bool hasChildren(Trie* const curr)
    {
      for (auto it : curr->lookup)
        if (it.second != nullptr)
          return true;
      return false;
    }

    We will look at populating a Trie by inserting characters from a string into the Trie, as well as searching for a prefix in a future post.

  • From spell check to routers to many more... →

    Everyone has seen or even used a dictionary - may be this statement is more true of people from my generation than younger generations. In any case, a phonebook app in a smartphone also lists contact names in alphabetical order. Aakash would precede Amar which would precede Amit which preceeds Aqsa and so on. The common prefix between Amar and Amit is ‘Am’ - which tells us we could a tree data-structure, storing ‘Am’ in a node which has two children storing ‘i’, ‘a’ and so on. This also makes search easy and it follows the same technique we use to search for a word in a dictionary.

    There are several other applications where identifying and storing prefixes of strings efficiently can be very useful. Think about auto-completions in a texting app or Google search or how a router could do address matching efficiently.

    This data structure must be something very popular as it seems to have several everyday and important applications. And it is the “Trie”. Whenever we have a large number of strings (like protein sequences or words in a language or all possible strings one could type in a Google search box) that share some common prefix, we need to use think about efficiently matching shared prefixes. Trie is very simple and straight-forward to implement as we will see in a future post.