• Computing the closest pair of points - Naïve approach →

    Given a set of points (say provided to us as (x, y) coordinates) on a plane, let us find the closest pair among the set. The straight forward approach would be to compute the distance of each point with every other point in the given set and store the minimum distance.

    #define REP(i, n) for (int i=0; i<n; i++)
    #define REPK(i, k, n) for (int i=k; i<n; i++)
    
    class Point {
      public: 
        int x;
        int y;
      public:
        Point () {};
        Point (int x, int y);
        static float dist(Point p, Point q);
    };
    
    float min_distance (Point p[], int n) {
      float min = MAX_DIST;
      REP(i, n)
        REPK(j, i+1, n)
          if (dist(p[i], p[j]) < min) 
            min = dist(p[i], p[j]);
      return min;
      // Time Complexity: O(n^2) - nested for loops (2)
    }

    As you can see this is a O(n^2) algorithm. Can we do better than this? It turns out we can - a topic for a future post.

  • Computing the number of inversions in an array - using divide and conquer approach →

    In our previous post, we saw an O(n^2 algorithm to compute the number of inversions in an input array. Now, let us see how we can use divide and conquer approach to solve the same problem. We will see that we can do better than O(n^2) using this approach.

    In the traditional recursive version of merge-sort, we divide the input array into two halves and call merge-sort procedure on each of the sub-arrays recursively and eventually merge the results. Suppose our input array is {2, 4, 1, 3, 5} and suppose that the merge-sort procedure calls are complete and that we are in the merge process. Let us say that we ended up with the following sorted subarrays and trying to merge them up into a single sorted array - {2, 4} and {1, 3, 5}. The way merge procedure works is to compare the two sub-arrays (call it left and right) and compare the elements one by one. First, we compare 2 and 1, since 1 is less than 2, we push 1 to the result array (1 being the smaller element). And we have encountered an inversion - the index of 2 is smaller than the index of 1. Since we know that each of the sub-arrays are sorted, elements that follow 2 in the left sub-array would be greater than 2 and since 2 formed an inversion with 1 (of the right sub-array), so will all the elements that follow 2 in the left sub-array. So, we already see two inversions - (2, 1) and (4, 1). Now we move on and compare the next elements - 2 and 3 and push 2 into the result array, and move on to the next. Now, we compare 4 and 3 and since 3 is smaller, we push it into the result array - we also encounter an inversion (4, 3). A pair of keen eyes would note that whenever we push an element from the right sub-array into the result array, we encounter (at least) an inversion and the number of inversions at that step equals the number of elements following the current element in left sub-array. Think over it.

    #define REP(i, n) for (int i=0; i<n; i++)
    #define REPK(i, k, n) for (int i=k; i<n; i++)
    
    int merge(int arr[], int l, int m, int r) {
      int inv_cnt = 0;
      // create tmp arrays for l and r
      int lsz = m - l + 1;
      int rsz = r - m;
      int left[lsz];
      int right[rsz];
      // populate
      REP(i, lsz)
        left[i] = arr[l + i];
      REP(i, rsz)
        right[i] = arr[m + 1 + i];
      // compare and merge
      int i, j, k;
      i = 0; j = 0;
      k = l;
      while ((i < lsz) && (j < rsz)) {
        if (left[i] <= right[j])
          arr[k++] = left[i++];
        else {
          arr[k++] = right[j++];
          inv_cnt += lsz - i;
        }
      }
      // fill in the left-over elements
      while (i < lsz) 
        arr[k++] = left[i++];
      while (j < rsz) 
        arr[k++] = right[j++];
    
      return inv_cnt;
    }
    
    int mergesort (int arr[], int l, int r) {
      int inv_cnt = 0;
      if (l < r) {
        int mid = l+(r-l)/2;
    
        inv_cnt = mergesort(arr, l, mid);
        inv_cnt += mergesort(arr, mid+1, r);
        inv_cnt += merge(arr, l, mid, r);
      }
      return inv_cnt;
    }

    Since we use divide and conquer, in fact we could re-purpose merge step of merge-sort as shown below to count the number of inversions, the time complexity of this approach is O(nlogn) which is better than our previous approach.

  • Computing the number of inversions in an array - naïve method →

    Let us think about computing the number of inversions in an array. Let us also assume that our array has unique elements.

    Two elements in an array arr form an inversion if arr[i] > arr[j] for i < j.

    For example, if {2, 4, 1, 3, 5} is our input array, it has 3 inversions (2, 1), (4,1) and (4, 3). Let us look for a naïve straight-forward algorithm to solve this problem - i.e., traverse the array comparing the elements and checking if there is an inversion.

    int inv_count (int arr[], int n) {
      int inv_count = 0;
      for (int i=0; i<n-1; i++)
        for (int j=i+1; j<n; j++)
          if (arr[j] < arr[i])
            inv_count++;
      return inv_count;
      // Time complexity: O(n^2) (two for loops)
    }

    Inversion tells us how far the array is from being sorted. For a sorted array, the number of inversions is 0 and if the array is sorted in the other order it has maximum number of inversions.

    But can we do better than O(n^2?

  • Computing sub-array sum given a set of queries on sub-array indices - Mo's Algorithm →

    In the previous post, we saw the naïve way of computing sub-array sums, given a set of queries. Now, following the hint provided in the previous post, let us try to think what would make it better.

    First, let us try to understand it with an example. If we had an inpur array a and three queries of the form {0, 3}, {4, 8} and {0, 8}, we were independently computing a[0] + a[1] + a[2] + a[3], a[4] + a[5] + a[6] + a[7] and a[0] + a[1] + a[2] + a[3] + a[4] + a[5] + a[6] + a[7] + a[8]. Although the sub-array sums we computed for the first two queries {0, 3} and {4, 8} could have just been reused instead of computing the sum all over again. Even in the case where the third query was of the form {0, 10} we could have used the results computed for the queries {0, 3} and {4, 8}. This also means that we need to compute the queries {0, 3} and {4, 8} so that we could reuse them for {0, 8} - i.e., if we computed {0, 8} first, we already lost the advantage. Which means that we need to order our queries in such a way that we compute the queries that have the potential to be reused before computing the other queries - in the above example {0, 3} and {4, 8}.

    The ordering of queries themselves could be done in several ways. In this post, we shall look at one Mo’s algorithm that buckets the queries into a number of buckets which are a function of the size of the input array. If the input array is of size n, we put all the queries whose left index (L) ranges from 0 … \sqrt(n) -1 in the first bucket, (\sqrt n) … (2 * \sqrt(n) - 1), and so on. Within each bucket, we shall order the queries in increasing order of the right indices - the queries’ R values.

    // a, b : two Queries to be compared in order to sort
    bool compare(Query a, Query b) {
      // first, sort by L if they are in different buckets
      if (a.L/bucket != b.L/bucket)
        return a.L/bucket < b.L/bucket;
      // within the same bucket, queries are sorted
      // according to R values
      return a.R < b.R;
    }

    Now, we need to adjust for over-lapping ranges in the queries. For example, if the previous (already computed) query was {2, 5} and the query that is currently being worked upon is {4, 7} then we need to discount a[2] and a[3] from the already computed sum (for {2, 5}) and also add a[6] and a[7] to it.

    Very similarly, we may have to remove elements from the right as well (i.e. tracking the R indices). For example, if the previous range was {0, 5} and the current range {2, 3}, we need to discount a[4] and a[5] from the previous sum, as well (besides a[0] and a[1] which were taken care when we were handling L (previous paragraph)).

    // initialize the current L and R values
    int currL = 0, currR = 0;
    // initialize an accumulator for the sum
    int curr_sum = 0;
    
    // iterate over the queries
    for (int i=0; i<m; i++) {
      int L = q[i].L, R = q[i].R;
    
      // taking care of L
      // taking care of discarding
      // sub-parts from the sum
      while (currL < L) {
        curr_sum -= a[currL];
        currL++;
      }
      // taking care of adding 
      // sub-parts to the sum
      while (currL > L) {
        curr_sum += a[currL];
        currL--;
      }
    
      // taking care of R
      // adding elements to R
      while (currR <= R) {
        curr_sum += a[currR];
        currR++;
      }
      // discounting elements
      while (currR > R+1) {
        curr_sum -= a[currR - 1];
        currR--;
      }
    }

    Let us run this on an example. Let us assume we are provided with an input array a which is { 1 2 3 1 1 2 4 1 3 4 2 1 2 3 4 1 1 3 4 2 3 2 1 2 3 } of size 25 (n) and the following 8 (m) queries (2,4) (1,7) (0,8) (5,8) (10,15) (12,16) (17,20) (18,21). Now, since n is 25, we need to put the queries with L values in the range 0 … \sqrt(n) -1 in the first bucket and so on. The first bucket, for example, would initially (arranged according to L values from 0..\sqrt(25) - 1), i.e. 0..4, will have the queries (0, 8) (2, 4) (1, 7) and then re-ordered as per their R values to (2, 4) (1, 7) (0, 8). The other buckets are second: (5,8), third: (10, 15) (12, 16), fourth: (17, 20) (18, 21).

    Take some time to work through it. It would be worth the time to enhance the understanding.

  • Computing sub-array sum given a set of queries on sub-array indices →

    Let us discuss an interesting algorithm problem - finding sub-array sums in an array.

    Given an array a and a set of queries q indicating ranges in an array, say sub-array indicies, how to compute the sum of the sub-array elements in the ranges specified in the query.

    In this post, we will discuss a very straightforward solution that traverses the array and the set of queries and computes the sum of each of the sub-arrays specified by the query bounds.

    A query is just modelled as a struct specifying the left and right sub-array bounds by integers L and R. For example, {3, 5} would mean sum the elements in the indicies 3, 4, 5 - i.e. a[3] + a[4] + a[5]

    struct Query {
      int L, R;
    };

    Now, let us walk through the queries one by one, and for each query, we shall traverse the array and pick up the elements in the indices specified by the query and sum them up. We initialize an accumulator sum and keep updating it for this purpose. Again, a is the input array of size n, and q is an array of queries and there are m queries in all.

    // walk through the queries one by one and 
    // get the query bounds
    // Complexity: O(m)
    for (int i=0; i < m; i++) {
      int L = q[i].L, R = q[i].R;
      // accumulator to collect the sum
      int sum = 0;
      // traverse the array and sum up the elements
      // within the query bounds
      // Complexity: O(n)
      for (int j=L; j<=R; j++) {
        sum += a[j];
      }
    }
    // Total Complexity: O(nm)

    As you can see we have two loops - one within another and hence the total complexity is O(mn). How can we make this naïve algorithm better? What do we look for? Hint: If I have queries one of which is a subset of another, we would end up computing the sum twice. For example, if two of our queries are {0,4} and {1,3}, by walking through all the queries and traversing the array for each query as above, we are not utilizing the fact that we could reuse the sub-array sum within {1,3} in the query {0,4} instead of recomputing them independently.