-
Computing the closest pair of points - Naïve approach →
Given a set of points (say provided to us as (x, y) coordinates) on a plane, let us find the closest pair among the set. The straight forward approach would be to compute the distance of each point with every other point in the given set and store the minimum distance.
As you can see this is a O(n^2) algorithm. Can we do better than this? It turns out we can - a topic for a future post.
-
Computing the number of inversions in an array - using divide and conquer approach →
In our previous post, we saw an
O(n^2
algorithm to compute the number of inversions in an input array. Now, let us see how we can use divide and conquer approach to solve the same problem. We will see that we can do better thanO(n^2)
using this approach.In the traditional recursive version of merge-sort, we divide the input array into two halves and call merge-sort procedure on each of the sub-arrays recursively and eventually merge the results. Suppose our input array is
{2, 4, 1, 3, 5}
and suppose that the merge-sort procedure calls are complete and that we are in the merge process. Let us say that we ended up with the following sorted subarrays and trying to merge them up into a single sorted array -{2, 4}
and{1, 3, 5}
. The way merge procedure works is to compare the two sub-arrays (call itleft
andright
) and compare the elements one by one. First, we compare2
and1
, since1
is less than2
, we push1
to the result array (1
being the smaller element). And we have encountered an inversion - the index of2
is smaller than the index of1
. Since we know that each of the sub-arrays are sorted, elements that follow2
in theleft
sub-array would be greater than2
and since2
formed an inversion with1
(of theright
sub-array), so will all the elements that follow2
in theleft
sub-array. So, we already see two inversions -(2, 1)
and(4, 1)
. Now we move on and compare the next elements -2
and3
and push2
into the result array, and move on to the next. Now, we compare4
and3
and since3
is smaller, we push it into the result array - we also encounter an inversion(4, 3)
. A pair of keen eyes would note that whenever we push an element from theright
sub-array into the result array, we encounter (at least) an inversion and the number of inversions at that step equals the number of elements following the current element inleft
sub-array. Think over it.Since we use divide and conquer, in fact we could re-purpose merge step of merge-sort as shown below to count the number of inversions, the time complexity of this approach is
O(nlogn)
which is better than our previous approach. -
Computing the number of inversions in an array - naïve method →
Let us think about computing the number of inversions in an array. Let us also assume that our array has unique elements.
Two elements in an array arr form an inversion if arr[i] > arr[j] for i < j.
For example, if
{2, 4, 1, 3, 5}
is our input array, it has 3 inversions(2, 1), (4,1)
and(4, 3)
. Let us look for a naïve straight-forward algorithm to solve this problem - i.e., traverse the array comparing the elements and checking if there is an inversion.Inversion tells us how far the array is from being sorted. For a sorted array, the number of inversions is
0
and if the array is sorted in the other order it has maximum number of inversions.But can we do better than
O(n^2
? -
Computing sub-array sum given a set of queries on sub-array indices - Mo's Algorithm →
In the previous post, we saw the naïve way of computing sub-array sums, given a set of queries. Now, following the hint provided in the previous post, let us try to think what would make it better.
First, let us try to understand it with an example. If we had an inpur array
a
and three queries of the form{0, 3}
,{4, 8}
and{0, 8}
, we were independently computinga[0] + a[1] + a[2] + a[3]
,a[4] + a[5] + a[6] + a[7]
anda[0] + a[1] + a[2] + a[3] + a[4] + a[5] + a[6] + a[7] + a[8]
. Although the sub-array sums we computed for the first two queries{0, 3}
and{4, 8}
could have just been reused instead of computing the sum all over again. Even in the case where the third query was of the form{0, 10}
we could have used the results computed for the queries{0, 3}
and{4, 8}
. This also means that we need to compute the queries{0, 3}
and{4, 8}
so that we could reuse them for{0, 8}
- i.e., if we computed{0, 8}
first, we already lost the advantage. Which means that we need to order our queries in such a way that we compute the queries that have the potential to be reused before computing the other queries - in the above example{0, 3}
and{4, 8}
.The ordering of queries themselves could be done in several ways. In this post, we shall look at one Mo’s algorithm that buckets the queries into a number of buckets which are a function of the size of the input array. If the input array is of size
n
, we put all the queries whose left index (L
) ranges from 0 … \sqrt(n) -1 in the first bucket, (\sqrt n) … (2 * \sqrt(n) - 1), and so on. Within each bucket, we shall order the queries in increasing order of the right indices - the queries’R
values.Now, we need to adjust for over-lapping ranges in the queries. For example, if the previous (already computed) query was
{2, 5}
and the query that is currently being worked upon is{4, 7}
then we need to discounta[2]
anda[3]
from the already computed sum (for{2, 5}
) and also adda[6]
anda[7]
to it.Very similarly, we may have to remove elements from the right as well (i.e. tracking the
R
indices). For example, if the previous range was{0, 5}
and the current range{2, 3}
, we need to discounta[4]
anda[5]
from the previous sum, as well (besidesa[0]
anda[1]
which were taken care when we were handling L (previous paragraph)).Let us run this on an example. Let us assume we are provided with an input array
a
which is{ 1 2 3 1 1 2 4 1 3 4 2 1 2 3 4 1 1 3 4 2 3 2 1 2 3 }
of size 25 (n
) and the following 8 (m
) queries(2,4) (1,7) (0,8) (5,8) (10,15) (12,16) (17,20) (18,21)
. Now, sincen
is 25, we need to put the queries with L values in the range 0 … \sqrt(n) -1 in the first bucket and so on. The first bucket, for example, would initially (arranged according to L values from 0..\sqrt(25) - 1), i.e. 0..4, will have the queries(0, 8) (2, 4) (1, 7)
and then re-ordered as per theirR
values to(2, 4) (1, 7) (0, 8)
. The other buckets are second:(5,8)
, third:(10, 15) (12, 16)
, fourth:(17, 20) (18, 21)
.Take some time to work through it. It would be worth the time to enhance the understanding.
-
Computing sub-array sum given a set of queries on sub-array indices →
Let us discuss an interesting algorithm problem - finding sub-array sums in an array.
Given an array
a
and a set of queriesq
indicating ranges in an array, say sub-array indicies, how to compute the sum of the sub-array elements in the ranges specified in the query.In this post, we will discuss a very straightforward solution that traverses the array and the set of queries and computes the sum of each of the sub-arrays specified by the query bounds.
A query is just modelled as a
struct
specifying the left and right sub-array bounds by integersL
andR
. For example,{3, 5}
would mean sum the elements in the indicies 3, 4, 5 - i.e. a[3] + a[4] + a[5]Now, let us walk through the queries one by one, and for each query, we shall traverse the array and pick up the elements in the indices specified by the query and sum them up. We initialize an accumulator
sum
and keep updating it for this purpose. Again,a
is the input array of sizen
, andq
is an array of queries and there arem
queries in all.As you can see we have two loops - one within another and hence the total complexity is O(mn). How can we make this naïve algorithm better? What do we look for? Hint: If I have queries one of which is a subset of another, we would end up computing the sum twice. For example, if two of our queries are
{0,4}
and{1,3}
, by walking through all the queries and traversing the array for each query as above, we are not utilizing the fact that we could reuse the sub-array sum within{1,3}
in the query{0,4}
instead of recomputing them independently.