In interviews for programming positions, it is common for the interviewer to present a small programming problem and see what kind of solution the candidate comes up with. These problems help to reveal how the candidate thinks about solving problems, and give the candidate a chance to show basic coding competency. They can also be fun to work through in their own right as exercises in the craft of software design.
The two sum problem is a simple version of the more general (and NP-complete!) subset sum problem. Two sum can be stated as follows:
Given an input array of numbers, and a target value, find all pairs of numbers in the input which sum to the target value.
There are various solutions possible, of course. Naively, one might consider iterating over all elements of the array, and for each element, iterating over the other elements to see if the pair adds up to the target. This leads to a solution that runs in O(N^2) time, though, which is clearly not desirable for large inputs.
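For reference, the naive approach can be sketched as below. This is only a minimal illustration: the function name two_sum_brute_force, the choice of long long for the values, and returning index pairs rather than printing them are all assumptions made for the sketch, and it assumes the sums do not overflow.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (name and signature are assumptions for this sketch).
// Returns every pair of index positions (i, j) with i < j whose values sum
// to target. Two nested loops give O(N^2) time; no extra space is used
// beyond the returned list of pairs.
std::vector<std::pair<std::size_t, std::size_t>>
two_sum_brute_force(const std::vector<long long>& nums, long long target) {
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t i = 0; i < nums.size(); ++i) {
        // Start at i + 1 so each pair is examined exactly once.
        for (std::size_t j = i + 1; j < nums.size(); ++j) {
            if (nums[i] + nums[j] == target) {
                pairs.emplace_back(i, j);
            }
        }
    }
    return pairs;
}
```

Calling two_sum_brute_force({1, 4, 5, 3, 2}, 6) would report the index pairs (0, 2) and (1, 4), that is, 1 + 5 and 4 + 2.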
A first pass at improving the performance would be to consider sorting the data. We can then use binary search to quickly see if a number’s complement is also in the set.
Solution: Sort the input data. For each value in the array, use binary search to see if (target – value) is also in the array, and if so print the pair of numbers. We need not consider values greater than target/2 since the data is sorted, and (target – value) would have already been checked.
The lookups take O(N log N) time in total, since we perform N binary searches at O(log N) each. The cost of the sort varies with the algorithm, though typically it is also O(N log N), giving O(N log N) overall performance. Space complexity is O(1), though, provided the data is sorted in place, since no extra space is needed for the lookups.
" "" (indices " << i << ", " << pos << ")"
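The sort-and-search idea can be sketched as follows. Treat it as a minimal illustration under stated assumptions rather than a definitive listing: the name two_sum_sorted is hypothetical, values are long long, the function returns the matching values instead of printing them, and it assumes (target – value) does not overflow. Instead of comparing against target/2 directly, it stops as soon as a value exceeds its own complement, which is the same cutoff without the integer division.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical helper (name and signature are assumptions for this sketch).
// Sorts nums in place (the caller's data is reordered), then binary-searches
// for each value's complement. Each distinct pair of values is returned once.
std::vector<std::pair<long long, long long>>
two_sum_sorted(std::vector<long long>& nums, long long target) {
    std::sort(nums.begin(), nums.end());
    std::vector<std::pair<long long, long long>> pairs;
    for (auto it = nums.begin(); it != nums.end(); ++it) {
        const long long complement = target - *it;
        // Past the halfway point every pair was already found from the
        // smaller side, and the data is sorted, so we can stop here.
        if (*it > complement) break;
        // Skip repeats of a value we have already handled.
        if (it != nums.begin() && *it == *(it - 1)) continue;
        // Search only to the right of it, so a value is never paired
        // with itself unless it really occurs twice.
        const auto found = std::lower_bound(it + 1, nums.end(), complement);
        if (found != nums.end() && *found == complement) {
            pairs.emplace_back(*it, complement);
        }
    }
    return pairs;
}
```

Because the search range starts one past the current element, a value equal to target/2 is only reported when a second copy of it is present.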
This is good, but if we are willing to use more memory, we can improve the processing speed in a classic time-memory trade-off.
Solution: Add the input numbers to a hash table with the number as the hash key, and its index position in the input as the hash value. For each value in the array, check to see if (target – value) is also in the hash table and if so print the pair of numbers. We need not consider values greater than target/2 since this would generate a duplicate pair when we eventually come across (target – value) itself in the array.
We are using an extra O(N) amount of memory for the hash table, but doing so gives us expected O(1) lookup time to see if (target – value) is also in the array. Once the hash table is built in O(N) time, we need only another O(N) amount of time to find all pairs, yielding an overall expected O(N) running time.
" "" (indices " << i << ", " << (*it).second << ")"// Same as above.
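A minimal sketch of the hash-based approach, using std::unordered_map, might look like this. The name two_sum_hashed, the long long value type, and returning index pairs instead of printing them are assumptions made for the sketch, and it assumes (target – value) does not overflow. One known limitation: when a value occurs more than once, the map remembers only its last index, so not every index combination is reported.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper (name and signature are assumptions for this sketch).
// Returns pairs of index positions (i, j) in the original input whose
// values sum to target, in expected O(N) time and O(N) extra space.
std::vector<std::pair<std::size_t, std::size_t>>
two_sum_hashed(const std::vector<long long>& nums, long long target) {
    // Map each value to its index position in the input.
    std::unordered_map<long long, std::size_t> index_of;
    index_of.reserve(nums.size());
    for (std::size_t i = 0; i < nums.size(); ++i) {
        index_of[nums[i]] = i;  // a repeated value keeps its last index
    }

    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t i = 0; i < nums.size(); ++i) {
        const long long complement = target - nums[i];
        // Only look up from the smaller member of a pair; the larger one
        // would just produce the same pair a second time.
        if (nums[i] > complement) continue;
        const auto it = index_of.find(complement);
        // Reject a match against the very same element.
        if (it != index_of.end() && it->second != i) {
            pairs.emplace_back(i, it->second);
        }
    }
    return pairs;
}
```

Unlike the sorted version, the input is left in its original order, so the reported index positions refer directly to the caller's array.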
If you’d rather not read C++, here is a Perl version of the hash-based algorithm that stops after finding the first matching pair of numbers.
# Usage: Pass the target value as a command line option, and the number array as
# values on stdin, one item per line.
"$num $lookfor (indices $i, "")\n"