1. Problem

    The longest Increasing Subsequence (LIS) problem is to find the length of the longest subsequence of a given sequence such that all elements of the subsequence are sorted in increasing order. For example, length of LIS for { 10, 22, 9, 33, 21, 50, 41, 60, 80 } is 6 and LIS is {10, 22, 33, 50, 60, 80}.

    Variants 

    Maximum Sum Increasing Subsequence

    Given an array of n positive integers. Write a program to find the maximum sum subsequence of the given array such that the integers in the subsequence are sorted in increasing order. For example, if input is {1, 101, 2, 3, 100, 4, 5}, then output should be {1, 2, 3, 100}.

    Box Stacking

    You are given a set of n types of rectangular 3-D boxes, where the i-th box has height h(i), width w(i) and depth d(i) (all real numbers). You want to create a stack of boxes which is as tall as possible, but you can only stack a box on top of another box if the dimensions of the 2-D base of the lower box are each strictly larger than those of the 2-D base of the higher box. Of course, you can rotate a box so that any side functions as its base. It is also allowable to use multiple instances of the same type of box.

    Solution

    DP: O(n^2)
    f[i]: LIS ending at i
    f[i] = max(f[j]) + 1, j < i & A[j] < A[i]

    OptimalO(nlog(n)) maintaining the longest increasing subsequence found so far

    Variants

    Maximum Sum Increasing Subsequence (weighted LIS)

    Notes

  2. Problem

    Given a triangle, find the minimum path sum from top to bottom. Each step you may move to adjacent numbers on the row below.
    For example, given the following triangle
    [
         [2],
        [3,4],
       [6,5,7],
      [4,1,8,3]
    ]
    
    The minimum path sum from top to bottom is 11 (i.e., 2 + 3 + 5 + 1 = 11).
    Note:
    Bonus point if you are able to do this using only O(n) extra space, where n is the total number of rows in the triangle.

    Solution

    Space: O(n^2)
    Rolling array: O(n) space.

    Notes

  3. Problem

    Find K smallest/largest elements in an array.
    For example, if given array is [1, 23, 12, 9, 30, 2, 50] and you are asked for the largest 3 elements i.e., k = 3 then your program should print 50, 30 and 23.

    Variants 

    Compute K closest stars @ EPI.

    Solution

    Heap

    nlog(k), additional O(klog(k)) if output has to be sorted.

    Build up a min-heap of size k with the first k records, scan the remaining records, replace the heap top if the current record is greater than the heap top and then adjust the heap.

    Selection

    You can do this in O(n) by using a selection algorithm. Find the kth largest element with the partition algorithm, then all the elements after it will be larger than it, and those are your top k.

    If you need those top k in sorted order, you can then sort them in O(k log k).

    Notes

    1. Heap is preferred over Selection for if array is a stream or size not known.
    2. Comprehensive summary @ geeksforgeeks.
  4. Problem

    Given a large number (millons or billons) of records (integers, IPs, URLs, query key words, etc.),
    • find out top k most frequent records;
    • find out duplicates/missing;
    • find out whether a given number x is in the set or not.
    One common requirement here is how to handle big data efficiently when they are not able to be all loaded into memory.


    1) Find Missing Integer (<Cracking Coding Interview> 10.3)

    Given a file with 4 billion distinct integers, find out an integer that is not contained in the file.
    • Assume you have 1 GB of memory.
    • Assume you have 10 MB of memory.

    2) Find Top k Most Frequent IP

    Given a large access log of a website that contains 10 billion IPs that access the website in a day, find out k IPs that access the website most frequently.

    Solution

    Many solutions resemble External Merge Sort (also see local commented pdf), multi-pass approach:

    1. Divide so it can be loaded into memory
    2. Solve in memory
    3. Merge
    These tools are often useful:

    • Hash table or Bitmap (or Bloom Filter)
    • Sort data by their occurrences
    • Heap
    1) 
    Also see CC150 and reference.

    2) 
    We solve the problem with two passes:
    1. Divide up the data set into N blocks.
      • Notice that the data set is not sorted, so when we split data, we cannot simple put every n records into a small file. We need to hashing the IP to its corresponding file so that the same IP will always go to the same file.
        100 MB / 19 bytes ~= 5.3 millons. So, let each block have 4 millon = 2^22 IPs.
        One way to hash the IPs is to parse the first two token "xxx.xxx" of an IP, convert it into an integer and use the high 10 bits as the corresponding file id.
    2. In each block, use a hash table to calculate frequencies and then perform in-place HeapSort to find the first k most frequent IP.
    3. MergeSort/HeapSort the N sorted arrays to find the overall k most frequent IP.

    Notes

  5. Problem

    Given an array of numbers and a sliding window size, how to get the maximal numbers in all sliding windows?

    For example, if the input array is {2, 3, 4, 2, 6, 2, 5, 1} and the size of sliding windows is 3, the output of maximums are {4, 4, 6, 6, 6, 5}, as illustrated in Table1.

    Sliding Windows in an Array
    Maximums in Sliding Windows
    [2, 3, 4], 2, 6, 2, 5, 1
    4
    2, [3, 4, 2], 6, 2, 5, 1
    4
    2, 3, [4, 2, 6], 2, 5, 1
    6
    2, 3, 4, [2, 6, 2], 5, 1
    6
    2, 3, 4, 2, [6, 2, 5], 1
    6
    2, 3, 4, 2, 6, [2, 5, 1]
    5
    Table 1: Maximums of all sliding windows with size 3 in an array {2, 3, 4, 2, 6, 2, 5, 1}. A pair of brackets indicates a sliding window.

    Solution

    Brute-force: O(nk)
    heap: O(nlog(k))

    Max queue

    O(n): queue contains elements in window.
    Implement a queue with getMax() in O(1).
    1. a queue can be implemented by using two stacks. See EPI 9.12
    2. a stack w/ min
    Combine above 2.

    Dequeue

    O(n) Full explanation. Key idea: no need to store all elements in window, only these potential window maximum; queue is sorted decreasingly. To see why it works, consider the invariant: queue is sorted.

    Notes

    1. "While" in line 26 shall be "if."
    2. See also geeksforgeeks, leetcode, & SO
  6. Problem

    Given an unsorted array of integers, find the length of the longest consecutive elements sequence.
    For example,
    Given [100, 4, 200, 1, 3, 2],
    The longest consecutive elements sequence is [1, 2, 3, 4]. Return its length: 4.
    Your algorithm should run in O(n) complexity.

    Solution


    Notes

    1. Do not omit line 13.
  7. Problem

    Finding longest/shortest subarray/substring in a linear consecutive data structure meeting a requirement. Often have to find O(n) time solutions.

    Solution

    1. Often it's easy to improve from brute-force by thinking of prefix/suffix to, say, O(n^2).
    2. To further optimize, think of 
      • whether next element(s) can be skipped bcoz subarrays starting with them are suboptimal
      • mark elements processed and skip later so O(n) even seemingly two loops. 
              In both cases, hash table comes into play.

    Notes

  8. Problem

    The count-and-say sequence is the sequence of integers beginning as follows:
    1, 11, 21, 1211, 111221, ...
    1 is read off as "one 1" or 11.
    11 is read off as "two 1s" or 21.
    21 is read off as "one 2, then one 1" or 1211.
    Given an integer n, generate the nth sequence.
    Note: The sequence of integers will be represented as a string.

    Solution


    Notes

  9. Problem

    Given a linked list, swap every two adjacent nodes and return its head.
    For example,
    Given 1->2->3->4, you should return the list as 2->1->4->3.
    Your algorithm should use only constant space. You may not modify the values in the list, only nodes itself can be changed.

    Generalize k from 2 to arbitrary number

    Given a linked list, reverse the nodes of a linked list k at a time and return its modified list.
    If the number of nodes is not a multiple of k then left-out nodes in the end should remain as it is.
    You may not alter the values in the nodes, only nodes itself may be changed.
    Only constant memory is allowed.
    For example,
    Given this linked list: 1->2->3->4->5
    For k = 2, you should return: 2->1->4->3->5
    For k = 3, you should return: 3->2->1->4->5

    Solution



    Notes

    1. Line 16 is not "k-- > 0" since we want to find the next (k - 1)th, , not kth, node.
  10. Problem

    Partition the array into 2 subarrays such that elements in the lower part <= elements in the higher part.

    Variant

    Partition the array into three parts, one each for items with keys smaller than, equal to, and larger than the partitioning item's key. Mainly to handle arrays with large numbers of duplicate keys.

    Solution

    Hoare's partition. Correctness proof.
    Lomuto's partition.
    Differences:

    1. Hoare uses the first element as pivot, Lomuto uses the last;
    2. Lomuto puts pivot in the right position, while Hoare may not.

    Variant

    Dutch National Flag problem

    Notes

    1. To use arbitrary (e.g., median or random) element in the array as pivot, just swap it with the first in Hoare, last in Lomuto. The rest is the same.
    2. Very good graphic lecture.
    3. Variant used in Sort Colors.
Loading