Yekun's Note

Machine learning notes and writeup.

Fork me on GitHub

Sorting Algorithms

A summary of sorting algorithms.

Basic sorting algorithms

Algorithms Time complexity(Avg) Time complexity(worst) Time complexity(best) Space complexity Stability
Bubble Sort $O(n^2)$ $O(n^2)$ $O(n)$ $O(1)$ Stable
Selection Sort $O(n^2)$ $O(n^2)$ $O(n^2)$ $O(1)$ Untable
Insertion Sort $O(n^2)$ $O(n^2)$ $O(n)$ $O(1)$ Stable
Shell Sort $O(n^1.3)$ $O(n^2)$ $O(n)$ $O(1)$ Unstable
Merge Sort $O(n \log n)$ $O(n \log n)$ $O(n \log n)$ $O(n)$ stable
Heap Sort $O(n \log n)$ $O(n \log n)$ $O(n \log n)$ $O(1)$ Unstable
Quick Sort $O(n \log n)$ $O(n^2)$ $O(n \log n)$ $O(n \log n)$ Unstable
Count Sort $O(n+k)$ - - $O(1)$ Stable

Bubble Sort

Idea: Bubble the least element to the front position $i=j-1$ at each episode $j$.(j= 1,2,…, len(A))

  • Time complexity: $O(n^2)$
  • Space Complexity: $O(1)$, in place
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    def BubbleSort(A):
    """
    Time Complexity: O(n^2)
    Space Complexity: O(1), in place
    :param A: List(int)
    :return: List(int)
    """
    for j in range(len(A)): # len(A)-1 episodes
    # bubble the least element to the most front position, at i-th index
    for i in range(len(A) - 1, j, -1):
    if A[i] < A[i - 1]:
    A[i], A[i - 1] = A[i - 1], A[i]
    return A

Bubble Sort with swap flag

  • Problems: The above version of bubble sort would always do len(A)-1 times episodes even though all elements are sorted before that.
  • Solution: set a flag to record whether there exists swap during the current episode. If there is no swap in the previous iteration, ilustrating that elements are already in order, it is no need to continue the bubble process.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
def BubbleSort_with_flag(A:list):
"""
Time Complexity: O(n^2)
Space Complexity: O(1), in place
:param A: List(int)
:return: List(int)
"""
for j in range(len(A)): # len(A)-1 episodes
flag = False # flag of whether to swap
# bubble the least element to the most front position, at i-th index
for i in range(len(A) - 1, j, -1):
if A[i] < A[i - 1]:
A[i], A[i - 1] = A[i - 1], A[i]
flag = True
# if no swap, all following elements are already in order
if not flag: break
return A

Selection Sort

Process: find the least value from A[i:] and swap it with the front (i-th) element at $i$-th episode. (i= 1,2,…, len(A)).

  • Time complexity: $O(n^2)$
  • Space Complexity: $O(1)$, in place
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
def SelectionSort(A: list):
"""
Time Complexity: O(n^2)
Space Complexity: O(1), in place
:param A: List(int)
:return: List(int)
"""
for i in range(len(A) - 1):
min_index = i
for j in range(i + 1, len(A)):
if A[j] < A[min_index]:
min_index = j
# find the least value to swap in each episode
A[i], A[min_index] = A[min_index], A[i]
return A

Insertion Sort

Process: keep the former section in order. From the index 1 on, insert the current value at the correct place in previous section.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
def InsertionSort(A: list):
"""
Time Complexity: O(n^2)
Space Complexity: O(1), in place
:param A: List(int)
:return: List(int)
"""
for j in range(1, len(A)):
key = A[j]
i = j - 1
while i >= 0 and key < A[i]:
A[i + 1] = A[i]
i -= 1
A[i + 1] = key
return A

Shell Sort

Process: divide the array into len(A)//$h$ groups, and sort separately; the reduce the step $h$ and sort, until the step size reaches to 1.

  • It can be seen as an improvement of Insertion Sort.
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    def ShellSort(A: list):
    """
    :param A: List(int)
    :return: List(int)
    """
    n = len(A)
    h = 1
    while h < n / 3:
    h = 3 * h + 1
    while h >= 1:
    # insertion sort
    for i in range(h, n):
    j = i
    while j >= h and A[j] < A[j - h]:
    A[j], A[j - h] = A[j - h], A[j]
    j -= h
    h //= 3
    return A

Merge Sort

Idea: divide-and-conquer.

  • Time Complexity: $O(n \log n)$
  • Space Complexity: $O(n)$

upload successful

The array version of merge sort:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
def merge(L, R):
"""
:param L: List(int)
:param R: List(int)
:return c: List(int)
"""
c = []
i = j = 0
for _ in range(len(L) + len(R)):
# check whether L or R reaches its end
if i == len(L):
c.append(R[j])
j += 1
elif j == len(R):
c.append(L[i])
i += 1
elif L[i] <= R[j]:
c.append(L[i])
i += 1
else:
c.append(R[j])
j += 1
return c

def MergeSort(A: list):
"""
merge sort from top down (with recursion)
Time Complexity: O(NlogN)
Space Complexity: O(n)
:param A: List(int)
:return: List(int)
"""
if len(A) <= 1: return A
mid = len(A) >> 1
left = MergeSort(A[:mid])
right = MergeSort(A[mid:])
return merge(left, right)

Linked List version [7] :

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
# Definition for singly-linked list.
class ListNode(object):
def __init__(self, x):
self.val = x
self.next = None

class Solution(object):
def sortList(self, head):
"""
:type head: ListNode
:rtype: ListNode
"""
if not head or not head.next:
return head
# divide the linked list into two parts
slow, fast = head, head.next
while fast and fast.next:
fast = fast.next.next
slow = slow.next
second = slow.next

# cut down the fist part
slow.next = None
l = self.sortList(head)
r = self.sortList(second)
return self.merge(l, r)

def merge(self, l, r):
"""
merge sort helper func
:param l: the left part of linked list
:param r: the right part of linked list
:return:
"""
if not l or not r:
return l or r
if l.val > r.val:
l, r = r, l
head = pre = l # define the return head node
l = l.next
while l and r:
if l.val < r.val:
pre.next = l
l = l.next
else:
pre.next = r
r = r.next
pre = pre.next
# at least one of l or r is None
pre.next = l or r
return head

Heap Sort

Process:

  1. Build a max heap
  2. Swap the top element of max heap with the end element, heaptify the max heap followed by heapsize -1 , until the array in order.
  • Time complexity: $O(n \log n)$
  • Space complexity: $O(1)$, in place
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
def Parent(i: int): return (i - 1) >> 1

def Left(i: int): return 2 * (i + 1) - 1

def Right(i: int): return 2 * (i + 1)

def max_heapify(A: list, i: int, heap_size: int):
"""
heapify the max heap, so that the root is the maximum
:param A: heap to be heapify
:param i: the parent node index
:param heap_size: size of the heap to heapify
:return:
"""
l, r = Left(i), Right(i) # left and right node
if l < heap_size and A[l] > A[i]:
# if the left node is larger
largest = l
else:
largest = i
if r < heap_size and A[r] > A[largest]:
# if the right node is larger
largest = r
if largest != i: # if not satisfies max heap property
A[i], A[largest] = A[largest], A[i] # swap the max value with the root value
max_heapify(A, largest, heap_size) # recursively heapify the swapped node

def build_max_heap(A: list):
"""
build the max heap
O(log n)
:param A:
:return:
"""
heap_size = len(A)
for i in range(len(A)//2, -1, -1):
max_heapify(A, i, heap_size)

def HeapSort(A: list):
"""
Time complexity: O(NlogN)
:param A: array to be sorted
:return:
"""
# build the max heap so that the top element is the maximum
build_max_heap(A)
heap_size = len(A)
for i in range(len(A) - 1, 0, -1): # from the tail of array to traverse
# swap the maximum with the array tail
A[0], A[i] = A[i], A[0]
# exclude the final element (i.e. maximum) in the current heap
heap_size -= 1
# heapify thus the maximum float upwards to the top
max_heapify(A, 0, heap_size)

Quick Sort

Process: divide-and-conquer

  1. Randomly choose one num as the pivot element A[r]
  2. Put elements less than A[r] on the left side, whilst numbers no less than A[r] on the righ side
  3. Repeat the step2, until there is only one element on the left and right side.
  • Time complexity: $O(n \log n)$
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
def partition(A: list, p: int, r: int):
"""
partition for quick sort
:param A: List(int)
:param p: index of the left boundary
:param r: index of the pivot element
:return: index of pivot element separating the left and right,
where the left values are less than or equal to it
whilst the right are greater than it.
"""
x = A[r]
i = p - 1
for j in range(p, r):
if A[j] <= x:
i += 1
A[j], A[i] = A[i], A[j]
A[r], A[i + 1] = A[i + 1], A[r]
return i + 1

def QuickSort(A: list, p: int, r: int):
"""
Quick Sort
Time complexity: O(NlogN)
:param A: List(int), array to be sorted
:param p: index of the left boundary
:param r: index of the right boundary
:return:
"""
if p < r:
q = partition(A, p, r)
QuickSort(A, p, q - 1)
QuickSort(A, q + 1, r)

Bucket sort

Bucket Sort is a kind of non-comparison sorting algorithms.

Count Sort

Process:

  1. find the maximum $k$ of the array, and create $k+1$ new buckets
  2. Traverse the array for one pass, put the elements into the buckets
  3. Traverse all the buckets, take the values from buckets.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
def CountSort(arr: list):
"""
Time complexity: O(n+k), where k is the max value of array elements.
"""
k = max(arr)
B = [None] *(len(arr) + 1) # the 1st element is an index placeholder
C = [0 for _ in range(k + 1)] # save the count (max index) of nums at index num in C
for j in range(len(arr)):
C[arr[j]] += 1 # count, e.g. A[1]=2, means there is two 1s in total
for i in range(1, k + 1):
C[i] = C[i] + C[i - 1] # the maximum index with starting index 1
for j in range(len(arr)):
B[C[arr[j]]] = arr[j] # B index : C values, B values: C index
C[arr[j]] -= 1 # C values -1
return B[1:]

Advance

Dutch national flag problem

Dutch national flag problem:

  • Given an array with $n$ objects colored red, white or blue, sort them in-place so that objects of the same color are adjacent, with the colors in the order red, white and blue. Here, we will use the integers 0, 1, and 2 to represent the color red, white, and blue respectively.[1][4]

Solution: three-way partitioning

Three-way partitioning [2][3] divides the arrays into four sections:

  • A[:lo]: all of 0s (red)
  • A[lo:mid]: all of 1s (white)
  • A[mid:hi]: unclassified
  • A[hi:]: all of 2s (blue)

The pseudocode:

  1. Input: array A with values 0,1,2, mid value 1
  2. Initilize $i \leftarrow 0, j \leftarrow 0, n \leftarrow sizeof(A)-1$
  3. while j<=n:
    1. if A[j] < mid:
      1. swap A[i] and A[j]
      2. $i \leftarrow i+1$
      3. $j \leftarrow j+1$
    2. else if A[j] > mid:
      1. swap A[j] and A[n]
      2. $n \leftarrow n-1$
    3. else:
      1. $j \leftarrow j+1$
  • Time complexity: $O(n)$
  • Space complexity: $O(1)$
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
def sortColors(nums:list):
"""
Dutch National Flag Algorithm, 3-way partitioning
O(n)
:type nums: List[int]
:rtype: sorted array nums (in-place)
"""
# divide into four sections:
# [:lo]: all of 0s
# [lo:mid]: all of 1s
# [mid:hi]: unclassified
# [hi:]: all of 2s
lo, mid, hi = 0, 0, len(nums) - 1 # red, white, blue
while mid <= hi: # if still exists unclassified elements
if nums[mid] == 0: # if it is 0s
nums[lo], nums[mid] = nums[mid], nums[lo]
lo += 1 # the 0's section grew
mid += 1
elif nums[mid] == 1:
mid += 1 # the 1's section grew
else: # nums[mid]==2
# swap the 2 (i.e. at the mid index ) with the last unclassified(i.e. at the hi position)
nums[mid], nums[hi] = nums[hi], nums[mid]
hi -= 1 # the 2's section grew
return nums

References