Algorithm Review: Insertion Sort

I have to admit it: I forget things quite easily, and probably the most annoying things to forget are one’s own skills, aren’t they?

Fortunately, a couple of weeks ago I came across HackerRank, and I’m loving how working through its challenges is refreshing and resharpening some of my skills!

A couple of days ago I committed myself to going through every single challenge there, in order, and today I’ve decided to write a small post with the concepts and ideas behind each algorithm, data structure, etc. that I review: first of all to settle those concepts a bit deeper in my mind, but also to have a place I can easily come back to for a quick read whenever I feel I need it. And last but not least, to share it with whoever may read this! 🙂

So, as HackerRank started off with the Insertion Sort algorithm and as I’ve already finished all the related exercises with the maximum score (it was not that hard, to be honest), here is my Insertion Sort algorithm review!

Insertion Sort is a very simple in-place sorting algorithm. This means that no significant extra memory is required for it to execute. This can be reworded as: Insertion Sort’s auxiliary memory complexity is O(1).

The execution itself could be explained as a set of steps:

Consider the subarray formed by the first element as the sorted part and the remaining elements as the unsorted part.

For each element in the unsorted part:

  1. Store it in an auxiliary variable.
  2. Shift right all elements in the sorted part that are bigger than the value stored in step 1.
  3. Insert the auxiliary value at the location where step 2 stopped.

We could graphically draw it like this:

[2, 3, 4, 6, 5, 1]
[2, 3, 4, 6, 5, 1]
[2, 3, 4, 6, 5, 1]
[2, 3, 4, 6, 5, 1]
[2, 3, 4, 6, 6, 1]
[2, 3, 4, 5, 6, 1]
[2, 3, 4, 5, 6, 6]
[2, 3, 4, 5, 5, 6]
[2, 3, 4, 4, 5, 6]
[2, 3, 3, 4, 5, 6]
[2, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6]

As we can see, the sorted part keeps growing while the next element is inserted, shifting elements to the right and temporarily duplicating values until the proper place for the current value is found.

A possible Ruby implementation:

data = [2, 3, 4, 6, 5, 1] # e.g. the array from the trace above
for i in 1...data.count
  value = data[i]          # element to insert into the sorted part
  j = i - 1
  while j >= 0 && value < data[j]
    data[j + 1] = data[j]  # shift bigger elements one position to the right
    j -= 1
  end
  data[j + 1] = value      # insert the value where the shifting stopped
end

Analysing the algorithm’s worst case performance, we can easily see that the worst case scenario occurs when the input array is in reverse order, because every element will have to be shifted all the way to the beginning of the sorted part, and therefore the algorithm will need 1 + 2 + 3 + … + (n−1) shifts.

1 + 2 + 3 + … + (n−1) is a finite arithmetic progression, whose sum, by definition, is (number of terms) × (first term + last term) / 2. In our particular case this is (n−1)(1 + (n−1))/2 = n(n−1)/2, which we can approximate to n²/2.
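For example, with the reverse-ordered input [6, 5, 4, 3, 2, 1] (n = 6), inserting 5, 4, 3, 2 and 1 takes 1, 2, 3, 4 and 5 shifts respectively, i.e. 15 shifts in total, which is exactly n(n−1)/2 = 6·5/2.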

Therefore we can say that Insertion Sort’s time complexity is O(n²).

iOS Data Collection Using Swift

Here at MyDrive we are completely data-driven and as such we need to collect
as much data as possible. This data is then analysed by our data scientists and/or
used for Machine Learning purposes. To collect all that data we have designed and implemented our
own iOS data collection app (a.k.a. the iOS awesome app) that silently records
all our movements (lovely, isn’t it?).

Now let’s get to the technical part. The application has been written in Apple’s
brand new programming language, Swift, and we started developing it while it was
still in beta, which meant that parts of our codebase had to be rewritten to
adopt the changes introduced by each new beta release.

That was my first hands-on time with Swift and I have to say that I like it,
although I liked it more in one of the betas than in the current first release.
The idea of writing

if var v = some_method() {

}

or

if some_var? {

}

rather than:

var v = some_method()
if v != nil {

}

was, IMO, a much cleaner way of testing variables for nil but, for some reason,
Apple decided to remove that feature.

Apart from that detail, I think the language has some ‘must have’ and cool
features such as closure support, tuples, multiple return values, variable
type inference and functional programming patterns. It also has some that are
not so cool, like optionals, dodgy casting errors, different parameter names
inside and outside functions, and the fact that it is still very young, so things
are likely to change in the short/mid term.
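
Just to give a flavour of those features, here is a small illustrative snippet (Swift 1.x syntax, not taken from the app) showing a labelled tuple used as multiple return values, type inference and a closure:

// Hypothetical helper returning a labelled tuple (multiple return values).
// Assumes a non-empty input array.
func minMax(values: [Int]) -> (min: Int, max: Int) {
  var currentMin = values[0]
  var currentMax = values[0]
  for v in values {
    if v < currentMin { currentMin = v }
    if v > currentMax { currentMax = v }
  }
  return (currentMin, currentMax)
}

// Type inference: no explicit type annotations needed.
let bounds = minMax([2, 3, 4, 6, 5, 1])
println("min: \(bounds.min), max: \(bounds.max)")

// A closure passed to a higher-order function, functional style.
let doubled = [1, 2, 3].map { $0 * 2 }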

Having given my first impressions of Swift, let’s now move on to the so-called
iOS awesome app, and in particular to its most technically interesting part:
how we managed to record accelerometer data for hours without running out of
memory. Because, as you may have guessed, our first approach was to store all
the accelerometer observations in an array and then, when the recording
finished, dump it for gzipping and submission to cloud storage.
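
For reference, a stripped-down sketch of what that first approach looked like. The names and details here (the motion manager setup, the observations array) are illustrative assumptions, not the app’s actual code:

import CoreMotion

let motionManager = CMMotionManager()
var observations = [CMAccelerometerData]()   // everything kept in memory

motionManager.accelerometerUpdateInterval = 1.0   // 1 Hz at the time
motionManager.startAccelerometerUpdatesToQueue(NSOperationQueue.mainQueue()) { data, error in
  if let data = data {
    observations.append(data)   // grows unbounded for long recordings
  }
}
// ...when recording finishes, the whole array is serialised, gzipped and uploaded.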

That ‘all in memory / brute force’ approach didn’t work badly for a while, given that
we were collecting data at a 1 Hz frequency, but when we needed to start recording
data at higher frequencies (30 and 60 Hz), problems soon appeared.

After spotting the cause of the issue, we decided to create a custom NSOperation,
meant to run outside the main NSOperationQueue, that from time to time simply
dumps the contents of the array holding the accelerometer data to a file on disk
through an NSOutputStream. That worked fine, except for the fact that after some
time using the app we realised the last batch wasn’t being fully dumped
(we failed to wait for the dumping queue to finish before reading the file for gzipping).
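
The fix itself was as simple as draining the queue before touching the file, roughly like this (dumpQueue being the NSOperationQueue that holds the dump operations):

// Before reading the file for gzipping, wait for every pending
// dump operation to finish writing to disk.
dumpQueue.waitUntilAllOperationsAreFinished()
// ...now the on-disk CSV is complete and can be gzipped and uploaded.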

Once solved, the code looks more or less like this:

func addDataRow(...) {
  data.append(...)

  // Every 300 rows, hand the current batch over to the dump queue
  // and start collecting into a fresh array.
  if data.count >= 300 {
    let toBeDumped = data
    data = [Row]()
    dumpArray(toBeDumped)
  }
}

This piece of code simply appends data to an array and, when it reaches a certain
size (300 rows here), copies it to another variable and resets the original so it never grows too large.

func dumpArray(data: [Row]!) {
  let op = CSVDumpOperation(file: filePath, data: data)
  // Chain each operation to the previous one so batches hit the disk in order.
  if lastOp != nil && !lastOp!.finished {
    op.addDependency(lastOp!)
  }
  lastOp = op
  dumpQueue.addOperation(op)
}

The copy is then handed to a custom NSOperation to be dumped to disk outside the
main operation queue. Those operations are executed sequentially to avoid batches
being written out of order.

The dump operation looks like this:

class CSVDumpOperation: NSOperation {

  let data: [Row]
  let os: NSOutputStream

  init(file: String, data: [Row]) {
    // Append to the existing file so successive batches end up concatenated.
    os = NSOutputStream(toFileAtPath: file, append: true)
    self.data = data

    super.init()

    os.open()
  }

  override func main() {
    // Write each row as a CSV line through the output stream.
    for row in data {
      let rowStr = "\(row.x),\(row.y),\(row.z)...\n" // remaining columns elided
      if let rowData = rowStr.dataUsingEncoding(NSUTF8StringEncoding, allowLossyConversion: false) {
        let bytes = UnsafePointer<UInt8>(rowData.bytes)
        os.write(bytes, maxLength: rowData.length)
      }
    }

    os.close()
  }
}

This CSVDumpOperation simply opens an NSOutputStream to the file and writes the
CSV-formatted contents of the given array to it.

And that’s it! With this simple approach for this simple application we intend
to collect hundreds of hours of different activities for further analysis.