Hamming & Refactoring

Exercism.io is an exercise-based tool designed to help improve programming skills. It provides a command-line-interface allowing you to fetch challenges, solve them with a test-driven approach (using MiniTest in Ruby in my case), and upload your code to Exercism for code-review. I started using it today.

I completed the second challenge this morning which is to calculate the total Hamming distance between two strands of DNA. For those unfamiliar with information theory…

Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In another way, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other.

Alphabetic strings seemed to be the best representation of DNA strands in my mind and I quickly wanted to get something working with intentions to follow it up with a refactor.

My first iteration:

class Hamming
  VERSION = 1

  def self.compute(strand1, strand2)
    diff, count = 0, 0
    strand1, strand2 = strand1.split(''), strand2.split('')
    raise ArgumentError if strand1.length != strand2.length
    strand1.each do |x|
      diff += 1 if !(strand1[count].eql?(strand2[count]))
      count += 1
    end
    diff
  end
end

Please note that specifying VERSION is only part of the exercise & required by Exercism in a test.

Once it passed all tests, it occurred to me that I could simplify the array conversion and streamline by using Ruby’s zip method on the strands:

class Hamming
  VERSION = 1

  def self.compute(strand1, strand2)
    strand1, strand2 = strand1.chars, strand2.chars
    raise ArgumentError if strand1.length != strand2.length
    strand1.zip(strand2)
           .map { |x,y| x == y  }
           .count(false)
  end
end

In that example I called .zip on my first strand while passing in the second as my argument. This result was an array of nested 1:1 pairs since the strand lengths were equal. So, with two arrays you can do the following:

strand1 = ['F','G','Y','Y','E','Y']
strand2 = ['X','O','Y','G','J','E']

strand1.zip(strand2) # => [["F", "X"], ["G", "O"], ["Y", "Y"], ["Y", "G"], ["E", "J"], ["Y", "E"]]

After additional review, I simplified even further by eliminating Enumerable’s map method entirely to leverage count with a block argument to calculate the difference:

class Hamming
  VERSION = 1

  def self.compute(strand1, strand2)
    s1, s2 = strand1.chars, strand2.chars
    raise ArgumentError unless s1.length == s2.length
    s1.zip(s2).count { |x, y| x != y }
  end
end

I enjoy exploring different ways problems can be approached and simplification. If you have constructive feedback or an approach of your own, feel free to comment on my exercism submission.

Thank you for reading!