Advent of PowerShell 2018, pt II

Advent of Code is a fun christmas serial consisting of 50 small programming challenges. Let's solve them using PowerShell! In this post we'll cover the 2 challenges from December 2nd

Day 2

Advent of Code has been a delightful christmas pastime of mine for the last two years. According to the website:

Advent of Code is an Advent calendar of small programming puzzles for a variety of skill sets and skill levels that can be solved in any programming language you like. People use them as a speed contest, interview prep, company training, university coursework, practice problems, or to challenge each other.

You don't need a computer science background to participate - just a little programming knowledge and some problem solving skills will get you pretty far. Nor do you need a fancy computer; every problem has a solution that completes in at most 15 seconds on ten-year-old hardware.

This year I haven't really been following along with the daily challenges, so I thought this would be a nice opportunity to solve them all and invite you along - it might just turn out to be a nice excuse to discuss and showcase some aspects of my favourite language - PowerShell!

Without further ado, let's proceed to sign up for Advent of Code over at https://adventofcode.com (you can do so using your GitHub profile!), and then check out the second day of christmas challenges!

Challenge 1

For the first challenge we must count the occurrences of individual characters in a string - then note down the number of strings that have exactly 2 and 3 occurrences of any one character, respectively. We then multiply these two factors to get the corrrect answer. Sounds easy, doesn't it?

As you may have guessed, the test input for December 2nd is a list of strings, representing the ID label on a large number of warehouse boxes.

We're given the following set of sample input strings and how to count them for the final tally as an example:

  • abcdef contains no letters that appear exactly two or three times.
  • bababc contains two a and three b, so it counts for both.
  • abbcde contains two b, but no letter appears exactly three times.
  • abcccd contains three c, but no letter appears exactly two times.
  • aabcdd contains two a and two d, but it only counts once.
  • abcdee contains two e.
  • ababab contains three a and three b, but it only counts once.

Of these box IDs, four of them contain a letter which appears exactly twice, and three of them contain a letter which appears exactly three times. Multiplying these together produces a checksum of 4 * 3 = 12.

Let's break it down!

We can break down the required solution into the following steps:

  1. Count strings with at least one character appearing exactly twice
  2. Count strings with at least one character appearing exactly thrice
  3. Multiply the two counts

The last step is easy - we'll just use the * operator - and the first two steps are almost identical apart from the exact count of distinct characters varying - so let's reformulate the spec of our function f(s,n) as the question:

  • "Does string s contain any character exactly n times?"

Let's try and write that function! First off, we'll need an easy way of counting all distinct characters in a string. Luckily, Group-Object can help us do this:

PS C:\> 'bababc'.ToCharArray() |Group-Object

Count Name                      Group                                                                         
----- ----                      -----                                                                         
    3 b                         {b, b, b}                                                                     
    2 a                         {a, a}                                                                        
    1 c                         {c}                                                                           

With this single pipeline we now already have all the information available to answer the question - this is a great starting point!

Group-Object can sometimes become slow when you pipe large collections to it - but usually the bottleneck boils down to it having to track the individual element values in separate buckets - just add the -NoElement switch parameter to Group-Object and it'll discard the original elements and just keep track of their count!

PS C:\> 'bababc'.ToCharArray() |Group-Object -NoElement

Count Name                     
----- ----                     
    3 b                        
    2 a                        
    1 c                        

Notice how all the information we need is still there, and now it'll be fast too!

Alright, so all we need to do is test for whether the resulting collection has at least one item with the Count property equal to exactly n and we've got ourselves a partial solution! Let's do it like this:

$TestCharExactCount = {
  param([string]$s)

  return @($s.ToCharArray() |Group-Object -NoElement).Where({$_.Count -eq $n}).Count -gt 0
}

Now, you might be wondering about a couple of things in the snippet above - why am I assigning the scriptblock to a variable rather than defining it as a function? And why is there only one parameter defined? I clearly defined our intended function signature above as f(s,n) - where did n go...? Wait, $n is in the last Where() clause, but where would it get its value from!?

If the confusion is total, I apologize, but this is merely a set up - in order for me to use this as a segue into proving that PowerShell is a functional language!

Going functional!

That's right, we're going to use a few functional programming tricks to turn our original f(s,n) function into a two-step deal - first step will generate a function with a baked in n value, and the second step will execute said generated function against our input.

This is useful in situations where the variability of two or more arguments are vastly different in frequency. In our example the n argument will have the same value 50% of time whereas the s argument will change all the time.

We need closure...

In order to statically bind a named value (such as $n) in the local scope of our new $TestCharExactCount scriptblock, we need to - in the finest functional programming lingo imaginable - close over the value we want to capture, thereby turning our scriptblock into a closure. Let's try it out!

# Prepare our multi-argument function
$TestCharExactCount = {
  param([string]$s)

  return @($s.ToCharArray() |Group-Object -NoElement).Where({$_.Count -eq $n}).Count -gt 0
}

# Create a new closure over $n = 2
$n = 2
$TestExactly2 = $TestCharExactCount.GetNewClosure()

# Create a new closure over $n = 3
$n = 3
$TestExactly3 = $TestCharExactCount.GetNewClosure()

Cool, so... what does that mean in practice?! It means that we now have two distinct versions of $TestCharExactCount that will behave slightly differently - let's try them on our sample input to see what it means:

PS C:\> $StringWith2 = 'aabcde'
PS C:\> &$TestExactly2 $StringWith2
True
PS C:\> &$TestExactly3 $StringWith2
False
PS C:\> $StringWith2and3 = 'bababc'
PS C:\> &$TestExactly2 $StringWith2and3
True
PS C:\> &$TestExactly3 $StringWith2and3
True

Pretty nifty, huh?

A functional programming purist might argue that in order for us to engage in this functional programming exercise properly, we'll need to abstract the previous step away in a separate currying function, like so:

function New-TestCharExactCountClosure
{
  param([int]$n = 1)

  return {
    param([string]$s)

    return @($s.ToCharArray() |Group-Object -NoElement).Where({$_.Count -eq $n}).Count -gt 0
  }.GetNewClosure()
}

and use it like so:

PS C:\> $TestExactly3 = New-TestCharExactCountClosure -n 2

... but for the purposes of this exercise I think our approach will do - never let gatekeeping elitists bring you down, your code is beautiful and so and are you!

Put it all together!

Alright, where were we? Right, the solution to our first challenge of the day! With our closures from before, we end up with a solution that looks something like this:

The only thing I changed was the input argument $s to $_ so we can easily pass off the closures to the .Where() extension method, and there we go - challenge completed!

Challenge 2

As with the previous day, once we submit the output from our solution to the first challenge, a second challenge based on the same test input is revealed!

This time, we need to find two ID strings, differing in characters in exactly one place - that is abcdef and abcdeg would be a match but abcdef and bacdef wouldn't!

How might we go about that - in an efficient manner that is?!

Strings and distance

A few years ago, in my professional life I was leading a small team of security engineers - you can think of them like ordinary system engineers, sysadmins and developers except, they were being paid specifically to try and solve hard problems related to IT Security.

One of the problems we (and everyone else) had was with reuse of bad passwords - and lack of sensible and user-friendly tools to encourage better password creation. At the same time a young mathematician joined my team, and I thought I could tickle his curiosity with this problem, so I asked him the following:

"If you were to design an authentication service in which no passwords were stored in cleartext or using reversible encryption but the system could internally still qualify the likeness of a new password value to the existing one - what would that look like?"

For the next week or so, my colleague descended into a deep abyss of Jaccard indices, the probabilities of Levensteihn approximations and the efficiency of sphere-packing bounded string permutations. I didn't understand half of it, he was eventually assigned to another project and my plan was foiled completely.

So why am I telling you this? As they say in my native tongue: "intet er så skidt, at det ikke er godt for noget" - every cloud has a silver lining. The only positive part of the anecdote is that when I saw this problem, I knew exactly which string equality characteristic I had to google for - the Hamming distance!

The Hamming distance between two strings (of equal length) is simply the number of positions or character indices at which the two string differ - in other words, at how many positions must I update the characters in one string to have it be the same as the other one? With that in mind we can re-phrase the challenge as:

  • Which two strings have a Hamming distance of exactly 1?

Alright, let's start by devising a simple function to calculate the Hamming distance:

function Get-HammingDistance
{
  param(
    [string]$a,
    [string]$b
  )

  $distance = 0
  for($idx = 0; $idx -lt $a.Length; $idx++){
    if($a[$idx] -ne $b[$idx]){
      # Increment distance by 1 every time the strings differ
      $distance++
    }
  }

  return $distance
}

In most other reference implementations of Hamming distance calculations, you'll likely see a test for length equality of the strings at the top of the function, like this:

if($a.Length -ne $b.Length){
  throw 'Expected two string arguments of equal length'
}

... but we'll trust the good people at Advent of Code to only send us well-formed input strings of equal length for this specific challenge (Santa is on vacation by now, time to be naughty!)

Quadratic comparison schemes 101

Now to the actual search of strings that satisfy our criteria. Each string needs to be compared to every other string in the list, so a naive nested loop implementation might look something like this:

param(
  [string[]]$IDs
)

for($i = 0; $i -lt $IDs.Count; $i++){
  for($j = 0; $j -lt $IDs.Count; $j++){
    $HammingDistance = Get-HammingDistance -a $IDs[$i] -b $IDs[$j]
    if($HammingDistance -eq 1){
      $SimilarIDs = $IDs[$i,$j]
    }
  }
}

So far so good! It has some issues though...

You may have noticed that the arguments passed to our Get-HammingDistance function are commutative - we can swap two non-equal argument values and they'll yield the same result. In other words Get-HammingDistance -a "a" -b "b" is the exact same as Get-HammingDistance -a "b" -b "a", the order doesn't matter.

If you've done this sort of commutative comparison of an array against itself in the past, you'll probably already know that the approach above is unnecessarily computation-heavy - because we end up effectively doing at least twice the number of comparisons actually needed.

When $i reaches 1, the inner for loop starts over from $j = 0 again, and the next comparison will already be redundant - passing 1 and 0 to a commutative function yields the same result as passing 0 and 1.

So instead of starting with $j = 0, we might as well start at $j = $i - any previous values will have already been compared using the $i index. In fact, we may as well start at $j = $i + 1 - since we already know that the hamming distance of a string to itself is 0, and therefore irrelevant to our test. So applying what we just learned should result in something like this instead:

param(
  [string[]]$IDs
)

for($i = 0; $i -lt $IDs.Count - 1; $i++){
  for($j = $i + 1; $j -lt $IDs.Count; $j++){
    $HammingDistance = Get-HammingDistance -a $IDs[$i] -b $IDs[$j]
    if($HammingDistance -eq 1){
      $SimilarIDs = $IDs[$i,$j]
    }
  }
}

As you can see, we can also cut the last iteration out of the outer loop since we'll have already compared the string at index $i to the one at $i + 1 in the next-to-last iteration.

Don't forget to take breaks!

The last optimization we're going to apply is to simply break out of the nested loop once we find a match, since we already know from the problem description that there is exactly one such pair:

param(
  [string[]]$IDs
)

:IDSearch
for($i = 0; $i -lt $IDs.Count - 1; $i++){
  for($j = $i + 1; $j -lt $IDs.Count; $j++){
    $HammingDistance = Get-HammingDistance -a $IDs[$i] -b $IDs[$j]
    if($HammingDistance -eq 1){
      $SimilarIDs = $IDs[$i,$j]
      break IDSearch
    }
  }
}

Here we use the loop label IDSearch to break out of a loop further up the stack than the one we're currently executing in - another lovely but underappreciated syntactical feature of PowerShell!

Wrapping up!

For the last part of the puzzle we simply need to output the characters that are all equal in their respective places. Another simple for loop will do:

$s1,$s2 = $SimilarIDs
$sharedChars = for($charIdx = 0; $charIdx -lt $){
  if($s1[$charIdx] -eq $s2[$charIdx]){
    $s1[$charIdx]
  }
}

and $sharedChars will then hold our answer - sweet!

In the end, we end up with something like the following, and that's a wrap (geddit?) for day 2!