The concept of DRY ("Don't Repeat Yourself") applies not only to the code you write, but also to the information you collect at runtime. Let's have a look at a simple cache implementation using PowerShell.
cache noun
\ ˈkash \
A cache is, according to English wiktionary.org:
A store of things that may be required in the future, which can be retrieved rapidly, protected or hidden in some way.
In programming, it often describes a facility that duplicates and stores information, usually at a layer closer to where it's needed (near your CPU for example) than where it originates (say, your file system).
Fetching data from the CPU cache is going to be orders of magnitude faster than fetching it from disk (or memory, since the file system likely also employs a cache), thus not wasting precious time. At the same time, the capacity of the CPU cache is scarce compared to a modern disk, so the caching facility needs to be economical about what goes into the cache.
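To make the idea concrete before Active Directory enters the picture, here is a minimal sketch of the pattern. Get-SlowValue is a made-up stand-in for any expensive lookup, and the $Stats counter exists only to show how often the slow path actually runs.

```powershell
# Hypothetical stand-in for an expensive lookup (disk, network, directory ...)
$Stats = @{ SlowCalls = 0 }
function Get-SlowValue {
    param([string]$Key)
    $Stats.SlowCalls++
    Start-Sleep -Milliseconds 10   # simulate latency
    return $Key.ToUpper()
}

$Cache = @{}
$Results = foreach ($Key in 'a', 'b', 'a', 'a', 'b') {
    if (-not $Cache.ContainsKey($Key)) {
        # Cache miss: pay the cost once and remember the answer
        $Cache[$Key] = Get-SlowValue -Key $Key
    }
    # Cache hit (or freshly stored value): served from the hashtable
    $Cache[$Key]
}

$Results -join ','   # A,B,A,A,B
$Stats.SlowCalls     # 2
```

Five lookups, but only two trips down the slow path, one per distinct key.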
Use case: Active Directory queries
One interesting use case for caching in PowerShell is querying Active Directory for inter-related information. When querying AD, it might be desirable to:
- Search as infrequently as possible
  - To avoid having to wait due to network latency
- Search for a limited result set
  - To avoid excessive paging on the DC
  - To avoid excessive memory consumption on the client
On Friday I had the pleasure of doing a presentation on this very topic at the inaugural edition of PSDay.UK, and while trying to illustrate the difference in performance, I wrote this example during a live demo:
$Agents = Get-ADUser -Filter "title -like '*agent*'" -Properties manager
foreach($Agent in $Agents){
    [pscustomobject]@{
        Agent = $Agent.Name
        # One extra directory query per agent to resolve the manager's name
        Manager = $(Get-ADUser -Identity $Agent.manager).Name
    }
}
The above example outputs the name of all users with "agent" in their title, along with the name of each user's manager. This is an overly simplified example, but not too far from the kind of information you might expect a real-life HR or Payroll department to ask for.
In my lab, I have roughly 25000 user accounts titled "Customer Service Agent". That means that beyond the initial query for these objects, I need to make another 25000 queries before I'm done. This will likely take minutes. Dreadful.
We can do better, for sure.
Attempt #1: CACHE ALL THE THINGS
One approach we could take is simply pre-cache all possible managers:
$Agents = Get-ADUser -Filter "title -like '*agent*'" -Properties manager
# Save all users to a variable
$AllUsers = Get-ADUser -Filter *
foreach($Agent in $Agents){
    [pscustomobject]@{
        Agent = $Agent.Name
        Manager = $AllUsers.Where({$_.DistinguishedName -eq $Agent.manager}).Name
    }
}
Great, now we're down to just 2 queries, but we have a new problem: we need to spend many more CPU cycles on the client side. See, every time we need to locate the corresponding manager in $AllUsers, the .Where() method will iterate over every single user object until it reaches the end.
We've created a nested loop, so the work grows with the number of agents multiplied by the number of users (quadratic rather than linear), and the script gets dramatically slower as the number of user accounts grows.
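To see that multiplication in action without a directory, here is a small sketch using synthetic objects. The user names, DNs and counts are all made up, and $Counter only exists to tally how many comparisons .Where() performs.

```powershell
# Synthetic stand-ins for AD users; names and counts are invented
$AllUsers = foreach ($i in 1..200) {
    [pscustomobject]@{
        Name              = "User$i"
        DistinguishedName = "CN=User$i,OU=Staff,DC=example,DC=com"
    }
}
$Agents = $AllUsers | Select-Object -First 50

$Counter = @{ Comparisons = 0 }
foreach ($Agent in $Agents) {
    # Pretend every agent reports to User1, the very first entry
    $null = $AllUsers.Where({
        $Counter.Comparisons++
        $_.DistinguishedName -eq 'CN=User1,OU=Staff,DC=example,DC=com'
    })
}
$Counter.Comparisons   # 10000 = 50 agents x 200 users
```

Even though the match is the first element every time, the default .Where() mode still visits all 200 users for each of the 50 agents.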
Attempt #2: Use a lookup table!
No need to despair. This is easily solved with a technique I've come to love for its speed and simplicity: the hashtable: @{}
The performance characteristics of hashtables (and other dictionary types) are interesting, most notably that random access to the values is ultra fast. This makes hashtables suitable as a lookup table.
(If you're interested in the nature and inner workings of hash tables, I encourage you to read through Vaidehi Joshi's basecs article on the topic.)
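If you haven't used a hashtable as a lookup table before, the following sketch shows the handful of operations the rest of this post relies on. The DNs are invented for illustration.

```powershell
$Table = @{}
$Table['CN=Jane Doe,OU=Staff,DC=example,DC=com'] = 'Jane Doe'

# Lookup by key goes straight to the entry, no scanning of other entries
$Hit = $Table['CN=Jane Doe,OU=Staff,DC=example,DC=com']               # Jane Doe

# Keys in a @{} literal are compared case-insensitively
$SameKey = $Table.ContainsKey('cn=jane doe,ou=staff,dc=example,dc=com') # True

# Indexing with a key that isn't there yields $null rather than an error
$Miss = $Table['CN=Nobody,DC=example,DC=com']                          # $null
```

The case-insensitivity is handy here, since the same distinguished name may show up with different casing in different attributes.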
Let's try this:
# Save all users to a variable
$AllUsers = Get-ADUser -Filter * -Properties title,manager
# Iterate over all the users once, add them to your lookup table
$LookupTable = $AllUsers |ForEach-Object -Begin { $t = @{} } -Process {
    $t[$_.DistinguishedName] = $_
} -End { return $t }
# Filter the agents client-side; note the quotes around the wildcard pattern
foreach($Agent in $AllUsers.Where({$_.title -like '*agent*'})){
    [pscustomobject]@{
        Agent = $Agent.Name
        Manager = $LookupTable[$Agent.manager].Name
    }
}
The reason I'm using the DistinguishedName attribute value for the key in our lookup table is that the manager attribute happens to contain just that, making it easy to reference the corresponding user later.
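As an aside, Group-Object can build a similar table in a single call. This is a sketch with two invented users rather than real ADUser objects, and I haven't compared its speed against the ForEach-Object version above. Two things to keep in mind: -AsString keeps the keys as plain strings, and each value is a collection of matching objects rather than a single object.

```powershell
# Invented stand-ins for the output of Get-ADUser
$AllUsers = @(
    [pscustomobject]@{ Name = 'Ann'; DistinguishedName = 'CN=Ann,DC=example,DC=com'; manager = $null }
    [pscustomobject]@{ Name = 'Bob'; DistinguishedName = 'CN=Bob,DC=example,DC=com'; manager = 'CN=Ann,DC=example,DC=com' }
)

# One hashtable entry per distinct DistinguishedName
$LookupTable = $AllUsers | Group-Object -Property DistinguishedName -AsHashTable -AsString

$Bob = $AllUsers[1]
# Each value is a one-element collection here; .Name unwraps it
$ManagerName = $LookupTable[$Bob.manager].Name   # Ann
```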
Since we're retrieving all users anyways, we may as well leave out the initial query for all agents.
This approach still has a problem: it requires us to retrieve all the users from the directory. This may be fine if the users you need to find constitute a majority of the total number of users.
But what if the users we need make up less than 50%? 25%? 10%? We're likely to end up waiting excessively for results we have no need for, or congesting the network if the DC is behind a slow link. Caching that many entries on the client might also start to consume a serious amount of memory.
Attempt #2.1: Selective batch pre-caching
We could go in the opposite direction, and pre-cache very selectively.
Since we already know all possible manager values as soon as we query for the agents, we could construct a search query that would filter only those user accounts.
For Active Directory specifically, the -Filter parameter is just a front for an underlying LDAP filter. LDAP has a pretty straightforward syntax and supports nested clauses and prefix notation for its operators, making it easy to generate the filter clauses, concatenate them and then substitute them into a grouped condition.
Let's take this boilerplate example for on-the-fly LDAP query filter generation:
$Conditions = @{
    Title = 'Truck Driver'
    ANR = @('John','Jack','Jill','Jane')
}
$LDAPClauseTemplate = '({0}={1})'
$LDAPAnyTemplate = '(|{0})'
$LDAPAllTemplate = '(&{0})'
$Clauses = $Conditions.Keys |ForEach-Object {
    switch($_){
        {$Conditions[$_] -is [array]} {
            # Multiple values for one attribute: OR the individual clauses together
            $LDAPAnyTemplate -f -join(
                $Conditions[($key = $_)].ForEach({$LDAPClauseTemplate -f $key,$_})
            )
        }
        default {
            $LDAPClauseTemplate -f $_,$Conditions[$_]
        }
    }
}
# AND all the generated clauses together
$LDAPAllTemplate -f -join($Clauses)
This will result in a filter like this:
(&(|(ANR=John)(ANR=Jack)(ANR=Jill)(ANR=Jane))(Title=Truck Driver))
matching all "Truck Driver"s for which Ambiguous Name Resolution resolves any of the names specified - pretty nifty!
Looks overly complicated? Don't worry, our example is much simpler. We just need a single OR'd condition:
# Save all agents to a variable
$Agents = Get-ADUser -Filter "title -like '*agent*'" -Properties manager
# Generate filter in the form:
# (|(distinguishedName=DN1)(distinguishedName=DN2)..(distinguishedName=DNn))
$LDAPFilter = '(|{0})' -f -join(
    @($Agents.manager|Sort-Object -Unique).ForEach({"(distinguishedName=$_)"})
)
# Iterate over all the known users, add the relevant property to your lookup table
$LookupTable = Get-ADUser -LDAPFilter $LDAPFilter |ForEach-Object -Begin { $t = @{} } -Process {
    $t[$_.DistinguishedName] = $_.Name
} -End { return $t }
foreach($Agent in $Agents){
    [pscustomobject]@{
        Agent = $Agent.Name
        Manager = $LookupTable[$Agent.manager]
    }
}
In this example I've also stopped storing the entire ADUser object in the lookup table, since I really have no need for it (only do this if you truly need just one property value; doing $_ |Select-Object ... inside the loop is gonna be slow).
This is an approach I've taken a number of times in scripts where I'm touching a few thousand objects or less. It offloads filtering to the DC and it caches the result set efficiently on the client.
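One caveat with dropping raw DNs into a filter: LDAP filter strings reserve a few characters (backslash, asterisk and parentheses, plus NUL), and RFC 4515 requires them to be escaped as a backslash followed by two hex digits. Distinguished names with an escaped comma, such as "Smith\, John", contain a backslash, so they can trip this up. The helper below is a sketch of that escaping; ConvertTo-LdapFilterValue is a name I made up, not a cmdlet from the ActiveDirectory module, and the DN is invented.

```powershell
# Hypothetical helper: escape a value for use inside an LDAP filter clause
function ConvertTo-LdapFilterValue {
    param([string]$Value)
    # Backslash goes first so the escapes added below are not escaped again
    $Value = $Value.Replace('\', '\5c')
    $Value = $Value.Replace('*', '\2a')
    $Value = $Value.Replace('(', '\28')
    $Value = $Value.Replace(')', '\29')
    $Value = $Value.Replace([string][char]0, '\00')
    return $Value
}

# Invented DN containing an escaped comma and parentheses
$Managers = @('CN=Smith\, John (Sales),OU=Staff,DC=example,DC=com')
$SafeFilter = '(|{0})' -f -join(
    $Managers.ForEach({ '(distinguishedName={0})' -f (ConvertTo-LdapFilterValue $_) })
)
$SafeFilter
# (|(distinguishedName=CN=Smith\5c, John \28Sales\29,OU=Staff,DC=example,DC=com))
```

If none of your DNs contain these characters the plain version works fine, but the escaping costs next to nothing.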
The LDAP Filter syntax also makes it trivial to add conditions to the search:
$LDAPFilter = '(&(|{0})(employeeType=fulltime))' -f -join(
@($Agents.manager|Sort-Object -Unique).ForEach({"(distinguishedName=$_)"})
)
Highly recommended for most environments!
Attempt #3: On-access caching
But what if we have 100,000 user objects in AD? Half a million? The LDAP filter we constructed in the previous example is limited to 10MB (~5 million Unicode characters), and while that's not a likely bottleneck in most directories, you may still find the query timing out at this scale.
We can overcome this using one of the simplest caching techniques there is: cache items on access:
# Save all agents to a variable
$Agents = Get-ADUser -Filter "title -like '*agent*'" -Properties manager
# Start out with an empty lookup table
$LookupTable = @{}
foreach($Agent in $Agents){
    [pscustomobject]@{
        Agent = $Agent.Name
        Manager = if($LookupTable.ContainsKey($Agent.manager)){
            # Cache hit: reuse the name we stored earlier
            $LookupTable[$Agent.manager]
        }
        else{
            # Cache miss: query the directory once, store and output the name
            ($LookupTable[$Agent.manager] = $(Get-ADUser -Identity $Agent.manager -ErrorAction SilentlyContinue).Name)
        }
    }
}
Now, the first time a distinct manager value is encountered, we check whether the lookup table contains a corresponding entry and find that it doesn't: a cache miss. We then proceed to retrieve the user account, add its name to the lookup table and output it in one go (by putting the assignment operation inside ()).
The next time around, the lookup table will already contain a copy of the name, and we don't have to query the directory again.
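The hit-or-miss logic can also be pulled into a small function so the loop body stays tidy. What follows is a sketch, not a finished tool: Get-CachedValue and its parameters are names I made up, and $FakeLoader stands in for the real directory query so the example runs without a domain controller. It also skips empty keys, since Hashtable.ContainsKey() throws when handed $null, which can happen for an account with no manager set.

```powershell
# Hypothetical helper wrapping the cache-on-access pattern
function Get-CachedValue {
    param(
        [hashtable]$Cache,      # lookup table shared between calls
        [string]$Key,           # e.g. a manager's distinguishedName
        [scriptblock]$Loader    # runs only on a cache miss, receives the key
    )
    # No key (e.g. no manager set): nothing to look up, output nothing
    if ([string]::IsNullOrEmpty($Key)) { return }

    if (-not $Cache.ContainsKey($Key)) {
        # Cache miss: fetch once and remember the result
        $Cache[$Key] = & $Loader $Key
    }
    return $Cache[$Key]
}

# Stand-in loader that counts how often the "directory" gets queried
$Stats = @{ Queries = 0 }
$FakeLoader = {
    param($DN)
    $Stats.Queries++
    ($DN -split ',')[0] -replace '^CN='
}

$Cache = @{}
$DNs = 'CN=Ann,DC=example,DC=com', 'CN=Bob,DC=example,DC=com', 'CN=Ann,DC=example,DC=com', $null
$Names = foreach ($DN in $DNs) {
    Get-CachedValue -Cache $Cache -Key $DN -Loader $FakeLoader
}
$Names -join ','   # Ann,Bob,Ann
$Stats.Queries     # 2
```

Against a real directory, the loader would be something like { param($DN) (Get-ADUser -Identity $DN).Name } instead of the fake one.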
In my lab environment, I went from ~80 seconds to just 10 seconds for the script to execute, hooray!
Conclusion
As we've seen, implementing caching in PowerShell can be achieved without over-complicating our code, and when you go test this out, you may find pretty significant performance gains in terms of execution speed and network latency.
If there's one thing you take away from these examples, please let it be this:
Beyond using hash tables, these patterns can all be generalized for use in other situations. One could use a boilerplate class to wrap the gritty bits, but that's for another blog post...