So You Think You Can Parse?

PowerShell treats input differently when operating on command arguments - but how can we programmatically force PowerShell to recognize and parse a string in argument mode? Let's find out!

Parsing modes in PowerShell

When specifying a string argument that we want to pass to a command in a PowerShell pipeline, we use a syntax that looks like this:

PS C:\> Get-Service -Name spooler

In the example above, we pass the string spooler to the Name parameter, resulting in Get-Service returning the service controller object corresponding to the Print Spooler service.

If you're familiar with interactive use of PowerShell, that explanation should feel almost insulting to you - of course -Name spooler means "treat spooler as a string"!

But what happens if we try to separate the argument value from the command itself, by storing it in a variable, or maybe a hashtable for splatting?

PS C:\> $Name = spooler
spooler : The term 'spooler' is not recognized as the name of a cmdlet, function, script file, or 
operable program. Check the spelling of the name, or if a path was included, verify that the path is 
correct and try again.
At line:1 char:9
+ $Name = spooler
+         ~~~~~~~
    + CategoryInfo          : ObjectNotFound: (spooler:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

Oh no, what just happened?! Well, PowerShell (correctly) treated the bare word spooler, the only thing in the right-hand side expression of our assignment operation, as a command name!

To have it do what we actually want, we need to qualify our string value with quotation marks:

PS C:\> $Name = 'spooler'
PS C:\> Get-Service -Name $Name

Yay, no more errors - but how come we didn't have to do that when we tacked the argument directly onto the pipeline in the first example? Hmmmm...

Introducing argument mode!

The simple reason is that the PowerShell parser - the component of the language engine that takes our raw code and turns it into something that the computer can analyze, make sense of and ultimately execute - discriminates your input based on context!

What that means is that PowerShell has one set of syntax rules that always applies, except for when a string follows the name of a command - in that case, the parser switches into something called argument mode - and all of a sudden the syntax rules change quite a bit.

From the about_Parsing help file that ships with the core module:

In argument mode, each value is treated as an expandable string unless it begins with one of the following special characters: dollar sign ($), at sign (@), single quotation mark ('), double quotation mark ("), or an opening parenthesis (().

emphasis added

This is dramatically different from the default syntax rules that apply, in which:

In expression mode, character string values must be contained in quotation marks. Numbers not enclosed in quotation marks are treated as numerical values (rather than as a series of characters).

This sort of contextual syntax discrimination has its merits, of course. It would be super annoying having to qualify every single string argument with quotation marks when hacking away at your prompt - we want to get things done fast and smoothly, and most of the time, a simple string really is just what we want to pass along as an inline argument.
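To see the two modes side by side, here is a small sketch modelled on the 2+2 example from about_Parsing. The variable names are made up for illustration; the only command involved is the built-in Write-Output.

```powershell
# Expression mode: 2+2 is evaluated as arithmetic
$Expression = 2+2

# Argument mode: 2+2 follows a command name, so it is passed along as a string
$Argument = Write-Output 2+2

$Expression.GetType().Name   # Int32
$Argument.GetType().Name     # String
```

The same five characters end up as a number in one place and as a string in the other, purely because of what precedes them.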

Notice how $, @, ', " and ( all automatically revert the parsing mode, allowing us to mix parameter argument types with ease:

$Data |Select-Object -Property PropertyName,@{Name='MadeUpProperty';Expression={$_.SomeOtherProperty + "!"}}

In this example, we specify the string PropertyName as the first value in the list of arguments we pass to the Property parameter, and then a calculated property, the type of which ([hashtable]) will be preserved, since it starts with @ and expression mode kicks back in!
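If you want to verify that the types really do survive, a throwaway function can report what it receives. Get-ArgumentType is a hypothetical helper written for this sketch, not something that ships with PowerShell.

```powershell
# Hypothetical helper: reports the runtime type of each value bound to -Value
function Get-ArgumentType {
    param([object[]]$Value)
    foreach ($Item in $Value) {
        $Item.GetType().Name
    }
}

# Same argument list as the Select-Object example; the script block is never invoked here
$Types = Get-ArgumentType -Value PropertyName,@{Name='MadeUpProperty';Expression={$_.SomeOtherProperty + "!"}}
$Types   # String, then Hashtable
```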

This is simply PowerShell trying to be user-friendly as it should be, great stuff.

To parse, but not execute

On Saturday, my friend Chris found himself in a bit of a pickle due to this exact behavior of the parser! He maintains a DSC resource for JEA configurations, and one of the core components in a JEA config, the Role Capability mapping, is usually structured as a mixed list of strings and hashtables.

When you deploy a JEA configuration manually this is not an issue, but since DSC resources need to be compiled to a Managed Object Format (MOF) definition, he needs a way of unpacking and validating the list before writing it to disk, without executing the session configuration registration.

His initial approach of naively splitting the input string on commas (,) works fine only as long as every hashtable value entry contains a single item:

PS C:\> 'string,@{key="value"}'.Split(',')
string
@{key="value"}

But as soon as a user supplies a hashtable with an array in a value entry, we run into a problem:

PS C:\> 'string,@{key="value1","value2"}'.Split(',')
string
@{key="value1"
"value2"}

Ouch, this is no good.

I noticed in the initial implementation that he was already trying to re-implement the argument mode parser partially, by detecting literals starting with @{:

$Parameters[$Parameter] = $Parameters[$Parameter].split(',') | Foreach-Object {
    if ($_ -match '@{') {
        Convert-StringToHashtable $_
    }
    else {
        $_
    }
}

I've done something similar in the past, and it has almost always come back to bite me - my naive half-assed parsers would never cover all the cases and have weird behaviors that would eventually break my tools.

I had some good news for Chris though - he doesn't have to write his own argument parser and neither do you! The PowerShell parser exposes a public API that we can use to programmatically parse a file or string of PowerShell code:

$Code = 'Verb-Noun -Param argument'
$AST  = [System.Management.Automation.Language.Parser]::ParseInput($Code,[ref]$null,[ref]$null)

The parser transforms the code into an abstract syntax tree, hence the variable name $AST for the object returned from Parser.ParseInput().
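To get a feel for what comes back, here is a minimal sketch that walks down to the command and lists its elements. It assumes the input is a single statement consisting of a single command, which holds for the example string; the $Elements variable and its Type/Text properties are just names picked for this illustration.

```powershell
$Code = 'Verb-Noun -Param argument'
$AST  = [System.Management.Automation.Language.Parser]::ParseInput($Code,[ref]$null,[ref]$null)

# The script's only statement is a pipeline containing one command
$CmdAst = $AST.EndBlock.Statements[0].PipelineElements[0]

# Pair each command element's AST node type with the source text it was parsed from
$Elements = foreach ($Element in $CmdAst.CommandElements) {
    [pscustomobject]@{
        Type = $Element.GetType().Name
        Text = $Element.Extent.Text
    }
}
$Elements
```

The command name and the argument should both show up as StringConstantExpressionAst nodes, with a CommandParameterAst for -Param in between. Nothing was executed to get there, and Verb-Noun doesn't even need to exist.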

An Abstract Syntax Tree is, as the "tree" part of the name suggests, a hierarchical representation of the syntactical elements contained in the input code. It isn't influenced by anything that has no bearing on the meaning of each syntactical element - parsing 1, 2, 3 and 1,2,3 will result in two completely identical ASTs for example, which is extremely useful when trying to analyze and interpret the actual contents of a piece of code (in a PowerShell AST, each element does hold a copy of the original code from which it was parsed, but that's not important for the purposes of what we're discussing here).
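A rough way to check the whitespace claim yourself is to flatten each tree into a list of node type names and compare the two. Get-AstShape is a hypothetical helper for this sketch; it only compares the kinds of nodes, so the per-node source text mentioned above (which does differ) is deliberately left out.

```powershell
# Hypothetical helper: returns the type name of every node in the parsed tree
function Get-AstShape {
    param([string]$Code)
    $Ast = [System.Management.Automation.Language.Parser]::ParseInput($Code,[ref]$null,[ref]$null)
    # A predicate that always returns $true matches every node, nested blocks included
    $Ast.FindAll({ $true }, $true) | ForEach-Object { $_.GetType().Name }
}

$Spaced  = (Get-AstShape '1, 2, 3') -join ' > '
$Compact = (Get-AstShape '1,2,3') -join ' > '

$Spaced -eq $Compact   # True
```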

I was originally introduced to this API by Tobias Weltner at the European edition of PowerShell Summit 2014, and it kind of blew my mind that I would have such easy access to intermediary parser output directly from a file or string of code - and especially the subsequent realization that I'd never have to write naive fake parsers by hand anymore.

Faking an argument mode context

Now, the above Parser/AST stuff is obviously hella cool, but it doesn't really solve our problem, does it? If we try and parse an argument list with bare word strings (i.e. strings without quotation marks), the parser still trips up:

PS C:\> $ParseErrors = @()
PS C:\> $InputString = 'string,@{key="value1","value2"},"another string"'
PS C:\> $AST = [System.Management.Automation.Language.Parser]::ParseInput($InputString,[ref]$null,[ref]$ParseErrors)
PS C:\> $ParseErrors

Extent ErrorId         Message                             IncompleteInput
------ -------         -------                             ---------------
,      MissingArgument Missing argument in parameter list.           False

So, how do we force the parser to step into argument mode? It's actually easier than you might think!


As explained previously, the parsing mode is context-dependent, so all we need to do is create some context around it by prepending a fake command name! The parser doesn't actually care or know about which commands exist and can be executed at runtime, it looks purely at the syntax of the code you provide, so we just need to supply something that looks like a command name, and the parser will go "if it walks like a duck, I'll shoot and pluck it for you!"
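To check that claim in isolation, here is a small sketch that parses a command which (presumably!) doesn't exist on your machine. The command and parameter names are invented; the point is only that the parser is happy with them.

```powershell
$ParseErrors = @()
$AST = [System.Management.Automation.Language.Parser]::ParseInput('Totally-NotACmdlet -FakeParameter value',[ref]$null,[ref]$ParseErrors)
$CmdAst = $AST.Find({param($ChildAst) $ChildAst -is [System.Management.Automation.Language.CommandAst]},$false)

# The parser reports no errors and happily hands us the command name...
$ErrorCount  = $ParseErrors.Count
$CommandName = $CmdAst.GetCommandName()

# ...even though command discovery has never heard of it
$Exists = [bool](Get-Command Totally-NotACmdlet -ErrorAction SilentlyContinue)
```

Assuming nobody has defined a Totally-NotACmdlet in your session, $ErrorCount should be 0 and $Exists should be False.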

With that in mind, we might end up with something like this:

$ParseErrors = @()
$InputString = 'string,@{key="value1","value2"},"another string"'
$FakeCommand = "Totally-NotACmdlet -FakeParameter $InputString"
$AST = [System.Management.Automation.Language.Parser]::ParseInput($FakeCommand,[ref]$null,[ref]$ParseErrors)
if(-not $ParseErrors){
    # Use Ast.Find() to locate the CommandAst parsed from our fake command
    $CmdAst = $AST.Find({param($ChildAst) $ChildAst -is [System.Management.Automation.Language.CommandAst]},$false)
    # Grab the user-supplied arguments (index 0 is the command name, 1 is our fake parameter)
    $ArgumentAst = $CmdAst.CommandElements[2]
    if($ArgumentAst -is [System.Management.Automation.Language.ArrayLiteralAst]) {
        # Argument was a list
        # Inspect $ArgumentAst.Elements
    }
    else {
        # Argument was a scalar.
        # Test if it's a [StringConstantExpressionAst] or [HashtableAst], otherwise throw an error  
    }
}

The Ast.Find() method used above takes a predicate - a function that returns true or false based on the input - as its first argument, so we supply a script block that checks for a sub-AST of type [CommandAst]. The second argument is a boolean indicating whether to search nested script blocks, which is unnecessary here, since our fake command is the top-most expression anyways.
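The effect of that second argument is easier to see with Ast.FindAll(), which takes the same two arguments but returns every match. This is a sketch with made-up command names, where one command sits inside a script block passed to the other.

```powershell
$Code = 'Outer-Command { Inner-Command }'
$AST  = [System.Management.Automation.Language.Parser]::ParseInput($Code,[ref]$null,[ref]$null)
$IsCommand = {param($ChildAst) $ChildAst -is [System.Management.Automation.Language.CommandAst]}

# $false: don't descend into the nested script block
$TopLevelOnly = @($AST.FindAll($IsCommand,$false))

# $true: search nested script blocks as well
$WithNested = @($AST.FindAll($IsCommand,$true))

$TopLevelOnly.Count
$WithNested.Count
```

The first search should only see Outer-Command, while the second should also pick up Inner-Command from inside the braces.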

My fellow Danish PowerShell-madman Axel even suggested a simple routine for extracting the array item values, and with both things in place Chris should be able to safely parse his Role Capability input without resorting to nasty things like piping untrusted data to Invoke-Expression, hooray!
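I won't reproduce Axel's routine here, but to give an idea of what value extraction can look like, here is one possible sketch. It leans on the SafeGetValue() method that AST nodes expose in PowerShell 5.0 and later (an assumption about your PowerShell version), which converts constants, arrays and hashtables to values without running any code, and throws for anything it doesn't consider safe.

```powershell
$ParseErrors = @()
$InputString = 'string,@{key="value1","value2"},"another string"'
$FakeCommand = "Totally-NotACmdlet -FakeParameter $InputString"
$AST = [System.Management.Automation.Language.Parser]::ParseInput($FakeCommand,[ref]$null,[ref]$ParseErrors)
$CmdAst = $AST.Find({param($ChildAst) $ChildAst -is [System.Management.Automation.Language.CommandAst]},$false)
$ArgumentAst = $CmdAst.CommandElements[2]

# Treat a scalar argument as a list with one item
$ItemAsts = if ($ArgumentAst -is [System.Management.Automation.Language.ArrayLiteralAst]) {
    $ArgumentAst.Elements
}
else {
    $ArgumentAst
}

# Convert each item to its value without executing anything
$Values = foreach ($ItemAst in @($ItemAsts)) {
    $ItemAst.SafeGetValue()
}
```

With the input above, $Values should hold a string, a hashtable whose key entry is a two-item array, and another string. Input containing variables or command calls makes SafeGetValue() throw instead, which is arguably what you want for untrusted data.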

In conclusion, the ability to offload parsing to the actual parser is golden, especially when you're writing code analysis tools or converting or serializing arbitrary input code.

I've written only a few such tools, but if you have, and you think the examples and use cases above stink, please don't hesitate to reach out and let me know why my code sucks and how much better it could have been :)

Until the next time!