Wednesday, February 14, 2018

Group objects evenly by size in PowerShell

A PowerShell function to group objects evenly by size, using a simple packing algorithm, but with advanced pipeline techniques for handling nested arrays and a trick that leverages the PowerShell parser to protect against code injection. Just for fun.

The initial challenge

Brett Miller and Bob Frankly recently posed the hypothetical question on the PowerShell Slack channel, how can you easily split a collection of objects into 4 groups, based on the size of each object, such that the total size of each group of objects is approximately the same?

I threw together a quick handful of lines of PowerShell code to meet the requirements. And then I started to work on a generic function to solve the problem generally.

And that’s when it got interesting.

The added complexity

I wanted the function to be able to take input either through a parameter or from a pipeline. Techniques for handling that are well known. But as the input could be nested arrays, the standard techniques would not work for both input methods. I had to find out how to determine when the function is part of a pipeline.

I wanted the user not only to be able to specify a property to use to determine an object’s size, but also to be able to specify a nested property. For example, if I have an Active Directory $Group, the size of the group is in $Group.Members.Count. But if I let the user give me a string I am going to execute, I needed a way to confirm the string contains nothing but nested property names, and not a code injection.

I wanted the user to also be able to group objects by simple object count. I realized that this would be the default behavior of the function when it is used with non-collection objects simply by making the default -Property value “Count”, with no extra coding needed.

The code

Let’s define the function.

function Group-Evenly
    {

A thorough comment-based help block is worth the effort. I write one for functions that I’m sharing, or that I’m likely to reuse in other scripts, or when I want to thoroughly document the function for myself or the poor soul that has to maintain my code after I’ve moved on.

     <#
    .SYNOPSIS
        Evenly divides input objects into a given number of groups
        optionally weighted by the value of a given property.

    .DESCRIPTION
        Creates specified number of groups (arrays)
        Input objects are sorted by value of the specified Property, descending
            (If no property is specified, .Count is used)
        Each object is placed in the group with the smallest total value of the specified Property

        This algorithm may not always produce an optimal result, but does
        produce a reasonable result quickly compared to the brute force
        required to guarantee an optimal result.

    .OUTPUTS
        [array[]]

    .PARAMETER InputObject
        Objects to be grouped
        Accepts pipeline input
        Unlike most commands, accepts Null pipeline input

    .PARAMETER Property
        String - Property to use to determine object size for weighted grouping
        Accepts nested property names, e.g. - Members.Count
        Defaults to "Count"

    .PARAMETER Number
        Int32 - Number of groups to create
        Defaults to 2

    .EXAMPLE
        $Users = Get-ADUser -Filter *
        $Teams = Group-Evenly -InputObject $Users

        Results in two arrays, each with half of the users.

    .EXAMPLE
        $DataChunks = Get-ChildItem C:\Temp -File |
            Group-Evenly -Property Length -Number 4

        Results in four arrays of files, grouped such that the total file sizes
        of the groups are approximately equal.

    .EXAMPLE
        $Meetings = Get-ADGroup -Filter { Name -like "Dept*" } -Properties Members |
            Group-Evenly -Property Members.Count -Number 6

        Results in six arrays of AD department groups, grouped such that the total
        memberships of the groups are approximately equal.

    .EXAMPLE
        $Whatever = Get-ChildItem C:\Temp -File |
            Group-Evenly -Property Directory.Parent.FullName.Length

        Results in two arrays of files, grouped evenly but weighted by the length
        of the full path of the parent of the file's directory. That is, of course,
        completely useless, but I didn't feel like taking the time to come up with
        a better example of using a deeply nested property value.

    .NOTES
        v 1.0 Tim Curwick Created
    #>

[cmdletbinding()] tells PowerShell to treat this as an advanced function, which adds the common parameters (such as -Verbose and -ErrorAction) and better parameter handling. In PowerShell 4.0 and up, [cmdletbinding()] is not needed if [parameter()] is used, but it doesn’t hurt to add it, and it’s good practice so I don’t forget it in those scripts where I do need it.

    [cmdletbinding()]
    Param (

Parameter $InputObject will hold the objects to group. It will be an array of any type of object that needs to be grouped. We want to be able to take objects from the pipeline.

We are not making it mandatory, because I like the behavior of returning empty groups instead of nothing if $InputObject is empty.

        [parameter( ValueFromPipeline = $True )]
        [array]$InputObject,

Parameter $Property is the string describing the path to the property or nested property to use to determine the size of the objects.

By defaulting to ‘Count’, arrays are automatically grouped according to the number of elements they have, and objects that are not collections are simply split into groups with equal numbers of objects.

        [string]$Property = 'Count',
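
The default works because PowerShell 3.0 and later expose a Count property on single values as well as on collections. A quick check, using arbitrary sample values and assuming strict mode is off:

```powershell
# Collections report their element count
$ArrayCount  = @( 'a', 'b', 'c' ).Count    # 3

# Single values report a Count of 1 (PowerShell 3.0 and later)
$StringCount = 'a single string'.Count     # 1
$IntCount    = (42).Count                  # 1
```

So when every input object is a single value, every object has a size of 1, and the groups end up with equal numbers of objects.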

Parameter $Number is the number of groups to divide the objects into.

        [int]$Number = 2 )

Because we want to act on pipeline objects, but not one at a time, we have to gather them up before starting to work on them. We’ll use a Begin block to create an array to hold the objects, a Process block to add objects to the array as they come in the pipeline or from the parameter, and then do all of the actual work in the End block.

    Begin
        {
        # Initialize array
        $RawItems = @()
        }

Typically, in a Process block such as this, we would simply add any incoming objects to the array.

But atypically here, the individual objects may themselves be arrays. PowerShell’s special handling of arrays requires us to do some special handling to get our desired behavior.

If we get an array from the -InputObject parameter, the elements of the array are the objects we want to sort, and what we want to add to $RawItems.

But if we get an array from the pipeline, that means it was itself an element nested within a parent array. In that case, we don’t want to add each element of the array to $RawItems. We want to add the entire array as a single element in $RawItems.

To distinguish between the two, we need to be able to tell when the function is running in a pipeline. Thanks to Øyvind Kallstad for the answer, in his blog post “Quick tip: Determine if input comes from the pipeline or not”.

$PSCmdlet.MyInvocation.ExpectingInput is $True if we’re in a pipeline.

    Process
        {
        # If input is from pipeline
        # Treat an array as a single input item
        If ( $PSCmdlet.MyInvocation.ExpectingInput )
            {

If we are in a pipeline, we want the array to be added as a single element to $RawItems. To do that, we use a unary comma to indicate that it is an element. ,@( $x ) produces what you might expect @( @( $x ) ) to produce: an array whose only element is the array @( $x ). But @( @( $x ) ) collapses to @( $x ) by design, so we have to resort to the unary comma.

            $RawItems += ,$InputObject
            }

If we are not in a pipeline, we want to add all of the elements of $InputObject to $RawItems, which is the normal PowerShell behavior when “adding” two arrays.

        # Else (input is from parameter)
        # Treat an array as a collection of input items
        Else
            {
            $RawItems += $InputObject
            }
        }
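
To see both behaviors in isolation, here is a small sketch. Test-InputSource and the sample data are made up for this illustration and are not part of Group-Evenly.

```powershell
# Hypothetical helper that reports how its input arrived
function Test-InputSource
    {
    [cmdletbinding()]
    Param (
        [parameter( ValueFromPipeline = $True )]
        [array]$InputObject )
    Process
        {
        If ( $PSCmdlet.MyInvocation.ExpectingInput ) { 'Pipeline' }
        Else { 'Parameter' }
        }
    }

$Nested = @( @( 1, 2 ), @( 3, 4, 5 ) )

# Parameter input: Process runs once and sees the whole outer array
$FromParameter = @( Test-InputSource -InputObject $Nested )

# Pipeline input: the outer array is unrolled, so Process runs once per inner array
$FromPipeline = @( $Nested | Test-InputSource )

# Plain += merges the elements; the unary comma keeps the array together
$Flat = @()
$Flat += @( 1, 2 )      # $Flat.Count is 2
$Kept = @()
$Kept += ,@( 1, 2 )     # $Kept.Count is 1
```

$FromParameter holds a single 'Parameter' and $FromPipeline holds two 'Pipeline' strings, one for each inner array.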

Once we have collected all of the objects from the pipeline, the End block runs, and we can actually do some work.

    End
        {

First we create a string which, when executed, will get the size of the object based on the $Property string.

        ## Test for code injection

        # Build property string
        $SizeString = "`$_.$Property"

Then we need to check it to confirm that it really will do nothing other than reference a property or nested property. A scripter might take input from an untrusted source and use it to populate the -Property value, and end up with Group-Evenly -Property 'x;Remove-Item C:\*.* -Recurse' which would be a problem if we didn’t do this check.

To run this or any code, the PowerShell parser first needs to cut it up into identified tokens. We can leverage that capability, and ask PowerShell to do so now, and identify the tokens for us to parse.

        # Use PowerShell parser to tokenize the property string
        $TokenErrors = [System.Collections.ObjectModel.Collection[System.Management.Automation.PSParseError]]@()
        $Tokens = [System.Management.Automation.PSParser]::Tokenize( $SizeString, [ref]$TokenErrors )

If there were no errors during tokenizing, we set a validity flag to $True and continue. If there were errors, the scriptblock is not going to run properly, and we set the flag to $False.

        # If there are errors, it won't work anyway; set to invalid
        $PropertyValid = $TokenErrors.Count -eq 0

Then we examine the tokens. The tokens would look like this if $Property = 'Members.Count'.

Content     Type Start Length StartLine StartColumn EndLine EndColumn
-------     ---- ----- ------ --------- ----------- ------- ---------
_       Variable     0      2         1           1       1         3
.       Operator     2      1         1           3       1         4
Members   Member     3      7         1           4       1        11
.       Operator    10      1         1          11       1        12
Count     Member    11      5         1          12       1        17
...      NewLine    16      2         1          17       2         1

Or like this if $Property = 'x;Remove-Item C:\*.* -Recurse '

Content                   Type Start Length StartLine StartColumn EndLine EndColumn
-------                   ---- ----- ------ --------- ----------- ------- ---------
_                     Variable     0      2         1           1       1         3
.                     Operator     2      1         1           3       1         4
x                       Member     3      1         1           4       1         5
;           StatementSeparator     4      1         1           5       1         6
Remove-Item            Command     5     11         1           6       1        17
C:\*.*         CommandArgument    17      6         1          18       1        24
-Recurse      CommandParameter    24      8         1          25       1        33
...                    NewLine    33      2         1          34       2         1

The $ in $_ is simply an indicator that a variable name follows, and is not included in any of the tokens.

In our script, the first token is always Type “Variable” and Content “_”, and the second token is always Type “Operator” and Content “.”, because we hardcoded $_. at the beginning of the string. So we can ignore those.

In valid code (by our definition), all of the remaining tokens are of either Type “Operator”, “Member”, or “NewLine”. So if any of the tokens are of any other Type, we set the $PropertyValid flag to $False.

The only Operator token we need has the Content “.”. If any other Operators are present, we set the flag to $False.

            # If there are any tokens after the $_ other than .PropertyName.PropertyName.etc
            # (Bad -Property value (or code injection))
            # Set to invalid
            $Tokens[2..($Tokens.Count-1)].
                Where{
                    $_.Type -notin 'Operator', 'Member', 'NewLine' -or
                    ( $_.Type -eq 'Operator' -and $_.Content -ne '.' ) }.
                ForEach{ $PropertyValid = $False }

Otherwise, we are safe to proceed. (You might be concerned that a Member token can be a method rather than a property, but if it was, there would be associated GroupStart and GroupEnd tokens holding the parentheses following the method name, which are not allowed. If there were an extra NewLine token in there, it would have to be followed by something we aren’t allowing in order to be dangerous. Even if a NewLine were followed by a dot Operator that is intended as a call operator rather than as a member operator, the call operator would only be dangerous if it were followed by a String, Variable, or some other type of token that we are not allowing.)
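
The same check can be tried on its own, outside the function. This is a sketch only; the name Test-PropertyPath is invented for the illustration, and it uses Select-Object -Skip rather than an index range to drop the two hardcoded tokens.

```powershell
# Hypothetical standalone version of the validation logic
function Test-PropertyPath
    {
    Param ( [string]$Property )

    $Errors = [System.Collections.ObjectModel.Collection[System.Management.Automation.PSParseError]]@()
    $Tokens = [System.Management.Automation.PSParser]::Tokenize( "`$_.$Property", [ref]$Errors )

    # Anything the parser rejects is not a usable property path
    If ( $Errors.Count ) { return $False }

    # Skip the hardcoded "$_" and "." tokens, then collect anything unexpected
    $Bad = @( $Tokens |
        Select-Object -Skip 2 |
        Where-Object {
            $_.Type -notin 'Operator', 'Member', 'NewLine' -or
            ( $_.Type -eq 'Operator' -and $_.Content -ne '.' ) } )

    return ( $Bad.Count -eq 0 )
    }

$Good     = Test-PropertyPath -Property 'Members.Count'
$Injected = Test-PropertyPath -Property 'x;Remove-Item C:\*.* -Recurse'
```

$Good is $True and $Injected is $False. Nothing in the injected string is ever executed, because the string is only tokenized.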

        # If property string is valid
        # continue
        If ( $PropertyValid )
            {

We create an array of the correct number of arrays to hold groups that we will group the input objects into. The simplest way to do this is to create an array with a single empty array as an element, using the unary comma discussed earlier. Then we “multiply” the array by the number of groups we want, which in PowerShell means make X additional elements in the array that are copies of the existing element(s).

            # Initialize array with the desired number of groups
            $Groups = ,@() * $Number

We will want to quickly check the sizes of each group as we go along. Rather than re-measure the groups each time, we’ll store the running totals in an integer array. The index of a size in this array will match the index of the group in the group array that it measures.

Again, the simplest (or prettiest) way to do this is to create an array with a single zero, and then multiply it to get the correct number of zeros.

            # Initialize array to hold group sizes
            $Sizes  = @(0) * $Number
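
A quick way to convince yourself that array multiplication gives the shape we need, using 3 as an arbitrary number of groups:

```powershell
$Number = 3

# Unary comma wraps the empty array; multiplying repeats that element
$Groups = ,@() * $Number    # three empty arrays
$Sizes  = @(0) * $Number    # three zeros

# += builds a new array for that one slot, so the other groups stay empty
$Groups[0] += 'item'
```

After this, $Groups.Count is 3, $Groups[0] holds one item, and $Groups[1] and $Groups[2] are still empty.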

We will be frequently referencing the last index of the arrays. Rather than repeating the calculation, we do it once and store it in a variable. It’s faster, and it makes the code prettier.

            # Get highest index number
            $TopIndex = $Number - 1

Then we convert the $SizeString to a [ScriptBlock] for later execution. We didn’t do it until we were sure it was valid so that the conversion doesn’t throw an error. (We’ll throw our own error later if it isn’t valid.) We’re doing it now instead of in the following code so that it is only done once. It’s faster and makes the code prettier.

            # Convert size string to a scriptblock
            $SizeBlock = [ScriptBlock]::Create( $SizeString )

To simplify handling of the objects and their sizes, we wrap them in custom objects along with their sizes, and then sort them by size, biggest ones first.

            # Create an array with the items and their calculated sizes
            # Sort by size descending
            $Items = $RawItems |
                Select-Object -Property @(
                    @{ Label = 'Value'; Expression = { $_ } }
                    @{ Label = 'Size' ; Expression = $SizeBlock } ) |
                Sort-Object -Property Size -Descending

Then we simply go through the items, biggest ones first, and put each in whatever group has the most room.

            # For each item (starting with the largest)
            # Place item in smallest group
            ForEach ( $Item in $Items )
                {

For each possible group index, we sort them by the sizes of the group, and take the first one (the index of the smallest group).

                # Find the index of the smallest group
                $Smallest = 0..$TopIndex | Sort-Object -Property { $Sizes[$_] } | Select-Object -First 1

Add the item to the smallest group.

                # Add the item to the smallest group
                $Groups[$Smallest] += $Item.Value

Add the size of the item to the running total size of the smallest group.

                # Add the size of the item to the group size
                $Sizes[ $Smallest] += $Item.Size
                }
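
Here is the same loop run by hand on plain numbers, as a sketch. The weights are made up, and the inner For loop stands in for the Sort-Object call so that ties visibly go to the lowest index.

```powershell
$Weights = 8, 7, 6, 5, 4    # sample sizes, invented for the illustration
$Number  = 2

$Groups = ,@() * $Number
$Sizes  = @(0) * $Number

ForEach ( $Weight in ( $Weights | Sort-Object -Descending ) )
    {
    # Find the index of the smallest group (first one wins a tie)
    $Smallest = 0
    For ( $i = 1; $i -lt $Number; $i++ )
        {
        If ( $Sizes[$i] -lt $Sizes[$Smallest] ) { $Smallest = $i }
        }

    $Groups[$Smallest] += $Weight
    $Sizes[$Smallest]  += $Weight
    }

# $Groups[0] is 8, 5, 4 (total 17)
# $Groups[1] is 7, 6    (total 13)
```

This also shows the tradeoff mentioned in the help block. A perfect 15/15 split exists (8 + 7 versus 6 + 5 + 4), but the greedy approach settles for 17/13.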

Repeat until all of the items are placed.

Return the results.

            # Return the results
            return $Groups
            }

If the $Property string is invalid, we throw an error. We use Write-Error instead of keyword Throw to make it a non-terminating error whose behavior is dictated by $ErrorActionPreference or by using the common parameter -ErrorAction on our function. (We don’t have to define -ErrorAction; [cmdletbinding()] took care of that for us.)

        # Else (invalid Property value)
        # Throw error (respecting ErrorAction)
        Else
            {
            Write-Error -Message "Invalid Property value."
            }
        }
    }
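
The difference the caller sees can be sketched with a throwaway function. Test-NonTerminating is a made-up name for this illustration.

```powershell
# Hypothetical function that reports an error the same way Group-Evenly does
function Test-NonTerminating
    {
    [cmdletbinding()]
    Param ()
    Write-Error -Message 'Invalid Property value.'
    'Still running'
    }

# The caller can silence the error; the function keeps going
$Quiet = Test-NonTerminating -ErrorAction SilentlyContinue

# Or the caller can escalate it to a terminating error and catch it
$Caught = $False
Try   { Test-NonTerminating -ErrorAction Stop }
Catch { $Caught = $True }
```

$Quiet holds 'Still running', and $Caught ends up $True because -ErrorAction Stop turned the Write-Error into a terminating error.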


Usage

Now we can use our new function.

Using default parameters, and no pipeline, we can take all of our users and split them into two groups.

$Users = Get-ADUser -Filter *
$Teams = Group-Evenly -InputObject $Users

Or let’s say the file share where my users’ home drives are stored needs to be split up onto four new drives. I don’t care which users go where, but I want the total space used on each share to be roughly equal.

$UserFolders = Get-ChildItem $SourceShare -Directory |
    ForEach-Object {
        [pscustomobject]@{
            FullName = $_.FullName
            FolderSize = Get-ChildItem $_.FullName -File -Recurse |
                Measure-Object -Property Length -Sum |
                Select-Object -ExpandProperty Sum } }

$NewShareGroups = $UserFolders |
    Group-Evenly -Property FolderSize -Number 4

HR needs to schedule 6 meetings to talk with all employees. They want all employees in a given department to go to the same meeting, and they want the meetings to be of roughly the same size. There are Active Directory groups that define who is in what department.

$Meetings = Get-ADGroup -Filter { Name -like "Dept*" } -Properties Members |
    Group-Evenly -Property Members.Count -Number 6

Just to test how well we handle nested properties, let’s group some files based on the length of the full name of the parent of the file’s directory. It’s kinda stupid, but I didn’t have the time to come up with a better example.

$Whatever = Get-ChildItem C:\Temp -File -Recurse |
    Group-Evenly -Property Directory.Parent.FullName.Length

Full function

function Group-Evenly
    {
    <#
    .SYNOPSIS
        Evenly divides input objects into a given number of groups
        optionally weighted by the value of a given property.

    .DESCRIPTION
        Creates specified number of groups (arrays)
        Input objects are sorted by value of the specified Property, descending
            (If no property is specified, .Count is used)
        Each object is placed in the group with the smallest total value of the specified Property

        This algorithm may not always produce an optimal result, but does
        produce a reasonable result quickly compared to the brute force
        required to guarantee an optimal result.

    .OUTPUTS
        [array[]]

    .PARAMETER InputObject
        Objects to be grouped
        Accepts pipeline input
        Unlike most commands, accepts Null pipeline input

    .PARAMETER Property
        String - Property to use to determine object size for weighted grouping
        Accepts nested property names, e.g. - Members.Count
        Defaults to "Count"

    .PARAMETER Number
        Int32 - Number of groups to create
        Defaults to 2

    .EXAMPLE
        $Users = Get-ADUser -Filter *
        $Teams = Group-Evenly -InputObject $Users

        Results in two arrays, each with half of the users.

    .EXAMPLE
        $DataChunks = Get-ChildItem C:\Temp -File |
            Group-Evenly -Property Length -Number 4

        Results in four arrays of files, grouped such that the total file sizes
        of the groups are approximately equal.

    .EXAMPLE
        $Meetings = Get-ADGroup -Filter { Name -like "Dept*" } -Properties Members |
            Group-Evenly -Property Members.Count -Number 6

        Results in six arrays of AD department groups, grouped such that the total
        memberships of the groups are approximately equal.

    .EXAMPLE
        $Whatever = Get-ChildItem C:\Temp -File |
            Group-Evenly -Property Directory.Parent.FullName.Length

        Results in two arrays of files, grouped evenly but weighted by the length
        of the full path of the parent of the file's directory. That is, of course,
        completely useless, but I didn't feel like taking the time to come up with
        a better example of using a deeply nested property value.

    .NOTES
        v 1.0 Tim Curwick Created
    #>

    [cmdletbinding()]
    Param (
        [parameter( ValueFromPipeline = $True )]
        [array]$InputObject,
        [string]$Property = 'Count',
        [int]$Number = 2 )

    Begin
        {
        # Initialize array
        $RawItems = @()
        }
    Process
        {
        # If input is from pipeline
        # Treat an array as a single input item
        If ( $PSCmdlet.MyInvocation.ExpectingInput )
            {
            $RawItems += ,$InputObject
            }

        # Else (input is from parameter)
        # Treat an array as a collection of input items
        Else
            {
            $RawItems += $InputObject
            }
        }
    End
        {
        ## Test for code injection

        # Build property string
        $SizeString = "`$_.$Property"

        # Use PowerShell parser to tokenize the property string
        $TokenErrors = [System.Collections.ObjectModel.Collection[System.Management.Automation.PSParseError]]@()
        $Tokens = [System.Management.Automation.PSParser]::Tokenize( $SizeString, [ref]$TokenErrors )

        # If there are errors, it won't work anyway; set to invalid
        $PropertyValid = $TokenErrors.Count -eq 0

        # If there are any tokens after the $_ other than .PropertyName.PropertyName.etc
        # (Bad -Property value (or code injection))
        # Set to invalid
        $Tokens[2..($Tokens.Count-1)].
            Where{
                $_.Type -notin 'Operator', 'Member', 'NewLine' -or
                ( $_.Type -eq 'Operator' -and $_.Content -ne '.' ) }.
            ForEach{ $PropertyValid = $False }
          
        # If property string is valid
        # continue
        If ( $PropertyValid )
            {
            # Initialize array with the desired number of groups
            $Groups = ,@() * $Number

            # Initialize array to hold group sizes
            $Sizes  = @(0) * $Number

            # Get highest index number
            $TopIndex = $Number - 1

            # Convert size string to a scriptblock
            $SizeBlock = [ScriptBlock]::Create( $SizeString )

            # Create an array with the items and their calculated sizes
            # Sort by size descending
            $Items = $RawItems |
                Select-Object -Property @(
                    @{ Label = 'Value'; Expression = { $_ } }
                    @{ Label = 'Size' ; Expression = $SizeBlock } ) |
                Sort-Object -Property Size -Descending
      
            # For each item (starting with the largest)
            # Place item in smallest group
            ForEach ( $Item in $Items )
                {
                # Find the index of the smallest group
                $Smallest = 0..$TopIndex | Sort-Object -Property { $Sizes[$_] } | Select-Object -First 1

                # Add the item to the smallest group
                $Groups[$Smallest] += $Item.Value

                # Add the size of the item to the group size
                $Sizes[ $Smallest] += $Item.Size
                }
      
            # Return the results
            return $Groups
            }
      
        # Else (invalid Property value)
        # Throw error (respecting ErrorAction)
        Else
            {
            Write-Error -Message "Invalid Property value."
            }
        }
    }

Monday, February 12, 2018

Alternatives to Arrays for enhanced performance in PowerShell

PowerShell enhances Arrays with advanced functionality. The tradeoff is performance. For most scripts, this is acceptable. But sometimes unintended side effects of Array enhancement or the need for extreme speed require alternatives.

Array enhancement in PowerShell

PowerShell is a .Net language. When a script runs, the PowerShell engine compiles it on the fly and feeds it to the .Net engine for execution. But the PowerShell team wanted a way to enhance various aspects of .Net, so the PowerShell engine doesn’t just handle the compiling and interaction with .Net. It also adds a layer of functional enhancements that allow us to simply do things in PowerShell that can’t be done simply in .Net.

One of the most useful enhancements, which has been part of PowerShell from the beginning, is Array handling. Many basic PowerShell techniques, such as working with Where-Object, Select-Object, etc., are built around Arrays. So we need enhancements to make it easy to work with the Arrays we use to work with everything else.

Adding elements to an Array

The most notable Array enhancement, and the one that ultimately leads to the issues discussed in this article, is the ability to add new elements to an Array.

$Names1 = @( 'Tim', 'Joe' )
$Names1 += 'Hannah'

You can’t really add new elements to Arrays, as they are defined in .Net as being of fixed size. When we added string 'Hannah' to Array $Names1, above, PowerShell created an entirely new Array with room for 3 elements, copied 'Tim' and 'Joe' from the old Array to the new Array, set the third element’s value to 'Hannah', and then repointed the variable $Names1 to the new Array. The old Array was abandoned in memory.
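
You can watch this happen by holding on to a second reference to the original Array. The variable names here are just for the illustration.

```powershell
$Names  = @( 'Tim', 'Joe' )
$Before = $Names              # second reference to the same Array

$Fixed = $Names.IsFixedSize   # True; .Net Arrays cannot grow

$Names += 'Hannah'

# $Names now points at a new Array; the original is untouched
$Same = [object]::ReferenceEquals( $Before, $Names )   # False
$OldCount = $Before.Count     # 2
$NewCount = $Names.Count      # 3
```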

Most of the time, this is great. It makes things much easier. For typical scripts, the performance tradeoff is barely measurable. In the above example, it takes only 20 microseconds to add the new element.

Problem 1 - performance at scale

What works well for a small number of things doesn’t always work well at large scales.

If adding 'Hannah' only takes 20 microseconds, you might think that adding additional Array elements would only take additional 20-microsecond blocks.

But if we add 'Mike', PowerShell has to copy 3 elements to the new Array before adding 'Mike'. And if we then add 'Derek', it has to copy 4 elements to the new Array, etc. And now we have the original Array with two elements, the next version with 3 elements, and another version with 4 elements, etc., all sitting in memory unused. If there is memory pressure, the .Net engine will look around and find and delete the abandoned Arrays, but that takes time and yet more CPU effort.

Each new element takes longer to add than the one before it. Eventually it becomes noticeable. Eventually it becomes unacceptably slow. Eventually memory handling becomes an issue and the slowing accelerates.

It takes about 1 second to add 1,000 elements to an Array one at a time. Not too bad for a script. And well within the Array size requirements for most scripts.

Above 1,000 elements, it becomes unacceptable, and an alternative is needed.
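
If you want to measure the effect on your own machine, a rough sketch like this will do. The sizes are arbitrary, and the timings depend heavily on hardware and PowerShell version, so no expected numbers are shown.

```powershell
# Time how long it takes to build Arrays of increasing size with +=
$Timings = ForEach ( $Size in 1000, 2000, 4000 )
    {
    $Elapsed = Measure-Command {
        $Array = @()
        ForEach ( $i in 1..$Size ) { $Array += $i }
        }

    '{0,6} elements: {1:N0} ms' -f $Size, $Elapsed.TotalMilliseconds
    }

$Timings
```

If each addition had a fixed cost, doubling the size would double the time. Because every addition copies the whole Array, the time should grow faster than that.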

Alternative to generic Array - ArrayList

The alternative for a generic Array is an ArrayList. An ArrayList is very similar to an Array. It is a collection of elements in a particular order.

Arrays and ArrayLists can be converted back and forth by simply casting them as the desired type. The easiest way to create a new ArrayList is to cast an empty Array as an ArrayList.

$ArrayList = [System.Collections.ArrayList]@()
$Array     = [Array]$ArrayList
$ArrayList = [System.Collections.ArrayList]$Array

ArrayLists do not have PowerShell enhancements, so they are not as simple to work with, which is why we don’t use them unless we need them. You cannot add elements to an ArrayList using the PlusEquals operator. This does not work:

# This does not work as expected
$Names2 = [System.Collections.ArrayList]@( 'Tim', 'Joe' )
$Names2 += 'Hannah'
# because PowerShell converts $Names2 to an [Array] to perform the addition

It appears to work (and it sort of does work), but only because PowerShell “helpfully” converts the ArrayList into an Array before doing the addition, so we didn’t gain anything.
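
Checking the type before and after makes the silent conversion visible. $Names2b is just a fresh variable for this check.

```powershell
$Names2b = [System.Collections.ArrayList]@( 'Tim', 'Joe' )
$TypeBefore = $Names2b.GetType().FullName   # System.Collections.ArrayList

$Names2b += 'Hannah'
$TypeAfter = $Names2b.GetType().FullName    # System.Object[]
```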

To add an element to an ArrayList, we use its .Add() method. The existing elements are left alone, an additional element slot is added to the ArrayList, and the new element is assigned to it.

$Names3 = [System.Collections.ArrayList]@( 'Tim', 'Joe' )
$Names3.Add( 'Hannah' )

If we want to add multiple elements at one time, we can use the .AddRange() method. Another nice feature of ArrayLists is that we can use the .Remove() and .RemoveAt() methods to remove elements, something we can’t do with Arrays even with enhancements.
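
A short sketch of those methods, using made-up names. Note that the ArrayList .Add() method returns the index of the new element, which ends up in your output unless you capture or discard it.

```powershell
$List = [System.Collections.ArrayList]@( 'Tim', 'Joe' )

# .Add() returns the index of the added element
$Index = $List.Add( 'Hannah' )            # 2

# .AddRange() adds several elements in one call
$List.AddRange( @( 'Mike', 'Derek' ) )

# Remove by value, then by position
$List.Remove( 'Joe' )
$List.RemoveAt( 0 )                       # removes 'Tim'

# $List now holds Hannah, Mike, Derek
```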

Here is what it looks like to use an ArrayList to optimize adding elements one at a time. If we are working against a list of tens of thousands of employees, this is much faster than using an Array. As Arrays are easier to work with, we convert the ArrayList to an Array at the end for any further handling.

$Invites = [System.Collections.ArrayList]@()

ForEach ( $Person in $Employees )
    {
    $Details = Get-PersonDetails $Person

    If ( $Details.IsPersonILike )
        {
        # .Add() returns the new element's index; [void] keeps it out of the output
        [void]$Invites.Add( $Person )
        }
    }

$Invites = [Array]$Invites

Problem 2 - typed Arrays revert to generic Arrays

Another problem with the process PowerShell uses for adding elements to an array is that when PowerShell creates the new, slightly larger array, it does not respect what type of array we started with. It always creates the new array as a generic array.

Arrays are not just Arrays. All Arrays are a subclass of [Array]. The subclass defines what types of elements are in the array. When a particular subclass isn’t specified, PowerShell by default creates an [Object[]] Array. As all types in .Net are derived from the [Object] class, any type of object can be an element in an [Object[]] array. (In depth article here - An array is not an array: Discovering an abstract class in PowerShell)

We can create an Array that only holds elements of a specific type by explicitly telling PowerShell that is what we want to do. The simplest way to create a typed Array is by casting a generic Array as the desired Array subclass. We reference an Array subclass by referencing the desired element type, with an extra set of square brackets inside the end of it.

This creates and assigns a [String[]] Array to the value of variable $Names4.

$Names4 = [String[]]@( 'Tim', 'Joe' )

But when we add an element to it, PowerShell converts it to a generic [Object[]] Array, even if the new element is a string.

$Names4 = [string[]]@( 'Tim', 'Joe' )
$Names4.GetType().FullName  # yields System.String[]
$Names4 += 'Hannah'
$Names4.GetType().FullName  # yields System.Object[] !!!

Strongly type variable to preserve Array subclass

This can be prevented by strongly typing the variable, rather than just the contents of the variable.

[string[]]$Names5 = ( 'Tim', 'Joe' )
$Names5.GetType().FullName  # yields System.String[]
$Names5 += 'Hannah'
$Names5.GetType().FullName  # yields System.String[]

A strongly typed Array variable is useful because it automatically handles confirming an element is of the right type and, when needed and possible, automatically handles the conversion.

If you try to add an integer to a [String[]] array, PowerShell simply converts the number to a string while adding it.

[string[]]$Names6 = ( 'Tim', 'Joe' )
$Names6 += 3
$Names6.GetType().FullName  # yields System.String[]
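
The “when possible” part matters. With an [int[]] variable (chosen here only for the illustration), a string that looks like a number is converted, but one that doesn't is rejected with an error, and the Array is left as it was.

```powershell
[int[]]$Numbers = 1, 2

$Numbers += '3'          # the string '3' is converted to the integer 3

$Failed = $False
Try   { $Numbers += 'three' }    # cannot be converted to an integer
Catch { $Failed = $True }

# $Numbers is still 1, 2, 3 and is still an Int32 Array
$Type = $Numbers.GetType().FullName   # System.Int32[]
```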

Problem 1 + 2 - performance at scale for strongly typed Arrays

The performance at scale problem is even worse for strongly typed Array variables than it is for generic Arrays.

That’s because PowerShell still creates the [Object[]] Array, and then when it tries to assign the new array to the strongly typed variable, it has to perform the additional step of creating yet another Array, this time of the correct subclass. Not only does it have to copy everything over yet again, but it also checks every single element to be sure it is of the correct type, even though only the new elements could possibly need conversion to the specified type.

So the performance for adding elements to strongly typed Array variables is worse than for generic Arrays, and the larger the Array, the worse the worseness gets. Adding 1,000 elements to a strongly typed Array variable takes 1.5 times as long as with a generic Array, and adding 10,000 elements takes 5 times longer. (My test for 100,000 elements is still running. Don’t do that. Find an alternative somewhere.)

But we can’t use an ArrayList in place of a strongly typed Array variable, because ArrayLists don’t let you limit the element type. So we need another alternative.

Alternative to a strongly typed Array variable - List

The typed version of an ArrayList is a List.

Referencing the List type is different from a typed Array. We use the fully qualified name of List type with the desired element type in square brackets within the end of the parent square brackets. As with the typed Arrays and ArrayLists, we can create Lists by casting empty or existing Arrays. As with ArrayLists, we add items to a List using its .Add() method.

$Names7 = [System.Collections.Generic.List[String]]@( 'Tim', 'Joe' )
$Names7.Add( 'Hannah' )
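
A slightly longer sketch, with made-up values, showing that a List keeps its element type as it grows and that it converts new elements on the way in.

```powershell
$Guests = [System.Collections.Generic.List[String]]@( 'Tim', 'Joe' )

$Guests.Add( 'Hannah' )
$Guests.AddRange( [String[]]@( 'Mike', 'Derek' ) )

# An integer argument is converted to a string by PowerShell
$Guests.Add( 3 )

# For a List, .Remove() returns True or False, so discard it
[void]$Guests.Remove( 'Joe' )

$ListType  = $Guests.GetType().Name                  # List`1
$ArrayType = $Guests.ToArray().GetType().FullName    # System.String[]
```

Unlike with an ArrayList, the .Add() method of a List returns nothing, so there is no stray index to suppress.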

And as with ArrayLists, we can easily convert a List back to an array by casting.

$Invites = [System.Collections.Generic.List[String]]@()

ForEach ( $Person in $Employees )
    {
    $Details = Get-PersonDetails $Person

    If ( $Details.IsPersonILike )
        {
        $Invites.Add( $Person )
        }
    }

$Invites = [String[]]$Invites

Thanks to Dave Wyatt for the tip on using Lists.

PowerShell Slack channel

The PowerShell Slack channel is a good resource for getting quick answers to PowerShell questions from a great community, including many PowerShell MVPs. Many of my blog articles have grown out of answering questions there.

To receive an auto invite go here: http://Slack.PoshCode.org

The PowerShell Slack is here: https://PowerShell.Slack.com

There are many channels, but the main two for general PowerShell discussions are irc-bridge and powershell-help. As the irc-bridge channel has an irc bridge, no snippets or other attachments should be posted on that channel; text only.