It bugged me that the math of the GeneratePassword method from Write-RandomText didn’t work out. It outputs a specific number of characters, and each character should be only one byte, so the output should be accurate to the byte. However, it wasn’t perfectly accurate even after I tweaked the method inputs and loop counter. There was overhead from somewhere, but where?
The best solution for perfect accuracy is a non-random file, similar to the approach in this Microsoft Scripting DevBlog article. The article assumes that a single character (namely a period, .) is one byte, but its result still had two bytes of overhead that seemed to come from the end-of-file marker.
With that in mind, it should be possible to rewrite Write-RandomText to output a single character of known size to an output file to generate a file of a specific size. However, I still couldn’t get it to work even without the GeneratePassword method:
PS C:\Users\User\Documents\Dev\projects\powershell\PowerShell-tutorials> '.' | Out-File -FilePath .\output.txt
PS C:\Users\User\Documents\Dev\projects\powershell\PowerShell-tutorials> ls
Directory: C:\Users\User\Documents\Dev\projects\powershell\PowerShell-tutorials
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 5/21/2022 3:35 PM 8 output.txt
That led me to research the Out-File cmdlet more. It has a few parameters that help remove overhead, namely -NoNewline, which stops Out-File from appending a newline to the output and from inserting newlines between output strings. However, there was still overhead from somewhere, even without newlines and using a period as the only character:
PS C:\Users\User\Documents\Dev\projects\powershell\PowerShell-tutorials> '.' | Out-File -NoNewline -FilePath .\output.txt
PS C:\Users\User\Documents\Dev\projects\powershell\PowerShell-tutorials> ls
Directory: C:\Users\User\Documents\Dev\projects\powershell\PowerShell-tutorials
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 5/21/2022 3:43 PM 4 output.txt
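One way to see where the extra bytes come from is to dump the file's raw bytes instead of only looking at its length. The following is a minimal sketch; Get-FileHex is a hypothetical helper name I made up for illustration, not part of Write-RandomText or any module mentioned here.

```powershell
# Hypothetical helper: return a file's raw bytes as space-separated hex pairs.
function Get-FileHex {
    param([Parameter(Mandatory)][string]$Path)

    # Resolve against PowerShell's current location first, because .NET
    # methods use the process working directory, which can differ.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    # Format each byte as two uppercase hex digits.
    ($bytes | ForEach-Object { $_.ToString('X2') }) -join ' '
}

'.' | Out-File -NoNewline -FilePath .\output.txt
Get-FileHex .\output.txt
```

In the output, 2E is the period itself. Any other bytes are overhead added by the encoding, so what you see depends on the PowerShell version and its default encoding.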
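After more digging into Out-File, the main discovery was that its default encoding depends on the PowerShell version. PowerShell 6 and later default to utf8NoBOM, while Windows PowerShell 5.1 defaults to Unicode, which is UTF-16LE with a BOM. The sizes above fit the latter: a 2-byte BOM plus a 2-byte period makes 4 bytes, and a 4-byte carriage return and line feed brings the first attempt to 8.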
The Byte Order Mark (BOM) is a different rabbit hole outside the scope of a Techbit, so I defer to the byte order mark Wikipedia page for more details. To sum up the article, the BOM is a marker at the start of a text stream that identifies the type of Unicode encoding used. As it turns out, many Windows applications rely on a BOM because they otherwise assume a legacy single-byte encoding such as ASCII.
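To put numbers on the overhead, the .NET encoding classes can report how many bytes a BOM, a period, and a Windows line ending each take. This is only a sketch of the arithmetic using the standard [System.Text.Encoding] properties; the $encodings and $report variable names are mine, and [System.Text.Encoding]::UTF8 is the variant that emits a BOM.

```powershell
# Compare the bytes each .NET encoding needs for a BOM, a period, and CRLF.
$encodings = [ordered]@{
    'ASCII'    = [System.Text.Encoding]::ASCII
    'UTF-8'    = [System.Text.Encoding]::UTF8      # BOM-emitting variant
    'UTF-16LE' = [System.Text.Encoding]::Unicode
}

$report = foreach ($name in $encodings.Keys) {
    $enc = $encodings[$name]
    [pscustomobject]@{
        Encoding  = $name
        BomBytes  = $enc.GetPreamble().Length      # size of the BOM, if any
        DotBytes  = $enc.GetByteCount('.')         # size of one period
        CrLfBytes = $enc.GetByteCount("`r`n")      # size of a Windows newline
    }
}

$report | Format-Table -AutoSize
```

Only the UTF-16LE row adds up to the file sizes seen earlier: 2 + 2 + 4 = 8 bytes with the newline, and 2 + 2 = 4 bytes without it. The ASCII row has no BOM and a one-byte period.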
We’re only using a period as our output character, so we can safely specify ASCII encoding to remove all Unicode overhead, which worked:
PS C:\Users\User\Documents\Dev\projects\powershell\PowerShell-tutorials> '.' | Out-File -NoNewline -Encoding ASCII -FilePath .\output.txt
PS C:\Users\User\Documents\Dev\projects\powershell\PowerShell-tutorials> ls
Directory: C:\Users\User\Documents\Dev\projects\powershell\PowerShell-tutorials
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 5/21/2022 3:50 PM 1 output.txt
With the proof of concept done, I created a PowerShell module called Write-ByteText that accepts a file size in bytes and a filename, then creates the file with those parameters:
PS C:\Users\User\Documents\Dev\projects\powershell\PowerShell-tutorials> Import-Module .\src\modules\Write-ByteText\Write-ByteText.psm1
PS C:\Users\User\Documents\Dev\projects\powershell\PowerShell-tutorials> Write-ByteText 18 output.txt
PS C:\Users\User\Documents\Dev\projects\powershell\PowerShell-tutorials> ls
Directory: C:\Users\User\Documents\Dev\projects\powershell\PowerShell-tutorials
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 5/21/2022 4:34 PM 18 output.txt
It generates a file of precisely 18 bytes because it contains 18 periods and nothing else. With the newfound knowledge gained from making Write-ByteText, I fixed the Write-RandomText module to have MiB accuracy.
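For anyone who wants to try the idea without the module, here is a rough sketch of the same technique. It is not the actual Write-ByteText code, which isn't reproduced in this post; New-PeriodFile and its parameter names are hypothetical, and the 100 MiB cap is an arbitrary limit I picked because the sketch builds the whole string in memory.

```powershell
# Hypothetical stand-in for Write-ByteText; the real module may work differently.
function New-PeriodFile {
    param(
        # Arbitrary 100 MiB cap, since the whole string is held in memory.
        [Parameter(Mandatory)][ValidateRange(0, 104857600)][int]$Bytes,
        [Parameter(Mandatory)][string]$Path
    )

    # One period is one byte in ASCII, so a string of $Bytes periods
    # becomes exactly $Bytes bytes when no newline or BOM is added.
    $content = '.' * $Bytes
    $content | Out-File -NoNewline -Encoding ASCII -FilePath $Path
}

New-PeriodFile -Bytes 18 -Path .\output.txt
(Get-Item .\output.txt).Length   # 18
```

Checking the Length property afterwards is a quick sanity test that no newline or BOM slipped in.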
Yeah, it’s safe to say that Windows forces you to learn more.