I recently ran into a situation where I needed a test file of a specific size. This task is relatively straightforward; however, the catch was that the file had to contain valid text because the test case did a MIME type validation check. This article covers how easy, and how difficult, that can be depending on the operating system.

Generate a file with a specific size on Unix

Note

I won’t get into the semantics of Unix vs. Linux vs. macOS. I defer to the discussion of the subject on Ask Ubuntu for that debate.

As with many technical things, it’s relatively easy on Linux using base64 and head with a pipe:

base64 -w 0 /dev/urandom | head -c 18M > output.txt

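macOS is just as easy. The -w flag isn’t needed because the macOS version of base64 doesn’t wrap its output by default (its line-break option, -b, defaults to 0). If head rejects the M suffix on your system, pass the byte count 18874368 instead: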

base64 /dev/urandom | head -c 18M > output.txt

What is this doing? We’re using base64 to take the binary data from /dev/urandom and encode it into printable characters. The -w 0 flag tells base64 not to wrap the output. By default, GNU base64 wraps every 76 characters, which is the maximum encoded line length the MIME standard allows.

Next, we pipe the output through to head and use it to print only the first 18 MiB, then output that to a file.
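As a rough check of the sizes involved, here’s a small sketch in shell arithmetic. It assumes GNU head’s convention that the M suffix means 1024 × 1024 bytes; the variable names are only for illustration.

```shell
# GNU head treats the M suffix as mebibytes (1024 * 1024 bytes).
size_mib=18
size_bytes=$((size_mib * 1024 * 1024))
echo "$size_bytes"   # 18874368

# Base64 turns every 3 input bytes into 4 output characters, so the
# pipeline only consumes about three quarters of that from /dev/urandom.
raw_bytes=$(( (size_bytes + 3) / 4 * 3 ))
echo "$raw_bytes"    # 14155776
```

The first number is the value to pass to head -c on systems where the M suffix isn’t accepted.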

Why is all of this important? Because the file we’re reading from is /dev/urandom, a device file backed by the kernel’s pseudorandom number generator, which is seeded with entropy collected from various system sources. Device files aren’t files per se but rather an interface to a driver; in this case, the pseudorandom number generator.

Because it’s not a “file,” it doesn’t have a file size, and reading from it never reaches an end. So by capping the output with head, we’re indirectly telling the pseudorandom number generator how many bytes of data we want.
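The same idea can be wrapped in a small function. This is a minimal sketch, not the exact commands above: make_text_file is a hypothetical helper name, and it strips line breaks with tr instead of relying on -w, on the assumption that this works with both the GNU and macOS versions of base64.

```shell
# make_text_file BYTES PATH: hypothetical helper, not part of the original commands.
make_text_file() {
    # Reading from stdin and stripping newlines with tr avoids relying on -w.
    # head exits after BYTES bytes, which ends the otherwise endless stream.
    base64 < /dev/urandom | tr -d '\n' | head -c "$1" > "$2"
}

out=$(mktemp)                 # temporary path for the demo
make_text_file 1024 "$out"
size=$(wc -c < "$out" | tr -d ' ')
echo "$size bytes"
```

Passing a plain byte count rather than a suffix like 18M keeps the call independent of which head implementation is installed.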

Running it looks like this:

[jose@webdevbox Documents]$ base64 -w 0 /dev/urandom | head -c 1024 > output.txt

And the output looks like this:

NLnCSJJoKUO6F+XR3xEDL/ktlz6hsyGwHewSumRpB9BI/8R7UL+kZ4h54Fg3CRAqqWaABxmqzCNLMqE8oy3xMlSBwk9pe44wJTs75QhT4QFuUfupYWFxJA

We can use file to confirm it’s really a text file:

[jose@webdevbox Documents]$ file --mime-type output.txt
output.txt: text/plain
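To see why the base64 step matters for the MIME check, it can help to compare against raw bytes of the same size. The sketch below uses hypothetical file names in a temporary directory. It guards the call to file because that tool isn’t installed everywhere, and it doesn’t assume which type file reports for the raw data (typically a binary one).

```shell
dir=$(mktemp -d)   # scratch directory for the two sample files

# Both files are 1024 bytes long; only the second is base64-encoded.
head -c 1024 /dev/urandom > "$dir/raw.bin"
base64 < /dev/urandom | tr -d '\n' | head -c 1024 > "$dir/encoded.txt"

raw_size=$(wc -c < "$dir/raw.bin" | tr -d ' ')
encoded_size=$(wc -c < "$dir/encoded.txt" | tr -d ' ')
echo "$raw_size $encoded_size"

# Same size, different content: compare the detected MIME types.
if command -v file > /dev/null 2>&1; then
    file --mime-type "$dir/raw.bin" "$dir/encoded.txt"
fi
```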

Yeah, Unix’s kids are great, but what about Windows?

Generate a file with a specific size on Windows

Rather than /dev/urandom, Windows has Microsoft CryptoAPI, an abstraction layer over a set of dynamically linked libraries for cryptography, which we don’t use directly. Instead, the cryptography itself is implemented by Cryptographic Service Providers, and applications typically call the API as part of their own security.

As a result, the CryptoAPI is only accessible in source code, often with the CryptGenRandom function. That’s great if you’re using a supported programming language, but not as good for a simple script.

The equivalent RandomNumberGenerator class does exist in .NET, and we can use it in PowerShell to get an array of random bytes with the GetBytes method. However, using it to get 18 MiB of random bytes, then converting them to ASCII is too involved and inefficient.

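Luckily, there’s another way: the GeneratePassword method condenses down to a PowerShell one-liner that makes a file of about 14 MiB. The method lives in the .NET Framework’s System.Web assembly, so in Windows PowerShell you may first need to load it with Add-Type -AssemblyName System.Web. Here’s the one-liner: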

1..100000 | % { [System.Web.Security.Membership]::GeneratePassword(70, 3) } >> .\output.txt

So, how does it work? With the above parameters, the GeneratePassword method creates a 70-character password with at least three non-alphanumeric characters. The pipeline calls it a hundred thousand times and appends each password to the file on its own line.

However, it’s not very accurate or user-friendly. The actual filesize is 14400002 bytes or approximately 13.7 MiB:

PS C:\Users\User\Documents\Dev\projects\docs\website\techbit> ls


Directory: C:\Users\User\Documents\Dev\projects\docs\website\techbit

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----          5/8/2022   9:00 PM       14400002 output.txt
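One plausible explanation for that odd number is the file encoding. The arithmetic below assumes that the >> operator in Windows PowerShell wrote the file as UTF-16LE with a byte order mark and CRLF line endings. That is an assumption about the environment, not something the listing itself shows, and the variable names are illustrative.

```shell
chars_per_line=$((70 + 2))               # 70-character password + CR + LF
bytes_per_line=$((chars_per_line * 2))   # assumed UTF-16: 2 bytes per character
bom_bytes=2                              # assumed byte order mark at the start
total=$((bytes_per_line * 100000 + bom_bytes))
echo "$total"   # 14400002
```

Under those assumptions the result matches the Length column exactly, which also shows why tuning the password length or count alone makes it awkward to land on a round size.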

So, I sought to create a proper script that’s more user-friendly and accurate. It turns out that a “proper” PowerShell script is a module. Therefore, convention dictated that I create a PowerShell module called Write-RandomText that accepts a filesize in MiB and a filename, then makes the file with those parameters:

PS C:\Users\User\Documents\Dev\projects\powershell\PowerShell-tutorials> Import-Module .\src\modules\Write-RandomText\Write-RandomText.psm1
PS C:\Users\User\Documents\Dev\projects\powershell\PowerShell-tutorials> Write-RandomText 18 output.txt
PS C:\Users\User\Documents\Dev\projects\powershell\PowerShell-tutorials> ls


Directory: C:\Users\User\Documents\Dev\projects\powershell\PowerShell-tutorials


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----         5/21/2022   4:34 PM       18874368 output.txt

It generates an 18874368-byte file, which is exactly 18 MiB.

Convoluted? Yes. Are there better ways? Probably. However, creating a solution is how Windows forces you to learn more. At least, that’s what I keep telling myself is a good reason to keep using Windows.


May 21, 2022 Update: Fixed some typos and readability, but mainly fixed the accuracy of the Write-RandomText module.