Migrating your Microsoft PKI infrastructure to Windows Server 2016 (Part 1)


As part of my efforts to upgrade my POC lab to Windows Server 2016 I got around to migrating my PKI infrastructure. This consists of an offline root CA and an online issuing CA. In Part 1 of this guide I will be migrating my offline root CA to Windows Server 2016.

This guide is written for an upgrade from a Windows Server 2012 R2 CA to a Windows Server 2016 CA. However, very little has changed since the Windows Server 2003 days, and the steps are equally valid for moving a CA from any older version of Windows Server to Windows Server 2016.

I am a big advocate of the Core versions of Windows Server, and in this guide I will be migrating from and to Server Core. A CA is a perfect example of a server that does not need the overhead of the GUI and the additional services that come with the full GUI edition of Windows Server. If you don’t already use Core for your CA, this is a perfect opportunity to migrate to it!

Preparation

In preparation for the migration, build your new Windows Server 2016 server. I recommend that you give it the same name as your current root CA server. It is possible to give it a different name, but this will require changing registry values later in the migration process. Take this opportunity to patch it with the latest Microsoft updates!

Migration – Backing up your existing root CA server

The first step is to back up the CA using the command certutil -backup C:\RootCABackup KeepLog. The KeepLog part is optional, but without it the backup will truncate the logs. I prefer to bring the whole lot across in case the logs are ever needed in the future for auditing purposes.

You will need to enter a password. Remember it and make it complex: this backup contains your root CA private key, so do not make it easy for an attacker to obtain.


The next thing to back up is the CA configuration, which is stored in the registry in the following location: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\CertSvc. Back it up by typing reg export "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\CertSvc" C:\RootCABackup\CertSvcRegBackup.reg


Additionally it is worth backing up your CAPolicy.inf file which you can do easily enough by copying it into the backup folder, by typing copy C:\Windows\CAPolicy.inf C:\RootCABackup


Finally, copy the RootCABackup folder to your new CA.
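
The three backup steps above can be gathered into one short PowerShell script. This is a minimal sketch rather than a definitive procedure: it assumes the same C:\RootCABackup folder used in this guide, and the hash listing at the end is my own addition (not one of the original steps) so that the copy on the new server can be compared with the original.

```powershell
# Sketch: collect the root CA backup in one folder (run elevated on the old CA).
$backupDir = 'C:\RootCABackup'   # same folder used throughout this guide

# 1. Database, logs and private key. certutil prompts for the backup password.
certutil -backup $backupDir KeepLog
if ($LASTEXITCODE -ne 0) { throw "certutil -backup failed with exit code $LASTEXITCODE" }

# 2. CA configuration from the registry (/y overwrites an existing export).
reg export 'HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\CertSvc' "$backupDir\CertSvcRegBackup.reg" /y
if ($LASTEXITCODE -ne 0) { throw "reg export failed with exit code $LASTEXITCODE" }

# 3. CAPolicy.inf, if the CA was built with one.
$policy = Join-Path $env:windir 'CAPolicy.inf'
if (Test-Path $policy) { Copy-Item $policy $backupDir }

# Assumption (not part of the original steps): list SHA256 hashes so the
# files can be checked again after they are copied to the new server.
Get-ChildItem $backupDir -Recurse -File |
    Get-FileHash -Algorithm SHA256 |
    Select-Object Hash, Path
```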

Migration – Configuring your new root CA and restoring from the backup

Log on to your new root CA server and start by installing the CA role. The easiest way to do this is with PowerShell, so type powershell into your administrative CMD prompt and enter the following command to install the CA role: Add-WindowsFeature ADCS-Cert-Authority

Now configure this new CA using the backup of the old CA. This can also be done with PowerShell using the following command:

Install-AdcsCertificationAuthority -CAType StandaloneRootCA -CertFile "C:\RootCABackup\LaptopPoc Root CA.p12" -CertFilePassword (Read-Host "Enter password" -AsSecureString)

Replace the value after -CertFile with the path and name of the .p12 file from your root CA backup. When you press enter you will be prompted for the password you used to back up your original root CA.

If this step is successful you will receive ErrorID 0 as your return code.
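
If you would rather not type the .p12 file name by hand, the same command can pick it up from the backup folder. A small sketch, assuming the backup folder contains exactly one .p12 file, which is what certutil -backup normally produces for a single CA:

```powershell
# Sketch: find the .p12 created by certutil -backup instead of typing its name.
$p12 = @(Get-ChildItem 'C:\RootCABackup' -Filter '*.p12')
if ($p12.Count -ne 1) { throw "Expected exactly one .p12 file, found $($p12.Count)" }

# Same command as above, with the discovered path passed to -CertFile.
Install-AdcsCertificationAuthority -CAType StandaloneRootCA `
    -CertFile $p12[0].FullName `
    -CertFilePassword (Read-Host 'Enter password' -AsSecureString)
```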


This restores the root CA private key; next you need to restore the database and logs. Before you do this the CA service must be stopped. Do that by typing net stop certsvc and pressing enter. Once it has stopped, restore the database and logs using the command certutil -f -restore C:\RootCABackup. The -f forces an overwrite of the data that was created by the barebones CA setup. Once again you must enter the password you used to back up your original root CA.


Do not start the certificate authority service just yet! Before doing that the registry settings from the previous root CA need to be restored. Do this by typing reg import "C:\RootCABackup\CertSvcRegBackup.reg"

Note: If you chose to change the name of your root CA server you will need to go through the values in this registry file and change any reference to the old server name to your new server name before importing it.
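
If you did rename the server, the search and replace can be scripted. The function below is a sketch under a few assumptions: the function name and its parameters are my own, it works on the text of the exported file only, and it will not touch values that reg export wrote in hex form (hex(2): and hex(7): lines), so those still need checking by hand. The lookarounds keep it from changing a longer name that merely contains the old server name, such as a CA common name of the form OLDNAME-CA.

```powershell
# Sketch: replace the old server name in the text of an exported .reg file.
function Update-RegServerName {
    param(
        [Parameter(Mandatory)] [string] $Text,
        [Parameter(Mandatory)] [string] $OldName,
        [Parameter(Mandatory)] [string] $NewName
    )
    # Match the old name literally, and only when it is not part of a longer
    # run of letters, digits or hyphens. -replace is case-insensitive.
    $pattern = '(?<![A-Za-z0-9-])' + [regex]::Escape($OldName) + '(?![A-Za-z0-9-])'
    # Double any $ so the new name is treated as literal replacement text.
    $Text -replace $pattern, $NewName.Replace('$', '$$')
}
```

To apply it, read the file with Get-Content -Raw, pass the text through the function, and write the result back with Set-Content -Encoding Unicode, since reg export writes its files as UTF-16.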

Finally copy the CAPolicy.inf file back into the Windows directory by using the command copy C:\RootCABackup\CAPolicy.inf C:\Windows

Now you can start the root CA by typing net start certsvc. The service should start without any issues. To verify this, log on to a management workstation, load the Certification Authority MMC snap-in, connect to the new server and check that your issued and revoked certificates are listed (as this is a root CA there should be very few issued certificates!)
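
On a Core server it can also be handy to run a quick check locally before reaching for the MMC. The following is a sketch using standard certutil verbs; the columns chosen for the output are just my selection.

```powershell
# Sketch: quick local checks on the new root CA.
Get-Service certsvc | Select-Object Name, Status   # Status should be Running

# Ask the CA service to respond on its request interface.
certutil -ping

# List issued (Disposition 20) and revoked (Disposition 21) certificates.
certutil -view -restrict 'Disposition=20' -out 'RequestID,CommonName,NotAfter'
certutil -view -restrict 'Disposition=21' -out 'RequestID,CommonName,NotAfter'
```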

Once you are satisfied that the new server is configured correctly and working, make sure that you delete the C:\RootCABackup folder. As previously mentioned, this contains your root CA private key, and you do not want to leave that lying around!

Coming soon is Part 2, which will focus on migrating the issuing certificate authority. Thankfully the steps for this are very similar with only small differences due to it being a domain joined server.

Error 80070057 when attempting to update Windows Server 2012 R2

Once, when I was updating some servers running Windows Server 2012 R2, I encountered something odd: no patches appeared in Software Center or in the Windows Update panel, even though the server was several years out of date and definitely had applicable updates!

In WindowsUpdate.log I found an error message with code 80070057 repeating.

The fix for this is to manually download and install KB2919355, which is the April 2014 update rollup for Windows Server 2012 R2. After this has been installed and the server has restarted, re-run your updates scan and updates will show up in Windows Update or Software Center.
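
A short sketch of that check and install in PowerShell. The path to the .msu file is hypothetical; point it at wherever you saved the package you downloaded.

```powershell
# Sketch: check for KB2919355 and install it from a downloaded package if missing.
$kb  = 'KB2919355'
# Hypothetical path: wherever you saved the downloaded update package.
$msu = 'C:\Temp\Windows8.1-KB2919355-x64.msu'

if (Get-HotFix -Id $kb -ErrorAction SilentlyContinue) {
    "$kb is already installed."
} else {
    # /quiet suppresses the UI; /norestart leaves the reboot to you.
    Start-Process wusa.exe -ArgumentList "`"$msu`"", '/quiet', '/norestart' -Wait
    "wusa has exited; restart the server, then rescan for updates."
}
```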

Increasing the maximum run time for Windows 10 and Windows Server 2016 cumulative updates

One of the things I have noticed since starting to deploy Windows Server 2016 is that the cumulative updates can fail to install when deployed from SCCM. The installation starts to run but then times out because the maximum run time has been reached. By default this is set to 10 minutes. However, because cumulative updates are larger and take longer to install than the updates that preceded them, 10 minutes doesn’t seem to be long enough. The fix is simply to increase the maximum run time for cumulative updates for both Windows Server 2016 and Windows 10 from 10 minutes to 30 minutes.


This is a bit tedious as you’ll have to do it every month for both Windows Server 2016 and every version of Windows 10 you have in your environment. Hopefully Microsoft soon catches on to this and changes the default run time to 30 minutes so that this ceases to be an issue. There is already a Configuration Manager UserVoice entry for this idea, so if you’re reading this, pop over and vote to increase its visibility!
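
Until then, the monthly change can probably be scripted. This is only a sketch: it assumes the Configuration Manager PowerShell module is loaded and connected to your site, that your console version's Set-CMSoftwareUpdate cmdlet offers the -MaximumExecutionMins parameter, and that the title patterns below match how the updates are named in your catalogue. Check what Get-CMSoftwareUpdate returns on its own before piping it onwards.

```powershell
# Sketch: raise the maximum run time on cumulative updates in one pass.
# Run from a ConfigMgr PowerShell session connected to your site drive.
$minutes  = 30
# Assumed title patterns; adjust them to match the update titles you see.
$patterns = '*Cumulative Update for Windows Server 2016*',
            '*Cumulative Update for Windows 10*'

foreach ($pattern in $patterns) {
    Get-CMSoftwareUpdate -Name $pattern -Fast |
        Set-CMSoftwareUpdate -MaximumExecutionMins $minutes
}
```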

The clients that didn’t know they were on the corporate LAN

I recently came across an issue where all of the Windows 10 clients in a particular remote site were unable to access network resources when connected to the local LAN. The strange thing? This LAN was part of the corporate network.

After searching around I found a number of people reporting similar issues with clients configured to use DirectAccess, usually caused by things such as a corrupt Name Resolution Policy Table (NRPT) or other issues with the DirectAccess configuration. This led me to try exporting the NRPT registry keys from a working client and importing them on one of the clients that was not working. But… that did not fix the problem. And anyway, there were devices in other offices that worked both over DirectAccess and on the corporate LAN. This problem was limited to this particular office, so I started to dismiss DirectAccess as being the problem.

A little more troubleshooting and I discovered something else. When connected to the LAN these clients could not resolve any internal server hostname I tried pinging. Every attempt just returned "Ping request could not find host hostname.local. Please check the name and try again." The same servers could be pinged by IP address though. So perhaps a DNS issue? Perhaps not, as I discovered that nslookup did work.
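
A likely explanation for nslookup working while ping failed is that the two resolve names differently: nslookup sends its query straight to the configured DNS server and does not consult the NRPT, whereas ping resolves through the Windows DNS client, which does. Resolve-DnsName also goes through the DNS client, so comparing it with nslookup is a quick way to see whether an NRPT rule is getting in the way. A sketch, reusing the placeholder name from the ping error above:

```powershell
# Sketch: compare resolution through the DNS client with a direct query.
$name = 'hostname.local'   # placeholder name from the ping example above

# Goes through the Windows DNS client, so NRPT rules apply (as with ping).
Resolve-DnsName $name -ErrorAction SilentlyContinue

# Queries the configured DNS server directly and ignores the NRPT.
nslookup $name

# Show the NRPT rules currently in effect on this client.
Get-DnsClientNrptPolicy -Effective
```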

Checking the Application log in Event Viewer I found the following three critical errors repeating every minute or so:

NETLOGON | Event ID 5719
There are currently no logon servers available to service the logon request.

DNS Client Events | Event ID 8015
The system failed to register host (A or AAAA) resource records (RRs) for network adapter
with settings:

The reason the system could not register these RRs was because the update request it sent to the DNS server timed out. The most likely cause of this is that the DNS server authoritative for the name it was attempting to register or update is not running at this time.

Time-Service | Event ID 129
NtpClient was unable to set a domain peer to use as a time source because of discovery error. NtpClient will try again in 15 minutes and double the reattempt interval thereafter. The error was: The entry is not found. (0x800706E1)
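
To pull the same three events without scrolling through Event Viewer, something like the following sketch can be used. It searches both the Application and System logs, since the log an event is written to depends on its provider, and the one-hour window is an arbitrary choice. Other providers can reuse the same event IDs, so the ProviderName column is included to tell them apart.

```powershell
# Sketch: list recent occurrences of the three events by ID.
Get-WinEvent -FilterHashtable @{
    LogName   = 'Application', 'System'
    Id        = 5719, 8015, 129
    StartTime = (Get-Date).AddHours(-1)   # arbitrary look-back window
} -ErrorAction SilentlyContinue |
    Sort-Object TimeCreated |
    Select-Object TimeCreated, ProviderName, Id, LevelDisplayName
```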

This led me to think there was something wrong with these clients’ ability to contact DNS, which eventually led me back to DirectAccess. As noted previously, these clients did work when connected via DirectAccess. So I thought, if I remove the DirectAccess configuration, will that make any difference to this broken-when-on-the-corporate-LAN situation? As I couldn’t just remove the client from the DirectAccess security group (because the client could not contact the domain controller to receive the policy change) I had to find the DirectAccess configuration in the registry and delete it there. It resides under HKLM\Software\Policies\Microsoft\Windows NT\DNSClient\DNSPolicyConfig. In my case there were 4 entries underneath this key, each named DA-{GUID}. I deleted these keys, restarted the client and found, to my delight, that it was now functioning correctly on the LAN. For good measure I re-added the DirectAccess configuration (this time by simply adding the client back into the DirectAccess security group) and confirmed that with the DirectAccess configuration back in place, the client was broken again on the LAN.
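
For anyone repeating this test, here is a sketch of the same steps in PowerShell. The backup file location is my own choice, and the final line is left with -WhatIf so that nothing is deleted until you remove it.

```powershell
# Sketch: inspect, back up and remove the DirectAccess NRPT entries.
$nrptKey = 'HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\DNSClient\DNSPolicyConfig'

# Show each DA-{GUID} subkey together with its values.
Get-ChildItem $nrptKey | Get-ItemProperty

# Keep a copy so the entries can be put back with reg import if needed.
reg export 'HKLM\SOFTWARE\Policies\Microsoft\Windows NT\DNSClient\DNSPolicyConfig' "$env:TEMP\NrptBackup.reg" /y

# Remove the entries, then restart the client. Drop -WhatIf to really delete.
Get-ChildItem $nrptKey | Remove-Item -Recurse -WhatIf
```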

So what was going on? Eventually an article from Microsoft led me to the answer: Network Location Detection. This article details the process Windows goes through to determine whether or not it is on the corporate network. When a network adapter status change is detected, Windows attempts to access the URL that is stored in the registry value HKEY_LOCAL_MACHINE\Software\Policies\Microsoft\Windows\NetworkConnectivityStatusIndicator\CorporateConnectivity\DomainLocationDeterminationUrl. The URL in this value will have been configured by whoever originally set up DirectAccess in your organisation. If Windows does not receive a valid HTTP response (HTTP status 200 OK) it believes you are not connected to the corporate LAN and attempts to connect to DirectAccess. During this process the NRPT entry that governs which DNS server to use when connected to DirectAccess is applied and, because you are not actually connected to DirectAccess, name resolution problems start to occur.
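
The probe can be repeated by hand from an affected client. A sketch, assuming the registry value described above is present; the 10 second timeout is arbitrary and the messages are my own wording, not anything Windows prints.

```powershell
# Sketch: repeat the corporate connectivity probe by hand.
$key = 'HKLM:\SOFTWARE\Policies\Microsoft\Windows\NetworkConnectivityStatusIndicator\CorporateConnectivity'
$url = (Get-ItemProperty $key).DomainLocationDeterminationUrl

try {
    # Invoke-WebRequest throws on connection failures and on error status codes.
    $response = Invoke-WebRequest -Uri $url -UseBasicParsing -TimeoutSec 10
    "Probe returned HTTP $($response.StatusCode): the client should see this network as corporate."
} catch {
    "Probe failed ($($_.Exception.Message)): the client will assume it is off the corporate network."
}
```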

What about a fix? Why couldn’t the client access the network location detection URL? This goes back to the way DirectAccess is configured. Best practice states that your DirectAccess servers should have two network adapters: one internal and one external. Only the external adapter should be configured with a default gateway, to prevent routing issues. Internal routing has to be done via static routes.

And that was the key. This was a small remote site that had only recently been connected to the corporate network, and there was no static route between the DirectAccess server and the IP subnet being used by this site. After hours of investigating, this issue was resolved in a single moment by issuing the following PowerShell command:

New-NetRoute -InterfaceAlias InternalAdapterName -DestinationPrefix Subnet/Mask -NextHop GatewayAddress
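
To spot this kind of gap before it bites, it helps to check whether a client address is covered by any route on the internal adapter. The helper below is a sketch with names of my own choosing. It handles IPv4 only and compares the leading bits of the address and the prefix.

```powershell
# Sketch: check whether an IPv4 address falls inside a route prefix.
function ConvertTo-BitString {
    param([string] $Address)
    # Turn each octet into 8 binary digits and join them into one 32-bit string.
    $bytes = [System.Net.IPAddress]::Parse($Address).GetAddressBytes()
    -join ($bytes | ForEach-Object { [Convert]::ToString($_, 2).PadLeft(8, '0') })
}

function Test-IPv4InPrefix {
    param([string] $Address, [string] $Prefix)
    # Split "network/length", then compare the first <length> bits of each.
    $network, $length = $Prefix -split '/'
    $bits = [int] $length
    (ConvertTo-BitString $Address).Substring(0, $bits) -eq
        (ConvertTo-BitString $network).Substring(0, $bits)
}
```

On the DirectAccess server, piping Get-NetRoute -AddressFamily IPv4 for the internal adapter through Where-Object { Test-IPv4InPrefix 'client address' $_.DestinationPrefix } shows which routes, if any, cover a client at the remote site. If nothing comes back, that subnet needs a static route like the one above.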

I went back to my broken client, opened my browser and entered the network location detection URL and was presented with the Microsoft IIS splash page. I restarted all the clients onsite and they correctly detected that they were on the corporate LAN.