Introduction
Slightly concerned that all I seem to blog about is certificate deployment in Intune, but here we go again...
I recently worked with my colleague Chris Sellar on an issue where the Network Device Enrollment Service (NDES) failed to start after the NDES role was installed on a new server to support SCEP certificates in Intune. When attempting to access the internal NDES URL, the customer was presented with:
Oh dear....
The Dreaded HTTP 500 - Internal Server Error
When viewing the Application logs in Event Viewer, we found the dreaded Event IDs 2 and 10.
Event ID 2: The Network Device Enrollment Service cannot be started (0x80070057). The parameter is incorrect.
Event ID 10: The Network Device Enrollment Service cannot retrieve one of its required certificates (0x80070057). The parameter is incorrect.
After checking that we had installed all the required SCEP infrastructure prerequisites correctly and that the required Registration Authority (RA) certificates had enrolled on the NDES server, I was sure I had read something about this before... Certificate Revocation Lists (CRLs)!
As with any AD CS certificate, CRLs are crucial for ensuring the RA certificates are trusted on the NDES server. If the revocation check fails, NDES will fail to start. In this scenario, the customer had a two-tier PKI (an offline Root CA and an online Issuing CA that issued the RA certificates). Time to prove my thinking!
CAPI2 Operational Log
A very detailed Tech Community blog by Rob Greene walks you through how and where to look to prove certificate chaining and CRL issues with NDES. It is well worth a read and massively helped me out.
The CAPI2 Operational log can show events related to chaining and revocation checking. Enabling this log in Event Viewer (it's disabled by default), then accessing the internal NDES URL, should provide some good information to support troubleshooting. I'd recommend disabling this log afterwards to reduce the total number of events and logs recorded.
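If you would rather toggle the log from a script than click through Event Viewer, the built-in wevtutil tool can enable and disable it. The following is a minimal sketch, assuming it is run from an elevated session on the NDES server. The function names are my own for illustration.

```python
import subprocess

# Full name of the CAPI2 Operational channel as shown in Event Viewer.
CAPI2_LOG = "Microsoft-Windows-CAPI2/Operational"


def capi2_log_command(enable: bool) -> list[str]:
    """Build the wevtutil command that enables or disables the CAPI2 log."""
    state = "true" if enable else "false"
    # "sl" is the short form of set-log, "/e" is the enabled switch.
    return ["wevtutil", "sl", CAPI2_LOG, f"/e:{state}"]


def set_capi2_log(enable: bool) -> None:
    """Run the command. Raises CalledProcessError if wevtutil fails."""
    subprocess.run(capi2_log_command(enable), check=True)
```

Call set_capi2_log(True) before browsing to the internal NDES URL, and set_capi2_log(False) once you have captured the events you need.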
The results...
Now we have some errors to read. I'll talk through the ones that helped me with the fix.
Event ID 11
This event shows whether the RA certificates can chain to a valid root certification authority. It also includes the revocation checks, showing whether each certificate in the chain passed or failed its revocation check.
Important Note: When reviewing the log for Event ID 11, ensure you see the MSCEP-RA certificate. The subject name is usually defined as the NDES server name-MSCEP-RA. This is what we're interested in seeing!
As Rob Greene's blog advises, we want to pay attention to the TrustStatus field in the Details section. The first TrustStatus is the overall TrustStatus. This tells you about the entire chain and specifically that one of the certificates in the trust path failed revocation. We can see in this log one of the RA certificates is failing a revocation check.
Again we can see a failure for a revocation check.
Event ID 42
This event shows that CryptoAPI is rejecting cached data, either because it is stale or because it needs to go off-system to get the latest CRL / OCSP response from the network. I'm not going to pretend to know all the intricacies of this log; however, we want to pay attention to the Action detail within this event. The below is what we received.
This tells us that the CRL at the HTTP URL path is no longer valid and could not be retrieved.
PKIVIEW
OK, we know the issue is chaining and/or CRLs. What else can we use to prove it?
Pkiview.msc (Enterprise PKI) gathers information through Active Directory about the CA certificates and certificate revocation lists (CRLs) from each CA in the enterprise. Opening pkiview on the NDES server can help validate if your publication points and CRLs are healthy.
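As a command-line cross-check alongside pkiview, certutil's -verify -urlfetch option retrieves the AIA certificates and CDP CRLs for a certificate and reports on each one. The sketch below only wraps that command in Python. The certificate path in the usage note is a hypothetical export of the RA certificate, not something taken from the customer environment, and this was not part of the original troubleshooting.

```python
import subprocess


def verify_command(cert_path: str) -> list[str]:
    """Build the certutil command that verifies a certificate and fetches its CRLs."""
    return ["certutil", "-verify", "-urlfetch", cert_path]


def verify_certificate(cert_path: str) -> str:
    """Run certutil and return its text output for manual review."""
    result = subprocess.run(
        verify_command(cert_path), capture_output=True, text=True
    )
    return result.stdout
```

For example, print(verify_certificate(r"C:\Temp\mscep-ra.cer")) on the NDES server, then read through the output for CRL retrieval or expiry errors.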
Unfortunately I was unable to take a screenshot of this particular customer issue; however, pkiview showed that the offline Root CA CRL had expired.
This was the root cause.
As this customer has a two-tier PKI hierarchy, the following checks will happen:
The RA certificates are analysed to determine the download locations of the CRL issued by the Issuing CA. Once the CRL is located, the system verifies that the RA certificates have not been revoked by the Issuing CA.
The Issuing CA certificate is then examined to find the download locations of the CRL issued by the Offline Root CA. The system then confirms that the Issuing CA certificate has not been revoked by the Offline Root CA.
To keep it simple, the Root CA CRL plays a role in validating the RA certificates, so it is important not to let it expire and to verify that it is reachable.
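The checks above can be illustrated with a small model. This is a simplified sketch of the idea, not how CryptoAPI actually implements chain validation. The class names, certificate names, serial numbers and dates are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Crl:
    issuer: str
    next_update: datetime                  # the CRL is stale after this time
    revoked_serials: frozenset = frozenset()
    reachable: bool = True                 # could the CRL be downloaded?


@dataclass
class Cert:
    subject: str
    serial: str
    issuer_crl: Optional[Crl]              # CRL published by the issuer; None for the root


def check_chain(chain: List[Cert], now: datetime) -> List[str]:
    """Return a list of revocation problems; an empty list means the chain passes."""
    problems = []
    for cert in chain:
        crl = cert.issuer_crl
        if crl is None:                    # self-signed root: nothing to check
            continue
        if not crl.reachable:
            problems.append(f"{cert.subject}: CRL from {crl.issuer} is unreachable")
        elif now > crl.next_update:
            problems.append(f"{cert.subject}: CRL from {crl.issuer} has expired")
        elif cert.serial in crl.revoked_serials:
            problems.append(f"{cert.subject}: revoked by {crl.issuer}")
    return problems


# Hypothetical two-tier hierarchy where the Root CA CRL expired a month ago.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
root_crl = Crl("Offline Root CA", next_update=datetime(2024, 5, 1, tzinfo=timezone.utc))
issuing_crl = Crl("Issuing CA", next_update=datetime(2024, 6, 8, tzinfo=timezone.utc))
chain = [
    Cert("NDES01-MSCEP-RA", "01", issuing_crl),
    Cert("Issuing CA", "02", root_crl),
    Cert("Offline Root CA", "03", None),
]
problems = check_chain(chain, now)
print(problems)
```

Even though the RA certificate and the Issuing CA CRL are both fine in this model, the expired Root CA CRL is enough to fail the chain, which mirrors what happened here.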
The Fix
Hopefully this is obvious by now: the Root CA was switched on and its CRL was renewed, which got the NDES service started and fixed the HTTP 500 internal server error.
Based on other posts relating to this issue, it appears that when an HTTP 500 error is seen and is related to revocation checking, the cause is an unreachable or expired CRL, which in this case was the Root CA CRL.