Data Center Operational Due Diligence: A Buyer's Checklist

A practical operational due diligence checklist for data center acquisitions — what to validate beyond the technical inspection report, with the questions investors and asset managers actually need answered.

Published Apr 30, 2026 · 14 min read · Advisory pillar

Most data center diligence packages stop at the technical inspection report. The walls are sound, the generators run, the chillers spin, the switchgear was recently maintained, and the customers are paying. The deal clears IC. Six months later, the asset is missing its plan, the new operator is fighting the old vendor stack, and the next quarterly review is a discussion about why operating costs ran 18 percent over underwrite.

Operational due diligence is the practice that closes that gap. It is the parallel investigation to technical and financial diligence — and often the most predictive of post-close performance. This is the checklist we run for buyers, lenders, and joint-venture partners evaluating data center acquisitions. It is operational, opinionated, and biased toward the questions that actually move underwrite.

What operational due diligence is — and what it is not

Operational due diligence answers a single question: can the operating posture this asset has been running on continue under your ownership, and if not, what does it cost to fix? The answer is a composite — staffing, runbooks, vendor structure, tooling discipline, compliance evidence, tenant satisfaction, and the engineering reality beneath the operations team. It overlaps with technical diligence at the edges (the cooling-plant condition affects operations) and with financial diligence at the seams (operating cost variance is itself an operational question), but it is distinct from both.

Operational diligence is not a tour with the GM. The tour is the start. The diligence is what you do with it.

1. Staffing and skill coverage

The single highest-leverage operational input is the team. Three lines of inquiry:

Headcount and shift model

  • Total operations headcount. Split by site, role (DCT, CIE, manager, NOC), and shift.
  • Coverage analysis: how many shifts run thin? Is the on-call rotation formal or a "first to answer the phone" arrangement?
  • Tenure distribution. Concentrated tenure (everyone hired in the last 18 months, or everyone hired before 2015) is a flag.

Skill coverage

  • Skill matrix by individual against power, cooling, network, security, and application domains. If no matrix exists, that is the finding.
  • Certification cadence. BICSI, BMS-vendor, OEM-specific certifications.
  • Cross-training discipline. Who is the only person who can do X? List three.
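Where a skill matrix does exist, the "only person who can do X" question can be answered mechanically. The following is a minimal Python sketch under stated assumptions: the staff names, the matrix layout, and the function name are hypothetical, and the domain list simply mirrors the five domains named above. It illustrates the idea and is not a prescribed tool.

```python
from collections import defaultdict

# Hypothetical skill matrix: person -> domains they can cover unassisted.
SKILL_MATRIX = {
    "tech_a": {"power", "cooling"},
    "tech_b": {"power", "network"},
    "tech_c": {"security"},
    "tech_d": {"power", "network", "application"},
}

# The five domains named in the checklist.
DOMAINS = ["power", "cooling", "network", "security", "application"]


def single_points_of_failure(matrix, domains):
    """Map each thinly covered domain to its sole owner.

    A domain covered by exactly one person maps to that person.
    A domain covered by nobody maps to None.
    Domains with two or more people are omitted.
    """
    coverage = defaultdict(list)
    for person, skills in matrix.items():
        for skill in skills:
            coverage[skill].append(person)

    findings = {}
    for domain in domains:
        people = coverage.get(domain, [])
        if len(people) == 0:
            findings[domain] = None
        elif len(people) == 1:
            findings[domain] = people[0]
    return findings


spof = single_points_of_failure(SKILL_MATRIX, DOMAINS)
for domain, owner in spof.items():
    print(f"{domain}: {owner or 'NO COVERAGE'}")
```

Any domain that appears in the output is a candidate answer to "list three", and a domain mapped to no one is a finding in its own right.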

Compensation and retention

  • Compensation benchmarks vs. local market. Below-market salaries forecast turnover under new ownership.
  • Voluntary turnover trailing 24 months. Above 25 percent is a real flag for an operations function.
  • Open requisitions. Long-open senior roles often correlate with burnout in the team that's covering them.
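The turnover check is simple arithmetic, but the definition matters. The sketch below assumes the 25 percent line is read as an annualized rate (voluntary exits divided by average headcount, scaled to 12 months). The checklist does not specify that convention, so treat it as an assumption and confirm which definition the seller's HR data uses. The function name and sample figures are illustrative only.

```python
TURNOVER_FLAG = 0.25  # the 25 percent line from the checklist


def annualized_voluntary_turnover(voluntary_exits, avg_headcount, months=24):
    """Voluntary exits over the window, divided by average headcount,
    scaled to a 12-month rate. Annualizing is an assumption, not a rule."""
    if avg_headcount <= 0 or months <= 0:
        raise ValueError("avg_headcount and months must be positive")
    return (voluntary_exits / avg_headcount) * (12 / months)


# Illustrative figures: 14 voluntary exits over 24 months, average team of 25.
rate = annualized_voluntary_turnover(voluntary_exits=14, avg_headcount=25, months=24)
flagged = rate > TURNOVER_FLAG
print(f"Annualized voluntary turnover: {rate:.0%}, flagged: {flagged}")
```

If the seller reports a cumulative 24-month figure instead, the same data would read as 56 percent, which is why the definition should be settled before the number is compared to any threshold.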

2. Runbooks and operating discipline

Runbooks are the part of an operations practice that survives a turnover and the part most likely to be overstated in a CIM. Validate them.

  • Get the actual runbook library. A PDF count alone is not the answer; the question is whether the documents are current. Spot-check three runbooks against the most recent change record for the systems they describe.
  • Method-of-procedure (MOP) discipline for change windows. Are MOPs formal? Reviewed? Approved? Or is "we'll figure it out at 11pm" the operating norm?
  • Escalation matrix. From whom, to whom, on what trigger, in what timeframe. Walk through one real incident in the last 90 days.
  • Operating-policy archive. ISMS-style policies tied to the compliance frameworks the facility runs against (SOC 2, ISO 27001, PCI, HIPAA, NIST).

A facility with a robust runbook library and a culture of writing things down typically rides through a transition with low value erosion. A facility without one — even if everything else looks good — is a stabilization project disguised as an operating asset.

3. Tooling and DCIM hygiene

The DCIM and operations tooling stack is where operational reality lives or dies. Inspect:

  • DCIM platform, version, integrations. Is the asset model current? Run a sample audit: pick five racks, walk them, compare to DCIM. Variance over 5 percent on cabinets or U-positions is a finding.
  • BAS / EPMS coverage. Is environmental and power telemetry in one place? Does the operations team trust it?
  • ITSM. How are tickets actually created and closed? Counts, MTTR, backlog age, ratio of preventive to corrective work.
  • Sensor coverage and calibration. When were thermal and power sensors last calibrated? Are there blind spots?
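The rack-walk audit reduces to comparing two records of the same racks. The sketch below is one way to score it, assuming a hypothetical layout of rack, U-position, and asset tag. It does not reflect any particular DCIM product's export format, and the choice of denominator (every U-position occupied in either record) is an assumption.

```python
VARIANCE_THRESHOLD = 0.05  # the 5 percent line from the checklist

# Hypothetical data: rack -> {starting U-position: asset tag}.
dcim_records = {
    "R01": {1: "pdu-a", 10: "srv-101", 12: "srv-102"},
    "R02": {5: "sw-01", 20: "srv-201"},
}
walked = {
    "R01": {1: "pdu-a", 10: "srv-101", 14: "srv-102"},  # srv-102 found at U14
    "R02": {5: "sw-01", 20: "srv-201"},
}


def u_position_variance(dcim, observed):
    """Fraction of U-positions, occupied in either record, that disagree.

    A moved asset counts twice: once for the position the DCIM claims
    and once for the position where it was actually found.
    """
    mismatches = 0
    checked = 0
    for rack in set(dcim) | set(observed):
        dcim_rack = dcim.get(rack, {})
        seen_rack = observed.get(rack, {})
        for u in set(dcim_rack) | set(seen_rack):
            checked += 1
            if dcim_rack.get(u) != seen_rack.get(u):
                mismatches += 1
    return mismatches / checked if checked else 0.0


variance = u_position_variance(dcim_records, walked)
is_finding = variance > VARIANCE_THRESHOLD
print(f"U-position variance: {variance:.1%}, finding: {is_finding}")
```

With a sample this small, a single misplaced server pushes the variance well past the threshold. That sensitivity is intended: five racks is a spot check, and a miss on any of them justifies a wider audit.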

4. Vendor structure and commercial leakage

The vendor ecosystem under an operations contract is where most post-close cost surprises hide. Look for:

  • MSAs and statements of work for every recurring vendor. Auto-renewal terms, termination clauses, price-escalation language.
  • Specialty engineering pricing structure. How is outside licensed work billed back? What's the operator's margin posture, and is it disclosed? Bring this into the room before, not after, the deal closes.
  • Sole-source dependencies. The single OEM that the cooling plant depends on. The single connectivity provider with a 10-year contract. The single specialty contractor on speed dial.
  • Insurance, bonding, and indemnification posture across the vendor stack.

5. Compliance posture

Compliance work either compounds or accumulates as deferred maintenance. Validate:

  • Active certifications. SOC 2 Type II (most common), ISO 27001, PCI, HIPAA, HITRUST. Get the actual report with auditor opinion. Read the management responses.
  • Findings status. Open exceptions, in-progress remediations, repeat findings.
  • Evidence collection cadence. Is it a quarterly fire drill or continuous?
  • Tenant compliance enablement. Are auditors working through the facility multiple times per year because tenants are hitting their own audit cycles? Is that supported well or grudgingly?

6. Tenant management and retention signals

Tenant retention is an operating outcome. Look at:

  • Tenant tenure distribution. Long-tenure tenants are operational assets. Recent churn is a flag.
  • QBR cadence and content. Are quarterly business reviews actually happening? With what reporting?
  • NPS or analogous tenant satisfaction signal. Trended over 24 months.
  • MAC (move/add/change) velocity. How long from request to delivery for a typical add or change?
  • Tenant-driven escalations. Frequency, root-cause patterns, resolution time.

Three months of tenant-driven escalations tell you more about operating reality than a 30-page CIM section ever will.

7. Capacity and lifecycle reality

Capacity claims in CIMs are aspirational. Operating reality is different. Check:

  • Current sold vs. installed vs. design capacity by power, cooling, and floor space.
  • Power profile by tenant — peak vs. average. Underwriting against nameplate without operational measurement is one of the most common errors in data center acquisition.
  • Cooling headroom under summer-peak conditions, not nominal.
  • Asset lifecycle status. Equipment age distribution, end-of-life hits in the next 5 years, replacement cost reserve.
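The sold, installed, and design figures are easier to challenge once they are expressed as ratios alongside measured load. The sketch below works the power dimension only. The class name, field names, and sample figures are hypothetical, and the measured peak is assumed to come from operational metering (for example, EPMS data) and not from nameplate ratings.

```python
from dataclasses import dataclass


@dataclass
class PowerPosition:
    design_kw: float         # what the facility was designed to support
    installed_kw: float      # what is actually built out and energized
    sold_kw: float           # what is contracted to tenants
    measured_peak_kw: float  # observed peak draw from metering

    def ratios(self):
        """Return the headline ratios a buyer would compare to the CIM."""
        return {
            "installed_vs_design": self.installed_kw / self.design_kw,
            "sold_vs_installed": self.sold_kw / self.installed_kw,
            "peak_vs_sold": self.measured_peak_kw / self.sold_kw,
            "unsold_installed_kw": self.installed_kw - self.sold_kw,
        }


# Illustrative figures only.
site = PowerPosition(
    design_kw=10_000,
    installed_kw=6_000,
    sold_kw=4_800,
    measured_peak_kw=3_000,
)
ratios = site.ratios()
for name, value in ratios.items():
    print(f"{name}: {value:,.3f}")
```

In this example the site is 80 percent sold against installed capacity, but tenants peak at only 62.5 percent of what they have contracted. A model that underwrites against contracted or nameplate power would misstate both the energy cost and the real headroom.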

8. Engineering coordination

In functional organizations, operations and engineering work as one team. In dysfunctional ones, they are split into separate functions that rarely coordinate. Check:

  • Who owns engineering? In-house, outsourced, fractional, none?
  • Engineering documentation status. As-built drawings current? One-line diagrams matched to physical reality? Cooling-plant schematics current after the last upgrade?
  • Capital plan integrity. Is there a 5-year capital plan with engineering work distinguished from like-for-like replacement?
  • AI / heavy-density readiness. Has anyone modeled the implications of supporting 30+ kW per rack on the existing chilled-water plant?

9. Operational outcomes — the numbers under the cover

The operating story is told in numbers the management presentation rarely surfaces:

  • Major incident count, trailing 24 months. Severity distribution.
  • Mean time to detection, mean time to resolution per severity.
  • Preventive maintenance compliance rate.
  • Audit finding density.
  • Tenant ticket volume, response time, resolution time, reopens.
  • Operating cost variance vs. plan.
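Several of these numbers can be rebuilt from a raw incident export without relying on the seller's summary. The sketch below computes mean time to detection and mean time to resolution per severity. The record layout and field names are hypothetical, and MTTR is measured here from detection to resolution, which is only one of the conventions in use. Confirm which definition the operator applies before comparing figures.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical incident export; timestamps are ISO 8601 strings.
incidents = [
    {"severity": "sev1", "started": "2025-01-10T02:00",
     "detected": "2025-01-10T02:10", "resolved": "2025-01-10T04:10"},
    {"severity": "sev1", "started": "2025-03-05T14:00",
     "detected": "2025-03-05T14:30", "resolved": "2025-03-05T15:30"},
    {"severity": "sev2", "started": "2025-02-01T09:00",
     "detected": "2025-02-01T09:05", "resolved": "2025-02-01T09:50"},
]


def minutes_between(start, end):
    """Elapsed minutes between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60


def summarize(records):
    """Return {severity: {"mttd_min": ..., "mttr_min": ...}}.

    MTTD is start to detection. MTTR is detection to resolution
    (an assumed convention; some teams measure from start).
    """
    ttd = defaultdict(list)
    ttr = defaultdict(list)
    for rec in records:
        ttd[rec["severity"]].append(minutes_between(rec["started"], rec["detected"]))
        ttr[rec["severity"]].append(minutes_between(rec["detected"], rec["resolved"]))
    return {
        sev: {"mttd_min": mean(ttd[sev]), "mttr_min": mean(ttr[sev])}
        for sev in sorted(ttd)
    }


summary = summarize(incidents)
for sev, stats in summary.items():
    print(f"{sev}: MTTD {stats['mttd_min']:.0f} min, MTTR {stats['mttr_min']:.0f} min")
```

Recomputing from raw records also exposes gaps a summary hides, such as incidents with no detection timestamp or tickets closed without a recorded resolution.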

10. The hand-off — what changes day one

Finally, model the operational hand-off as part of diligence:

  • Which staff are at risk of leaving? What does retention look like at 12 months?
  • Which vendors will need re-papering? Which need re-pricing?
  • What systems will need to be replaced or integrated under your operating model? At what cost?
  • What does the first 90 days of operations look like? See our piece on post-acquisition data center operations stabilization in 90 days.

How CR Technology runs this work

We deliver this checklist as a written assessment with severity-ranked findings, capex implications, and a remediation plan a buyer's operations team can pick from. Engagements are independent — we hold no equipment lines, no referral fees, and no developer GMP. Where licensed specialty engineering is required for a deeper inspection, we coordinate and integrate that work into a single accountable deliverable.

See the Advisory Services page for the full diligence shape, or reach out at info@castlerocktechnology.com to scope a package against a live deal.
