An HP RAID 5 that wouldn't boot after a 'successful' rebuild.

A four-disk array rebuilt to 100%, reported healthy, and then failed to boot into a working system, taking two production virtual machines and the business's databases with it. There was no backup. We imaged every member disk read-only, reconstructed the array away from the controller, and rebuilt both VMs from the parity-corrected data.

Device: HP ProLiant · 4-disk RAID 5

Fault: Rebuilt to 100%, then unbootable

Payload: 2 × VHDX / VMDK virtual machines

Turnaround: 6 days

Outcome: 98.5% recovered

The situation

A small business ran its whole operation on a single HP ProLiant server: two virtual machines on a four-disk RAID 5, one hosting the line-of-business application and its SQL database, the other file and print services. A disk had failed — a normal, survivable event for RAID 5 — and a replacement was fitted. The controller rebuilt onto the new disk and reported the array as 100% healthy. On the next reboot the server would not come up: the hypervisor loaded but neither virtual machine would start, and the vendor's own support line, after a firmware update and a round of diagnostics, could not get it back. With no backup, the only copy of the company's data was locked inside an array that technically said it was fine.

Why a “100%” rebuild can still lose everything

This is one of the most misunderstood failures in storage, and one of the most damaging. RAID 5 keeps a single disk's worth of redundancy: lose one member and the missing data is recalculated on the fly from the surviving disks plus parity. The danger is that a rebuild reads every sector of every remaining disk to regenerate the replacement — and in an array that has been powered for years, a second disk is very often quietly weak. If that disk throws read errors during the rebuild, the controller fills the gaps with mathematically valid but factually wrong data and writes the result to the new member. The array ends up “100% rebuilt” and internally consistent, yet the file systems and virtual-machine containers sitting on top are now peppered with silent corruption. The rebuild didn't fix the array; it overwrote good redundancy with bad.
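The mechanism described above can be shown with a few lines of XOR arithmetic. This is a minimal sketch, not the controller's firmware: the block contents are invented, and it assumes the weak disk returns zero-filled data for the bad block and keeps returning the same bytes afterwards. Real controllers differ in how they handle an unreadable sector during a rebuild.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte (the RAID 5 parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe row on a four-disk RAID 5: three data blocks plus one parity block.
# The contents are illustrative placeholders, all the same length.
d0, d1, d2 = b"INVOICES", b"PAYROLL_", b"SQL-PAGE"
parity = xor_blocks(d0, d1, d2)

# Healthy degraded read: the disk holding d1 has failed, and the three
# surviving blocks recover it exactly.
recovered = xor_blocks(d0, d2, parity)

# Rebuild with a weak survivor: the disk holding d2 cannot read this block and
# (by assumption) zeros are substituted, so the rebuilt d1 is computed from
# wrong input and written to the replacement disk.
d2_bad = bytes(len(d2))
rebuilt_d1 = xor_blocks(d0, d2_bad, parity)

# The row now passes a parity check (XOR of all four blocks is zero) ...
row_is_consistent = xor_blocks(d0, rebuilt_d1, d2_bad, parity) == bytes(len(d0))
# ... even though the rebuilt block no longer matches the original data.
rebuilt_is_correct = rebuilt_d1 == d1
```

Here `recovered` equals `d1`, while `row_is_consistent` is true and `rebuilt_is_correct` is false: the row is internally consistent and wrong at the same time, which is why a parity check on the rebuilt array alone cannot reveal the damage.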

First look — imaging every member read-only

The single most important decision on a job like this is made before any recovery is attempted: nothing is written to the original disks, and the array is never asked to rebuild again. Each of the four members was removed, labelled with its bay position, and cloned individually to a fresh disk through a hardware imager behind a write blocker. Two of the disks read cleanly. The third — the weak member that had corrupted the rebuild — had a band of unstable sectors, so it was imaged with an adaptive strategy: fast passes first to secure the healthy majority, then slow, retimed retries over the difficult zones to pull back the maximum readable data without stressing the drive into total failure. Working only from these clones means the original media is preserved and every later step is reversible.
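The two-pass idea behind adaptive imaging can be sketched in software. This is an illustration only, not how a hardware imager works internally: `FlakyDisk` is a hypothetical stand-in for a failing drive, the chunk size and retry count are arbitrary, and real tools also manage read timeouts, head selection and power cycling, none of which is modelled here.

```python
SECTOR = 512


class FlakyDisk:
    """Hypothetical failing drive: sectors in `weak` fail their first two read
    attempts, and sectors in `dead` never read."""

    def __init__(self, data, weak, dead, fails_before_success=2):
        self.data, self.weak, self.dead = data, weak, dead
        self.fails_before_success = fails_before_success
        self.attempts = {}

    def read(self, first_sector, count):
        for s in range(first_sector, first_sector + count):
            if s in self.dead:
                raise OSError(f"unreadable sector {s}")
            if s in self.weak:
                self.attempts[s] = self.attempts.get(s, 0) + 1
                if self.attempts[s] <= self.fails_before_success:
                    raise OSError(f"unstable sector {s}")
        return self.data[first_sector * SECTOR:(first_sector + count) * SECTOR]


def image_disk(disk, total_sectors, chunk=8, retries=3):
    image = bytearray(total_sectors * SECTOR)  # unread areas stay zero-filled
    deferred = []                              # chunks the fast pass skipped

    # Pass 1: large reads with no retries, to secure the healthy majority.
    for start in range(0, total_sectors, chunk):
        count = min(chunk, total_sectors - start)
        try:
            image[start * SECTOR:(start + count) * SECTOR] = disk.read(start, count)
        except OSError:
            deferred.append((start, count))

    # Pass 2: single-sector reads with a bounded number of retries.
    unreadable = []
    for start, count in deferred:
        for sector in range(start, start + count):
            for _ in range(retries):
                try:
                    image[sector * SECTOR:(sector + 1) * SECTOR] = disk.read(sector, 1)
                    break
                except OSError:
                    continue
            else:
                unreadable.append(sector)      # record the gap for later stages
    return bytes(image), unreadable


# Demo on synthetic data: sector 10 is unstable, sector 20 is dead.
source = bytes((i * 7) % 251 for i in range(32 * SECTOR))
disk = FlakyDisk(source, weak={10}, dead={20})
image, unreadable = image_disk(disk, total_sectors=32)
```

The list of unreadable sectors matters as much as the image: later stages need to know exactly which blocks are gaps rather than data, so they can be recalculated from the other members instead of being trusted.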

Reconstructing the array off the controller

With four images in hand, the array was rebuilt in software rather than on the hardware controller. The RAID parameters an HP controller uses (the stripe, or block, size; left or right parity rotation; synchronous or asynchronous data ordering, which some tools call symmetric and asymmetric; the disk order; and the starting offset) were derived by analysing the raw images: entropy patterns reveal where parity blocks sit, and known file-system structures act as anchors to confirm the stripe map is correct. The corrupted metadata the controller had written was ignored entirely. Because we held images of all four members, parity could be used the way it is meant to be used, to check and correct rather than to be blindly trusted, so the bad data the rebuild had introduced was identified and repaired against the surviving copies.
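As a rough illustration of the entropy approach, the sketch below builds four synthetic member images and then locates the parity block in each stripe row by picking the block with the highest byte entropy. Everything here is an assumption made for the example: the block size is taken as known, the data is artificially low-entropy, and the rotation is a simple left rotation. It says nothing about the layout an HP controller actually uses, and entropy alone shows only where parity sits, not the data ordering, which still needs file-system anchors.

```python
import math
import random
from collections import Counter
from functools import reduce

BLOCK = 4096   # assumed stripe-unit size; on a real job this is itself unknown
DISKS = 4


def entropy(block):
    """Shannon entropy of a block, in bits per byte."""
    total = len(block)
    return -sum(n / total * math.log2(n / total) for n in Counter(block).values())


def xor_blocks(blocks):
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def build_synthetic_members(rows, rng):
    """Synthetic array: parity starts on the last disk and moves left each row."""
    disks = [bytearray() for _ in range(DISKS)]
    alphabets = [b"ABCD", b"0123", b"wxyz"]   # low-entropy stand-ins for real data
    for row in range(rows):
        data = [bytes(rng.choice(alpha) for _ in range(BLOCK)) for alpha in alphabets]
        parity_disk = (DISKS - 1 - row) % DISKS
        remaining = iter(data)
        for d in range(DISKS):
            disks[d] += xor_blocks(data) if d == parity_disk else next(remaining)
    return [bytes(d) for d in disks]


def parity_positions(disks, rows):
    """For each stripe row, return the index of the highest-entropy block."""
    positions = []
    for row in range(rows):
        scores = [entropy(d[row * BLOCK:(row + 1) * BLOCK]) for d in disks]
        positions.append(scores.index(max(scores)))
    return positions


images = build_synthetic_members(rows=8, rng=random.Random(0))
found = parity_positions(images, rows=8)
rotates_left = all(found[r] == (DISKS - 1 - r) % DISKS for r in range(8))
```

The method works because XOR-ing several structured blocks together tends to produce something noisier than any one of them. On compressed or encrypted data every block is already close to maximum entropy and the signal disappears, which is one reason known file-system structures are needed as a cross-check.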

Repairing and remounting the virtual machines

Reconstructing the array exposed the underlying volume and, within it, the two virtual-machine containers. Corruption from the failed rebuild had left both the VHDX and VMDK files with damaged internal structures — broken block-allocation tables and inconsistent headers — so each was repaired at the container level, then its guest file system checked and mended. From there the SQL database was validated page by page, the file-services volume rebuilt, and the virtual machines' configuration reconstructed so they would boot rather than sit at a black screen. A recovered VM that will not start is only half a recovery, so both were brought up in an isolated test environment and confirmed running before anything was handed back.
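A first triage step on a damaged VHDX container is to see which of its fixed-position structures still carry valid signatures. The sketch below relies on the published VHDX layout (an 8-byte `vhdxfile` identifier at offset 0, and two copies of the header at 64 KiB and 128 KiB, each starting with `head` and carrying a sequence number). It is an illustration, not the repair procedure used on this job: it checks signatures only, skips the CRC-32C checksum a real validator would verify, and does not touch the block-allocation table.

```python
import struct

KIB = 1024
FILE_SIGNATURE = b"vhdxfile"            # file type identifier at offset 0
HEADER_SIGNATURE = b"head"              # both header copies start with this
HEADER_OFFSETS = (64 * KIB, 128 * KIB)


def inspect_vhdx(blob: bytes) -> dict:
    """Report which VHDX structures still carry valid signatures.

    Checksums are deliberately not verified in this sketch.
    """
    headers = []
    for offset in HEADER_OFFSETS:
        raw = blob[offset:offset + 16]
        if len(raw) == 16 and raw[:4] == HEADER_SIGNATURE:
            # Layout: signature (4), checksum (4), sequence number (8), little-endian.
            _signature, _checksum, sequence = struct.unpack("<4sIQ", raw)
            headers.append({"offset": offset, "sequence": sequence})
    # Of the surviving copies, the one with the higher sequence number is current.
    current = max(headers, key=lambda h: h["sequence"], default=None)
    return {
        "file_signature_ok": blob[:8] == FILE_SIGNATURE,
        "intact_headers": headers,
        "current_header": current,
    }


# Demo on a synthetic container whose second header copy has been wiped.
blob = bytearray(256 * KIB)
blob[:8] = FILE_SIGNATURE
blob[64 * KIB:64 * KIB + 16] = struct.pack("<4sIQ", HEADER_SIGNATURE, 0, 7)
report = inspect_vhdx(bytes(blob))
```

The format keeps two header copies precisely so that one can be lost. A report like this one, with a single intact header, suggests the container is still addressable and that attention should move to the structures it points at.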

Verifying and returning the data

Recovered data is worth nothing until it is proven to open. The line-of-business application was launched against its restored database, files were opened at random across the file-services volume, and record counts were checked against what the business expected. Around 98.5% of the data came back, including both bootable virtual machines, the SQL database, company applications, and the financial and operational files that ran the business. The small remainder that could not be recovered corresponded exactly to the sectors the rebuild had already overwritten before the server was switched off.
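The random spot check can be approximated with a short script. This is a minimal sketch under stated assumptions: `spot_check` is a hypothetical helper, the demo files are generated on the fly, and reading a file end to end proves only that it is readable, not that its contents are valid. Opening files in the application they belong to remains the real test.

```python
import random
import tempfile
from pathlib import Path


def spot_check(root: Path, sample_size: int, seed: int = 0):
    """Read a random sample of files end to end and report any that fail."""
    files = sorted(p for p in root.rglob("*") if p.is_file())
    sample = random.Random(seed).sample(files, min(sample_size, len(files)))
    failures = []
    for path in sample:
        try:
            with path.open("rb") as handle:
                while handle.read(1 << 20):   # stream in 1 MiB chunks
                    pass
        except OSError as exc:
            failures.append((path, str(exc)))
    return sample, failures


# Demo against a throwaway directory standing in for a recovered volume.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for i in range(5):
        (root / f"file{i}.bin").write_bytes(bytes([i]) * 1000)
    sample, failures = spot_check(root, sample_size=3)
```

Fixing the seed makes the sample repeatable, so the same files can be rechecked after any further repair work.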

Outcome

Everything essential was returned on a fresh drive, with the virtual machines in a state the business could redeploy. We also walked them through why the rebuild had gone the way it did, and what a working backup (a real second copy, off the array) would have changed. The single most useful thing anyone can do when a RAID member fails is resist the urge to rebuild from surviving disks that may themselves be failing: image first, then rebuild from copies.

Tools & techniques on this job

Hardware imager with write blocker · adaptive read for unstable sectors · PC-3000 RAID and Atola Insight for stripe, parity and disk-order reconstruction · container-level VHDX/VMDK repair. All imaging read-only, all work carried out in-house in Belfast.

Facing something similar?

Send it to us for a free, no-obligation diagnostic. We’ll tell you what can be recovered and put a fixed price in writing before any work starts — and on most jobs, if we can’t get your data back, there’s nothing to pay. Post your device in, or drop it to us by appointment.

Start a free diagnostic, or call 028 9002 0144.

Common questions

Can you recover a RAID that failed during or after a rebuild?

Yes — a failed rebuild is one of the most common RAID jobs we see, at any level and on any controller. We image every member disk read-only and reconstruct the array in software from the copies, so a botched rebuild on the hardware controller doesn't stand in the way. Send us every disk, including the one that originally failed.

Why did rebuilding the array make things worse?

A rebuild reads every sector of every remaining disk. If a second disk is quietly weak — common in an array that has run for years — its read errors get filled in with incorrect data and written to the new disk, so the array reports success while the data on top is corrupted. That's why imaging before rebuilding matters so much.

My array has failed — should I try to rebuild it again?

No. Stop, and don't let it rebuild or re-initialise. Power the system down, remove the disks, label each with its bay order, and get them to us. A free diagnostic will tell you exactly what's recoverable before any chargeable work — and on most jobs it's no fix, no fee.


Free written diagnostic in 24 hours