The Identity Crisis in Microsoft Purview eDiscovery: Why You’re Likely Under-Collecting


If you’ve spent any time managing eDiscovery in a large organisation, you’ll know that an investigator is only as good as the information they have at the start of a case. In Microsoft Purview, there is a growing challenge that I’ve started calling the "Identity Crisis."

The problem is simple: Purview is heavily reliant on what is currently present in Microsoft Entra ID (formerly Azure AD). While this works fine for active employees who have never changed roles or names, it creates a significant risk of under-collection for everyone else.

1. The "Ghost" Custodian: Employees Who Have Left

When an investigator needs to search the mailbox of someone who left the organisation two years ago, they hit an immediate roadblock. KQL searches require exact matches for identifiers like SMTP addresses or UPNs to pull data accurately from the index.

However, once an employee leaves, their Entra ID record is often stripped back or deleted entirely. Finding a comprehensive list of every SMTP address that person used during their tenure becomes a manual, often impossible, task. If you don't have those historical identifiers to include in your from:, recipients:, or participants: KQL queries, the emails stamped with them will never be returned, even though they are still sitting in the index.
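
As a minimal sketch of what an identity-aware query looks like, the Python helper below joins every known address for a custodian into one OR-grouped KQL clause. The function name and the example addresses are hypothetical. It uses the participants property, which covers the From, To, Cc, and Bcc fields of a message; check that this suits your matter before relying on it.

```python
def build_participants_query(addresses):
    """Return a KQL clause that matches any of the given SMTP addresses."""
    # De-duplicate case-insensitively while keeping the original order.
    seen = set()
    unique = []
    for address in addresses:
        key = address.strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    if not unique:
        raise ValueError("at least one address is required")
    # Quote each address so it is treated as a single phrase.
    clauses = [f'participants:"{a}"' for a in unique]
    return "(" + " OR ".join(clauses) + ")"


# Hypothetical leaver with a final primary address and an older alias.
print(build_participants_query(["j.smith@contoso.com", "jane.smith@contoso.com"]))
```

The query is only as complete as the address list passed in, which is exactly where leavers cause trouble.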

2. The Name Change Trap

This issue isn't limited to leavers. We see it constantly with active employees who have changed their names (after marriage, for example). When a name change occurs, the UPN, primary email address, and display name are typically all updated.

Microsoft Purview has a "Recipient Expansion" feature designed to help, but it is notoriously focused on the "here and now." It often only returns the current identifiers. If you search using the current identity, the index may not map that back to items stamped with the previous identity from three years ago. The result? You only collect half the story, and often, you won't even realise the earlier data is missing until the case is well underway.

The Risk: Defensibility and Under-Collection

In a legal context, this is a major red flag. If you are relying on KQL to be inclusive, but your query only looks for the custodian’s current "identity," you are effectively self-selecting a subset of the data. This isn't just a technical glitch; it’s a defensibility risk.

The Solutions: How to Solve the Identity Crisis

So, how do we fix this? There are two main paths, depending on your setup and your budget.

Solution A: Build a Bespoke Identity Tool

If KQL is your preferred method of searching, you need to be better informed before you even open Purview. The most effective way to do this is to build a custom tool or dashboard that pulls historical identity information from your HR systems, or takes a feed from Active Directory, to capture identifiers as they change over time.

This tool should provide investigators with a "Full Identity Map" for any custodian, including:

  • Every SMTP address ever assigned to them.

  • Previous UPNs and Display Names.

  • Tenure dates.

By surfacing this information up front, investigators can build KQL queries that include every identifier the custodian has ever used, ensuring the index returns a complete set of results.
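
One way to picture such a tool is as a fold over a feed of identity snapshots. The sketch below assumes a hypothetical feed in which each record carries an employee ID, an effective date, and the UPN, display name, and SMTP addresses in force on that date. The field names and the IdentityMap class are illustrative, not taken from any Microsoft or HR system API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class IdentityMap:
    """Every identifier recorded for one custodian, plus tenure dates."""
    employee_id: str
    smtp_addresses: set = field(default_factory=set)
    upns: set = field(default_factory=set)
    display_names: set = field(default_factory=set)
    start_date: Optional[date] = None
    end_date: Optional[date] = None


def build_identity_maps(events):
    """Fold a feed of identity snapshots into one IdentityMap per person."""
    maps = {}
    for event in events:
        emp = event["employee_id"]
        if emp not in maps:
            maps[emp] = IdentityMap(emp)
        entry = maps[emp]
        # Accumulate identifiers; nothing is ever overwritten or dropped.
        entry.smtp_addresses.update(a.lower() for a in event.get("smtp_addresses", []))
        if event.get("upn"):
            entry.upns.add(event["upn"].lower())
        if event.get("display_name"):
            entry.display_names.add(event["display_name"])
        # Tenure: earliest snapshot seen, and a leaving date if the feed has one.
        seen_on = event["effective"]
        if entry.start_date is None or seen_on < entry.start_date:
            entry.start_date = seen_on
        if event.get("left_on"):
            entry.end_date = event["left_on"]
    return maps


# Hypothetical custodian who changed her name in 2022.
feed = [
    {"employee_id": "E1001", "effective": date(2019, 3, 4),
     "upn": "jane.doe@contoso.com", "display_name": "Jane Doe",
     "smtp_addresses": ["jane.doe@contoso.com"]},
    {"employee_id": "E1001", "effective": date(2022, 6, 1),
     "upn": "jane.smith@contoso.com", "display_name": "Jane Smith",
     "smtp_addresses": ["jane.smith@contoso.com", "jane.doe@contoso.com"]},
]
custodian = build_identity_maps(feed)["E1001"]
print(sorted(custodian.smtp_addresses))
```

Keying the map on a stable employee ID rather than a UPN or email address is the important design choice here, because the ID is the one value that survives a name change.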

Solution B: The "Premium" Approach (E5/Review Sets)

If you have eDiscovery Premium (E5), there is a much more robust way to handle this: stop using identity-based KQL at the search stage.

Instead of trying to match specific identifiers up front, you should target the custodian’s mailbox directly as a data source. Run a broad search—using only date ranges or specific keywords—and move those results directly into a Review Set.

By targeting the mailbox (the container) rather than trying to match the person (the identity) via KQL, you bypass the identity crisis entirely. The Review Set will pull in all data that matches your keywords or dates, regardless of which version of the custodian's name or email address was stamped on the metadata at the time the email was sent.
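
To make the contrast concrete, the sketch below builds a query that contains only a date range and optional keywords, with no sender or recipient terms at all. The function name and the keyword are hypothetical. The mailbox itself is chosen as the data source in Purview, not in the query, and the date clause assumes the sent and received properties with ISO-formatted dates.

```python
from datetime import date


def build_broad_query(start, end, keywords=()):
    """Return a KQL string limited by date range and optional keywords only."""
    if start > end:
        raise ValueError("start must not be after end")
    s, e = start.isoformat(), end.isoformat()
    # Cover both directions of mail flow; no identity terms are used.
    date_clause = f"((sent>={s} AND sent<={e}) OR (received>={s} AND received<={e}))"
    # Quote each keyword so multi-word phrases stay together.
    terms = [f'"{k.strip()}"' for k in keywords if k.strip()]
    if not terms:
        return date_clause
    return f"{date_clause} AND ({' OR '.join(terms)})"


# Hypothetical matter: one keyword phrase over a three-year window.
print(build_broad_query(date(2021, 1, 1), date(2023, 12, 31), ["project falcon"]))
```

Because nothing in the query depends on who the custodian was at the time, a name change or a deleted alias cannot cause a miss.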

Summary

We have to accept that Microsoft Purview isn’t a time machine; it’s a reflection of your current Entra ID state. To avoid the trap of under-collection, you either need to arm your investigators with a complete historical map of custodian identities or shift your workflow toward targeted mailbox collections in eDiscovery Premium.

In eDiscovery, what you don't know can hurt you—and usually, it's the identifiers you didn't know existed that cause the most trouble.