In the past few months, we’ve amassed data from hundreds of interviews. When we looked at how the same people performed from interview to interview, we were surprised to find quite a bit of volatility, which in turn made us question how reliable a single interview outcome really is.

[Chart: Performance from interview to interview is pretty volatile.]

The y-axis is the standard deviation of each person’s performance across their interviews, so the higher up you go, the more volatile that person’s interview performance.
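To make the chart concrete, here’s a minimal sketch of the underlying computation in pandas. The interviewee_id and score column names, and the toy data, are assumptions for illustration, not our actual schema.

```python
import pandas as pd

# Hypothetical data: one row per interview, with a numeric rating.
interviews = pd.DataFrame({
    "interviewee_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "score":          [4, 2, 3, 3, 3, 1, 4, 2],
})

# Only people with at least two interviews have a meaningful spread.
per_person = (
    interviews.groupby("interviewee_id")["score"]
    .agg(["count", "std"])
    .query("count >= 2")
)

# The distribution of this "std" column is what the y-axis shows:
# the larger the value, the more volatile that person's performance.
print(per_person["std"].describe())
```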

One thing we’re particularly excited about is tracking interview performance as a function of interview type, as more and more interviewing styles and approaches show up on the platform.

One of our long-term goals is to dig into our data, map the landscape of different interview styles, and make serious, data-driven statements about which types of technical interviews produce the highest signal.
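As a rough illustration of what that analysis might look like, here’s a hypothetical pandas sketch that treats the correlation between an interview’s score and a later outcome as a crude stand-in for signal. The column names, the sample data, and the correlation-as-signal proxy are all assumptions for illustration, not our methodology.

```python
import pandas as pd

# Hypothetical data: each interview tagged with its type, the score given,
# and a later binary outcome (e.g., whether the candidate passed an onsite).
df = pd.DataFrame({
    "interview_type": ["algorithmic", "algorithmic", "algorithmic",
                       "systems", "systems", "systems",
                       "practical", "practical", "practical"],
    "score":          [3, 2, 4, 4, 1, 3, 3, 4, 2],
    "passed_onsite":  [1, 0, 1, 1, 0, 0, 0, 1, 1],
})

# Correlation between interview score and the downstream outcome,
# computed separately for each interview type: a crude proxy for signal.
signal_by_type = (
    df.groupby("interview_type")[["score", "passed_onsite"]]
      .apply(lambda g: g["score"].corr(g["passed_onsite"]))
      .sort_values(ascending=False)
)
print(signal_by_type)
```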

Not only can aggregate performance help correct for an uncharacteristically poor showing, but it can also weed out people who happen to do well in a given interview by chance, or those who, over time, submit to the beast and memorize Cracking the Coding Interview.
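For concreteness, here’s a minimal sketch of what aggregating performance across interviews looks like, again with hypothetical column names and toy data; the specific summary statistics are illustrative choices, not our model.

```python
import pandas as pd

interviews = pd.DataFrame({
    "interviewee_id": [1, 1, 1, 2, 2, 2],
    "score":          [4, 4, 2, 2, 3, 4],
})

# Summarize each person's record rather than relying on a single outcome.
agg = interviews.groupby("interviewee_id")["score"].agg(
    mean_score="mean",
    worst_score="min",
    best_score="max",
    n_interviews="count",
)

# Judging person 1 only on their single score of 2 (or person 2 only on
# their single 4) tells a very different story than their aggregate record.
print(agg)
```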

I know it’s not always practical or possible to gather aggregate performance data in the wild. But at the very least, when a candidate’s performance is borderline, or when it differs wildly from what you’d expect, it might make sense to interview them one more time, perhaps focusing on slightly different material, before making the final call.
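If you wanted to encode that rule of thumb, it might look something like the sketch below. The pass bar, the margins, and the idea of an expected_score are all placeholder assumptions, not a prescription.

```python
def should_reinterview(score: float,
                       expected_score: float,
                       pass_bar: float = 3.0,
                       borderline_margin: float = 0.5,
                       surprise_margin: float = 1.5) -> bool:
    """Return True if the outcome is too ambiguous to decide on one data point."""
    # Borderline: the score lands too close to the bar to call either way.
    borderline = abs(score - pass_bar) <= borderline_margin
    # Surprising: the score is far from what this candidate's background suggested.
    surprising = abs(score - expected_score) >= surprise_margin
    return borderline or surprising

# A 2.75 against a bar of 3.0 is borderline; a 1.5 from someone you expected
# to score around 3.5 is surprising. Both warrant one more interview.
print(should_reinterview(2.75, expected_score=3.0))  # True (borderline)
print(should_reinterview(1.5, expected_score=3.5))   # True (surprising)
print(should_reinterview(4.0, expected_score=3.5))   # False
```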