What you won’t find in this post is a list of data science tools or degree-granting programs, or any mention of brain teasers such as estimating the number of ping-pong balls needed to fill a 747.
These do not make a great data scientist.
On the other hand, if you have a candidate who has the three traits below, regardless of which set of tools they already know, then you can train them in the tools your organisation uses.
Actually, they’ll train themselves.
Trait #1: The ability to ask questions that matter
A question matters if the answer has an impact. Often, this means that it drives an important decision.
For a business, it could be a decision to enter one market versus another. For a non-profit, it could be a decision on which strategies to use to help those in need. For a doctor, it could be a decision on which drugs to use to treat a patient’s cancer.
Trait #1 is all about the ability to direct the natural curiosity that we data scientists have towards goals of significance. That requires first understanding what those goals are, and then defining the questions we ask of the data — that is, the hypotheses — so that the answers will tell us how to achieve our goals.
Trait #2: The ability to determine that the question has been answered, part 1: logical thinking
Every step of the reasoning must be correct: from the data to the analytical result, and from the analytical result to the conclusion about the hypothesis. Part of logical thinking is deductive reasoning. Another part is identifying all of one’s assumptions and validating that each one holds.
Logical thinking requires one additional, vital component: a commitment to intellectual honesty. That means not allowing oneself to bend to one’s desire for a particular outcome.
Trait #3: The ability to determine that the question has been answered, part 2: creative thinking
Data scientists must try vigorously to disprove their own answers.
That means asking questions such as: What are the possible confounding variables? Could the cause and effect be the opposite of what I think, however counter-intuitive that may be? Is it possible that the results are due to two or more effects that happened to combine at this moment in history, not to be repeated? Is my data set representative of the real world?
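As a minimal illustration of the first of these questions, the Python sketch below uses simulated data only; it is a hypothetical example, not drawn from any real project. It generates two variables, x and y, that have no effect on each other but both depend on a hidden confounder z. The naive correlation between x and y comes out at roughly 0.5. After regressing each of them on z and correlating the residuals (a partial correlation), it drops to roughly zero. The variable names and the linear, normally distributed setup are assumptions chosen for clarity, not a general recipe for finding confounders.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def residuals(y, z):
    """Residuals of y after a simple least-squares regression on z."""
    n = len(y)
    mean_y = sum(y) / n
    mean_z = sum(z) / n
    slope = (
        sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
        / sum((zi - mean_z) ** 2 for zi in z)
    )
    intercept = mean_y - slope * mean_z
    return [yi - (intercept + slope * zi) for yi, zi in zip(y, z)]


def simulate(n=10_000, seed=0):
    """Hypothetical simulated data; not from any real study."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # hidden confounder
    x = [zi + rng.gauss(0, 1) for zi in z]   # variable we might wrongly credit as a cause
    y = [zi + rng.gauss(0, 1) for zi in z]   # outcome: depends on z only, never on x
    naive = pearson(x, y)                                  # ignores z
    adjusted = pearson(residuals(x, z), residuals(y, z))   # partial correlation given z
    return naive, adjusted


naive, adjusted = simulate()
print(f"naive correlation:     {naive:.2f}")
print(f"after adjusting for z: {adjusted:.2f}")
```

Adjusting for z is easy here only because the simulation tells us z exists. The hard part of Trait #3 is thinking of z in the first place.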
Trait #3 is a form of creative brainstorming. Of the three traits, it is the one that takes the most work. In my experience, it can lead to several additional investigations, each as large as the original one. Devoting the energy this takes also requires a commitment to intellectual honesty.
Why is it necessary? Because if the question matters, then ensuring that the question is correctly answered matters. Answering the question correctly can mean the difference between millions of dollars of business revenue made or lost, between at-need populations helped or not, and between patients cured or not.
Good news for the data scientist
If you’re a data scientist, and this way of thinking about the job of the data scientist is new to you, don’t be discouraged.
These three traits are cognitive skills that are learned and made into habits of work. They are not a matter of instinct.
This is good news, because however much or little you exercise these traits now, you can improve them through training and through conscious use, until they become habitual, as they need to be.
How to hire the best data scientists
If you hire data scientists, then you know that software, tools, and mathematical skills are important considerations. Assessing these should be part of your interviewing process. However, I have found that the data scientists who embody the three traits are also the ones who can most easily adapt to different sets of tools, techniques, and programming languages.
An easy way to assess job candidates on the three traits is to ask them to briefly describe one or more of their past projects, and then ask them many follow-up questions about whether and how they applied the three traits. What made their project important (Trait 1)? What were the assumptions of their analyses (Trait 2)? How did they check the assumptions they named (Trait 2)? What other explanations did they consider (Trait 3)?
Even if a candidate has had data science experience only in school projects, which are sometimes toy problems, their answer to the first question (Trait 1) can reveal the extent to which they think about the context of a problem as opposed to just carrying out an analysis. Their answers to the remaining questions (Traits 2 and 3) are just as meaningful for school projects as they are for work projects.
The three traits are characteristics of good scientists of any kind. By engaging your candidates in this type of interview discussion, you take part with them in the kind of discussion that scientists have with one another, and that is what helps you determine whether they are good data scientists.
Bonus benefits for you
It turns out that including the above in your interviewing process has two bonuses. First, it replicates the collaboration that occurs within a data science team, and thus gives you a sense of whether the candidate will contribute well to the kind of discussions that you want your team to have.
Second, having these discussions with your candidates reinforces the three traits in you, the hiring manager. In fact, your best candidates will challenge you in these discussions. Hire them!
This article first appeared on Medium on 14 August 2018.