So, you want to build a data science team? Here’s some stuff to think about.
Before long, just like this stock photo, you’ll have a team of weird orange people with big bulbous heads, who can sit around a table looking at an enormous hologram of a simple bar chart.
In this article we will cover:
- Definitions of data science
- The purpose of data science
- How data science teams should integrate into the organisation
- Recruiting for data science
- Team roles
Though Econsultancy is marketing focused, there’s plenty in here to appeal more broadly.
First, an attempt at a definition
It seems trite to say that data science’s applications are broad, but they are. And data science teams come in different forms, within different organisational structures and under different names.
There’s a pretty good Venn diagram developed by Drew Conway which gets to the heart of the ambiguous phrase ‘data science’. Data science sits where hacking skills, maths and statistics knowledge, and substantive expertise overlap.
In Conway’s words, “The difficulty in defining these skills is that the split between substance and methodology is ambiguous, and as such it is unclear how to distinguish among hackers, statisticians, subject matter experts, their overlaps and where data science fits.”
I recommend heading over to Conway’s article to read more of his thoughts. But the basic takeaway for a layman like me is – there’s a hell of a lot to learn and many different skillsets that can be brought to bear on data.
Whilst data science has many grey edges, it’s probably worth including some fairly dry definitions of two common teams – ‘Big data analytics’ teams and ‘data product’ teams. The former look for predictive patterns in data without necessarily having a preconceived notion of what they are looking for, and the latter work to implement automated systems that are data-driven.
Data products – Ben Chamberlain, senior data scientist at ASOS, describes a data product as “an automated system that generates derived information about our customers such as predicting their lifetime value. This information is then used to automatically take actions like sending marketing messages or it gets sent to another team who use it for insight.”
If you don’t have any statistical knowledge and you fancy a challenge, you can read one of Chamberlain’s papers about this very ASOS CLV data product.
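To make the idea of a data product a little more concrete, here is a toy sketch in Python. It is not ASOS’s system or Chamberlain’s model: the `Customer` fields, the `predict_lifetime_value` rule (average order value multiplied by an assumed number of future orders), the 150.0 threshold and the action names are all hypothetical, chosen only to show the shape of “derive information, then act on it automatically”.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    orders: int          # number of orders in the observation window (hypothetical field)
    total_spend: float   # total spend in the same window (hypothetical field)


def predict_lifetime_value(customer: Customer, expected_future_orders: float = 3.0) -> float:
    """Toy stand-in for a real model: average order value times an assumed number of future orders."""
    if customer.orders == 0:
        return 0.0
    average_order_value = customer.total_spend / customer.orders
    return average_order_value * expected_future_orders


def choose_action(predicted_value: float, threshold: float = 150.0) -> str:
    """Turn the derived information into an automated action (action names are made up)."""
    if predicted_value >= threshold:
        return "send_vip_offer"
    return "send_standard_newsletter"


customers = [
    Customer("c1", orders=4, total_spend=320.0),
    Customer("c2", orders=1, total_spend=25.0),
    Customer("c3", orders=0, total_spend=0.0),
]

# Derived information (predicted value) feeds straight into an action per customer.
actions = {c.customer_id: choose_action(predict_lifetime_value(c)) for c in customers}
print(actions)
```

In a real data product the prediction would come from a trained model and the action would be handed to a marketing system or another team, but the loop of data in, derived information out, action taken is the same.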
Big data analytics – IBM gives us a serviceable definition of big data analytics: “..a term applied to data sets whose size or type is beyond the ability of traditional relational databases to capture, manage, and process the data with low-latency.
“..it has one or more of the following characteristics – high volume, high velocity, or high variety. Big data comes from sensors, devices, video/audio, networks, log files, transactional applications, web, and social media – much of it generated in real time and in a very large scale.”
Remember, data science must tackle a problem (duh!)
As I read in a Harvard Business Review article, economist and Harvard professor Theodore Levitt once said that “People don’t want to buy a quarter-inch drill, they want a quarter-inch hole.”
The same applies to data science – the business needs to see a solution. It’s another obvious thing to say, but I’m writing it because new(ish) and complicated disciplines such as cognitive computing can temporarily blind marketers to the fact that normal rules of business apply – what is the problem that needs solving? What data can be brought to bear, and how can the data be used to create the most value?
This is something summed up very nicely with another trusty Venn diagram on a Juice Analytics article. (The intersection of the three circles is where successful data products live.)
Parry Malm, co-founder of Phrasee (email marketing language generation software), takes a pragmatic tone and warns about employing a data science team before you know exactly what you want to achieve.
“The first step,” he says, “is to really, really, really clearly define what problem you’re trying to solve… only then consider whether or not an analytics team or whatever is the right approach. What you DON’T want to do is to hire 10 ‘data scientists’ or something, and then have a huge working capital hit for an undefined outcome, when the money could potentially be spent better somewhere else.”
How data science should interact with the wider org
Before we move on to all the roles in a data science team and the challenges involved in setting one up, it’s worthwhile considering how the team will interact with the rest of the organisation.
Simply parachuting data scientists into a company ignores the differences in culture and skills between marketing and finance teams on the one hand, and these statisticians and programmers on the other.
To get full value out of your data science team you need to consider what peripheral roles and processes are needed.
1) Transparency and a customer service culture
The danger is that data products or big data analytics will either be implemented and deliver no business benefit, or be underutilised or underprioritised by a business that fails to recognise their value.
Writing in Harvard Business Review, various members of McKinsey’s analytics teams say there is a need for data teams to operate in a customer service culture.
“..[think] of the business owners as customers. As any good retailer will tell you, you need to understand your customers to be successful. Have regular meetings with them to understand their needs and get feedback on the performance of the team’s models. Always ask yourself, ‘Who in the business will be helped by my analytics?’ and ‘Do they agree you helped them succeed?’”
Again, this all feels pretty obvious but will be integral to success. Is the business ready to accept suggestions from a data-led team? If not, what education is needed in the first instance or how can stakeholders be more involved in the effort?
2) Data-science communication
Science communication generally is a noble cause. In an article in The Guardian in 2016, Richard Holliman reports that it is an undervalued vocation. He writes that “For too long, research has shown that science communication is seen as a second-class option for academics.”
Holliman continues, adding that though science communication has improved, “There is still work to be done to ensure that excellence rather than acceptability becomes the hallmark of these activities. The introduction of new ways to discuss and publish the outputs from research, and alternative mechanisms for reward and recognition suggest that a shift in this direction is underway.”
I’m going off topic here, but there’s a parallel with how data science teamwork is translated within businesses and to the end consumer. There needs to be a surrounding network of skilful communicators.
These communicators can include:
- Data visualisation specialists – To make outcomes more readable and accessible.
- Data strategists – In a recent interview with Econsultancy, Channel 4’s director of consumer insights Sarah Rose described this role as “the bridging point between the data science team, who work on the models that we put into our products, and the rest of the business.” Their knowledge may include some data science and some industry expertise.
- Campaign experts – With knowledge of tech and marketing (could be a developer).
- T-shaped leaders – The leader of the data science team must absolutely be all about data science; it’s essential that they are an expert in the field. But if you can also find one with business skills, then all the better.
Idrees Kahloon, data journalist at The Economist, says that “Often, the best way to present data is the simplest: people readily understand means, medians and sums. Fancier statistical models appeal to wonks, but are harder to explain to a general audience.”
Of course, Kahloon is talking about data journalism and getting a concept across to general readers, but there’s still plenty of wisdom to be applied to business communication. How do you present data science findings in a way the business can understand?
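As a small illustration of Kahloon’s point, the sketch below computes a sum, a mean and a median with Python’s standard library. The `order_values` figures are invented for the example, not real data.

```python
from statistics import mean, median

# Invented order values (in pounds); one unusually large order is included on purpose.
order_values = [12.0, 15.0, 18.0, 20.0, 250.0]

summary = {
    "total": sum(order_values),
    "mean": mean(order_values),
    "median": median(order_values),
}

# Print each figure to two decimal places, the way it might appear in a report.
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With these made-up numbers the single large order pulls the mean well above the median, which is exactly the kind of thing a business audience can grasp without any talk of models.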
(If you’re a bit bored at this, the halfway point in the article, why not watch David McCandless’s talk on the beauty of data visualisation?)
Recruiting for data science teams
On to the finer detail of how to actually get hold of some of these data science people. It’s worth starting with a reality check from Neil Yager, Chief Scientist at Phrasee. He says, “In general, this is a challenging task and people should manage their expectations up front. This is a relatively new field and demand is high. Therefore, the pool of available talent is rather limited.”
It’s for this reason that the number of vendors offering embedded cognitive computing functionality has skyrocketed over the last couple of years. There are now hundreds that offer some machine learning capability, with martech a particular growth area.
The shortage of expertise means that if you are going ahead with your own in-house team, your first hire and the team leader is particularly important. Yager explains: “..due to high demand and short supply, salaries tend to be at the high end. My recommendation is that the first hire be someone relatively senior and experienced. Don’t be tempted to build a larger team of less experienced people — this will be counter productive in the long run.”
Neil goes on to recommend companies “attend or host local meetup events for big data, data science, AI or machine learning. These are active communities, and the people who attend these events tend to be very engaged.”
However, even if you dive into your local Hadoop meetup, you may not find the person you need straight away. Data science teams often employ people from a variety of analytical or scientific backgrounds, precisely because it’s hard to find somebody with all the skills you need.
Maloy Manna, writing on the Data Science Central blog, says:
“There are actually probably just a handful of the ‘unicorn’ data scientists on the planet, who have superpowers in maths/stats, AI/machine learning, a variety of programming languages, an even wider variety of tools and techniques, and of course are great in understanding business problems and articulating complex models and maths in business-speak.”
Of course, maintaining links with academia is also important (these will probably cross over with meetup groups). Most companies using data science (including the previously mentioned ASOS and Channel 4) will work with PhD students and a university, as well as employing graduates into their first jobs.
Finally, if you want to read how a tech unicorn goes through the recruitment process, Riley Newman, head of analytics at Airbnb, has discussed how they interview their data science candidates over on Quora.
Data science team roles
Here is a selection of roles that may be needed in your data science team. Ultimately, some of these roles may overlap, and you may not need one of each – it depends on what your team wants to achieve.
The team leader must have chops when it comes to data science. Leadership and business skills alone are not enough. Christopher Doyle, who works in pricing and analysis at Aspen Dental, sums this up well:
“A new analytics team absolutely needs a leader who possesses strong mathematical modeling skills. The reason is simple: Mathematical modeling skills are hard to learn and require years of experience working under experts. While data mining and business savvy skills are certainly valuable, these should ultimately be secondary considerations, since they are skills that can be easily learned.”
We have already discussed the data strategist as the bridging point between the data science team and the business. This role may work with campaign experts from the marketing team.
Data strategists may be similar to product managers, and may need to work with front-end developers and UX professionals as part of a wider data product team.
There may also be data analysts involved, much like on a more descriptive analytics team, who do data processing and may also visualise data.
As the Venn diagram of data science suggests, a data scientist should have expertise in both statistics and software development. They will likely be able to use Hadoop or Spark to analyse large datasets and they will be familiar with R or Python. The team leader will be a data scientist.
Data engineers and architects
These roles are about understanding how data is structured in the organisation. That means databases, cloud computing, distributed frameworks like Hadoop and some programming-language expertise.
Data architects capture, organise and centralise data. Engineers then test, maintain and get the data ready for analysis.
Elizabeth Mazenko has done some great research for BetterBuys on what capabilities various members of the data science team typically have, and provides a chart which makes a useful rough guide.
You’ll see many more job titles mentioned in other articles – data hygienists or business solutions architects, for example – but most should correlate with the three or four roles outlined here.
A final thought for marketers
Christopher Doyle, director of market analysis at Aspen Dental, writes:
“Even though the marketing department is our top customer, I prefer keeping them at arm’s length. Everything in the marketing department needs to happen immediately, so keeping some distance between them and the analytics team allows the analysts to manage the workflow more efficiently.”
Though marketers and Agile digital teams may have just got a taste for iterating and innovating, data science can take time. From data cleansing (which could take months) to developing models and implementing products, marketers need to understand the scale of investment (both time and money) required in data science teams.
However, once these teams start to bear fruit, the advantage over the competition can be significant.
A final, final thought
Parry Malm, co-founder of Phrasee:
“Here’s another option: in 1997, I sat next to a guy in a university computer science class who was called Neil, who’s now known as Dr Neil Yager, our chief scientist. So another option is to build a time machine, go back 20 years, and make sure you’re sat next to the right person.”