
Data Science and Analytics Interview with Siddhartha Ghosh

We spoke with Siddhartha Ghosh, an accomplished Chief Data Officer, to get his thoughts on data science, artificial intelligence and how to build a world-class analytics function:

Can you tell us a little about your career so far?

My career is a career of two halves. The first part of my career was spent as an academic, as a scientist building solutions and technology. Parts of my postdoctoral research found commercial application in the form of a startup called Joulo.

The second half of my career has been spent across start-ups, in consulting and in some larger organisations, building and leading data and analytics teams. I’ve worked across a range of sectors including fintech, e-commerce (Glovo), software development tools (Builder.ai) and telecommunications (Inmarsat).


Do you think that data science is perceived to be more important than ever before?

Yes, I think so. One of the challenges with data science is getting demonstrable value from it. Looking at my own career, being able to articulate the value that data science can add is important. That's why you need a senior data professional at the helm of the function: someone who really understands the business's needs and requirements and can translate them for the data science organisation. You need to know the right questions to ask, and whether data science can truly answer them.


How do you build a data science oriented culture within a business?

It starts with defining the business goals and aims. Let’s start with the “why?”, which is “what problems are we solving?” and “why are they important?”. This is why a senior data science leader is important – they can work with other stakeholders to understand their priorities. Once you understand the key priorities you can translate them into mathematical problems that a data science team can solve.

In effect, it's a three-step process. Start with the goal, map the goal into a business problem, and then map that business problem into a mathematical one.


How easy is it to build a modern data infrastructure given all the different tools and technologies available today?

It's gotten a lot easier. Cloud providers like AWS and Google Cloud offer a lot of technology as services that are easy to use and consume. For example, an Amazon Redshift data warehouse often requires relatively little coding before it can be deployed. Building your first data warehouse today is much more straightforward than it was ten years ago.


If you're an established business with legacy, on-premises data infrastructure, how easy is it to migrate to these modern cloud platforms?

Again, it has become a lot easier. When I got into the industry, you needed a team of engineers to manage a migration project. A lot of that has gone away, because you now have tools like Stitch Data that make it easy to transfer data from multiple sources into your data warehouse. It's one tool my teams have adopted heavily in different settings.


How easy is it to hire the right talent into a data science organisation?

There is immense competition for talent in the industry. It is essential to understand the skill sets you need, because candidates can come from academic disciplines that often think about problems and solutions in very different ways. You need to understand what kind of data scientists you really need. Do you need someone very theoretical working on fundamental problems? Or perhaps someone who is more commercial and is good at applying data science to business problems?




How do you ensure that both the business and the data science organisation have a good understanding of each other’s needs and operations?

I think that's primarily the responsibility of the Head of Data Science. Their fundamental role is to understand the needs of the business and to shape a data science organisation that is fit for purpose in that regard. The Head of Data Science should also make sure that their team is always aware of the key objectives of the broader business.


How much impact has AI had on data science over the last few years?

I'm not sure everyone fully understands what AI means. Most "AI" today is in fact "machine learning". It's about making predictions from data and building models that can predict different outcomes and events. In most companies, people are interested in machine learning, not artificial general intelligence, where a machine makes decisions for you in every context.


That’s interesting. For us amateurs out there, is machine learning therefore just analysing things by rote without independent thought, whereas AI would entail a greater degree of decision making and autonomy from the agent?

Yes, that's precisely what it is. The fundamental difference between ML and AI is that an ML model learns from data to make predictions. With AI, the agent or entity making the decisions starts to visualise and generate hypotheses about new futures and new outcomes that you haven't necessarily seen in your data yet. That ability to reason and extrapolate beyond what's known today is a fundamental distinction.


Without wishing to go too deeply into the philosophical side of AI, it does raise an interesting question. If computers are able to reason and extrapolate in the way that you describe, is there a sense in which they are alive like we are?

No, I think one of the key distinctions is that we as humans are able to reason this way in many different situations and contexts. The reasoning of AI today falls within a very limited context. An AI agent might be brilliant at playing chess, but has no idea how to make a cup of tea. We are designing agents that can be effective in limited domains. I don’t think we’ve yet seen an agent that can do effective reasoning across multiple domains.




Many consumers are worried about companies holding too much data on them, and the impact that can have on their privacy and freedom. Will increased regulation be necessary in the future to protect consumers, and what implications might this have for data science as a function?

A lot of regulation is motivated by a desire for greater transparency, helping people understand how an algorithm makes decisions, and for fairness. You need to ensure that any AI you deploy today isn't favouring a particular demographic group.

Data scientists need to think carefully about how they build models that are explainable and transparent, and fair in how they treat different groups or cohorts of people. In the future, regulation around these issues will become much tighter as models continue to proliferate in domains such as credit lending, risk and insurance, where critical decisions rely on predictions.
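One way to make "fair in how they treat different groups" concrete is to compare how often a model makes a positive decision for each group. This check is usually called demographic parity. The Python sketch below is our own illustration, not a method Siddhartha prescribes. The function names, the 1/0 approval encoding and the sample data are all hypothetical, and demographic parity is only one of several competing definitions of fairness.

```python
from collections import defaultdict


def selection_rates(decisions, groups):
    """Return the fraction of positive decisions (e.g. loan approvals) per group.

    Hypothetical helper: decisions are 1 (approved) or 0 (declined), and
    groups holds the cohort label for each decision at the same index.
    """
    if len(decisions) != len(groups):
        raise ValueError("decisions and groups must have the same length")
    totals = defaultdict(int)
    positives = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += int(decision)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(decisions, groups):
    """Gap between the highest and lowest group selection rates (0.0 means parity)."""
    rates = selection_rates(decisions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical model outputs for two cohorts, A and B (1 = approved, 0 = declined)
decisions = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(selection_rates(decisions, groups))                # {'A': 0.75, 'B': 0.25}
print(demographic_parity_difference(decisions, groups))  # 0.5
```

In this made-up sample, group A is approved 75% of the time and group B 25% of the time, a gap of 0.5. A gap like this does not by itself prove that a model is unfair, since the groups may differ in ways the model legitimately uses. It is, however, the kind of signal that should prompt a closer, explainable look at how the model reaches its decisions.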


Given that companies will naturally want to have as much data on their customers as possible to help inform their decision making, is it inevitable that governmental institutions will have to step in and provide regulations around data use, or do you think that industry can regulate itself?

Perhaps we could empower people to have more control over their own data. We should be able to tell companies what data on us they can or cannot use. You could even consider a revenue share agreement where consumers get a share of the revenues that their data creates.


How do you retain data scientists? Why might a data scientist leave a particular business?

I think a data scientist sometimes leaves a business because they don't see the results of their work. They don't see their outputs being adopted, and things get lost in translation. Often there isn't a clear path to production for data science work, or the learnings aren't communicated well internally. Secondly, the data science function needs to be properly aligned with the business. Data science projects take time, and their impact won't always be immediate. You need to invest in data science knowing that it won't always add tremendous value in the short term, although it likely will in the longer term.


From a diversity perspective, how do we encourage more women and other underrepresented groups into data science careers?

It’s a challenge across the technology industry. Representation of women in tech is particularly problematic. The problem often starts at school and university. Not enough women study technical subjects at university, and we must inspire more women to pursue careers in areas like maths and technology.

In industry, women face barriers and challenges too. It's important to have role models, with women and members of other underrepresented groups in leadership positions, and companies need to deliberately over-index in this regard. We also need to make it easier for women to balance their careers with any family or health challenges they have.

