Our innovations begin with the Data Science team. After all, we’re a data-driven business that lives to explore the possibilities of technology, so we’re creative, take risks and use customer insights to develop experiences that engage and delight our customers. Our data experts gather rich insights from our wide customer base and use them to inspire industry-leading features within our apps.
So, what does that mean for our teams? Go time! With access to the best tech stack in the data science, machine learning and data engineering space, they’re always coming up with something new and exciting – it’s all in a day’s work here!
Here’s what members of the Data Science team have to say about the tech stack that the team uses.
Mia O'Dell, GM Data Science
We have a pretty great tech stack in Data Science!
We build many of our models in Databricks using Spark, with data sourced from our growing Data Lake (Parquet and Delta formats) and from our massive Amazon Redshift warehouse.
We also build highly custom solutions combining simulation and ML where necessary.
But we don't want to stop there. We're investing heavily in people, process and tech to lift our game further in ML, with a state-of-the-art platform to empower our Data Scientists as we head toward true MLOps.
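To give a flavour of what “combining simulation and ML” can look like, here is a toy Python sketch: it simulates noisy outcomes of a hypothetical process, then fits a simple least-squares model to the simulated data. All names and numbers are illustrative, not our production code.

```python
import random
import statistics

random.seed(42)

def simulate(n_trials: int, slope: float = 2.0, intercept: float = 1.0):
    """Monte Carlo simulation of a made-up process: y = slope*x + intercept + noise."""
    xs = [random.uniform(0, 10) for _ in range(n_trials)]
    ys = [slope * x + intercept + random.gauss(0, 0.5) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares for a single feature (closed-form solution)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

xs, ys = simulate(1000)
slope, intercept = fit_line(xs, ys)
print(f"estimated slope={slope:.2f}, intercept={intercept:.2f}")
```

With enough simulated trials, the fitted parameters recover the true ones, which is the basic loop behind simulation-driven modelling: simulate, fit, compare, refine.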
Kaushik Lakshman, Head of Data Science
The Data Science team uses a variety of tools to deliver their models.
In the risk and trading space, most models are prototyped in R and productionised by specialised devs in C#. In the other DS teams, most of our workflow is Python code developed in a Databricks environment, because of its ability to add a Spark layer over our AWS data stores.
This work is also typically productionised within other Databricks environments using Jenkins CI/CD pipelines, to then be consumed by downstream processes and systems.
While we are in a lucky position to have multiple models in production, our processes are evolving as the ecosystem of tools matures. In addition to continuing to deliver models for business-critical use cases, our key areas of focus in the coming months are MLOps infrastructure, ML platforms and feature stores.
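As a rough illustration of the feature-store idea, here is a minimal in-memory sketch in Python. The class and feature names are hypothetical; real feature stores add things like versioning, point-in-time lookups and shared storage, but the core idea is the same: features live in one place and models read them through a consistent interface.

```python
from collections import defaultdict

class FeatureStore:
    """Toy feature store: features stored per entity, read back by name."""

    def __init__(self):
        # entity_id -> {feature_name: value}
        self._store = defaultdict(dict)

    def put(self, entity_id, feature_name, value):
        self._store[entity_id][feature_name] = value

    def get_features(self, entity_id, feature_names):
        # Return values in the requested order; None for unknown features.
        row = self._store[entity_id]
        return [row.get(name) for name in feature_names]

store = FeatureStore()
store.put("customer_1", "days_since_signup", 42)
store.put("customer_1", "avg_session_minutes", 7.5)

features = store.get_features("customer_1", ["days_since_signup", "avg_session_minutes"])
print(features)  # -> [42, 7.5]
```

The pay-off is that training pipelines and real-time scoring services can request the same feature vectors from the same place, instead of each re-deriving them.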
Andre Easom, Head of Data Science
Everything we put into production is in the AWS cloud.
For my team, we have a hybrid approach - for some of our projects we use 'traditional' data science tools such as a range of AWS services, Databricks, R and Python.
The models that power Same Game multi-product offerings are all developed in .NET Core, so we also work closely with technology teams that are experts in that domain.
John Hannebery, Data Scientist
We are very lucky to have access to a market-leading tech stack in the data science, ML and data engineering space.
We are fully cloud-based now and work very closely with Databricks and AWS, not just as consumers of their products, but on a collaborative level as well - we’ve had hack days with both companies where we have generated new ideas for the data science, ML and data engineering worlds.
Typically, I use Python (with multiple ML and data analytics libraries), Spark and SQL with Databricks for model development and data exploration, including experiment and model tracking via MLflow.
Our current ML platform uses Jenkins, which we integrate with Databricks on the back end to productionise batch machine learning pipelines; for real-time services (such as our recommender models) we use Docker for containerisation. We are fully on AWS, using a suite of their tools including S3, Glue, CloudFormation, EMR, DynamoDB, EC2 and Kinesis, amongst many others.
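The experiment-tracking pattern that MLflow provides can be sketched in a few lines of plain Python. This is a hypothetical illustration of the concept, not the MLflow API: each run records its parameters and metrics so that experiments can be compared later.

```python
import json
import time

class ExperimentTracker:
    """Toy stand-in for an experiment tracker (not the real MLflow API)."""

    def __init__(self):
        self.runs = []

    def log_run(self, params: dict, metrics: dict) -> dict:
        # Record one training run: when it happened, with what settings,
        # and how it scored.
        run = {"timestamp": time.time(), "params": params, "metrics": metrics}
        self.runs.append(run)
        return run

    def best_run(self, metric: str) -> dict:
        # Highest value of `metric` wins, e.g. accuracy or AUC.
        return max(self.runs, key=lambda r: r["metrics"][metric])

tracker = ExperimentTracker()
tracker.log_run({"max_depth": 3}, {"auc": 0.81})
tracker.log_run({"max_depth": 6}, {"auc": 0.86})
print(json.dumps(tracker.best_run("auc")["params"]))  # -> {"max_depth": 6}
```

MLflow adds the storage, UI and model-registry layers on top of this idea, which is what makes model development reproducible across a whole team rather than one laptop.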
Paul Foerste, Senior Data Scientist
Within the Quants team itself, the bulk of the modelling is performed in R, Python and C#, utilising both standard and complex machine learning methodologies, including neural networks and gradient boosting.
The team set-up is good in that we are given the opportunity to use the software and methodologies we are familiar with, but at the same time, we are encouraged to work on projects that may expose us to software we aren't as familiar with.
Other software in daily use within the team includes Databricks, and version control tools such as GitHub and Stash. We’re also increasingly making use of various AWS services such as Redshift, Athena, SageMaker, S3 and Workspaces.
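For the curious, the gradient boosting mentioned above can be illustrated with a toy Python sketch: each round fits a depth-one "stump" to the residuals of the current ensemble, and the ensemble's prediction is the sum of all the stumps. This is a teaching sketch on invented data, not one of our production models.

```python
import random

def fit_stump(xs, residuals):
    """Find the single split on x that best reduces squared error."""
    best = None
    for split in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, split, lmean, rmean)
    _, split, lmean, rmean = best
    return lambda x: lmean if x <= split else rmean

def gradient_boost(xs, ys, n_rounds=50, lr=0.3):
    """Boosting for regression with squared loss: repeatedly fit residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + lr * sum(s(x) for s in stumps)

random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [x * x for x in xs]  # a nonlinear target the stumps must approximate
model = gradient_boost(xs, ys)
print(round(model(5.0), 1))
```

Libraries like XGBoost and LightGBM industrialise exactly this loop with deeper trees, regularisation and clever split-finding, which is why "gradient-boosted" models show up so often in tabular-data work.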