Deploying data science solutions has changed since the pandemic.
Instead of living in silos, data must now be accessible across teams—and the entire enterprise. As Alex Ratner, Co-founder and CEO of Snorkel AI, recently told McKinsey:
“Many technologies today are highly cross-functional, and to successfully implement and evaluate them, you need cross-functional teams. An AI project that relies on institutionally separated data, IT, infrastructure, data science, subject-matter experts, and business-line teams that don’t work closely together is therefore almost guaranteed to fail.”
And this wasn’t the only lesson learned about data science during the pandemic—just think about how remote team communication went to the top of many organizations’ priority lists in a matter of weeks.
In this post, we share our engineers’ top dos and don’ts from the past few years: what to do, and what to avoid at all costs, if you want to successfully deploy efficient data science solutions.
DOs
Take your time defining the problem.
Problem definition tends to be overlooked by many data teams. That’s because they assume that the less time they spend on “soft” tasks, the more time they will have to code and build the data science project.
This couldn’t be further from the truth. A well-defined and properly scoped problem makes the job easier and more efficient, and reduces the time spent finding a solution. You won’t be able to take the fastest route if you don’t know your destination!
Always start by establishing your users.
Starting a project is never easy—and it only gets worse if there is no clear starting point. So where should your team start?
The best way to start sketching a data science project is by determining its potential users, which leads to the definition of specific use cases for each type of user. This makes development more organized, since the project can be divided into small products (one per use case) that are integrated at the end.
For example, if we take a simplification of Amazon, we could determine use cases by focusing on the user types:
Buyers
- Search for products by name, category, price, etc.
- Add products to the cart
- Purchase all (or some) products from the cart
Sellers
- Offer a product on the platform
- Accept or deny a purchase
- See client shipping information
With these use cases defined, the project can be developed in a much more organized way.
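As a rough illustration, the user types and use cases above could be captured in a simple mapping and then flattened into one work item per use case. This is only a sketch: the names `USE_CASES` and `to_work_items` are hypothetical, and a real project would likely track these items in a planning tool rather than in code.

```python
# Hypothetical sketch: user types and use cases taken from the simplified
# Amazon example above. All names here are illustrative assumptions.
USE_CASES = {
    "Buyers": [
        "Search for products by name, category, price, etc.",
        "Add products to the cart",
        "Purchase all (or some) products from the cart",
    ],
    "Sellers": [
        "Offer a product on the platform",
        "Accept or deny a purchase",
        "See client shipping information",
    ],
}


def to_work_items(use_cases):
    """Flatten the mapping into one (user_type, use_case) pair per small product."""
    return [
        (user_type, use_case)
        for user_type, cases in use_cases.items()
        for use_case in cases
    ]


work_items = to_work_items(USE_CASES)
for user_type, use_case in work_items:
    print(f"[{user_type}] {use_case}")
```

Each pair can then be developed and reviewed as its own small product before everything is integrated.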
DON’Ts
Avoid trying to eat the whole cake in one bite.
Many projects go through a lot of bumps, twists, turns, and crashes due to the desire to finish everything quickly and include every possible feature from the start. This is not recommended, since it can lead to big blockers down the road.
Remember, deploying a solution all at once means that you will receive user feedback on every feature at the same time, which can bury the team in issues and bugs. Focus on a Minimum Viable Product (MVP) first: deploy it, gather feedback, fix issues, and only then add new features.
Do not choose complexity over explainability.
One of the most common errors we see when deploying data science solutions, especially machine learning predictive models, is choosing a very complex approach that performs better but is hard, and at times simply impossible, to explain.
Balancing both traits is difficult, but a simple and understandable solution with good performance should almost always prevail over a “black-box” solution with slightly higher performance. Always choose what is best for the business, not for the data team; trust us, your stakeholders and your future self will thank you!
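One way to make this trade-off explicit is a selection rule that prefers an explainable model whenever its score is within a small tolerance of the best one. The sketch below is an assumption about how a team might encode that rule, not a prescribed method: `Candidate`, `pick_model`, the `tolerance` value, and the scores are all hypothetical, and the scores are illustrative numbers rather than measurements.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    score: float       # validation metric, higher is better (illustrative values)
    explainable: bool  # a judgment made by the team, not something computed here


def pick_model(candidates, tolerance=0.02):
    """Prefer an explainable candidate if it is within `tolerance` of the best score."""
    best = max(candidates, key=lambda c: c.score)
    close_enough = [
        c for c in candidates
        if c.explainable and best.score - c.score <= tolerance
    ]
    if close_enough:
        # Among acceptable explainable models, keep the strongest one.
        return max(close_enough, key=lambda c: c.score)
    # No explainable model is close enough, so fall back to the best performer.
    return best


candidates = [
    Candidate("logistic_regression", 0.86, True),
    Candidate("gradient_boosting", 0.875, False),
]
chosen = pick_model(candidates)
print(chosen.name)
```

The tolerance is a business decision: it states how much performance stakeholders are willing to trade for a model they can understand.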
Moving Forward With Data Science Solutions
So what’s the way forward? MLOps, or machine learning operations. As Rodney Zemmel, senior partner at McKinsey & Company, puts it:
“It isn’t just about clever data science algorithms, but everything from data quality to testing and validation to bias checking and security. MLOps provides the opportunity to create an operating system of people and technology that can make the process of creating each new application dramatically easier, helping to enable the whole business.”
To sum it all up, don’t waste time struggling to deploy data science solutions, and don’t just guess. Instead, get experts on board; that way, you’ll see results in no time.
Book a meeting with Factored today to find out how we can help you start deploying data science solutions right away.