Advancing the Data Science Industry: Factored Gives Back

02.12.2021

Factored Data Experts

At Factored, we pride ourselves on our high-caliber data engineers, data analysts, and machine learning engineers, who in turn pride themselves on their high-quality work. We also value giving back and contributing to our wider community. This is why we launched Factored Gives Back, an initiative where Factors have the opportunity to share their time and knowledge with others in a bid to advance the data science, AI and wider technology ecosystem in Latin America.

One such initiative in 2021 was our work with Data Science Fem, a group of women based in Colombia who are dedicated to studying data science with the aim of contributing to the industry, enhancing their careers and discovering ways to improve society thanks to data science and its various applications and implementations.

Our work with Data Science Fem consisted of three 3-hour Saturday morning sessions. These sessions gave overviews of crucial elements of data science and, most importantly, insight into its practical applications and the tools needed to build a solid career in data science and machine learning. Our analysts and engineers were not only teachers but also willing mentors, available to answer any questions Data Science Fem members wanted to raise with people actively working in the industry.

Session 1: Data Exploration and Visualization

The first of the Saturday sessions covered the basics of data exploration and visualization, key to building any data science career.

Specific topics included how to manage Python environments, how to manage data using pandas and scikit-learn, how to use datasets stored in the cloud, and best practices for data processing with pandas.

In terms of data visualization, our data engineers and analysts discussed how to visualize data with matplotlib and seaborn, gave insight into the different chart types, and showed how to manage plot formatting using axes and figures.
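To make the figure and axes idea concrete, here is a minimal sketch using matplotlib. It is not the material taught in the session: the data values, variable names, and output filename are made up for illustration. It shows one figure holding two axes, each drawing the same data as a different chart type.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt

# Made-up values, purely for illustration.
months = ["Jan", "Feb", "Mar", "Apr"]
signups = [12, 18, 9, 24]

# One Figure (the whole canvas) holding two Axes (individual plots) side by side.
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

# Each Axes is formatted independently: title, labels, chart type.
ax_line.plot(months, signups, marker="o")
ax_line.set_title("Line chart")
ax_line.set_ylabel("Sign-ups")

ax_bar.bar(months, signups)
ax_bar.set_title("Bar chart")

# Figure-level settings apply to the canvas as a whole.
fig.suptitle("Same data, two chart types")
fig.tight_layout()
fig.savefig("signups.png", dpi=150)  # hypothetical output filename
```

Working with the `fig` and `ax` objects directly, rather than with global `plt` calls, tends to make it clearer which plot each formatting call affects.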

Finally, our team touched on the ever-important topic of collaborative workflow using GitHub, including how to create a repository, Git commit best practices, and how to submit a pull request. Together, these topics and the discussion around them gave Data Science Fem attendees a solid foundation in data exploration and visualization.

Session 2: Training Machine Learning Models

The second session saw our team dive into the exciting world of machine learning and how to train machine learning models.

First, they covered methods for splitting data into training, validation and test sets. Then they looked at how (and why) to use scikit-learn pipelines for processing data with the .fit() and .transform() methods.
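As an illustration of these two ideas, here is a minimal sketch using scikit-learn. It is not the code from the session: the Iris dataset, the 60/20/20 split, the scaler, and the logistic regression model are all assumptions chosen to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small built-in dataset, used here only for illustration.
X, y = load_iris(return_X_y=True)

# Step 1: hold out 20% of the data as the final test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
# Step 2: split the remainder 75/25, giving 60% train and 20% validation overall.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42
)

# The pipeline chains preprocessing and the model into a single estimator.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("model", LogisticRegression(max_iter=1000)),
    ]
)

# fit() learns the scaling statistics from the training data only,
# then trains the model on the scaled training data.
pipeline.fit(X_train, y_train)

# score() applies the already-fitted scaler before predicting,
# so validation and test data never influence the preprocessing.
val_accuracy = pipeline.score(X_val, y_val)
test_accuracy = pipeline.score(X_test, y_test)
print(f"validation accuracy: {val_accuracy:.2f}, test accuracy: {test_accuracy:.2f}")
```

One reason to prefer a pipeline is that it keeps the preprocessing steps fitted on the training split alone, which helps avoid leaking information from the validation or test sets into the model.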

The session also addressed how to train a machine learning model that surpasses a given performance threshold, along with the necessary subtopics in this area: the API for scikit-learn models, how to manage evaluation metrics, and how to manage processes using MLflow.

At the end of this second session, our team shared their code solutions with the Data Science Fem attendees so they could compare their code and detect any potential errors, all in the name of improving their skills and better preparing them for a promising career in data science.

Session 3: Model Deployment

The third and final session detailed how to actually deploy machine learning models, after having prepared all the data and trained the models.

Our expert engineers and analysts discussed how to write a model’s API using FastAPI, how to generate endpoints, and how to run an application locally and check it with data inputs.

Finally, they outlined how to build a Docker container to deploy the API, how to write a Dockerfile, and how to create and run Docker containers to put the finishing touches on model deployment. With that, our team had covered the foundations of cleaning and preparing data, training machine learning models and, finally, deploying models in real-world contexts. These topics matter not just for a data science education but for understanding how data science is applied in wider society.

How Did the Data Science Sessions Go?

Well, we’re presuming that the data science sessions went pretty well, considering the positive feedback we received from participants.

To be honest, our team was just happy to share their knowledge with an engaged audience and contribute in a small way to the wider data science community in Latin America, but we have to admit that we enjoyed receiving the feedback we did. Here are some of the comments from the Data Science Fem participants:

“Thank you for your time and your patience as teachers who were always willing to respond to all our questions.”

“This training came at the best time for me to help me improve my skills and find a better job.”

“Everything was very challenging, I loved every single presentation!”

Being great at what you do is certainly motivating, but at Factored we believe that giving back to the wider community and data science industry is what truly matters, which is why we’re so proud to have launched our Factored Gives Back initiative! Watch this space for future workshops, sessions and events.

If you’re interested in finding out more about what it’s like to work at Factored and what a compelling career in data analytics, data engineering or machine learning engineering could look like, get in touch today.

Many thanks to all Factored engineers and analysts involved in this initiative, namely Carlos P, Valentina C and Daniela S.