Data is quickly becoming the world’s currency, and businesses are increasingly putting it to use. However, it’s not enough to simply own data. For data to become useful to a business, it needs to be cleaned, organized, and presented in a digestible way for all stakeholders. This need is paving the way for a new type of hybrid employee: the analytics engineer. Here, we’ll take a look at what an analytics engineer does and why this is becoming a vital role in data science.
You might think that once the data is in the data warehouse, you should be able to create machine learning models, deliver insightful business analysis or create dashboards to help make key decisions. But is the data really ready to use? If you’ve felt the pain of an incomplete data catalog, stale data and unreliable business definitions, then you know that the road from data gathering to tangible business outcomes is a long and winding one…
The Data Conundrum
Every company has felt this pain: knowing that relevant data exists in your organization, yet every attempt at using it to deliver business value takes longer than it should. Let’s explore this problem through the following situation: the data engineer has built the systems to collect your business data into the data warehouse.
However, there are thousands of tables, and none of them have documentation. As a result, the data is far from ready to be used by analysts, data scientists, machine learning engineers (MLEs), or business stakeholders. Frankly, you know that your business would be much better off if you stopped for a second to iterate on data curation, and your analyses, dashboards, and machine learning models would also be grateful. This is the basic tenet of data-centric AI, after all.
At this point, you won’t disagree with us when we say that fundamental steps are missing from your data pipeline: data cataloging, data quality assessment, business logic validation, and business-aware transformations.
So, the question then becomes: how can we catalog, test, and transform the data in a reliable, reproducible, and scalable way so that we can bridge that gap? To answer that question, you need a new type of analyst: one who understands how to test, transform, model, and catalog data so that it is ready for MLEs, data scientists, and data analysts to consume.
Why You Might Need an Analytics Engineer
With a combination of business and data expertise, an analytics engineer uses the systems created by the data engineer to design and implement the transformation steps of an ELT pipeline, turning the right data into a format that is ready for consumption. The following skills enable these crucial steps:
- With a correct understanding of the business and the data needs, they can perform basic data validation at the beginning of the pipeline.
- Thanks to their knowledge of good software practices, they write modular data transformations in clean, testable code using a version control system like Git. They also continuously build and test the code following CI/CD best practices.
- They perform the transformations using state-of-the-art data transformation tools, like dbt.
- They ensure that the data is ready to be consumed by writing business-aware tests at the end of the pipeline via frameworks like Great Expectations.
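The validation and testing steps in the list above can be sketched in a few lines of plain Python. This is only an illustration, not how dbt or Great Expectations work internally: the `Check` class, the `run_checks` helper, the order records, and the business rule that discounts never exceed 50% are all hypothetical names and assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    """A named rule that every row is expected to satisfy (hypothetical helper)."""
    name: str
    predicate: Callable[[dict], bool]


def run_checks(rows, checks):
    """Return {check name: [indexes of failing rows]} for checks that failed."""
    failures = {}
    for check in checks:
        bad = [i for i, row in enumerate(rows) if not check.predicate(row)]
        if bad:
            failures[check.name] = bad
    return failures


def transform(rows):
    """Toy transformation: add the amount that remains after the discount."""
    return [
        {**r, "net_amount": round(r["amount"] * (1 - r.get("discount", 0.0)), 2)}
        for r in rows
    ]


# Basic validation at the beginning of the pipeline: structural sanity only.
raw_checks = [
    Check("order_id is present", lambda r: r.get("order_id") is not None),
    Check("amount is non-negative", lambda r: r.get("amount", -1) >= 0),
]

# Business-aware tests at the end of the pipeline (assumed business rules).
model_checks = [
    Check("net_amount never exceeds amount", lambda r: r["net_amount"] <= r["amount"]),
    Check("discount is at most 50%", lambda r: r.get("discount", 0.0) <= 0.5),
]

# Made-up records standing in for a warehouse table.
raw_rows = [
    {"order_id": 1, "amount": 100.0, "discount": 0.1},
    {"order_id": 2, "amount": 40.0},
    {"order_id": None, "amount": 15.0},
    {"order_id": 4, "amount": 80.0, "discount": 0.75},
]

raw_failures = run_checks(raw_rows, raw_checks)
rejected = {i for indexes in raw_failures.values() for i in indexes}
clean_rows = [r for i, r in enumerate(raw_rows) if i not in rejected]

modeled_rows = transform(clean_rows)
model_failures = run_checks(modeled_rows, model_checks)

print("raw failures:", raw_failures)
print("model failures:", model_failures)
```

The point of the split is that the first set of checks needs no business knowledge, while the second set only makes sense to someone who knows what a valid discount looks like. In practice these rules would more likely live in the team's testing framework than in ad hoc scripts.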
The analytics engineer owns the data catalog. Having validated and transformed the data, they are the ideal person to write it. The catalog encompasses field definitions; Service Level Agreements (SLAs) for the freshness of the data; documentation of the data transformation pipelines, covering both the code and the business logic behind it; and the tests that ensure the accuracy of the data.
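To make the idea of a catalog entry concrete, here is a minimal sketch of one table's documentation with field definitions and a freshness SLA. The `CatalogEntry` class, the table name, the column descriptions, and the 24-hour SLA are hypothetical and chosen only for illustration; a real data discovery tool will have its own format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class CatalogEntry:
    """Hypothetical catalog record for a single table."""
    table: str
    description: str
    fields: dict              # column name -> business definition
    freshness_sla: timedelta  # maximum acceptable age of the latest load

    def is_fresh(self, last_loaded_at: datetime, now: Optional[datetime] = None) -> bool:
        """True if the latest load is within the freshness SLA."""
        now = now or datetime.now(timezone.utc)
        return now - last_loaded_at <= self.freshness_sla


orders = CatalogEntry(
    table="analytics.orders",
    description="One row per customer order, after discounts are applied.",
    fields={
        "order_id": "Unique identifier of the order.",
        "amount": "Order value before discounts.",
        "net_amount": "Order value after discounts; used for revenue metrics.",
    },
    freshness_sla=timedelta(hours=24),
)

# Fixed timestamps so the example is reproducible.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
loaded_recently = datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc)    # 6 hours old
loaded_long_ago = datetime(2023, 12, 31, 6, 0, tzinfo=timezone.utc)  # 54 hours old

print(orders.is_fresh(loaded_recently, now=now))
print(orders.is_fresh(loaded_long_ago, now=now))
```

Keeping the SLA next to the field definitions means a consumer can see in one place what a column means and how stale the table is allowed to be.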
They can publish data products so that all data consumers can leverage their work. Usually, the data documentation, data construction pipelines, and tests are made available to the whole company through a company-wide data discovery tool. In turn, this cataloging process helps everyone in the company understand how the data is generated, what it means for the data to be correct, how vital metrics are defined, and which fields can and should be used for a given analysis.
And, when the need arises, this new analyst must broaden their scope and complete the pipeline to deliver relevant and accurate insights to business stakeholders.
What Skills Do You Need to Look For in a Successful Analytics Engineer?
The analytics engineer, by definition, sits at the intersection of multiple disciplines. Therefore, it is crucial that they can talk business just as readily as they can understand what restricts and hinders successful data analysis and model building. In addition, the analytics engineer is a bridge between data engineering and the data consumers in the company. Thus, they are critical to the success of the entire data journey.
If you think that this description sounds like a unicorn, think again. Most likely, there are multiple people in your organization who have taken on these sorts of tasks. They are the ones everyone looks for when embarking on a new data project. They are the repository of data knowledge that glues together several parts of the company.
Most of these professionals have a background that taught them some combination of the following skills:
- They were data analysts in the past. Thus, they understand what it takes to translate business problems into analytics solutions.
- They can perform advanced, modular SQL-based transformations to create data models ready for data consumption, often leveraging new data stack tools like dbt.
- They have enhanced their software skills by following current best practices. They have written SQL-based transformation pipelines in clean, testable code, tracked with version control systems like Git, and continuously tested following CI/CD best practices.
- They have owned and cataloged data products so they can be used by a wider audience. They can build excellent data documentation in company-wide data discovery tools.
The Future of Data
By bringing this new type of analytics and engineering hybrid to the forefront, we hope to shine some light on underappreciated but crucial parts of every company’s data journey. If you’re finding that you could benefit from the skills of an analytics engineer, reach out! At Factored, we have been involved in the data ecosystem for years, and it is at the core of what we do, so we know its pains and gains.
If you feel that you lack any of the skills or capabilities mentioned above, we can help you. Our extensive expertise in analytics, data engineering, and machine learning engineering makes us the ideal partners to help you get started on your analytics engineer journey.
Head here to book a meeting and chat about adding a world-class analytics engineer to your team.
Thanks to Adriana C, David S, Antony H et al. for their work on this initiative.