Data Engineers vs Data Scientists

March 1, 2022

An interesting article on data engineers vs. data scientists from https://gradientflow.com/ is given below. One of the main obstacles to building ML/AI/DL models is the availability of good, clean, relevant, and large datasets. Very few companies have the data required to build advanced deep learning / AI models. Companies that have been collecting all sorts of business data for decades, like Google and Amazon, have a huge advantage, and it is visible in their products, services, and recommendations. One hot topic being debated in the ML/DL/AI field is whether we need more data engineers or more data scientists. There is no simple answer: it depends on the data maturity of the company and on how much business value it wants to derive from building these models. Depending on that, the company will invest in, update, or reuse its technology infrastructure to derive real business benefit from the models. But data is the key: the models will only be as good as the data.

There are ways to generate synthetic data when there is a lack of data, but that is a different topic and will be covered later in detail.

Ratio of Data Scientists to Data Engineers

As companies get more proficient in using data and AI to drive decision making and operations, team members with disparate backgrounds – analysts, product managers, decision makers – begin using data on a regular basis. But when they’re first starting out, the requisite data may not be in place, and data processing and analysis tend to be left to data scientists and data engineers. At that early stage, a data platform that is accessible to less technical users may still be under development.

A **fun** topic of discussion among leaders of data teams is the ratio between the number of data scientists and data engineers. There is no ideal answer. It really depends on the tools and infrastructure you have in place, the maturity and availability of use cases for data and AI, and how exactly you define specific roles and titles. Usage of the title “data scientist” varies widely. Some companies have data scientists who are essentially business analysts capable of running ad hoc queries (SQL) and advanced analytics (using some GUI-based tool). At the other extreme are companies that employ data scientists who routinely write production code and deploy data pipelines and machine learning models. To add to the confusion, some companies even use the same job title – “data scientist” – for different sets of employees who closely resemble the two very different examples I just laid out!

With that said, the ratio of data scientists to data engineers may still be a useful indicator to gauge the level of engagement and maturity of a data team. As a data team grows and its tools improve, data engineers are able to “support” more data scientists, and those data scientists are empowered to do more on their own. In the chart¹ below, smaller data teams (45 members or fewer) have on average about the same number of data scientists and data engineers. As a data team grows in size – a likely sign that a company has invested in better tools and processes – the ratio shifts to about two data scientists per data engineer.
