So You Want To Be A Data Engineer

6 min read
05 November 2020

You want to be a Data Engineer? Great choice: Data Engineers help make machine learning possible. But what does engineering have to do with data? What does a Data Engineer even do?

Before we can answer those questions, we need to figure out what data is.

What is data?

Data is discrete pieces of information, and it's everywhere. Each color, each pixel or dot in a picture, is a piece of data. Together, they form shades and textures and the whole image. Audio is made of data that describes the frequency and volume at each point in time. The ease of capturing and storing digital information has brought a massive explosion in data. According to the World Economic Forum, the amount of data or captured information in the world is in the tens of exabytes.

How big is an exabyte, you ask? The device or computer you're using can probably hold between 4 and 16 gigabytes in memory. Let's say a gigabyte is the size of the Earth. Then an exabyte is the size of the Sun. When we're dealing with this much data, we need specialized ways to manage, manipulate, and work with all of it.
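To put numbers on that comparison, here is a quick back-of-the-envelope calculation. It is only a sketch and assumes decimal (SI) units, where a gigabyte is 10^9 bytes and an exabyte is 10^18 bytes. The 16 GB device is the upper figure mentioned above, and the helper name `devices_needed` is made up for this illustration.

```python
# Assumption: decimal (SI) units. Binary units (GiB, EiB) give slightly different numbers.
GIGABYTE = 10**9   # bytes
EXABYTE = 10**18   # bytes


def devices_needed(total_bytes: int, device_bytes: int) -> int:
    """Return how many devices of a given capacity are needed to hold total_bytes."""
    # Ceiling division, so a partially filled device still counts as one device.
    return -(-total_bytes // device_bytes)


gigabytes_per_exabyte = EXABYTE // GIGABYTE
devices_for_one_exabyte = devices_needed(EXABYTE, 16 * GIGABYTE)

print(f"1 exabyte = {gigabytes_per_exabyte:,} gigabytes")
print(f"16 GB devices needed to hold 1 exabyte: {devices_for_one_exabyte:,}")
```

In other words, a single exabyte is a billion gigabytes, far more than any one machine can hold.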

Data is all around us

Data can inform many types of decisions, and so many companies have changed their business models to be data-driven.

Netflix is a great example, as it uses its vast internal data to improve its service for users. Ever wonder how you always seem to have a great movie or TV recommendation? Each time you rate a TV show or movie, you are telling Netflix a bit about your preferences. With your ratings data, plus ratings data from users like you, Netflix is able to predict how much you'd like a movie or TV show on its service.
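To make the idea of "users like you" concrete, here is a minimal sketch of one textbook approach, user-based collaborative filtering. It is not Netflix's actual method, which the text above does not describe. The users, titles, ratings, and function names are all invented for illustration.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale: user -> {title: rating}.
ratings = {
    "you":   {"Show A": 5, "Show B": 1, "Show C": 4},
    "user1": {"Show A": 5, "Show B": 2, "Show C": 5, "Show D": 5},
    "user2": {"Show A": 1, "Show B": 5, "Show C": 2, "Show D": 1},
}


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity of two users, computed over the titles both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)


def predict_rating(user: str, title: str, all_ratings: dict):
    """Similarity-weighted average of other users' ratings for the title."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, other_ratings in all_ratings.items():
        if other == user or title not in other_ratings:
            continue
        weight = cosine_similarity(all_ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[title]
        total_weight += weight
    if total_weight == 0:
        return None  # nobody comparable has rated this title
    return weighted_sum / total_weight


predicted = predict_rating("you", "Show D", ratings)
print(f"Predicted rating for Show D: {predicted:.2f}")
```

Because "you" rate shows much like "user1" does, that user's opinion of Show D counts for more in the prediction than "user2"'s.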

Implications of cloud-scale data

With large amounts of data, you need large computing systems such as the cloud. Before the cloud, you'd have to figure out how much data you expected to generate over the next few years, draw up hardware requirements, purchase vast arrays of disks, pay for somewhere to house them, set them up, maintain them, deal with disk failures, and on and on.

With cloud computing, you can store data in the cloud by paying a very small fee. You no longer need to focus on data storage and can instead focus on data processing and getting value out of it all. The data is in the cloud. 

Working with cloud data

If data is the new oil, as some people think it is, your job as a Data Engineer is to build pipelines to get the data where it needs to be. In fact, pipeline is the formal name for the series of steps it takes to get data from where it is to where it can be used.

Imagine all the data in the world as oil fields. Your pipeline covers every step, from the original drilling to transport, refining, and finally delivery to your local fuel station, where you can add it to your car to get where you want to go.

In data engineering, this means building a system that extracts data from one or multiple sources, converts it into something we can use, and then makes it available. The system building is where engineering meets data. Delivering data to the cloud is now automated.
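The three steps above (extract, convert, make available) can be sketched in a few lines. This is a toy illustration that uses only Python's standard library, not a production pipeline. The CSV text, the column names, and the in-memory SQLite database are all assumptions standing in for real sources and a real destination.

```python
import csv
import io
import sqlite3

# Hypothetical raw source: CSV text standing in for a file or an API export.
RAW_CSV = """viewer,title,rating
alice,Show A,5
bob,Show A,
alice,Show B,3
"""


def extract(raw_text: str) -> list:
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows: list) -> list:
    """Transform: drop rows with a missing rating and convert ratings to integers."""
    cleaned = []
    for row in rows:
        if not row["rating"]:
            continue  # skip incomplete records
        cleaned.append((row["viewer"], row["title"], int(row["rating"])))
    return cleaned


def load(records: list, connection: sqlite3.Connection) -> None:
    """Load: store the cleaned records where others can query them."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS ratings (viewer TEXT, title TEXT, rating INTEGER)"
    )
    connection.executemany("INSERT INTO ratings VALUES (?, ?, ?)", records)
    connection.commit()


def run_pipeline(raw_text: str, connection: sqlite3.Connection) -> None:
    """Run the three steps in order."""
    load(transform(extract(raw_text)), connection)


connection = sqlite3.connect(":memory:")
run_pipeline(RAW_CSV, connection)
loaded_rows = connection.execute(
    "SELECT viewer, title, rating FROM ratings ORDER BY title"
).fetchall()
print(loaded_rows)
```

Keeping each step in its own function makes it easier to swap a source or destination later without touching the rest of the pipeline.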

I don't know about you, but staring at all the letters and numbers of raw data isn't very appealing to me. Luckily, this is where data meets art: translating this alphabet soup of data into something visual. It means people can now see the importance of the information in an instant.

Data Engineers gather data from all these different sources, which data analysts and other data professionals use to create meaningful representations of it, such as maps, charts, and reports. Beyond just representing the data, data scientists run experiments and comparisons on it, trying to find new questions to ask as well as new avenues of exploration to answer those questions. And now the data makes sense.

Skills

If this sounds and looks like just your thing, what skills do you need?

Database

Databases and file systems will be your friends. Understand how they work. After all, you have to get the data from somewhere, make sense of it, and then store it again.

Process

Once data is needed in a particular format, it becomes very important to be able to deliver that data consistently in the same format into, well, eternity. Creating processes that ensure this is part of the job. With cloud computing, a lot of this work can be automated.
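One simple way to keep a format consistent is to check every record against an agreed schema before it is delivered. The sketch below is one possible approach, not a prescribed one. The field names, the schema, and the `validate_record` helper are hypothetical.

```python
from datetime import date

# Hypothetical agreed format: field name -> required Python type.
EXPECTED_SCHEMA = {"viewer": str, "title": str, "rating": int, "rated_on": date}


def validate_record(record: dict, schema: dict = EXPECTED_SCHEMA) -> list:
    """Return a list of problems; an empty list means the record matches the schema."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"{field} should be {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


good = {"viewer": "alice", "title": "Show A", "rating": 5, "rated_on": date(2020, 11, 5)}
bad = {"viewer": "bob", "title": "Show A", "rating": "5"}

good_problems = validate_record(good)
bad_problems = validate_record(bad)
print(good_problems)
print(bad_problems)
```

Running a check like this automatically on every delivery lets format drift surface as an explicit error instead of a silent surprise downstream.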

Cloud skills

Any modern Data Engineer will work in the cloud, either exclusively or to a large extent. Becoming familiar and comfortable with one of the main cloud providers is critical.

The data tells me that you want to know more about how to become a Data Engineer. A great way to start your journey into the world of big data, compute overload, and fancy graphs is through courses, labs, and other learning resources.

Alex 9.8K