Data Engineering

What skills a Data Engineer really needs

Why data engineering has long since stolen the show from data science in terms of importance and career opportunities, but is itself subject to constant change.

Data Engineer Job Profile

Is Data Scientist the sexiest job of the 21st century? Maybe, because the job has a special appeal, not least because it sits at the interface between technology and domain expertise. But the spotlight of the coming years already belongs to another role in the data value chain, and this is also reflected in salaries.

Many companies are currently on the path to becoming data-driven businesses, that is, to a style of corporate management that bases its decisions on a transparent data foundation and automates operational processes as far as possible using business intelligence, data science, deep learning and RPA. The solution for these tasks is often sought primarily from experts in process automation and data science, but success depends much more on procuring valid data, and thus on a completely different, decisive position in the workflow of data-driven decision processes: the data engineer.

Data Engineer, the most sought-after job of the 21st century?

The job of a data scientist, on the other hand, is more in demand than ever among students and graduates of STEM subjects, as evidenced by the daily rush of graduates from data science-related courses to such job postings. There is also no longer any shortage of international applicants with a focus on statistics and machine learning. A solidly trained Data Scientist who ideally also speaks German is still rare, but overall good candidates are no longer too hard to find. For years, many training options for students and working professionals have also been available online at low cost and quite flexibly, without any compromise in the reputation of these courses.

But what use is a Data Scientist without the data needed for the job? Certainly, every Data Scientist is also responsible for preparing and presenting their own projects. However, acquiring and managing large amounts of data in an enterprise-capable architecture is fundamentally not their focus, and they often lack the authorizations for this in an enterprise IT environment. The need for data acquisition and preparation becomes even more concrete in business intelligence, because this requires fixed structures such as a data warehouse for sustainable reporting.

The Data Engineer's Profile: Big Data High-Tech

Even if data engineering is still somewhat neglected by universities and training providers, the role and the resulting requirement profile of a data engineer are outlined quite clearly on the market. The core deployment scenarios for data engineers (the English term is also accepted in German) are the creation of data warehouse and data lake systems, now primarily on cloud platforms. They develop these systems to tap into internal and external data sources and prepare the data obtained, both structurally and in terms of content, so that it can be used appropriately by other employees in the company.

Enabler for Business Intelligence, Process Mining and Data Science

No data engineer should lose sight of the actual consumer of the data, for whom the data should be merged, cleaned and brought into the target format according to best practice. Classically, data engineers work on data warehousing for business intelligence or for process mining, which requires more and more event logs. A data warehouse is the much larger, underwater part of the business intelligence (BI) iceberg that feeds reports with qualified data. This iceberg analogy can also be applied to data engineering as a whole, which is usually hardly visible to the end users at the upper end of the data food chain, because they only see the finished analyses and not the data stores prepared for them.

Databases are both Source and Target in Data Engineering

Data is rarely available neatly structured in a single CSV file, but comes from one or more databases that are subject to their own rules. Business data, for example from ERP or CRM systems, is stored in relational databases, often from Microsoft, Oracle, SAP or an open source alternative. The cloud-native databases BigQuery from Google, Redshift from Amazon and Synapse from Microsoft, as well as the cloud-independent database Snowflake, are particularly popular at present. These are joined by databases such as PostgreSQL, MariaDB or Microsoft SQL Server as well as Cosmos DB or simpler cloud storage such as Microsoft Azure Blob Storage, Amazon S3 or Google Cloud Storage. Whichever database is the right choice for the business, nothing works in data engineering without SQL and an understanding of normalized data.
Other types of databases, called NoSQL databases, rely on document (file) formats, a columnar layout or a graph orientation. Examples of widespread NoSQL databases are MongoDB, CouchDB, Cassandra or Neo4j. These databases do not exist merely as entertainment for bored nerds, but have very specific areas of application in which they offer the best performance in reading or writing data.
A data engineer must therefore be able to cope with different database systems, some of which are at home on different cloud platforms.
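To make "SQL and an understanding of normalized data" a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are invented purely for illustration and do not come from any real ERP or CRM system. Customers and orders live in separate tables linked by a key, and a join brings them back together for reporting.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two normalized tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
            amount_eur  REAL NOT NULL
        );
        INSERT INTO customers VALUES (1, 'Alpha GmbH'), (2, 'Beta AG');
        INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);
        """
    )
    return conn


def revenue_per_customer(conn: sqlite3.Connection) -> list:
    """Join the normalized tables and aggregate order amounts per customer."""
    query = """
        SELECT c.name, SUM(o.amount_eur) AS revenue_eur
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for name, revenue in revenue_per_customer(build_demo_db()):
        print(name, revenue)
```

Because the customer name is stored only once, a renamed customer needs a single update, while the join reconstructs the flat view that a report expects.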

Data Pipelines are a Key Responsibility

One of the core tasks of the data engineer is the development of ETL pipelines that extract data from sources, transform it into the desired target format and finally load it into the target database. This may sound simple at first, but it becomes a real challenge when many ETL processes combine to form entire ETL chains and networks, which have to run efficiently despite the high frequency of data queries.
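The three ETL stages described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV content, the column names, the cleaning rules and the target table orders_clean are all hypothetical, and an in-memory SQLite database stands in for a real target system.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (invented sample data).
RAW_CSV = """order_id,customer,amount,currency
10, Alpha GmbH ,100.00,EUR
11,alpha gmbh,50.50,EUR
12,Beta AG,,EUR
"""


def extract(raw_text: str) -> list:
    """Extract: read the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows: list) -> list:
    """Transform: normalize customer names, cast types, drop rows without an amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # assumption: rows without an amount are unusable
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().upper(), float(amount))
        )
    return cleaned


def load(conn: sqlite3.Connection, records: list) -> int:
    """Load: write the records to the target table and return its row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders_clean ("
        "order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount_eur REAL NOT NULL)"
    )
    # INSERT OR REPLACE keeps the load idempotent when the pipeline is re-run.
    conn.executemany("INSERT OR REPLACE INTO orders_clean VALUES (?, ?, ?)", records)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM orders_clean").fetchone()[0]


def run_pipeline(raw_text: str, conn: sqlite3.Connection) -> int:
    """Run extract, transform and load in sequence."""
    return load(conn, transform(extract(raw_text)))


if __name__ == "__main__":
    target = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_CSV, target), "rows in target table")
```

Keeping each stage a separate function and making the load step repeatable is one common design choice: once many such pipelines are chained together, a failed run can simply be restarted without duplicating rows in the target.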

Future and salary prospects

Compared to the Data Scientist, who requires a particularly high level of methodological understanding of data analysis, statistics and the subject area being investigated, Data Engineers are more oriented towards tools and platforms. A Data Scientist who has understood Deep Learning can quickly apply that knowledge with both TensorFlow and PyTorch. A Data Engineer, on the other hand, works more intensively with the tools themselves, which evolve much more rapidly over the years. A data engineer specialized in Google Cloud will need additional training should they suddenly have to work on AWS or Azure.
In Germany, a data engineer can expect a gross annual salary of between EUR 45,000 and EUR 55,000 as an entry-level employee with good previous knowledge and initial experience. Companies are happy to reward more than two years of concrete experience in data engineering with salaries between EUR 50,000 and EUR 80,000. As a rule, only Data Architects, who are more likely to be found in large companies and require a particularly high level of experience, are paid more. Other career opportunities for data engineers include consulting careers or management positions.
However, those who have hired a Data Engineer into a permanent position should not feel too secure, because recruiters lurk around every corner of social media for these qualified professionals. Especially in metropolitan areas like Berlin, by no means all companies manage to retain their Data Engineers for years. With so many jobs and challenges to choose from, it's not hard for these data experts to proactively drive their salary increases by changing jobs.
