This is a repost of my feature on the Silectis blog.
With our Magpie platform, you can build a trusted data lake in the cloud as the shared foundation for analytics in your organization. Magpie brings together the tools data engineers and data scientists need to integrate data, explore it, and develop complex analytics.
The importance of data engineering is on the rise, with organizations increasingly investing in talent and infrastructure. Here at Silectis, we are in the fortunate position of working with a wide range of enterprises across multiple industries. I caught up with a few members of the team to take note of some of the data engineering trends we anticipate seeing more of this year and beyond.
1. Scarce skills drive product evolution & unification
There continues to be a serious shortage of data engineering talent. This has been driven by the relatively recent emergence of “data engineering” as an organized discipline, the perception that it is unglamorous relative to its cousin, data science, and the high hurdle new entrants face in becoming productive. The position requires knowledge of cloud services, analytics databases, ETL tools, big data platforms, DevOps, and the fundamentals of the business, all of which make it tough to know where to start.
Technology providers have not been doing enterprises any favors. By building products organized around one thin slice of the tech stack rather than creating a more integrated experience, they have made it harder for team members to level up and harder for newcomers to get going.
Fortunately, this challenge should improve. Over the course of the next year, I anticipate vendors integrating more capabilities into their offerings, either through acquisitions or further development of their platforms. These advancements will reduce the integration burden on customers and make it easier for more users to participate in the data engineering lifecycle. We are already seeing vendors add more value to their stacks by covering more of the end-to-end experience (like data import and data visualization) and trying to serve a broader group of users. Examples include Snowflake’s introduction of Snowsight and Databricks’ introduction of SQL Analytics.
Note that this does not necessarily mean centralization – data may still be stored across a number of systems. However, the way an organization interacts with that data and prepares it for analytics will trend towards a single, dedicated platform. Our product, Magpie, is an example of a platform that was built from the ground up to serve the full end-to-end data engineering workflow.
– Matt Boegner, Data Architect at Silectis
2. Investment in data engineering talent and tooling
Today, many resumes mention “proficiency in data analytics,” and just like “proficiency in word processing,” it means very little. The marketplace is inundated with tools to help people visualize organized data, and board rooms are full of charts that support making big, important decisions. Similarly, organizations have been pursuing attractive (and expensive) ideas in machine learning & AI. Behind the scenes, however, organizations are slow to make concurrent investments in data engineering talent and tooling. Pulling together and prepping all this data remains a growing and largely unaddressed challenge. The technical skills needed to build and maintain data infrastructure are in high demand, and it will be difficult to hire and retain talent without breaking the bank.
The demand for data engineering resources within enterprises is growing, and we expect to see companies expand their investments in hiring and data engineering systems in 2021 and beyond.
– Demetri Kotsikopoulos, CEO of Silectis
3. Drag-and-drop GUI-based ETL tools are useful, but not a panacea
No/low-code approaches, popular for building software and web products, have not reached the same level of adoption in data engineering. Drag-and-drop data tools are great for small companies or for getting an analytics environment off the ground, but organizations are recognizing their limitations. For many, needs may be better met by more formalized “data ops,” with tools for declarative configurations, database versioning, and pipeline development frameworks seeing market traction.
Many of the legacy graphical user interface (GUI) tools like Informatica and Talend are inherently constrained in functionality and viewed as too brittle for production-level applications. Once again, these tools do simplify development in many early-stage scenarios and speed up integrations to external tools like marketing and CRM systems. However, we do not believe that these tools will play as large of a role in 2021 given the wider range of use cases required by many organizations.
– Brendan Freehart, Data Engineer at Silectis
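To make the “data ops” contrast concrete, here is a minimal sketch of what pipelines-as-code looks like compared to a drag-and-drop canvas: each step is a plain function, and the pipeline itself is a declarative list that can be diffed, code-reviewed, and versioned in git. All names and data here are invented for illustration, not any specific vendor’s API.

```python
# Pipelines as code: steps are plain functions, ordering is declarative data.
# Everything below is hypothetical illustration, not a real framework.

def extract(rows):
    """Simulate pulling raw (name, amount) records from a source system."""
    return [{"name": name.strip().title(), "amount": amount} for name, amount in rows]

def validate(records):
    """Drop records that fail a simple quality rule (negative amounts)."""
    return [r for r in records if r["amount"] >= 0]

def load(records):
    """Aggregate into the final table (here, just a dict of totals)."""
    totals = {}
    for r in records:
        totals[r["name"]] = totals.get(r["name"], 0) + r["amount"]
    return totals

# The pipeline definition is a reviewable, versionable artifact.
PIPELINE = [extract, validate, load]

def run(pipeline, data):
    for step in pipeline:
        data = step(data)
    return data

result = run(PIPELINE, [(" alice ", 10), ("bob", -5), ("alice", 7)])
print(result)  # {'Alice': 17}
```

Because the pipeline is ordinary code, each step can be unit-tested in isolation and the whole definition travels through the same CI/CD process as the rest of the codebase, which is exactly the brittleness gap that GUI-only tools struggle with.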
4. The popularity of notebook interfaces continues to rise
Notebook interfaces have been extremely popular in data science for years. In many ways, they are the default way that data scientists explore, collaborate, and display information. Popular open source projects like Jupyter and Apache Zeppelin run notebooks that are preconfigured with Python, Scala, or R packages and increase the speed at which data professionals iterate in their work.
Notebooks will continue to gain traction among data engineers in 2021. We shouldn’t have to configure an IDE for someone to be able to explore a data source, nor should our first chance to visualize data be in business intelligence tools like Looker or Tableau after a data pipeline has been established. Notebooks allow data engineers to mix and match languages as the task requires, while documenting and visualizing intermediate steps along the way.
– Matt Boegner, Data Architect at Silectis
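The kind of quick, intermediate check that notebooks make cheap can be sketched in a couple of cell-sized snippets: profile a column and eyeball a distribution before any pipeline or BI dashboard exists. The dataset and field names below are invented for illustration.

```python
# Invented sample data standing in for a freshly connected source.
orders = [
    {"region": "east", "amount": 120},
    {"region": "west", "amount": 80},
    {"region": "east", "amount": 200},
    {"region": "south", "amount": 50},
]

# "Cell 1": quick summary statistics on a numeric column.
amounts = [o["amount"] for o in orders]
print(f"n={len(amounts)} min={min(amounts)} max={max(amounts)} "
      f"mean={sum(amounts) / len(amounts):.1f}")

# "Cell 2": a crude inline visualization -- record counts per region.
from collections import Counter

for region, n in Counter(o["region"] for o in orders).most_common():
    print(f"{region:<6} {'#' * n}")
```

In a real notebook the second cell would typically be a chart, but the point is the same: exploration and lightweight visualization happen inline, step by step, without waiting for a pipeline or a BI tool.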
5. Data platforms that are unable to scale are non-starters
I’ve talked to many companies that leverage incredible scale in tools like Amazon S3 or Azure Data Lake Storage while at the same time dealing with legacy systems that can’t keep up with demand. While companies do not usually need that scale right away, almost every organization has a horror story about dealing with these kinds of constrained and costly systems.
Especially given the business circumstances over the past year, I think that many business leaders are plotting to save money by finally shutting down their non-performant cloud systems. Replacements have to be able to scale and meet future needs.
– Clay Buckalew, Director of Sales at Silectis
6. Multi-cloud may still be a pain for most organizations
A few years ago, companies tended to set up all of their infrastructure on either AWS or GCP with only a handful of services operating elsewhere. Nowadays, established companies aim to deploy across clouds for several reasons. First, companies must weigh pricing and competitive concerns while avoiding vendor lock-in. A rise in bandwidth costs could kill a streaming service. Similarly, an online retailer may think twice about using AWS when Amazon is a major competitor.
Second, companies optimize for performance and for finding the “right tool for the job”. Examples include companies moving to GCP to more easily handle Google Analytics data or to run TensorFlow on high-powered GPUs, or plugging existing Microsoft systems into Azure’s Active Directory for security reasons.
Finally, there is the reliability and disaster recovery argument. Some organizations aim to have complete redundancy across clouds in an effort to maintain uptime. A few minutes of downtime for financial or eCommerce systems during holiday shopping could cost millions.
However, operating multi-cloud in a truly streamlined way is still really hard. The decision to go multi-cloud usually stems from the concerns above rather than the day-to-day needs of the business. For data-intensive organizations, you have to consider numerous challenges like data consistency and network throughput in addition to standard obstacles like security and reliability. That’s why using platforms that already operate across all major cloud platforms can save a lot of time and energy.
– Jon Lounsbury, VP of Engineering at Silectis
7. Partnerships signal that enterprises seek to gain more market share together
In 2020, we noticed a few unlikely partnerships crop up, signaling strategic alliances between complementary companies in the data space. There are many narrow lanes in the broader data landscape, with technologies offering tooling for a single use case or a single step in the data engineering process. We’re beginning to see more organizations join forces with each other, some through official partnership programs, such as the Databricks and Matillion partnership announcement.
Over the next year, we expect to see additional partnerships arise as companies align to earn more market share as a joint solution for data engineering. Additionally, we expect to see more large players formally announce partner programs so that smaller technologies and tools can add value to larger platforms with integrated capabilities.
– Rishon Roberts, Marketing Manager at Silectis