
Every data engineering team operates on a set of accepted principles, a playbook of “best practices” designed to ensure scalability, governance, and performance.
But with cloud-native systems, explosive data growth, and the rise of AI, a critical question now dominates data leaders’ attention: What happens when the trusted playbook is the very thing holding you back? This tension between established process and modern velocity is something Mayank Bhola, Co-Founder and Head of Products at LambdaTest, says he constantly confronts in his work. He argues that true speed comes not from unchecked freedom, but from smart constraints.
“Empowering engineers is critical,” Bhola explains, “but it’s not about letting a thousand flowers bloom in a thousand different ways. It’s about providing a paved road—a standardized, reliable framework—so they can drive as fast as they want without worrying about building the road itself. This approach trades chaotic autonomy for guided velocity, which is far more scalable and sustainable.”
Bhola’s perspective on balancing freedom with frameworks is a microcosm of a larger challenge in data engineering. Many of the industry’s most respected “rules” were established in a different era, and following them blindly can lead to the kind of hidden architectural and process debt that hinders progress and innovation.
The data hoarder’s playbook
Perhaps the most pervasive “best practice” of the last decade has been the mantra to centralize all possible data, driven by the fear of missing out on some future analytical insight. The guiding principle was to collect everything now and figure out the use cases later, but leaders now warn that this approach creates more chaos than clarity. Srujan Akula, CEO of The Modern Data Company, argues that this philosophy of pipelining without purpose leads directly to complexity and waste.
“A widely accepted practice in data engineering is to pipeline everything. Ingest as much as possible, centralize it in a warehouse or lake, and figure out the use cases later. It sounds scalable, but in reality, it creates unnecessary complexity and very little business value. Pipelines get built without a clear purpose. Logic is duplicated, ownership is unclear, and teams maintain jobs no one actually uses. Even after centralization, the data often isn’t trusted or usable because context is missing and semantics are inconsistent. What I’ve seen work better is starting with intent. Define the decision, product, or metric you’re trying to support, then build backward. This keeps pipelines focused, logic clean, and outcomes aligned with what the business actually needs.”
This makes a strong case for flipping the traditional model on its head. Building pipelines without a clear purpose is a primary reason data platforms grow slow, wasteful, and hard to trust. By defining the decision or metric to be supported before building any pipeline, teams can ensure their engineering efforts are focused, purposeful, and directly aligned with measurable business value.
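As a rough illustration, here is a minimal sketch in Python (with hypothetical metric and source names) of what building backward from a declared outcome can look like: the metric and the decision it supports are defined first, and the pipeline is only permitted to ingest sources that the metric actually requires.

```python
from dataclasses import dataclass, field

@dataclass
class MetricSpec:
    """Declares the business outcome a pipeline exists to serve."""
    name: str                      # e.g. "weekly_active_customers"
    decision_supported: str        # who uses it, and for what decision
    required_sources: list[str]    # only these sources justify ingestion
    owner: str                     # accountable team

@dataclass
class PipelineSpec:
    metric: MetricSpec
    sources: list[str] = field(default_factory=list)

    def add_source(self, source: str) -> None:
        # Refuse to ingest data that no declared metric needs.
        if source not in self.metric.required_sources:
            raise ValueError(
                f"{source!r} is not required by metric {self.metric.name!r}; "
                "declare the use case before pipelining it."
            )
        self.sources.append(source)

# Usage: the metric comes first, and the pipeline is derived from it.
wac = MetricSpec(
    name="weekly_active_customers",
    decision_supported="Growth team: prioritize retention campaigns",
    required_sources=["app_events", "crm_accounts"],
    owner="growth-data",
)
pipeline = PipelineSpec(metric=wac)
pipeline.add_source("app_events")       # allowed: the metric needs it
# pipeline.add_source("clickstream_raw")  # raises: no declared use case
```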
This “collect now, model later” approach is also challenged by Angshuman Rudra, Director of Product at TapClicks, who sees the popular “schema-on-read” mindset as a way of avoiding necessary architectural conversations.
“The schema-on-read (push all your data to a lake) mindset gained traction when data lakes became popular around 2011. More recently, lakehouse architectures have emerged to address some of the data lake’s limitations, adding ACID transactions, schema enforcement, and better support for analytics use cases. But here’s my opinion: Thinking about your data model will never go away, and you can’t skip it. Whether you’re using a data lake or a lakehouse, the idea that you can capture everything now and figure it out later works, until it doesn’t. It simply pushes the harder conversations – around modeling, ownership, and governance – into the future. Without upfront modeling or intentional structure and thinking through use cases (at a high level), data becomes hard to discover, harder to trust, and nearly impossible to govern. Analysts spend more time decoding columns than generating insights. The promise of flexibility ends up delivering confusion and cleanup debt.”
The pragmatic approach is to find a balance, using the flexibility of a lakehouse for raw ingestion while committing to early, incremental data modeling for valuable datasets. Instead of abandoning modeling entirely, leaders should foster a practice of defining ownership and structure as soon as data begins to serve a real business use case.
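One way to put that into practice, sketched below with hypothetical table and field names, is to attach a schema and an accountable owner at the moment a raw dataset is promoted into a curated layer, rejecting records that break the agreed structure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CuratedDataset:
    """Minimal metadata required before raw data serves a business use case."""
    name: str
    owner: str                  # team accountable for the data
    schema: dict[str, type]     # column -> expected Python type

    def validate(self, record: dict) -> dict:
        missing = self.schema.keys() - record.keys()
        if missing:
            raise ValueError(f"{self.name}: missing fields {sorted(missing)}")
        for column, expected in self.schema.items():
            if not isinstance(record[column], expected):
                raise TypeError(
                    f"{self.name}.{column}: expected {expected.__name__}, "
                    f"got {type(record[column]).__name__}"
                )
        return record

# Promoting "orders" out of the raw zone forces the modeling conversation:
orders = CuratedDataset(
    name="orders",
    owner="commerce-analytics",
    schema={"order_id": str, "customer_id": str, "amount_usd": float},
)
orders.validate({"order_id": "o-1", "customer_id": "c-9", "amount_usd": 42.0})
```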
This critique extends to the very goal of centralization itself, particularly when it comes to supporting the dynamic, real-time needs of the business. Ganeshkumar Palanisamy, Principal Software Architect at Reltio, explains how this approach fails when speed and latency are critical.
“A widely accepted but counterproductive practice in data engineering is the heavy reliance on centralized data warehouses for all analytics needs. At Reltio, this approach caused latency and complexity for real-time applications like compliance and fraud detection. Instead, we adopted a federated, cloud-native architecture using tools like Snowflake and Google BigQuery to process data closer to its source. This reduced transaction times by 300% and improved scalability. I advocate for distributed systems over centralized warehouses to better support dynamic, real-time use cases.”
This highlights the need to move beyond a one-size-fits-all architectural strategy, carefully evaluating whether a centralized warehouse can truly meet the latency demands of mission-critical, real-time applications. It points to a larger strategic shift advocated by Amit Saxena, Vice President and General Manager of Workflow Data Fabric at ServiceNow, who believes the true goal is not just aggregating data, but orchestrating it to drive action.
“A common misstep in data modernization is treating centralization as the finish line. It’s not. Aggregating data can provide insight, but orchestration is transformational. Real value is found when data is connected to context, accessible in real time, and able to move across systems, workflows, and decision points, regardless of its source system. If your data just sits in a warehouse without driving action, you’ve built a liability, not a platform.”
Leaders should therefore measure the success of their data platform not by the volume of data it holds, but by its ability to connect and move data across workflows and decision points. The focus must shift from building a static reservoir of data to creating a dynamic system that powers business action.
The autonomy paradox
Just as the “collect everything” mantra has come under scrutiny, so has the practice of how ingestion pipelines are built. A common belief in agile environments is that granting individual engineers full autonomy over their pipelines promotes speed and ownership, allowing them to move quickly without being slowed by central oversight. But without a foundational framework to guide that freedom, this well-intentioned empowerment can quickly turn into chaos, leading to a fragmented and unmaintainable platform. David Forino, Co-Founder and CTO at Quanted, makes a compelling case for a more balanced approach, arguing that complete autonomy at the ingestion stage is a recipe for long-term technical debt.
“Handing data engineers full autonomy over ingestion logic is often framed as best practice, but it fragments the platform fast. When every team builds custom pipelines for evaluating new data, you lose consistency in validation, versioning, and metadata handling. I’ve seen this lead to redundant work, hidden errors, and slow onboarding across teams. What works better is to centralise the ingestion layer, define contracts early, and give teams flexibility at the transformation stage instead. That separation keeps the platform maintainable without slowing experimentation.”
This reinforces the “paved road” philosophy advocated by Bhola of LambdaTest. He adds, “The goal of a central framework isn’t to restrict engineers, but to free them from solving the same problems over and over. When you standardize the complex, low-level plumbing of ingestion—things like observability, credential management, and validation—you empower your teams to focus on the high-value transformation logic that actually serves the business. You give them a faster, safer vehicle, not a list of destinations they aren’t allowed to visit.”
This calls for leaders to establish a clear architectural principle: centralize for consistency, but decentralize for speed. By building a standardized, centrally managed ingestion layer that handles core validation and versioning, you can then safely empower individual teams with the flexibility they need to transform and experiment with the data downstream.
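A minimal sketch of that separation, using hypothetical function and source names: the platform team owns a single ingestion path that enforces the contract, stamps a version, and records metadata, while each product team supplies only its own transformation logic downstream.

```python
import datetime
import hashlib
import json
from typing import Callable

Record = dict
Transform = Callable[[list[Record]], list[Record]]

def ingest(source: str, records: list[Record], contract: dict[str, type]) -> dict:
    """Platform-owned 'paved road': contract validation, versioning, metadata."""
    for record in records:
        for column, expected in contract.items():
            if not isinstance(record.get(column), expected):
                raise ValueError(f"{source}: {column!r} violates the ingestion contract")
    payload = json.dumps(records, sort_keys=True).encode()
    return {
        "source": source,
        "version": hashlib.sha256(payload).hexdigest()[:12],  # content-addressed version
        "ingested_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "records": records,
    }

def run(batch: dict, transform: Transform) -> list[Record]:
    """Team-owned step: free to experiment downstream of the paved road."""
    return transform(batch["records"])

# Usage: every team ingests the same way, then transforms its own way.
batch = ingest(
    source="crm_accounts",
    records=[{"account_id": "a-1", "plan": "pro"}],
    contract={"account_id": str, "plan": str},
)
enriched = run(batch, lambda rows: [{**r, "is_paid": r["plan"] != "free"} for r in rows])
```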
The governance trap
Beyond how data is ingested, the “best practices” for how data is governed are also failing to keep pace with modern realities. Models for data access and structure that were designed for a more static and centralized world are now cracking under the pressure of explosive scale and the rise of artificial intelligence. A prime example of this is Role-Based Access Control (RBAC), a long-standing best practice for managing permissions. Mo Plassnig, Chief Product Officer at Immuta, argues that this model, while effective in the past, is simply not built to handle the scale of today’s data consumers.
“Role-Based Access Control (RBAC) is often treated as a best practice, but it does not scale in a world where agents and automated systems dramatically increase the number of data consumers. Instead of tens of thousands of employees accessing data, you now have hundreds of thousands of identities. Trying to manage that with a traditional RBAC model, by continually adding more roles, leads to role bloat that overwhelms the system and slows down the organization.”
This signals a clear need for leaders to evaluate more dynamic, attribute-based access control systems that can scale beyond manually curated roles. The future of data governance lies in automating policy enforcement based on the attributes of the user, the data, and the context of the request itself.
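The sketch below, with made-up attributes and a deliberately simple policy, illustrates the difference: access is decided by evaluating attributes of the user or agent, the data, and the request context at runtime, so one policy covers any number of identities without minting new roles.

```python
from dataclasses import dataclass

@dataclass
class AccessRequest:
    subject: dict    # attributes of the user or agent (region, domain, is_service)
    resource: dict   # attributes of the data (sensitivity, domain, region)
    context: dict    # attributes of the request (purpose, time, network)

def is_allowed(req: AccessRequest) -> bool:
    """Attribute-based policy: one rule scales to hundreds of thousands of identities."""
    if req.resource["sensitivity"] == "pii":
        # PII requires a declared purpose and matching residency, human or agent alike.
        return (
            req.context.get("purpose") in {"fraud_detection", "compliance_audit"}
            and req.subject.get("region") == req.resource.get("region")
        )
    # Non-sensitive data: allow access within the same business domain.
    return req.subject.get("domain") == req.resource.get("domain")

# The same policy serves an employee and an automated agent without adding a role.
print(is_allowed(AccessRequest(
    subject={"id": "svc-fraud-scorer", "region": "eu", "domain": "payments"},
    resource={"sensitivity": "pii", "region": "eu", "domain": "payments"},
    context={"purpose": "fraud_detection"},
)))  # True
```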
This same pressure to evolve beyond legacy models applies not just to how data is accessed, but also to how it is structured. Jared Peterson, Senior Vice President of Platform Engineering at SAS, takes this a step further, questioning the long-term viability of even the most established data modeling practices in the face of generative AI.
“I’m going to go with common data models. While these can be incredibly valuable, they tend to be similar to the pot of gold at the end of the rainbow. As GenAI gets better and better, will we essentially be able to transform data into a desired state simply by stating our desires or intent in a conversation?”
While common data models still offer value today, leaders should begin preparing for a future where data agility may be prioritized over rigid conformity. The recommendation is to invest in tools and architectures that support dynamic, on-the-fly data transformation, enabling the business to adapt to new requirements without being locked into a single, predefined view of the data.
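Purely as an illustration of where Peterson’s question points, the sketch below imagines “transformation by intent”: the desired output is described in natural language and compiled into a query on the fly. The call_llm function is a placeholder rather than a real client, and the whole flow is speculative, not a recommendation.

```python
# Speculative sketch: the desired shape of the data is stated as intent and
# compiled to a query on demand, instead of being baked into a rigid common
# data model up front.

def call_llm(prompt: str) -> str:
    # Placeholder for whichever model or API an organization chooses to use.
    raise NotImplementedError("wire up your model provider here")

def transform_by_intent(intent: str, table: str, columns: list[str]) -> str:
    prompt = (
        f"Table `{table}` has columns {columns}. "
        f"Write a single SQL query that returns: {intent}. "
        "Return only the SQL."
    )
    return call_llm(prompt)

# Example intent (illustrative only):
# transform_by_intent(
#     intent="monthly revenue per region in EUR for the most recent 12 months",
#     table="orders",
#     columns=["order_id", "region", "amount_usd", "ordered_at"],
# )
```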
The procurement pitfall
The final outdated “best practice” that leaders identified has less to do with technical architecture and more with the business philosophy behind how technology is acquired. There is a deeply ingrained assumption in many large organizations that expensive, proprietary, “enterprise-grade” solutions are inherently superior and represent a safer choice for mission-critical systems. But Alan DeKok, CEO of InkBridge Networks and founder of the open-source FreeRADIUS project, passionately argues that this mindset is not only financially inefficient but also strategically shortsighted.
“Here’s my counterintuitive take: start with commodity solutions that work, not enterprise solutions that impress procurement committees. The biggest mistake I see in data platform modernization is the assumption that expensive equals better. Companies spend $1 million on ‘enterprise-grade’ solutions when commodity hardware running open-source software would outperform them at 1/10th the cost. Your smartphone has more processing power than the enterprise servers that used to run major data centers 20 years ago. The ‘low-end’ solutions of today become the high-end solutions of tomorrow, while the expensive enterprise solutions become legacy tech debt you’ll need to migrate away from in five years. Focus on fundamentals that scale: proper authentication architecture, documented configurations, and solutions with active communities. Don’t chase vendor lock-in disguised as ‘enterprise features.’”
This advises a fundamental shift in procurement strategy, urging leaders to prioritize solutions based on their scalability, community support, and freedom from vendor lock-in, rather than on their price tag or enterprise branding. The most resilient and cost-effective platforms are often built on a foundation of well-supported open-source components and commodity hardware that can be scaled and adapted over time.
From dogma to first principles
The common thread running through all of these challenged “best practices” is a clear rejection of rigid, top-down dogma. Whether it’s the impulse to centralize all data without purpose, the reliance on outdated governance models, or the assumption that expensive enterprise tools are always better, the leaders on our panel are advocating for a fundamental shift. They are calling for a move away from inflexible rules and toward a more adaptable, first-principles approach to building and managing data platforms.
This presents a challenge for every data leader and practitioner to look inward at their own team’s established ways of working. Instead of simply asking “Are we following best practices?”, the more powerful question becomes “Which of our unquestioned rules and default assumptions are truly serving our goals, and which are creating the very friction we are trying to eliminate?”
Challenging old habits and dismantling ineffective practices clears the necessary ground for building something better, but it is only the first step.