AI Is Coming for the Traditional Data Analyst. Here's What to Do About It.
Two articles dropped this week that solidified my decision to upskill aggressively and become irreplaceable in this industry: dbt Labs' 2026 Semantic Layer vs. Text-to-SQL benchmark and Anthropic's Project Glasswing announcement. Read together, they paint a picture that every data professional needs to see.
TL;DR: AI now matches or beats most analysts at writing SQL queries. With a semantic layer, accuracy hits 98 to 100%. Meanwhile, Anthropic's unreleased Mythos model is finding vulnerabilities that survived over 20 years of human review. AI capabilities are doubling every six months. The traditional “take a question, write a query, send back a chart” analyst role is being automated. The path forward is going deeper: data modeling, platform engineering, compliance, infrastructure as code, and domain expertise. I'm upskilling aggressively and you should be too.
I started my career as a Sales Operations Intern and Data Analyst in 2019 and worked my way up to Senior Data Engineer. The good thing about working at startups is you learn quickly, and if you listen more than you talk, you learn a lot. Pair that with my own time spent studying AI, full stack development, and AI infrastructure, and I've had a front row seat to how fast this industry is moving.
I wrote some janky queries. My coworkers wrote some questionable queries. I've spent more hours than I'd like to admit cleaning up queries that technically ran but produced numbers no one should have trusted. And the process was slow. Scoping the question, planning the approach, writing the SQL, validating the results, formatting the output, second-guessing whether a join was right. A single “quick question” from a stakeholder could eat half a day.
If I were starting that same role today, half of what I did would already be automated. And unlike the queries I was writing in 2019, the automated version would probably be more consistent.
The benchmark that should make every analyst pay attention
dbt Labs just published their 2026 Semantic Layer vs. Text-to-SQL benchmark. The headline: LLMs now hit 90% accuracy writing analytical SQL queries against well-modeled data. Pair that with a semantic layer and you're looking at 98 to 100% accuracy. Deterministic, reproducible, no variation between runs.
Compare that to the reality I lived. Three analysts writing the same metric would get three different numbers depending on how they interpreted the join logic, which filter they applied, or whether they remembered that one edge case with the legacy billing system. We validated results by gut feel and spot checks. The AI doesn't have that problem. It doesn't get tired. It doesn't forget the edge case. And it doesn't need 45 minutes to plan an approach. It generates the query in seconds.
The accuracy gap between text-to-SQL and the semantic layer approach isn't even about the AI being smarter. It's about the data being better organized. When the dbt team added just three additional models on top of the raw tables, accuracy jumped from decent to near-perfect. The bottleneck was never the person writing the query. It was the data platform underneath.
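To make the semantic layer idea concrete, here is a minimal sketch in Python. It is not dbt's API or the benchmark's setup; the `Metric` class, the `METRICS` registry, the `compile_metric` function, and every table and column name are hypothetical. What it illustrates is that when the business definition lives in one place, every request for that metric compiles to the same SQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A hypothetical metric definition; not dbt's actual schema."""
    name: str
    table: str
    expression: str                 # aggregate SQL expression
    filters: tuple[str, ...] = ()   # business rules baked into the metric


# Illustrative registry: the business logic lives here once, not in each query.
METRICS = {
    "active_customers": Metric(
        name="active_customers",
        table="customers",
        expression="COUNT(DISTINCT customer_id)",
        filters=("status = 'active'", "is_test_account = 0"),
    ),
}


def compile_metric(metric_name: str, group_by: tuple[str, ...] = ()) -> str:
    """Render a metric request into SQL. Same inputs always give the same SQL."""
    metric = METRICS[metric_name]
    select_cols = [*group_by, f"{metric.expression} AS {metric.name}"]
    sql = f"SELECT {', '.join(select_cols)} FROM {metric.table}"
    if metric.filters:
        sql += " WHERE " + " AND ".join(metric.filters)
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql


print(compile_metric("active_customers", group_by=("region",)))
```

In a setup like this, the AI picks a metric name and a few dimensions instead of writing joins and filters from scratch, which is one plausible reason the results stop varying between runs.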
And AI isn't slowing down. It's accelerating.
Here's what makes this more urgent than a single benchmark: AI capabilities are roughly doubling every six months. What you saw in 2023 looks primitive compared to 2026. What we have in 2026 will look primitive by 2027.
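If you take the six-month doubling figure at face value (it's a rough estimate, not a measured law), the compounding is easy to sketch. The function name and the default period below are just illustrative.

```python
def capability_multiple(months_elapsed: float, doubling_period_months: float = 6.0) -> float:
    """Growth multiple after `months_elapsed`, assuming a fixed doubling period."""
    return 2 ** (months_elapsed / doubling_period_months)


# Assumption: a steady six-month doubling, which is a rough rule of thumb.
for years in (1, 2, 3):
    print(f"{years} year(s): {capability_multiple(12 * years):.0f}x")
```

Under that assumption, three years of doubling every six months compounds to 64x, which is why 2023 feels so far away from 2026.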
If you want proof that we're on an exponential curve, look at what Anthropic just announced with Project Glasswing. Claude Mythos Preview, a frontier model that hasn't even been publicly released yet, found thousands of zero-day vulnerabilities across every major operating system and web browser. It discovered a vulnerability in OpenBSD that had survived 27 years of human security review. It found a 16-year-old flaw in FFmpeg in a line of code that automated testing had hit five million times without catching it. It autonomously chained together multiple Linux kernel vulnerabilities to escalate from user access to full system control.
This isn't a model that's marginally better at pattern matching. This is a model that reasons about complex systems more thoroughly than teams of expert humans who've been staring at the same code for years. Anthropic brought together AWS, Apple, Google, Microsoft, NVIDIA, CrowdStrike, and JPMorganChase to deploy it defensively because the capability is that significant.
Now map that trajectory onto data analytics. If AI can autonomously find vulnerabilities that survived 20+ years of expert review and millions of automated tests, how long before it can handle the analytical complexity of your company's data warehouse? The answer is: it mostly already can. And the next generation will close whatever gaps remain.
We don't know exactly when Mythos-class models will be generally available, but Anthropic has said they plan to launch safeguards with an upcoming Opus model to pave the way. The capability exists. It's a matter of when, not if, it reaches the broader market.
The job that's disappearing
The traditional data analyst workflow looks like this: someone from marketing Slacks you a question, you spend 20 minutes understanding what they're actually asking, you spend another 30 minutes writing and debugging a query, you validate the results against something that looks reasonable, you format it into a chart or a table, and you send it back. Rinse and repeat 15 times a week.
I lived that loop for years. And I'll be honest, the quality was inconsistent. Not because anyone was bad at their job, but because humans writing ad hoc SQL against complex schemas are going to make mistakes. We join on the wrong key. We forget a filter. We interpret “active customer” differently than the person who wrote the model upstream. We produce numbers that are close enough to pass a sniff test but aren't actually right.
When a product manager can ask an AI assistant a question and get a correct, sourced, deterministic answer in 8 seconds from a system that never has a bad day, never forgets a join condition, and never misinterprets a metric definition, the value proposition of a human doing the same thing in 45 minutes starts to collapse.
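The “active customer” problem is easy to reproduce. Below is a small, self-contained sketch using Python's built-in `sqlite3` module. The tables, columns, and rows are made up for illustration, and the three queries stand in for three analysts who each read the same question a little differently.

```python
import sqlite3

# In-memory database with made-up data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        status TEXT,
        is_test_account INTEGER
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        order_date TEXT
    );
    INSERT INTO customers VALUES
        (1, 'active', 0), (2, 'active', 0), (3, 'active', 1), (4, 'churned', 0);
    INSERT INTO orders VALUES
        (10, 1, '2026-03-15'), (11, 1, '2026-03-20'), (12, 4, '2026-02-01');
""")

# Three plausible readings of "how many active customers do we have?"
interpretations = {
    "status flag only": (
        "SELECT COUNT(*) FROM customers WHERE status = 'active'"
    ),
    "status flag, excluding test accounts": (
        "SELECT COUNT(*) FROM customers "
        "WHERE status = 'active' AND is_test_account = 0"
    ),
    "placed an order in March 2026": (
        "SELECT COUNT(DISTINCT customer_id) FROM orders "
        "WHERE order_date >= '2026-03-01' AND order_date < '2026-04-01'"
    ),
}

results = {
    label: conn.execute(sql).fetchone()[0]
    for label, sql in interpretations.items()
}

for label, count in results.items():
    print(f"{label}: {count}")
```

Each query runs cleanly and each one is defensible, yet on this toy data they return 3, 2, and 1. None of them errors out, which is exactly why this kind of disagreement passes a sniff test.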
How I see this playing out
Phase 1 (happening now): Companies adopt AI assistants connected to their semantic layers. Routine questions get answered without human involvement. Analysts notice the volume of ad hoc requests dropping.
Phase 2 (next 12 to 18 months): Organizations realize they need fewer analysts for reporting but more people who can model the data well enough for AI to query it reliably. The dbt benchmark proved this. The modeling layer is what makes the AI accurate. Someone has to build and maintain it.
Phase 3 (2 to 3 years out): The “data analyst” title either evolves or gets absorbed. The people who moved upstream into modeling, engineering, and governance are thriving. The ones who stayed in the query-and-chart lane are competing with a $20/month subscription. And with Mythos-class reasoning capabilities becoming mainstream, the bar for what AI can handle autonomously will be far higher than anything we're benchmarking today.
Where the opportunities are
I saw this coming. That's why I started building a consultancy on the side in October 2025. Not because I was in any way ready, but because the window to get ahead of this curve was closing. I didn't have a perfect plan, a brand, a pipeline, or a first client. I had a hypothesis, a laptop, and the conviction that waiting until I felt “ready” was going to put me on the wrong side of the curve. Six months in, I don't regret a single uncomfortable decision it took to get here.
And that's the bet I'd make for anyone reading this. You don't need to have it figured out. You need to start moving before the curve passes you. The people who will do well over the next five years are the ones who make bets while everyone else is still debating whether the bet is worth making.
I'm glad I made that transition early because it gave me a front-row seat to where things are headed. But even as a data engineer, I'm not standing still. The job is moving deeper into abstraction layers, governance, and platform architecture. It's not enough to build pipelines anymore. You need to build the systems that make AI-powered analytics reliable.
The same benchmark that threatens traditional analysts reveals exactly where humans remain essential:
Data modeling and semantic layer design. The AI can't model your business for you. It doesn't know that your company defines “active customer” differently than the industry standard, or that certain revenue gets recognized on a weird schedule because of a legacy billing system. Someone has to encode that logic into a semantic layer. That person needs to understand both the business and the technical layer, and that's a far more valuable skill set than writing SELECT statements.
Data engineering and platform infrastructure. The pipelines, transformations, data quality frameworks, and observability that make AI-powered analytics possible still require human expertise. If anything, demand is increasing as companies realize their data needs to be AI-ready. The dbt benchmark showed that three models made the difference between 65% and 100% accuracy. Building those models is engineering work.
Compliance and governance. In regulated industries like healthcare and insurance, you can't let an AI query your data unsupervised. Someone needs to ensure the models enforce access controls, the metrics are auditable, and the outputs meet regulatory requirements. With Project Glasswing showing that AI can find vulnerabilities in code that's been reviewed for 20+ years, the need for rigorous security and governance in data systems is only growing. This is specialized, high-value work.
Strategic analysis. AI can answer “what happened?” It's getting decent at “why did it happen?” It's still not great at “what should we do about it?” That question takes the kind of synthesis that requires domain expertise, organizational context, and judgment. If that's not where you spend most of your time as an analyst, you're in the automation crosshairs.
Every time my friends ask me what to study, I say data and AI
I sound like a broken record at this point, but the numbers back me up. The Bureau of Labor Statistics projects 36% job growth for data engineers and 34% for data scientists through 2034, some of the fastest growth rates across all tech roles. The global data engineering market exceeds $120 billion in 2026 and is growing at 14 to 18% annually. Worldwide AI spending is forecast to hit $2.52 trillion in 2026, a 44% year-over-year increase. The World Economic Forum's Future of Jobs Report ranks AI and big data as the fastest-growing skill area globally.
And here's the kicker: 90% of AI and ML projects depend directly on data engineering pipelines. Every company deploying a large language model needs someone to build and maintain the data infrastructure feeding it. The models get all the headlines. The pipelines make them work.
Meanwhile, 39% of workers' existing skills are expected to become outdated between 2025 and 2030. The share of AI/ML jobs in the tech market jumped from 10% to 50% between 2023 and 2025. Entry-level roles are getting squeezed, but senior engineers with specialized skills in data platforms, compliance, and AI infrastructure are commanding premiums. AI domain specialists earn 30 to 50% more than generalists at the same experience level.
If you're trying to figure out what to study, what to build your career around, or where to point your energy for the next 10 years, the answer is staring at you.
The upskilling playbook (and what I'm actually doing about it)
I'm not writing this from the sidelines. Here's what my own playbook looks like right now:
Building a business. I started a consultancy, partly to do the work I want to do, partly to give myself the space to study data and AI on my own terms. When you control your schedule, you can invest in learning at a pace that a nine to five doesn't always allow.
Front End and Back End development. I'm learning to build full applications, not just the data layer. Understanding how front end and back end systems work together makes you a more complete engineer and opens doors that staying in the data silo never will. When you can build the product that sits on top of the data, you're no longer waiting for someone else to make your work useful.
Object-oriented and multiparadigm programming. I'm going beyond scripting in Python and learning how to think about software architecture, design patterns, and systems-level code. This is what separates a data engineer who writes scripts from one who builds platforms.
AI infrastructure. Not just using AI tools, but understanding how they're built, deployed, and served. Model serving, vector databases, embedding pipelines, RAG architectures. The companies deploying AI at scale need engineers who understand the infrastructure underneath. That's where I'm investing.
APIs. Designing, building, and consuming APIs is foundational to everything in modern data and AI. If you can't build a well-structured API, you're limited in what you can integrate, automate, or expose to downstream consumers.
Platform engineering and Infrastructure as Code. Terraform, Kubernetes, cloud architecture. This is where the industry is headed. The data engineer of 2027 isn't someone who writes dbt models and calls it a day. It's someone who can provision and manage the entire platform those models run on. IaC skills are a moat because they're hard to learn, hard to automate, and critical to every production system.
ML fundamentals. Not to become a machine learning engineer, but to understand the tooling well enough to build platforms that serve ML workloads. Databricks, cloud ML certifications, understanding how models are trained, deployed, and monitored. This is table stakes for senior data roles going forward. If you can't speak the language of the AI systems your platform supports, you're going to get left behind.
Compliance frameworks. ISO 27001 Lead Implementer, SOC 2, HIPAA. I'm pursuing these because compliance expertise paired with engineering skills is one of the most defensible positions in the market. AI can write your SQL. It can't navigate your regulatory audit.
Gen 2 Data Warehousing. The warehouse landscape is evolving fast. Lakehouse architectures, Apache Iceberg, separation of storage and compute, real-time streaming layers on top of batch foundations. The next generation of data platforms looks fundamentally different from what most teams are running today, and the engineers who understand where things are headed will architect the systems everyone else operates on.
Investing in domain expertise. The people who survive aren't generalists who can write a passable query in any domain. They're specialists who understand healthcare data models, insurance claim lifecycles, or financial reporting requirements deeply enough to know when an AI output is wrong before anyone else notices.
The bottom line
The dbt benchmark is a snapshot of where analytics AI stands in April 2026. Project Glasswing is a snapshot of where frontier AI capability stands, and it's terrifying in the best possible way. These models aren't improving linearly. They're on an exponential curve, and every six months the goalposts move.
The trajectory is clear. AI is not going to partially automate the traditional analyst role. It's going to fundamentally replace the query-and-report workflow that defines it. And it's going to do it faster than most people expect, especially once Mythos-class models reach general availability.
That's not a tragedy. It's an upgrade. The work that remains for humans is more interesting, more impactful, and more valuable. But only if you start moving toward it now.
The analysts who build the layers will outlast the ones who query them. Start building.
More doom and gloom coming from me soon. Keep an eye out.