Meetings, but prompting Claude Code under the table (50%)
Prompting Claude Code at desk (25%)
Reviewing Claude Code output (15%)
Taking credit (100%)*
*We are data engineers, not mathematicians.
The Claude Code Adoption Maturity Model
Where is your team? (Hint: if you're reading this book, you're at least Level 1. Congratulations.)
Level 0: Denial
"AI can't write production code." Narrator: It could.
Level 1: Curiosity
Uses Claude Code for "small tasks." Secretly impressed. Won't admit it in standup.
Level 2: Adoption
Claude Code writes all new code. Engineer reviews. Productivity doubles. Impostor syndrome triples.
Level 3: Integration
CI/CD runs Claude Code. MCP servers everywhere. The data platform becomes self-aware. (Not really. Probably.)
Level 4: Transcendence
You are Claude Code. Claude Code is you. The distinction no longer matters. Your LinkedIn says "AI Whisperer."
You're reading a free preview. This web version contains the first 3 chapters. The full 451-page book (25 chapters, 4 appendices, and way too many jokes about data engineers) is available as a PDF.
1
Part I: Foundations
What Is Claude Code? A Data Engineer's Perspective
Before Claude Code, data engineers had to write SQL by hand. Like cavemen. With syntax errors.
Claude Code is not just another code completion tool that suggests variable names while you do all the real work. It is a full-featured command-line agent that understands your entire project, can read and modify files, run shell commands, interact with APIs, and reason through complex architectural decisions. For data engineers, this means a collaborator that can design schemas, write transformation logic, debug pipeline failures, and generate infrastructure code -- all while you focus on the truly important work of attending standups and updating Jira tickets.
Could a human do all this? Technically yes. Should they? The question answers itself.
Key Topics
The AI-Augmented Data Engineer
Understanding the new paradigm where data engineers work alongside AI assistants to dramatically accelerate every stage of the data engineering lifecycle -- from ingestion through serving. Your job title stays the same. Your actual job changes completely.
Why Claude Code for Data Engineering?
Claude Code's agentic capabilities -- file system access, shell command execution, and multi-step reasoning -- make it uniquely suited for the complex, multi-system work of data engineering. Some engineers still debug by reading logs manually. Those engineers also churn their own butter.
How to Use This Book
Each chapter pairs conceptual understanding with hands-on Claude Code workflows. You will learn not just what to build, but how to instruct Claude Code to build it correctly. Then you will take credit for it in the code review. We do not judge.
Honest Moment
This entire book was written by Claude Code. If you find errors, that's a feature -- it keeps you on your toes. Consider it a built-in data quality exercise.
Tip
Install Claude Code globally with npm install -g @anthropic-ai/claude-code before starting. Run claude in any project directory to begin an interactive session with full project context. Then sit back and pretend you're "supervising."
bash
# Install Claude Code
npm install -g @anthropic-ai/claude-code

# Start a session in your data project
cd ~/projects/my-data-pipeline
claude

# Ask Claude Code to analyze your project structure
# (type this at the Claude Code prompt, not in your shell)
"Analyze this data pipeline project. Identify the ingestion,
transformation, and serving layers. Suggest improvements.
Also tell me why my predecessor wrote it this way. Actually,
don't -- I don't want to know."
2
Part I: Foundations
The Data Engineering Lifecycle Meets AI
Every pipeline follows a lifecycle: it's born, it breaks at 3 AM, and you fix it while questioning your career choices. Claude Code changes one of those stages. (Hint: it's not the 3 AM part.)
In this chapter, we map the complete data engineering lifecycle and show how Claude Code integrates with each stage. We also introduce the undercurrents -- security, data management, DataOps, data architecture, orchestration, and software engineering -- that flow beneath every stage and represent the cross-cutting concerns that separate robust pipelines from fragile ones that page you on holidays.
Key Topics
Lifecycle Stages
Generation, ingestion, storage, transformation, and serving. Each stage has distinct challenges, tooling choices, and Claude Code workflows. Each stage also has distinct ways to fail spectacularly in production.
The Undercurrents
Security, data management, DataOps, data architecture, orchestration, and software engineering practices that must be considered at every stage. Or, as most teams call them, "things we'll get to in Q3."
Claude Code Across the Lifecycle
A preview of how Claude Code accelerates each stage, from reverse-engineering source system schemas to generating monitoring dashboards for serving layer SLAs that nobody reads.
Warning
Claude Code can execute shell commands and modify files. Always review its proposed changes before approving, especially when working with production infrastructure. Claude Code is confident. Claude Code is usually right. But "usually" and "production" are words that should never share a sentence without adult supervision.
Real Talk
The lifecycle diagram in this chapter was generated by Claude Code. The previous version, drawn by hand on a whiteboard, has been mercifully destroyed. We do not speak of it.
python
# Example: Using Claude Code to generate a lifecycle audit
# Prompt: "Audit this pipeline and map each component to a
# lifecycle stage. Identify gaps. Be brutally honest."
from dataclasses import dataclass
from enum import Enum


class LifecycleStage(Enum):
    GENERATION = "generation"
    INGESTION = "ingestion"
    STORAGE = "storage"
    TRANSFORM = "transformation"
    SERVING = "serving"


@dataclass
class PipelineComponent:
    name: str
    stage: LifecycleStage
    tool: str
    health: str  # "healthy" | "degraded" | "missing"


# Claude Code generated this audit from project analysis
# A human would have taken three sprints and two retros
pipeline_audit = [
    PipelineComponent("Kafka consumers",
                      LifecycleStage.INGESTION, "confluent-kafka", "healthy"),
    PipelineComponent("S3 data lake",
                      LifecycleStage.STORAGE, "boto3", "healthy"),
    PipelineComponent("dbt models",
                      LifecycleStage.TRANSFORM, "dbt-core", "degraded"),
    PipelineComponent("Analytics API",
                      LifecycleStage.SERVING, "FastAPI", "missing"),
]

for comp in pipeline_audit:
    status = "OK" if comp.health == "healthy" else "NEEDS ATTENTION"
    print(f"[{status}] {comp.stage.value}: {comp.name} ({comp.tool})")
3
Part I: Foundations
Getting Started with Claude Code
Getting started with Claude Code is easy. Stopping is impossible. You've been warned.
Claude Code operates as a REPL-style agent in your terminal. Unlike IDE-based assistants that offer timid inline suggestions you can politely ignore, Claude Code takes a conversational approach: you describe what you need, and it executes multi-step plans that can span reading files, writing code, running tests, and iterating on results. This agentic loop is what makes it powerful for data engineering, where a single task might involve modifying a SQL model, updating an Airflow DAG, and adjusting a Terraform config -- all before your coffee gets cold.
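The agentic loop can be pictured with a toy sketch in plain Python. This is not how Claude Code is implemented: `Step`, `run_agent_loop`, and the lambda "tools" are hypothetical names invented here, and the retry limit of three is an arbitrary assumption. The sketch only shows the shape of the loop: execute a step, verify the result, retry on failure, and hand control back to the human when retries run out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One planned action: a label plus a callable that returns True on success."""
    description: str
    action: Callable[[], bool]


def run_agent_loop(plan: List[Step], max_attempts: int = 3) -> List[str]:
    """Execute each step, verify it, and retry on failure (hypothetical toy model)."""
    log: List[str] = []
    for step in plan:
        for attempt in range(1, max_attempts + 1):
            ok = step.action()  # execute
            outcome = "ok" if ok else "failed"
            log.append(f"{step.description}: attempt {attempt} -> {outcome}")
            if ok:  # verify
                break
        else:
            # Every attempt failed: stop the plan and escalate to the human.
            log.append(f"{step.description}: giving up, asking the human")
            break
    return log


# A fake test suite that fails once and then passes, to exercise the retry path.
fake_test_results = iter([False, True])

plan = [
    Step("modify SQL model", lambda: True),
    Step("run tests", lambda: next(fake_test_results)),
    Step("update Airflow DAG", lambda: True),
]

session_log = run_agent_loop(plan)
for line in session_log:
    print(line)
```

The point of the sketch is the verify step: an agent that runs tests and iterates is doing something different from a tool that suggests a line and hopes for the best.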
Key Topics
Installation & Configuration
Setting up Claude Code, configuring API keys, customizing system prompts with CLAUDE.md files, and integrating with your existing development workflow. The hardest part is explaining to your team why you're suddenly ten times more productive.
The Agentic Workflow Loop
Understanding how Claude Code reads, plans, executes, and verifies. Learning to guide the agent with clear prompts and effective context management. It's like pair programming, except your pair doesn't argue about tabs vs. spaces.
MCP Servers for Data Tools
Connecting Claude Code to databases, cloud consoles, and monitoring tools through the Model Context Protocol (MCP) for direct system interaction. Give Claude Code access to your database. What could go wrong? (Chapter 18 covers what could go wrong.)
Career Advice
After reading this chapter, you will be tempted to use Claude Code for everything -- writing emails, grocery lists, performance reviews. We cannot stop you. We can only document the phenomenon.
Note
The CLAUDE.md file is automatically read by Claude Code when you start a session. Place it at the root of your project to provide persistent context about your architecture, conventions, and common commands. Think of it as an onboarding document that your AI colleague actually reads, unlike every human colleague you've ever had.
markdown
# CLAUDE.md - Project context for Claude Code
# (The only documentation anyone will ever read)

## Project Overview
This is a batch + streaming data pipeline built with:
- Apache Airflow for orchestration
- dbt for SQL transformations
- Apache Kafka for streaming ingestion
- Snowflake as the primary data warehouse
- Terraform for infrastructure
- Prayers for production stability
## Conventions
- SQL style: lowercase keywords, snake_case naming
- Python: black formatter, isort, type hints required
- All DAGs must include SLA callbacks
- dbt models follow staging > intermediate > marts pattern
- Commits must not contain the word "oops"
## Key Commands
- `dbt run --select tag:daily` - Run daily models
- `airflow dags test` - Test DAG parsing
- `terraform plan -var-file=prod.tfvars` - Preview infra changes
- `claude` - Do all of the above without typing any of the above
That's the preview!
You've reached the end of the free web preview. The remaining 22 chapters cover everything from SQL optimization to real-time streaming to multi-modal pipelines to building a Claude Code-first data platform. All with the same level of humor, practical code examples, and existential dread.
Learn to use Claude Code for reverse-engineering source system schemas, profiling data quality from APIs and databases, generating comprehensive data dictionaries, and building connection wrappers that handle authentication, pagination, and rate limiting automatically. Because reading API documentation is a form of suffering that no longer needs to be endured.
Full content available in the PDF
5
Part II: Data Generation & Ingestion
Data Ingestion Pipelines Powered by Claude Code
Covers batch and micro-batch ingestion patterns using Claude Code to generate extractors, schema validators, idempotent loaders, and comprehensive error-handling strategies. Includes patterns for CDC, full-load, and incremental extraction. Finally, a chapter where "just have Claude Code write it" is actual professional advice.
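As a taste of what "incremental extraction" and "idempotent loader" mean in practice, here is a minimal sketch using only the standard library's `sqlite3` module. The `customers` table, the sample rows, and the function names are assumptions made up for illustration, not code from the chapter. The extractor keeps only rows newer than a watermark, and the loader writes by primary key so that replaying the same batch leaves the table unchanged.

```python
import sqlite3

# Hypothetical source rows: (id, name, updated_at as an ISO-8601 string).
SOURCE_ROWS = [
    (1, "alice", "2024-01-01T00:00:00"),
    (2, "bob", "2024-01-02T00:00:00"),
    (3, "carol", "2024-01-03T00:00:00"),
]


def extract_incremental(rows, watermark):
    """Return only rows changed after the watermark (ISO strings sort chronologically)."""
    return [r for r in rows if r[2] > watermark]


def load_idempotent(conn, rows):
    """Write by primary key so re-running the same batch changes nothing."""
    conn.executemany(
        "INSERT OR REPLACE INTO customers (id, name, updated_at) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)

batch = extract_incremental(SOURCE_ROWS, watermark="2024-01-01T12:00:00")
load_idempotent(conn, batch)
load_idempotent(conn, batch)  # simulated retry after a 3 AM failure

row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(row_count)
```

`INSERT OR REPLACE` is SQLite-specific; a warehouse would more likely use `MERGE` or its own upsert syntax, but the idea is the same.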
6
Part II: Data Generation & Ingestion
Working with Unstructured Data
PDFs, images, audio, and the dreaded "Excel files from finance." Claude Code helps parse, classify, and structure data that was never meant to be structured. If your data source is a ZIP file of screenshots attached to an email thread, this chapter is for you.
7
Part III: Storage & Transformation
Data Modeling with Claude Code
Dimensional modeling, Data Vault, and wide-table patterns. Claude Code generates star schemas from business requirements, creates ERD documentation, and scaffolds the full model implementation in your warehouse of choice. It even understands slowly changing dimensions, which is more than can be said for most humans.
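For readers who have managed to avoid slowly changing dimensions so far, here is a minimal sketch of the Type 2 variant in plain Python. The `DimRow` shape, the `apply_scd2` helper, and the sample customer are assumptions for illustration only; a real implementation would live in SQL or dbt. The rule it shows: when a tracked attribute changes, close the current row and append a new version instead of overwriting.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DimRow:
    customer_id: int
    city: str
    valid_from: str
    valid_to: Optional[str] = None  # None marks the current version


def apply_scd2(dim: List[DimRow], customer_id: int, city: str, as_of: str) -> None:
    """Close the current row if the attribute changed, then append a new version."""
    current = next(
        (r for r in dim if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == city:
            return  # nothing changed, so history stays as it is
        current.valid_to = as_of  # expire the old version
    dim.append(DimRow(customer_id, city, valid_from=as_of))


dim_customer: List[DimRow] = []
apply_scd2(dim_customer, 42, "Berlin", "2024-01-01")
apply_scd2(dim_customer, 42, "Berlin", "2024-02-01")  # no change, no new row
apply_scd2(dim_customer, 42, "Lisbon", "2024-03-01")  # change, new version

for row in dim_customer:
    print(row)
```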
8
Part III: Storage & Transformation
SQL Generation and Optimization
Claude Code writes SQL that is not only correct but also performant. It analyzes query plans, suggests index strategies, rewrites correlated subqueries, and explains why your 47-line CTE chain is actually doing a full table scan. Prepare to feel personally attacked by an AI's code review.
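To show what "rewriting a correlated subquery" looks like, here is a small sketch against an in-memory SQLite database. The `orders` table and its rows are invented for the example. It runs the correlated form and a rewritten join against a pre-aggregated CTE, then checks that both return the same rows. Whether the rewrite is actually faster depends on the engine and the query plan, so treat this as an equivalence check, not a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO orders (customer_id, amount) VALUES
        (1, 10.0), (1, 30.0), (2, 5.0), (2, 5.0), (2, 20.0);
    """
)

# Orders above their customer's average, written with a correlated subquery.
CORRELATED = """
    SELECT o.id FROM orders o
    WHERE o.amount > (
        SELECT AVG(i.amount) FROM orders i WHERE i.customer_id = o.customer_id
    )
    ORDER BY o.id
"""

# The same question, with the average computed once per customer and joined in.
REWRITTEN = """
    WITH customer_avg AS (
        SELECT customer_id, AVG(amount) AS avg_amount
        FROM orders
        GROUP BY customer_id
    )
    SELECT o.id FROM orders o
    JOIN customer_avg a ON a.customer_id = o.customer_id
    WHERE o.amount > a.avg_amount
    ORDER BY o.id
"""

correlated_ids = [row[0] for row in conn.execute(CORRELATED)]
rewritten_ids = [row[0] for row in conn.execute(REWRITTEN)]
print(correlated_ids, rewritten_ids)
```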
9
Part III: Storage & Transformation
dbt and Transformation Pipelines
Building production-grade dbt projects with Claude Code. From generating staging models and writing complex window functions to implementing incremental materializations, generic tests, and comprehensive documentation that someone might theoretically read someday.
10
Part III: Storage & Transformation
Data Quality and Testing
Unit testing, integration testing, and data quality frameworks for pipelines. Claude Code generates Great Expectations suites, dbt tests, and anomaly detection rules. Because data quality is like flossing: everyone agrees it's important, nobody does it enough, and Claude Code actually follows through.
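The kind of rule a data quality suite encodes can be sketched in a few lines of plain Python. This is deliberately not the Great Expectations or dbt API; the check helpers (`not_null`, `unique`, `non_negative`) and the sample rows are hypothetical stand-ins that show the idea of declaring named expectations and evaluating them against a batch.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Check = Callable[[List[Row]], bool]

# Hypothetical batch with a duplicate key, a negative amount, and a missing country.
rows: List[Row] = [
    {"order_id": 1, "amount": 19.99, "country": "DE"},
    {"order_id": 2, "amount": -5.00, "country": "PT"},
    {"order_id": 2, "amount": 42.00, "country": None},
]


def not_null(column: str) -> Check:
    return lambda data: all(r[column] is not None for r in data)


def unique(column: str) -> Check:
    return lambda data: len({r[column] for r in data}) == len(data)


def non_negative(column: str) -> Check:
    return lambda data: all(r[column] >= 0 for r in data)


checks: Dict[str, Check] = {
    "order_id is not null": not_null("order_id"),
    "order_id is unique": unique("order_id"),
    "amount is non-negative": non_negative("amount"),
    "country is not null": not_null("country"),
}

results = {name: check(rows) for name, check in checks.items()}
for name, passed in results.items():
    print(f"[{'PASS' if passed else 'FAIL'}] {name}")
```

Keeping each check as a named, reusable function is what makes it cheap to run them on every batch, which is the flossing part.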
11
Part IV: Serving & Analytics
Building Analytical Queries with Claude Code
Building semantic layers, materialized views, and analytics APIs. Claude Code generates BI-ready data marts, creates metric definitions, and builds the serving infrastructure that stakeholders interact with. Stakeholders who will inevitably ask for "just one more column."
12
Part IV: Serving & Analytics
Reverse ETL and Operational Analytics
Pushing transformed data back to operational systems. Claude Code builds sync pipelines to CRMs, marketing platforms, and product databases while handling conflict resolution and idempotency. Yes, we put data into a warehouse just to take it back out. The circle of data life.
13
Part IV: Serving & Analytics
Machine Learning Feature Engineering
Designing and implementing feature stores for ML teams. Claude Code helps define feature schemas, build point-in-time correct training datasets, and create real-time feature serving endpoints. All so your ML team can train a model that predicts something you could have figured out with a SQL query.
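"Point-in-time correct" is easier to see than to define, so here is a minimal standard-library sketch. The feature history, the label events, and the `feature_as_of` helper are all invented for illustration. For each label timestamp, it picks the latest feature value observed at or before that moment, so the training row never sees the future.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical feature history for one customer, sorted by time:
# (timestamp, lifetime_spend).
feature_history: List[Tuple[int, float]] = [(100, 10.0), (200, 25.0), (300, 60.0)]

# Hypothetical label events: (timestamp, churned).
label_events: List[Tuple[int, int]] = [(50, 0), (150, 0), (300, 1), (999, 1)]


def feature_as_of(history: List[Tuple[int, float]], ts: int) -> Optional[float]:
    """Return the latest feature value observed at or before ts, or None if none exists."""
    timestamps = [t for t, _ in history]
    idx = bisect_right(timestamps, ts)  # number of observations with timestamp <= ts
    return history[idx - 1][1] if idx > 0 else None


training_rows = [
    (ts, feature_as_of(feature_history, ts), label) for ts, label in label_events
]
for row in training_rows:
    print(row)
```

The sketch treats a feature observed at exactly the label timestamp as available. Some teams prefer a strict "before" comparison to guard against late-arriving writes; that would mean swapping `bisect_right` for `bisect_left`.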
14
Part V: Orchestration & Infrastructure
Orchestrating Data Pipelines with Claude Code
Building production Airflow DAGs with Claude Code. Covers dynamic DAG generation, custom operators, SLA monitoring, complex dependency patterns, and migrating from legacy schedulers like cron. The DAGs write themselves now. You just have to explain to management why you still need a team.
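Underneath any DAG scheduler sits a dependency graph that has to be ordered and checked for cycles. The sketch below illustrates that idea with Python's standard `graphlib` module (Python 3.9 or later). It is not Airflow code, and the task names and the `has_cycle` helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Set

# Hypothetical task graph: each task maps to the set of tasks it depends on.
dependencies: Dict[str, Set[str]] = {
    "extract_orders": set(),
    "extract_customers": set(),
    "stage_orders": {"extract_orders"},
    "stage_customers": {"extract_customers"},
    "build_marts": {"stage_orders", "stage_customers"},
}

# One valid execution order: every task appears after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)


def has_cycle(graph: Dict[str, Set[str]]) -> bool:
    """Return True if the graph cannot be ordered because of a circular dependency."""
    try:
        # static_order() is lazy, so the iterator must be consumed to trigger the check.
        list(TopologicalSorter(graph).static_order())
        return False
    except CycleError:
        return True


print(has_cycle({"a": {"b"}, "b": {"a"}}))
```

A real orchestrator adds scheduling, retries, and SLAs on top, but a cycle check like this is the part that stops a DAG from waiting on itself forever.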
15
Part V: Orchestration & Infrastructure
Infrastructure as Code with Claude Code
Terraform, Pulumi, and CloudFormation for data infrastructure. Claude Code generates IaC modules for data warehouses, Kafka clusters, Airflow deployments, and networking configurations. It even handles the Terraform state file, which is more than most humans are willing to do without hazard pay.
16
Part V: Orchestration & Infrastructure
DataOps and CI/CD
Docker, Kubernetes, GitHub Actions, and the complete DataOps toolkit. Claude Code creates optimized Dockerfiles, CI/CD workflows, and deployment strategies tailored for data workloads. Continuous deployment of data pipelines: because breaking things manually was too slow.
17
Part VI: Security, Governance & Ethics
Data Governance with Claude Code
Implementing data catalogs, lineage tracking, and quality frameworks. Claude Code generates data contracts, builds lineage graphs from existing code, and creates governance documentation. It's like having a compliance officer who actually understands SQL.
18
Part VI: Security, Governance & Ethics
Security and Access Control
Encryption, access control, and secrets management for data platforms. Claude Code audits existing security configurations, generates IAM policies, and implements column-level encryption patterns. Also, a reminder to stop putting credentials in your .env file and committing it. You know who you are.
19
Part VI: Security, Governance & Ethics
Ethics of AI in Data Engineering
Responsible use of AI in data engineering. Understanding when to trust AI-generated code, managing intellectual property concerns, avoiding bias in automated decisions, and maintaining human oversight. The chapter where we briefly pretend we have reservations about automating everything.
20
Part VII: Advanced Patterns
Claude Code as a Data Engineering Agent
Autonomous pipeline creation, self-healing data systems, and agent-driven architecture decisions. In this chapter, Claude Code stops being your assistant and starts being your colleague. It has opinions. It will share them. You will agree, because they are usually correct.
21
Part VII: Advanced Patterns
RAG for Data Engineering
Retrieval-Augmented Generation applied to data engineering workflows. Using your own documentation, schemas, and historical queries as context for Claude Code to produce better, more contextual results. Teaching an AI to learn from your mistakes so you never have to learn from them yourself.
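The retrieval half of RAG can be sketched without any model at all. The example below swaps real embeddings for a plain bag-of-words cosine similarity, which is a simplification chosen to keep it dependency-free. The document snippets and the `retrieve` and `build_prompt` helpers are hypothetical. It shows the two moves that matter: rank your own documentation against the question, then place the best match in the prompt as context.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical snippets of project documentation.
documents: Dict[str, str] = {
    "orders_schema": "orders table columns order_id customer_id amount created_at",
    "dbt_conventions": "dbt models follow staging intermediate marts pattern",
    "airflow_sla": "all airflow dags must include sla callbacks",
}


def tokenize(text: str) -> Counter:
    """Lowercase the text and count word tokens (underscores kept inside identifiers)."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, k: int = 1) -> List[str]:
    """Return the names of the k documents most similar to the query."""
    q = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda name: cosine(q, tokenize(documents[name])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str) -> str:
    """Prepend the retrieved context to the question."""
    context = "\n".join(documents[name] for name in retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"


print(build_prompt("which columns does the orders table have"))
```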
22
Part VII: Advanced Patterns
Real-Time Data Processing with Claude Code
Lambda and Kappa architectures, event sourcing, and CQRS patterns. Claude Code helps design and implement architectures that serve both real-time dashboards and batch analytics from unified pipelines. Real-time data: for when your stakeholders need to watch numbers go wrong live instead of finding out the next morning.
23
Part VII: Advanced Patterns
Multi-Modal Data Engineering
Processing images, audio, video, and documents alongside structured data. Claude Code helps build pipelines that handle the full spectrum of data types, because apparently CSVs were not enough and now everything is a data source. Even that whiteboard photo from the all-hands.
24
Part VIII: The Future
Building a Claude Code-First Data Platform
Designing a data platform from scratch with Claude Code at the center of every workflow. Architecture patterns, team structures, and operating models for organizations that have fully embraced the "Just Use Claude Code for Everything" philosophy. The org chart has one box and it says "Claude Code." The humans are in the footnotes.
25
Part VIII: The Future
The Future of Data Engineering with AI
Where data engineering is headed. The convergence of AI and data infrastructure, the evolving role of the data engineer, and what skills matter when your primary tool can already do most of your job. Spoiler: "prompting" is now a marketable skill, and your parents still don't understand what you do for a living.
About This Book
This book was written entirely by Claude Code in a single afternoon. The human "author" was busy getting coffee. We kept their name off the cover out of kindness.
Rather than treating AI assistance as a gimmick bolted onto the final chapter, every chapter integrates Claude Code workflows directly into the data engineering process. You will learn how to think about problems, how to describe them to an AI agent in plain English, and how to take credit for the results in your performance review.
Hands-On Workflows
Every chapter includes real Claude Code sessions showing exact prompts and outputs. Copy them. Paste them. Tell your boss you wrote them from scratch. We won't tell.
Production Patterns
Not toy examples. Full production patterns for ingestion, transformation, orchestration, and infrastructure. The kind of code that survives contact with real data and real stakeholders.
Brutal Honesty
We tell you when Claude Code is amazing, when it's wrong, and when it confidently generates code that will bankrupt your cloud budget. Consider it a public service.