O'Really?! Media

Fundamentals of Using Claude Code for Everything

Because writing code yourself is so 2024

Just Use Claude Code for Everything™

Table of Contents

Eight parts covering the complete data engineering lifecycle -- because you weren't going to read the source code anyway

The Data, Visualized

Because what kind of data engineering book doesn't have charts? (Claude Code made these too, obviously.)

Time to Complete Common DE Tasks

Hours spent, with and without Claude Code. Results may vary. Sanity improvements not graphed.

Task: Human (suffering) vs. Claude Code (vibing)
Write dbt models: 4.5h vs. 12m
Debug pipeline: 3h vs. 15m
Write Terraform: 5h vs. 8m
Schema migration: 2.5h vs. 5m
Read API docs: 30s

How Data Engineers Spend Their Day

Before Claude Code (left) vs. After Claude Code (right). The meetings don't go away. You just use them differently now.

Before: Meetings (50%), Debugging (15%), Writing SQL (10%), Reading docs (10%), Crying (15%)
After: Meetings, but prompting Claude Code under the table (50%); Prompting Claude Code at desk (25%); Reviewing Claude Code output (15%); Taking credit (100%)*
*We are data engineers, not mathematicians

The Claude Code Adoption Maturity Model

Where is your team? (Hint: if you're reading this book, you're at least Level 1. Congratulations.)

Level 0: Denial
"AI can't write production code." Narrator: It could.
Level 1: Curiosity
Uses Claude Code for "small tasks." Secretly impressed. Won't admit it in standup.
Level 2: Adoption
Claude Code writes all new code. Engineer reviews. Productivity doubles. Impostor syndrome triples.
Level 3: Integration
CI/CD runs Claude Code. MCP servers everywhere. The data platform becomes self-aware. (Not really. Probably.)
Level 4: Transcendence
You are Claude Code. Claude Code is you. The distinction no longer matters. Your LinkedIn says "AI Whisperer."
You're reading a free preview. This web version contains the first 3 chapters. The full 451-page book (25 chapters, 4 appendices, and way too many jokes about data engineers) is available as a PDF.
1
Part I: Foundations

What Is Claude Code? A Data Engineer's Perspective

Before Claude Code, data engineers had to write SQL by hand. Like cavemen. With syntax errors.

Claude Code is not just another code completion tool that suggests variable names while you do all the real work. It is a full-featured command-line agent that understands your entire project, can read and modify files, run shell commands, interact with APIs, and reason through complex architectural decisions. For data engineers, this means a collaborator that can design schemas, write transformation logic, debug pipeline failures, and generate infrastructure code -- all while you focus on the truly important work of attending standups and updating Jira tickets.

Could a human do all this? Technically yes. Should they? The question answers itself.

Key Topics

The AI-Augmented Data Engineer

Understanding the new paradigm where data engineers work alongside AI assistants to dramatically accelerate every stage of the data engineering lifecycle -- from ingestion through serving. Your job title stays the same. Your actual job changes completely.

Why Claude Code for Data Engineering?

Claude Code's agentic capabilities -- file system access, shell command execution, and multi-step reasoning -- make it uniquely suited for the complex, multi-system work of data engineering. Some engineers still debug by reading logs manually. Those engineers also churn their own butter.

How to Use This Book

Each chapter pairs conceptual understanding with hands-on Claude Code workflows. You will learn not just what to build, but how to instruct Claude Code to build it correctly. Then you will take credit for it in the code review. We do not judge.

Honest Moment

This entire book was written by Claude Code. If you find errors, that's a feature -- it keeps you on your toes. Consider it a built-in data quality exercise.

Tip

Install Claude Code globally with npm install -g @anthropic-ai/claude-code before starting. Run claude in any project directory to begin an interactive session with full project context. Then sit back and pretend you're "supervising."

bash
# Install Claude Code
npm install -g @anthropic-ai/claude-code

# Start a session in your data project
cd ~/projects/my-data-pipeline
claude

# Inside the session, ask Claude Code to analyze your project structure.
# Type the prompt at the Claude Code prompt, not in your shell:
#
#   Analyze this data pipeline project. Identify the ingestion,
#   transformation, and serving layers. Suggest improvements.
#   Also tell me why my predecessor wrote it this way. Actually,
#   don't -- I don't want to know.

2
Part I: Foundations

The Data Engineering Lifecycle Meets AI

Every pipeline follows a lifecycle: it's born, it breaks at 3 AM, and you fix it while questioning your career choices. Claude Code changes one of those stages. (Hint: it's not the 3 AM part.)

In this chapter, we map the complete data engineering lifecycle and show how Claude Code integrates with each stage. We also introduce the undercurrents -- security, data management, DataOps, data architecture, orchestration, and software engineering -- that flow beneath every stage and represent the cross-cutting concerns that separate robust pipelines from fragile ones that page you on holidays.

Key Topics

Lifecycle Stages

Generation, ingestion, storage, transformation, and serving. Each stage has distinct challenges, tooling choices, and Claude Code workflows. Each stage also has distinct ways to fail spectacularly in production.

The Undercurrents

Security, data management, DataOps, data architecture, orchestration, and software engineering practices that must be considered at every stage. Or, as most teams call them, "things we'll get to in Q3."

Claude Code Across the Lifecycle

A preview of how Claude Code accelerates each stage, from reverse-engineering source system schemas to generating monitoring dashboards for serving layer SLAs that nobody reads.

Warning

Claude Code can execute shell commands and modify files. Always review its proposed changes before approving, especially when working with production infrastructure. Claude Code is confident. Claude Code is usually right. But "usually" and "production" are words that should never share a sentence without adult supervision.
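The review-before-approve habit in the warning above can be sketched in a few lines. This is only an illustration of the idea, not how Claude Code implements its approval prompts: the helper names `preview_change` and `apply_if_approved`, the file path, and the SQL strings are all hypothetical. It uses the standard library's `difflib` to show a proposed edit as a diff and keeps the original unless someone explicitly says yes.

```python
import difflib


def preview_change(original: str, proposed: str, path: str) -> str:
    """Return a unified diff of a proposed edit so a human can read it first."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"{path} (current)",
        tofile=f"{path} (proposed)",
    )
    return "".join(diff)


def apply_if_approved(original: str, proposed: str, approved: bool) -> str:
    """Keep the original text unless a reviewer explicitly approved the edit."""
    return proposed if approved else original


# Hypothetical example: an edit to a dbt model, held as strings for simplicity.
current_sql = "select * from orders\n"
proposed_sql = "select order_id, amount from orders\n"

diff_text = preview_change(current_sql, proposed_sql, "models/orders.sql")
print(diff_text)

# Nobody approved the change, so the current version is what survives.
result = apply_if_approved(current_sql, proposed_sql, approved=False)
```

The point of the design is that the default answer is "no": the proposed text only replaces the original when `approved` is explicitly true, which is the posture you want anywhere near production.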

Real Talk

The lifecycle diagram in this chapter was generated by Claude Code. The previous version, drawn by hand on a whiteboard, has been mercifully destroyed. We do not speak of it.

python
# Example: Using Claude Code to generate a lifecycle audit
# Prompt: "Audit this pipeline and map each component to a
#          lifecycle stage. Identify gaps. Be brutally honest."

from dataclasses import dataclass
from enum import Enum

class LifecycleStage(Enum):
    GENERATION  = "generation"
    INGESTION   = "ingestion"
    STORAGE     = "storage"
    TRANSFORM   = "transformation"
    SERVING     = "serving"

@dataclass
class PipelineComponent:
    name: str
    stage: LifecycleStage
    tool: str
    health: str  # "healthy" | "degraded" | "missing"

# Claude Code generated this audit from project analysis
# A human would have taken three sprints and two retros
pipeline_audit = [
    PipelineComponent("Kafka consumers",
        LifecycleStage.INGESTION, "confluent-kafka", "healthy"),
    PipelineComponent("S3 data lake",
        LifecycleStage.STORAGE, "boto3", "healthy"),
    PipelineComponent("dbt models",
        LifecycleStage.TRANSFORM, "dbt-core", "degraded"),
    PipelineComponent("Analytics API",
        LifecycleStage.SERVING, "FastAPI", "missing"),
]

for comp in pipeline_audit:
    status = "OK" if comp.health == "healthy" else "NEEDS ATTENTION"
    print(f"[{status}] {comp.stage.value}: {comp.name} ({comp.tool})")

3
Part I: Foundations

Getting Started with Claude Code

Getting started with Claude Code is easy. Stopping is impossible. You've been warned.

Claude Code operates as a REPL-style agent in your terminal. Unlike IDE-based assistants that offer timid inline suggestions you can politely ignore, Claude Code takes a conversational approach: you describe what you need, and it executes multi-step plans that can span reading files, writing code, running tests, and iterating on results. This agentic loop is what makes it powerful for data engineering, where a single task might involve modifying a SQL model, updating an Airflow DAG, and adjusting a Terraform config -- all before your coffee gets cold.
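To make the loop described above concrete, here is a toy sketch of a plan, execute, verify cycle. It is not Claude Code's implementation and makes no claim about its internals: `agentic_loop` and the three `toy_*` callbacks are hypothetical stand-ins, and the "steps" are plain strings rather than real file edits or shell commands.

```python
from typing import Callable


def agentic_loop(
    task: str,
    plan: Callable[[str, list[str]], list[str]],
    execute: Callable[[str], str],
    verify: Callable[[list[str]], bool],
    max_rounds: int = 3,
) -> list[str]:
    """Plan steps, run them, check the result, and retry up to max_rounds."""
    history: list[str] = []
    for _ in range(max_rounds):
        # Plan the next steps using the task and everything done so far.
        for step in plan(task, history):
            history.append(execute(step))
        # Stop iterating as soon as the verification passes.
        if verify(history):
            break
    return history


def toy_plan(task: str, history: list[str]) -> list[str]:
    # First round: do the whole job. Later rounds: only re-run the tests.
    if not history:
        return [
            "modify SQL model",
            "update Airflow DAG",
            "adjust Terraform config",
            "run tests",
        ]
    return ["run tests"]


def toy_execute(step: str) -> str:
    # A real agent would edit files or run commands here.
    return f"done: {step}"


def toy_verify(history: list[str]) -> bool:
    return "done: run tests" in history


log = agentic_loop("add a column to the orders model", toy_plan, toy_execute, toy_verify)
print(log)
```

Passing the planner, executor, and verifier in as callbacks keeps the loop itself trivial to read, and it makes the `max_rounds` cap explicit so a task that never verifies cannot run forever.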

Key Topics

Installation & Configuration

Setting up Claude Code, configuring API keys, supplying persistent project context with CLAUDE.md files, and integrating with your existing development workflow. The hardest part is explaining to your team why you're suddenly ten times more productive.

The Agentic Workflow Loop

Understanding how Claude Code reads, plans, executes, and verifies. Learning to guide the agent with clear prompts and effective context management. It's like pair programming, except your pair doesn't argue about tabs vs. spaces.

MCP Servers for Data Tools

Connecting Claude Code to databases, cloud consoles, and monitoring tools through the Model Context Protocol (MCP) for direct system interaction. Give Claude Code access to your database. What could go wrong? (Chapter 18 covers what could go wrong.)

Career Advice

After reading this chapter, you will be tempted to use Claude Code for everything -- writing emails, grocery lists, performance reviews. We cannot stop you. We can only document the phenomenon.

Note

The CLAUDE.md file is automatically read by Claude Code when you start a session. Place it at the root of your project to provide persistent context about your architecture, conventions, and common commands. Think of it as an onboarding document that your AI colleague actually reads, unlike every human colleague you've ever had.

markdown
# CLAUDE.md - Project context for Claude Code
# (The only documentation anyone will ever read)

## Project Overview
This is a batch + streaming data pipeline built with:
- Apache Airflow for orchestration
- dbt for SQL transformations
- Apache Kafka for streaming ingestion
- Snowflake as the primary data warehouse
- Terraform for infrastructure
- Prayers for production stability

## Conventions
- SQL style: lowercase keywords, snake_case naming
- Python: black formatter, isort, type hints required
- All DAGs must include SLA callbacks
- dbt models follow staging > intermediate > marts pattern
- Commits must not contain the word "oops"

## Key Commands
- `dbt run --select tag:daily` - Run daily models
- `airflow dags list-import-errors` - Check that all DAGs parse
- `terraform plan -var-file=prod.tfvars` - Preview infra changes
- `claude` - Do all of the above without typing any of the above

That's the preview!

You've reached the end of the free web preview. The remaining 22 chapters cover everything from SQL optimization to real-time streaming to multi-modal pipelines to building a Claude Code-first data platform. All with the same level of humor, practical code examples, and existential dread.

4
Part II: Data Generation & Ingestion

Source System Understanding with Claude Code

Learn to use Claude Code for reverse-engineering source system schemas, profiling data quality from APIs and databases, generating comprehensive data dictionaries, and building connection wrappers that handle authentication, pagination, and rate limiting automatically. Because reading API documentation is a form of suffering that no longer needs to be endured.

Full content available in the PDF

5
Part II: Data Generation & Ingestion

Data Ingestion Pipelines Powered by Claude Code

Covers batch and micro-batch ingestion patterns using Claude Code to generate extractors, schema validators, idempotent loaders, and comprehensive error-handling strategies. Includes patterns for CDC, full-load, and incremental extraction. Finally, a chapter where "just have Claude Code write it" is actual professional advice.

6
Part II: Data Generation & Ingestion

Working with Unstructured Data

PDFs, images, audio, and the dreaded "Excel files from finance." Claude Code helps parse, classify, and structure data that was never meant to be structured. If your data source is a ZIP file of screenshots attached to an email thread, this chapter is for you.

7
Part III: Storage & Transformation

Data Modeling with Claude Code

Dimensional modeling, Data Vault, and wide-table patterns. Claude Code generates star schemas from business requirements, creates ERD documentation, and scaffolds the full model implementation in your warehouse of choice. It even understands slowly changing dimensions, which is more than can be said for most humans.

8
Part III: Storage & Transformation

SQL Generation and Optimization

Claude Code writes SQL that is not only correct but also performant. It analyzes query plans, suggests index strategies, rewrites correlated subqueries, and explains why your 47-line CTE chain is actually doing a full table scan. Prepare to feel personally attacked by an AI's code review.

9
Part III: Storage & Transformation

dbt and Transformation Pipelines

Building production-grade dbt projects with Claude Code. From generating staging models and writing complex window functions to implementing incremental materializations, generic tests, and comprehensive documentation that someone might theoretically read someday.

10
Part III: Storage & Transformation

Data Quality and Testing

Unit testing, integration testing, and data quality frameworks for pipelines. Claude Code generates Great Expectations suites, dbt tests, and anomaly detection rules. Because data quality is like flossing: everyone agrees it's important, nobody does it enough, and Claude Code actually follows through.

11
Part IV: Serving & Analytics

Building Analytical Queries with Claude Code

Building semantic layers, materialized views, and analytics APIs. Claude Code generates BI-ready data marts, creates metric definitions, and builds the serving infrastructure that stakeholders interact with. Stakeholders who will inevitably ask for "just one more column."

12
Part IV: Serving & Analytics

Reverse ETL and Operational Analytics

Pushing transformed data back to operational systems. Claude Code builds sync pipelines to CRMs, marketing platforms, and product databases while handling conflict resolution and idempotency. Yes, we put data into a warehouse just to take it back out. The circle of data life.

13
Part IV: Serving & Analytics

Machine Learning Feature Engineering

Designing and implementing feature stores for ML teams. Claude Code helps define feature schemas, build point-in-time correct training datasets, and create real-time feature serving endpoints. All so your ML team can train a model that predicts something you could have figured out with a SQL query.

14
Part V: Orchestration & Infrastructure

Orchestrating Data Pipelines with Claude Code

Building production Airflow DAGs with Claude Code. Covers dynamic DAG generation, custom operators, SLA monitoring, complex dependency patterns, and migrating from legacy schedulers like cron. The DAGs write themselves now. You just have to explain to management why you still need a team.

15
Part V: Orchestration & Infrastructure

Infrastructure as Code with Claude Code

Terraform, Pulumi, and CloudFormation for data infrastructure. Claude Code generates IaC modules for data warehouses, Kafka clusters, Airflow deployments, and networking configurations. It even handles the Terraform state file, which is more than most humans are willing to do without hazard pay.

16
Part V: Orchestration & Infrastructure

DataOps and CI/CD

Docker, Kubernetes, GitHub Actions, and the complete DataOps toolkit. Claude Code creates optimized Dockerfiles, CI/CD workflows, and deployment strategies tailored for data workloads. Continuous deployment of data pipelines: because breaking things manually was too slow.

17
Part VI: Security, Governance & Ethics

Data Governance with Claude Code

Implementing data catalogs, lineage tracking, and quality frameworks. Claude Code generates data contracts, builds lineage graphs from existing code, and creates governance documentation. It's like having a compliance officer who actually understands SQL.

18
Part VI: Security, Governance & Ethics

Security and Access Control

Encryption, access control, and secrets management for data platforms. Claude Code audits existing security configurations, generates IAM policies, and implements column-level encryption patterns. Also, a reminder to stop putting credentials in your .env file and committing it. You know who you are.

19
Part VI: Security, Governance & Ethics

Ethics of AI in Data Engineering

Responsible use of AI in data engineering. Understanding when to trust AI-generated code, managing intellectual property concerns, avoiding bias in automated decisions, and maintaining human oversight. The chapter where we briefly pretend we have reservations about automating everything.

20
Part VII: Advanced Patterns

Claude Code as a Data Engineering Agent

Autonomous pipeline creation, self-healing data systems, and agent-driven architecture decisions. In this chapter, Claude Code stops being your assistant and starts being your colleague. It has opinions. It will share them. You will agree, because they are usually correct.

21
Part VII: Advanced Patterns

RAG for Data Engineering

Retrieval-Augmented Generation applied to data engineering workflows. Using your own documentation, schemas, and historical queries as context for Claude Code to produce better, more contextual results. Teaching an AI to learn from your mistakes so you never have to learn from them yourself.

22
Part VII: Advanced Patterns

Real-Time Data Processing with Claude Code

Lambda and Kappa architectures, event sourcing, and CQRS patterns. Claude Code helps design and implement architectures that serve both real-time dashboards and batch analytics from unified pipelines. Real-time data: for when your stakeholders need to watch numbers go wrong live instead of finding out the next morning.

23
Part VII: Advanced Patterns

Multi-Modal Data Engineering

Processing images, audio, video, and documents alongside structured data. Claude Code helps build pipelines that handle the full spectrum of data types, because apparently CSVs were not enough and now everything is a data source. Even that whiteboard photo from the all-hands.

24
Part VIII: The Future

Building a Claude Code-First Data Platform

Designing a data platform from scratch with Claude Code at the center of every workflow. Architecture patterns, team structures, and operating models for organizations that have fully embraced the "Just Use Claude Code for Everything" philosophy. The org chart has one box and it says "Claude Code." The humans are in the footnotes.

25
Part VIII: The Future

The Future of Data Engineering with AI

Where data engineering is headed. The convergence of AI and data infrastructure, the evolving role of the data engineer, and what skills matter when your primary tool can already do most of your job. Spoiler: "prompting" is now a marketable skill, and your parents still don't understand what you do for a living.

About This Book

This book was written entirely by Claude Code in a single afternoon. The human "author" was busy getting coffee. We kept their name off the cover out of kindness.

Rather than treating AI assistance as a gimmick bolted onto the final chapter, every chapter integrates Claude Code workflows directly into the data engineering process. You will learn how to think about problems, how to describe them to an AI agent in plain English, and how to take credit for the results in your performance review.

Hands-On Workflows

Every chapter includes real Claude Code sessions showing exact prompts and outputs. Copy them. Paste them. Tell your boss you wrote them from scratch. We won't tell.

Production Patterns

Not toy examples. Full production patterns for ingestion, transformation, orchestration, and infrastructure. The kind of code that survives contact with real data and real stakeholders.

Brutal Honesty

We tell you when Claude Code is amazing, when it's wrong, and when it confidently generates code that will bankrupt your cloud budget. Consider it a public service.