Database Infrastructure
Every real app eventually needs to remember things — users, orders, messages, settings. That memory lives in a database. For a beginner, "database" can feel like a black box guarded by experts. It isn't. This chapter demystifies the data layer so you can make good choices and direct AI to do the heavy lifting safely.
The Three Shapes of Data Storage
Most databases fall into three families. You don't need to memorize products — you need to recognize which shape fits your problem.
| Type | Examples | Best for | Trade-offs |
|---|---|---|---|
| Relational (SQL) | PostgreSQL, MySQL, SQLite | Structured data with relationships; anything money- or accuracy-critical | You must design a schema up front; scaling writes very large is harder |
| Document / NoSQL | MongoDB, Firestore, DynamoDB | Flexible or nested data, fast iteration, varied record shapes | Easy to create inconsistent data; relationships and reporting get awkward |
| Key-Value | Redis, Memcached, Cloudflare KV | Caching, sessions, counters, fast lookups by a known key | No querying by content; usually not your source of truth |
A good default for 90% of projects: start with a relational database (Postgres). It is mature, predictable, enforces structure, and handles the relationships real apps inevitably grow. Reach for document stores when your data is genuinely shapeless, and key-value stores as a supporting layer (caching), not your primary store.
Managed vs Self-Hosted
You can run a database yourself on a server, or rent a managed one (Supabase, Neon, PlanetScale, RDS, Cloudflare D1) where the provider handles backups, patching, and uptime.
For vibe coding, always start managed. Self-hosting a database means you are now responsible for backups, security updates, disk space, and 3 a.m. recovery. That is real operations work that distracts from building. Managed services cost a little money and save enormous time and risk. Graduate to self-hosting only when you have a concrete reason and the skills to back it up.
Schema Design Basics
A schema is the blueprint of your data: what tables exist, what columns they have, and how they relate. Good schema design is mostly common sense plus a few habits.
- One table per "thing." Users, posts, orders — each gets its own table.
- Every row needs a unique ID (a primary key).
- Link tables with foreign keys, not by copying data around.
- Pick honest types. Money is a
decimal, not afloat. Dates are timestamps, not text. - Don't duplicate. If a user's email lives in five tables, you'll eventually have five different emails.
Here is a simple schema for a blog, the kind AI will generate for you:
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email TEXT NOT NULL UNIQUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE posts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
author_id UUID NOT NULL REFERENCES users(id),
title TEXT NOT NULL,
body TEXT NOT NULL,
published BOOLEAN NOT NULL DEFAULT false,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_posts_author ON posts(author_id);
Notice the REFERENCES (a foreign key tying a post to a real user) and the NOT NULL and UNIQUE constraints. These are guardrails the database enforces for you, so bad data can't sneak in even when your code has a bug.
Migrations: Changing the Schema Safely
Your schema will change — you'll add a column, rename a table, add an index. A migration is a versioned, repeatable script that describes one such change. Migrations matter because the schema in your laptop, your teammate's laptop, and production must stay identical.
The rules that keep migrations safe:
- Never edit the database by hand in production. Write a migration.
- Migrations move forward. Each one is a new file; you don't rewrite old ones that already ran.
- Make destructive changes carefully. Dropping a column or table deletes data permanently. Have a backup first.
- Additive changes are safest. Adding a nullable column rarely breaks anything; renaming or removing one can break running code.
Tools like Prisma, Drizzle, or Rails migrations generate and track these for you. Let them.
Indexing
An index is like the index at the back of a book: it lets the database jump straight to matching rows instead of scanning every one. Without indexes, queries get slower as data grows — fine at 100 rows, painful at 1,000,000.
The practical guidance:
- Index columns you frequently filter or join on (e.g.
author_id,email). - Primary keys are indexed automatically.
- Don't index everything — each index speeds up reads but slows down writes and uses space.
- Add indexes when you see a slow query, not preemptively for every column.
Caching
A cache stores the answer to an expensive question so you don't have to ask again. If your homepage runs the same heavy query for every visitor, cache the result in Redis for 60 seconds and serve it instantly.
Caching is a performance optimization, not a source of truth. The classic beginner trap is forgetting cache invalidation — the cached copy goes stale and users see old data. Rule of thumb: cache only when you've measured a real bottleneck, and always have a clear rule for when the cached value expires or gets refreshed.
Queues
A queue lets you say "do this later" instead of "do this now." When a user signs up, you don't want them waiting while you send a welcome email and resize their avatar. Push those jobs onto a queue (RabbitMQ, BullMQ, Cloudflare Queues, SQS); a background worker picks them up. The user gets an instant response.
Use a queue for any task that is slow, can fail and retry, or doesn't need to finish before you respond to the user.
Common Beginner Mistakes
- Reaching for NoSQL because it sounds modern. Most apps have relationships; SQL handles them better.
- Storing money as floating-point.
0.1 + 0.2is famously not0.3. Use decimal types. - No backups. Managed services usually back up automatically — confirm it, don't assume it.
- Putting secrets or huge files in the database. Files belong in object storage (S3, R2); the database stores the link.
- Letting AI write raw queries with user input glued in. This causes SQL injection. Use parameterized queries.
Directing AI to Build Your Data Layer
AI is excellent at schema design and query writing — if you give it the constraints to design against. Don't say "make a database." Describe the domain, the relationships, and your non-negotiables.
A strong prompt looks like this:
Design a PostgreSQL schema for a task-management app.
Entities and relationships:
- A user can own many projects.
- A project has many tasks.
- A task belongs to one project, has a status
(todo / doing / done), and an optional due date.
Requirements:
- Use UUID primary keys and created_at timestamps.
- Enforce relationships with foreign keys.
- Add indexes for the columns we'll filter on.
- Give me the schema as SQL, plus a forward migration file.
- Explain any trade-offs you made.
Then review the output with the checklist from this chapter: honest types, foreign keys and constraints present, sensible indexes, no duplicated data. When you ask for queries, explicitly require parameterized queries so user input is never concatenated into SQL. And for any migration that drops or renames something, ask AI to call it out so you can back up first.
The database is the part of your app where mistakes are hardest to undo — data, once corrupted or lost, doesn't always come back. That's exactly why these fundamentals are worth knowing: not so you can write every line yourself, but so you can direct AI confidently and catch the dangerous moves before they ship.