You Don't Actually Know What's In Your Database

February 24, 2026

postgresql, database, data, claude-code

I built the schema. I wrote the migrations. I can tell you what every column means. And then someone asked me a pretty reasonable question about our data, and I had absolutely no idea what the answer was.

Not because it was a complex question. Because I’d never actually looked.

This happens more than I expected. You build the system, you know the structure, but you don’t really know the contents. You have assumptions baked in—this field is always populated, that relationship always holds, values in this column follow a certain pattern. Those assumptions come from how you designed it, not from what’s been accumulating in there since you shipped it.

The gap between those two things is usually interesting.

When You Actually Have to Look

It comes up in a few situations. Sometimes it’s a question from someone outside of engineering that you can’t answer without actually checking. Sometimes it’s prepping for a migration and realizing you have no idea how many rows have null in a column you’re about to make required. Sometimes it’s a weird bug that only affects certain records and you need to figure out what those records have in common.

The common thread: you have a question about your data and you can’t answer it from memory. You have to look.

Where Claude Fits In

The exploratory side of this is a lot of one-off queries. Count this. Group by that. Find me records where this field is null but this other field isn’t. Show me the distribution of values in this column. Find anything that looks off in this date range.
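As a rough illustration of the "distribution of values" kind of query, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in for Postgres so it runs anywhere, and the `orders` table and its `status` column are made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for Postgres; "orders" and "status" are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("completed",), ("completed",), ("Completed",), ("pending",), ("done",), (None,)],
)

# Distribution of values in one column, most common first.
status_counts = conn.execute(
    """
    SELECT status, COUNT(*) AS n
    FROM orders
    GROUP BY status
    ORDER BY n DESC, status
    """
).fetchall()

for status, n in status_counts:
    print(status, n)
```

Even on six rows, the grouping surfaces the kind of thing a schema won't tell you: two spellings of the same status, a synonym, and a null.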

I can write these myself, but it’s slow. Not because the SQL is hard—it’s usually not. It’s that each question spawns three more questions, and you end up writing a bunch of queries in a row trying to understand what you’re looking at.

Describing what I want to know and having Claude write the query is faster. “Show me how many rows in this table have a null here but a status of completed” takes seconds to say and not much longer to get back as SQL. That speed compounds when you’re doing ten or fifteen of these in a row trying to chase something down.
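For a sense of what comes back from a request like that, here is a sketch under some assumptions: the `orders` table, the `shipped_at` column, and the sample rows are all hypothetical, and SQLite is used in place of Postgres only so the example is self-contained.

```python
import sqlite3

# Hypothetical table; SQLite in memory is a stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, shipped_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders (status, shipped_at) VALUES (?, ?)",
    [
        ("completed", "2025-11-02"),
        ("completed", None),
        ("completed", None),
        ("pending", None),  # a null here is expected, so it should not be counted
    ],
)

# "How many rows have a null here but a status of completed?"
(suspect_rows,) = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE shipped_at IS NULL AND status = 'completed'"
).fetchone()

print(suspect_rows)
```

The query itself is one line. The value is in how quickly you can get to the next question once you see the number.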

It helps to give Claude some context up front: the relevant table names, rough column shapes, any quirks you know about (like a column that looks like it should be an enum but is actually free text from an old import). That context cuts down the back-and-forth considerably.

What You Actually Find

This is the part that’s actually interesting.

Take the example of a column that’s marked non-nullable in code but was never enforced at the database level. You might find a chunk of rows with null there from before validation was added. Nothing breaks until you write a query that assumes the invariant holds, and then something does.
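One way to check whether the nulls really do predate validation is to bucket them by creation month. The sketch below assumes a hypothetical `users` table with `email` and `created_at` columns, and runs against in-memory SQLite; in Postgres you would likely group on `date_trunc('month', created_at)` instead of `strftime`.

```python
import sqlite3

# Hypothetical schema and rows; SQLite stands in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO users (email, created_at) VALUES (?, ?)",
    [
        (None, "2023-01-14"),
        (None, "2023-01-30"),
        (None, "2023-02-03"),
        ("a@example.com", "2023-02-20"),
        ("b@example.com", "2023-03-05"),
        ("c@example.com", "2023-04-11"),
    ],
)

# Null count next to total count, per creation month.
nulls_by_month = conn.execute(
    """
    SELECT strftime('%Y-%m', created_at) AS month,
           SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS null_rows,
           COUNT(*) AS total_rows
    FROM users
    GROUP BY month
    ORDER BY month
    """
).fetchall()

for month, null_rows, total_rows in nulls_by_month:
    print(month, null_rows, total_rows)
```

If the nulls stop at a particular month, that is a decent hint about when validation went in. If they keep appearing, something is still writing them.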

Or picture trying to understand usage patterns and finding that a big portion of the records you’re calling “active” haven’t done the thing that made them active in a long time. The aggregate counts look fine. You have to look at the distribution to see the shape of it.
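The difference between the aggregate and the distribution can be sketched like this. Everything here is an assumption for illustration: the `accounts` table, the `is_active` flag, the `last_action_at` column, the bucket boundaries, and the fixed "as of" date that keeps the example repeatable. SQLite's `julianday` is used for the date arithmetic; Postgres would use plain date subtraction.

```python
import sqlite3

AS_OF = "2026-02-24"  # arbitrary fixed date so the buckets don't drift over time

# Hypothetical table; SQLite stands in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, is_active INTEGER, last_action_at TEXT)"
)
conn.executemany(
    "INSERT INTO accounts (is_active, last_action_at) VALUES (?, ?)",
    [
        (1, "2026-02-20"),
        (1, "2026-01-10"),
        (1, "2025-06-01"),
        (1, "2024-03-15"),
        (1, "2023-11-02"),
        (0, "2022-01-01"),
    ],
)

# The aggregate: looks fine on its own.
active_total = conn.execute(
    "SELECT COUNT(*) FROM accounts WHERE is_active = 1"
).fetchone()[0]

# The distribution: how long since each "active" account last did anything.
recency = conn.execute(
    """
    SELECT CASE
             WHEN days_idle <= 30 THEN '0-30 days'
             WHEN days_idle <= 365 THEN '31-365 days'
             ELSE 'over a year'
           END AS bucket,
           COUNT(*) AS n
    FROM (
        SELECT julianday(?) - julianday(last_action_at) AS days_idle
        FROM accounts
        WHERE is_active = 1
    ) AS t
    GROUP BY bucket
    ORDER BY MIN(days_idle)
    """,
    (AS_OF,),
).fetchall()

print(active_total)
for bucket, n in recency:
    print(bucket, n)
```

In this made-up data, five accounts count as active, but only one has done anything in the last month.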

Neither of those is necessarily breaking anything. But they’re the kind of thing you want to know before you write a migration that depends on assumptions about your data—or before you give someone outside of engineering a number that turns out to be wrong.

What Claude Can’t Do

Claude writes the query. It can’t tell you what the result means.

If I run something and get back 40,000 rows with a null where I expected a value, Claude can’t tell me if that’s a bug, a legacy artifact, or expected behavior for a certain class of records. That interpretation is mine. I know the domain. I know which data came from a bulk import years ago and is always a little weird. I know the business rules that determine whether something’s actually wrong.

The pattern that works: Claude generates the query, I run it, I bring the result back and explain what I’m seeing, and we figure out what to look at next. It’s a loop. The queries come fast. The thinking is still on me.

Worth Doing Before Anything Consequential

Most of the time when I do this kind of exploration, I find at least one thing I didn’t expect. Not always a problem. Sometimes it’s just interesting. But the assumption that I know what’s in my database because I designed it has been wrong enough times that I stopped trusting it.

Now before I do anything that matters (a migration, a new query in a hot path, a feature that depends on a data invariant I haven’t verified), I spend some time looking first. Claude makes the query-writing part fast enough that there’s no real reason not to.
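A pre-flight check along these lines might look like the sketch below. It is one possible shape, not a prescribed one: `find_violations` is a hypothetical helper, the `invoices` table and the three invariants are invented, and SQLite again stands in for Postgres. Each check is a query that counts rows breaking an assumption, so a clean result is an empty dict.

```python
import sqlite3


def find_violations(conn, checks):
    """Run each counting query and return the checks whose count is non-zero."""
    violations = {}
    for name, sql in checks.items():
        (count,) = conn.execute(sql).fetchone()
        if count:
            violations[name] = count
    return violations


# Hypothetical table and rows for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer_id INTEGER, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO invoices (customer_id, amount_cents) VALUES (?, ?)",
    [(1, 500), (2, 1200), (None, 300), (3, -50)],
)

# Each query counts rows that break an assumed invariant.
checks = {
    "customer_id is never null": (
        "SELECT COUNT(*) FROM invoices WHERE customer_id IS NULL"
    ),
    "amount_cents is never negative": (
        "SELECT COUNT(*) FROM invoices WHERE amount_cents < 0"
    ),
    "id is unique": (
        "SELECT COUNT(*) FROM "
        "(SELECT id FROM invoices GROUP BY id HAVING COUNT(*) > 1) AS d"
    ),
}

violations = find_violations(conn, checks)
print(violations)
```

Whether a non-zero count is a bug or a legacy artifact is still a judgment call. The check only tells you the assumption doesn't hold yet.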

The data’s been in there the whole time. You just haven’t read it.