Data Constellation

Agile Information Management and Design

Out of Vietnam, Part 2

Posted by Clifford Heath on November 15, 2007 at 10:03 PM

I talked about how we build systems by composing them from elements, not by decomposing monolithic “problem statements”. The elements always depict either states, or transitions between states. These two angles are the information perspective and the process perspective. Process steps always transit between legal states, so the set of legal states must be defined first. It’s not that information is pre-eminent, but it does tend to lead rather than lag the rest of the design. So here I’m focusing on the information part of our design.

Now, we need to build up an overall aggregate picture of what things our system can describe, and what it needs to know about those things. The picture is made of many small elements, and many small constraints on the ways they interact. The sum of these small things forms our conceptual model - they reflect the way we and our clients think about their problem, not its solution. To store them however, we need to group them together for efficient management. That’s what we’re doing when we’re building a database design - writing down all the things that the system needs to know, in a way that will be efficient to manage. There are two goals here: manage all the elements and their interactions without losing track of any, and produce an aggregate structure that is efficient. These goals work against one another.

When we’re done, if we’ve done a good job, we have a normalised database design. “Normalised” basically just means that it provides only one way to represent any of the elementary facts, so that you can’t have two versions that disagree. But there’s another property of normalised data that causes problems: any one “thing” will only have one record, and all facts about that thing for which there is only one value at a given time are stored in that record. This aggregation is a fine principle for creating efficient physical storage structures, but the aggregation leaks into our code.

When we query the data using SQL, there’s one way of accessing a fact that has only one value for each thing - the column - and there’s another completely different and somewhat difficult way of accessing facts that have more than one value for each thing - the join. SQL forces the direct use of the physical database model, while at the same time hiding the true domain model which is present in the elementary form. This prevents the domain expert from properly engaging in verifying the model and ensures communication problems because of the translation and interpretation required.

Being bound to the physical model also tragically limits the agility (evolution) of applications because the physical model is always more unstable than the conceptual model. The mere number of values (or other things) related to a thing in a given fact relationship should be a minor detail, yet it completely controls the physical model. When a requirement changes in tiny ways, we can sometimes end up needing to do a major restructure of the database, potentially across many tables.

Imagine you have a table of users, and one column is the “given names” column. Your client now needs to store information about the reason your parents gave you each name… and all of a sudden you need to move the contents of that column out into a new “given names” table. Every query that fetched a user will probably have also fetched their given names, and so now needs to be rewritten. All we did was add a new fact to an otherwise complete model - why does all our code need to be checked and maybe rewritten? Ok, perhaps “given names” is an uncommon example, but this sort of thing occurs so often in relational databases that for more than twenty years, it has its own name: attribute migration. It’s an example of just one way the addition of a small item to our elementary model causes big impacts on out aggregate design.

So while relational databases are one of the preeminent achievements of computer science, they must move beyond requiring direct dependence on the physical storage structures. SQL is the problem here, because of the gross difference between accessing a column (single value) and another table via a join.

Replacing SQL by a language that has this property of uniformity of reference must be the top priority if the industry is to more forward in solving this critical problem. There is a way out of Vietnam… but only after we replace SQL. Tune in next time for a first peek at the language that can do this, the Constellation Query Language.

How to ruin a Rails project

Posted by Clifford Heath on October 18, 2007 at 10:37 PM

There are lots of ways to ruin any project. I’ve seen most of them over the last few decades, but this year I’ve been called in to salvage a series of Rails projects that were, well, off the rails, in some ways that maybe special to Rails. So I’ll try to steer clear of the ordinary foul-ups, and focus on the ones that Rails seems to attract.

  1. We have four months before the website is needed, and Rails is so productive that we don’t need to get started yet. We can deliver the specifications in a couple of months or so, and everyone will be ready to knock out the website in two weeks. Right. Let me know how that goes, ok?

  2. Databases suck, no-one wants to write SQL, and I can’t do all my validations in it anyhow, so why should I do any? We’ll do things the Rails Way and put all that stuff in the code where it’s easy. After all, who needs a uniqueness constraint if the code always checks for an existing record before inserting a new one, right? Nothing can go wrong with that can it?

  3. Indexes? Add them after users complain that the site is too slow - even if it was obvious after a moment’s thought that they were always going to be needed. MySQL is so bad at optimizing queries that it might as well be forced to do full table scans it was probably going to do anyway. And besides, it worked just fine with the 5 test records I put in the test fixtures manually.

  4. Performance doesn’t matter, so if the site is too slow, well, at least it was quick to develop. And when the client urgently needs a report that should take five seconds to produce, but because it’s a five-way join and you didn’t add any indexes it times out in Apache’s mod_proxy after the regulation five minutes, well, that’s why you turn your mobile phone off at night and ensure you can never be found online, right? That way you can get a good night’s sleep while the client is tearing out his hair and losing his business.

  5. Foreign keys. You don’t need the database to enforce them if you get the code right. No need to actually take a look at the database from time to time to see whether the invariants your code is supposed to enforce are actually held. So when you later make administrative changes and delete records that other ones refer to, well, ActiveRecord is good about providing a nil that should do nothing, and if not, well, there’s always an exception catcher to tell you your mistake.

  6. Oh, yes, exceptions. The Rails log is full of them, but they’re mostly from Chinese hackers trying to find hidden features, or irrelevant little deadlocks or races that made some user redo their work. No big deal, it only happens occasionally. No need to deploy one of the nice plugins that send you email when you get an exception, of course. That would just mean you’d have to go and find out why it happened, and Rails exists to reduce boring work.

  7. If it works for one user, it’ll work for hundreds, won’t it? Transactions and locks are for banks, not for websites. And two-phase commit, that’s engagement & marriage isn’t it, not something you’d use in a payment protocol? Oh, and I sprinkled a few magic Model.transaction {} blocks around the place, and they must work, because people who should understand such things said they work.

  8. Release management is for wimps. Just use the SVN trunk, and when you check in code, check it out on the test server, let the client look it over, then deploy it to production. No need even to log in to do that, just cap deploy - you can do it without getting out of your pyjamas. All your developers are demigods who never make mistakes anyhow, so if one on one side of the city deploys the other one’s code into production without even Skyping or picking up the phone, there won’t be any unforeseen interactions, will there now?

  9. It was so easy to write, any fool can see it’s correct. TDD is fine for some slow thinkers, and we’re glad Rails makes it easy for them, but seriously, do you expect me to write 100 lines of code to test 50, when I can see perfectly well that there aren’t any errors in it? And besides, if there is an error, it’ll be a one-line fix. Barely even need to finish my latte first, it’ll be fixed in a moment. Not necessarily the moment before it makes the site melt down, but that’s what backups are for, right?

  10. Hmm, backups. That would have been a good idea. That would have helped when, after discovering we hadn’t planned far enough ahead to see the one feature that was going to make all the difference on the big day, we let folk type data directly into the database using an unvalidated, unlogged administration feature. Pity they deleted the entire contents of a critical table… And even then, we might have been able to cobble together a script to reconstruct the transactions that were lost, except that the Rails log only lists the form parameters, not the saved session variables that form the context in which those parameters were relevant.

Discipline? Who needs discipline or forethought when you’re agile?