We're going through the guts of RescueTime this week, doing our monthly optimization of the database and of all the queries we use to generate and visualize our time management data.
One thing we turned up: whenever a user's client pushes data to our API servers, this is the first thing that happens:
User Load (0.044165) SELECT * FROM users WHERE (email = 'blah@blah.com') LIMIT 1
User Columns (0.023302) SHOW FIELDS FROM users
User Load (0.612006) SELECT * FROM users WHERE (LOWER(users.email) = 'blah@blah.com' AND users.id <> 323224212) LIMIT 1
Notice the 0.6-second SELECT. It happens when we update the last_data_sent field on the user object and save it with ActiveRecord's save! method: save! validates the uniqueness of the user's email address (in case it has changed), even though it clearly hasn't. Why isn't there a dirty flag there? Anyway, that's beside the point. This is only going to get worse as our users table grows.
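The dirty flag wished for above can be sketched in plain Ruby: the record remembers the email it was loaded with and skips the expensive uniqueness lookup unless the value actually differs. `UserRecord` and its `other_emails` list are hypothetical stand-ins for the model and the rest of the users table, not RescueTime's code. In a real app on Rails 2.1 or later (which added dirty tracking, including `email_changed?`), adding `:if => :email_changed?` to the uniqueness validation should have a similar effect, assuming that Rails version.

```ruby
# Hypothetical sketch: only run the uniqueness lookup when the email changed.
class UserRecord
  attr_accessor :email, :last_data_sent
  attr_reader :lookups

  def initialize(email, other_emails)
    @email = email
    @loaded_email = email          # value as read from the database
    @other_emails = other_emails   # stand-in for every *other* row in users
    @lookups = 0                   # how many expensive lookups we paid for
  end

  # Dirty flag: has the email changed since it was loaded (or last saved)?
  def email_changed?
    @email != @loaded_email
  end

  # Stands in for: SELECT ... WHERE LOWER(email) = ? AND id <> ? LIMIT 1
  def email_taken?
    @lookups += 1
    @other_emails.any? { |other| other.downcase == @email.downcase }
  end

  def save!
    if email_changed? && email_taken?
      raise ArgumentError, "email has already been taken"
    end
    @loaded_email = @email # a real model would write the row here
    true
  end
end

record = UserRecord.new("blah@blah.com", ["someone@example.com"])
record.last_data_sent = Time.now
record.save!   # email untouched, so no lookup
record.lookups # => 0
```

Touching only last_data_sent never triggers the lookup; assigning a different email does, exactly once per save.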
The solution is to do this:
# One UPDATE; no SELECT, no validations (Rails 2.x form: update_all(updates, conditions))
User.update_all(["last_data_sent = ?", Time.now], ["id = ?", 323224212])
which skips validations entirely (we don't need them here) and is appropriately fast. This is an example of how Rails is super great for getting things up and running quickly, but forgetting to go back and optimize can really hurt you. In our case, we're shaving more than half a second off every API call.
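To see why update_all is cheap, it helps to look at what the call boils down to: one UPDATE statement, with no SELECT in front of it. The toy helpers below (`quote_value`, `interpolate`, `build_update_all`, all hypothetical, not ActiveRecord's code) expand the Rails 2.x array form `["column = ?", value]` into that statement. Real quoting is done by the database adapter and differs between databases; the sample timestamp is an arbitrary example value.

```ruby
# Hypothetical helpers (NOT ActiveRecord's implementation) that show the single
# UPDATE behind User.update_all(["last_data_sent = ?", t], ["id = ?", id]).
def quote_value(value)
  case value
  when nil then "NULL"
  when Integer then value.to_s
  when Time then "'#{value.strftime('%Y-%m-%d %H:%M:%S')}'"
  else "'#{value.to_s.gsub("'", "''")}'" # escape single quotes by doubling
  end
end

# Replace each ? placeholder, left to right, with the next quoted value.
def interpolate(fragment, *values)
  unless fragment.count("?") == values.size
    raise ArgumentError, "expected #{fragment.count('?')} values, got #{values.size}"
  end
  fragment.gsub("?") { quote_value(values.shift) }
end

def build_update_all(table, updates, conditions)
  "UPDATE #{table} SET #{interpolate(*updates)} WHERE #{interpolate(*conditions)}"
end

sql = build_update_all("users",
                       ["last_data_sent = ?", Time.utc(2008, 6, 1, 12, 0, 0)],
                       ["id = ?", 323224212])
puts sql
# UPDATE users SET last_data_sent = '2008-06-01 12:00:00' WHERE id = 323224212
```

Because the WHERE clause is on the primary key, the database can find the row directly; there is no email lookup at all.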
We've also already made changes to our database that have cut its size from 20GB to 2GB, but I won't go into those here.
- Location: Mountain View


Comments
Now imagine that validation happening in an entirely different service, via an API call out to it... Dirty flags get added pretty quick-like, and validations become a bit more complex.
The built-in validations are nice shortcuts, but it's not good to rely on them too much. There are circumstances where you _would_ want to re-validate, even if that particular field hasn't changed, but those are best handled through a custom validator.
That said, the specific query above _should_ be fast (presuming an index on users.email). The likely reason it isn't is the lack of an index on the _function_ 'lower(users.email)'. If your database doesn't support indexes on functions (as I believe MySQL doesn't), you could consider storing your emails in lowercase. I bet the lookup would be faster against a plain indexed column.
Admittedly, that's still not as fast as skipping the check entirely, but it should be on the order of the first query.
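The lowercase-storage suggestion can be illustrated in memory. In this sketch a Ruby hash plays the role of an index on a lowercased users.email column (a real database would typically use a B-tree, not a hash), and `EmailStore` with its methods is hypothetical. It contrasts evaluating LOWER() row by row with a single keyed lookup, assuming emails are unique, as the validation enforces.

```ruby
# Hypothetical in-memory stand-in for the users table. Emails are lowercased on
# write, and @index plays the role of an index on users.email.
class EmailStore
  attr_reader :rows_examined

  def initialize
    @rows = {}   # id => stored (lowercased) email
    @index = {}  # lowercased email => id (assumes emails are unique)
    @rows_examined = 0
  end

  def self.normalize(email)
    email.downcase
  end

  def write(id, email)
    @index.delete(@rows[id]) if @rows.key?(id) # drop the stale index entry
    normalized = self.class.normalize(email)
    @rows[id] = normalized
    @index[normalized] = id
  end

  # Like WHERE LOWER(email) = ? AND id <> ? with no usable index:
  # the function is applied to each row until a match is found.
  def taken_by_scan?(email, except_id)
    target = self.class.normalize(email)
    @rows.any? do |id, stored|
      @rows_examined += 1
      id != except_id && stored.downcase == target
    end
  end

  # Emails are already lowercase, so one keyed lookup answers the question.
  def taken_by_index?(email, except_id)
    owner = @index[self.class.normalize(email)]
    !owner.nil? && owner != except_id
  end
end
```

With 1,000 rows, a scan for the last-inserted address examines all 1,000 of them, while the indexed check is a single lookup; the cost of the scan grows with the table, which matches the "worse as we grow" concern in the post.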
Hmmm...if you prefix your validation with 'dirty_', and have a method_missing that catches those and sets up the validation to only happen if the field has had an assignment to it... I smell a 'dirty_validations' plugin opportunity. :)
-- Morgan
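Morgan's plugin idea can be prototyped without Rails. The sketch below builds a tiny, entirely hypothetical validation DSL (`TinyValidations`, `tracked_attr`, `from_row`, and a `validates_each` loosely modeled on ActiveRecord's, but not its real implementation or signature). A class-level `method_missing` catches any `dirty_`-prefixed macro, strips the prefix, and re-registers the validation with an `:if` condition that only fires when the attribute was assigned since the last successful save.

```ruby
require "set"

# Toy validation DSL, loosely modeled on ActiveRecord's validates_each but NOT
# its real implementation. Everything here is hypothetical.
module TinyValidations
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def validations
      @validations ||= []
    end

    # Build a record as if loaded from the database: nothing assigned yet.
    def from_row(attrs)
      record = allocate
      attrs.each { |name, value| record.instance_variable_set("@#{name}", value) }
      record
    end

    # Reader plus a writer that records the attribute as assigned.
    def tracked_attr(*names)
      names.each do |name|
        attr_reader name
        define_method("#{name}=") do |value|
          assigned_attributes << name
          instance_variable_set("@#{name}", value)
        end
      end
    end

    # The block returns an error message, or nil when the value is valid.
    def validates_each(attribute, options = {}, &check)
      validations << { attribute: attribute, if: options[:if], check: check }
    end

    # dirty_<macro>(attr, ...) => <macro>(attr, ..., if: <attr was assigned>)
    def method_missing(name, *args, &block)
      macro = name.to_s.delete_prefix("dirty_")
      return super if macro == name.to_s || !respond_to?(macro)

      attribute = args.first
      options = args.last.is_a?(Hash) ? args.pop : {}
      condition = ->(record) { record.assigned?(attribute) }
      public_send(macro, *args, options.merge(if: condition), &block)
    end

    def respond_to_missing?(name, include_private = false)
      macro = name.to_s.delete_prefix("dirty_")
      (macro != name.to_s && respond_to?(macro)) || super
    end
  end

  def assigned_attributes
    @assigned_attributes ||= Set.new
  end

  def assigned?(attribute)
    assigned_attributes.include?(attribute)
  end

  def valid?
    @errors = self.class.validations.filter_map do |v|
      next if v[:if] && !v[:if].call(self)
      v[:check].call(self, public_send(v[:attribute]))
    end
    @errors.empty?
  end

  def save!
    raise ArgumentError, @errors.join(", ") unless valid?
    assigned_attributes.clear # a real model would write the row here
    true
  end
end

class User
  include TinyValidations
  tracked_attr :email, :last_data_sent

  TAKEN_EMAILS = ["b@example.com"].freeze # stand-in for other users' rows

  class << self
    attr_accessor :lookups # counts the expensive uniqueness checks
  end
  self.lookups = 0

  # Only runs when email was assigned since the last successful save.
  dirty_validates_each(:email) do |_user, value|
    User.lookups += 1
    "email has already been taken" if TAKEN_EMAILS.include?(value.to_s.downcase)
  end
end
```

Loading a user with `User.from_row(email: "a@example.com")`, setting last_data_sent, and calling save! runs no email check at all. Note that tracking *assignment*, as Morgan describes, means re-assigning the same email still triggers the check; comparing against the loaded value instead would skip it.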
Funny, I was thinking the same thing!
Somebody please hand me a RAILS FOR DUMMIES book...