Why You Should Replace ENUM With Something Else

« »
This Series: The Beginner Pattern

One of the most hotly contested points of my article on database design was the suggestion that developers drop the use of ENUM and use something else instead. Lots of people argued in favor of ENUM; however, there are several good reasons why developers should reconsider ENUM and use it sparingly.

There are three core reasons why ENUM is a data type that should be reconsidered.

ENUM requires a rebuild of the table when adding a value to the middle of the set.
While its true that adding an ENUM to the end of the set doesn’t require a rebuild, often this is impractical. Adding a new value to the ENUM definition will require MySQL to rebuild the entire table – less than optimal for large tables. The time required will depend on how large your tables are but millions of rows will take considerable amounts of time.

ENUM values are ordered in the order they’re added to the database
If you’ve ever done an ORDER BY on an ENUM column, you’ll notice that MySQL organizes them via the order they were added to the ENUM, rather than alphabetically or numerically. This ties into the first point, because if you want to order values in, say, alphabetical order, you have to reorder the ENUM in alphabetical order, and this results in a rebuilding of the table by MySQL.

It’s also worth nothing that for developers on your team, they may be extremely frustrated when they discover that the column is ordered “incorrectly”; they might expect it to be ordered alphabetically and since it’s not, they will try and figure out why. They may not know the database as thoroughly as the database administrator or developer who put it together, and thus might not know about the ENUM fields.

ENUM values do in the database what should be done in the model.
Contrary to what many people believe, the database is not the model. The model is the domain logic which takes the raw data and turns it into data that the application uses. For example, you may store 0.25 in the database but convert that to 25% when you display it in your view; it’s not stored as 25% in the database, though.

The model should be enforcing the constraints on the data types and values going into your database, not the database itself. The database is simply the storage location for the data the model needs. The same logic tends to apply for triggers and stored procedures, limiting their authorship to manipulating data in the database when doing so via the model would be too time-consuming or resource intensive (the database is generally going to be faster at manipulating database data).

I’ve seen ORMs that translate ENUM and SET columns into VARCHAR columns automatically for you. According to Rya Martinsen, Doctrine is one such ORM.

Cases For Using ENUM
When my last blog post published, one of my good friends Eli White pointed out that ENUM could be useful for columns that had data that would always fit into one particular set of values. For example, he said gender would be one such field that might be served well with an ENUM. And if anyone should know about database types, it’s Eli, since he’s a former Digg employee.

Summary
The bottom line is that ENUM has its place, but should be used sparingly. The model should enforce the constraints, not the database; the model should handle interpreting raw data into useful information for your views, not the database.

Brandon Savage is the author of Mastering Object Oriented PHP and Practical Design Patterns in PHP

Posted on 11/25/2009 at 1:00 am
Categories: Technology, Best Practices, Databases
Tags: , , ,

Hari K T (@harikt) wrote at 11/25/2009 1:42 am:

Wow Thanks .

I learned something new . Great post as usual .

Mazharul Anwar (@mazharul_anwar) wrote at 11/25/2009 1:47 am:

nice post. Thanks for this informative post.

Jan! wrote at 11/25/2009 2:41 am:

I disagree.

“ENUM requires a rebuild of the table when adding a value to the middle of the set.”
Randomly inserting new values into an ENUM, instead of appending, is extremely poor practice, anyway. It is the same as changing a bitmask. All compiled code that uses that bitmask would have to be recompiled. If you add to an ENUM or a bitmask, what used to work, will still work.

For example, consider PHP’s built-in error flags: E_ERROR, E_WARNING, … When they added E_DEPRECATED, do you think they even considered using an existing value?

“ENUM values are ordered in the order they’re added to the database”
You can get around that: ORDER BY CONCAT(“”, enum_column_here). However, I think that comes at the expense of using filesort.

“ENUM values do in the database what should be done in the model.”
Data types exist for a reason. Applying your reasoning, everything should be a string and be handled by the application. Internally, it’s all just bytes, anyway. Why would you use an INT but not an ENUM?

Ben wrote at 11/25/2009 5:16 am:

You make a much better argument in this post than you did in your previous post.

I think the main reason not to use enums – which you didn’t mention – is that they are a MySQL custom data type and therefore aren’t portable.

Also, I was under the impression that any ‘ALTER TABLE’ statement requires the entire table to be rebuilt? I may be wrong about that, if so, please correct me. However, I agree with Ian! that you shouldn’t add a value to the end.

I do still think there is a bit of hair splitting here. There are far more important decisions to be made when designing a database, and I think this is something that is unlikely to cause major problems for the vast majority of implementations.

Bruce Weirdan wrote at 11/25/2009 5:19 am:

“The model is the domain logic which takes the raw data and turns it into data that the application uses. For example, you may store 0.25 in the database but convert that to 25% when you display it in your view; it’s not stored as 25% in the database, though.”
Bad example – number formatting is a task for view, not the model.

Brandon Savage (@brandonsavage) wrote at 11/25/2009 6:24 am:

Jan!, to address your last point, making use of the integer versus varchar data types makes doing math in the database easier, and using the proper data type saves bytes and more importantly, time. MySQL will switch types for you (e.g. if you have an integer but insert a value surrounded by quotes it will cast it to an integer), but this takes the system time.

Bruce, there’s an argument to be made either way. Storing .25 in the database is proper for doing math, but the data must be manipulated in two ways in order to be displayed: it must be multiplied by 100 (to make it a whole number), and then it must be given a % symbol on the end. I think the view can handle adding the percent symbol, but the manipulation of the data should take place in the model – even if it’s just as simple as multiplying the value by 100. Still, I think we’re arguing semantics here, rather than addressing the core issue, which is that the database stores data, and the model manipulates it.

FractalizeR wrote at 11/25/2009 6:34 am:

I disagree.

“ENUM requires a rebuild of the table when adding a value to the middle of the set.”
I see no reason of adding an enum value to the middle of the set. Nothing prevents you from adding it to the end.

“ENUM values are ordered in the order they’re added to the database”
I don’t see any reason of ordering a table by ENUM also. ENUM is an enumeration and if you need to sort a table by it, that means, you chose a bad type for this value.

“ENUM values do in the database what should be done in the model.”
“The model should be enforcing the constraints on the data types and values going into your database, not the database itself”

I just disagree. If we take this point, we don’t need, for example, foreign keys also because this database restriction can be applied by model.

Les wrote at 11/25/2009 7:45 am:

> … developers drop the use of ENUM and use something else
> instead …

Such as?

There is no other reasonable alternative – that is why we have ENUM and SET in the first -beep- place.

Been using them for a number of years and I will continue to use them. This whole arguement about using them or not comes not from a database design point of view but an ORM point of view.

You can shove your ORMs where the sun ain’t going to shine if you ask me. I’m not going to use something else that would break the rules of database normalisation.

Most of the time I agree with you but this arguement I can’t.

Krzysztof Kotowicz wrote at 11/25/2009 7:46 am:

I totally dissagree, Brandon. As Ben pointed out, one could argue against ENUMs because they are MySQL-specific, but your points are just incorrect in my opinion.

First agrument (adding to middle of the set) – who would want that? It’s an enumeration, you don’t sort by it. Exemplary enumeration is CLUB, DIAMOND, HEART, SPADE (thats’s from http://en.wikipedia.org/wiki/Enumerated_type) – is diamond greater than club? You can’t tell so you shouldn’t order by it (unless you only want to show clubs next to clubs, spades next to spades etc.)

Second – “ENUM values are ordered in the order they’re added to the database”. Yes, they are. Just like AUTO_INCREMENT fields. If you used (instead of ENUMs) an additional dictionary table
and put a foreign key into the main table, it would still bite you – primary key in the dictionary table would most likely be auto incremented and you couldn’t add a value to the middle of the set or change the order without touching the main table.

If you wan’t to manipulate the order of ENUMs – you are using them wrong, it’s not the case against ENUMs.

‘ENUM values do in the database what should be done in the model’ – I’m sorry but I just dissagree, database IS a part of a model – it’s been built for exactly this purpose. You have data types, foreign keys, constraints, triggers, stored procedures, indexes – all of this is to manage persistent data. If you want to throw it out the window and try to enforce this in application layer only, you will end up writing SQL logic in PHP. Also, a single PHP application might not be the only consumer of this data – what if there is a .NET application, a Perl script or whatever which also accesses the data in your DB? You need to port the same constraints to these clients. And maintain in for the future.

My experience tells me that what could go into database layer, probably should go there. It’s compiled, optimized, it keeps the data integrity at the lowest possible level.

Of course I’m not saying you should transfer all business logic to DB or totally skip the validation in application level. For sure though, constraints must be managed by the database or you can easily end up with inconsistent data.

Pádraic Brady (@padraicb) wrote at 11/25/2009 8:20 am:

@Krzysztof The database is NOT part of the model. It is a storage backend for the model. You can use constraints in the database for data integrity, but that is the storage design. The Model will perform its own high level checking and normalise retrieved data to an understandable form.

As for PHP/.NET etc operating on the same database – the reality is their Model still needs to be developed anyway. If there’s duplication there, someone needs to ask why there are two languages in use creating this consequence. (Usually it’s for obvious reasons – PHP as a frontend, something else in the back – they will duplicate Models somewhat if operating on similar data).

A lot of this is why ENUMs are somewhat sensitive. They are a database construct – not a programming one. Since many of us use ORMs and design Models from a top-down perspective, they tend not to get much traction everywhere. Personally I use them only rarely – my Models will handle Enum like properties internally and just use a varchar to store the value. Enums are by no means a necessity in modelling. Their main justification would lie in performance – assuming such is measurable. Most apps won’t bother…

Note: In case nobody has noticed, when using an ORM most of this is perfectly pointless ;). If you do not use an ORM and write your own SQL, well I’m sure Enums are great at that level. Just be aware that it all depends at what level you are designing from.

Vid Luther (@vidluther) wrote at 11/25/2009 9:47 am:

I think the post should be renamed to “How to use ENUMs the wrong way”..

1. MySQL 5.4/6.0 will have online capabilities, so the issue of ALTER tables taking too long will go away, an argument against something, should either be prefaced by a version #, or should stand the test of time.

2. Adding ENUM’s to the middle is indicative of a flaw in logic on the DBA/Programmers part, not the data type.

3. Sometimes, you do need the database to hande to handle constraints, foreign keys, enums etc, because you can not guarantee that the only way to access the database is going to be through your application.

4. The only reason not to use ENUM, is if you think you’ll be changing databases, and even then, you don’t switch your PDO DSN, you think through a lot of stuff before you make the change, and you should have a migration path.

Jase (@jase.com) wrote at 11/25/2009 9:47 am:

Worrying about how enums sort should be a non-issue, since you should NEVER SORT IN MySQL. This is because ORDER BY will usually cause the query to dump to disk, especially if you are working with millions of records. Instead, set a high enough memory_limit, dump data with mysql_buffered_query, and dump into multi-dimensional arrays so they can be more easily sorted in PHP. Much, much faster.

With that in mind, enums are great if you use them for a set of values that is too small for a separate table and likely to be used too widely to be enforced in logic. Inserting new enum values in the middle may rebuild the table, and with millions of records this could pose a small problem, but the real problem would be if you were using lots of enums (they allow 65,000+ values) and rebuilding millions of records.

The truly funky thing with enums are that they are basically 1-based arrays. If you define ENUM(‘0’, ‘1’, ‘2’), the indexes are 1,2,3, and you can SELECT using the index. The following queries show what happens if you forget quotes:

SELECT enum_column FROM table WHERE enum_column=1; # gives you 2

SELECT enum_column FROM table WHERE enum_column=’1′; # gives you 1

Also, they are sorted by this numeric index, and so that is why they do not sort by alpha.

Krzysztof Kotowicz wrote at 11/25/2009 10:04 am:

@Pádraic

I’m not saying that you should skip validation in your application code. I replied to this Brandon’s statement: “The model should be enforcing the constraints on the data types and values going into your database, not the database itself”. What I think is that DB _must_ enforce its own constraints (eg. by using FK constraints, check constraints where it would be applicable) and should not rely that application always submits consistent data.

High level checking is needed, I agree completely. However, I don’t think that “the database itself” is not the place for constraints – exactly the opposite. If you don’t enforce the constraints at the lowest possible layer, you can end up with incosinstent data because application code was e.g. buggy, the connection dropped, there was a security hole in application etc. To compare that – it’s like having a Javascript validation on a client and skipping it server-side. Would anyone recommend it?

Jase (@jase.com) wrote at 11/25/2009 10:08 am:

Oops, meant:

SELECT enum_column FROM table WHERE enum_column=1; # gives you 0

SELECT enum_column FROM table WHERE enum_column=’1′; # gives you 1

Les wrote at 11/25/2009 2:56 pm:

I wouldn’t say the database backend is part of the model either – just like using web services (a good comparison to your database) would not be considered part of your model.

Rijk van Wel wrote at 11/26/2009 9:46 am:

Hi Brandon, interesting post, however you didn’t consider the fact that ENUM columns use integer values internally (like Jase mentioned earlier), which is a lot more effectient when storing and selecting in certain fixed sets of string values (like ‘male’ or ‘female’).

I was wondering what the alternative was you had in mind – since I don’t think varchar is a suitable replacement in these cases?

Jon wrote at 11/30/2009 2:56 pm:

I don’t think that it’s wise to follow a developer’s opinion on DB design models. As on the MySQL DB forums for suggestions. Developers can only give you what a developer understands. DBA’s can give you a better understanding of when/where to use specific MySQL field types. If ENUM is in MySQL’s core functionality, it’s obviously useful. Otherwise, I tend to think that they would be smart enough to drop it and consider it obsolete…

Ryan (@popthestack) wrote at 11/30/2009 6:21 pm:

I agree with Vid about #3: “3. Sometimes, you do need the database to hande to handle constraints, foreign keys, enums etc, because you can not guarantee that the only way to access the database is going to be through your application.”

I’ve actually had problems of bad values in a database because Doctrine uses varchars and I did a whole lot of bulk operations with raw queries. Granted, it was an easy problem to fix and didn’t do any damage, but it was a bit of a pain.

Of course, it’s also been really nice to only have to update the application instead of a bunch of development and live databases.

Ryan (@popthestack) wrote at 11/30/2009 6:47 pm:

Also, this article suggests I use enums over varchars:

http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/

Who to believe? I dare say it’s different in each situation, hence the need for a good DBA. :)

« »

Copyright © 2023 by Brandon Savage. All rights reserved.