November 23, 2004
Dealing with database portability and auto increment

It appears every relational database has its own way of dealing with automated identifiers. Most databases, for instance Oracle, allow you to create sequences which can be used in conjunction with triggers to create a sequential ID when inserting rows of data.

MySQL has its own way of dealing with this problem which stems from its lack of triggers in all but MaxDB versions. In MySQL there is a special column attribute "auto_increment" which will increment the value of the identifier upon insert. Behind the scenes, it is really quite simple, MySQL performs a special lock on the table, selects a max value on the column and adds 1. The downfall (or for some, it may be a benefit) of this implementation is that ID numbers can be reused if data is deleted after the database server has been restarted (since MySQL stores the information in memory). With the Oracle approach, the ID numbers will not be re-used unless the sequence is renumbered - not something that would be done often.. well, in most cases.

PostqreSQL somewhat combines the two approaches. It provides the ability to create sequence but there is no need to triggers (as you would use in Oracle), instead you can use nextval('sequence_name') as part of the table declaration, much like auto_increment attribute in MySQL.

For insertion purposes all these approaches work in a similar manner: you insert a row with a zero or a null ID and the database does the work for you. The difference is in how you can retrieve the ID once a row of data is created.

There are a couple of ways this can be handled:

1. Handle it in the code using abstraction and writing separate implementation for getting the identifier based on the database type.

2. Create your own implementation of incremented Identifiers: for instance a table which stores a separate value for the last id for the table/column.

The drawback of the first approach is obvious: writing more code to implement each separate supported database. On the other hand, it utilizes built-in database functionality and simplifies insertion.

The second approach has its drawbacks as well, which brings me to the whole point of this entry. I have seen this solution used in many open source projects to simplify database compatibility code and find that to be clunky for a few reasons:

  • A select statement must be made before each insertion.
  • The table needs to be write locked for the duration of the transaction to prevent another process from grabbing the same ID and creating duplication
  • Manually inserting data into the database becomes a hassle (you're faced with two sql statements for a row of data instead of one)
  • You're not utilizing the features of the database which are optimized for this purpose.
  • The whole process is likely slower

I find the extra effort spent on writing abstracted database persistence code to be well worth the hassle you avoid while later debugging and supporting the application.

Posted November 23, 2004 05:44 PM in Java
TrackBack URL for this entry: http://www.unix-girl.com/mt/mt-tb.cgi/1403
Comments
On November 23, 2004 06:16 PM Peter added:

Heh, the Oracle sequence renumbering thing has happened to me on a number of occasions, such as when importing a table from a production server to a development server, but forgetting to update the sequence on the development server to be greater than the max id in the production server's table.

The auto-incrementing stuff has always bugged me as well. I first learned SQL on MySQL, and was annoyed to find the lack of an auto_increment attribute when I learned Oracle's SQL. I had actually never thought about using a trigger to increment sequences in Oracle... bet that would speed up a lot of things. :-)

#
On November 23, 2004 06:25 PM Mark added:

You missed the obvious third solution: don't use sequences as ID values if you're that concerned about database portability. Use a varchar. (blasphemy, I know, but it works)

pstmt.setString(1,(new java.rmi.dgc.VMID()).toString());

http://java.sun.com/j2se/1.3/docs/api/java/rmi/dgc/VMID.html

#
On November 23, 2004 08:10 PM J Yu added:

I wouldn't allow anybody other than the application to generate IDs. it's only asking for trouble.

most of drawbacks you listed for approach 2 are not exactly true. a 'hi/lo algorithm' can avoid them.

#
On November 23, 2004 09:07 PM fluffy added:

Then there's always fun issues like phpBB assuming that every DBMS is mysql, so when you do a phpBB database backup with its horrible web-based interface, it doesn't bother saving any of the sequence data on Postgres. That made me a very unhappy fluffy once upon a time, when I had to manually restart a whole bunch of sequences at sane values when I didn't even know enough SQL to do it.

Of course, now I just use a cron job to do pg_dump so I don't have to be at the mercy of phpBB anymore (especially since its web-based UI doesn't seem to have ever been tested on a db which takes more time to restore than PHP's process lifetime allows).

#
On November 23, 2004 09:09 PM Johnny added:

Use OJB. It handles every popular database's auto increment in a portable way.

#
On November 23, 2004 09:33 PM Zher0 added:

It might be worth noting that AUTO_INCREMENT for MySQL only behaves in the way you described for ISAM tables. AUTO_INCREMENT values for MyISAM and InnoDB tables (perhaps others) are strictly increasing and are not reused.

#
On November 23, 2004 09:38 PM kasia added:

Are you sure?

http://dev.mysql.com/doc/mysql/en/InnoDB_auto-increment_column.html

It's stored in memory, restarting the server will reset it and possibly re-use values, as I noted.

#
On November 23, 2004 11:12 PM Mark Matthews added:

It's a _key_, it shouldn't have any meaning to your application other than a way to uniquely identify a row (or a relation to another row or set of rows in another table), so if InnoDB happens to re-use a value that it has used before (but that _doesn't_ exist in the table _now_), so what?

If your application depending on identifiers to monotonicaly increase in value, then something is wrong with your application, in my opinion.

Even though auto_increments do have that property in general, relying on that is about the same as relying on ordered results from SELECTs when you don't use ORDER BY.

#
On November 23, 2004 11:16 PM kasia added:

I've seen applications where that would be a problem -- no referrential integrity and bad deletes happen. But you're absolutely correct, it's really supposed to be an arbitrary number (but hopefuly not a varchar!).

#
On November 23, 2004 11:26 PM Harrison added:

MySQL (in 4.1) now accepts the serial type for a column (which in the background gets turned into a bigint auto_increment), and I know that PostgreSQL does as well. Anyone know if Oracle does the same? If so, it could simplify the creation a bit.

Also as far as retrieving it once added, some languages can abstract that away for you as well. For example JDBC 3.0 has the getGeneratedKeys() function, which should allow you to do it easily. Hopefully other languages will eventually add an abstraction for this as well.

#
On November 24, 2004 08:01 AM pbw added:

Interesting, I didn't know you still had to manually gen the ID in Oracle with triggers. In Microsoft DB products, there is a 'seed' property that is held at the table level to track the last used ID and that keeps you from reusing IDs. Even Access has this feature, I wonder why Oracle has not added it?

Sybase has the same feature, but it has had a bug for as long as I can remember where if the server goes down hard the 'seed' will skip millions of values. If seen this 'feature' take down an application or two because the 'seed' had gotten bigger then the datatype will allow at the application level. Then the DBA gets to renumber the table and hold the RI across the database.

I've seen Sybase DBAs have to reseed whole databases due to this (Thank God it wasn't me). I can't remember what DB2 does for Auto-ID.

#
On November 24, 2004 08:05 AM kasia added:

I haven't used DB2 in about 5 years but I'm being told it still has no sequences or auto_increment!

#
On November 24, 2004 09:18 AM Zher0 added:

aaahhh.... you're right Kasia, thanks for pointing that out!

#
On November 24, 2004 10:10 AM pbw added:

'I haven't used DB2 in about 5 years but I'm being told it still has no sequences or auto_increment!'

Bummer, it is a nice little feature. The DB just automatically gens the ID in the same transaction as the insert statement and no repeat numbers (without manual intervention by the DBO). Nice and clean if you decide to go that route in your DB design.

#
On November 24, 2004 11:11 AM Steve Friedl added:

DB2 version 6.x (which I still use for a small project) had no sequences, which really shocked me: I heard tell that it was "coming soon", and "soon" is likely to have already arrived, as they are on version 8.x now.

#
On November 24, 2004 07:55 PM Dave added:

What open source projects actually achieve "database compatibility"? It's not worth it - pick a database (NOT mysql) exclusively.

Your post also seems to contradict itself - first it seems you're against database abstraction/compatibility but then you conclude by saying it's worth the hassle.

#
On November 25, 2004 01:50 PM Jeremy Zawodny added:

Actually, your description of MySQL isn't quite right. It doesn't need to go fetch the max value from the table. At least in the case of a MyISAM table (which 95% of people use), it already has computed what the next auto-increment value is. It exists as a bit of metadata that's part of the table. This is an optimization, obviously.

#
On November 25, 2004 07:54 PM SC added:
On November 28, 2004 03:55 PM Eric added:

I agree with Mark Matthews. If your application is dependant upon a number that has no meaning to actually relate to your data, then there's something wrong.

#
On November 28, 2004 10:35 PM pbw added:

'I agree with Mark Matthews. If your application is dependant upon a number that has no meaning to actually relate to your data, then there's something wrong.'

All true in theory. However, what Kasia said was true IMO, you are asking for trouble if you reuse numbers. What if you want to remove data from your DB and move it to a data warehouse?

Not saying you can't make it work, but like Kasia implied, I've seen where reusing IDs caused some strange problems in programs or trying to trace data history.

To be save, I wouldn't reuse numbers. How many numbers are you really going to reclaim anyway in most cases? How many times has anyone had to increase the size of field to hold a bigger number? Odds are not many times, if at all.

#
Trackbacks