July 15, 2004
My Subversion annoyances list

So far there's just one on the list, and I discovered it today. When you ask Subversion for a diff between the current version and the previous revision (svn diff $file), it looks at the timestamp of the current file and of the version in the repository. If the timestamps are identical, diff always reports no change. This is a problem.

I discovered this when running test conversions from our ClearCase repository.. turns out that sometimes my script is just too fast, and a version of a file may end up with the same timestamp as the previous revision, which means Subversion merrily ignores the change and doesn't commit it. Ironically, this would be less of a problem on Windows, since Windows stores milliseconds, not just seconds as Unix does.. but that's a whole other issue. Of course this is pretty easy to fix in my script.. just touch the file with its original revision date and Subversion is happy to acknowledge the changes.. but I see this as a weak point in what I thought was a pretty well thought out SCM tool.
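A minimal sketch of that workaround in Python, assuming the conversion script already knows each revision's original check-in date (how it gets that date out of ClearCase isn't shown here, and `stamp_with_revision_date` is a hypothetical helper name):

```python
import os
from datetime import datetime, timezone


def stamp_with_revision_date(path: str, revision_date: datetime) -> None:
    """Set a file's access and modification times to its revision date.

    Hypothetical helper: the date is assumed to come from the source SCM's
    history, so consecutive revisions get distinct mtimes (as long as the
    original check-ins weren't in the same second) even when the script
    writes them within the same wall-clock second.
    """
    if revision_date.tzinfo is None:
        # Assumption: naive dates from the history are UTC.
        revision_date = revision_date.replace(tzinfo=timezone.utc)
    ts = revision_date.timestamp()
    os.utime(path, (ts, ts))  # (atime, mtime), roughly what `touch -d` does
```

The script would call this right after fetching each revision and before running the commit.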

The reasoning behind it is of course performance.. it's a lot faster to determine whether a file has changed if you don't have to actually compare two files.. and size alone isn't the most reliable indicator. Well, turns out.. timestamp isn't necessarily reliable either. Now if you put the two together.. size & timestamp in a hash, you have a much higher chance of being correct when making the "has the file changed" assumption. That's what I suggested on the Subversion users list and, as with everything else in life, it just isn't quite so simple -- it appears the current implementation provides no simple access to the size information of the in-repository revision.
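To make the difference concrete, here is a minimal Python sketch comparing a timestamp-only check with the size & timestamp combination. It is not Subversion's code: `Recorded`, `record` and the two check functions are hypothetical names, and whole-second mtimes are assumed to mimic a filesystem with one-second granularity.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Recorded:
    """What a client might remember about a file after checkout or commit."""
    mtime: int  # whole seconds, mimicking one-second timestamp granularity
    size: int   # bytes


def record(path: str) -> Recorded:
    st = os.stat(path)
    return Recorded(mtime=int(st.st_mtime), size=st.st_size)


def unchanged_by_timestamp(path: str, rec: Recorded) -> bool:
    # Timestamp-only shortcut: an equal mtime is taken to mean "not modified".
    return int(os.stat(path).st_mtime) == rec.mtime


def unchanged_by_timestamp_and_size(path: str, rec: Recorded) -> bool:
    # One stat() call returns both values, so the extra check is nearly free.
    st = os.stat(path)
    return int(st.st_mtime) == rec.mtime and st.st_size == rec.size


if __name__ == "__main__":
    import tempfile

    path = os.path.join(tempfile.mkdtemp(), "file.txt")
    with open(path, "w") as f:
        f.write("revision 1\n")
    rec = record(path)

    # Simulate a second revision written within the same second.
    with open(path, "w") as f:
        f.write("revision 2, a bit longer\n")
    os.utime(path, (rec.mtime, rec.mtime))

    print(unchanged_by_timestamp(path, rec))           # True: change missed
    print(unchanged_by_timestamp_and_size(path, rec))  # False: size differs
```

The combination still has a blind spot: an edit that keeps the size identical and lands in the same second slips through both checks, so it narrows the window rather than closing it.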

Given that my original suggestion isn't easily accomplished, I hope my second suggestion, allowing the user to override this default behavior (performance will suffer, of course, but reliability will improve for scripted commits), will be implemented.

After about a month of pretty heavy usage and testing of Subversion, this is so far my biggest issue -- and one not likely to be encountered in day-to-day usage. Not bad at all; my CVS list is a lot longer. Of course.. I haven't really started on hooks and heavy scripting yet..

Posted July 15, 2004 06:48 PM in development tools
Comments
On July 15, 2004 08:01 PM garrett added:

Note that I'm not sure you can actually reproduce this under normal use conditions without forcing the issue and resetting the timestamp manually.

When Subversion modifies a file it makes sure to sleep until the timestamp is at least 1 second in the past before returning control to the user.
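The idea can be sketched in a few lines of Python. This is a hypothetical illustration, not Subversion's C implementation; `sleep_for_timestamps` and its `granularity` parameter are assumptions. It blocks until the clock has moved into the next timestamp "tick", so a later edit can't share a tick with a file written just before the call. Passing 2.0 is one way to think about the FAT32 case mentioned further down, though how a given filesystem rounds its timestamps is outside this sketch.

```python
import math
import time


def sleep_for_timestamps(granularity: float = 1.0) -> None:
    """Block until the wall clock enters the next timestamp tick.

    Files written before the call carry an mtime in the current tick;
    anything written after it returns lands in a later tick, so an
    equal-timestamp check can't confuse the two.
    """
    start_tick = math.floor(time.time() / granularity)
    deadline = (start_tick + 1) * granularity
    while True:
        remaining = deadline - time.time()
        if remaining <= 0:
            return
        # Loop rather than trusting a single sleep, in case it returns early.
        time.sleep(remaining)
```

The cost is at most one tick of waiting per operation, which is negligible for interactive use but adds up for a script committing thousands of revisions.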

I suspect the only reason you're seeing it is that when you check the next revision out of ClearCase it's setting the timestamp for you. If you were using a normal editor (which would not set the timestamp like ClearCase seems to be doing) you would not hit this problem.

#
On July 15, 2004 08:08 PM kasia added:

That occurred to me as well.. so I checked for it. ClearCase isn't setting the timestamp when I get the files.

I'm running this on the ClearCase server, since svn is faster remotely than cc.. so I can get the revisions fairly quickly - the only bottleneck is the network connection to Subversion.. so this only occurs when the previous revision resulted in a very fast commit and the script does in fact get the next version within the same second..

It's entirely possible that this is specific to Solaris.. I haven't tested it on a Linux box (don't have any ClearCase Linux licenses available).

#
On July 15, 2004 10:51 PM Michael Koziarski added:

Are you sure it's not ClearCase? I'm using the Windows client here, and there's a "Preserve original file modification time" option that's off by default.

#
On July 15, 2004 10:53 PM kasia added:

Yes, that's not even an option with the client I'm using.

#
On July 16, 2004 10:01 AM Pfunk added:

I would think that using an MD5 of each file would be the most effective, but also the most time-consuming, method to verify changes. Shame that there's no option to choose the test method. You could pick 'time', 'time & size', 'MD5' or 'GPG signature' via a conf file.

#
On July 16, 2004 10:03 AM kasia added:

Well.. that would make the process a whole lot slower.. Getting the size in addition to the timestamp is not a big deal, since both come from the inode.. but getting an MD5 hash of the content is a *lot* slower and more involved.
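Pfunk's conf-file idea and the cost difference can both be sketched in Python. The method names ("time", "time+size", "md5") and the functions `md5_of`, `fingerprint` and `has_changed` are hypothetical, and GPG signatures are left out. The stat-based methods only read metadata, while MD5 has to read every byte of the file.

```python
import hashlib
import os


def md5_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the whole file; the cost grows with the file's size."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def fingerprint(path: str, method: str) -> tuple:
    """Return a comparable fingerprint using the configured check method."""
    if method in ("time", "time+size"):
        st = os.stat(path)  # metadata only (from the inode on Unix)
        if method == "time":
            return (int(st.st_mtime),)
        return (int(st.st_mtime), st.st_size)
    if method == "md5":
        return (md5_of(path),)  # reads the full contents
    raise ValueError(f"unknown check method: {method!r}")


def has_changed(path: str, saved: tuple, method: str) -> bool:
    return fingerprint(path, method) != saved
```

A same-size edit that keeps its old timestamp fools "time+size" but not "md5", which is exactly the trade-off between speed and reliability discussed above.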

#
On July 16, 2004 12:12 PM Aristotle Pagaltzis added:

garrett:
> (which would not set the timestamp like ClearCase seems to be doing)

How would that work? A file always has a timestamp.

#
On July 16, 2004 02:13 PM Matt added:

Ironically, when using VSS under Win32 we typically wind up comparing files by checksum because VSS is more than a little bit buggy in terms of timestamps.

We will not be using VSS in our next project :)

#
On July 19, 2004 10:16 PM garrett added:

When I said ClearCase was 'setting the timestamp' I assumed it was preserving the original timestamp or something, i.e. setting it to whatever it was when the file was checked in. Apparently that's not the case though, at least with the version of ClearCase Kasia is using.

In theory Subversion shouldn't be allowing this, but perhaps there's a case where we're not sleeping the appropriate amount of time to ensure that the timestamp based comparison is valid. I know we do that when you do an 'svn update', but perhaps we're not doing it on commit or something. Certainly seems possible.

#
On July 19, 2004 10:21 PM garrett added:

Oh, and it is certainly possible that different filesystems could screw up the code that tries to sleep to ensure timestamp integrity. There's been discussion lately on the lists about how FAT32 filesystems actually have only 2-second granularity, so there seem to be some odd edge cases there as a result.

In any case, my previous theory about us not sleeping to ensure timestamp integrity on commit is wrong. We do have a call to svn_sleep_for_timestamps in svn_client_commit.

IIRC ClearCase operates as a filesystem plugin, right? Perhaps that could be causing a problem somehow...

I don't know, I'm just throwing out random ideas at this point, I've never even used ClearCase ;-)

#
On August 26, 2004 03:02 AM Peter Pentchev added:

I haven't tried Subversion yet, but if it has any kind of keyword substitution, the size/md5/whatever checks would be well nigh impossible to implement on the server side: the server would have to precalculate and store this information for any and all possible checkout modes (in CVS there's the default -kkv, then there's -kk, -ko, -kv, -kb, and even -kkvl).

Alternatively, this information could be kept at the client's end.. and this actually doesn't sound so crazy or hard to implement. LazyWeb time, anyone? :)
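A rough Python sketch of that client-side approach, assuming the client records a file's size and MD5 right after writing it to the working copy, i.e. after any keyword expansion, so the server never has to precompute anything per checkout mode. The JSON cache file and the functions `remember` and `is_modified` are hypothetical.

```python
import hashlib
import json
import os


def _md5(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def _load(cache_file: str) -> dict:
    try:
        with open(cache_file) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def remember(cache_file: str, path: str) -> None:
    """Record the file exactly as the client wrote it (keywords already expanded)."""
    st = os.stat(path)
    cache = _load(cache_file)
    cache[path] = {"mtime": int(st.st_mtime), "size": st.st_size, "md5": _md5(path)}
    with open(cache_file, "w") as f:
        json.dump(cache, f)


def is_modified(cache_file: str, path: str) -> bool:
    entry = _load(cache_file)[path]
    st = os.stat(path)
    if st.st_size != entry["size"]:
        return True   # cheap: a size change proves a modification
    if int(st.st_mtime) == entry["mtime"]:
        return False  # cheap: same size and timestamp, trust it
    return _md5(path) != entry["md5"]  # timestamp moved: hash to be sure
```

Hashing only when the cheap checks disagree keeps the common case fast, and a plain `touch` (new timestamp, same content) is correctly reported as unmodified.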

#