NSFileManager getRelationship:ofDirectoryAtURL:toItemAtURL:error: returning NSURLRelationshipSame for Different Directories

I'll try to ask a question that makes sense this time :) . I'm using the following method on NSFileManager:

  • (BOOL) getRelationship:(NSURLRelationship *) outRelationship ofDirectoryAtURL:(NSURL *) directoryURL toItemAtURL:(NSURL *) otherURL error:(NSError * *) error;
  • Sets 'outRelationship' to NSURLRelationshipContains if the directory at 'directoryURL' directly or indirectly contains the item at 'otherURL', meaning 'directoryURL' is found while enumerating parent URLs starting from 'otherURL'. Sets 'outRelationship' to NSURLRelationshipSame if 'directoryURL' and 'otherURL' locate the same item, meaning they have the same NSURLFileResourceIdentifierKey value. If 'directoryURL' is not a directory, or does not contain 'otherURL' and they do not locate the same file, then sets 'outRelationship' to NSURLRelationshipOther. If an error occurs, returns NO and sets 'error'.

So this method falsely returns NSURLRelationshipSame for different directories. One is empty, one is not. Really weird behavior. Two file path urls pointing to two different file paths have the same NSURLFileResourceIdentifierKey? Could it be related to https://developer.apple.com/forums/thread/813641 ?

One url in the check lived at the same file path as the other url at one time (but no longer does). No symlinks or anything going on. Just plain directory urls.

And YES calling -removeCachedResourceValueForKey: with NSURLFileResourceIdentifierKey causes proper result of NSURLRelationshipOther to be returned. And I'm doing the check on a background queue.

So this method falsely returns NSURLRelationshipSame for different directories. One is empty, one is not. Really weird behavior.

Do you know where/what the directories "were"? The problem here is that there's a pretty wide variation between the "basic" case of "a bunch of files and directories sitting on a standard volume" and "the range of ALL possible edge cases".

Two file path URLs pointing to two different file paths have the same NSURLFileResourceIdentifierKey?

Yes, this is possible. As one example, the data volume basically ends up in the hierarchy "twice" meaning that, for example, the path "/System/Volumes/Data/Users/" and "/Users/" are in fact the same directory. And, yes, getRelationship returns NSURLRelationshipSame for those directories.
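To make the "different paths, same identifier" idea concrete, here is a minimal C sketch that compares two paths by device and inode number using plain stat(). It assumes that this pair is a reasonable stand-in for what NSURLFileResourceIdentifierKey represents (the real value is opaque), and the function name same_object is made up for illustration.

```c
#include <sys/stat.h>

/* Returns 1 if both paths resolve to the same file system object,
   0 if they do not, and -1 if either stat() call fails.
   Assumption: (st_dev, st_ino) approximates the resource identifier. */
int same_object(const char *path_a, const char *path_b)
{
    struct stat a, b;

    if (stat(path_a, &a) != 0 || stat(path_b, &b) != 0) {
        return -1;
    }
    return (a.st_dev == b.st_dev && a.st_ino == b.st_ino) ? 1 : 0;
}
```

For example, same_object("/", "/.") reports 1 even though the two path strings differ, because both strings lead to the same directory.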

Now, this:

One is empty, one is not.

...is definitely "weirder". Ignoring the cache issue below, I don't think you could do it within a standard volume, but you might be able to do it using multiple volumes, particularly duplicated disk image and/or network file systems.

However, in this case:

Could it be related to https://developer.apple.com/forums/thread/813641?

One URL in the check lived at the same file path as the other URL at one time (but no longer does). No symlinks or anything going on. Just plain directory URLs.

...yes, it's a/the cache. The proof of that is this:

And YES calling -removeCachedResourceValueForKey: with NSURLFileResourceIdentifierKey causes the proper result of NSURLRelationshipOther to be returned. And I'm doing the check on a background queue.

...since any issue that is fixed by clearing the cache is, by definition, "caused" by the cache. That's a good excuse to revisit this thread here, which I'm afraid I missed:

Could it be related to https://developer.apple.com/forums/thread/813641 ?

The core of the issue here is the inherent tension between a few facts:

  1. The entire file system is essentially a lock-free database being simultaneously modified by an unconstrained number of processes/threads.

  2. Your ability to monitor file system state is relatively limited. Basically, you can either ask for the current state and receive an answer with unknown latency or ask the system to update you as things change, at which point you'll receive a stream of events... with unknown latency.

  3. Accessing the file system is sufficiently slow that it's worth avoiding/minimizing that access.

Jumping back to here, there's actually a VERY straightforward way to do this:

Two file path URLs pointing to two different file paths have the same NSURLFileResourceIdentifierKey?

That is, have two processes where:

Process 1 calls "getRelationship".

Process 2 manipulates the file system such that the following sequence occurs:

  1. Process 1 retrieves the metadata of the source object.
  2. Process 2 deletes the existing directory at the target location.
  3. Process 2 moves the source object to the target location.
  4. Process 2 deletes the contents of the target object.
  5. Process 1 retrieves the metadata of the target object.

...and process 1 now compares #1 and #5, returning NSURLRelationshipSame because they are in fact the same. Now, you might say this seems far-fetched/impossible to time; however, I never said process 2 was running on the same system. With SMB over a slow connection, I suspect you could replicate the scenario above pretty easily.
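The sequence above can be replayed in a single process to see why the comparison comes out as "same". This is only a sketch under several assumptions: it collapses both "processes" into one function, uses POSIX stat()/rename() rather than NSFileManager, treats device plus inode number as a stand-in for the resource identifier, and works in a scratch directory under /tmp. The name replay_stale_comparison is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Replays steps 1-5 in one process. Returns 1 if the stale snapshot of the
   source matches the fresh metadata of the target, 0 if not, -1 on error. */
int replay_stale_comparison(void)
{
    char root[] = "/tmp/relrace.XXXXXX";
    char source[128], target[128], child[160];
    struct stat source_snapshot, target_now;
    int result = -1;

    if (mkdtemp(root) == NULL) return -1;
    snprintf(source, sizeof source, "%s/source", root);
    snprintf(target, sizeof target, "%s/target", root);
    snprintf(child, sizeof child, "%s/child", source);

    /* Setup: a non-empty source directory and an empty target directory. */
    if (mkdir(source, 0700) != 0 || mkdir(child, 0700) != 0 ||
        mkdir(target, 0700) != 0) goto cleanup;

    /* 1. "Process 1" retrieves the metadata of the source object. */
    if (stat(source, &source_snapshot) != 0) goto cleanup;
    /* 2. "Process 2" deletes the existing directory at the target location. */
    if (rmdir(target) != 0) goto cleanup;
    /* 3. "Process 2" moves the source object to the target location. */
    if (rename(source, target) != 0) goto cleanup;
    /* 4. "Process 2" deletes the contents of the target object. */
    snprintf(child, sizeof child, "%s/child", target);
    if (rmdir(child) != 0) goto cleanup;
    /* 5. "Process 1" retrieves the metadata of the target object. */
    if (stat(target, &target_now) != 0) goto cleanup;

    /* Compare the stale snapshot (#1) with the fresh one (#5). */
    result = (source_snapshot.st_dev == target_now.st_dev &&
              source_snapshot.st_ino == target_now.st_ino) ? 1 : 0;

cleanup:
    /* Best-effort removal of whatever still exists; failures are ignored. */
    snprintf(child, sizeof child, "%s/child", source);
    rmdir(child);
    snprintf(child, sizeof child, "%s/child", target);
    rmdir(child);
    rmdir(source);
    rmdir(target);
    rmdir(root);
    return result;
}
```

The two snapshots match because the object itself never changed; only its location and contents did between the two reads.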

The point here is that the system’s caching behavior is simply one dynamic among many. That is, caching increases the probability of strange behavior (like the one above) because it increases the time gap between #1 and #5, and the wider the gap between actions, the more likely it is that "something" has changed. However, you can't actually shrink the gap to the point where it goes away.

One solution to these issues is for the interested processes to communicate with each other to coordinate their actions (for example, by using "File Coordination"). However, that requires all of the processes involved to participate in that mechanism, which they definitely don't today.

Realistically, the reason this all isn't a total disaster is that most of the activity here is either:

  • Directly controlled/managed by the user, who is both being careful about what they do and moving "slow" enough that collisions don't happen.

OR

  • Happening in "private" parts of the file system where only one "entity" is manipulating the data (for example, an app’s data container).

All of which leads to the big question... what are you actually trying to do?

If this is a one-off event that you're concerned/confused about, then the answer is basically, yes, the file system can be way weirder than it looks, and sometimes that means calling removeCachedResourceValueForKey "just in case".

However, if this is something that is a recurring problem for your app, then it might be worth stepping back and rethinking your approach to minimize the possibility and consequences of these kinds of "oddities".

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

Thanks for the reply! I actually stumbled across this while reworking things in my app to account for NSURL caching behavior I mentioned in the other thread.

What I was doing not too long ago was using an NSCache on top of NSURL for resource values. At some point, when responding to metadata changes, I was calling -removeAllCachedResourceValues on a background thread to get refreshed data, and I discovered that -removeAllCachedResourceValues could crash if another thread was reading at the same time. I guess at some point in my frustration I just moved some stuff around to stop the crashes (and I did).

I had either forgotten or just never realized that NSURL caches only for a run loop turn (or maybe just sometimes? More on that in a second). I guess this is cool in the middle of a dragging session, but apparently at some point I must've just assumed that NSURL was caching for a more meaningful period of time (from the perspective of my app anyway), because if I didn't call -removeAllCachedResourceValues I'd get stale values sometimes. So why cache on top of a cache? I chucked my NSCache, which I never really loved, but apparently that was a mistake. My bad.

I guess my wish would be for NSURL to either cache forever until values are explicitly cleared or not cache at all, because if we're caching on a cache that may not be a cache but sometimes seems like a cache, it's hard to cache. Maybe I'm just being selfish though.

But back to the collision. So I'm reworking all this (not using NSCache this time). While rewriting my caching code, I commented out a few things here and there to check some error-handling code paths that seem extremely unlikely to really occur, and I stumbled across this collision. But there are many run loop turns in between these events, so I don't understand why the cached values are living for so long in this particular case. Maybe something like cancelPreviousPerformRequestsWithTarget causes cached values to live longer, but I'm not supposed to worry about the implementation details.

I can easily reproduce this with NSFileManager using the following steps:

  1. -trashItemAtURL:resultingItemURL:error: - grab the resultingItemURL.
  2. Put an empty new folder in the exact same location you just trashed.
  3. Compare the NSURLFileResourceIdentifierKey of the URL you got from resultingItemURL with that of the new folder at its old location and they match - until you programmatically remove the cached value.

I guess my wish would be for NSURL to either cache forever until values are explicitly cleared or not cache at all, because if we're caching on a cache that may not be a cache but sometimes seems like a cache, it's hard to cache.

So, the first issue here is that "not caching at all" isn't really an option. Most of the data you retrieve from NSURL all came from the same API (getattrlist) and, much of the time, that data is ALWAYS retrieved in every call. getattrlist() is a "bulk" fetch API (it's designed to return a bunch of data at once) and the vast majority of the performance cost here is the cost of the syscall itself, NOT the retrieval of the data itself or the copy out of the kernel. Putting that in concrete terms, let’s say you ask for "all" of the times for a file (ATTR_CMN_CRTIME, ATTR_CMN_MODTIME, ATTR_CMN_CHGTIME, ATTR_CMN_ACCTIME, ATTR_CMN_BKUPTIME):

  • Basically "every" file system is going to end up storing all of those values inside some kind of file system-specific structure, so the only "cost" here is the act of finding that record, not the individual time.

  • All the values involved are so small that the transit cost "out" of the kernel is basically fixed.

...so asking for one of them costs exactly the same as asking for all 5.

Putting that another way, there's a fundamental disconnect between how file system calls work and how NSURL works. File system APIs are built as "retrieval APIs" which return as much data as possible in a single call (stat being an obvious example). All of the data returned by each system call represents the exact state of that object at a particular "instant" in time. It may not be right "now" (the file system can be constantly changing) but it WAS right at some moment in time.
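As a rough illustration of the "one call, one instant" point, the sketch below pulls several timestamps and the size out of a single stat() call. It uses stat() rather than getattrlist() only to stay portable, and the struct and function names are invented for this example.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* One coherent snapshot: every field below comes from the same stat() call,
   so all of them describe the object at the same instant. */
struct time_snapshot {
    time_t accessed;  /* last access time */
    time_t modified;  /* last content modification time */
    time_t changed;   /* last metadata (inode) change time */
    off_t  size;      /* size in bytes at that same instant */
};

/* Fills *out from a single syscall. Returns 0 on success, -1 on failure. */
int fetch_snapshot(const char *path, struct time_snapshot *out)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        return -1;
    }
    out->accessed = st.st_atime;
    out->modified = st.st_mtime;
    out->changed  = st.st_ctime;
    out->size     = st.st_size;
    return 0;
}
```

Asking for only one of these fields would still cost the same syscall, which is why fetching them separately buys nothing and risks mixing values from different instants.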

On the other hand, NSURL (and lots of other API layers) want to let you retrieve individual elements separately, but that means the API then needs to decide whether to:

  1. Return the data it retrieved in an earlier call, which is both faster and provides a more "coherent" picture of the file system state, since the data being retrieved is coming from the same "fetch".

  2. Fetch new data, which is more accurate but creates inconsistent results between the "current" state and the "previous" state.

ACTUALLY doing #2 for every call is a terrible idea for both performance and coherence issues, but that means we're basically stuck trying to sort out when to reset, not if we're going to cache.

As a side note here, an API like URL.resourceValues(forKeys:) gets you much closer to how the file system itself works, since you're retrieving a fixed dictionary from a particular instant, NOT an ambiguous data smear.

I can easily reproduce this with NSFileManager using the following steps:

  1. -trashItemAtURL:resultingItemURL:error: - grab the resultingItemURL.
  2. Put an empty new folder in the exact same location you just trashed.
  3. Compare the NSURLFileResourceIdentifierKey of the URL you got from resultingItemURL with that of the new folder at its old location and they match - until you programmatically remove the cached value.

Huh. That's really weird. How did you construct those URLs? Are you building them from string paths or getting them from the system (like through an open panel or by enumerating the directory)? What does "isFileReferenceURL" return, and what happens if you do the same check but call "fileReferenceURL" on both URLs first?

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

ACTUALLY doing #2 for every call is a terrible idea for both performance and coherence issues, but that means we're basically stuck trying to sort out when to reset, not if we're going to cache.

I agree. IMO the problem is not that NSURL is caching; the problem is the way it caches. The way it caches forces me to cache on top of it. The documentation claims it only caches for 1 run loop turn, but as previously mentioned, that is not always the case and certain values tend to get 'stuck.'

Building a cache on top of NSURL resource values, which may or may not be stale, can cause all sorts of weird behavior if you don't call -removeAllCachedResourceValues so I can cache on the true value... but NSURL, I assume, makes its cache thread-safe, so -removeAllCachedResourceValues probably isn't so cheap. Doesn't that mean the URL cache is costing me performance by requiring me to clear it to get to the true values I really want to cache?

Huh. That's really weird. How did you construct those URLs?

Originally the URL came through NSFileManager enumeration, or maybe -createDirectoryAtURL:, I can't remember. I'll have to try it out later when I have a little bit more time.

But I just stumbled across some really weird behavior when passing a file type from Finder to my app. It could be unrelated, but I wouldn't be completely surprised if it was related to this topic. I might file a bug on that later. It would be great if this forum supported private messages; I'm not sure I'm ready to provide more details in the open yet.

The documentation claims it only caches for 1 run loop turn, but as previously mentioned, that is not always the case, and certain values tend to get 'stuck.'

FYI, I think there are actually two different issues at work here:

  1. The run loop itself doesn't actually "turn" at a predictable rate. Depending on how your app is architected and the overall app state, it's entirely possible for an app to go seconds or even minutes without the main thread ever running.

  2. The documentation says that values are "automatically removed after each pass through the run loop", but that's not quite accurate. NSURL is tracking the main loop activity through a runloop observer, but it doesn't actually flush the cache until the first time "something" tries to access that URL from the main thread. If nothing on the main thread accesses that URL, then it could theoretically return the old values "forever".

...with #2 obviously being the most significant issue.
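Here is a toy model of the behavior described in #2, written purely from the description above. It is not NSURL's implementation: the names, the single global pass counter, and the lack of locking are all assumptions made to keep the sketch short.

```c
#include <stdbool.h>

/* Toy model only. Assumed shape of a per-URL cached value. */
struct cached_value {
    int           value;       /* last value fetched from the "file system" */
    bool          valid;       /* false until the first fetch */
    unsigned long fetched_in;  /* main loop pass in which it was fetched */
};

/* Bumped by the hypothetical run loop observer on every main loop pass. */
static unsigned long main_loop_pass = 0;

void main_loop_did_turn(void)
{
    main_loop_pass++;
}

/* Reads through the cache. The stale entry is only discarded when the read
   comes from the main thread after the loop has turned; background reads
   keep returning whatever was cached. */
int read_value(struct cached_value *c, bool on_main_thread, int current_fs_value)
{
    if (on_main_thread && c->valid && c->fetched_in != main_loop_pass) {
        c->valid = false;  /* lazy flush, triggered by main-thread access */
    }
    if (!c->valid) {
        c->value = current_fs_value;  /* "hit the file system" */
        c->valid = true;
        c->fetched_in = main_loop_pass;
    }
    return c->value;
}
```

In this model, a value first read on a background thread stays cached across any number of main loop passes until something on the main thread reads it, which is the "forever" case described in #2.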

Building a cache on top of NSURL resource values, which may or may not be stale, can cause all sorts of weird behavior if you don't call -removeAllCachedResourceValues so I can cache on the true value... but NSURL, I assume, makes its cache thread-safe, so -removeAllCachedResourceValues probably isn't so cheap. Doesn't that mean the URL cache is costing me performance by requiring me to clear it to get to the true values I really want to cache?

Hypothetically, yes, but if you ACTUALLY run into performance problems, then I think you have a bigger issue. In terms of the lock itself, there's an os_unfair_lock that's used to protect access to the data, which means the cost of uncontested access is fairly minimal. The problem here is that having contention means that you have multiple threads attempting to manage/manipulate the same file at the same time... which is a bad idea regardless of performance.

That leads back to here:

Building a cache on top of NSURL resource values

The real question here is basically "what are you trying to do"? The problem here is that NSURL is basically a low-level primitive, not really "the" solution for file tracking. For example:

  1. Document-based apps are better off using a class like NSDocument, which manages things like file coordination and safe saves.

  2. Longer-term file tracking is better done with bookmarks, since they're harder to break and allow an app to restore access to the target as needed.

  3. Apps that manipulate files "in bulk" often end up using lower-level APIs to improve performance.

One final note here is that it's not difficult to get an NSURL object that doesn't have the automatic flushing behavior. All you need to do is take the NSURL you're starting with, pass it into CFURLCreateFilePathURL() (or whatever API fits what you're starting with) to create a CFURLRef, then cast that CFURLRef back to NSURL. Toll-free bridging means that CFURLRef can be used exactly like an NSURL, so the only difference is that it won't flush its own cache.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware

The run loop itself doesn't actually "turn" at a predictable rate. Depending on how your app is architected and the overall app state, it's entirely possible for an app to go seconds or even minutes without the main thread ever running.

Doesn't appear to be what's going on in this case. I made this dumb little test which can easily reproduce the issue (sorry can't get code to format well on these forums).

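#import <Foundation/Foundation.h>

@interface MachoManURLTester : NSObject
+(MachoManURLTester*)sharedTester;
-(void)startURLTrashDance;
@end

@implementation MachoManURLTester

+(MachoManURLTester*)sharedTester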
{
	static MachoManURLTester *sharedTester = nil;
	
	static dispatch_once_t token;
	dispatch_once(&token,^{
		sharedTester = [[self alloc]init];
	});
	return sharedTester;
}

-(void)startURLTrashDance
{
	NSAssert(NSThread.currentThread.isMainThread, @"Main thread only.");
	
	NSFileManager *fm = [NSFileManager defaultManager];
	NSURL *wrapperDir = [[NSURL fileURLWithPath:NSTemporaryDirectory() isDirectory:YES] URLByAppendingPathComponent:NSUUID.UUID.UUIDString isDirectory:YES];
	if (![fm createDirectoryAtURL:wrapperDir withIntermediateDirectories:YES attributes:nil error:nil])
	{
		NSLog(@"Test failed");
		return;
	}
	
	//[[NSWorkspace sharedWorkspace] activateFileViewerSelectingURLs:@[wrapperDir]];
	
	NSURL *untitledFour = [wrapperDir URLByAppendingPathComponent:@"Untitled 4" isDirectory:YES];
	if (![fm createDirectoryAtURL:untitledFour withIntermediateDirectories:YES attributes:nil error:nil])
	{
		NSLog(@"Test failed");
		return;
	}
	
	NSLog(@"Created untitled 4.");
	
	NSURL *resultingURL = nil;
	
	if (![fm trashItemAtURL:untitledFour resultingItemURL:&resultingURL error:nil])
	{
		NSLog(@"trash failed");
		return;
	}
	
	NSLog(@"Moved Untitled 4 to the trash.");
	
	[self performSelector:@selector(replaceTrashedURL:) withObject:untitledFour afterDelay:1.0];
	[self performSelector:@selector(compareBothURLS:) withObject:@[untitledFour,resultingURL] afterDelay:4.0];
	
}


-(void)replaceTrashedURL:(NSURL*)originalURL
{
	NSFileManager *fm = [NSFileManager defaultManager];
	if ([fm createDirectoryAtURL:originalURL withIntermediateDirectories:YES attributes:nil error:nil])
	{
		NSLog(@"Recreated Untitled 4");
	}
}

-(void)compareBothURLS:(NSArray<NSURL*>*)twoURLsArray
{
	NSLog(@"4 seconds is up - let's check");
	NSFileManager *fm = [NSFileManager defaultManager];
	NSURL *untitledFour = twoURLsArray.firstObject;
	NSURL *resultingURL = twoURLsArray.lastObject;
	
	// Uncommenting these two lines fixes the relationship check:
	//[untitledFour removeCachedResourceValueForKey:NSURLFileResourceIdentifierKey];
	//[resultingURL removeCachedResourceValueForKey:NSURLFileResourceIdentifierKey];
	
	NSURLRelationship relationship;
	NSError *error = nil;
	if ([fm getRelationship:&relationship ofDirectoryAtURL:untitledFour toItemAtURL:resultingURL error:&error])
	{
		if (relationship == NSURLRelationshipSame)
		{
			NSLog(@"NSURLRelationshipSame: %@ - %@?",untitledFour,resultingURL);
		}
		else if (relationship == NSURLRelationshipContains)
		{
			NSLog(@"NSURLRelationshipContains");
		}
		else if (relationship == NSURLRelationshipOther)
		{
			NSLog(@"NSURLRelationshipOther");
		}
		else
		{
			NSLog(@"Unknown");
		}
	}
	else
	{
		NSLog(@"Error reading relationship: %@",error);
	}
}

@end

Just use that class and do this in a test program.

	MachoManURLTester *URLTester = [MachoManURLTester sharedTester];
	[URLTester startURLTrashDance];

And to answer your earlier question, YES the file reference urls do collide.
