File names on Windows

I’ve been reading up on file names in Windows because my C++ code has a problem processing a file with an odd trademark (™) character in its name. I’m not sure why, but it seems that file names returned by the POSIX readdir function can’t necessarily be opened when they are then passed to ifstream.open, or some weird character encoding thing is going on. Hopefully I’ll get to the bottom of it. It’s a complete fluke that I actually had a file that triggers the failure on hand to test with; lucky, I guess.

I did some more reading and discovered that Windows has ‘wide character’ versions of its file functions that take UTF-16 encoded strings rather than ‘code page’ encoded strings, I guess. Anyway, I don’t think I’m going to bother with such things. If a file can’t be opened because it has a weird character in its name then I’ll just fail with an error message and the user can look at fixing it. This program is only being “developed and tested” on Windows; there are no plans to actually run it there. It will run on Linux, which won’t have this weird character encoding issue.

The difference between delete and delete[] in C++

I just learned about the difference between delete and delete[] in C++ by reading an article on Stack Overflow. Basically you use delete[] to free arrays allocated with new[], and delete for everything else. There was conflicting information about whether C++ runs the destructors on the objects in an array when the array is deleted. Some people said it did, others said it didn’t. I should do an experiment to see one day.

The reason for the difference between delete and delete[] seems to be that when C++ allocates an array it may also allocate memory for storing the size of the array alongside the array elements. It then returns a pointer to the first array element, which is beyond the start of the allocated memory, because the array size takes up the first bit of space. As far as I can tell this is how compilers typically do it rather than something the language requires.
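The destructor question is easy enough to test. The C++ standard says that delete[] calls the destructor of every element in the array (in reverse order of construction), and the sketch below counts the calls to confirm it. The names Tracker and count_destructor_calls are invented for this example.

```cpp
#include <cstddef>

// A type whose only job is to count how many times its destructor runs.
struct Tracker {
    static int destroyed;          // shared counter across all instances
    ~Tracker() { ++destroyed; }
};
int Tracker::destroyed = 0;

// Allocates `n` Trackers with new[], frees them with delete[], and
// returns how many destructor calls were observed.
int count_destructor_calls(std::size_t n) {
    Tracker::destroyed = 0;
    Tracker* items = new Tracker[n];
    delete[] items;                // array form: destroys each element, then frees
    return Tracker::destroyed;
}
```

For an array of three elements the count comes back as three. Using plain delete on a pointer that came from new[] is undefined behaviour, so that side of the experiment is better left unrun.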

Pcdedupe

I’m working on a new ProgClub project called pcdedupe. It’s a file system de-duplicator written in C++ and based on rdfind. I haven’t created the project page on the wiki yet, but the source code is available.

Basically I’m going to take a new angle on the rdfind software and tailor it to suit my particular environment (I have ten million files with massive duplication and rdfind isn’t optimised for that kind of scale).