About Jay Jay

Hi there. My name is John Elliot V. My friends call me Jay Jay. I talk about technology on my blog at blog.jj5.net and make videos about electronics on my YouTube channel @InTheLabWithJayJay.

FS variable in awk

I was reading about environment variables and I also found this article Internal Variables that describes the variables used by bash. In reading that I learned about the awk FS variable, which controls field splitting. See page 146 of sed & awk by Dougherty and Robbins for details, but basically you can set FS to a single character to have lines split into fields on that character, or you can specify a regular expression such as “\t+” (one or more tabs separate fields) or “[,;]” (a single comma or semicolon separates fields).
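To get a feel for regular-expression field splitting outside of awk, here is a minimal C++ sketch. It only approximates what awk does with FS. The helper name split_fields is made up for this illustration and is not part of awk or any library.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of awk): split a line into fields, treating
// every match of separator_pattern as a field separator, roughly the way awk
// treats a regular expression assigned to FS.
std::vector<std::string> split_fields(const std::string& line,
                                      const std::string& separator_pattern) {
    const std::regex separator(separator_pattern);
    std::vector<std::string> fields;
    // Sub-match index -1 asks the iterator for the text between matches.
    std::sregex_token_iterator it(line.begin(), line.end(), separator, -1);
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        fields.push_back(it->str());
    }
    return fields;
}
```

Calling split_fields("a,b;c", "[,;]") should give the three fields a, b and c, and a pattern of "\\t+" should treat a run of tabs as a single separator.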

Environment Variables and Secure Programming for Linux

I read the Environment Variables section of Secure Programming for Linux and Unix HOWTO and learned about the IFS environment variable.

I also read CS 15-392 Secure Programming – Environment Variables.

The IFS environment variable is the “internal field separator” and it is typically space, tab, newline, i.e. the white space used to separate fields. In bash you can unset the IFS variable and field splitting will behave as if it were set to the default “ \t\n”, or you can set it explicitly to that value. So that explains why I found a script that unset the IFS variable: it’s a secure programming practice.
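The same precaution can be taken from a C++ program that is about to hand its environment to a shell. This is only a sketch, assuming a POSIX system where setenv is available. The helper names reset_ifs and inherited_ifs are made up for this illustration.

```cpp
#include <cstdlib>
#include <stdlib.h>  // setenv (POSIX, not standard C++)
#include <string>

// Hypothetical helper: force IFS to space, tab, newline in this process's
// environment, so any child process inherits a known value.
bool reset_ifs() {
    return setenv("IFS", " \t\n", 1) == 0;  // 1 = overwrite an existing value
}

// Hypothetical helper: the IFS value a child process would inherit, or an
// empty string if IFS is not set at all.
std::string inherited_ifs() {
    const char* value = std::getenv("IFS");
    return value ? std::string(value) : std::string();
}
```

Setting the value explicitly rather than unsetting it is a design choice here: it makes the inherited value easy to check afterwards.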

File names on Windows

I’ve been reading up on file names in Windows because I’m having a problem with my C++ code processing a file with an odd TM character in the file name. I’m not sure why, but it seems that file names returned by the POSIX readdir function can’t necessarily be opened when they are passed to ifstream.open, or some weird character encoding thing is going on. Hopefully I get to the bottom of it. It’s a complete fluke that I happened to have a file that triggers the failure and tested with it. Lucky, I guess.

I did some more reading and discovered that there are ‘wide character’ versions of the file functions for Windows that use UTF-16 encoded strings rather than ‘code page’ encoded strings, I guess. Anyway, I don’t think I’m going to bother with such things: if the file can’t be opened because it has a weird character in its name then I’ll just fail with an error message and the user can look at fixing it. This program is only being “developed and tested” on Windows; there are no plans to actually run it there. It will run on Linux, which won’t have this weird character encoding issue.
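The “just report it” approach could look roughly like the sketch below. It is not the actual code from my program. The function name unopenable_entries is made up for this illustration, and it assumes a POSIX-style dirent.h is available.

```cpp
#include <dirent.h>   // opendir, readdir, closedir (POSIX)
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: list the entries that readdir reports in `dir` but
// that std::ifstream then fails to open. Whether sub-directories count as
// openable is platform dependent, so they may or may not be reported.
std::vector<std::string> unopenable_entries(const std::string& dir) {
    DIR* handle = opendir(dir.c_str());
    if (handle == nullptr) {
        throw std::runtime_error("cannot read directory: " + dir);
    }
    std::vector<std::string> failed;
    while (const dirent* entry = readdir(handle)) {
        const std::string name = entry->d_name;
        if (name == "." || name == "..") continue;  // skip self and parent
        const std::string path = dir + "/" + name;
        std::ifstream in(path.c_str(), std::ios::binary);
        if (!in.is_open()) failed.push_back(name);
    }
    closedir(handle);
    return failed;
}
```

The caller can then print each returned name in an error message and leave it to the user to rename the file.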

The difference between delete and delete[] in C++

I just learned about the difference between delete and delete[] in C++ by reading this article from StackOverflow. Basically you use delete[] to free arrays allocated with new[], and delete for single objects allocated with new. There was conflicting information about whether C++ runs the destructors on objects in an array when the array is deleted. Some people said it did, others said it didn’t. I should do an experiment to see one day. The reason for the difference between delete and delete[] seems to be that when C++ allocates an array it may also allocate memory for storing the size of the array alongside the array elements. It then returns a pointer to the first array element, which is beyond the start of the allocated memory, because the array size takes up the first bit of space.
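The experiment could be as small as the sketch below. The Tracked type and the function name are made up for this illustration. As far as I can tell the C++ standard requires delete[] to destroy every element of the array, so the count should match the array length.

```cpp
#include <cstddef>

// Hypothetical type for the experiment: counts how often its destructor runs.
struct Tracked {
    static int destroyed;
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

// Allocate `count` objects with new[], release them with delete[], and
// return how many destructors ran in the process.
int destructors_run_by_array_delete(std::size_t count) {
    Tracked::destroyed = 0;
    Tracked* items = new Tracked[count];
    delete[] items;  // must be delete[], because the memory came from new[]
    return Tracked::destroyed;
}
```

Using plain delete on the pointer returned by new[] is undefined behaviour, so the sketch does not try to measure that case.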

Pcdedupe

I’m working on a new ProgClub project called pcdedupe. It’s a file system de-duplicator and it’s a C++ system based on rdfind. I haven’t created the project page on the wiki yet, but the source code is available.

Basically I’m going to take a new angle on the rdfind software and tailor it to suit my particular environment (I have ten million files with massive duplication and rdfind isn’t optimised for that kind of scale).

I’ve set up a new file server

I’ve been having some fun over the last day or two looking over all my old files. I’ve got files that go back as far as 1999 in my archives. I’ve found my old blog database and associated files, so I hope to get that back up again soon, and I found some old code that I’ve been looking for (I don’t want to have to write it again!).

So my new file server has 6TB of storage as 3 x 2TB partitions. I can fit all my data in 1.3TB of space, so I’m planning to have one file share, and then a backup of that onto another partition. I have 10,174,633 files in my archive folder, and many more in my media, download and home folders. I might publish some more stats once du -s has finished processing. :)

I’m running Ubuntu 10.04 LTS Server as my file server. I tried to set up the Desktop version but it wouldn’t play nice with my nVidia graphics card.