What files is my program trying to access?

In this example I’m using hashdeep. I’m redirecting the output of two hash sets to two different files. I am doing that with the following command:

hashdeep -rj0 /path-to-drive-1 > hashes.drive1

and

hashdeep -rj0 /path-to-drive-2 > hashes.drive2

I have those running in their own terminal windows. I then optionally have another two windows open running a tail on them so I can monitor the files:

tail -f hashes.drive1

The hard drives are located in an external multi-bay enclosure and all hard drive LEDs are flashing away like mad. A good sign. But every now and then I’ll run an ‘ls’ to see where the files are at (checking file size) or alternatively (and usually better but more resource intensive) a line count of the hash files. Given I know how many files there should be, the line count gives a fair indication of the progress of the whole process.

wc -l hashes.drive*

In today’s example I was simply doing a file size comparison of the two hashes vs a known hashset of one of the drives that was a month old. The sizes should be relatively similar. I was getting results similar to:

madivad@server:~$ ls -al hashes*
-rw-rw-r-- 1 madivad madivad 330483319 Feb 11 09:26 hash.drive1.1602
-rw-rw-r-- 1 madivad madivad 341570757 Mar 23 12:09 hash.drive1.1603
-rw-rw-r-- 1 madivad madivad 243344728 Mar 23 11:18 hash.drive2.1603

The fact that drive1.1603 is larger is of no consequence, there are just more files to consider.

After running the above check for sometime, I realised that one of the files (in this case drive1.1603) had stalled for several hours. I’m not exactly sure when it seemed to stop growing, but doing a tail of the file confirmed it was stopped. The last output was an inconsequential .DS_Store file roughly 6K in size. After physically monitoring it for some time I began to get concerned about this. I could see the all 4 RAID drives getting activity, but nothing was being recorded. The 5th drive, the backup, was hashing away without a problem and the log file was growing as expected.

After some quick research I came across this stack exchange Q&A ( How do I know which file a program is trying to access? )

The first answer provided a solution that worked best with my scenario:

lsof -c hashdeep

I’d never seen this output before but very quickly I could see the important pieces of information it had dumped out. Namely:

madivad@server:~$ lsof -c hashdeep
COMMAND  PID  USER    FD TYPE DEVICE     SIZE/OFF  NODE       NAME
hashdeep 2539 madivad 1w REG  252,0     243344728  5535319    /home/madivad/hash.drive1.1603
hashdeep 2539 madivad 3r REG  259,0  499418030080  113639426  /path1/largeFiles/a-very-big-image-of-500GB.img
hashdeep 2552 madivad 1w REG  252,0     341611062  5535320    /home/madivad/hash.drive2.1603
hashdeep 2552 madivad 3r REG  8,33     3152347139  126025746  /path2/misc/random.file

The ‘w’ of FD with ‘1w’ signifies the file is being written and that the file being written was hash.drive1.1603

The ‘r’ of FD with ‘3r’ signifies the file is being read for hashing purposes, and that file is a very large file that I know is around 500GB. Running the command again shows me the second file being read in had changed, yet the first had stayed the same.

Given the file is very large and will take considerable time to hash and that the hard drive LEDs are flashing, I realised all was good in the world and I could move on with the days activities.

UPDATE: after reading the man page on lsof I found a better way to monitor the continual progress of it was to run it with the -r “repeat” switch which defaults to 15 seconds, which could be updated more or less frequently by adding a numerical component:

lsof -r 5 -c hashdeep

Leave a comment