What files is my program trying to access?

In this example I’m using hashdeep, redirecting the output of two hashing runs to two different files with the following commands (-r recurses through subdirectories; -j0 disables multi-threading, which keeps the output order deterministic):

hashdeep -rj0 /path-to-drive-1 > hashes.drive1

and

hashdeep -rj0 /path-to-drive-2 > hashes.drive2

I have those running in their own terminal windows. I then optionally have another two windows open running a tail on them so I can monitor the files:

tail -f hashes.drive1

The hard drives are located in an external multi-bay enclosure and all the hard drive LEDs are flashing away like mad. A good sign. But every now and then I’ll run an ‘ls’ to see where the files are at (checking file size), or alternatively (usually better, but more resource intensive) run a line count over the hash files. Since I know how many files there should be, the line count gives a fair indication of the progress of the whole process.

wc -l hashes.drive*
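
And since the expected file count is known, the line count converts to a rough percentage easily enough. A quick sketch (the TOTAL value here is hypothetical; substitute the real expected count for the drive):

# rough progress: lines hashed so far against the expected file count
TOTAL=1500000   # hypothetical: expected number of files on drive 1
DONE=$(wc -l < hashes.drive1)
echo "hashes.drive1: $DONE / $TOTAL files ($(( DONE * 100 / TOTAL ))%)"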

In today’s example I was simply comparing the file sizes of the two new hash sets against a known hash set for one of the drives that was a month old. The sizes should be relatively similar. I was getting results similar to:

madivad@server:~$ ls -al hash*
-rw-rw-r-- 1 madivad madivad 330483319 Feb 11 09:26 hash.drive1.1602
-rw-rw-r-- 1 madivad madivad 341570757 Mar 23 12:09 hash.drive1.1603
-rw-rw-r-- 1 madivad madivad 243344728 Mar 23 11:18 hash.drive2.1603

The fact that drive1.1603 is larger than last month’s set is of no consequence; there are just more files to consider this time around.

After running the above check for some time, I realised that one of the files (in this case drive1.1603) had stalled for several hours. I’m not exactly sure when it stopped growing, but a tail of the file confirmed it had: the last entry was an inconsequential .DS_Store file roughly 6K in size. After watching it for a while I began to get concerned. I could see all four RAID drives getting activity, but nothing was being recorded. The fifth drive, the backup, was hashing away without a problem and its log file was growing as expected.

After some quick research I came across this Stack Exchange Q&A: How do I know which file a program is trying to access?

The first answer provided a solution that worked best with my scenario:

lsof -c hashdeep

I’d never seen this output before, but very quickly I could pick out the important pieces of information it had dumped out. Namely:

madivad@server:~$ lsof -c hashdeep
COMMAND   PID USER     FD TYPE DEVICE     SIZE/OFF      NODE NAME
hashdeep 2539 madivad  1w REG   252,0    243344728   5535319 /home/madivad/hash.drive1.1603
hashdeep 2539 madivad  3r REG   259,0 499418030080 113639426 /path1/largeFiles/a-very-big-image-of-500GB.img
hashdeep 2552 madivad  1w REG   252,0    341611062   5535320 /home/madivad/hash.drive2.1603
hashdeep 2552 madivad  3r REG    8,33   3152347139 126025746 /path2/misc/random.file

The ‘w’ in the FD column (‘1w’) signifies a file open for writing; the file being written here is hash.drive1.1603.

The ‘r’ (‘3r’) signifies a file open for reading, in this case the file currently being hashed, and it’s one I know to be around 500GB. Running the command again showed that the file being read by the second process had changed, yet the first had stayed the same.

Given that file is very large and will take considerable time to hash, and given the hard drive LEDs were still flashing, I realised all was good in the world and I could move on with the day’s activities.

UPDATE: after reading the man page on lsof I found a better way to monitor progress continually: the -r (repeat) switch, which re-runs the listing every 15 seconds by default and accepts a numeric argument to repeat more or less frequently:

lsof -r 5 -c hashdeep
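
And if you’d rather keep a running log than watch the terminal, the repeated output redirects like anything else (a hypothetical usage, not something I needed on the day; lsof prints a ======= marker line between cycles, which makes the log easy to scan):

lsof -r 5 -c hashdeep > hashdeep.lsof.log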

Adding to the Favourites Sidebar

Normally you can just drag and drop a folder onto the OS X Finder’s Favourites sidebar, but sometimes it refuses to take. When that happens, there are two other options:

  • File > Add to Sidebar
  • Keyboard shortcuts

Sometimes the File menu option is not available, but the keyboard shortcut CTRL + CMD + T is usually good to go.

You could also try dragging a subfolder into place and then ESCAPEing out of it. This will usually free things up so you can re-add the folder you couldn’t add a moment ago.

reference: http://apple.stackexchange.com/questions/139305/how-can-i-add-new-folders-to-the-favorites-in-the-finder-sidebar/226776#226776

How to setup BASH custom prompt in Ubuntu

I wanted two things:

  • the time in my prompt
  • a colour prompt

Basics (pulled together in a short example after this list):

  • it’s located in: ~/.bashrc
  • uses Environment Variable: PS1
  • time is inserted using: \t
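
Putting those basics together, a minimal sketch (the green is just an example choice; \e[0;32m and \e[0m are standard ANSI colour and reset codes, and the \[ \] wrappers tell bash the escapes occupy no screen width, so line editing doesn’t wrap oddly):

PS1='\[\e[0;32m\][\t]\[\e[0m\] \u@\h:\w\$ '

Reload with source ~/.bashrc and the prompt shows the time in green ahead of the usual user@host:directory.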

time

I started here: http://www.cyberciti.biz/tips/howto-linux-unix-bash-shell-setup-prompt.html

colour

http://www.cyberciti.biz/faq/bash-shell-change-the-color-of-my-shell-prompt-under-linux-or-unix/

PixelBeat discussion on coloured command prompts

http://www.pixelbeat.org/docs/terminal_colours/

how-to guide for customising the command prompt

http://tldp.org/HOWTO/Bash-Prompt-HOWTO/

md5deep and finding duplicate files

I did this in several steps:

  1. create an md5 dump of the file system in question. For this I used my favourite command line tool `md5deep`,
  2. sort the file based on file size then md5 hash (largest to smallest), and
  3. remove entries that are not duplicated.

Although this could be completed in one step, I have broken it into several steps for my own clarity.
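
For the curious, a single-pipeline version could look something like the sketch below. It just chains the same commands used in the steps that follow, dropping the -e progress flag since that output isn’t useful in a pipe; treat it as untested:

md5deep -rz ~/documents/ | sort -k1,1rn -k2,2 | uniq -w 40 -d -c > duplicates.txt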

generate hashes using md5deep

simple enough:

md5deep -rze ~/documents/ > dump.md5.1 

 -r : recurse into subdirectories
 -z : prepend each line with the file size
 -e : display a progress estimate for each file

recreate the hashes without filesize (optional)

I wanted the hashes in two forms, one with file sizes, and one without. I first created the hash set with file sizes and then parsed that to remove it from the front of the line. This is a lot quicker than hashing all the files twice.

sed to the rescue

sed -e 's/^[ ]*[0-9]*  //' dump.md5.1 > dump.md5.2

Simply enough: each line commences with any number of spaces (including zero), then a number that is the file size (which can only be a run of digits), and finally two spaces (easy to miss in the expression, sitting between the final asterisk and the double slash).

 dav3@dubunt:~$ tail temp.md5
   2111 4ceaa380370a537da5c0e36f932df537 /var/log/syslog.7.gz
  86400 4efaef324ae83db0549e199a3685cdc3 /var/log/wtmp
 346938 9a254850fcd98d94eb991f591eff4772 /var/log/udev
 109440 8353823cde9dcfd415e2c639e2db4924 /var/log/wtmp.1
  56641 1e0c5a881f6e9bd0f8a8c60461051368 /var/log/Xorg.0.log
   9030 5ec0e371a8841a9f9e46cbe0ec128c36 /var/log/Xorg.1.log
 167150 ad1d9c65dd2edd089ba3e8388f68f1f9 /var/log/Xorg.0.log.old
  41438 416bf352c4d00d8912c827bdbc5e9c09 /var/log/Xorg.1.log.old
   6468 06a6770c50edd13140279fec4eacd805 /var/log/Xorg.failsafe.log
   6468 3659d35af858878602b60ed053f32df2 /var/log/Xorg.failsafe.log.old
dav3@dubunt:~$ sed -e 's/^[ ]*[0-9]*  //' < temp.md5 > temp.md5.2
dav3@dubunt:~$ tail temp.md5.2
4ceaa380370a537da5c0e36f932df537 /var/log/syslog.7.gz
4efaef324ae83db0549e199a3685cdc3 /var/log/wtmp
9a254850fcd98d94eb991f591eff4772 /var/log/udev
8353823cde9dcfd415e2c639e2db4924 /var/log/wtmp.1
1e0c5a881f6e9bd0f8a8c60461051368 /var/log/Xorg.0.log
5ec0e371a8841a9f9e46cbe0ec128c36 /var/log/Xorg.1.log
ad1d9c65dd2edd089ba3e8388f68f1f9 /var/log/Xorg.0.log.old
416bf352c4d00d8912c827bdbc5e9c09 /var/log/Xorg.1.log.old
06a6770c50edd13140279fec4eacd805 /var/log/Xorg.failsafe.log
3659d35af858878602b60ed053f32df2 /var/log/Xorg.failsafe.log.old

sort largest to smallest

We need to sort the file on two keys: first the file size, then the hash.

sort -k1,1rn -k2,2 < temp.md5 > temp.md5.3

Note that the r (reverse) and n (numeric) modifiers are attached to the first key only, so the sizes sort numerically from largest to smallest while the hashes sort lexically within each size. We COULD just sort by hash and then by size, but it’s theoretically possible to get the same hash for two files of different sizes, and we’re distinctly after a list of the largest files first, grouped by hash (because it’s likely we’ll have files with the same file size that are nevertheless different files). Which leads us to…

remove lines that are not duplicated elsewhere

We can now pass this list through uniq:

uniq -w 40 -d -c < temp.md5.3 > temp.md5.4
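
Here -w 40 restricts the comparison to the first 40 characters of each line, which takes in the size field, the two spaces, and most of the 32-character MD5 hash, more than enough to treat matching lines as duplicates; -d keeps only lines that appear more than once (one per group), and -c prefixes each with its repeat count. Since the list is already sorted largest first, a quick look at the top shows where the big wins are:

head -n 20 temp.md5.4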

Buffalo buffalo?

It wasn’t until I actually typed this in myself, and read it (and said it out loud to myself) half a dozen times that I finally understood this.

Apparently “buffalo” is a verb meaning to bully. I’d never heard it before, but in that context, buffalo should really stop buffaloing Buffalo buffalo, or they’ll find themselves the subject of Buffalo buffalo buffaloing Buffalo buffalo… Reminds me of a Thai tongue twister, but that’s a subject for another post.

Opal Optimisation coming to an end

But now we’re apparently set to save even more in 2016 and 2017 by the cutting of loopholes that were distinctly put in place by the then transport minister, who urged Sydneysiders to find “ways to beat the system”.

Well, apparently they have, with the eight-trips-in-two-days deal that leaves all remaining trips free. So because a few people are getting out there, walking between stations and getting active and healthy, the powers that be are recommending plans to scrap whatever loophole exists and to charge up to three times more for the Family Sunday Funday… well, not anymore.

What got me was one particular quote from the online edition of the Sydney Morning Herald:

The proposed changes would affect commuters across all modes of public transport but the tribunal said more than 60 per cent would pay less in 2016 to 2017. But by 2018, however, commuters would be paying an average of 8 per cent more per journey.

So commuters saved when it was brought in, and we’ll be saving over the next two years as well.

Rail fares and travel in NSW going down over three years?

I think someone is selling porky pies!

Next in this session: How to get a common sense government

Brackets.io

A modern, open source text editor that understands web design.

http://brackets.io/

With focused visual tools and preprocessor support, Brackets is a modern text editor that makes it easy to design in the browser. Try Creative Cloud Extract (preview) for Brackets for an easy way to get clean, minimal CSS straight from a PSD with no generated code.