In this example I’m using hashdeep. I’m hashing two drives and redirecting the output of each hash set to its own file, using the following commands:
hashdeep -rj0 /path-to-drive-1 > hashes.drive1
hashdeep -rj0 /path-to-drive-2 > hashes.drive2
I have those running in their own terminal windows. I then optionally have another two windows open running a tail on them so I can monitor the files:
tail -f hashes.drive1
The hard drives are located in an external multi-bay enclosure and all hard drive LEDs are flashing away like mad. A good sign. But every now and then I’ll run an ‘ls’ to see where the files are at (checking file size) or alternatively (and usually better but more resource intensive) a line count of the hash files. Given I know how many files there should be, the line count gives a fair indication of the progress of the whole process.
wc -l hashes.drive*
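To turn that line count into a rough percentage, something like the following could work. It is only a sketch: `progress_pct` is a name I made up, the expected file count is something you have to supply yourself, and it assumes the hashdeep header lines begin with `%` or `#` so they can be skipped.

```shell
# progress_pct FILE EXPECTED
# Print roughly how far through the job we are, as a whole-number percentage.
# FILE is the hashdeep output file, EXPECTED is the number of files you
# believe are on the drive (an assumption you supply, must be non-zero).
progress_pct() {
    hash_file=$1
    expected=$2
    # Count only entry lines; header lines starting with '%' or '#' are skipped.
    done_count=$(grep -c -v '^[%#]' "$hash_file")
    echo $(( done_count * 100 / expected ))
}

# Example (hypothetical file count):
#   progress_pct hashes.drive1 250000
```

The figure is only as good as the expected count, and it says nothing about how large the remaining files are.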
In today’s example I was simply doing a file size comparison of the two hashes vs a known hashset of one of the drives that was a month old. The sizes should be relatively similar. I was getting results similar to:
madivad@server:~$ ls -al hashes*
-rw-rw-r-- 1 madivad madivad 330483319 Feb 11 09:26 hash.drive1.1602
-rw-rw-r-- 1 madivad madivad 341570757 Mar 23 12:09 hash.drive1.1603
-rw-rw-r-- 1 madivad madivad 243344728 Mar 23 11:18 hash.drive2.1603
The fact that drive1.1603 is larger is of no consequence; there are just more files to consider.
After running the above check for some time, I realised that one of the files (in this case drive1.1603) had stalled for several hours. I’m not exactly sure when it stopped growing, but doing a tail of the file confirmed it had stopped. The last output was an inconsequential .DS_Store file roughly 6K in size. After physically monitoring it for some time I began to get concerned. I could see all 4 RAID drives getting activity, but nothing was being recorded. The 5th drive, the backup, was hashing away without a problem and its log file was growing as expected.
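Rather than eyeballing the file size, a small check like this could flag a stall. It is a minimal sketch, with `size_changed` being a made-up helper name and the interval being whatever you consider "too long". As this post goes on to show, an unchanged size does not by itself mean something is wrong.

```shell
# size_changed FILE SECONDS
# Succeeds (exit 0) if FILE's size is different after waiting SECONDS,
# fails (exit 1) if the size is unchanged.
size_changed() {
    # $(( ... )) strips any padding some versions of wc add around the number.
    before=$(( $(wc -c < "$1") ))
    sleep "$2"
    after=$(( $(wc -c < "$1") ))
    [ "$before" -ne "$after" ]
}

# Example:
#   size_changed hashes.drive1 600 || echo "hashes.drive1 has not grown in 10 minutes"
```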
After some quick research I came across this Stack Exchange Q&A ( How do I know which file a program is trying to access? )
The first answer provided a solution that worked best with my scenario:
lsof -c hashdeep
I’d never seen this output before but very quickly I could see the important pieces of information it had dumped out. Namely:
madivad@server:~$ lsof -c hashdeep
COMMAND   PID    USER  FD TYPE DEVICE     SIZE/OFF      NODE NAME
hashdeep 2539 madivad  1w  REG  252,0    243344728   5535319 /home/madivad/hash.drive1.1603
hashdeep 2539 madivad  3r  REG  259,0 499418030080 113639426 /path1/largeFiles/a-very-big-image-of-500GB.img
hashdeep 2552 madivad  1w  REG  252,0    341611062   5535320 /home/madivad/hash.drive2.1603
hashdeep 2552 madivad  3r  REG   8,33   3152347139 126025746 /path2/misc/random.file
The ‘w’ of FD with ‘1w’ signifies the file is being written, and the file being written is the hash output file.
The ‘r’ of FD with ‘3r’ signifies the file is being read for hashing purposes, and that file is a very large file that I know is around 500GB. Running the command again shows me the second file being read in had changed, yet the first had stayed the same.
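If all you want is the name of the file currently being read, the FD column can be filtered with awk. This is a rough sketch only: `files_being_read` is a made-up name, it assumes the nine-column layout shown above, and it will truncate any path that contains spaces.

```shell
# files_being_read
# Reads lsof output on stdin and prints the NAME column for every row whose
# FD is a number followed by 'r' (a file opened for reading).
files_being_read() {
    awk 'NR > 1 && $4 ~ /^[0-9]+r/ { print $9 }'
}

# Example:
#   lsof -c hashdeep | files_being_read
```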
Given the file is very large and will take considerable time to hash, and that the hard drive LEDs are flashing, I realised all was good in the world and I could move on with the day’s activities.
UPDATE: after reading the man page on lsof I found a better way to monitor the continual progress was to run it with the -r “repeat” switch, which defaults to 15 seconds and can be made more or less frequent by adding a number:
lsof -r 5 -c hashdeep
- File > Add to Sidebar
- Keyboard shortcuts
Sometimes the File menu option is not available, but the keyboard shortcut CTRL + CMD + T is usually right to go.
You could try dragging a subfolder into place and then ESCAPEing out of it. This will usually open up the path to re-add the folder you couldn’t add a moment ago.
How to setup BASH custom prompt in Ubuntu
I wanted two things:
- the time in my prompt
- a colour prompt
- it’s located in: ~/.bashrc
- uses Environment Variable: PS1
- time is inserted using: \t
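Putting those pieces together, a line like this in ~/.bashrc would give a coloured prompt with the time in it. It is just a sketch: the green colour and the user@host:path layout are my own arbitrary choices here, not a recommendation.

```shell
# Green prompt: time (\t), then user@host:working-directory, then $ or #.
# \[ and \] wrap the non-printing colour codes so bash can work out the
# visible width of the prompt; \e[0m resets the colour at the end.
PS1='\[\e[0;32m\]\t \u@\h:\w\$ \[\e[0m\]'
```

Swap 0;32 for another ANSI colour code to taste, then open a new terminal or run `source ~/.bashrc` to see it.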
I started here: http://www.cyberciti.biz/tips/howto-linux-unix-bash-shell-setup-prompt.html
PixelBeat discussion on coloured command prompts
how-to guide for customising the command prompt
2^n + r, where ‘r’ is the number of redundant drives and 2^n + r is the total number of drives in the vdev.
I did this in several steps:
- create an md5 dump of the file system in question. For this I used my favourite command line tool `md5deep`,
- sort the file based on file size then md5 hash (sorted from largest to smallest), and
- remove entries that are not duplicate entries.
Although this could be completed in one step, I have broken it into several steps for my own clarity.
generate hashes using md5deep
md5deep -rze ~/documents/ > dump.md5.1
-r : recurse subdirectories
-z : includes file size
-e : displays progress on each file
recreate the hashes without filesize (optional)
I wanted the hashes in two forms, one with file sizes, and one without. I first created the hash set with file sizes and then parsed that to remove it from the front of the line. This is a lot quicker than hashing all the files twice.
sed to the rescue
sed -e 's/^[ ]*[0-9]* //' dump.md5.1 > dump.md5.2
Simply enough: I know the line commences with any number of spaces (including zero), then a number that is the file size, which can only be a run of digits, and finally two spaces (unseen here, between the final asterisk and the double slash).
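To make the two spaces visible, here is the same idea run against a single made-up line. It is a sketch only: `strip_size` is a name I invented, and the 10-character right-justified size column is an assumption about the layout, used purely for the demo.

```shell
# strip_size
# Reads md5deep -z style lines on stdin and removes the leading size field:
# optional spaces, a run of digits, then exactly two spaces.
strip_size() {
    sed -e 's/^ *[0-9]*  //'
}

# Demo line built with printf so the two-space separators are explicit.
printf '%10s  %s  %s\n' 2111 4ceaa380370a537da5c0e36f932df537 /var/log/syslog.7.gz | strip_size
```

What comes out is the hash, two spaces, and the path, with nothing left hanging off the front of the line.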
dav3@dubunt:~$ tail temp.md5
2111 4ceaa380370a537da5c0e36f932df537 /var/log/syslog.7.gz
86400 4efaef324ae83db0549e199a3685cdc3 /var/log/wtmp
346938 9a254850fcd98d94eb991f591eff4772 /var/log/udev
109440 8353823cde9dcfd415e2c639e2db4924 /var/log/wtmp.1
56641 1e0c5a881f6e9bd0f8a8c60461051368 /var/log/Xorg.0.log
9030 5ec0e371a8841a9f9e46cbe0ec128c36 /var/log/Xorg.1.log
167150 ad1d9c65dd2edd089ba3e8388f68f1f9 /var/log/Xorg.0.log.old
41438 416bf352c4d00d8912c827bdbc5e9c09 /var/log/Xorg.1.log.old
6468 06a6770c50edd13140279fec4eacd805 /var/log/Xorg.failsafe.log
6468 3659d35af858878602b60ed053f32df2 /var/log/Xorg.failsafe.log.old
dav3@dubunt:~$ sed -e 's/^[ ]*[0-9]* //' < temp.md5 > temp.md5.2
madivad@garage:~$ tail temp.md5.2
4ceaa380370a537da5c0e36f932df537 /var/log/syslog.7.gz
4efaef324ae83db0549e199a3685cdc3 /var/log/wtmp
9a254850fcd98d94eb991f591eff4772 /var/log/udev
8353823cde9dcfd415e2c639e2db4924 /var/log/wtmp.1
1e0c5a881f6e9bd0f8a8c60461051368 /var/log/Xorg.0.log
5ec0e371a8841a9f9e46cbe0ec128c36 /var/log/Xorg.1.log
ad1d9c65dd2edd089ba3e8388f68f1f9 /var/log/Xorg.0.log.old
416bf352c4d00d8912c827bdbc5e9c09 /var/log/Xorg.1.log.old
06a6770c50edd13140279fec4eacd805 /var/log/Xorg.failsafe.log
3659d35af858878602b60ed053f32df2 /var/log/Xorg.failsafe.log.old
sort largest to smallest
We need to sort the file on two keys: first the file size, then the hash.
sort -n -r -k1,1 -k2,2 < temp.md5 > temp.md5.3
We COULD just sort by hash and then by size, but it’s possible to get the same file hash for two files that are of different size, and we are distinctly after a list of the largest files first, grouped by hash (because it’s likely we’re going to have files with the same file size that are nevertheless different files). Which leads us to…
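A variation on that sort attaches the options to each key, so the size is compared as a number and the hash as plain text. This is my own alternative sketch rather than the command I actually ran, and `sort_by_size_then_hash` is a made-up name.

```shell
# sort_by_size_then_hash
# Reads "<size> <hash> <path>" lines on stdin.
# Key 1 (size) is sorted numerically, largest first (n and r apply to that key only).
# Key 2 (hash) is sorted as text, so identical hashes of the same size end up adjacent.
sort_by_size_then_hash() {
    sort -k1,1nr -k2,2
}

# Example:
#   sort_by_size_then_hash < temp.md5 > temp.md5.3
```

Keeping the numeric flag off the hash key avoids sort trying to read a hex string as a number.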
remove lines that are not duplicated elsewhere
We can now pass this list through uniq:
uniq -w 40 -d -c < temp.md5.3 > temp.md5.4
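uniq compares a fixed number of leading characters, so it depends on how wide the size column is. As an alternative sketch (not what I ran above), awk can key on the size and hash fields directly. `dupes_only` is a made-up name, and it relies on the input already being sorted so that matching entries sit next to each other. Unlike uniq -d, it prints every member of a duplicate group rather than one representative.

```shell
# dupes_only
# Reads sorted "<size> <hash> <path>" lines on stdin and prints only the
# lines whose size and hash match those of a neighbouring line.
dupes_only() {
    awk '{
        key = $1 " " $2
        if (key == prev_key) {
            # First repeat of this key: emit the held-back first member too.
            if (!printed) print prev_line
            print
            printed = 1
        } else {
            printed = 0
        }
        prev_key = key
        prev_line = $0
    }'
}

# Example:
#   dupes_only < temp.md5.3 > temp.md5.4
```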
Whilst removing a naked USB key from an internal USB port on this server with a pair of long nosed pliers, the pliers slipped and smashed a resistor off the key. That’s no great loss. I can always put it back on.
If I can find it!
The problem is, I can’t find it.
It wasn’t until I actually typed this in myself, and read it (and said it out loud to myself) half a dozen times that I finally understood this.
Apparently “buffalo” is a term meaning to bully. I’ve never heard it before, but in that context, buffalos should really stop buffaloing Buffalo buffalos, or they themselves will find themselves the subject of Buffalo buffalos buffaloing Buffalo buffalos… Reminds me of a Thai tongue twister, but that’s the subject of another post.
But now we’re apparently set to save even more in 2016 and 2017 by cutting loopholes distinctly put in place by the then transport minister urging Sydney-siders to find “ways to beat the system”.
Well, apparently they have with the 8 trips in two days deal, after which all remaining trips are free. So because a few people are getting out there, walking between stations and getting active and healthy, the powers that be are recommending plans to scrap whatever loophole exists and charge up to 3 times more for the Family Sunday Funday… Well not anymore.
What got me was one particular quote from the online edition of the Sydney Morning Herald:
The proposed changes would affect commuters across all modes of public transport but the tribunal said more than 60 per cent would pay less in 2016 to 2017. But by 2018, however, commuters would be paying an average of 8 per cent more per journey
So commuters saved when it was brought in, and we’ll be saving over the next two years as well.
Rail fares and travel in NSW going down over three years?
I think someone is selling porkie pies!
Next in this session: How to get a common sense government