hashdeep (debian/4.1-1)

This is md5deep, a set of cross-platform tools to compute hashes, or
message digests, for any number of files while optionally recursively
digging through the directory structure.  It can also take a list of known
hashes and display the filenames of input files whose hashes either do or
do not match any of the known hashes. This version supports MD5, SHA-1,
SHA-256, Tiger, and Whirlpool hashes. 
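As a rough illustration of the known-hash matching just described, the sketch below selects the files whose digests do, or do not, appear in a set of known hashes. It assumes the digests have already been computed as lowercase hex strings. `select_files` and its parameters are hypothetical names, not md5deep's code.

```cpp
#include <map>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical sketch: return the files whose digest is present in 'known'
// (match_known == true) or absent from it (match_known == false).
// 'computed' maps filename -> hex digest, assumed to be already calculated.
std::vector<std::string> select_files(
    const std::map<std::string, std::string>& computed,
    const std::unordered_set<std::string>& known,
    bool match_known)
{
    std::vector<std::string> out;
    for (const auto& [filename, digest] : computed) {
        const bool is_known = known.count(digest) != 0;
        if (is_known == match_known) out.push_back(filename);
    }
    return out;  // sorted by filename, because std::map is ordered
}
```

Passing `true` lists the matching files and `false` lists the non-matching ones. A hash set keeps each lookup constant time on average, which matters when the list of known hashes is large.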

See the file NEWS for a list of changes between releases.

See the file COPYING for information about the licensing for this program. 

See the file INSTALL for (generic) compilation and installation
instructions. Here's the short version:

$ ./configure
$ make
$ make install

Note that you normally must be root to install to the default location.
The sudo command is helpful for doing so. You can specify an alternate
installation location using the --prefix option to the configure script.
For example, to install to /home/foo/bin, use:

$ ./configure --prefix=/home/foo

Complete documentation on how to use the program is available on the
project's homepage, http://md5deep.sourceforge.net/

== md5deep vs. hashdeep ==

For historical reasons, the program has different options and features
when run with the names "hashdeep" and "md5deep."
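The source does not say how the program detects which name it was run with. One common technique is to inspect the basename of argv[0], and the sketch below assumes that approach. `Mode` and `mode_from_argv0` are hypothetical names.

```cpp
#include <string>

// Hypothetical: the behaviors selectable by program name.
enum class Mode { Hashdeep, Md5deep, Unknown };

// Hypothetical sketch: choose behavior from the invoked name (argv[0]).
Mode mode_from_argv0(const std::string& argv0)
{
    // Strip any leading directory ("/usr/bin/hashdeep" -> "hashdeep").
    const std::string::size_type slash = argv0.find_last_of("/\\");
    std::string name = (slash == std::string::npos) ? argv0 : argv0.substr(slash + 1);

    // Strip a Windows ".exe" suffix if present.
    const std::string exe = ".exe";
    if (name.size() > exe.size() &&
        name.compare(name.size() - exe.size(), exe.size(), exe) == 0) {
        name.erase(name.size() - exe.size());
    }

    if (name == "hashdeep") return Mode::Hashdeep;
    if (name == "md5deep") return Mode::Md5deep;
    return Mode::Unknown;
}
```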

hashdeep has a feature called "audit" which:
 * Uses a list of known hashes to audit a set of FILES. Errors
   are reported to standard error. If no FILES are specified, it reads
   from standard input.

 -a Audit mode. Each input file is compared against the set of knowns. An
    audit is said to pass if each input file is matched against exactly
    one file in the set of knowns. Any collisions, new files, or missing files
    will make the audit fail. Using this flag alone produces a message,
    either "Audit passed" or "Audit failed".

    -v       - prints the number of files in each category
    -v -v    - prints all discrepancies
    -v -v -v - prints the results for every file examined and every known file.

 -k <file> - Load the audit file (the set of knowns). Audit mode requires this option.
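The pass/fail rule above can be sketched as follows. This is a simplified model, not hashdeep's implementation. It compares a single digest per file and ignores filenames. The names `audit`, `AuditResult`, and `Entry` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical representation: (filename, hex digest).
using Entry = std::pair<std::string, std::string>;

struct AuditResult {
    std::size_t matched = 0;     // input matched exactly one known file
    std::size_t new_files = 0;   // input matched no known file
    std::size_t collisions = 0;  // input matched more than one known file
    std::size_t missing = 0;     // known file that no input matched
    bool passed() const { return new_files == 0 && collisions == 0 && missing == 0; }
};

// Simplified audit: compares digests only and ignores filenames.
AuditResult audit(const std::vector<Entry>& inputs, const std::vector<Entry>& knowns)
{
    // Index the knowns by digest; several known files may share a digest.
    std::unordered_map<std::string, std::vector<std::size_t>> by_digest;
    for (std::size_t i = 0; i < knowns.size(); ++i)
        by_digest[knowns[i].second].push_back(i);

    AuditResult r;
    std::unordered_set<std::size_t> used;  // indices of knowns matched by some input
    for (const Entry& in : inputs) {
        auto it = by_digest.find(in.second);
        if (it == by_digest.end()) { ++r.new_files; continue; }
        if (it->second.size() == 1) ++r.matched; else ++r.collisions;
        used.insert(it->second.begin(), it->second.end());
    }
    r.missing = knowns.size() - used.size();
    return r;
}
```

The four counters roughly correspond to the per-category counts that -v is described as printing. `passed()` corresponds to the single pass/fail message.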

To perform an audit:
  hashdeep -r dir  > /tmp/auditfile            # Generate the audit file
  hashdeep -a -k /tmp/auditfile -r dir         # test the audit

Notice that the audit is performed with a standard hashdeep output
file. (Internally, the audit is computed as part of the hashing process.)
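Because the audit input is an ordinary hashdeep output file, loading it comes down to parsing that file. The sketch below assumes a layout in which header lines begin with `%%%%` or `##`, followed by one comma-separated line per file with the filename in the last column. That layout and the name `parse_line` are assumptions made for illustration.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch: split one data line such as "5,<md5>,<sha256>,path".
// 'ncols' is the total number of columns, including the filename. The
// filename is everything after the (ncols - 1)th comma, so it may itself
// contain commas. Header lines, blank lines, and short lines give nullopt.
std::optional<std::vector<std::string>> parse_line(const std::string& line, std::size_t ncols)
{
    if (line.empty() || line.rfind("%%%%", 0) == 0 || line.rfind("##", 0) == 0)
        return std::nullopt;

    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (std::size_t i = 0; i + 1 < ncols; ++i) {
        const std::string::size_type comma = line.find(',', start);
        if (comma == std::string::npos) return std::nullopt;  // too few columns
        fields.push_back(line.substr(start, comma - start));
        start = comma + 1;
    }
    fields.push_back(line.substr(start));  // filename: the rest of the line
    return fields;
}
```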

== Unicode Issues ==
Modern POSIX-based systems consider filenames to be a sequence of
bytes that are rendered as the application wishes. This means that
filenames typically contain ASCII but can contain UTF-8, UTF-16,
latin1, or even invalid Unicode encodings.
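Because a POSIX filename is just a sequence of bytes, code that wants to treat it as text must first check whether it is well-formed UTF-8. A minimal checker, written from the Unicode definition of well-formed UTF-8 rather than taken from this program, might look like this. `is_valid_utf8` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Returns true if s is well-formed UTF-8: no stray continuation bytes, no
// truncated or overlong sequences, no surrogates (U+D800..U+DFFF), and
// nothing above U+10FFFF.
bool is_valid_utf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (lead < 0x80) { ++i; continue; }                          // ASCII
        if (lead >= 0xC2 && lead <= 0xDF)      { len = 2; cp = lead & 0x1F; }
        else if (lead >= 0xE0 && lead <= 0xEF) { len = 3; cp = lead & 0x0F; }
        else if (lead >= 0xF0 && lead <= 0xF4) { len = 4; cp = lead & 0x07; }
        else return false;  // continuation byte, 0xC0/0xC1 (overlong), or > 0xF4
        if (i + len > s.size()) return false;                        // truncated
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;                    // not 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }
        if (len == 3 && cp < 0x800) return false;                    // overlong
        if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;              // surrogate
        i += len;
    }
    return true;
}
```

For example, the latin1 spelling of "café" (bytes `caf\xE9`) fails this check, while the UTF-8 spelling (`caf\xC3\xA9`) passes.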

Windows-based systems have one set of API calls for ASCII-based
filenames and another set for filenames encoded as UCS-2, which
"produces a fixed-length format by simply using the code point as the
16-bit code unit and produces exactly the same result as UTF-16 for
63,488 code points in the range 0-0xFFFF" according to Wikipedia
(http://en.wikipedia.org/wiki/UTF-16/UCS-2). However, the factual
accuracy of this statement is disputed on the article's talk page. It
is pretty clear that nobody is entirely sure what Windows actually
does, and Windows itself may not be consistent.
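The 63,488 figure is 65,536 (the code points 0-0xFFFF) minus the 2,048 surrogate code points 0xD800-0xDFFF. UTF-16 reserves the surrogates for encoding characters above 0xFFFF as pairs of 16-bit units. The sketch below encodes a code point as UTF-16 to make the difference from UCS-2 concrete. It is an illustration, not an API that Windows exposes, and `to_utf16` is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// Encode one Unicode code point as UTF-16 code units.
// Non-surrogate code points up to 0xFFFF become a single unit equal to the
// code point, which is where UCS-2 and UTF-16 agree. Larger code points need
// a surrogate pair, which UCS-2 cannot express. Invalid input yields {}.
std::vector<std::uint16_t> to_utf16(std::uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return {};   // surrogates are not characters
    if (cp <= 0xFFFF) return {static_cast<std::uint16_t>(cp)};
    if (cp > 0x10FFFF) return {};                   // beyond Unicode's range
    cp -= 0x10000;                                  // 20 bits remain
    return {static_cast<std::uint16_t>(0xD800 + (cp >> 10)),     // high surrogate
            static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF))};  // low surrogate
}
```

For example, U+1F600 encodes as the pair 0xD83D 0xDE00. A strict UCS-2 reader would see these as two unrelated code units.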

Version 3 of this program addressed this issue by using the TCHAR
type to hold filenames on Windows and by refusing to print them,
printing a "?" instead. Version 4 of this program translates TCHAR
strings to std::string strings at the earliest opportunity using the
Windows function WideCharToMultiByte
(http://msdn.microsoft.com/en-us/library/dd374130%28v=vs.85%29.aspx). Flags
have been added to escape Unicode when it is printed.
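For reference, the conversion that WideCharToMultiByte performs with the CP_UTF8 code page can be sketched portably. This is not the program's tchar_to_utf8(). The sketch assumes UTF-16 input held in a std::u16string and substitutes U+FFFD for unpaired surrogates, which is a choice made for this sketch. `utf16_to_utf8` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Convert UTF-16 code units to a UTF-8 byte string.
// Unpaired surrogates become U+FFFD (an assumption of this sketch).
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High + low surrogate: combine into one supplementary code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }
        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```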

There is apparently no way on Windows to open a UTF-8 filename; it must first
be converted back to a wide-character filename with MultiByteToWideChar.

Fortunately, we never really need to convert back.

Notice that on Windows the files being hashed can have Unicode characters
in their names, but the file with the hashes must have an ASCII name.


COMPILING FOR WINDOWS:
	  -D_UNICODE causes TCHAR to be defined as 'wchar_t'.

COMPILING FOR POSIX:
	  -D_UNICODE is not defined, causing TCHAR to be defined as 'char'.
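The TCHAR rule above can be reproduced in a few lines. On Windows the real TCHAR comes from <tchar.h>. The `tchar_sketch` type below is a hypothetical stand-in that lets the mapping be compiled and checked on any platform.

```cpp
#include <type_traits>

// Hypothetical stand-in for TCHAR, following the rule described above:
// wchar_t when compiled with -D_UNICODE, char otherwise.
#ifdef _UNICODE
typedef wchar_t tchar_sketch;
#else
typedef char tchar_sketch;
#endif

// Fails to compile if the mapping ever disagrees with the build flag.
#ifdef _UNICODE
static_assert(std::is_same<tchar_sketch, wchar_t>::value, "-D_UNICODE: wide characters");
#else
static_assert(std::is_same<tchar_sketch, char>::value, "no _UNICODE: narrow characters");
#endif
```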

Previously, win32 functions were controlled with #ifdef statements, like this:

#ifdef _WIN32
  _wfullpath(d_name,fn,PATH_MAX);
#else
  if (NULL == realpath(fn,d_name))
    return TRUE;
#endif

There was also a file called tchar-local.h which actually changed the semantics
of functions on different platforms, with things like this:

   #define  _tcsncpy   strncpy
   #define  _tstat_t   struct stat

This made the code very difficult to maintain.

With the 4.0 rewrite, we have replaced this code with C++ functions that return
objects where possible, and we avoid the use of #defines. Platform-specific calls
are confined to a wrapper function that is defined before its use, so the mainline
code never calls realpath() directly. You can see this in cycles.cpp:

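/* Return the canonicalized absolute pathname in UTF-8 on Windows and POSIX systems */
std::string get_realpath(const TCHAR *fn)
{
#ifdef _WIN32
    /*
     * Expand a relative path to the full path.
     * _tfullpath() maps to _wfullpath() when _UNICODE is defined and to
     * _fullpath() otherwise, so it accepts a TCHAR string either way.
     * http://msdn.microsoft.com/en-us/library/506720ff(v=vs.80).aspx
     */
    TCHAR absPath[PATH_MAX];
    if(_tfullpath(absPath,fn,PATH_MAX)==0) return "";   // failure: empty string
    return tchar_to_utf8(absPath);                       // convert to UTF-8
#else
    char resolved_name[PATH_MAX];                        // realpath() output buffer
    if(realpath(fn,resolved_name)==0) return "";         // failure: empty string
    return std::string(resolved_name);                   // POSIX: bytes as-is
#endif
}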
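With C++17 compilers, std::filesystem offers a portable alternative that returns an object instead of filling a fixed PATH_MAX buffer. This is a swapped-in technique, not what cycles.cpp does, and `get_realpath_fs` is a hypothetical name. Like realpath(), std::filesystem::canonical fails when the path does not exist. The error_code overload lets the function report that failure as an empty string, mirroring get_realpath().

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical C++17 alternative to get_realpath(): returns the canonical
// absolute path as UTF-8, or "" on failure (e.g. the path does not exist).
std::string get_realpath_fs(const std::filesystem::path& fn)
{
    std::error_code ec;
    const std::filesystem::path abs = std::filesystem::canonical(fn, ec);
    if (ec) return "";
#if defined(__cpp_char8_t)
    // C++20: u8string() returns std::u8string; copy its bytes into std::string.
    const std::u8string u8 = abs.u8string();
    return std::string(u8.begin(), u8.end());
#else
    return abs.u8string();  // C++17: already a UTF-8 std::string
#endif
}
```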


You can install mingw and then simply configure with something like this:
$ export PATH=$PATH:/usr/local/i386-mingw32-4.3.0/bin
$ ./configure --host=i386-mingw32


== Hash Algorithm References ==

The MD5 algorithm is defined in RFC 1321:
http://www.ietf.org/rfc/rfc1321.txt

The SHA1 algorithm is defined in FIPS 180-1:
http://www.itl.nist.gov/fipspubs/fip180-1.htm

The SHA256 algorithm is defined in FIPS 180-2:
http://csrc.nist.gov/publications/fips/fips180-2/fips180-2.pdf

The Tiger algorithm is defined at:
http://www.cs.technion.ac.il/~biham/Reports/Tiger/

The Whirlpool algorithm is defined at:
http://planeta.terra.com.br/informatica/paulobarreto/WhirlpoolPage.html

================================================================
Theory of operation.

main.cpp    - sets up the system.
dig.cpp     - iterates through the individual directories.
            - calls hash_file() in hash.cpp for each file to hash
hash.cpp    - performs the hashing of each file.
display.cpp - stores/displays the results.
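The division of labor above can be sketched as a small pipeline. The function names echo the files listed, but the code is hypothetical. It uses std::filesystem to walk the tree and substitutes 64-bit FNV-1a for the real digests only to keep the example short. FNV-1a is not a cryptographic hash and is not one of the algorithms this program supports.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Role of hash.cpp (hypothetical): hash one file's contents.
// 64-bit FNV-1a stands in for MD5/SHA-1/etc. to keep the sketch short.
std::uint64_t hash_file(const fs::path& p)
{
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    std::ifstream in(p, std::ios::binary);
    char buf[4096];
    do {
        in.read(buf, sizeof buf);               // the last read may be partial
        for (std::streamsize i = 0; i < in.gcount(); ++i) {
            h ^= static_cast<unsigned char>(buf[i]);
            h *= 1099511628211ULL;              // FNV-1a 64-bit prime
        }
    } while (in);
    return h;
}

struct Result {
    fs::path path;
    std::uint64_t digest;
};

// Role of dig.cpp (hypothetical): walk the tree and hash each regular file.
std::vector<Result> dig(const fs::path& root)
{
    std::vector<Result> results;
    for (const auto& entry : fs::recursive_directory_iterator(root)) {
        if (entry.is_regular_file())
            results.push_back({entry.path(), hash_file(entry.path())});
    }
    return results;
}

// Role of display.cpp (hypothetical): print "digest  path" for each file.
void display(const std::vector<Result>& results, std::ostream& os)
{
    for (const auto& r : results)
        os << std::hex << r.digest << std::dec << "  " << r.path.string() << '\n';
}
```

A caller playing the role of main.cpp could then run `display(dig(root), std::cout);`.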