Static Survey of File System Statistics

Download and Use Instructions

The tool can be downloaded here.

Below is a brief description of various useful options while running the tool.
For complete documentation and a usage guide, run:

$ perldoc <path-to-fsstats>

Alternatively, check the man page.

If the tool is in the current directory, run the following to get help on the available options:

$ ./fsstats -h

The most basic command-line usage is shown below.

To run fsstats on the current directory:

$ ./fsstats

If you want fsstats to run on a target directory and gather stats on all the files and directories below it, use:

$ ./fsstats <targetdirpath>

Some useful command-line options are as follows:

To run the tool with intermediate checkpoints, use the -i option, which specifies the interval (in minutes) between periodic checkpoints. The default is 10 minutes if -i is not specified. Use -c to specify a checkpoint file name. By default, checkpoint files are created in ‘/tmp’ and given program-generated names.

$ ./fsstats -i <checkpointinterval> -c <checkpointfilename> <targetdirname>

To restart from a checkpoint file and write the output in CSV format to ‘/tmp/output.csv’, use the -r and -o options. When -o is not specified, the default is to print both the CSV and a human-readable version of the stats to stdout.

$ ./fsstats -r /tmp/ckpt.1 -o /tmp/output.csv
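
If you drive fsstats from a script, the two command lines above can be assembled programmatically. The following is only a sketch: the helper names and the FSSTATS path are hypothetical, and it uses nothing beyond the -i, -c, -r and -o options described above.

```python
FSSTATS = "./fsstats"  # assumed location of the tool; adjust as needed


def checkpointed_run_args(target_dir, interval_minutes=10, checkpoint_file=None):
    """Build the argument list for a fresh run with periodic checkpoints."""
    args = [FSSTATS, "-i", str(interval_minutes)]
    if checkpoint_file is not None:
        # Without -c, fsstats picks a generated name under /tmp.
        args += ["-c", checkpoint_file]
    args.append(target_dir)
    return args


def restart_args(checkpoint_file, csv_output=None):
    """Build the argument list for restarting from a checkpoint file."""
    args = [FSSTATS, "-r", checkpoint_file]
    if csv_output is not None:
        # Without -o, fsstats prints its results to stdout.
        args += ["-o", csv_output]
    return args


if __name__ == "__main__":
    print(" ".join(checkpointed_run_args("/home/bob", 30, "/tmp/ckpt.1")))
    print(" ".join(restart_args("/tmp/ckpt.1", "/tmp/output.csv")))
```

Either list can be passed directly to subprocess.run() to launch the tool.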

Every fsstats run automatically counts the errors it encounters and reports a detailed message for each one. The results do not include data from files or directories that generated errors on stat.

Note: Although detailed error information is printed for you to view, it is not part of the results that you may upload to us. This preserves the privacy of your data. The uploaded results contain only anonymous, statistical information about your file system. See this section for more details.

You need at least execute permission on every subdirectory that leads to a file in the file tree on which you wish to run fsstats. See the section on choosing an appropriate file tree for more information.

More information on how to interpret the results

Note: You will need 64 bit integer support compiled into your Perl library if you want to run this on a file system with large files (> 4GB). If you are not sure what this means for you, please see the section below.

64 bit Integer Support

When you run fsstats for the first time, it will query the Perl library configuration options to determine if the library was compiled with support for large files and 64 bit integers.

If it finds that the library does not have 64 bit integer support then it prints a warning. If your file system has files larger than 4GB, then you will need this support built into your library.
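
You can also inspect the same configuration setting by hand before running the tool. The sketch below relies on the standard `perl -V:use64bitint` query, which prints a line such as use64bitint='define'; for a Perl built with 64 bit integers. The helper names are hypothetical, and this is not necessarily how fsstats itself performs the check.

```python
import subprocess

# Largest value that fits in an unsigned 32-bit integer (just under 4GB).
MAX_32BIT = 2**32 - 1


def parse_perl_config(line):
    """Parse one line of `perl -V:name` output, e.g. "use64bitint='define';"."""
    name, _, value = line.strip().partition("=")
    return name, value.rstrip(";").strip("'")


def perl_has_64bit_ints(perl="perl"):
    """Ask the given perl binary whether it was built with use64bitint."""
    result = subprocess.run(
        [perl, "-V:use64bitint"], capture_output=True, text=True, check=True
    )
    _, value = parse_perl_config(result.stdout)
    return value == "define"


if __name__ == "__main__":
    print("64 bit integer support:", perl_has_64bit_ints())
    print("Largest size representable in 32 bits:", MAX_32BIT, "bytes")
```

Passing a path such as /home/bob/perl/bin/perl as the perl argument checks a privately installed Perl instead of the system one.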

To get 64 bit integer support built into your library, follow the instructions below or ask your IT department to do it for you. You can do this without interfering with your currently installed Perl library. The instructions below let you download and build a Perl library with 64 bit support and install it in a temporary location in your home directory. You can then use this library to run fsstats. Once you are done running fsstats, you can simply delete the new Perl modules from your home directory.

Perl library download and install instructions:

  1. Download the latest Perl library from http://www.cpan.org/src/README.html.
  2. Untar the downloaded Perl library.
  3. Configure perl to build with 64bitint support and to install in a directory of your choice. For example,
    $ ./Configure -Duse64bitint -Dprefix=/home/bob/perl
    configures the perl build with 64 bit integer support and sets /home/bob/perl as the destination install directory.
  4. The prefix path is the destination where the new library will be installed when you do a ‘make install’.
  5. Make sure you choose a destination directory for install that is not underneath the directory where you have your Perl sources.
  6. Run ‘make’
  7. Run ‘make install’
  8. Run fsstats with
    $ /home/bob/perl/bin/perl FSSTATS_TOOL_LOCATION/fsstats ...

Sample Results From fsstats Runs

Multiple samples showing interesting special cases:

Interpreting the Results

The resulting output is a series of file attribute histograms:

  1. File size histogram: This is a size-based histogram of all the files found in the target file tree, excluding special files (block special files, FIFOs, etc.) and symbolic links. The size is the number of bytes stored in the file. A file with multiple hard links is counted just once.
  2. Capacity used histogram: This histogram bins files by the actual space that each file occupies on disk, not by the number of bytes stored in it. Bin selection is based on the following calculation:
    Block size * number of blocks allocated to a file.
    The files considered are the same files used for the file size histogram.
    Note: The size of a file in bytes, as reported by the file system, is not the same as the capacity used by the file. A file may have many more blocks allocated to it than its count of user-stored bytes requires. This can happen because of fragmentation, partially filled blocks, pre-allocation by the file system, etc.
  3. Directory size histogram: This section of the output presents two histograms: one of the number of entries in a directory, and one of the number of bytes a directory occupies on disk.
  4. Filename length: This section presents a histogram of filename length (in characters). The histogram is created using every file found in the target file tree, including hard links, soft links, regular files, special files, broken links, etc.
  5. Link count: This is a histogram of the number of hard links to a file. The count field counts each unique file once (ignoring its additional hard links). The histogram records the number of hard links to each unique file.
  6. Symlink target length: This histogram records the length of the target path of each symbolic link found in the target file tree.
    Note: fsstats does not follow symbolic links when traversing a file tree; it only records each link's information and then moves on to the next file/directory in the tree.
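
The difference between file size and capacity used, and the way hard links are counted once, can be illustrated with a short sketch. This is not the fsstats implementation. The power-of-two bin edges and the function names are assumptions made for illustration, and the 512-byte unit for st_blocks follows the usual POSIX convention on Unix-like systems.

```python
import os
import stat
from collections import Counter


def bin_floor(n):
    """Lower edge of an assumed power-of-two bin: 0, 1, 2, 4, 8, ..."""
    return 0 if n <= 0 else 1 << (n.bit_length() - 1)


def survey(root):
    """Build file size, capacity used and link count histograms for root."""
    size_hist, capacity_hist, link_hist = Counter(), Counter(), Counter()
    seen = set()  # (device, inode) pairs that have already been counted
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))  # never follows links
            except OSError:
                continue  # files that fail stat are left out of the results
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symbolic links and special files
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another hard link to a file already counted
            seen.add(key)
            size_hist[bin_floor(st.st_size)] += 1
            # st_blocks is Unix-only and counted in 512-byte units.
            capacity_hist[bin_floor(st.st_blocks * 512)] += 1
            link_hist[st.st_nlink] += 1
    return size_hist, capacity_hist, link_hist


if __name__ == "__main__":
    sizes, capacity, links = survey(".")
    for label, hist in (("size", sizes), ("capacity", capacity), ("links", links)):
        print(label, sorted(hist.items()))
```

A sparse or pre-allocated file is a typical case where the size bin and the capacity bin for the same file differ.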

Choosing an Appropriate File Tree for Running fsstats

You can run fsstats on any file tree that is accessible to you. You can choose to run fsstats on a file tree that contains:

  1. Your files: This is a file tree in one or more file systems that stores all or a fraction of the files used by you or your applications, e.g., running fsstats on ‘/home/user-name’. You will normally have sufficient permission on these files, so running fsstats will not generate errors such as ‘permission denied’. Sufficient permission here means execute (search) permission on all of the directories in the path that leads to the file.
  2. A complete file system: This means running fsstats on a file tree that contains files from users other than you. An example would be ‘/root’ on your desktop machine. If you do not have sufficient permission on those files (at least execute permission on all subdirectories), then fsstats will generate permission denied errors. These errors are reported with the complete pathname of each directory on which permission was denied. A count of all errors generated is also reported.

Note: Although detailed error information is printed for you to view, it is not part of the results that you may upload to us. See the section on More Information on the Tool, and Its Output for additional details.
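
Before starting a long run on a shared tree, it can help to see which directories would produce permission denied errors. The following is a minimal sketch of such a pre-check and is not part of fsstats; the function name is hypothetical. It reports directories that the current user cannot read or search.

```python
import os


def unsearchable_dirs(root):
    """Return directories under root that the current user cannot read or search."""
    blocked = []

    def on_error(err):
        # os.walk calls this when it cannot list a directory.
        blocked.append(err.filename)

    for dirpath, dirnames, _filenames in os.walk(root, onerror=on_error):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symbolic links are not followed
            # Read permission lists entries; execute permission allows stat on them.
            if not os.access(path, os.R_OK | os.X_OK):
                blocked.append(path)
    return sorted(set(blocked))


if __name__ == "__main__":
    for path in unsearchable_dirs("."):
        print("permission denied:", path)
```

An empty result suggests that a run on that tree should not hit permission errors, although permissions can still change while the run is in progress.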

Upload Results

Use the tool as you wish. However, we want to build a public database of the statistics of interesting file systems. If you would like to help, please upload your results. When you have completed the questionnaire, you will be directed to a page from which you can browse for and upload your results. Thanks for your input!

Contact Information

Garth Gibson, CMU
Marc Unangst, Panasas Inc. 
Shobhit Dayal, CMU 

 


Last updated 2008-06-25 | ©2008 Carnegie Mellon University