du
estimate file space usage
Synopsis
du [OPTION]... [FILE]...
du [OPTION]... --files0-from=F
examples
source
How to analyse disk usage in command line linux?
Use some combination of the commands and options:
du --max-depth=1 2> /dev/null | sort -n -r | head -n20
to view only the largest few. If you'd like to use it a lot, then
bind it to an alias, e.g. in bash by adding to ~/.bashrc
alias largest='du --max-depth=1 2> /dev/null | sort -n -r | head -n20'
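As a variation on the alias above, a shell function can take the
directory and the number of lines as arguments. This is a minimal
sketch assuming GNU du and GNU sort (for sort -h); the name
largest_h and its two optional parameters are made up for
illustration.

```shell
# Hypothetical helper: largest_h [DIR] [COUNT]
# Lists the COUNT largest entries directly under DIR (default: current
# directory, 20 lines), with sizes in human-readable form.
largest_h() {
    du -h --max-depth=1 "${1:-.}" 2>/dev/null | sort -h -r | head -n "${2:-20}"
}
```

For example, largest_h /var 10 would list the ten largest entries
under /var; the total for /var itself normally comes first.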
source
Difference between df -k and du -sh
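One or more applications have files open on /export, but the
filenames themselves no longer exist (i.e. they have been
deleted). df still counts the blocks those files occupy until the
last open handle is closed, while du only sees files that still
have a name.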
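To see which deleted files are still held open, one option on
Linux is to scan /proc for file descriptors whose target is marked
"(deleted)". This is a rough sketch that assumes a Linux /proc
filesystem; the function name is hypothetical, and without root it
only sees processes you own.

```shell
# Hypothetical helper: print every open file descriptor whose target
# has been unlinked. Linux marks these by appending " (deleted)" to the
# symlink target under /proc/PID/fd.
list_deleted_open_files() {
    for fd in /proc/[0-9]*/fd/*; do
        # Unreadable or vanished entries are skipped silently.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)') printf '%s -> %s\n' "$fd" "$target" ;;
        esac
    done
}
```

Restarting or signalling the process that owns such a descriptor
is usually what releases the space.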
source
How do you display each sub-directory size in a list format in the command line using only 1 line command?
Try this
du -h --max-depth=1
Output
oliver@home:/usr$ sudo du -h --max-depth=1
24M ./include
20M ./sbin
228M ./local
4.0K ./src
520M ./lib
8.0K ./games
1.3G ./share
255M ./bin
2.4G .
Alternative
If --max-depth=1 is a bit too long for your taste, you can also
try using:
du -h -s *
This uses -s (--summarize) and will only print the size of the
folder itself by default. By passing all elements in the current
working directory (*), it produces similar output as
--max-depth=1 would:
Output
oliver@cloud:/usr$ sudo du -h -s *
255M bin
8.0K games
24M include
520M lib
0 lib64
228M local
20M sbin
1.3G share
4.0K src
The difference is subtle. The former approach will display the
total size of the current working directory and the total size of
all folders that are contained in it... but only up to a depth of
1.
The latter approach will calculate the total size of all passed
items individually. Thus, it includes the symlink lib64 in the
output, but excludes the hidden items (whose names start with a
dot). It also lacks the total size for the current working
directory, as that was not passed as an argument.
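If the hidden items should be listed too, one way is to let find
pick the entries instead of the shell glob. A small sketch,
assuming GNU find (for -mindepth and -maxdepth); du_each is a
made-up name.

```shell
# Hypothetical helper: du_each [DIR]
# Summarizes every entry directly inside DIR, including names that
# start with a dot, which a plain * glob would skip.
du_each() {
    find "${1:-.}" -mindepth 1 -maxdepth 1 -exec du -sh {} +
}
```

Like du -h -s *, this prints no total for the directory itself.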
source
Why does du -sl show different sizes for the source and result of a cp -rl?
Just tried this myself, and I found the discrepancy in size is
from the directory files. Since they are not hardlinked they are
new files that get created, maybe not with the exact same
metadata?
To illustrate this run the following commands:
ls -alR folderA/ | grep -v '^d' | awk '{total += $5} END {print "Total:", total}'
ls -alR folderB/ | grep -v '^d' | awk '{total += $5} END {print "Total:", total}'
These sizes should be identical (dir files not included). You
could print the listings with the directory sizes and diff the
results to find which dirs exactly are different.
source
How does du determine which hard link to disregard?
Extending your test to three folders, you can see that du counts
an inode only the first time it is hit. If you go into an
individual folder and run du there, you'll get the full size.
To test:
mkdir alexandru
ln mariano/zero_file.2 alexandru/zero_file.0
du -sh *
You should now see alexandru taking up the 500K+. So without
looking at the du code, I'm guessing it stores a list of
traversed inodes and doesn't revisit the ones already seen.
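The effect can be reproduced in a scratch directory. The sketch
below assumes GNU du and a filesystem that supports hard links;
the directory layout and variable names are invented for the
demonstration. It compares the grand total with and without -l
(--count-links).

```shell
# Build two directories that share one file through a hard link.
demo=$(mktemp -d)
mkdir "$demo/a" "$demo/b"
head -c 1048576 /dev/urandom > "$demo/a/data"
ln "$demo/a/data" "$demo/b/data"

# Default behaviour: the shared inode is counted once across all arguments.
once=$(du -sck "$demo/a" "$demo/b" | tail -n 1 | cut -f1)
# With -l every link is counted, so the file shows up twice in the total.
each=$(du -sckl "$demo/a" "$demo/b" | tail -n 1 | cut -f1)

echo "counted once: ${once}K, counted per link: ${each}K"
rm -rf "$demo"
```

The second figure should exceed the first by roughly the size of
the linked file.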
source
explanation about du command linux diskusage
The former counts visible objects within /usr. The latter counts
all objects under /usr, including /usr itself.
source
"du -h" with more decimal places
du -Lsbc * | awk '
function hr(bytes) {
hum[1024**4]="TiB";
hum[1024**3]="GiB";
hum[1024**2]="MiB";
hum[1024]="kiB";
for (x = 1024**4; x >= 1024; x /= 1024) {
if (bytes >= x) {
return sprintf("%8.3f %s", bytes/x, hum[x]);
}
}
return sprintf("%4d B", bytes);
}
{
print hr($1) "\t" $2
}
'
The awk function is adapted from an existing snippet.
One could probably make the output look a bit nicer by piping it
through column; the script above already left-pads the numbers.
Also, to sort the list, sort before formatting:
du -Lsbc * | sort -n | awk
followed by the awk script.
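On systems with a reasonably recent GNU coreutils, numfmt can do
the same conversion. This is only a sketch: it assumes a numfmt
version that honours a precision in --format, and hr_bytes is a
hypothetical name.

```shell
# Hypothetical helper: hr_bytes BYTES
# Formats a byte count with three decimals and an IEC suffix.
hr_bytes() {
    numfmt --to=iec-i --suffix=B --format='%.3f' "$1"
}

# Example use with du (apparent sizes in bytes, one line per entry):
# du -sb -- * | while read -r size name; do
#     printf '%12s\t%s\n' "$(hr_bytes "$size")" "$name"
# done
```

Sorting stays simple with this approach, because sort -n can run
on the raw byte counts before they are formatted.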
source
why does `du` not show results for all files?
If you run ls -il in the directory, you will see that a lot of
files share the same inode. That is why du -a shows information
only for the unique inodes.
source
Is there a way to force du to report a directory size (recursively) including only sizes of files?
$ ls -goR | awk '{sum += $3} END{print sum}'
16992
Edit. To exclude directories, use grep
$ ls -goR | grep -v ^d | awk '{sum += $3} END{print sum}'
6
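An alternative that avoids parsing ls output is to let find print
the sizes. A minimal sketch, assuming GNU find (for -printf);
sum_file_bytes is a made-up name. Like the ls version, it adds up
apparent file sizes in bytes rather than disk blocks, and a
hard-linked file is counted once per name.

```shell
# Hypothetical helper: sum_file_bytes [DIR]
# Adds up the apparent sizes (in bytes) of all regular files below DIR,
# ignoring the directories themselves.
sum_file_bytes() {
    find "${1:-.}" -type f -printf '%s\n' |
        awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}
```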
source
How do I quickly calculate the size of a directory?
No, there isn't a quick way. You need to go through all
subdirectories.
source
Is it possible to format ps RSS (memory) output to be more human friendly?
It seems like there is no appropriate flag in ps, so you need to
either use a different tool (I personally prefer htop) or mess
with ps output a little.
I guess you want to stick with ps. Here's a dirty little script
I've made as an example:
# get terminal width
WIDTH=`tput cols`
# pipe stdin to awk
cat | \
awk '\
BEGIN {
# set output format
CONVFMT="%.2f"
}
NR==1 {
# search first line for columns that need to be converted from K to M
for (i=1;i<=NF;i++)
# add condition for new columns if you want
if ($i=="VSZ" || $i=="RSS") {
# column numbers are stored in an array
arr[i]=i;
$i = $i "(MB)"
}
}
NR > 1 {
# edit appropriate columns
for (i in arr)
$i=$i/1024;
}
{
# print every line
print $0
}' | \
# format the output into columns and trim it to terminal width
column -t | cut -c 1-$WIDTH
Save it to a file, say prettyps.sh, and make it executable:
chmod +x prettyps.sh
and use as follows:
ps ux | /path/to/prettyps.sh
Using this script has the downside of adding extra processes to
ps output, but nevertheless it works:
$ ps ux | ./prettyps.sh
USER PID %CPU %MEM VSZ(MB) RSS(MB) TTY STAT START TIME COMMAND
pono 2658 0.0 0.0 358.88 4.29 ? Sl 02:33 0:00 /usr/bin/gnome-keyring
... output truncated...
pono 4507 0.0 0.0 19.14 1.81 pts/1 S+ 03:29 0:00 man
pono 4518 0.0 0.0 10.55 0.96 pts/1 S+ 03:29 0:00 pager
pono 4727 0.7 0.9 1143.59 53.08 ? Ssl 04:10 0:24 /opt/sublime_text/subl
pono 4742 0.1 0.4 339.05 25.80 ? Sl 04:10 0:03 /opt/sublime_text/plug
pono 5177 0.0 0.0 19.23 1.32 pts/0 R+ 05:05 0:00 ps
pono 5178 0.0 0.0 4.34 0.61 pts/0 S+ 05:05 0:00 /bin/sh
Hope this helps to find a way that suits you.
source
What will happen if my DB server runs out of disk space?
The default storage engine over the years was MyISAM. As of MySQL
5.5, it is now InnoDB.
When it comes to your question, MySQL is a little strange in this
aspect because all temporary tables use the MyISAM storage
engine.
According to the MySQL 5.0 Certification Study Guide, bullet
point #11 on pages 408-409, Section 29.2, says the following:
If you run out of disk space while adding rows to a MyISAM
table, no error occurs. The server suspends the operation until
space becomes available, and then completes the operation.
Given this fact, SQL operations, especially those using temporary
tables, do not fail. They simply freeze and wait for disk space
to become available. Such operations would fail only if you
disconnect the DB Connection.
In that event, the solution would be to free up disk space.
Then, all moving parts of MySQL that were frozen would thaw out
and start moving again.
Perhaps purging old binary logs, dropping old tables, or
truncating the error log would help in this aspect.
source
Why doesn't Ext4 cache directory size?
A simple cache wouldn't work. A cache is about checking if you
already have the answer and only reprocess if you don't. But in
this case, a single missing entry would make others useless. So
it would have to keep all directory sizes updated all the time.
Also don't underestimate the possible impact of your proposal.
Back when journalling filesystems were new, there was a lot of
opposition because updating the journal was too expensive. Also,
most filesystems allow options like noatime, nodiratime and
relatime that reduce this kind of metadata updating. Note that
all these (journals and time updates) are bounded in time: they
each take a specific number of block accesses (and are usually
'hidden' by advanced IO scheduling), but updating the size of
every directory up the path means an unknown number of accesses.
Finally, in POSIX filesystems, there's no real 'containing
directory'. A file entry in a directory points to an inode (the
disk structure that holds the file information), but there's no
reference from the inode back to the directory. This allows the
'hard link' feature, where more than one entry (usually in
different directories) points to the same inode. Even if you kept
a list of directories that point to the inode, you'd be
multiplying the (already big) number of updates. Worse, you'd now
have to keep track of whether you've already updated each
directory, since at some point up the chain you'll reach a shared
ancestor, which shouldn't count the update twice. Or should it?
Maybe you'd have to keep two sizes on each directory, one that
counts all 'real' files, and another that counts a file each time
it appears...
It doesn't seem so useful after all.
source
du which counts number of files/directories rather than size
The easiest way seems to be find /path/to/search -ls | wc -l
find is used to walk through all files and folders.
-ls prints one listing line per entry found. The default action
is -print, which prints just the name, so if you leave -ls out
the count stays the same. It is a good habit to state the action
explicitly though.
If you just use the find /path/to/search -ls part, it will print
all the files and directories to your screen.
wc is word count. The -l option tells it to count the number of
lines.
You can use it in several ways, e.g.
- wc testfile
- cat testfile | wc
The first option lets wc open a file and count the number of
lines, words and chars in that file. The second option does the
same but without filename it reads from stdin.
You can combine commands with a pipe |. Output from the first
command will be piped to the input of the second command. Thus
find /path/to/search -ls | wc -l uses find to list all files and
directories and feeds the output to wc. wc then counts the number
of lines.
(Another alternative would have been ls | wc, but find is much
more flexible and a good tool to learn.)
[Edit after comment]
It might be useful to combine find with -exec. E.g.
find / -maxdepth 1 -type d ! \( -path /proc -o -path /dev -o -path /.snap \) -exec echo starting a find to count the files in {} \;
will list all directories in /, bar some which you do not want to
search. We can trigger the previous command on each of them,
yielding a sum of files per folder in /.
However:
- This uses the GNU-specific extension -maxdepth.
It will work on Linux, but not on just any Unix-alike.
- I suspect you might actually want the number of files for each
and every subdirectory.
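Putting the two pieces together, the per-folder count could be
sketched as follows. It assumes GNU find; count_per_subdir is a
hypothetical name, and each count includes the subdirectory itself
as well as everything below it. Names containing newlines would
throw the numbers off.

```shell
# Hypothetical helper: count_per_subdir [DIR]
# For each directory directly under DIR, print the number of entries
# found below it (the directory itself included), then a tab, then its path.
count_per_subdir() {
    find "${1:-.}" -mindepth 1 -maxdepth 1 -type d |
        while IFS= read -r dir; do
            n=$(find "$dir" | wc -l)
            printf '%d\t%s\n' "$n" "$dir"
        done
}
```

Piping the result through sort -n -r puts the busiest directories
first.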
description
Summarize disk usage of each FILE, recursively for directories.
Mandatory arguments to long options are mandatory for short
options too.
-a, --all
write counts for all files, not just directories
--apparent-size
print apparent sizes, rather than disk usage; although the
apparent size is usually smaller, it may be larger due to holes
in ('sparse') files, internal fragmentation, indirect blocks, and
the like
-B, --block-size=SIZE
scale sizes by SIZE before printing them. E.g., '-BM' prints
sizes in units of 1,048,576 bytes. See SIZE format below.
-b, --bytes
equivalent to '--apparent-size --block-size=1'
-c, --total
produce a grand total
-D, --dereference-args
dereference only symlinks that are listed on the command line
--files0-from=F
summarize disk usage of the NUL-terminated file names specified
in file F; if F is - then read names from standard input
-H
equivalent to --dereference-args (-D)
-h, --human-readable
print sizes in human readable format (e.g., 1K 234M 2G)
--si
like -h, but use powers of 1000 not 1024
-k
like --block-size=1K
-l, --count-links
count sizes many times if hard linked
-L, --dereference
dereference all symbolic links
-P, --no-dereference
don't follow any symbolic links (this is the default)
-0, --null
end each output line with 0 byte rather than newline
-S, --separate-dirs
do not include size of subdirectories
-s, --summarize
display only a total for each argument
-x, --one-file-system
skip directories on different file systems
-X, --exclude-from=FILE
exclude files that match any pattern in FILE
--exclude=PATTERN
exclude files that match PATTERN
-d, --max-depth=N
print the total for a directory (or file, with --all) only if it
is N or fewer levels below the command line argument;
--max-depth=0 is the same as --summarize
--time
show time of the last modification of any file in the directory,
or any of its subdirectories
--time=WORD
show time as WORD instead of modification time: atime, access,
use, ctime or status
--time-style=STYLE
show times using style STYLE: full-iso, long-iso, iso, +FORMAT;
FORMAT is interpreted like 'date'
--help
display this help and exit
--version
output version information and exit
Display values are in units of the first available SIZE from
--block-size, and the DU_BLOCK_SIZE, BLOCK_SIZE and BLOCKSIZE
environment variables. Otherwise, units default to 1024 bytes (or
512 if POSIXLY_CORRECT is set).
SIZE is an integer and optional unit (example: 10M is
10*1024*1024). Units are K, M, G, T, P, E, Z, Y (powers of 1024)
or KB, MB, ... (powers of 1000).
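As an illustration of --files0-from, the NUL-terminated list can
come straight from find -print0, which keeps file names with
spaces or newlines intact. A minimal sketch, assuming GNU find and
GNU du; du_matching and its two parameters are made up for this
example.

```shell
# Hypothetical helper: du_matching DIR PATTERN
# Reports the disk usage of every regular file under DIR whose name
# matches the shell PATTERN, followed by a grand total (-c).
du_matching() {
    find "$1" -type f -name "$2" -print0 | du -ch --files0-from=-
}
```

For instance, du_matching /var/log '*.gz' would total the
compressed logs only.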
copyright
Copyright © 2012 Free Software Foundation, Inc. License GPLv3+:
GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute
it. There is NO WARRANTY, to the extent permitted by law.
patterns
PATTERN is a shell pattern (not a regular expression). The
pattern ? matches any one character, whereas *
matches any string (composed of zero, one or multiple
characters). For example, *.o will match any files whose
names end in .o. Therefore, the command
du --exclude='*.o'
will skip all files and subdirectories ending in .o
(including the file .o itself).
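A quick way to check a pattern before relying on it is to combine
--exclude with -a, so that individual files show up in the output.
The wrapper name below is hypothetical; the options are the ones
described above.

```shell
# Hypothetical helper: du_without_objects [DIR]
# Lists every file and directory under DIR except names matching *.o.
# The pattern is quoted so the shell passes it to du unexpanded.
du_without_objects() {
    du -a --exclude='*.o' "${1:-.}"
}
```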
reporting bugs
Report du bugs to bug-coreutils@gnu.org
GNU coreutils home page: <http://www.gnu.org/software/coreutils/>
General help using GNU software: <http://www.gnu.org/gethelp/>
Report du translation bugs to <http://translationproject.org/team/>
see also
The full documentation for du is maintained as a Texinfo manual.
If the info and du programs are properly installed at your site,
the command
info coreutils 'du invocation'
should give you access to the complete manual.
author
Written by Torbjorn Granlund, David MacKenzie, Paul Eggert, and
Jim Meyering.