On a Linux machine I would like to traverse a folder hierarchy and get a list of all of the distinct file extensions within it.
What would be the best way to achieve this from a shell?
-
Try this (not sure if it's the best way, but it works):
find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u
It works as follows:
- Finds all files under the current folder
- Prints each file's extension, if it has one
- Produces a sorted list of unique extensions
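To see what the pattern in the Perl one-liner actually captures, here is a minimal Python sketch that applies the same regular expression to a few paths. The sample paths and the helper name `extension` are made up for illustration; they are not part of the original answer.

```python
import re

# Same pattern as the Perl one-liner: a dot, then one or more characters
# that are neither a dot nor a slash, anchored at the end of the path.
EXT_RE = re.compile(r"\.([^./]+)$")

def extension(path):
    """Return the text after the last dot of the final path component, or None."""
    match = EXT_RE.search(path)
    return match.group(1) if match else None

# Hypothetical sample paths, shaped like the output of `find . -type f`.
samples = ["./a/b.tar.gz", "./a/.bashrc", "./a.d/file", "./README"]
for p in samples:
    print(p, "->", extension(p))
```

Only the last suffix is reported for `b.tar.gz`, a dotfile such as `.bashrc` is treated as having the extension `bashrc`, and a dot in a directory name does not count because the match may not cross a slash.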
-
Recursive version:
find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort -u
If you want totals (how many times each extension was seen):
find . -type f | sed -e 's/.*\.//' | sed -e 's/.*\///' | sort | uniq -c | sort -rn
Non-recursive (single folder):
for f in *.*; do printf "%s\n" "${f##*.}"; done | sort -u
I've based this upon this forum post, credit should go there.
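For comparison, the totals variant can be sketched in Python with `collections.Counter`. This is an illustration under assumptions, not a drop-in equivalent: the function name `extension_counts` is hypothetical, and it uses `os.path.splitext`, so files without an extension and dotfiles such as `.bashrc` are skipped, whereas the sed pipeline above prints the bare file name for a file that has no dot in it.

```python
import os
from collections import Counter

def extension_counts(root):
    """Count files under root by extension (text after the last dot, without the dot)."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # splitext returns e.g. ".txt"; strip the leading dot
            ext = os.path.splitext(name)[1].lstrip(".")
            if ext:  # skip files with no extension
                counts[ext] += 1
    return counts
```

Calling `extension_counts(".").most_common()` returns (extension, count) pairs ordered by descending count, which plays the role of `sort | uniq -c | sort -rn`.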
: you are going to execute bash for each file name you find??
ChristopheD : Good point, changed the solution...
-
Find everything with a dot and show only the suffix.
find . -type f -name "*.*" | awk -F. '{print $NF}' | sort -u
If you know all suffixes have 3 characters, then:
find . -type f -name "*.???" | awk -F. '{print $NF}' | sort -u
Or, with sed, show all suffixes with one to four characters. Change {1,4} to the range of characters you are expecting in the suffix.
find . -type f | sed -n 's/.*\.\(.\{1,4\}\)$/\1/p'| sort -u
SiegeX : No need for the pipe to 'sort', awk can do it all: find . -type f -name "*.*" | awk -F. '!a[$NF]++{print $NF}'
: And its output is also unique! Nice!
-
Since there's already another solution which uses Perl:
If you have Python installed you could also do (from the shell):
python -c "import os;e=set();[[e.add(os.path.splitext(f)[-1]) for f in fn]for _,_,fn in os.walk('/home')];print('\n'.join(e))"
-
None of the replies so far deal with filenames with newlines properly (except for ChristopheD's, which just came in as I was typing this). The following is not a shell one-liner, but works, and is reasonably fast.
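import os, sys

def names(roots):
    # Yield the base name of every file found under the given root directories.
    for root in roots:
        for a, b, basenames in os.walk(root):
            for basename in basenames:
                yield basename

# Collect the distinct suffixes (including the leading dot) of all files.
sufs = set(os.path.splitext(x)[1] for x in names(sys.argv[1:]))
for suf in sufs:
    if suf:  # skip files without an extension
        print(suf)
-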
PowerShell:
dir -recurse | select-object extension -unique
thanks to http://kevin-berridge.blogspot.com/2007/11/windows-powershell.html
GloryFish : Hey that's pretty cool, and very readable.
Kevin Berridge : Thanks for linking to my blog!