How to list duplicate lines in a text file, with counts next to each unique line
from: spugbrap's random notes geek blog [del.icio.us (bash)]
At some point last year (it's been in my 'toblog' file all this time), I needed to analyze the lines in a text file: collapse the duplicate lines, count how many times each duplicated line occurred in the file, and sort the result from most common to least common.
For example, using a text file called 'dupetest.txt', containing:
foo bar baz
foo qux corge
spugbrap likes bacon
foo qux corge
spugbrap likes bacon
foo bar baz
oatmeal cookies are good
oatmeal cookies are good
foo bar baz
foo qux corge
foo bar baz
The output I want is:
4 foo bar baz
3 foo qux corge
2 spugbrap likes bacon
2 oatmeal cookies are good
I knew there had to be a simple way of doing this by stringing together a few Unix commands (in Cygwin), but finding the right combination took me some effort. Here's what I came up with:
sort dupetest.txt | uniq -c -d | sort -n -r
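For anyone wondering what each stage of that pipeline contributes, here is a rough sketch that wraps it in a shell function with one comment per stage. The function name `count_dupes` is made up for illustration, and the trailing `sed` is my own addition (not part of the original one-liner): it strips the space padding that `uniq -c` puts in front of each count, so the result looks like the output shown above.

```shell
#!/bin/sh
# count_dupes is a hypothetical helper name used only for this sketch.
# Usage: count_dupes FILE
count_dupes() {
    # sort        : put identical lines next to each other, because uniq
    #               only compares adjacent lines
    # uniq -c -d  : -c prefixes each line with its count, -d keeps only
    #               lines that occur more than once
    # sort -n -r  : -n orders by the leading count numerically, -r puts
    #               the largest count first
    # sed         : (addition, optional) drop the leading padding that
    #               uniq -c emits before the count
    sort "$1" | uniq -c -d | sort -n -r | sed 's/^[[:space:]]*//'
}
```

Running `count_dupes dupetest.txt` against the sample file should give the four lines listed earlier. Leaving out `-d` would also list lines that occur only once, each with a count of 1.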

