rigadicomando.org

Whatever you can cat


How to list duplicate lines in a text file, with counts next to each unique line

Submitted by admin on Mon, 2006-05-08 07:40.
  • bash
  • scripts

from:

How to list duplicate lines in a text file, with counts next to each unique line - [spugbrap's random notes geek blog] [del.icio.us (bash)]

Last year (this has been sitting in my 'toblog' file ever since), I needed to analyze the lines in a text file: collapse the duplicate lines, count how many times each duplicated line occurred within the file, and sort the results from most common to least common.

For example, using a text file called 'dupetest.txt', containing:
foo bar baz
foo qux corge
spugbrap likes bacon
foo qux corge
spugbrap likes bacon
foo bar baz
oatmeal cookies are good
oatmeal cookies are good
foo bar baz
foo qux corge
foo bar baz

The output I want is:
4 foo bar baz
3 foo qux corge
2 spugbrap likes bacon
2 oatmeal cookies are good

I knew there had to be a simple way of doing this by stringing together a few Unix commands (in Cygwin), but finding the right combination took me some effort. Here's what I came up with:


sort dupetest.txt | uniq -c -d | sort -n -r
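
For anyone who wants to try it, here is a minimal sketch that wraps the pipeline in a function and runs it on the sample data from above. The function name count_dupes and the use of mktemp for a scratch file are my own assumptions for illustration, not part of the original one-liner. Note that uniq usually pads the counts with leading spaces, so the real output is indented a little compared to the listing above.

```shell
#!/bin/sh
# count_dupes is a hypothetical helper name used only for this sketch.
count_dupes() {
    # 1. sort        : put identical lines next to each other
    #                  (uniq only compares adjacent lines)
    # 2. uniq -c -d  : -c prefixes each line with its count,
    #                  -d keeps only lines that occur more than once
    # 3. sort -n -r  : -n sorts numerically on the count,
    #                  -r puts the largest count first
    sort "$1" | uniq -c -d | sort -n -r
}

# Recreate the sample input in a temporary file (an assumption, so that
# an existing dupetest.txt is not overwritten).
sample=$(mktemp)
printf '%s\n' \
    'foo bar baz' \
    'foo qux corge' \
    'spugbrap likes bacon' \
    'foo qux corge' \
    'spugbrap likes bacon' \
    'foo bar baz' \
    'oatmeal cookies are good' \
    'oatmeal cookies are good' \
    'foo bar baz' \
    'foo qux corge' \
    'foo bar baz' > "$sample"

count_dupes "$sample"
rm -f "$sample"
```

Because of the -d flag, a line that appears only once in the file is left out entirely. Dropping -d should list every unique line with its count, including the ones that occur just once.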


All content on this site is distributed under a Creative Commons License; each individual author is responsible for his or her own posts.
