Zhou Mengkang, posted 2015-04-23, 3197 views. Tags: Linux, awk

The text looks like this:

90203	http://a4.topitme.com/o/201001/21/12640871787244.jpg
90384	http://a4.topitme.com/o/201001/22/12641319578893.jpg
90408	http://a4.topitme.com/o/201001/22/12641331731486.jpg
90413	http://a4.topitme.com/o/201001/22/12641333418124.jpg
90416	http://a4.topitme.com/o/201001/22/12641333819817.jpg
90419	http://a4.topitme.com/o/201001/22/12641333959821.jpg
90424	http://a4.topitme.com/o/201001/22/12641341447975.jpg
90640	http://a4.topitme.com/o/201001/22/12641415382210.jpg
90751	http://a4.topitme.com/o/201001/22/12641455812296.jpg
90763	http://a4.topitme.com/o/201001/22/12641458582048.jpg
90960	http://a4.topitme.com/o/201001/22/12641585925303.jpg
91323	http://a4.topitme.com/o/201001/22/12641698711549.jpg
91325	http://a4.topitme.com/o/201001/22/12641698822679.jpg
91365	http://a4.topitme.com/o/201001/22/12641703727549.jpg
91369	http://a4.topitme.com/o/201001/22/12641704738178.jpg
91377	http://a4.topitme.com/o/201001/22/12641705391280.jpg
91378	http://a4.topitme.com/o/201001/22/12641705426041.jpg
91379	http://a4.topitme.com/o/201001/22/12641705441890.jpg
91387	http://a4.topitme.com/o/201001/22/12641706706190.jpg
91467	http://a4.topitme.com/o/201001/22/12641732566025.jpg
91510	http://a4.topitme.com/o/201001/22/12641738913050.jpg
90203	http://a4.topitme.com/o/201001/21/12640871787244.jpg

The first column is an id and the second is a url, but some of the lines may be duplicates, so the list needs to be deduplicated.

cat pic.list | sort | uniq > pic.list.2

That looks like a quick win. uniq only removes duplicates among adjacent lines, so sorting first groups identical lines together; running uniq afterward then removes all duplicates. We use this combination all the time when crunching access logs.
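A minimal demo of why the sort step matters, using throwaway sample data rather than the list above:

printf 'a\nb\na\n' | uniq          # prints a, b, a: the second "a" is not adjacent, so it survives
printf 'a\nb\na\n' | sort | uniq   # prints a, b: sorting makes the duplicates adjacent first
sort -u pic.list > pic.list.2      # equivalent shortcut: sort's own -u flag dedupes in one step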

Now add one more constraint: what if the duplicates must be removed while keeping the original line order? Right, awk again:

cat pic.list | awk '!a[$1]++ {print $1, $2}' > pic.list.3
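To unpack the one-liner: a is an associative array keyed by the id in the first field. a[$1]++ evaluates to the count before incrementing, so it is 0 (false) the first time an id appears; negating it with ! makes the pattern true exactly once per id, and the action prints that first occurrence. A hedged sketch of the same idiom with a more descriptive array name (seen is just an illustrative rename):

# Same logic as above, keyed on the first field (the id)
awk '!seen[$1]++ {print $1, $2}' pic.list > pic.list.3

# Note: print $1, $2 joins the fields with OFS (a single space by default),
# so the original tab separator is replaced. To keep each line byte-for-byte
# intact, key on the whole line instead:
awk '!seen[$0]++' pic.list > pic.list.3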
