The text format looks roughly like this:
90203 http://a4.topitme.com/o/201001/21/12640871787244.jpg
90384 http://a4.topitme.com/o/201001/22/12641319578893.jpg
90408 http://a4.topitme.com/o/201001/22/12641331731486.jpg
90413 http://a4.topitme.com/o/201001/22/12641333418124.jpg
90416 http://a4.topitme.com/o/201001/22/12641333819817.jpg
90419 http://a4.topitme.com/o/201001/22/12641333959821.jpg
90424 http://a4.topitme.com/o/201001/22/12641341447975.jpg
90640 http://a4.topitme.com/o/201001/22/12641415382210.jpg
90751 http://a4.topitme.com/o/201001/22/12641455812296.jpg
90763 http://a4.topitme.com/o/201001/22/12641458582048.jpg
90960 http://a4.topitme.com/o/201001/22/12641585925303.jpg
91323 http://a4.topitme.com/o/201001/22/12641698711549.jpg
91325 http://a4.topitme.com/o/201001/22/12641698822679.jpg
91365 http://a4.topitme.com/o/201001/22/12641703727549.jpg
91369 http://a4.topitme.com/o/201001/22/12641704738178.jpg
91377 http://a4.topitme.com/o/201001/22/12641705391280.jpg
91378 http://a4.topitme.com/o/201001/22/12641705426041.jpg
91379 http://a4.topitme.com/o/201001/22/12641705441890.jpg
91387 http://a4.topitme.com/o/201001/22/12641706706190.jpg
91467 http://a4.topitme.com/o/201001/22/12641732566025.jpg
91510 http://a4.topitme.com/o/201001/22/12641738913050.jpg
90203 http://a4.topitme.com/o/201001/21/12640871787244.jpg
The first column is an id and the second is a url, but some of the lines may be duplicates, so they need to be deduplicated.
cat pic.list | sort | uniq > pic.list.2
That pretty much takes care of it. uniq only removes duplicates that sit on adjacent lines, so we sort first to bring identical lines together, and then uniq can strip out every duplicate. We use this combination all the time when crunching access logs.
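Two side notes. First, sort -u pic.list > pic.list.2 does the sort-plus-uniq step in one command. Second, the access-log habit usually adds a count. A minimal sketch, assuming a hypothetical access.log whose first field is the client IP (adjust the field number to your log format):

# Count requests per client IP (field 1) and list the busiest IPs first.
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10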
Now add one more requirement: what if the duplicates must be removed while keeping the original order intact? You guessed it, awk again.
cat pic.list | awk '!a[$1]++ {print $1, $2}' > pic.list.3
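The trick is the pattern !a[$1]++: awk keeps an associative array a keyed by the id in column 1. The first time an id appears, a[$1] is 0, so !a[$1]++ is true and the line is printed; every later occurrence finds a non-zero count and is skipped, which is why the original order survives. A more verbose equivalent, just as a sketch (the array name seen is mine):

# Long-hand version of the one-liner: print a line only the first time
# its id (column 1) is seen, keeping the input order intact.
awk '{ if (!seen[$1]) { print $1, $2; seen[$1] = 1 } }' pic.list > pic.list.3

To sanity-check the result, wc -l pic.list pic.list.3 should show the output is shorter by exactly the number of duplicated ids.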