Tuesday, 17 October 2017

sed keep last occurrence of duplicate characters/text/words in txt file

I have a txt file whose contents may include duplicates. Below is a simplified representation of it (text stands for a unique character, word, or phrase). Note that the separator ---------- may not be present. Also, the entire file consists of Unicode Japanese and Chinese characters.

text
text
text
aaaa
text
aaaa
aaaa
bbbb
cccc
dddd
eeee
ffff
gggg
----------
text
text
eeee
ffff
gggg
text
text:cccc
text:dddd
text
text

What I want to achieve is to keep only the line with the last occurrence of the duplicates like so:

text
text
text
text
aaaa
bbbb
text
text
eeee
ffff
gggg
text
text:cccc
text:dddd
text
text
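
For the exact-duplicate part of this (the aaaa, eeee, ffff, gggg lines), one possible sketch uses a two-pass awk instead of sed. This is an assumption on my part, not something taken from the linked topics: the function name keep_last is hypothetical, it compares whole lines only, and it does not handle the substring case (cccc being dropped because text:cccc appears later) or remove the ---------- separator.

```shell
# Hypothetical helper: print only the last occurrence of each exact duplicate line.
# Assumption: duplicates are whole-line matches; substring matches are NOT handled.
keep_last() {
    # Pass 1 (NR == FNR): record the line number of the final occurrence of each line.
    # Pass 2: print a line only when it sits at that recorded line number.
    awk 'NR == FNR { last[$0] = FNR; next } last[$0] == FNR' "$1" "$1"
}
```

It would be called as keep_last input.txt > output.txt, with the same file read twice. Because awk only compares the lines for equality here, Japanese and Chinese text should pass through unchanged.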

The closest I found online is How to remove only the first occurrence of a line in a file using sed, but that requires knowing in advance which matching pattern(s) to delete. The suggested topics shown while I was writing the title were Duplicating characters using sed and last occurence of date, but they didn't work.

I look forward to receiving some help. I am on a Mac running macOS Sierra. I put my commands in a script.sh file so that they are executed line by line. I'm using sed and gsed as my primary stream editors.
