Friday, May 22, 2015

Perl: Keep only one of two consecutive characters

I'm having trouble applying a regex to keep only one of two consecutive occurrences of a specific character pair in a column. In the file below, C-O appears for number 1 and again, immediately afterwards, for number 2. I would like to write a new file in which only the C-O of number 1 is present. The same rule needs to be applied throughout the file: between numbers 2 and 3 keep number 2, between numbers 3 and 4 keep number 3, and so on.

Input:
    1       H       27.5310
    1       H       27.0882
    1       C       36.8857
    1       O       -118.2564
    2       C       36.6954
    2       O       -118.5597
    2       N       133.6704
    2       H       28.3581

Output:
    1       H       27.5310
    1       H       27.0882
    1       C       36.8857
    1       O       -118.2564
    2       N       133.6704
    2       H       28.3581

This is what I have so far; I hope my logic is semi-clear. I'm still learning, and any commentary is greatly appreciated!

#!/usr/bin/perl

use strict;
use warnings;

my $file = 'data.txt';

open my $fh, '<', $file or die "Can't read $file: $!";

# Read the file one line at a time
while (my $line = <$fh>) {
    chomp $line;
    my @column = split(/\t/, $line);
    # Attempt to match the repeated C-O in the second column
    if ($column[1] =~ s/COCO/\s+/g) {
        print "@column\n";
    }
}
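
Because each line holds a single character in the second column, a regex applied to one line at a time never sees a C-O C-O run. A minimal sketch of one alternative follows, assuming the rule is "a C line directly followed by an O line with the same number forms a pair, and a pair is dropped when the lines just before it were also such a pair". The helper name `drop_repeated_co` and the inline sample data are made up for illustration, and the sketch splits on any whitespace rather than on tabs only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes a list of lines and returns the lines to keep.
# Assumption: in a run of back-to-back C-O pairs, only the first pair is kept.
sub drop_repeated_co {
    my @lines = @_;
    my @keep;
    my $prev_was_pair = 0;    # true if the lines just handled were a C-O pair
    my $i = 0;
    while ($i < @lines) {
        # split ' ' skips leading whitespace and splits on runs of whitespace
        my ($num, $atom) = split ' ', $lines[$i];
        my ($num2, $atom2) = $i < $#lines ? split(' ', $lines[$i + 1]) : ();
        my $is_pair = defined $atom && defined $atom2
            && $atom eq 'C' && $atom2 eq 'O' && $num eq $num2;
        if ($is_pair) {
            # Keep the pair only if it does not directly follow another pair
            push @keep, @lines[$i, $i + 1] unless $prev_was_pair;
            $prev_was_pair = 1;
            $i += 2;
        }
        else {
            push @keep, $lines[$i];
            $prev_was_pair = 0;
            $i++;
        }
    }
    return @keep;
}

# Sample data copied from the Input listing
my @sample = (
    "    1       H       27.5310",
    "    1       H       27.0882",
    "    1       C       36.8857",
    "    1       O       -118.2564",
    "    2       C       36.6954",
    "    2       O       -118.5597",
    "    2       N       133.6704",
    "    2       H       28.3581",
);
print "$_\n" for drop_repeated_co(@sample);
```

On the sample above this drops the two number 2 lines for C and O and keeps the rest, matching the Output listing. To use it on a real file, the lines could be read from the opened `$fh`, chomped, passed to the helper, and printed to a second filehandle opened for writing.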
