Computer Question – awk, sed, grep or anything. – Chat Forum

jcromton

Posts: 0

Free Member

Topic starter

Hi, it's a very simple problem I'd like to solve, I know this is probably not the best place to ask, but I'm not ready yet to sign up to a new forum.

I have a list of numbers:
1
1
...
1
2
2
...
2
...
n
n
...
n

and I would like to replace the first occurrence of n with 1, the last occurrence of n with 3 and all n in between with 2, like this:

1
2
2
...
2
3
1
2
2
...
2
3
...
1
2
2
...
3

n is around 300 and the list is 16000 so it's not massive but may well become more massive in the future.

I suspect 'awk' would be able to perform this task? This would be my preferred method before I try to use Matlab.

Thanks for reading, and thanks in advance of any help or suggestions.

Chris

Posted : 07/04/2011 11:51 am

allthepies

Posts: 0

Free Member

perl FTW

Posted : 07/04/2011 12:15 pm

jimmyjames

Posts: 0

Free Member

This would be trivial in Excel. Can advise if you have that.

Posted : 07/04/2011 12:19 pm

geoffj

Posts: 0

Full Member

Choose your text wrangling tool of choice or use Excel.
In Excel, load the numbers into column A with an index/key in column B. A couple of simple formulae will have you on your way.

Posted : 07/04/2011 12:20 pm

brassneck

Posts: 0

Full Member

awk is great for search and replace but you're going to need a tasty regex to sort that out, well beyond me I'm afraid.

Does it have to be done in batch? Wondering if dumping it to Excel might be easier.

Posted : 07/04/2011 12:22 pm

TheBrick

Posts: 4954

Free Member

my first attempt, can't remember how to edit a specified line in sed but you can see what I'm getting at. NOT TESTED!!!

[code]

file=infile.txt

firstoccurence=wc -l $file
lastoccurence=0

#grep gives us the line numbers where 'n' occurs in the format $file:x:how-n-occurs so we strip out the line number and find the first and last occurrence. Where x is the line number.

for i in grep -in 'n' $file
do
#remove front bit
tmp=${i#$file:}
#remove back bit
tmp=${tmp%:*}

if [ $tmp -lt $firstoccurence ]
then
firstoccurence=${tmp%:}
fi

if [ $tmp -gt $lastoccurence ]
then
lastoccurence=${tmp%:}
fi

done

#replace first occurrence with 1
some sed or awk command for line $firstoccurence

#replace last occurrence with 3
some sed or awk command for line $lastoccurence

#replace all middle occurrence with 2
sed -i 's/n/2/g' $file

exit 0

[/code]

Posted : 07/04/2011 12:49 pm

TheBrick

Posts: 4954

Free Member

worked out the missing sed commands in above post.

[code]

#replacing first occurrence

sed $firstoccurence's/.*/1/' $file>tmp.txt;mv tmp.txt $file

[/code]

and

[code]

#replacing last occurrence

sed $lastoccurence's/.*/3/' $file>tmp.txt;mv tmp.txt $file

[/code]

so stitch it together and check.

p.s.

there should be back quotes around the wc -l $file command above and the grep command in the for loop but they are not showing for some reason.
e.g
firstoccurence=[backquote]wc -l $file[backquote]

Posted : 07/04/2011 1:30 pm

TheBrick

Posts: 4954

Free Member

just remembered you will have to add in a course of action for when there is only one occurrence of n in the file, and hence

$firstoccurence -eq $lastoccurence

which can be a simple find replace with sed of the value you wish n to be.

Posted : 07/04/2011 2:02 pm

jcromton

Posts: 0

Free Member

Topic starter

Thanks for the replies, I've never really used Excel and learning this way of doing it would be more beneficial in the long run I feel.

Wow, thanks TheBrick, that's some impressive script, can you confirm the first for loop as being:

[code] for i in grep '-in 'n' $file' [/code]

I'm not really following how we're defining n.

Thanks again for taking the time.

EDIT: p.s. it's a lot more complicated than I thought doing it this way!

Posted : 07/04/2011 3:16 pm

Aidy

Posts: 2965

Free Member

So if you've got 1 1 1 2 2 2 2 3 3 3 3 3, you should get 1 2 3 1 2 2 3 1 2 2 2 3?

What happens if there's only 1 or 2 occurrences of the number?
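One way to find out whether the data contains such short runs is to count run lengths first. This is only a sketch: `short_runs` is a made-up name, it reads the list on stdin, and it assumes one number per line with no blank lines.

```shell
# short_runs (hypothetical helper): list every value whose run of
# identical adjacent lines is shorter than 3.
short_runs() {
    # uniq -c prefixes each run of identical adjacent lines with its length
    uniq -c | awk '$1 < 3 { print $2 " (run of " $1 ")" }'
}
```

Run as `short_runs < numbers.txt`; no output means every number occurs at least 3 times in a row.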

Posted : 07/04/2011 3:37 pm

tonyd

Posts: 1070

Full Member

I have to run but gave it a quick try, this should work:

#!/bin/sh
infile=/tmp/numbers.in
outfile=/tmp/numbers.out
first=$(grep -in n "$infile" | head -n 1 | cut -d: -f1)
last=$(grep -in n "$infile" | tail -n 1 | cut -d: -f1)
# replace the first occurrence, then the last, then everything in between
sed -e "${first}s/n/1/" -e "${last}s/n/3/" -e 's/n/2/g' "$infile" > "$outfile"

Give it a go and let me know?!

Posted : 07/04/2011 3:39 pm

allthepies

Posts: 0

Free Member

scary shell skilz here 🙂

Posted : 07/04/2011 3:40 pm

TheBrick

Posts: 4954

Free Member

sorry I've misread your 1st post slightly but we can fix that.

can you confirm the first for loop as being:

for i in grep '-in 'n' $file'

nearly it's

for i in [backquote]grep -in 'n' $file[backquote]

where [backquote] is in the table on this page called "Command substitution" http://www.grymoire.com/Unix/Quote.html or here as grave accent http://en.wikipedia.org/wiki/Grave_accent.
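For anyone who can't get the backquotes to show, `$(...)` does the same job and survives forum formatting. A small sketch of the idea, where `first_last` is a made-up helper name and the pattern and file are whatever you pass in:

```shell
# first_last PATTERN FILE (hypothetical helper): print the first and last
# line numbers on which PATTERN matches, separated by a space.
first_last() {
    # $(...) runs the command and substitutes its output, like backquotes.
    # With a single file, grep -n prints "lineno:text", so field 1 is the number.
    matches=$(grep -n -e "$1" "$2" | cut -d: -f1)
    first=$(printf '%s\n' "$matches" | head -n 1)
    last=`printf '%s\n' "$matches" | tail -n 1`    # backquote spelling, same effect
    printf '%s %s\n' "$first" "$last"
}
```

For example, `first_last n infile.txt` prints the two line numbers that the loop above was trying to find.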

I originally thought you had a file with numbers and the letter "n" which required the 1st instance of n to be replaced by 1 and the last by 3 and all others replaced by 2. I was thinking your n was some version of NaN for some reason. So my script is useless.

Let me think about it and I'll get back to you.

Posted : 07/04/2011 4:01 pm

TheBrick

Posts: 4954

Free Member

Re-reading your first post, I'm unsure what exactly you are trying to do. Your example is not clear to me.

1
1

goes to

1
2
2

?

Posted : 07/04/2011 4:06 pm

jcromton

Posts: 0

Free Member

Topic starter

Aidy hit the nail on the head in his post:

1 1 1 2 2 2 2 3 3 3 3 3, you should get 1 2 3 1 2 2 3 1 2 2 2 3

There will never be less than 3 occurrences of the same number.
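With the rule pinned down like that, one possible sketch is to let `uniq -c` measure each run and have awk print the labels. The function name `relabel_runs` is made up for illustration, and the sketch assumes one number per line, no blank lines, and at least 2 occurrences of each number in a row (comfortably covered by the "never less than 3" guarantee).

```shell
# relabel_runs (hypothetical helper): read numbers on stdin and print
# 1 for the first line of each run, 3 for the last, 2 for those between.
relabel_runs() {
    # uniq -c turns each run into "count value"; awk only needs the count
    uniq -c | awk '{
        print 1
        for (i = 2; i < $1; i++) print 2
        print 3
    }'
}
```

Usage would be along the lines of `relabel_runs < numbers.txt > output.txt`.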

tonyd, thanks for that code too, unfortunately it just seems to copy the infile to the outfile and change nothing. I'll have a good look.

Thanks again guys

Posted : 07/04/2011 4:22 pm

Aidy

Posts: 2965

Free Member

#!perl

open FID, "$ARGV[0]" or die "Can't open file";

my @numbers;
my %uniq;
while (<FID>) {
chomp;
push @numbers, "2[$_]";
$uniq{$_}++;
}

close FID;

my $n = "@numbers";

foreach my $i (keys %uniq) {
$n =~ s/^(.*?)2\[$i\]/${1}1/;
$n =~ s/(.*)2\[$i\](.*?)\z/${1}3${2}/;
$n =~ s/\[$i\]//g;
}

print "$_\n" foreach split / /,$n;

There's probably a nicer way of doing that.

Posted : 07/04/2011 4:39 pm

jcromton

Posts: 0

Free Member

Topic starter

Hi Aidy, I've never used perl before. My open line is:

[code]open FID, "$numbers.txt" or die "Can't open file";[/code]

where numbers.txt is the file with the relevant numbers in. is that right?

EDIT: Congratulations Aidy, you win! (removed $)

Posted : 07/04/2011 4:55 pm

oldnpastit

Posts: 7090

Full Member

Do the groups of numbers overlap?

i.e. do you ever expect to get input that looks like:

1 1 2 3 2 1

If it's always just "1 1 1 2 2 2 2 3 3 3 4 4 4" then this might work:
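#!/usr/bin/perl -w

use strict;

# A line can only be labelled once the line after it has been read,
# so hold one value back and print its label a step late.
my ($prev, $starts_run);
while (<>) {
    chomp(my $n = $_);
    next unless length $n;    # skip blank lines
    if (defined $prev) {
        my $label = $starts_run ? 1 : $n != $prev ? 3 : 2;
        print "$label\n";
    }
    $starts_run = !defined $prev || $n != $prev;
    $prev = $n;
}
print "3\n" if defined $prev;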

EDIT: good interview question 🙂
EDIT: how do you put hard spaces or tabs into this forum?

Posted : 07/04/2011 4:56 pm

jcromton

Posts: 0

Free Member

Topic starter

oldnpastit: the groups of numbers will never overlap.
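That assumption can be checked before running any of the scripts. A minimal sketch, assuming one number per line on stdin; `overlapping_values` is a made-up name:

```shell
# overlapping_values (hypothetical helper): print a value each time it
# starts a new run after already having appeared in an earlier run.
overlapping_values() {
    awk '
        NF == 0 { next }                  # ignore blank lines
        !started || $1 != prev {          # this line opens a new run
            if ($1 in seen) print $1      # ...of a value seen before
            seen[$1] = 1
            started = 1
        }
        { prev = $1 }
    '
}
```

`overlapping_values < numbers.txt` printing nothing means no number comes back after its run has ended.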

I have literally not even the faintest idea of perl, so any sed/awk scripts would be very useful, but I feel I've pushed it far enough already! Also, is there a way of outputting to a new file?

You've been great thank you so much.

Posted : 07/04/2011 5:10 pm

Aidy

Posts: 2965

Free Member

You run it as "perl perlscript.pl filename.txt"

And it prints output to stdout, i.e. "perl perlscript.pl filename.txt > output.txt"

Posted : 07/04/2011 5:14 pm

Aidy

Posts: 2965

Free Member

And I didn't realise there were non-overlapping numbers. That's an easier problem.

Posted : 07/04/2011 5:17 pm

tonyd

Posts: 1070

Full Member

Oh sorry - I read the OP to mean that you actually had n's in the file!

Posted : 07/04/2011 6:19 pm

Aidy

Posts: 2965

Free Member

#!perl

open FID, $ARGV[0] or die "Can't open file";

my @l = map {my $i = <FID>} 0..1;
chomp(@l);

print "1\n";

while (<FID>) {
chomp;
push @l, $_;
print $l[1] != $l[2] ? 3 : $l[1] != $l[0] ? 1 : 2, "\n";
shift @l;
}
close FID;

print "3\n";

Although, the post above is probably neater.

Posted : 07/04/2011 8:32 pm

richP

Posts: 117

Full Member

if you want to use awk then I think that the following will work:
awk "BEGIN{old=-1000};{nxt=$1};{ if (NR>1) {if (cur!=old) {print 1} else {if (nxt!=cur) {print 3} else {print 2}}};old=cur;cur=nxt};END{print 3}" input.txt > output.txt

On Linux you should probably change the double quotes to single quotes, otherwise the shell expands $1 before awk sees it.

The above relies on there being no overlaps in the groups of numbers and also blank lines will probably need to be stripped out beforehand
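Stripping the blank lines can be done in the same pipeline. A minimal sketch, with `strip_blank` as a made-up name:

```shell
# strip_blank (hypothetical helper): copy stdin to stdout, dropping lines
# that are empty or contain only whitespace.
strip_blank() {
    # NF is the number of fields on the line; it is 0 for a blank line,
    # so the bare pattern NF prints only the non-blank ones.
    awk 'NF'
}
```

For example `strip_blank < input.txt > cleaned.txt`, then feed cleaned.txt to the awk command above.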

Posted : 07/04/2011 8:43 pm

buzz-lightyear

Posts: 0

Free Member

In Excel...

Put 9999999 in Column A, Row 1
Paste your number series in Column A, starting with Row 2.

In Column B, row 2 enter the formula: =IF(A2=A1, IF(A2=A3,2, IF(A2<>A3,3)),1)

Drag this all the way down Column B, which copies the formula into each cell, adjusting the row numbers automatically.

Column B should now contain the results. Note that Column B row 1 is left empty.

Did you say 16,000 numbers? Oops, Excel won't work.

Posted : 07/04/2011 9:06 pm

oldnpastit

Posts: 7090

Full Member

[code]
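#include <stdio.h>
#include <string.h>

int main(void) {
    char buf[16];
    char lastbuf[16] = "";   /* previous value; empty until a line has been read */
    int starts_run = 0;      /* non-zero if lastbuf was the first of its run */

    while (fgets(buf, sizeof(buf), stdin)) {
        buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
        if (!buf[0])
            continue;                     /* skip blank lines */
        int changed = strcmp(buf, lastbuf) != 0;
        /* A line can only be labelled once its successor is known. */
        if (lastbuf[0])
            printf("%d\n", starts_run ? 1 : changed ? 3 : 2);
        starts_run = changed;
        strcpy(lastbuf, buf);
    }
    if (lastbuf[0])
        printf("3\n");
    return 0;
}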

[/code]

Posted : 07/04/2011 9:47 pm

DaveyBoyWonder

Posts: 8839

Free Member

shell = awesome
perl = awesomer

Posted : 08/04/2011 8:12 am

jcromton

Posts: 0

Free Member

Topic starter

Guys, thank you so much for your help on this. I'll apply these today and hopefully make some worthwhile contributions to Marine Science.

Chris

Posted : 11/04/2011 8:59 am

geoffj

Posts: 0

Full Member

Did you say 16,000 numbers? Oops, Excel won't work.

Why not?

Posted : 11/04/2011 9:10 am

jcromton

Posts: 0

Free Member

Topic starter

RichP - that works great, apart from it starts with a 2. Everything else is perfect. See my other thread if you fancy more awk related banter.
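It's hard to say from here why it starts with a 2. One way it can happen, assuming the first number in the file is 0: on the first line the one-liner overwrites `old` with the still-unset `cur`, and awk treats an unset variable as equal to 0, so the first value looks like a continuation. A sketch that avoids the sentinel altogether (`relabel_lookahead` is a made-up name, and it assumes one number per line with no blank lines):

```shell
# relabel_lookahead (hypothetical helper): label each line of stdin once
# the following line has been read, so no start-up sentinel is needed.
relabel_lookahead() {
    awk '
        NR > 1 {
            if (first) {
                print 1                   # cur opened a new run
            } else if ($1 != cur) {
                print 3                   # next value differs: cur closed its run
            } else {
                print 2
            }
        }
        { first = (NR == 1 || $1 != cur); cur = $1 }
        END { if (NR > 0) print 3 }       # the very last line always closes a run
    '
}
```

Used as `relabel_lookahead < input.txt > output.txt`.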

Posted : 15/04/2011 3:42 pm

[Closed] Computer Question - awk, sed, grep or anything.
