Hi, I have what should be a very simple problem to solve. I know this is probably not the best place to ask, but I'm not ready to sign up to a new forum yet.
I have a list of numbers:
1
1
...
1
2
2
...
2
...
n
n
...
n
and I would like to replace the first occurrence of n with 1, the last occurrence of n with 3 and all n in between with 2, like this:
1
2
2
...
2
3
1
2
2
...
2
3
...
1
2
2
...
3
n is around 300 and the list has about 16000 entries, so it's not massive, but it may well grow in the future.
I suspect 'awk' would be able to perform this task? This would be my preferred method before I try to use MATLAB.
Thanks for reading, and thanks in advance for any help or suggestions.
Chris
perl FTW
This would be trivial in Excel. Can advise if you have that.
Pick your text-wrangling tool of choice or use Excel.
In Excel, load the numbers into column A with an index/key in column B. A couple of simple formulae will have you on your way.
awk is great for search and replace but you're going to need a tasty regex to sort that out, well beyond me I'm afraid.
Does it have to be done in batch? Wondering if dumping it to Excel might be easier.
my first attempt, can't remember how to edit a specified line in sed but you can see what I'm getting at. NOT TESTED!!!
[code]
file=infile.txt
firstoccurence=wc -l $file
lastoccurence=0
#grep gives us the line numbers where 'n' occurs in the format $file:x:how-n-occurs so we strip out the line number and find the first and last occurrence. Where x is the line number.
for i in grep -in 'n' $file
do
#remove front bit
tmp=${i#$file:}
#remove back bit
tmp=${tmp%:*}
if [ $tmp -lt $firstoccurence ]
then
firstoccurence=${tmp%:}
fi
if [ $tmp -gt $lastoccurence ]
then
lastoccurence=${tmp%:}
fi
done
#replace first occurrence with 1
some sed or awk command for line $firstoccurence
#replace last occurrence with 3
some sed or awk command for line $lastoccurence
#replace all middle occurrence with 2
sed -i 's/n/2/g' $file
exit 0
[/code]
worked out the missing sed commands in above post.
[code]
#replacing first occurrence
sed $firstoccurence's/.*/1/' $file>tmp.txt;mv tmp.txt $file
[/code]
and
[code]
#replacing last occurrence
sed $lastoccurence's/.*/3/' $file>tmp.txt;mv tmp.txt $file
[/code]
so stitch it together and check.
p.s.
there should be back quotes around the wc -l $file command above and the grep command in the for loop but they are not showing for some reason.
e.g
firstoccurence=[backquote]wc -l $file[backquote]
Just remembered: you will also need to handle the case where there is only one occurrence of n in the file, so that
$firstoccurence -eq $lastoccurence
which can be a simple sed find-and-replace with whatever value you want n to become.
Thanks for the replies, I've never really used Excel and learning this way of doing it would be more beneficial in the long run I feel.
Wow, thanks TheBrick, that's some impressive script, can you confirm the first for loop as being:
[code] for i in grep '-in 'n' $file' [/code]
I'm not really following how we're defining n.
Thanks again for taking the time.
EDIT: p.s. it's a lot more complicated than I thought doing it this way!
So if you've got 1 1 1 2 2 2 2 3 3 3 3 3, you should get 1 2 3 1 2 2 3 1 2 2 2 3?
What happens if there's only 1 or 2 occurrences of the number?
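One quick way to answer that for a given file, as a rough sketch: uniq -c collapses each run of identical adjacent lines into a count and a value, so any run shorter than 3 shows up directly. The function name short_runs is made up for illustration, and it assumes one number per line with no blank lines.

```shell
#!/bin/sh
# short_runs (hypothetical name): read one number per line on stdin and
# print "count value" for every run of identical adjacent lines that is
# shorter than 3 lines. No output means every run has at least 3 entries.
short_runs() {
    # uniq -c prefixes each collapsed run with its length.
    uniq -c | awk '$1 < 3 { print $1, $2 }'
}
```

Something like short_runs < input.txt would then list the problem groups, if there are any.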
I have to run but gave it a quick try, this should work:
#!/bin/sh
infile=/tmp/numbers.in
outfile=/tmp/numbers.out
# line numbers of the first and last lines containing 'n'
first=$(grep -in n "$infile" | head -1 | cut -d: -f1)
last=$(grep -in n "$infile" | tail -1 | cut -d: -f1)
# replace first occurrence, then last, then the ones in between
sed -e "${first}s/n/1/" -e "${last}s/n/3/" -e 's/n/2/g' "$infile" > "$outfile"
Give it a go and let me know?!
scary shell skilz here 🙂
sorry I've misread your 1st post slightly but we can fix that.
can you confirm the first for loop as being:for i in grep '-in 'n' $file'
nearly it's
for i in [backquote]grep -in 'n' $file[backquote]
where [backquote] is in the table on this page called "Command substitution" http://www.grymoire.com/Unix/Quote.html or here as grave accent http://en.wikipedia.org/wiki/Grave_accent.
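Since the backquotes keep getting eaten, here is a small sketch of command substitution in both spellings. The sample file is a throwaway created with mktemp purely for illustration; the $(...) form does the same job as backquotes and does not depend on the backquote character showing up.

```shell
#!/bin/sh
# Sketch only: a temporary sample file stands in for the real data.
file=$(mktemp)
printf '1\n1\n1\n2\n2\n2\n' > "$file"

# Backquote form: the command's output becomes the variable's value.
lines_bq=`wc -l < "$file"`

# Equivalent $(...) form, which is also easier to nest.
lines_dp=$(wc -l < "$file")

echo "backquotes: $lines_bq, dollar-parens: $lines_dp"

# A for loop over command output works the same way: the output is split
# into words and the loop runs once per word (here, once per matching line).
for hit in $(grep -n '2' "$file"); do
    echo "match: $hit"
done

rm -f "$file"
```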
I originally thought you had a file with numbers and the letter "n" which required the 1st instance of n to be replaced by 1 and the last by 3 and all others replaced by 2. I was thinking your n was some version of NaN for some reason. So my script is useless.
Let me think about it and I'll get back to you.
Re-reading your first post, I'm unsure what exactly you are trying to do. Your example is not clear to me.
1
1
goes to
1
2
2
?
Aidy hit the nail on the head in his post:
1 1 1 2 2 2 2 3 3 3 3 3, you should get 1 2 3 1 2 2 3 1 2 2 2 3
There will never be fewer than 3 occurrences of the same number.
tonyd, thanks for that code too, unfortunately it just seems to copy the infile to the outfile and change nothing. I'll have a good look.
Thanks again guys
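With the rule pinned down like that (first of each group becomes 1, last becomes 3, everything in between becomes 2), a minimal awk sketch could look like the following. The function name relabel is made up, and it assumes one number per line, equal numbers always adjacent, and at least three entries per group as stated above.

```shell
#!/bin/sh
# relabel (hypothetical name): read one number per line on stdin and print
# 1 for the first line of each group, 3 for the last, 2 for the rest.
# Assumes groups are contiguous and each has at least 3 lines.
relabel() {
    awk '
        NR > 1 {
            # The previous line can only be labelled once we know whether
            # the current line continues its group.
            if ($1 != prev)  print 3    # previous line ended a group
            else if (first)  print 1    # previous line started a group
            else             print 2    # previous line was in the middle
            first = ($1 != prev)        # does the current line start a group?
        }
        NR == 1 { first = 1 }
        { prev = $1 }
        END { if (NR > 0) print 3 }     # the very last line ends a group
    '
}
```

Usage would be along the lines of relabel < input.txt > output.txt, which also takes care of writing to a new file.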
#!perl
open FID, "$ARGV[0]" or die "Can't open file";
my @numbers;
my %uniq;
while (<FID>) {
    chomp;
    push @numbers, "2[$_]";
    $uniq{$_}++;
}
close FID;
my $n = "@numbers";
foreach my $i (keys %uniq) {
    $n =~ s/^(.*?)2\[$i\]/${1}1/;
    $n =~ s/(.*)2\[$i\](.*?)\z/${1}3${2}/;
    $n =~ s/\[$i\]//g;
}
print "$_\n" foreach split / /,$n;
There's probably a nicer way of doing that.
Hi Aidy, I've never used perl before. My open line is:
[code]open FID, "$numbers.txt" or die "Can't open file";[/code]
where numbers.txt is the file with the relevant numbers in. Is that right?
EDIT: Congratulations Aidy, you win! (removed $)
Do the groups of numbers overlap?
i.e. do you ever expect to get input that looks like:
1 1 2 3 2 1
If it's always just "1 1 1 2 2 2 2 3 3 3 4 4 4" then this might work:
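#!/usr/bin/perl -w
use strict;
my $last;     # value on the previous line
my $pending;  # label held back for the previous line (1 or 2)
while (<>) {
    chomp(my $n = $_);
    if (!defined $last || $n != $last) {
        # new group: the previous line, if any, was the last of its group
        print "3\n" if defined $pending;
        $pending = 1;
        $last = $n;
    } else {
        # same group: the previous line keeps its label, this one may be a 2
        print "$pending\n";
        $pending = 2;
    }
}
# the final line always ends a group
print "3\n" if defined $pending;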
EDIT: good interview question 🙂
EDIT: how do you put hard spaces or tabs into this forum?
oldnpastit: the groups of numbers will never overlap.
I have literally not even the faintest idea of perl, so any sed/awk scripts would be very useful, but I feel I've pushed it far enough already! Also, is there a way of outputting to a new file?
You've been great thank you so much.
You run it as "perl perlscript.pl filename.txt"
And it prints output to stdout, i.e. "perl perlscript.pl filename.txt > output.txt"
And I didn't realise there were non-overlapping numbers. That's an easier problem.
Oh sorry - I read the OP to mean that you actually had n's in the file!
#!perl
open FID, $ARGV[0] or die "Can't open file";
my @l = map {my $i = <FID>} 0..1;
chomp(@l);
print "1\n";
while (<FID>) {
    chomp;
    push @l, $_;
    print $l[1] != $l[2] ? 3 : $l[1] != $l[0] ? 1 : 2, "\n";
    shift @l;
}
close FID;
print "3\n";
Although, the post above is probably neater.
if you want to use awk then I think that the following will work:
awk "BEGIN{old=-1000};{nxt=$1};{ if (NR>1) {if (cur!=old) {print 1} else {if (nxt!=cur) {print 3} else {print 2}}};old=cur;cur=nxt};END{print 3}" input.txt > output.txt
On Linux you should probably change the double quotes to single quotes.
The above relies on there being no overlaps in the groups of numbers and also blank lines will probably need to be stripped out beforehand
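Both caveats can be dealt with before the real run. The sketch below is one way to do it, not the only one: strip_blank and check_contiguous are made-up names, and the input is assumed to be one number per line.

```shell
#!/bin/sh
# strip_blank (hypothetical name): drop empty and whitespace-only lines.
strip_blank() {
    awk 'NF > 0'
}

# check_contiguous (hypothetical name): report any value that turns up
# again after its group has already ended (e.g. 1 1 2 3 2 1) and return
# a non-zero exit status if that happens.
check_contiguous() {
    awk '
        BEGIN { bad = 0 }
        NR > 1 && $1 != prev && ($1 in seen) {
            print "value " $1 " reappears at line " NR
            bad = 1
        }
        { seen[$1] = 1; prev = $1 }
        END { exit bad }
    '
}
```

For example, strip_blank < input.txt | check_contiguous prints nothing and succeeds when the groups do not overlap.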
In Excel...
Put 9999999 in Column A, Row 1
Paste your number series in Column A, starting with Row 2.
In Column B, row 2 enter the formula: =IF(A2=A1, IF(A2=A3,2, IF(A2<>A3,3)),1)
Drag this all the way down Column B, which copies the formula into each cell, adjusting the row numbers automatically.
Column B should now contain the results. Note that Column B row 1 is left empty.
Did you say 16,000 numbers? Oops, Excel won't work.
[code]
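#include <stdio.h>
#include <string.h>
int main(void) {
    char buf[16];
    char lastbuf[16] = "";
    int pending = 0; /* label held back for the previous line; 0 = none yet */
    while (fgets(buf, sizeof(buf), stdin)) {
        buf[strcspn(buf, "\n")] = '\0'; /* drop the trailing newline */
        if (!buf[0])
            continue; /* skip blank lines */
        if (strcmp(buf, lastbuf) != 0) {
            /* new group: the previous line, if any, was the last of its group */
            if (pending)
                printf("3\n");
            pending = 1;
            strcpy(lastbuf, buf);
        } else {
            /* same group: the previous line keeps its label */
            printf("%d\n", pending);
            pending = 2;
        }
    }
    if (pending)
        printf("3\n"); /* the final line always ends a group */
    return 0;
}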
[/code]
shell = awesome
perl = awesomer
Guys, thank you so much for your help on this. I'll apply these today and hopefully make some worthwhile contributions to Marine Science.
Chris
Did you say 16,000 numbers? Oops, Excel won't work.
Why not?
RichP - that works great, apart from it starts with a 2. Everything else is perfect. See my other thread if you fancy more awk related banter.
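A guess at the leading 2, not confirmed anywhere in the thread: in the one-liner, old=cur also runs on the first line, so the -1000 set in BEGIN is overwritten with an unset value before it is ever compared. awk treats an unset variable as equal to 0, so a file whose first number is 0 would take the "print 2" branch for its first entry. The sketch below keeps the same old/cur/nxt idea but always labels the first entry 1. relabel_window is a made-up name, and it still assumes contiguous groups of at least three with no blank lines.

```shell
#!/bin/sh
# relabel_window (hypothetical name): same old/cur/nxt sliding window as
# the one-liner, except the first entry is labelled 1 unconditionally
# instead of being compared against an unset variable.
relabel_window() {
    awk '
        { nxt = $1 }
        NR == 2 { print 1 }             # label for line 1
        NR > 2 {                        # label for line NR-1 (held in cur)
            if (cur != old)      print 1
            else if (nxt != cur) print 3
            else                 print 2
        }
        { old = cur; cur = nxt }
        END { if (NR > 0) print 3 }     # label for the final line
    '
}
```

It runs the same way as before: relabel_window < input.txt > output.txt.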
