
[Closed] Calling all *NIX geeks - help!

Posts: 8177
Free Member
Topic starter
 

I have 2 directories containing data files that differ in name. I want to copy files that differ from the source directory to the destination directory, but I need them to keep the name that already exists in the destination directory.

For example, in the source directory I have a file called emplo00171.dat and in the destination directory the file is called emplo00106.dat, so I would need to copy the file and keep the "new" name. The filenames are identical in each directory barring the last 3 characters.

I'm sure rsync or similar can do just this, but my brain just isn't working today and I can't figure it out. Any suggestions? SUSE Linux BTW.

Ta 😕


 
Posted : 04/05/2010 12:20 pm
Posts: 8
Free Member
 

You are a bit confusing...

If the files are the same then I'd be tempted to run cksum (or md5sum) against each file to make sure that file A == file B, then choose the name accordingly.
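A minimal sketch of that check, assuming plain POSIX shell and the standard cksum utility. The function name same_content is made up for illustration; it simply compares the checksum and size that cksum reports for each file.

```shell
# Hypothetical helper: succeeds (exit 0) when two files have identical
# cksum output, i.e. the same CRC and the same size.
same_content() {
    # Feeding the file on stdin keeps the filename out of cksum's output,
    # so the two results can be compared as plain strings.
    sum_a=$(cksum < "$1") || return 2
    sum_b=$(cksum < "$2") || return 2
    [ "$sum_a" = "$sum_b" ]
}

# Usage (paths are placeholders):
#   same_content source/emplo00171.dat destination/emplo00106.dat && echo "same"
```

A matching checksum is strong evidence rather than proof that the contents are identical; cmp does a byte-for-byte comparison if certainty matters.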


 
Posted : 04/05/2010 12:40 pm
Posts: 8177
Free Member
Topic starter
 

Sorry, what do I need to clarify?


 
Posted : 04/05/2010 12:42 pm
Posts: 17
Free Member
 

Is the mapping constant, i.e. 171 > 106, 172 > 107, etc.?


 
Posted : 04/05/2010 12:43 pm
Posts: 8177
Free Member
Topic starter
 

Unfortunately not ck...!


 
Posted : 04/05/2010 12:44 pm
Posts: 0
Free Member
 

Following should work under ksh. BACKUP YOUR FILES BEFORE YOU RUN IT!

EDIT: Oh bugger, losing format. Sent you an email.


 
Posted : 04/05/2010 12:54 pm
Posts: 17
Free Member
 

Doesn't this pose the slight problem that you have non-identical files being mapped to non-sequential numbering, i.e. just copying a random bunch of filenames to new filenames? I suppose I'm confused as to:

Dir 1:
Files with X contents, named Y

Dir 2:
Files with F contents, named Z

You effectively just have 2 different sets of files, and I don't see how they link up so the replacement can be done. Ultimately, it seems what you're looking for is a script/program that has a lookup of each file's corresponding second name and then copies it across?


 
Posted : 04/05/2010 12:55 pm
Posts: 0
Free Member
 

something like ....

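cd source || exit 1
for i in * ; do
    # strip the last three characters plus the .dat suffix to get the common prefix
    j=$(echo "$i" | sed 's/...\.dat$//')
    # copy only if ls finds a matching file in the destination
    k=$(ls ../destination/"$j"*) && cp -i "$i" "$k"
done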


 
Posted : 04/05/2010 12:58 pm
Posts: 0
Free Member
 

Oooo, nice shell skilz 😉


 
Posted : 04/05/2010 1:01 pm
Posts: 8177
Free Member
Topic starter
 

Skillz indeed, I shall do some testing! Cheers all


 
Posted : 04/05/2010 1:21 pm
Posts: 0
Free Member
 

I tested it very briefly here on a RHEL system before posting, so should be ok for SLES.

It'll break if you have more than one file in the destination directory that matches the wildcard. You'd need to do something with the "ls" to sort the file you want in that case, like piping the output through sort & tail/head ...
k=$(ls ../destination/"$j"* | sort -n | tail -1)


 
Posted : 04/05/2010 1:29 pm
Posts: 8
Free Member
 

But this will copy in files that are the same. That is where I got confused - I thought the OP said the files were the same but with different names.

If you have a file with contents X in both directories A and B then you may end up with multiple copies. That is why I suggested cksum or md5sum to check to see if they had the same contents.

Or am I just confused here? I usually am.


 
Posted : 04/05/2010 2:30 pm
Posts: 0
Free Member
 

Me too.

In the example given: emplo00171 ~ emplo00106. But the content of source/emplo00171.dat is fresher, so you want to update the content of destination/emplo00106.dat, right?

How does a shell-script know that emplo00171 ~ emplo00106?

Maybe I'm being dim also so am curious to understand this!


 
Posted : 04/05/2010 2:39 pm
Posts: 0
Free Member
 

Agreed it's confusing.

I read it that the OP said that the last 3 chars of the filename will be different, the rest is unique. I assumed by that he meant that the directories have say in dest & src ....

foo00234 ~ foo00123
bar00567 ~ bar00456

and that he wanted to overwrite foo00234 in the dest directory with foo00123 from src, same for bar00567 ~ bar00456.

Yes, I didn't bother checking whether the contents match (I forgot that bit 😉). But if they do, it'll just use up a bit of disk bandwidth. An md5sum would be easy to add before the cp ...
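One way that pre-copy check could look, as a rough sketch only. It swaps md5sum for cmp -s, which compares the two files byte for byte and needs no checksum parsing. The function name copy_if_different is invented for this example.

```shell
# Hypothetical helper: copy src over dest only when their contents differ.
copy_if_different() {
    src=$1
    dest=$2
    # cmp -s is silent and exits 0 only when both files are identical.
    # Any other exit status (different, or dest unreadable/missing)
    # falls through to the copy.
    if cmp -s "$src" "$dest"; then
        echo "unchanged: $dest"
    else
        cp "$src" "$dest" && echo "updated: $dest"
    fi
}

# Usage inside the loop, in place of the plain cp:
#   copy_if_different "$i" "$k"
```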


 
Posted : 04/05/2010 2:59 pm
Posts: 8177
Free Member
Topic starter
 

Basically, the destination directory contains empty data files created by another shell script, the names of which are important to a database manager program. The source directory contains the data I want, but with the "wrong" filenames. I'm having some database connection issues which seem to point to some kind of low level permissions issue (the UNIX file permissions are wide open but I still can't connect to the database with SQL), so I've recreated the tables and now I want to just copy over the data files with the "right" name. I could just unload/reload the data, but there are a LOT of files! It's an Informix C-ISAM database if anyone's interested.

Any clearer.........?

Thought not! 🙂


 
Posted : 04/05/2010 3:05 pm
Posts: 0
Free Member
 

"last 3 chars of the filename will be different, the rest is unique"

Indeed, but that didn't stack up with the example given [scratches head].

I assumed that the source and destination file contents must differ, because if the contents currently match (md5sum check), and he wants to preserve the destination filename - what is his purpose in copying at all?

Arguably this is a microcosm of the problems facing software engineers: weak problem specification 😀

Nifty script BTW.

EDIT: OK the destination files are empty. Come on Woody my Toy Story mate, explain how you know which source filenames should match which destination filenames?


 
Posted : 04/05/2010 3:07 pm
Posts: 0
Free Member
 

>Indeed, but that didn't stack up with the example given [scratches head].

I had assumed that one had to ignore the filename suffix, i.e. strip off the .dat and then remove the last three chars from the resultant filename

so:-

>emplo00171.dat and in the destination directory the file is called emplo00106.dat

Remove the ".dat" suffix and then remove the 171 and 106 chars from each filename and you get the match.

Very confusing OP though 🙂


 
Posted : 04/05/2010 3:21 pm
Posts: 270
Free Member
 

Here's my punt at the logic:

foreach srcfile in srcdir
    foreach destfile in destdir
        if !diff srcfile destfile
            cp srcfile destfile
        fi
    done
done

Please excuse the perl/shell muddle, I'm in a mixed up world at the moment. If the logic's right I'll spend some thought on it.

Damion.


 
Posted : 04/05/2010 3:33 pm
Posts: 8177
Free Member
Topic starter
 

Sorry!

I'm confused too - I'd normally go and bother a programmer, but I'm just trying to figure it out for myself. allthepies - there's a corresponding .idx file too, so stripping the suffix is a no go.

Buzz - the filenames only differ by the last 3 digits, so for each file in the source directory, there's a corresponding file in the destination directory with a slightly different name.

Eg:

/src/file100101.dat /dest/file100102.dat
/src/file200101.dat /dest/file200101.dat

and so on. Any better?
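Going by that description, here is a hedged sketch of the matching step on its own. It assumes every name looks like prefix, then three characters, then a dot and a suffix (.dat or .idx), and that names contain no spaces. The function name find_dest is hypothetical. It refuses to guess when zero or several destination files share the prefix.

```shell
# Hypothetical helper: print the destination path that matches one source
# file, or fail unless there is exactly one candidate.
find_dest() {
    src=$1
    destdir=$2
    name=${src##*/}      # drop any leading directory
    suffix=${name##*.}   # e.g. "dat" or "idx"
    stem=${name%.*}      # name without the suffix
    prefix=${stem%???}   # drop the last three characters of the stem
    # Replace the function's positional parameters with the glob matches:
    # same prefix, any three characters, same suffix.
    set -- "$destdir/$prefix"???."$suffix"
    # An unmatched glob stays as literal text, so also check it exists.
    [ "$#" -eq 1 ] && [ -e "$1" ] || return 1
    printf '%s\n' "$1"
}

# Usage (directories are placeholders):
#   for f in /src/*.dat /src/*.idx; do
#       d=$(find_dest "$f" /dest) && cp -i "$f" "$d"
#   done
```

Keeping the suffix in the match means the .idx files pair up with .idx files and never with the .dat of the same prefix.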


 
Posted : 04/05/2010 3:35 pm
Posts: 0
Free Member
 

For future info, a "filename" is the whole shebang, including the .dat / .idx / whatever suffix. So when you mention the last three chars of a filename, programmers will assume you mean the suffix (.dat/.idx etc.)


 
Posted : 04/05/2010 3:42 pm
Posts: 270
Free Member
 

Does that mean we're not interested in the contents? So:

cp src/file1xxx dst/file1yyy
cp src/file2xxx dst/file2yyy

assuming that there wouldn't be a file1zzz?

EDIT: Oh, that's not clear either. Damn it, I'm going back to my crontab now...


 
Posted : 04/05/2010 3:45 pm
Posts: 8177
Free Member
Topic starter
 

Sorry - I told you my brain was out of order today!

damion - I could just copy each one by hand, but there's a lot of them!


 
Posted : 04/05/2010 3:48 pm
Posts: 2
Free Member
 

Damion's original post has the logic right from what I'm reading. Seems pretty simple to me. I'll knock up a script later tonight if no-one else already has.


 
Posted : 04/05/2010 3:48 pm
Posts: 270
Free Member
 

If we're only matching on everything bar the last three chars before the suffix, then you could generate a filelist, match, then copy. If that's what you want, give me a minute....
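The filelist idea could be sketched roughly like this, using sed to build a key for each name and join to pair the two lists. This is only an illustration: the function names keyed_list and match_list are invented, and it assumes filenames without spaces and a system that has mktemp.

```shell
# Hypothetical helper: print "key name" for each file in a directory, where
# key is the name with the three characters before the suffix removed.
keyed_list() {
    ls "$1" |
        sed -n 's/^\(.*\)...\(\.[^.]*\)$/\1\2 &/p' |
        LC_ALL=C sort -k 1,1
}

# Print "source-name destination-name" for every key found in both directories.
match_list() {
    tmp_src=$(mktemp) || return 1
    tmp_dest=$(mktemp) || return 1
    keyed_list "$1" > "$tmp_src"
    keyed_list "$2" > "$tmp_dest"
    # Join on the key (field 1) and output only the two real filenames.
    LC_ALL=C join -o 1.2,2.2 "$tmp_src" "$tmp_dest"
    rm -f "$tmp_src" "$tmp_dest"
}

# Usage (directories are placeholders): build the list, check it by eye,
# then copy from it.
#   match_list /src /dest > pairs.txt
#   while read -r s d; do cp "/src/$s" "/dest/$d"; done < pairs.txt
```

Writing the pairs to a file first gives a chance to spot anything odd before a single file is overwritten, for instance a source name appearing twice because two destination files share its prefix.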


 
Posted : 04/05/2010 3:51 pm
Posts: 8177
Free Member
Topic starter
 

damion - I think that's about the long and the short of it, cheers


 
Posted : 04/05/2010 3:53 pm
Posts: 270
Free Member
 

woody YGM.

I've got to head off now, so if it's not what you needed, Samuri, it's over to you....


 
Posted : 04/05/2010 4:07 pm
Posts: 0
Free Member
 

Gotcha!


 
Posted : 04/05/2010 4:25 pm
Posts: 2
Free Member
 

not sure if this has been answered but I just knocked this up

I like to keep shell scripts simple so they're easy to edit so no clever scripting skillz here.

Obviously you'll need to edit this for the source and destination directories and for the lengths of the filename sections but as long as the filename lengths are consistent this will work. You can always apply your own suffix if there are multiple ones.

edit: oh ffs! The phorum code is interpreting the script as html and I can't be bothered working my way through it.

It's here
http://www.samuri.co.uk/junk/script.txt


 
Posted : 04/05/2010 9:58 pm