Calling all *NIX geeks – help! – Chat Forum – Singletrack World Magazine Forum

woody2000

Posts: 8177

Free Member

Topic starter

I have 2 directories containing data files that differ in name. I want to copy files that differ from the source directory to the destination directory, but I need them to keep the name that already exists in the destination directory.

For example, in the source directory I have a file called emplo00171.dat and in the destination directory the file is called emplo00106.dat, so I would need to copy the file and keep the "new" name. The filenames are identical in each directory barring the last 3 characters.

I'm sure rsync or similar can do just this, but my brain just isn't working today and I can't figure it out. Any suggestions? SUSE Linux BTW.

Ta 😕

Posted : 04/05/2010 12:20 pm

AdamW

Posts: 8

Free Member

You are a bit confusing...

If the files are the same then I'd be tempted to run cksum (or md5sum) against each file to make sure that file A == file B, then choose the name accordingly.

Posted : 04/05/2010 12:40 pm

woody2000

Posts: 8177

Free Member

Topic starter

Sorry, what do I need to clarify?

Posted : 04/05/2010 12:42 pm

coffeeking

Posts: 17

Free Member

Is the mapping constant? i.e. 171 > 106, 172 > 107 etc?

Posted : 04/05/2010 12:43 pm

woody2000

Posts: 8177

Free Member

Topic starter

Unfortunately not ck...!

Posted : 04/05/2010 12:44 pm

zigzag69

Posts: 0

Free Member

Following should work under ksh. BACKUP YOUR FILES BEFORE YOU RUN IT!

EDIT: Oh bugger, losing format. Sent you an email.

Posted : 04/05/2010 12:54 pm

coffeeking

Posts: 17

Free Member

Doesn't this pose the slight problem that you have non-identical files being mapped to non-sequential numbering - i.e. just copying a random bunch of filenames to new filenames? I suppose I'm confused as to:

Dir 1:
Files with X contents, named Y

Dir 2:
Files with F contents, named Z

You effectively just have 2 different sets of files, and I don't see how they link up so the replacement can be done. Ultimately what it seems you're looking for is a script/program that has a lookup of each file's corresponding second name, and then copies it across?

Posted : 04/05/2010 12:55 pm

grahamb

Posts: 0

Free Member

something like ....

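cd source
for i in *.dat ; do
  # drop the last three characters plus the .dat suffix to get the shared prefix
  j=$(echo "$i" | sed 's/...\.dat$//')
  # look up the matching name in the destination; copy only if ls finds one
  k=$(ls ../destination/"$j"*.dat) && cp -i "$i" "$k"
done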

Posted : 04/05/2010 12:58 pm

allthepies

Posts: 0

Free Member

Oooo, nice shell skilz 😉

Posted : 04/05/2010 1:01 pm

woody2000

Posts: 8177

Free Member

Topic starter

Skillz indeed, I shall do some testing! Cheers all

Posted : 04/05/2010 1:21 pm

grahamb

Posts: 0

Free Member

I tested it very briefly here on a RHEL system before posting, so should be ok for SLES.

It'll break if you have more than one file in the destination directory that matches the wildcard. You'd need to do something with the "ls" to sort the file you want in that case, like piping the output through sort & tail/head ...
k=$(ls ../destination/"$j"* | sort -n | tail -1)

Posted : 04/05/2010 1:29 pm

AdamW

Posts: 8

Free Member

But this will copy in files that are the same. That is where I got confused - I thought the OP said the files were the same but with different names.

If you have a file with contents X in both directories A and B then you may end up with multiple copies. That is why I suggested cksum or md5sum to check to see if they had the same contents.

Or am I just confused here? I usually am.

Posted : 04/05/2010 2:30 pm

buzz-lightyear

Posts: 0

Free Member

Me too.

In the example given: emplo00171 ~ emplo00106. But the content of source/emplo00171.dat is fresher, so you want to update the content of destination/emplo00106.dat, right?

How does a shell-script know that emplo00171 ~ emplo00106?

Maybe I'm being dim also so am curious to understand this!

Posted : 04/05/2010 2:39 pm

grahamb

Posts: 0

Free Member

Agreed it's confusing.

I read it that the OP said that the last 3 chars of the filename will be different, the rest is unique. I assumed by that he meant that the directories have say in dest & src ....

foo00234 ~ foo00123
bar00567 ~ bar00456

and that he wanted to overwrite foo00234 in the dest directory with foo00123 from src, same for bar00567 ~ bar00456.

Yes, I didn't bother checking whether the contents match (I forgot that bit 😉). But if they do, it'll just use up a bit of disk bandwidth. An md5sum would be easy to add before the cp ...
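
A rough sketch of that check, for anyone following along. It uses cmp -s rather than md5sum (same idea: skip the copy when the contents already match), and copy_if_different is just a made-up helper name, so treat it as an illustration rather than a tested fix.

```shell
# copy_if_different SRC DST (hypothetical helper name)
# Copies SRC over DST only when their contents differ.
# cmp -s stands in here for comparing md5sum output by hand.
copy_if_different() {
    if cmp -s "$1" "$2"; then
        echo "same: $2"
    else
        cp "$1" "$2" && echo "copied: $1 -> $2"
    fi
}
```

In the loop you would call copy_if_different "$i" "$k" in place of the plain cp.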

Posted : 04/05/2010 2:59 pm

woody2000

Posts: 8177

Free Member

Topic starter

Basically, the destination directory contains empty data files created by another shell script, the names of which are important to a database manager program. The source directory contains the data I want, but with the "wrong" filenames. I'm having some database connection issues which seem to point to some kind of low level permissions issue (the UNIX file permissions are wide open but I still can't connect to the database with SQL), so I've recreated the tables and now I want to just copy over the data files with the "right" name. I could just unload/reload the data, but there are a LOT of files! It's an Informix C-ISAM database if anyone's interested.

Any clearer.........?

Thought not! 🙂

Posted : 04/05/2010 3:05 pm

buzz-lightyear

Posts: 0

Free Member

"last 3 chars of the filename will be different, the rest is unique"

Indeed, but that didn't stack up with the example given [scratches head].

I assumed that the source and destination file contents must differ, because if the contents currently match (md5sum check), and he wants to preserve the destination filename - what is his purpose in copying at all?

Arguably this is a microcosm of the problems facing software engineers: weak problem specification 😀

Nifty script BTW.

EDIT: OK the destination files are empty. Come on Woody my Toy Story mate, explain how you know which source filenames should match which destination filenames?

Posted : 04/05/2010 3:07 pm

allthepies

Posts: 0

Free Member

>Indeed, but that didn't stack up with the example given [scratches head].

I had assumed that one had to ignore the filename suffix, i.e. strip off the .dat and then remove the last three chars from the resultant filename

so:-

>emplo00171.dat and in the destination directory the file is called emplo00106.dat

Remove the ".dat" suffix and then remove the 171 and 106 chars from each filename and you get the match.
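
As a sketch of that matching rule using plain shell parameter expansion: match_key is a made-up name, and it assumes every filename has a single suffix such as .dat with at least three characters in front of it.

```shell
# match_key NAME (hypothetical helper name)
# Prints NAME with the three characters before its suffix removed,
# so two files that should pair up produce the same key.
match_key() {
    name=$1
    ext=${name##*.}     # text after the last dot, e.g. dat
    stem=${name%.*}     # name without the suffix, e.g. emplo00171
    printf '%s.%s\n' "${stem%???}" "$ext"
}
```

With this, match_key emplo00171.dat and match_key emplo00106.dat both print emplo00.dat, which is the match.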

Very confusing OP though 🙂

Posted : 04/05/2010 3:21 pm

damion

Posts: 270

Free Member

Here's my punt at the logic:

foreach srcfile in srcdir
    foreach destfile in destdir
        if !diff srcfile destfile
            cp srcfile destfile
        fi
    done
done

Please excuse the perl/shell muddle, I'm in a mixed up world at the moment. If the logic's right I'll spend some thought on it.

Damion.

Posted : 04/05/2010 3:33 pm

woody2000

Posts: 8177

Free Member

Topic starter

Sorry!

I'm confused too - I'd normally go and bother a programmer, but I'm just trying to figure it out for myself. allthepies - there's a corresponding .idx file too, so stripping the suffix is a no go.

Buzz - the filenames only differ by the last 3 digits, so for each file in the source directory, there's a corresponding file in the destination directory with a slightly different name.

Eg:

/src/file100101.dat /dest/file100102.dat
/src/file200101.dat /dest/file200101.dat

and so on. Any better?
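
For what it's worth, the pairing described above could be sketched roughly like this. It assumes the only suffixes are .dat and .idx, that exactly three characters before the suffix vary, and that each source file has exactly one partner in the destination; copy_renamed is a made-up function name. Try it on copies of the directories first.

```shell
# copy_renamed SRCDIR DESTDIR (hypothetical function name)
# For every .dat/.idx file in SRCDIR, find the one file in DESTDIR whose
# name differs only in the three characters before the suffix, and copy
# the source over it so the destination keeps its own name.
copy_renamed() {
    src=$1 dest=$2
    for f in "$src"/*.dat "$src"/*.idx; do
        [ -e "$f" ] || continue                # glob matched nothing
        name=${f##*/}                          # e.g. file100101.dat
        ext=${name##*.}                        # dat or idx
        stem=${name%.*}                        # e.g. file100101
        set -- "$dest/${stem%???}"???."$ext"   # candidate destination names
        if [ $# -eq 1 ] && [ -e "$1" ]; then
            cp "$f" "$1" && echo "$name -> ${1##*/}"
        else
            echo "skipped $name: no unique match in $dest" >&2
        fi
    done
}
```

Called as copy_renamed /src /dest, it skips anything with zero or several candidates rather than guessing.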

Posted : 04/05/2010 3:35 pm

allthepies

Posts: 0

Free Member

For future info, a "filename" is the whole shebang, including the .dat / .idx / whatever suffix. So when you mention the last three chars of a filename, programmers will assume you mean the suffix (.dat/.idx etc).

Posted : 04/05/2010 3:42 pm

damion

Posts: 270

Free Member

Does that mean we're not interested in the contents? So:

cp src/file1xxx dst/file1yyy
cp src/file2xxx dst/file2yyy

assuming that there wouldn't be a file1zzz?

EDIT: Oh, that's not clear either. Damn it, I'm going back to my crontab now...

Posted : 04/05/2010 3:45 pm

woody2000

Posts: 8177

Free Member

Topic starter

Sorry - I told you my brain was out of order today!

damion - I could just copy each one by hand, but there's a lot of them!

Posted : 04/05/2010 3:48 pm

samuri

Posts: 2

Free Member

Damion's original post has the logic right from what I'm reading. Seems pretty simple to me. i'll knock up a script later tonight if no-one else already has.

Posted : 04/05/2010 3:48 pm

damion

Posts: 270

Free Member

If we're only matching on everything bar the last three chars before the suffix, then you could generate a file list, match, then copy. If that's what you want, give me a minute....
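
One way the file list idea might look, as a sketch only: make_pairs and copy_pairs are made-up names, it only handles .dat files, and it assumes no tabs or newlines in the filenames. The point of the list is that you can eyeball it before anything gets overwritten.

```shell
# make_pairs SRCDIR DESTDIR (hypothetical name)
# Prints "source<TAB>destination" for every .dat file in SRCDIR that has a
# partner in DESTDIR differing only in the three characters before .dat.
make_pairs() {
    for f in "$1"/*.dat; do
        [ -e "$f" ] || continue
        stem=${f##*/}
        stem=${stem%.dat}
        for g in "$2/${stem%???}"???.dat; do
            [ -e "$g" ] || continue
            printf '%s\t%s\n' "$f" "$g"
        done
    done
}

# copy_pairs LISTFILE (hypothetical name)
# Copies each listed source over its listed destination.
copy_pairs() {
    while IFS="$(printf '\t')" read -r from to; do
        cp "$from" "$to"
    done < "$1"
}
```

Something like make_pairs source destination > pairs.txt, a quick look through pairs.txt for duplicates or gaps, then copy_pairs pairs.txt.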

Posted : 04/05/2010 3:51 pm

woody2000

Posts: 8177

Free Member

Topic starter

damion - I think that's about the long and the short of it, cheers

Posted : 04/05/2010 3:53 pm

damion

Posts: 270

Free Member

woody YGM.

I've got to head off now, so if it's not what you needed, Samuri, it's over to you....

Posted : 04/05/2010 4:07 pm

buzz-lightyear

Posts: 0

Free Member

Gotcha!

Posted : 04/05/2010 4:25 pm

samuri

Posts: 2

Free Member

not sure if this has been answered but I just knocked this up

I like to keep shell scripts simple so they're easy to edit so no clever scripting skillz here.

Obviously you'll need to edit this for the source and destination directories and for the lengths of the filename sections but as long as the filename lengths are consistent this will work. You can always apply your own suffix if there are multiple ones.

edit: oh ffs! The phorum code is interpreting the script as html and I can't be bothered working my way through it.

It's here
http://www.samuri.co.uk/junk/script.txt

Posted : 04/05/2010 9:58 pm

