Who is best at SQL ...
 

MegaSack DRAW - This year's winner is user - rgwb
We will be in touch

[Closed] Who is best at SQL stuff? Quick test

16 Posts
13 Users
0 Reactions
69 Views
Posts: 13421
Full Member
Topic starter
 

What is the most efficient way to get the correct result for the scenario below. You can create temporary tables but you only select, not DELETE.

TableA – 17,000,000 rows - This holds the current contracts
TableB – 4,000,000 rows - This holds the current deals
TableC – 37,500,000 rows - This holds the historic deals

The TableA table provides the list of the current people that you wish to work with. The remaining data comes from TableB if there is a current contract but if there is no related record in TableB then the historic deal table, TableC, is used.

Although there will at most only ever be a single matching record in TableB it is quite likely that there will be many matches in TableC so the record with the latest timestamp should always be selected from TableC. The entries in TableC will always be older than their equivalents in TableB.

The end result should be for every contract record in the TableA there should be one, and only one, record that will contain details from either TableB or TableC.

TableA
KeyPt1 KeyPt2
1 1
1 2
2 2

TableB
KeyPt1 KeyPt2 SomeData TMPSTMP
1 1 AAA 00:00:00
2 2 EEE 00:00:03

TableC
KeyPt1 KeyPt2 SomeData TMPSTMP
1 2 BBB 00:00:00
1 2 CCC 00:00:01
2 2 DDD 00:00:02

Given the tables above, the output records selected would be:
KeyPt1 KeyPt2 SomeData TMPSTMP
1 1 AAA 00:00:00
1 2 CCC 00:00:01
2 2 EEE 00:00:03


 
Posted : 24/06/2009 2:53 pm
Posts: 13421
Full Member
Topic starter
 

Can you guys hurry up, this is a timed exercise


 
Posted : 24/06/2009 3:03 pm
Posts: 0
Free Member
 

most of us are asleep now...


 
Posted : 24/06/2009 3:05 pm
Posts: 0
Free Member
 

I thought you were *THE* DB guru ?


 
Posted : 24/06/2009 3:07 pm
Posts: 2
Free Member
 

If I understand you right:

union the current deals with historic deals

select max(timestamp) from the union with a group by

do your join on that?


 
Posted : 24/06/2009 3:10 pm
Posts: 13421
Full Member
Topic starter
 

We were heading towards unioning the tables.


 
Posted : 24/06/2009 3:27 pm
Posts: 45
Free Member
 

Depending on the situation I might look at a creating a materialized view for this if speed is an issue - in Oracle anyway.


 
Posted : 24/06/2009 3:30 pm
Posts: 0
Free Member
 

I'd show you, but you'd have to sign an NDA or pay for the IP 🙂


 
Posted : 24/06/2009 4:29 pm
Posts: 0
Free Member
 

I have a DBA friend who does "small jobs" in the evening. For enough spare parts supplied I might even be able to do it myself 😛


 
Posted : 24/06/2009 4:31 pm
Posts: 5755
Full Member
 

Do you really want to return a minimum of 17 million rows, what the hell would you do with all that data, unless you have a cheap developing world workforce to work through all the contacts?

Also, as Mudshark says, is this Oracle (shiny analytics for example), SQL Server - environment and available options might influence choice?


 
Posted : 24/06/2009 4:41 pm
Posts: 349
Free Member
 

From what I remember if you're using temp tables...then there's a better way to do it!


 
Posted : 24/06/2009 4:49 pm
Posts: 0
Free Member
 

Make a back up and then DROP the tables


 
Posted : 24/06/2009 5:00 pm
Posts: 0
Free Member
 

Is it a Baby Robin?

Woods behind Nationwide in Swindon?

42?


 
Posted : 24/06/2009 5:09 pm
Posts: 13421
Full Member
Topic starter
 

Data transfer from one system to another.

Just to add to the mix, the system does single record processing so every group statement gets repeated once for every record processed. Could be a long night


 
Posted : 24/06/2009 5:10 pm
Posts: 19
Free Member
 

I think that temporary tables behave differently on Oracle and SQL server so what type of DB you are using will affect the answer.

Is this a one off or a regular report?

If it is a one off and you are using Oracle I'd probably create a table (TableC1) of only the most recent TableC data using a select (CTAS) with a low PCTFREE and a analytic function to get the first row e.g row_number() over (partition by KeyPt1 KeyPt2 order by SomeData )

Then index the KeyPt1 KeyPt2 columns

Create another table (CTAS) (TableC2) by selecting from TableC1 where there isn't a record in TableB

Create another table (CTAS) (TableB1+C2) another table that duplicates TableB then insert into this table all the data in TableC2

Create a final table also using a CTAS by joining TableA and TableB1+C2.


 
Posted : 24/06/2009 5:14 pm
Posts: 19
Free Member
 

Data transfer from one system to another.

Might be better to create a new table by outer joining Table1 and Table2 then updating (17m rows) (TableA)

Create a new table (TableB) based on TableC containing only the most recent data.

Then creating a third table joining TableA and TableB where SomeData is null.


 
Posted : 24/06/2009 5:21 pm
Posts: 3294
Full Member
 

make D containing a left join of A and B

make E containing the key and max of C grouped by key where D has null data

update D from joining C with E on key and timestamp

if not fast enough and something you do often then change the schema so only 1 table has the latest data


 
Posted : 24/06/2009 6:08 pm