MegaSack DRAW - This year's winner is user - rgwb
We will be in touch
What is the most efficient way to get the correct result for the scenario below. You can create temporary tables but you only select, not DELETE.
TableA – 17,000,000 rows - This holds the current contracts
TableB – 4,000,000 rows - This holds the current deals
TableC – 37,500,000 rows - This holds the historic deals
The TableA table provides the list of the current people that you wish to work with. The remaining data comes from TableB if there is a current contract but if there is no related record in TableB then the historic deal table, TableC, is used.
Although there will at most only ever be a single matching record in TableB it is quite likely that there will be many matches in TableC so the record with the latest timestamp should always be selected from TableC. The entries in TableC will always be older than their equivalents in TableB.
The end result should be for every contract record in the TableA there should be one, and only one, record that will contain details from either TableB or TableC.
TableA
KeyPt1 KeyPt2
1 1
1 2
2 2
TableB
KeyPt1 KeyPt2 SomeData TMPSTMP
1 1 AAA 00:00:00
2 2 EEE 00:00:03
TableC
KeyPt1 KeyPt2 SomeData TMPSTMP
1 2 BBB 00:00:00
1 2 CCC 00:00:01
2 2 DDD 00:00:02
Given the tables above, the output records selected would be:
KeyPt1 KeyPt2 SomeData TMPSTMP
1 1 AAA 00:00:00
1 2 CCC 00:00:01
2 2 EEE 00:00:03
Can you guys hurry up, this is a timed exercise
most of us are asleep now...
I thought you were *THE* DB guru ?
If I understand you right:
union the current deals with historic deals
select max(timestamp) from the union with a group by
do your join on that?
We were heading towards unioning the tables.
Depending on the situation I might look at a creating a materialized view for this if speed is an issue - in Oracle anyway.
I'd show you, but you'd have to sign an NDA or pay for the IP 🙂
I have a DBA friend who does "small jobs" in the evening. For enough spare parts supplied I might even be able to do it myself 😛
Do you really want to return a minimum of 17 million rows, what the hell would you do with all that data, unless you have a cheap developing world workforce to work through all the contacts?
Also, as Mudshark says, is this Oracle (shiny analytics for example), SQL Server - environment and available options might influence choice?
From what I remember if you're using temp tables...then there's a better way to do it!
Make a back up and then DROP the tables
Is it a Baby Robin?
Woods behind Nationwide in Swindon?
42?
Data transfer from one system to another.
Just to add to the mix, the system does single record processing so every group statement gets repeated once for every record processed. Could be a long night
I think that temporary tables behave differently on Oracle and SQL server so what type of DB you are using will affect the answer.
Is this a one off or a regular report?
If it is a one off and you are using Oracle I'd probably create a table (TableC1) of only the most recent TableC data using a select (CTAS) with a low PCTFREE and a analytic function to get the first row e.g row_number() over (partition by KeyPt1 KeyPt2 order by SomeData )
Then index the KeyPt1 KeyPt2 columns
Create another table (CTAS) (TableC2) by selecting from TableC1 where there isn't a record in TableB
Create another table (CTAS) (TableB1+C2) another table that duplicates TableB then insert into this table all the data in TableC2
Create a final table also using a CTAS by joining TableA and TableB1+C2.
Data transfer from one system to another.
Might be better to create a new table by outer joining Table1 and Table2 then updating (17m rows) (TableA)
Create a new table (TableB) based on TableC containing only the most recent data.
Then creating a third table joining TableA and TableB where SomeData is null.
make D containing a left join of A and B
make E containing the key and max of C grouped by key where D has null data
update D from joining C with E on key and timestamp
if not fast enough and something you do often then change the schema so only 1 table has the latest data
