MegaSack DRAW - This year's winner is user - rgwb
We will be in touch

[Closed] Who is best at SQL stuff? Quick test

Chat Forum

Last Post by llama 17 years ago

16 Posts

13 Users

0 Reactions

70 Views

WorldClassAccident

Posts: 13421

Full Member

Topic starter

What is the most efficient way to get the correct result for the scenario below. You can create temporary tables but you only select, not DELETE.

TableA – 17,000,000 rows - This holds the current contracts
TableB – 4,000,000 rows - This holds the current deals
TableC – 37,500,000 rows - This holds the historic deals

The TableA table provides the list of the current people that you wish to work with. The remaining data comes from TableB if there is a current contract but if there is no related record in TableB then the historic deal table, TableC, is used.

Although there will at most only ever be a single matching record in TableB it is quite likely that there will be many matches in TableC so the record with the latest timestamp should always be selected from TableC. The entries in TableC will always be older than their equivalents in TableB.

The end result should be for every contract record in the TableA there should be one, and only one, record that will contain details from either TableB or TableC.

TableA
KeyPt1 KeyPt2
1 1
1 2
2 2

TableB
KeyPt1 KeyPt2 SomeData TMPSTMP
1 1 AAA 00:00:00
2 2 EEE 00:00:03

TableC
KeyPt1 KeyPt2 SomeData TMPSTMP
1 2 BBB 00:00:00
1 2 CCC 00:00:01
2 2 DDD 00:00:02

Given the tables above, the output records selected would be:
KeyPt1 KeyPt2 SomeData TMPSTMP
1 1 AAA 00:00:00
1 2 CCC 00:00:01
2 2 EEE 00:00:03

Posted : 24/06/2009 2:53 pm

WorldClassAccident

Posts: 13421

Full Member

Topic starter

Can you guys hurry up, this is a timed exercise

Posted : 24/06/2009 3:03 pm

simonfbarnes

Posts: 0

Free Member

most of us are asleep now...

Posted : 24/06/2009 3:05 pm

andywhit

Posts: 0

Free Member

I thought you were *THE* DB guru ?

Posted : 24/06/2009 3:07 pm

soma_rich

Posts: 2

Free Member

If I understand you right:

union the current deals with historic deals

select max(timestamp) from the union with a group by

do your join on that?

Posted : 24/06/2009 3:10 pm

WorldClassAccident

Posts: 13421

Full Member

Topic starter

We were heading towards unioning the tables.

Posted : 24/06/2009 3:27 pm

mudshark

Posts: 45

Free Member

Depending on the situation I might look at a creating a materialized view for this if speed is an issue - in Oracle anyway.

Posted : 24/06/2009 3:30 pm

IanMunro

Posts: 0

Free Member

I'd show you, but you'd have to sign an NDA or pay for the IP 🙂

Posted : 24/06/2009 4:29 pm

atlaz

Posts: 0

Free Member

I have a DBA friend who does "small jobs" in the evening. For enough spare parts supplied I might even be able to do it myself 😛

Posted : 24/06/2009 4:31 pm

toby1

Posts: 5755

Full Member

Do you really want to return a minimum of 17 million rows, what the hell would you do with all that data, unless you have a cheap developing world workforce to work through all the contacts?

Also, as Mudshark says, is this Oracle (shiny analytics for example), SQL Server - environment and available options might influence choice?

Posted : 24/06/2009 4:41 pm

chvck

Posts: 349

Free Member

From what I remember if you're using temp tables...then there's a better way to do it!

Posted : 24/06/2009 4:49 pm

Joxster

Posts: 0

Free Member

Make a back up and then DROP the tables

Posted : 24/06/2009 5:00 pm

RudeBoy

Posts: 0

Free Member

Is it a Baby Robin?

Woods behind Nationwide in Swindon?

42?

Posted : 24/06/2009 5:09 pm

WorldClassAccident

Posts: 13421

Full Member

Topic starter

Data transfer from one system to another.

Just to add to the mix, the system does single record processing so every group statement gets repeated once for every record processed. Could be a long night

Posted : 24/06/2009 5:10 pm

GrahamA

Posts: 19

Free Member

I think that temporary tables behave differently on Oracle and SQL server so what type of DB you are using will affect the answer.

Is this a one off or a regular report?

If it is a one off and you are using Oracle I'd probably create a table (TableC1) of only the most recent TableC data using a select (CTAS) with a low PCTFREE and a analytic function to get the first row e.g row_number() over (partition by KeyPt1 KeyPt2 order by SomeData )

Then index the KeyPt1 KeyPt2 columns

Create another table (CTAS) (TableC2) by selecting from TableC1 where there isn't a record in TableB

Create another table (CTAS) (TableB1+C2) another table that duplicates TableB then insert into this table all the data in TableC2

Create a final table also using a CTAS by joining TableA and TableB1+C2.

Posted : 24/06/2009 5:14 pm

GrahamA

Posts: 19

Free Member

Data transfer from one system to another.

Might be better to create a new table by outer joining Table1 and Table2 then updating (17m rows) (TableA)

Create a new table (TableB) based on TableC containing only the most recent data.

Then creating a third table joining TableA and TableB where SomeData is null.

Posted : 24/06/2009 5:21 pm

llama

Posts: 3297

Full Member

make D containing a left join of A and B

make E containing the key and max of C grouped by key where D has null data

update D from joining C with E on key and timestamp

if not fast enough and something you do often then change the schema so only 1 table has the latest data

Posted : 24/06/2009 6:08 pm

Latest Stories

Singletrack Issue 164: Editorial
by Mark Alker
Fresh Goods Friday 788: The I’ll Show You Mine… Edition
by Ben Haworth
Members’ Crossword Generator: give us a clue
by Mark Alker
Product of the Year: Maxxis Forekaster 3C Maxx Terra
by Ben Haworth