Viewing 15 posts - 1 through 15 (of 15 total)
  • Any DBAs in the house
  • molgrips
    Free Member

    Can anyone help me with this query?

    We have a table of account numbers which is loded from a flat file daily, more or less. Each time this is done, an entry is created in the load log and a new incremental load ID is assigned. The account table is not cleared out, but with each new load account numbers are associated with the latest load id. I need to check if an account number is in the table for the most recent load. There are two types of data, ABC and XYZ. The load IDs for each data type are different, since they are loaded separately. So I need to check the latest load ID for each data type.

    I’m sure this query is horribly inefficient – any way to improve it? Does my description make sense?

    SELECT
    COUNT(Account_Unique_ID) from Account A, RefData_Load_Log B
    WHERE
    (A.Account_Unique_ID = “value1” AND
    A.LoadId = B.LoadId AND
    B.LoadId = (SELECT MAX LoadId FROM RefData_Load_Log WHERE b.data_type = “ABC”))
    OR
    (A.Account_Unique_ID = “value1” AND
    A.LoadId = B.LoadId AND
    B.LoadId = (SELECT MAX LoadId FROM RefData_Load_Log WHERE b.data_type = “XYZ”))

    Aidy
    Free Member

    Hard to make a call on efficiency, but that I think that query could be written more simply:

    SELECT COUNT(Account_Unique_ID)
    FROM Account A JOIN RefData_Load_Log B ON A.LoadId = B.LoadID
    WHERE A.Account_Unique_ID = ‘value_1’
    AND B.LoadId IN ( SELECT MAX(LoadId) FROM RefData_Load_Log WHERE B.data_type IN (‘ABC’, ‘XYZ’ ) );

    Probably need to modify the join type for your data structure, though.

    molgrips
    Free Member

    I don’t think that’s the same, is it? That only checks the latest load ID for either data type. I need to check BOTH loads for both data types, the latest of each one.

    gary
    Full Member

    A CTE might be a bit more efficient again if using a recent version of SQL server. Presumably you need data type in the result too?

    Something like this (untested)

    ;WITH latestLoadIds (LoadId, data_type) AS
    (
    SELECT MAX LoadId, data_type FROM RefData_Load_Log
    WHERE b.data_type IN ( “ABC”, “XYZ” )
    GROUP BY data_type
    )
    SELECT latestLoadIds.data_type, COUNT(Account_Unique_ID)
    FROM Account A
    JOIN latestLoadIds ON latestLoadIds.loadId = A.LoadId

    molgrips
    Free Member

    Ok you’ve just gone beyond my SQL pay grade 🙂 I’ll have to google the WITH thing.

    This is DB2 btw.

    Aidy
    Free Member

    I don’t think that’s the same, is it? That only checks the latest load ID for either data type. I need to check BOTH loads for both data types, the latest of each one.

    ‘IN’, rather than ‘=’.

    molgrips
    Free Member

    So your select max will return two values?

    Aidy
    Free Member

    Er, no – good point. Throw a GROUP BY in there.

    tinribz
    Free Member

    Would like to help but can’t make head ner tail of the problem:

    There are two tables Account, RefData_Load_Log?

    They are joined by LoadId

    When you say types of data you mean ??? Which table what…

    Why are we counting? This is not Oracle anyway so I’m out.

    gary
    Full Member

    ‘IN’, rather than ‘=’.

    Yes but you will only get one loadId, not one for each type with your query. You can do that if you group by the type.

    Not touched DB2 in years, but a quick google suggests it has a similar CTE syntax. Its probably overkill in this case and you could actually rewrite it like this (I rewrote both in terms of tables in a db here and they each give the same results).

    There’s a reasonable chance it will help the query optimiser do a decent job and it is at least more readable when you come back to it in 12 months 🙂

    If you do use the CTE version, it needs the group by statement at the bottom of this query adding to it. Add an additional WHERE clause to filter the accounts as per your initial post.

    SELECT latestLoadIds.data_type, COUNT(Account_Unique_ID)
    FROM Account A
    JOIN
    (
    SELECT MAX LoadId, data_type FROM RefData_Load_Log
    WHERE b.data_type IN ( “ABC”, “XYZ” )
    GROUP BY data_type
    ) latestLoadIds ON latestLoadIds.loadId = A.LoadId
    GROUP BY latestLoadIds.data_type

    molgrips
    Free Member

    When you say types of data you mean ???

    ok perhaps ‘source’ would be a better word. We load data twice – one lot of accounts from ABC and one lot from XYZ.

    ABC and XYZ are exclusive btw, so I don’t think I need data_type in the select, gary…. do I?

    PS don’t know what CTE is 🙂

    gary
    Full Member

    You only need the specific data types if there are other types you are excluding from the query (e.g. “DEF”)

    CTE = Common Table Expression = the WITH bit. Its just funky syntax for generating a dynamic table ahead of the main query. In this example you can just do it inside the query as per the second example. It comes in quite handy if you need to evaluate hierarchies and then join against those.

    tinribz
    Free Member

    I’d probably focus on an efficient query for one ‘type’, nail that then repeat it for the next type with a union between to return two rows.

    gary
    Full Member

    But in this case the efficiency is probably mostly from working out the 2 load ids in one go (albeit that is me second guessing the optimiser on a DB I don’t work with 🙂 ). Once you have that, the rest of the query is trivial and just needs the appropriate indexes on loadId if (as it sounds) the tables are large.

    molgrips
    Free Member

    You only need the specific data types if there are other types you are excluding from the query

    There are only two data types, but there are lots of datatype/loadid combinations. Tables are large, yes.

Viewing 15 posts - 1 through 15 (of 15 total)

The topic ‘Any DBAs in the house’ is closed to new replies.