So, in the world I inhabit (telecoms, essentially), SLAs are a fact of life.
We divide them into two main sections:
Hardware is then divided into two:
2. Fault fix
Fault fixes then have two or three categories:
1. Respond (pick up phone, log issue)
2. Restore (software only - temporary fix/patch)
3. Resolve (full fix)
Divided this way, it is then easier to determine the KPIs that you measure against.
KPIs need to be sensible, reasonable, and applicable to your customer's situation. There is no point in trying to impose something that is practically irrelevant or unworkable. Instead, you need to agree with your customer at the outset what the KPIs are, and what the failure trigger points are on the SLA.
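As a rough illustration of measuring a fault against agreed KPIs, here is a minimal Python sketch. The target times, the `Fault` record and the `missed_kpis` function are all made up for the example, and it assumes every clock starts when the fault is reported; your own contract may well count differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical targets; the real figures are whatever you agree with the customer.
TARGETS = {
    "respond": timedelta(minutes=30),
    "restore": timedelta(hours=4),
    "resolve": timedelta(days=5),
}


@dataclass
class Fault:
    reported: datetime   # customer reports the fault (assumed start of every clock)
    responded: datetime  # call picked up and issue logged
    restored: datetime   # temporary fix or patch in place
    resolved: datetime   # full fix delivered


def missed_kpis(fault: Fault) -> list[str]:
    """Return the stages whose elapsed time exceeded the agreed target."""
    elapsed = {
        "respond": fault.responded - fault.reported,
        "restore": fault.restored - fault.reported,
        "resolve": fault.resolved - fault.reported,
    }
    return [stage for stage, taken in elapsed.items() if taken > TARGETS[stage]]
```

Keeping the targets in one table means the agreed trigger points live in a single place that both sides can read and sign off.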
Often, we find ourselves in a situation where we grade the level of failure: minor, major, critical.
You then need to establish how often SLA failures are measured, and whose responsibility it is to highlight those failures.
The world of telecoms relies on a service credits system - essentially, SLA failures equal money off the next bill.
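To make the "money off the next bill" idea concrete, here is a small Python sketch. The percentages per failure grade and the overall cap are invented for illustration (a cap is an assumption, not something every contract has), and `next_bill` is a hypothetical helper.

```python
# Hypothetical credit rates, as a percentage of the monthly charge per failure.
CREDIT_PERCENT = {"minor": 2, "major": 5, "critical": 10}
CREDIT_CAP_PERCENT = 25  # assumed ceiling on total credits in one billing period


def next_bill(monthly_charge: float, failures: list[str]) -> float:
    """Return the next bill after deducting service credits for the given failures."""
    earned = sum(CREDIT_PERCENT[grade] for grade in failures)
    credit = min(earned, CREDIT_CAP_PERCENT)
    return round(monthly_charge * (100 - credit) / 100, 2)
```

For example, `next_bill(1000.0, ["minor", "major"])` takes 7% off and returns `930.0`.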
We then consider (since all my SLAs are external) at what point we get the right to terminate the contract. We usually operate on a material and persistent breach basis (material = so bad we want to walk away now; persistent = not so bad as a one-off, but enough of them mean we're p*ssed off enough to sack the supplier).
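The material versus persistent distinction can be sketched as a simple check over the failure history. Everything specific here is an assumption for the example: that a single critical failure counts as material, that three failures within six measurement periods counts as persistent, and the `may_terminate` function itself. The contract wording decides the real thresholds.

```python
# Hypothetical thresholds; the contract defines what is material or persistent.
PERSISTENT_COUNT = 3   # assumed: this many failures...
PERSISTENT_WINDOW = 6  # ...within this many consecutive measurement periods


def may_terminate(history: list[list[str]]) -> bool:
    """history[i] holds the failure grades recorded in measurement period i."""
    # Material breach: assumed here to mean any single critical failure.
    if any("critical" in period for period in history):
        return True
    # Persistent breach: too many failures inside any rolling window of periods.
    counts = [len(period) for period in history]
    return any(
        sum(counts[i:i + PERSISTENT_WINDOW]) >= PERSISTENT_COUNT
        for i in range(len(counts))
    )
```

Using a rolling window rather than a lifetime count means old, isolated failures eventually stop counting against the supplier.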
So, if the SLA is internal, then there is likely to be a budgetary alignment somewhere (otherwise, why have an SLA?). You could introduce a service credits (aka service level guarantee - SLG) regime where there is a reallocation of budget for SLA failures. That ought to concentrate the minds of those whose incomes are performance related...