IT End of World - STW going strong

Posts: 3337
Full Member
 

Do any of the IT bods have a ‘plain english’ translation thing we can use to understand what you’re wittering on about.

You remember the bloke with the submarine that was kind of winging it and then it imploded?

It's the same as that, but with computers and trillions of dollars and broken transport and broken healthcare systems instead

Agree with cougar - more regulation is the likely outcome.

I work in IT. For an IT vendor. Risks are seen and tolerated chasing £. Corners get cut.


 
Posted : 19/07/2024 10:59 pm
Posts: 4418
Full Member
 

frankconway

Do any of the IT bods have a ‘plain english’ translation thing we can use to understand what you’re wittering on about.

To be fair I think @Cougar did that an hour ago?


 
Posted : 19/07/2024 11:09 pm
 zomg
Posts: 852
Free Member
 

“Well actually BSOD is Windows working exactly as intended.” Absolutely ****ing glorious! Chapeau.


 
Posted : 19/07/2024 11:26 pm
Posts: 8088
Full Member
 

I’m seeing a lot of predictable “Microsoft sucks” posts on places like Facebook.

I have seen a few posts flagging up that similar stuff happened a couple of months back with a couple of Linux distros. It didn't play well with versions slightly behind and went pear-shaped in a not dissimilar fashion.

Vendors like CrowdStrike operate with little to no regulation, “marking their own homework” if you will.

It's a tricky one because they need to be able to push stuff quickly to shut down zero days, and who is going to regulate and mark their homework?

I would say getting sued might do the trick, but SolarWinds have managed to mostly defang an SEC lawsuit over their incompetent security practices.


 
Posted : 19/07/2024 11:27 pm
Posts: 7128
Full Member
 

The patch file is all zeros....

https://twitter.com/jeremyphoward/status/1814364640127922499


 
Posted : 19/07/2024 11:36 pm
Posts: 11674
Full Member
 



 
Posted : 20/07/2024 12:11 am
Posts: 13554
Free Member
 

All I’m taking away from this is stop updating stuff. Windows updates are bad enough. After each one the menus, icons and general feel get closer to a Fisher-Price My First Computer vibe.


 
Posted : 20/07/2024 12:33 am
Posts: 11605
Free Member
 

Does anyone know if TicketMaster is affected?

Your gig aside, that's one company I'd love to see take a nosedive right down the shitter.


 
Posted : 20/07/2024 12:42 am
Posts: 78655
Full Member
 

To be fair I think @Cougar did that an hour ago?

I attempted to.

“Well actually BSOD is Windows working exactly as intended.” Absolutely ****ing glorious! Chapeau.

Rewind far enough and you could BSOD a Windows box by unplugging the keyboard. Remember the days when you could have a power cut and you'd spend two days trying to recover it because it'd shat itself?  Times have changed.

Today Windows architecture is far more tightly secured, you cannot just slot any old shit into it.  The issue in this particular case is that Falcon is essentially a rootkit; it operates at a very low level because it has to. A compromise at that level could be wildly catastrophic, so in the event of this kind of failure - i.e. fundamental code not being what it claims to be - Windows just stops rather than allowing who-knows-what to run amok with gay abandon. This is literally what a modern BSOD is: damage limitation because code is not what it claims to be.  If your car caught fire would you rather it stopped to let you get out, or just carried on burning in the middle lane of the M6?

As I said, the alternative is far worse.

It's a tricky one because they need to be able to push stuff quickly to shut down zero days, and who is going to regulate and mark their homework?

That my friend is a very interesting question indeed.


 
Posted : 20/07/2024 12:43 am
Posts: 78655
Full Member
 

All I’m taking away from this is stop updating stuff.

Don't, that is akin to an anti-vax argument.  Once more with feeling, "the alternative is far worse."

This incident is, in the panorama of incidents over decades, extraordinary.  Patching avoids many such incidents daily only a) they don't hit the news any more than we see headlines going "still no Polio" and b) would you rather have an outage from a mistake or by deliberate malicious intent?

In the semi-recent outbreak which took out half of the NHS, the vulnerability had been patched for months but the patch was never applied.  This scenario is FAR more common than what we've seen today.  In fact, I'd probably go so far as to say that today has been unique.

Patch your shit.


 
Posted : 20/07/2024 12:56 am
Posts: 3139
Full Member
 

My brother is the Sky News newsreader that launches all their weekday live broadcasts at 6 am. Certainly was an interesting morning, when you become the news but don’t yet know the news as to why your systems are so broken. They drank a lot of coffee until they managed to fully get back on air at 9 am.


 
Posted : 20/07/2024 8:29 am
Posts: 860
Free Member
 

Interesting, I was absolutely oblivious to any of this happening until I saw this thread.

What's been missing I haven't missed

What got turned off I haven't turned on

What couldn't run well I always walk or cycle

I really do wish whatever button got pressed had stuck like a 13-year-old Reverb in a well-used Cotic Soul and stayed off.

Even if that meant obviously losing here


 
Posted : 20/07/2024 9:12 am
Posts: 28593
Free Member
 

The problem is that lots of big companies use windows computers and this CrowdStrike software, so lots of computers all stopped working at the same time.

Genius move to push a potentially bricking update to every single client machine in one go!

Have to admit, if an enemy government wanted the ability to screw with western critical infrastructure worldwide, all they'd have to do is start up a cybersecurity firm in the US and wait a bit.

Didn't the founder of Kaspersky have some 'interesting' links with the KGB/FSB?


 
Posted : 20/07/2024 9:30 am
Posts: 1893
Full Member
 

Interesting, I was absolutely oblivious to any of this happening

I realised how big an issue this is when I went to the doctor's yesterday morning. Staff working from slips of paper with patients' names and DOB on, no access to medical history, couldn't prescribe or raise a referral. Basically, all they could do was have a chat and a physical examination with no follow-up.

The link to medical records is still down 24 hours later. Guessing they have a massive data validation job on their hands now, it's not as simple as getting a few BSOD Windows boxes going again.


 
Posted : 20/07/2024 9:40 am
Posts: 1681
Full Member
 

@Big-Bud most folk need this stuff at some point or another:

For our health

For family

For friends

For our jobs

Despite media reports, having a connected digital world parallel to the real one has improved health, communications, creativity and productivity. Going back would be a regression.


 
Posted : 20/07/2024 10:01 am
Posts: 2372
Full Member
 

According to the CrowdStrike blog it wasn't a code update, signed or otherwise, just a config file update that caused a 'logic error', making their Falcon engine bomb. These config files get released several times a day so they can quickly respond to new threats, but this one promptly took down any machine that was online during a one-hour window the other night. That explains the breadth and swiftness of the issue, and why booting into safe mode and deleting the config file fixes it.


 
Posted : 20/07/2024 10:06 am
Posts: 78655
Full Member
 

Didn’t the founder of Kaspersky have some ‘interesting’ links with the KGB/FSB?

There are/were rumours.  However you slice it though, they're a Russian company.  Make what you will of having anything to do with Russia as your security provider.


 
Posted : 20/07/2024 3:41 pm
Posts: 3231
Full Member
 

it wasn’t a code update signed or otherwise, just a config file update

I find it funny that suppliers spew and customers accept this distinction as somehow more excusable


 
Posted : 20/07/2024 4:13 pm
 PJay
Posts: 5057
Free Member
 

According to the CrowdStrike blog it wasn’t a code update, signed or otherwise, just a config file update that caused a ‘logic error’, making their Falcon engine bomb.

I'm no coder, but aren't logic errors avoidable? Generally one checks, for example, that a value isn't zero before trying to divide by it
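
To make the divide-by-zero analogy concrete, here is a minimal Python sketch of checking a value before using it. The function name `average_latency` and its inputs are made up purely for illustration; nothing here reflects CrowdStrike's actual code.

```python
from typing import Optional


def average_latency(total_ms: float, sample_count: int) -> Optional[float]:
    """Return the mean latency, or None when there are no samples."""
    # Reject the bad input up front instead of letting the division blow up.
    if sample_count <= 0:
        return None
    return total_ms / sample_count


print(average_latency(120.0, 4))  # 30.0
print(average_latency(120.0, 0))  # None
```

The check itself is trivial; the hard part in real systems is knowing every place where untrusted input needs one.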


 
Posted : 20/07/2024 4:20 pm
Posts: 28593
Free Member
 

Sounds a bit like the logic bomb from Portal 2.


 
Posted : 20/07/2024 4:24 pm
Posts: 3279
Free Member
 

Should have put a try...catch block around the bit of code that tries to read the config file to handle the exception cleanly lol!
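
For anyone wondering what that looks like, here is a rough Python sketch of wrapping a config read in exception handling and falling back to safe defaults. The JSON format, the `rules` key and `DEFAULT_RULES` are all assumptions invented for the example, and a kernel-mode driver is a very different beast from user-mode Python, so treat it as an analogy rather than a fix.

```python
import json

DEFAULT_RULES = {"rules": []}  # hypothetical safe fallback


def load_rules(raw: bytes) -> dict:
    """Parse a rules blob, falling back to safe defaults if it is malformed."""
    try:
        parsed = json.loads(raw.decode("utf-8"))
        if not isinstance(parsed, dict) or "rules" not in parsed:
            raise ValueError("missing 'rules' key")
        return parsed
    except (UnicodeDecodeError, ValueError) as exc:
        # json.JSONDecodeError is a subclass of ValueError, so it lands here too.
        print(f"bad config, using defaults: {exc}")
        return dict(DEFAULT_RULES)


print(load_rules(b'{"rules": ["block-thing"]}'))
print(load_rules(b"\x00" * 16))  # malformed input falls back to the defaults
```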


 
Posted : 20/07/2024 5:20 pm
Posts: 13554
Free Member
 

Don’t, that is akin to an anti-vax argument.

Where’s the laughing face emoji when you need it. It was meant in jest.

Technology is great but we have become hugely over-reliant on it. Worse, we’ve left all the people with no social skills in charge of it! Every IT department should have a protocol droid like C3PO to translate things from coder/IT bod into language the rest of us can understand.  (Insert winking emoji here). Now go, roll your hundred sided dice and hope for a +7 to reversing ****ups!


 
Posted : 20/07/2024 5:28 pm
Posts: 78655
Full Member
 

it wasn’t a code update signed or otherwise, just a config file update

.

I find it funny that suppliers spew and customers accept this distinction as somehow more excusable

In CrowdStrike parlance it was a "channel file."  But to report that would be meaningless to most people.  I was wrong earlier about it being a driver update - I'm piecing this together on holiday when I can - but the rest of the post stands: the component which uses the channel file is a system-level driver.  Windows stopped procedures because it saw something it didn't like and couldn't terminate it gracefully because of the level it operates at.


 
Posted : 20/07/2024 6:48 pm
Posts: 34016
Full Member
 

Flights grounded,
Trains halted,
Stock exchange not trading,
Sky news off air.
Paxman and his underwear

You’ll be telling us you started a fire next…

Very well done, sir! Highly commended.

At least I’ve still got my source of emojis that works… *smug Pikachu face*

Does anyone know if TicketMaster is affected? Trying to login and it says Email address not recognised despite it working yesterday..

Got a gig at weekend so need to access the tickets

Ticketmaster are bloody awful, but I always try to save my tickets onto my phone, just in case. If you’ve got your ticket order reference, then, depending on the venue, if you get there early a member of the venue staff should be able to print a copy of the ticket - I had a problem with mine at a Roundhouse gig, and I had paper tickets printed at the desk beforehand.


 
Posted : 20/07/2024 7:10 pm
Posts: 78655
Full Member
 

Where’s the laughing face emoji when you need it. It was meant in jest.

Doh. 😀


 
Posted : 20/07/2024 7:23 pm
Posts: 34016
Full Member
 

I can’t find the article now, but I read earlier that one of the main individuals behind CrowdStrike sold a bunch of shares worth over a million dollars a day or so before this whole mess happened, which has both raised a few eyebrows and prompted a call for an in-depth investigation. Unsurprisingly. Apparently, the sale was set up some months ago, with a deliberate delay to avoid charges of insider trading; the date it was set to go live unfortunately fell just before this all kicked off.
Still looks iffy, though.


 
Posted : 20/07/2024 8:01 pm
Posts: 15555
Free Member
 

Where’s the laughing face emoji when you need it. It was meant in jest.

Due to technical difficulties, we now have to do manual emojis ¯\_(ツ)_/¯

😉


 
Posted : 20/07/2024 8:18 pm
Posts: 13554
Free Member
 

:’(


 
Posted : 20/07/2024 8:57 pm
Posts: 4136
Full Member
 

Our Tesla thinks it's a 30mph speed limit everywhere today until the cameras pick up an actual sign. It's normally eerily accurate.

End of days I tell thee.


 
Posted : 20/07/2024 9:33 pm
Posts: 23635
Full Member
 

It affects a different version of Windows.

have you tried closing the curtains then opening them again?


 
Posted : 20/07/2024 10:56 pm
Posts: 7128
Full Member
 

Genius move to push a potentially bricking update to every single client machine in one go!

My employer has many millions of embedded (not Windows) devices with updates of one sort or another going out pretty regularly. All of those updates go through some kind of "Canary" phase - deploy to internal alpha/beta, then to a small population, and then rollout to the entire population while monitoring various metrics. It's not rocket surgery.
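
A very rough Python sketch of the general idea, for the curious. The stage fractions, the failure threshold and the `deploy` callback are all hypothetical stand-ins; this is not our actual pipeline, just the shape of a staged rollout that halts when a stage looks unhealthy.

```python
from typing import Callable, Sequence

# Hypothetical stages: fraction of the fleet that has the update after each stage.
STAGES = (0.01, 0.10, 1.00)


def staged_rollout(
    devices: Sequence[str],
    deploy: Callable[[str], bool],
    max_failure_rate: float = 0.02,
) -> bool:
    """Deploy stage by stage; stop as soon as a stage's failure rate is too high."""
    done = 0
    for fraction in STAGES:
        target = max(1, int(len(devices) * fraction))
        batch = devices[done:target]
        if not batch:
            continue
        # deploy() returns True if the device came back healthy after updating.
        failures = sum(1 for device in batch if not deploy(device))
        if failures / len(batch) > max_failure_rate:
            print(f"halting at {fraction:.0%}: {failures}/{len(batch)} failed")
            return False
        done = target
    return True
```

With a fleet of 1000 and an update that breaks every device, this stops after the first 10 rather than taking out the lot. A real system would also wait and watch metrics between stages rather than moving straight on.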

Anything that ends up affecting code like bootloaders - where bricking a device is a real possibility - gets huge amounts of care taken over it - everyone's nightmare is waking up to a slack message from someone you've never met before asking you to join an urgent 2am call.

On the one hand, I do feel a lot of sympathy for whoever it was made whatever change it was that did this, and I'm sure it won't be much fun being that person, or writing the RCA.

On the other hand, they've got a huge market cap, and insane valuation so they must have huge amounts of cash sloshing around so surely they could afford to do a better job than they did, and foresee this kind of thing and defend against it?

As a wise old engineer once said to me when I was a young whippersnapper, "If it hasn't been tested, it doesn't work".


 
Posted : 20/07/2024 11:21 pm
Posts: 15555
Free Member
 

As a wise old engineer once said to me when I was a young whippersnapper, “If it hasn’t been tested, it doesn’t work”.

As someone else upthread said though...

...do you trust your security firm for a zero day fix, or do you run a multi-million pound business unpatched for 48hrs to allow for testing and hope you don't get hacked? Either way it's a risk.

https://images2.imgbox.com/88/95/NR1mVS0w_o.jpg


 
Posted : 20/07/2024 11:25 pm
 zomg
Posts: 852
Free Member
 

48 hours? You’re doing it wrong.

edit: Ah, you’re talking about staging in the customer environment? That’s probably fair, though a smoke test could hopefully be automated and be done much quicker. Perhaps there’s now a product niche there, courtesy of engineering management at CrowdStrike who presided over a pipeline that didn’t test what they were publishing.


 
Posted : 20/07/2024 11:38 pm
Posts: 7128
Full Member
 

.


 
Posted : 20/07/2024 11:53 pm
Posts: 11674
Full Member
 

Our Tesla thinks it’s a 30mph speed limit everywhere today until the cameras pick up an actual sign. It’s normally eerily accurate.

My MG HS Trophy thinks every single road is 40mph and constantly flashes red in the display. It’s quite distracting and there's no fix for it according to the dealer. I fixed it myself with a bit of duct tape over the flashing icon.


 
Posted : 21/07/2024 12:18 am
Posts: 78655
Full Member
 

…do you trust your security firm for a zero day fix, or do you run a multi-million pound business unpatched for 48hrs to allow for testing and hope you don’t get hacked? Either way it’s a risk.

This, really.

The IoT example above is all well and good, but it's apples and oranges.  EDR/XDR is not like "normal" software.  Falcon's very raison d'etre is to respond to threats fast. How often does your lightbulb get an update?(*)  Falcon Sensor receives multiple updates every day.

If the building's on fire, do you say "well, the hosepipe is still in Alpha so we'll get to you in a couple of weeks?"  I'm increasingly of the mind that this wasn't a testing issue, it was a QA issue.

Quite what the solution is, I do not know.  But as I said at the outset, I expect there will be some robust exchanges of view when it's mostly all over.  Vendors like CrowdStrike essentially mark their own homework, that surely has to change.  If this incident had been malicious rather than a big whoopsie we would be in a VERY bad place right now.

(* - probable answer: "not often enough")


 
Posted : 21/07/2024 10:39 am
Posts: 1893
Full Member
 

A config file change that blue-screens the device and puts it in a boot-loop obviously would never get through CrowdStrike's testing, so IMO something has gone drastically wrong with the deployment process. Either what was distributed was not the intended update, file got corrupted somehow, human error etc.


 
Posted : 21/07/2024 10:58 am
Posts: 78655
Full Member
 

I've seen all of those posited and more.  I too find it hard to believe from CrowdStrike, but here we are.

I just tripped over this blog post, which seems to be as comprehensive and accurate a technical overview as any I've found.


 
Posted : 21/07/2024 11:33 am
Posts: 7128
Full Member
 

That Medium article is interesting. Sounds like they rolled out some new and broken code without testing it.

Very poor. And nothing to do with urgently needing to fix threats as soon as possible (not that that is an excuse anyway).

If the building’s on fire, do you say “well, the hosepipe is still in Alpha so we’ll get to you in a couple of weeks?”

The hosepipe had not been tested on your fire. Hard to believe there even was a fire.


 
Posted : 21/07/2024 1:34 pm
Posts: 78655
Full Member
 

Hard to believe there even was a fire.

You don't have a fire brigade because there's a fire.  You have a fire brigade in case there is.


 
Posted : 21/07/2024 3:15 pm
Posts: 15555
Free Member
 

Vendors like CrowdStrike essentially mark their own homework, that surely has to change.

True, but... cost? Given the frequency of updates of this nature, I imagine clients would have to have a permanent 'security test and release' team whose only job is to test and release security patches, AV definition files etc. It sounds like a full-time job; even if it's just 2 or 3 people it could easily cost £100k a year or more.

The bean counters would not like that... I have a hard enough time trying to convince clients to slightly up-spec their VMs at sensible cost, despite... oh look, their SQL server has bombed again as it's out of RAM... again 😀

"but our environment didn't go down, so why should we pay?"

"because we had to manually fail over environment A to environment B when we were getting critical resource alerts, AGAIN!"

Maybe it could be automated as a halfway house, if it's just a simple 'smoke test': is the definition file in the expected/correct format, simple stuff like that. But then you'd think that would happen at CrowdStrike anyway as part of the automated deployments...
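
Something like this is what I have in mind, as a rough Python sketch. The `MAGIC` header and `MIN_SIZE` are invented for the example (I have no idea what the real channel file format looks like), and the all-zeros check is a nod to the claim linked earlier in the thread.

```python
from typing import List

MAGIC = b"CHNL"  # hypothetical 4-byte header, invented for this example
MIN_SIZE = 16    # hypothetical minimum plausible file size in bytes


def smoke_test(blob: bytes) -> List[str]:
    """Return a list of problems found; an empty list means the file passes."""
    problems = []
    if len(blob) < MIN_SIZE:
        problems.append("file too small")
    if blob and not any(blob):
        problems.append("file is all zero bytes")
    if not blob.startswith(MAGIC):
        problems.append("unexpected header")
    return problems


print(smoke_test(b"\x00" * 64))          # flags all zeros and the bad header
print(smoke_test(MAGIC + b"\x01" * 32))  # []
```

It only proves the file looks sane, not that the thing consuming it copes with the contents, but it's cheap enough to run on every single update.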


 
Posted : 21/07/2024 3:25 pm
 zomg
Posts: 852
Free Member
 

CrowdStrike could be publishing their homework along with their product. Testing isn’t a sideband activity. It is the product too.


 
Posted : 21/07/2024 3:34 pm
Posts: 15555
Free Member
 

Virgin Radio calling it a 'Microsoft Windows outage' just now...

That's like me crashing my car into a crowded bus stop and calling it a Ford issue, FFS, lol


 
Posted : 21/07/2024 6:33 pm
Posts: 2372
Full Member
 

We know what went wrong but there’s still a question over how and why it happened. It’s almost unthinkable that some level of testing didn’t take place before making the update, so why was it inadequate? I think the clue is in CrowdStrike’s own blog: these channel files are updated several times a day. The Falcon USP is that they respond to threats as they emerge, so the normal develop / test / release cycle is highly compressed and probably highly automated.

Instead of a phased rollout, every machine online got updated at the same time. A little over an hour after release CrowdStrike realised there was a problem and pulled the channel file, but by that time 8.5 million machines had already been compromised. CrowdStrike themselves seemed surprised that an issue could even occur, as they state there’s not been an issue with Falcon before, so I think a combination of trying to be the fastest to respond and their own hubris created the perfect storm.

FWIW there could have been a simple failsafe – if Falcon fails after channel update, roll back that channel update, reboot, and you are back. The fact that a simple mechanism like this wasn’t considered leads me to think they didn’t believe a channel file could take down Falcon, which may have fed into a minimal testing strategy.
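
As a rough Python sketch of the sort of failsafe I mean: keep a counter of boots attempted since the last clean one, bump it before touching the new file, and revert once it passes a threshold. All the names here (`UpdateState`, `MAX_FAILED_BOOTS` and so on) are hypothetical, and whether a driver at that level could actually do this is an assumption on my part.

```python
from dataclasses import dataclass

MAX_FAILED_BOOTS = 2  # hypothetical threshold before reverting


@dataclass
class UpdateState:
    active: str             # channel file currently in use
    last_good: str          # last file known to boot cleanly
    pending_boots: int = 0  # boots attempted since the last clean one


def choose_file_at_boot(state: UpdateState) -> str:
    """Run early in boot, before the active file is parsed."""
    if state.pending_boots >= MAX_FAILED_BOOTS:
        # Earlier boots never reached mark_boot_ok(), so revert.
        state.active = state.last_good
        state.pending_boots = 0
    # In real life this counter would be saved to disk before the risky step.
    state.pending_boots += 1
    return state.active


def mark_boot_ok(state: UpdateState) -> None:
    """Run once the system has stayed up; promotes the active file."""
    state.last_good = state.active
    state.pending_boots = 0
```

The point of bumping the counter before parsing is that a crash can't skip it: two failed boots on the new file and the third one quietly goes back to the old one.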


 
Posted : 21/07/2024 8:24 pm
Posts: 1893
Full Member
 

 if Falcon fails after channel update, roll back that channel update, reboot, and you are back

Once you've caused the memory exception and blue-screened, don't think you can then have a script do something else.


 
Posted : 21/07/2024 8:50 pm