Major IT outage affecting banks, airlines, media outlets across the world
abc.net.au
All our servers and company laptops went down at pretty much the same time. Laptops have been bootlooping to blue screen of death. It's all very exciting, personally, as someone not responsible for fixing it.
Apparently caused by a bad CrowdStrike update.
Edit: now being told we (who almost all generally work from home) need to come into the office Monday as they can only apply the fix in-person. We'll see if that changes over the weekend...
Reading into the updates some more... I'm starting to think this might just destroy CrowdStrike as a company altogether. Between the mountain of lawsuits almost certainly incoming and the total destruction of any public trust in the company, I don't see how they survive this. Just absolutely catastrophic on all fronts.
If all the computers stuck in boot loop can't be recovered... yeah, that's a lot of cost for a lot of businesses. Add to that all the immediate impact of missed flights and who knows what happening at the hospitals. Nightmare scenario if you're responsible for it.
This sort of thing is exactly why you push updates to groups in stages, not to everything all at once.
Looks like the laptops are able to be recovered with a bit of finagling, so fortunately they haven't bricked everything.
And yeah staged updates or even just... some testing? Not sure how this one slipped through.
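For anyone wondering what "staged" looks like in practice, here's a rough Python sketch of ring-based gating - the ring names, percentages, and the idea of a "soak" check are all made up for illustration, not how CrowdStrike or anyone else actually does it:

```python
import hashlib

# Hypothetical rings: a tiny canary first, then progressively wider waves.
RINGS = [("canary", 0.01), ("early", 0.10), ("broad", 0.39), ("everyone", 0.50)]

def ring_for(host_id: str) -> str:
    """Deterministically assign a host to a ring by hashing its ID."""
    bucket = int(hashlib.sha256(host_id.encode()).hexdigest(), 16) % 10_000 / 10_000
    cumulative = 0.0
    for name, share in RINGS:
        cumulative += share
        if bucket < cumulative:
            return name
    return RINGS[-1][0]

def hosts_to_update(hosts, healthy_rings):
    """Only release to the next ring once every earlier ring has soaked cleanly."""
    allowed = []
    for name, _ in RINGS:
        allowed.append(name)
        if name not in healthy_rings:  # stop at the first ring that hasn't proven itself yet
            break
    return [h for h in hosts if ring_for(h) in allowed]

if __name__ == "__main__":
    fleet = [f"host-{i:04d}" for i in range(20)]
    # Pretend only the canary ring has finished its soak period with no crashes.
    print(hosts_to_update(fleet, healthy_rings={"canary"}))
```

The point is just that a deterministic hash keeps each machine in the same ring across releases, and nothing past the canary ring ever sees an update until the canary has survived it.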
I'd bet my ass this was caused by terrible practices brought on by suits demanding more "efficient" releases.
"Why do we do so much testing before releases? Have we ever had any problems before? We're wasting so much time that I might not even be able to buy another yacht this year"
At least nothing like this happens in the airline industry
Certainly not! Or other industries for that matter. It's a good thing executives everywhere aren't just concentrating on squeezing the maximum amount of money out of their companies and funneling it to themselves and their buddies on the board.
Sure, let's "rightsize" the company by firing 20% of our workforce (but not management!) and raise prices 30%, and demand that the remaining employees maintain productivity at the level it used to be before we fucked things up. Oh and no raises for the plebs, we can't afford it. Maybe a pizza party? One slice per employee though.
Agreed, this will probably kill them over the next few years unless they can really magic up something.
They probably don't get sued - their contracts will have indemnity clauses against exactly this kind of thing, so unless they seriously misrepresented what their product does, this probably isn't a contract breach.
If you are running CrowdStrike, it's probably because you have some regulatory obligations and an auditor to appease - you aren't going to be able to just turn it off overnight. But I'm sure there are going to be some pretty awkward meetings when it comes to contract renewals in the next year, and I can't imagine them seeing much growth.
Nah. This has happened with every major corporate antivirus product. Multiple times. And the top IT people advising on purchasing decisions know this.
Yep. This is just uninformed people thinking this doesn't happen. It's been happening since AV was born. It's not new, and this will not kill CS - they're still king.
Don't most indemnity clauses have exceptions for gross negligence? Pushing out an update this destructive without it getting caught by any quality control checks sure seems grossly negligent.
I think you're on the nose, here. I laughed at the headline, but the more I read the more I see how fucked they are. Airlines. Industrial plants. Fucking governments. This one is big in a way that will likely get used as a case study.
The London Stock Exchange went down. They're fukd.
Yeah saw that several steel mills have been bricked by this, that's months and millions to restart
Got a link? I find it hard to believe that a process like that would stop because of a few windows machines not booting.
That's the real kicker.
Those machines should be airgapped and no need to run Crowdstrike on them. If the process controller machines of a steel mill are connected to the internet and installing auto updates then there really is no hope for this world.
But daddy microshoft says i gotta connect the system to the internet uwu
Testing in production will do that
Not everyone is fortunate enough to have a separate testing environment, you know? Manglement has to cut costs somewhere.
Manglement is the good term lmao
The amount of servers running Windows out there is depressing to me
The four multinational corporations I worked at were almost entirely Windows servers with the exception of vendor specific stuff running Linux. Companies REALLY want that support clause in their infrastructure agreement.
I've worked as an IT architect at various companies in my career and you can definitely get support contracts for engineering support of RHEL, Ubuntu, SUSE, etc. That isn't the issue. The issue is that there are a lot of system administrators with "15 years experience in Linux" that have no real experience in Linux. They have experience googling for guides and tutorials while having cobbled together documents of doing various things without understanding what they are really doing.
I can't tell you how many times I've seen an enterprise patch their Linux solutions (if they patched them at all with some ridiculous rubberstamped PO&AM) manually without deploying a repo and updating the repo treating it as you would a WSUS. Hell, I'm pleasantly surprised if I see them joined to a Windows domain (a few times) or an LDAP (once but they didn't have a trust with the Domain Forest or use sudoer rules...sigh).
Reminds me of this guy I helped a few years ago. His name was Bob, and he was a sysadmin at a predominantly Windows company. The software I was supporting, however, only ran on Linux. So since Bob had been a UNIX admin back in the 80s they picked him to install the software.
But it had been 30 years since he ever touched a CLI. Every time I got on a call with him, I'd have to give him every keystroke one by one, all while listening to him complain about how much he hated it. After three or four calls I just gave up and used the screenshare to do everything myself.
AFAIK he's still the only Linux "sysadmin" there.
"googling answers", I feel personally violated.
/s
To be fair, there is no reason to memorize things that you need once or twice. Google is a tool, and good for Linux issues. Why debug some issue for a few hours if you can Google the resolution in minutes?
RedHat, Ubuntu, SUSE - they all exist on support contracts.
I dunno, but doesn't like a quarter of the internet kinda run on Azure?
And 60% of Azure is running Linux lol
so 40% of azure crashes a quarter of the internet...
I guess Spotify was running on the other 40%, as many other services
I've had my PC shut down for updates three times now, while using it as a Jellyfin server from another room. And I've only been using it for this purpose for six months or so.
I can't imagine running anything critical on it.
Windows server, the OS, runs differently from desktop windows. So if you're using desktop windows and expecting it to run like a server, well, that's on you. However, I ran windows server 2016 and then 2019 for quite a few years just doing general homelab stuff and it is really a pain compared to Linux which I switched to on my server about a year ago. Server stuff is just way easier on Linux in my experience.
It doesn't have to, though. Linux manages to do both just fine, with relatively minor compromises.
Expecting an OS to handle keeping software running is not a big ask.
Not judging, but why wouldn't you run Linux for a server?
Where did you think Microsoft was getting all (hyperbole) of their money from?
>Make a kernel-level antivirus
>Make it proprietary
>Don't test updates... for some reason??
I mean, I know it's easy to be critical, but this was my exact thought: how the hell didn't they catch this in testing?
I have had numerous managers tell me there was no time for QA in my storied career. Or documentation. Or backups. Or redundancy. And so on.
Move fast and break things! We need things NOW NOW NOW!
Just always make sure you have some evidence of them telling you to skip these.
There's a reason I still use lots of email in the age of IM. Permanent records, please. I will email a record of in person convos or chats on stuff like this. I do it politely and professionally, but I do it.
A lot of people really need to get into the habit of doing this.
"Per our phone conversation earlier, my understanding is that you would like me to deploy the new update without any QA testing. As this may potentially create significant risks for our customers, I just want to confirm that I have correctly understood your instructions before proceeding."
If they try to call you back and give the instruction over the phone, then just be polite and request that they reply to your email with their confirmation. If they refuse, say "Respectfully, if you don't feel comfortable giving me this direction in writing, then I don't feel comfortable doing it," and then resend your email but this time loop in HR and legal (if you've ever actually reached this point, it's basically down to either them getting rightfully dismissed, or you getting wrongfully dismissed, with receipts).
Push that into the technical debt. Then afterwards never pay off the technical debt
Completely justified reaction. A lot of the time tech companies and IT staff get shit for stuff that, in practice, can be really hard to detect before it happens. There are all kinds of issues that can arise in production that you just can't test for.
But this... This has no justification. An issue this immediate, this widespread, would have instantly been caught with even the most basic of testing. The fact that it wasn't raises massive questions about the safety and security of CrowdStrike's internal processes.
I think when you are this big you need to roll out any updates slowly, checking along the way that all is good.
The failure here is much more fundamental than that. This isn't a "no way we could have found this before we went to prod" issue, this is a "five minutes in the lab would have picked it up" issue. We're not talking about some kind of "Doesn't print on Tuesdays" kind of problem that's hard to reproduce or depends on conditions that are hard to replicate in internal testing, which is normally how this sort of thing escapes containment. In this case the entire repro is "Step 1: Push update to any Windows machine. Step 2: THERE IS NO STEP 2"
There's absolutely no reason this should ever have affected even one single computer outside of Crowdstrike's test environment, with or without a staged rollout.
God damn, this is worse than I thought... This raises further questions... Was there no testing at all??
My guess is they did testing but the build they tested was not the build released to customers. That could have been because of poor deployment and testing practices, or it could have been malicious.
Such software would be a juicy target for bad actors.
"I ran the update and now shit's proper fucked"
That would have been sufficient to notice this update's borked
You left out
>Pushed a new release on a Friday
You left out > Profit
Oh... Wait...Hang on a sec.
never do updates on a Friday.
Yeah my plans of going to sleep last night were thoroughly dashed as every single windows server across every datacenter I manage between two countries all cried out at the same time lmao
I always wondered who even used Windows Server, given how marginal its market share is. Now I know, from the news.
Marginal? You must be joking. A vast amount of servers run on Windows Server. Where I work alone we have several hundred and many companies have a similar setup. Statista put the Windows Server OS market share over 70% in 2019. While I find it hard to believe it would be that high, it does clearly indicate it's most certainly not a marginal percentage.
I'm not getting an account on Statista, and I agree that its marketshare isn't "marginal" in practice, but something is up with those figures, since overwhelmingly internet hosted services are on top of Linux. Internal servers may be a bit different, but "servers" I'd expect to count internet servers...
Most servers aren't Internet-facing.
It's stated in the synopsis, below where it says you need to pay for the article. Anyway, it might be true as the hosting servers themselves often host up to hundreds of Windows machines. But it really depends on what is measured and the method used, which we don't know because who the hell has a statista account anyway.
Well, I've seen some, but they usually don't have automatic updates and generally do not have access to the Internet.
This is a crowdstrike issue specifically related to the falcon sensor. Happens to affect only windows hosts.
It's only marginal for running custom code. Every large organization has at least a few of them running important out-of-the-box services.
Not too long ago, a lot of Customer Relationship Management (CRM) software ran on MS SQL Server. Businesses made significant investments in software and training, and some of them don't have the technical, financial, or logistical resources to adapt - momentum keeps them using Windows Server.
For example, small businesses that are physically located in rural areas can't use cloud based services because rural internet is too slow and unreliable. It's not quite the case that there's no amount of money you can pay for a good internet connection in rural America, but last time I looked into it, Verizon wanted to charge me $20,000 per mile to run a fiber optic cable from the nearest town to my client's farm.
How many coffee cups have you drank in the last 12 hours?
I work in a data center
I lost count
What was Dracula doing in your data centre?
Because he's Dracula. He's twelve million years old.
THE WORMS
I work in a datacenter, but no Windows. I slept so well.
Though a couple years back some ransomware that also impacted Linux ran through, but I got to sleep well because it only bit people with easily guessed root passwords. It bit a lot of other departments at the company though.
This time even the Windows folks were spared, because CrowdStrike wasn't the solution they infested themselves with (they use other providers, who I fully expect to screw up the same way one day).
Did you feel a great disturbance in the force?
How's it going, Obi-Wan?
Here's the fix (or rather workaround, released by CrowdStrike):
1) Boot to safe mode/recovery
2) Go to C:\Windows\System32\drivers\CrowdStrike
3) Delete the file matching "C-00000291*.sys"
4) Boot the system normally
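If you can actually get a command prompt up in safe mode or the recovery environment, step 3 boils down to a one-liner. Purely as a sketch of what's being deleted (Python only for illustration - in practice it's a single del command, and whether this is safe or sufficient on your fleet is for CrowdStrike's official advisory to say):

```python
from pathlib import Path

# Path from the published workaround; adjust the drive letter if WinRE mounts it differently.
DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")

def remove_bad_channel_files(dry_run: bool = True) -> list[Path]:
    """Find and (optionally) delete the channel files named in the workaround."""
    matches = sorted(DRIVER_DIR.glob("C-00000291*.sys"))
    for f in matches:
        print(("Would delete: " if dry_run else "Deleting: ") + str(f))
        if not dry_run:
            f.unlink()
    return matches

if __name__ == "__main__":
    # Run with dry_run=True first so you can see what would be touched.
    remove_bad_channel_files(dry_run=True)
```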
It's disappointing that the fix is so easy to perform and yet it'll almost certainly keep a lot of infrastructure down for hours because a majority of people seem too scared to try to fix anything on their own machine (or aren't trusted to so they can't even if they know how)
They also gotta get the fix through a trusted channel and not randomly on the internet. (No offense to the person that gave the info, it’s maybe correct but you never know)
Yeah, and it's unknown if CS is active after the workaround or not (source: hackernews commentator)
True, but knowing what the fix might be means you can Google it and see what comes back. It was on StackOverflow for example, but at the time of this comment has been taken offline for moderation - whatever that means.
This sort of fix might not be accessible to a lot of employees who don't have admin access on their company laptops, and if the laptop can't be accessed remotely by IT then the options are very limited. Trying to walk a lot of nontechnical users through this over the phone won't go very well.
Yup, that's me. We booted into safe mode, tried navigating into the CrowdStrike folder and boom: permission denied.
Half our shit can't even boot into safe mode because it's encrypted and we don't have the keys rofl
Might seem easy to someone with a technical background. But the last thing businesses want to be doing is telling average end users to boot into safe mode and start deleting system files.
If that started happening en masse we would quickly end up with far more problems than we started with. Plenty of users would end up deleting system32 entirely or something else equally damaging.
I do IT for some stores. My team lead briefly suggested having store managers try to do this fix. I HARD vetoed that. That's only going to do more damage.
It might not even be that. A lot of places have many servers (and even more virtual servers) running crowdstrike. Some places also seem to have it on endpoints too.
That's a lot of machines to manually fix.
And people need to travel to remote machines to do this in person
I'm still on a bridge call while we wait for BitLocker recovery keys so we can actually boot into safe mode, but the BitLocker key server is down as well...
Gonna be a nice test of proper backups and disaster recovery protocols for some organisations
Chaos Monkey test
Man, it sure would suck if you could still get to safe mode from pressing f8. Can you imagine how terrible that'd be?
You hold down Shift while restarting or booting and you get a recovery menu. I don’t know why they changed this behaviour.
That was the dumbest thing to learn this morning.
This is going to be a Big Deal for a whole lot of people. I don't know all the companies and industries that use Crowdstrike but I might guess it will result in airline delays, banking outages, and hospital computer systems failing. Hopefully nobody gets hurt because of it.
Big chunk of New Zealands banks apparently run it, cos 3 of the big ones can't do credit card transactions right now
It was mayhem at PakNSave a bit ago.
In my experience it’s always mayhem at PakNSave.
CrowdStrike: It's Friday, let's throw it over the wall to production. See you all on Monday!
^^so ^^hard ^^picking ^^which ^^meme ^^to ^^use
We did it guys! We moved fast AND broke things!
When your push to prod on Friday causes a small but measurable drop in global GDP.
Actually, it may have helped slow climate change a little
The earth is healing 🙏
For part of today
Definitely not small, our website is down so we can't do any business and we're a huge company. Multiply that by all the companies that are down, lost time on projects, time to get caught up once it's fixed, it'll be a huge number in the end.
GDP is typically stated by the year. One or two days lost, even if it was 100% of the GDP for those days, would still be less than 1% of GDP for the year.
They did it on Thursday. All of SFO was BSODed when I got off a plane there Thursday night.
Wow, I didn't realize CrowdStrike was widespread enough to be a single point of failure for so much infrastructure. Lot of airports and hospitals offline.
Flights grounded in the US.
The System is Down
Ironic. They did what they are there to protect against. Fucking up everyone's shit
Maybe centralizing everything onto one company's shoulders wasn't such a great idea after all...
Wait, monopolies are bad? This is the first I've ever heard of this concept. So much so that I actually coined the term "monopoly" just now to describe it.
Someone should invent a game, that while playing demonstrates how much monopolies suck for everyone involved (except the monopolist)
And make it so you lose friends and family over the course of the 4+ hour game. Also make a thimble to fight over, that would be dope.
Get your filthy fucking paws off my thimble!
Crowdstrike is not a monopoly. The problem here was having a single point of failure, using a piece of software that can access the kernel and autoupdate running on every machine in the organization.
At the very least, you should stagger updates. Any change done to a business critical server should be validated first. Automatic updates are a bad idea.
Obviously, crowdstrike messed up, but so did IT departments in every organization that allowed this to happen.
You wildly underestimate most corporate IT security's obsession with pushing updates to products like this as soon as they release. They also often have the power to make such nonsense the law of the land, regardless of what best practices dictate. Maybe this incident will shed some light on how bad of an idea auto updates are and get C-levels to do something about it, but even if they do, it'll only last until the next time someone gets compromised by a flaw that was fixed in a dot-release
Since when has any antivirus ever had the intent of actually protecting against viruses? The entire antivirus market is a scam.
An offline server is a secure server!
Honestly my philosophy these days, when it comes to anything proprietary. They just can't keep their grubby little fingers off of working software.
At least this time it was an accident.
Clownstrike
Crowdshite haha gotem
CrowdCollapse
The thought of a local computer being unable to boot because some remote server somewhere is unavailable makes me laugh and sad at the same time.
I don't think that's what's happening here. As far as I know it's an issue with a driver installed on the computers, not with anything trying to reach out to an external server. If that were the case you'd expect it to fail to boot any time you don't have an Internet connection.
Windows is bad but it's not that bad yet.
It’s just a fun coincidence that the azure outage was around the same time.
Yep, and it's harder to fix Windows VMs in Azure that are affected because you can't boot them into safe mode the same way you can with a physical machine.
Foof. Nightmare fuel.
A remote server that you pay some serious money to that pushes a garbage driver that prevents yours from booting
Not only does it (possibly) prevent booting, but it will also bsod it first so you'll have to see how lucky you get.
Goddamn I hate crowdstrike. Between this and them fucking up and letting malware back into a system, I have nothing nice to say about them.
It's bsod on boot
And anything encrypted with bitlocker can't even go into safe mode to fix it
It doesn't consistently bsod on boot, about half of affected machines did in our environment, but all of them did experience a bsod while running. A good amount of ours just took the bad update, bsod'd and came back up.
Yep, stuck at the airport currently. All flights grounded. All major grocery store chains and banks also impacted. Bad day to be a crowdstrike employee!
Yep, this is the stupid timeline. Y2K happening due to the nuances of calendar systems might have sounded dumb at the time, but it doesn't now. Y2K happening because of some unknown contractor's YOLO Friday update definitely is.
https://www.theregister.com/ has a series of articles on what's going on technically.
Latest advice...
Boot Windows into Safe Mode or WRE.
Go to C:\Windows\System32\drivers\CrowdStrike
Locate and delete file matching "C-00000291*.sys"
Boot normally.
For a second I thought this was going to say "go to C:\Windows\System32 and delete it."
I've been on the internet too long lol.
Yeah I had to do this with all my machines this morning. Worked.
Working on our units, but it only works if we are able to launch the command prompt from the recovery menu. Otherwise we get an F8 prompt and cannot start.
My work PC is affected. Nice!
Plot twist: you're head of IT
Same! Got to log off early 😎
Dammit, hit us at 5pm on Friday in NZ
4:00PM here in Aus. Absolutely perfect for an early Friday knockoff.
I'm so exhausted... This is madness. As a Linux user I've been busy all day telling people with bricked PCs that Linux is better, but there are just so many. It never ends. I think this outage is going to keep me busy all weekend.
What are you, an apostle? Lol. This issue affects Windows, but it's not a Windows issue. It's wholly on CrowdStrike for a malformed driver update. This could happen to Linux just as easily given how CS operates. I like Linux too, but this isn't the battle.
🙄 and then everyone clapped
Yeah it's all fun and games until you actually convince someone and then you gotta explain how a bootloader works to someone who still calls their browser "Google"
A month or so ago a CrowdStrike update was breaking some of our Linux VMs with newer kernels. So it's not just the OS.
This isn't really a Windows vs Linux issue as far as I'm aware. It was a bad driver update made by a third party. I don't see why Linux couldn't suffer from the same kind of issue.
We should dunk on Windows for Windows specific flaws. Like how Windows won't let me reinstall a corrupted Windows Store library file because admins can't be trusted to manage Microsoft components on their own machine.
Yours is the comment I came looking for. You get a standing ovation or something.
My dad needed a CT scan this evening and the local ER's system for reading the images was down. So they sent him via ambulance to a different hospital 40 miles away. Now I'm reading tonight that CrowdStrike may be to blame.
A few years ago, when my org got the ask to deploy the CS agent on Linux production servers and I also saw it getting deployed on thousands of Windows and Mac desktops all across, the first thought that came to mind was "massive single point of failure and security threat", as we were putting all the trust in a single, relatively small company that will (has?) become the favorite target of all the bad actors across the planet. How long before it gets into trouble, either because of its own doing or due to others?
I guess we now know.
No bad actors did this, and security goes in fads. Crowdstrike is king right now, just as McAfee/Trellix was in the past. If you want to run around without EDR/XDR software, be my guest.
I don't think anyone is saying that... But picking programs that your company has visibility into is a good idea. We use Wazuh. I get to control when updates are rolled out. It's not a massive shit show when the vendor rolls out the update globally without sufficient internal testing. I can stagger the rollout as I see fit.
All of the security vendors do it over enough time. McAfee used to be the king of them.
https://www.zdnet.com/article/defective-mcafee-update-causes-worldwide-meltdown-of-xp-pcs/
https://www.bleepingcomputer.com/news/security/trend-micro-antivirus-modified-windows-registry-by-mistake-how-to-fix/
https://www.techradar.com/news/microsoft-releases-fix-for-botched-windows-defender-update-but-its-still-facing-problems
Honestly kind of excited for the company blogs to start spitting out their ~~disaster recovery~~ crisis management stories.
I mean - this is just a giant test of ~~disaster recovery~~ crisis management plans. And while there are absolutely real-world consequences to this, the fix almost seems scriptable.
If a company uses IPMI (~~Called~~ Branded AMT and sometimes vPro by Intel), and their network is intact/the devices are on their network, they ought to be able to remotely address this.
But that's obviously predicated on them having already deployed/configured the tools.
Anyone who starts DR operations due to this did 0 research into the issue. For those running into the news here...
CrowdStrike Blue Screen solution
CrowdStrike blue screen of death error occurred after an update. The CrowdStrike team recommends that you follow these methods to fix the error and restore your Windows computer to normal usage.
No need to roll full backups... As they'll likely try to update again anyway and bsod again. Caching servers are a bitch...
I think we’re defining disaster differently. This is a disaster. It’s just not one that necessitates restoring from backup.
Disaster recovery is about the plan(s), not necessarily specific actions. I would hope that companies recognize rerolling the server from backup isn’t the only option for every possible problem.
I imagine CrowdStrike pulled the update, but that would be a nightmare of epic dumbness if organizations got trapped in a loop.
I've not read a single DR document that says "research potential options". DR stuff tends to go into play AFTER you've done the research that states the system is unrecoverable. You shouldn't be rolling DR plans here in this case at all as it's recoverable.
I also would imagine that they'd test updates before rolling them out. But we're here... I honestly don't know though. None of the systems under my control use it.
Note this is easy enough to do if systems are booting or you're dealing with a handful, but if you have hundreds of poorly managed systems, discard and do again.
I'm here right now just to watch it unfold in real time. Unfortunately Reddit is looking juicier on that front.
https://libreddit.northboot.xyz
IPMI is not AMT. AMT/vPro is a closed protocol, right? Also, people are disabling AMT because of the listed risks, which is too bad; but it's easier than properly firewalling it.
Better to just say "it lets you bring up the console remotely without windows running, so machines can be fixed by people who don't have to come into the office".
On desktops, nobody does. Servers, yes, all the time.
Been at work since 5AM... finally finished deleting the C-00000291*.sys file in CrowdStrike directory.
182 machines total. Thankfully the process in and of itself takes about 2-3 minutes. For virtual machines, it's a bit of a pain, at least in this org.
lmao I feel kinda bad for those companies that have 10k+ endpoints to do this to. Eff... that. Lots of immediate short-term contract hires for that, I imagine.
How do you deal with places with thousands of remote endpoints??
That's one of those situations where they need to immediately hire local contractors to those remote sites. This outage literally requires touching the equipment. lol
I'd even say, fly out each individual team member to those sites.. but even the airports are down.
Call the remote people in, deputize anyone who can work a command line, and prioritize the important stuff.
CrowdStrike sent a corrupt file with a software update for Windows servers. This caused a blue screen of death on Windows servers globally for CrowdStrike clients. Even people in my company. Luckily I shut off my computer at the end of the day and missed the update. It's not an OTA fix - they have to go into every data center and manually fix all the servers, and some of these servers have encryption. I see a very big lawsuit coming...
I don't see how they can recover from that. They will get lawsuits from all around the world.
I'm never financially recovering from this. - George Kurtz
Jesus christ, you would think that (a) the company would have safeguards in place and (b) businesses using the product would do better due diligence. Goes to show there are no grown-ups in the room inside these massive corporations that rule every aspect of our lives.
I'm calling it now. In the future there will be some software update for your electric car, and due to some jackass, millions of cars will end up getting bricked in the middle of the road where they have to manually be rebooted.
Laid off one too many people; finance bros taking over.
Do they not have IPMI/BMC for the servers? Usually you can access KVM over IP and remotely power-off/power-on/reboot servers without having to physically be there. KVM over IP shows the video output of the system so you can use it to enter the UEFI, boot in safe/recovery mode, etc.
I've got IPMI on my home server and I'm just some random guy on the internet, so I'd be surprised if a data center didn't.
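For what it's worth, if you do have IPMI on everything, at least the power-cycling part is loopable. A rough sketch shelling out to ipmitool - the host list and credentials are placeholders, and the actual file deletion still needs console access (Serial-over-LAN or the vendor's KVM), so treat this as the boring outer loop only:

```python
import subprocess

# Hypothetical inventory: (bmc_address, username, password) per server.
BMCS = [
    ("10.0.0.11", "admin", "changeme"),
    ("10.0.0.12", "admin", "changeme"),
]

def ipmi(host: str, user: str, password: str, *args: str) -> str:
    """Run an ipmitool command against one BMC over the LAN interface."""
    cmd = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password, *args]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout.strip()

if __name__ == "__main__":
    for host, user, password in BMCS:
        print(f"{host}: {ipmi(host, user, password, 'chassis', 'power', 'status')}")
        # Power-cycle anything that's wedged; the interactive fix still happens
        # over Serial-over-LAN ("ipmitool ... sol activate") or the vendor's KVM console.
        ipmi(host, user, password, "chassis", "power", "cycle")
```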
Then you'd be surprised.
We had a bad CrowdStrike update years ago where their network scanning portion couldn't handle a load of DNS queries on startup. When asked how we could switch to manual updates, we were told that wasn't possible. So we had to black hole the update endpoint via our firewall, which luckily was separate from their telemetry endpoint. When we were ready to update, we'd have FW rules allowing groups to update in batches. They since changed that, but a lot of companies just hand control over to them. They have both a file system and network shim so it can basically intercept **everything**
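That batching trick is easy enough to script, too. A hedged sketch that just prints iptables rules for a perimeter box - the endpoint IP, the group subnets, and the assumption that the vendor's update CDN is even a stable set of IPs are all made up; these days you'd more likely have to filter by FQDN on a proxy:

```python
# Hypothetical: block the vendor update endpoint for every group except the
# one currently cleared to update, by emitting iptables rules for a gateway.
UPDATE_ENDPOINT = "203.0.113.50"          # placeholder (TEST-NET-3), not a real vendor IP
GROUPS = {
    "pilot":  ["10.1.0.0/24"],
    "wave-1": ["10.2.0.0/24", "10.3.0.0/24"],
    "wave-2": ["10.4.0.0/23"],
}

def rules_for(allowed_group: str) -> list[str]:
    """Generate ACCEPT rules for the group allowed to update, DROP for everyone else."""
    rules = []
    for group, subnets in GROUPS.items():
        action = "ACCEPT" if group == allowed_group else "DROP"
        for subnet in subnets:
            rules.append(f"iptables -A FORWARD -s {subnet} -d {UPDATE_ENDPOINT} -j {action}")
    return rules

if __name__ == "__main__":
    print("\n".join(rules_for("pilot")))   # let the pilot group update first
```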
lol
too bad me posting this will bump the comment count though. maybe we should try to keep the vote count to 404
My favourite thing has been watching sky news (UK) operate without graphics, trailers, adverts or autocue. Back to basics.
Linux and Mac just got free advertisement.
The words 'Mac' and 'free' aren't allowed in the same sentence.
Also, enterprise infrastructure running on a Mac? It’s something I’ve never heard of in over a decade and a half of working in tech. And now I’m curious. Is it a thing?
I've never heard of Macs running embedded systems - I think that would be a pretty crazy waste of money - but Mac OS Server was a thing for years. My college campus was all Mac in the G4 iMac days, running MacOS Server to administer the network. As far as I understand it was really solid and capable, but I guess it didn't really fit Apple's focus as their market moved from industry professionals to consumers, and they killed it.
Annoyingly, my laptop seems to be working perfectly.
That's the burden when you run Arch, right?
lol he said it's working
He said it’s working annoyingly.
AWS No!!!
Oh wait it's not them for once.
One possible fix is to delete a particular file while booted in safe mode. But then they'll need to fix each system manually. My company encrypts the disks as well, so it's going to be an even bigger pain (for them). I'm just happy my weekend started early.
You have to have access to boot into safe mode too; I guess I can't on my work PC, for example.
What a shitty workaround & may CrowdStrike burn in hell lol
Enjoy your weekend unless you are in IT
Never trust a texan
To be fair Austin is the least Texan of Texas
Is it still at all Austiney though? With all the companies moving to Austin, I wonder how much of the original Austin is left.
Huh. I guess this explains why the monitor outside of my flight gate tonight started BSoD looping. And may also explain why my flight was delayed by an additional hour and a half...
oh joy. can’t wait to have to fix this for all of our clients today…
You have no idea how much fun it's been.
I'm so tired of all the fun....
"Today", right. I wish you a good weekend stranger.
My company used to use something else but after getting hacked switched to crowdstrike and now this. Hilarious clownery going on. Fingers crossed I'll be working from home for a few days before anything is fixed.
Stop running production services on M$. There is a better backend OS.
The issue was caused by a third-party vendor, though. A similar issue could have happened on other OSes too. There are relatively intrusive endpoint security systems for macOS and Linux too.
That's the annoying thing here. Everyone, particularly Lemmy where everyone runs Linux and FOSS, thinks this is a Microsoft/Windows issue. It's not, it's a Crowdstrike issue.
More than that: it's an IT security and infrastructure admin issue. How was this 3rd party software update allowed to go out to so many systems to break them all at once with no one testing it?
Everyone, particularly Lemmy where everyone runs Linux and FOSS, knows it is a Crowdstrike issue.
Crowdstrike did the same to Linux servers previously.
There’s a better frontend OS
Doesn’t mean people want to go away from what they know
This is a better article. It's a CrowdStrike issue with an update (security software)
I see a lot of hate ITT for kernel-level EDRs, which I wouldn't say they deserve. Sure, for your own use an AV is sufficient and you don't need an EDR, but they make a world of difference. I work in cybersecurity doing red teaming, so my job is mostly about bypassing such solutions and making malware/actions within the network that avoid being detected as much as possible, and ever since EDRs started getting popular, my job got several leagues harder.
The advantage of EDRs in comparison to AVs is that they can catch 0-days. An AV will just look for signatures - known pieces or snippets of malware code. An EDR, on the other hand, looks for sequences of actions a process does, by scanning memory, logs and hooking syscalls. So if, for example, you made an entirely custom program that allocates memory as Read-Write-Execute, then loads a crypto dll, decrypts something into that memory, and then calls a thread-spawn syscall to spawn a thread in another process that runs it, an EDR would correlate such actions and get suspicious, while to a regular AV the code would probably look OK. Some EDRs even watch network packets and can catch suspicious communication, such as port scanning, large data extraction, or C2 communication.
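To make the "correlating a sequence of actions" bit concrete, here's a toy sketch of the idea in Python - real EDRs obviously hook this in the kernel with far richer telemetry and scoring; the event names and the rule here are invented purely for illustration:

```python
# Toy behavioural correlation: flag a process that allocates RWX memory and then
# creates a thread in another process - each step alone is common, it's the
# ordered sequence together that looks like injection.
SUSPICIOUS_SEQUENCE = ["alloc_rwx", "write_remote_memory", "create_remote_thread"]

def score_process(events: list[str]) -> int:
    """Count how far a process gets through the suspicious sequence, in order."""
    progress = 0
    for event in events:
        if progress < len(SUSPICIOUS_SEQUENCE) and event == SUSPICIOUS_SEQUENCE[progress]:
            progress += 1
    return progress

def is_suspicious(events: list[str]) -> bool:
    return score_process(events) == len(SUSPICIOUS_SEQUENCE)

if __name__ == "__main__":
    benign   = ["open_file", "alloc_rwx", "read_file"]  # JITs allocate RWX too, so this alone is fine
    injector = ["alloc_rwx", "write_remote_memory", "load_crypto_dll", "create_remote_thread"]
    print(is_suspicious(benign))    # False
    print(is_suspicious(injector))  # True
```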
Sure, in an ideal world you would have users that never run malware, and a network that is impenetrable. But you still get on average a few % of people running random binaries that came from phishing attempts, or around 50% of people in your company falling for vishing attacks. Having an EDR increases your chances of stopping such an attack almost exponentially, and I would say the advantage EDRs gain from being kernel-level is well worth it.
I'm not defending CrowdStrike, they did mess up, though I'd bet that the amount of damage they caused worldwide is nowhere near the total damage all the cyberattacks they prevented would have caused. But hating on kernel-level EDRs in general isn't warranted here.
Kernel-level anti-cheat, on the other hand, can go burn in hell, and I hope that something similar will eventually happen with one of them. Fuck kernel level anti-cheats.
Irrelevant but I keep reading "crowd strike" as "counter strike" and it's really messing with me
Think of it as ClownStrike, they will be known as a bunch of clowns after this.
No one bothered to test before deploying to all machines? Nice move.
This outage is probably costing a significant portion of CrowdStrike's market cap. They're an 80 billion dollar company, but this is a multibillion-dollar outage.
Someone's getting fired for this. Massive process failures like this means that it should be some high level managers or the CTO going out.
YOLO 🚀🙈
I was quite surprised when I heard the news. I had been working for hours on my PC without any issues. It pays off not to use Windows.
It's not a flaw with Windows causing this.
The issue is with a widely used third party security software that installs as a kernel level driver. It had an auto update that causes bluescreening moments after booting into the OS.
This same software is available for Linux and Mac, and had similar issues with specific Linux distros a month ago. It just didn't get reported on because it didn't have as wide of an impact.
Huh, so that's why the office couldn't order pizza last night lmfao
I picked the right week to be on PTO hahaha
This is proof you shouldn't invest everything in one technology. I won't say everyone should change to Linux, because it isn't immune to this, but we need to push companies to support several OSes.
The issue here is kernel-level applications that can brick a box. Antiviruses compete for resources; no one should run two at once.
So that's why my work laptop is down for the count today. I'm even getting that same error as the thumbnail picture
If these affected systems are boot looping, how will they be fixed? Reinstall?
There is a fix people have found which requires manual booting into safe mode and removal of a file causing the BSODs. No clue if/how they are going to implement a fix remotely when the affected machines can't even boot.
Probably have to go old-skool and actually be at the machine.
And hope you are not using BitLocker cause then you are screwed since BitLocker is tied to CS.
It is possible to rename the folder in the Windows drivers directory instead. But for IT departments that could be more work than a reimage.
Everyone is assuming it's some intern pushing a release out accidentally, or a lack of QA, but Microsoft also pushed out July security updates on the 9th(?) that have been causing BSODs. These aren't optional either.
What's the likelihood that the CS file was tested on devices that hadn't got the latest Windows security update, and it was an unholy union of both those things that caused this meltdown? The timelines do potentially line up when you consider your average agile delivery cadence.
Apparently at work "some servers are experiencing problems". Sadly, none of the ones I need to use :(
A lot of people I work with were affected, I wasn't one of them. I had assumed it was because I put my machine to sleep yesterday (and every other day this week) and just woke it up after booting it. I assumed it was an on startup thing and that's why I didn't have it.
Our IT provider already broke EVERYTHING earlier this month when they remotely installed "Nexthink Collector", which forced a 30+ minute CHKDSK on every boot for EVERYONE until they rolled out a fix (which they were at least able to do remotely), and I didn't want to have to deal with that the week before I go on leave.
But it sounds like it even happened to running systems, so now I don't know why I wasn't affected, unless it's a Windows 10-only thing?
Our IT have had some grief lately, but at least they specified Intel 12th gen on our latest CAD machines, rather than 13th or 14th, so they've got at least one win.
Your computer was likely not powered on during the time window between the fucked update pushing out and when they stopped pushing it out.
Meanwhile Kaspersky: *wonders how such incompetent people can even make an antivirus at all*
Don't rely on one desktop OS too much. Diversity is the best.
Don't rely on corpo trash at all.
Servers on Windows? Even domain controllers can be Linux-based.
I'm a long-time Samba fan, but even I wouldn't run them as DCs in a production environment.
Old servers. Also Crowdstrike took down Linux servers a few years ago.
Xfinity's H&I network is down so I can't watch Star Trek. I get a connection failure error msg. Other channels work though.
Interesting day
I think we're getting a lot of pictures for !pbsod@lemmy.ohaa.xyz
I legit have never been more happy to be unemployed.
This is the best summary I could come up with:
There are reports of IT outages affecting major institutions in Australia and internationally.
The ABC is experiencing a major network outage, along with several other media outlets.
Crowd-sourced website Downdetector is listing outages for Foxtel, National Australia Bank and Bendigo Bank.
Follow our live blog as we bring you the latest updates.
The original article contains 52 words, the summary contains 52 words. Saved 0%. I'm a bot and I'm open source!
Good bot!