Shadow AI - Where are the CIOs?
When I joined Hugging Face 5 years ago, it was the place to be for AI researchers. Since then, the Hub has grown into the day-to-day collaboration platform for data scientists, machine learning engineers, and now software developers building apps with AI. Today, there are over 13 million of us, AI Builders, in the house.
The Hub is now the GitHub for AI. Which brings me to Shadow AI.
I think enterprises adopting AI today are in the same spot they were in 15 years ago, adopting software and cloud. Employees are all doing it; CIOs are blind to it.
It was Shadow IT then, it is Shadow AI now.
There are over 300,000 organizations now on the Hub. For the most part, they’re created by professionals to collaborate privately on models and datasets. Doing work. And for the most part, they’re using free organizations.
The thing is, these free organizations were never meant to securely host company collaboration. They’re meant for community projects, where everyone has access to everything, and everyone is welcome.
But now, we have Fortune 500 companies with hundreds if not thousands of employees using Hugging Face every day. Using free organizations, where everyone has access to everything, and everyone is welcome…
What could go wrong? Take your pick:
- Ex-employees still have (write) access to all company private repos
- A user access token is leaked on GitHub, compromising all company private repos
- A user pushes a dataset with customer information, without realizing it’s set as public
I could go on.
This is basic security stuff, and we have a solution for it: Hugging Face Enterprise. SSO, RBAC, audit logs, the whole nine yards.
Upgrading security should feel urgent, so where are the CIOs and CISOs? It seems the scale of the problem is not well understood. Maybe some data can help…
Today, Hugging Face handles billions of requests a month. These are requests to access any of the 6+ million models, datasets and applications hosted on the platform. About half are private. In response, we serve hundreds of petabytes of data a month. It’s a lot. We serve models at the scale Netflix serves movies.
Hugging Face is not just the GitHub of AI, it’s also the Netflix of AI.
But I digress; let’s take an example organization to make this real.
Here’s one Fortune 500 US company we will not name. It has an organization set up on Hugging Face - a free one, so no SSO, yolo, anything goes - with over 2,000 members signed up.
We looked at requests coming from the known corporate network of the company, and which users they were coming from. Here’s what we found.
In the span of one week, we saw about 5 million requests coming from the company. And here’s where it gets interesting:
- 750,000 of these requests (about 15%) came from authenticated users using work emails
- 1.9 million of these requests (about 40%) came from authenticated users using non-work emails
- 2.2 million of these requests (about 45%) came from unauthenticated users
So about 85% of the company’s requests to Hugging Face are not going through company-managed channels, nor through the official organization on Hugging Face.
There’s your Shadow AI right there.
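If you want to run the same arithmetic on your own traffic, here is a minimal Python sketch. The bucket names and the `shares` helper are made up for illustration, and the counts are the rounded figures quoted above. Shares are computed against the sum of the three buckets rather than the "about 5 million" total, so they can land a point or so away from the rounded percentages in the list.

```python
# Weekly request counts per channel, using the rounded figures from this post.
# The bucket labels are illustrative, not an official taxonomy.
requests_by_channel = {
    "authenticated, work email": 750_000,
    "authenticated, non-work email": 1_900_000,
    "unauthenticated": 2_200_000,
}


def shares(counts):
    """Return each bucket's fraction of the total request count."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


channel_shares = shares(requests_by_channel)

# Everything that is not an authenticated work-email request is treated
# as outside company-managed channels.
unmanaged_share = 1 - channel_shares["authenticated, work email"]

for name, share in channel_shares.items():
    print(f"{name}: {share:.0%}")
print(f"outside company-managed channels: {unmanaged_share:.0%}")
```

Swap in your own counts per bucket and the last line is your Shadow AI number.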
If you’re wondering what these numbers look like for your own company, comment below and I’ll reach out privately!
