Microsoft AI Team Leaks 38TB of Company Data

Microsoft's AI team has made a terrible mistake.

The team accidentally leaked 38TB of Microsoft's private data, exposing internal company information to the public.

Microsoft has already rectified the issue and, in August, completed its investigation into the leak's potential impact.

A Blunder Of Significant Proportions

A recent report from cloud security company Wiz states that Microsoft's AI research team accidentally exposed 38TB of the company's private data. According to Wiz, the team published a GitHub repository offering open-source code and AI models for image recognition, with the training data hosted in an Azure Storage bucket.

Microsoft's AI research team gave visitors to the GitHub repository a download link, but it did nothing to restrict what that link could reach: the link granted visitors complete access to the entire Azure storage account.

The accident occurred because of a Microsoft Azure feature called Shared Access Signature (SAS) tokens, which lets users share data from Azure Storage accounts. A SAS token can be scoped to grant access to specific files only; in this case, however, the team configured the link it shared to expose the entire storage account.
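
For illustration, here is a minimal sketch of that difference using the azure-storage-blob Python SDK; the account, container, and blob names are placeholders, not Microsoft's actual resources. An account-level token signed with broad permissions opens the whole account, while a blob-level token can be limited to one file, read-only, with a short lifetime.

    # A minimal sketch using the azure-storage-blob Python SDK; the account,
    # container, and blob names are placeholders, not Microsoft's resources.
    from datetime import datetime, timedelta, timezone
    from azure.storage.blob import (
        AccountSasPermissions,
        BlobSasPermissions,
        ResourceTypes,
        generate_account_sas,
        generate_blob_sas,
    )

    ACCOUNT = "exampleaccount"
    KEY = "<storage-account-key>"  # the secret that signs every SAS token

    # Roughly what the misconfigured link granted: the whole storage account,
    # with read, write, and delete rights, expiring decades in the future.
    risky_sas = generate_account_sas(
        account_name=ACCOUNT,
        account_key=KEY,
        resource_types=ResourceTypes(service=True, container=True, object=True),
        permission=AccountSasPermissions(read=True, write=True, delete=True, list=True),
        expiry=datetime(2051, 10, 6, tzinfo=timezone.utc),
    )

    # What sharing a single training file should look like: one blob,
    # read-only, valid for 24 hours.
    scoped_sas = generate_blob_sas(
        account_name=ACCOUNT,
        container_name="public-models",
        blob_name="image-recognition-model.onnx",
        account_key=KEY,
        permission=BlobSasPermissions(read=True),
        expiry=datetime.now(timezone.utc) + timedelta(hours=24),
    )
    download_url = (
        f"https://{ACCOUNT}.blob.core.windows.net/"
        f"public-models/image-recognition-model.onnx?{scoped_sas}"
    )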

Because of this lapse, visitors could not only view everything in the storage account but also upload, overwrite, or delete files. The exposed data included full backups of two employees' workstations containing sensitive personal data, such as passwords to Microsoft services, secret keys, and more than 30,000 internal Microsoft Teams messages from more than 350 Microsoft employees.

Unfortunately, the error proved graver still when Wiz discovered that the contents had been exposed since 2020. Microsoft's AI research team had initially set the SAS token to expire on Oct. 5, 2021, but later pushed the expiration date back to Oct. 6, 2051.
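
Because a SAS token carries its own expiry in the URL's "se" (signed expiry) parameter, an over-long lifetime like this one is detectable mechanically. The following is a hypothetical check, not tooling from Wiz or Microsoft, that flags SAS URLs whose expiry exceeds an assumed 30-day policy.

    # A hypothetical check (not Wiz's or Microsoft's tooling) that flags SAS
    # URLs whose "se" (signed expiry) value exceeds an assumed 30-day policy.
    from datetime import datetime, timedelta, timezone
    from urllib.parse import parse_qs, urlparse

    MAX_LIFETIME = timedelta(days=30)  # assumed policy threshold

    def sas_expiry_is_excessive(sas_url: str) -> bool:
        """Return True if the token's expiry lies too far in the future."""
        query = parse_qs(urlparse(sas_url).query)
        expiry_values = query.get("se")
        if not expiry_values:
            return False  # no expiry parameter, so not a SAS URL
        # SAS expiry is an ISO 8601 UTC timestamp, e.g. 2051-10-06T00:00:00Z
        expiry = datetime.fromisoformat(expiry_values[0].replace("Z", "+00:00"))
        return expiry - datetime.now(timezone.utc) > MAX_LIFETIME

    # A token expiring in 2051, like the leaked one, trips the check at once:
    print(sas_expiry_is_excessive(
        "https://example.blob.core.windows.net/data"
        "?sv=2020-08-04&sp=racwdl&se=2051-10-06T00:00:00Z&sig=..."
    ))  # True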

Wiz found the error and reported it to Microsoft in mid-June. Microsoft has since invalidated the SAS token, replaced it with a new one on GitHub, and completed its internal investigation of the leak's potential impact.

Microsoft stated (via TechCrunch) that the leak did not expose customer data and that "no other internal services were put at risk" because of it.

Wiz's Takeaway

The leak of 38TB of Microsoft's private data presents an example of "the new risks organizations face when starting to leverage the power of AI more broadly," Wiz stated in its report. 

Because data scientists and engineers are racing to bring new AI solutions to production, the massive amounts of data they handle require additional security checks and safeguards against exposure or leaks.

The cloud security company advises security teams to gain more visibility into the processes of AI research and development teams to prevent similar leaks in the future.

Additionally, it said that raising awareness of relevant security risks at every step of the AI development process is important. As such, security teams must work closely with the data science and research teams to ensure proper data access limits are defined and enforced. 
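
One concrete form that visibility can take, assuming tokens tend to leak through hard-coded links, is scanning a code tree for embedded SAS URLs before anything is published. The sketch below is illustrative, not Wiz's actual tooling; it relies on the fact that every Azure SAS URL carries a telltale "sig=" signature parameter.

    # An illustrative scan (not Wiz's actual tooling) for hard-coded Azure SAS
    # URLs in a code tree; SAS URLs carry a telltale "sig=" signature parameter.
    import re
    from pathlib import Path

    SAS_PATTERN = re.compile(r"https://[\w.-]+\.core\.windows\.net/\S*sig=\S+")

    def find_sas_urls(root: str):
        """Yield (file, line number, URL) for every SAS-like link in the tree."""
        for path in Path(root).rglob("*"):
            if not path.is_file():
                continue
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue
            for lineno, line in enumerate(text.splitlines(), start=1):
                for match in SAS_PATTERN.finditer(line):
                    yield path, lineno, match.group(0)

    # Run against a repository checkout before pushing it anywhere public:
    for path, lineno, url in find_sas_urls("."):
        print(f"{path}:{lineno}: {url}")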
