Microsoft Copilot exposes private repositories despite legal efforts
- Repositories that were once public on GitHub and later made private remained accessible through Microsoft Copilot.
- Lasso researchers identified this issue, finding that sensitive data was exposed due to Bing's caching mechanism.
- The incident highlights a persistent risk: once data has been publicly exposed and cached, making a repository private or deleting it does not fully retract it.
In the second half of 2024, AI security firm Lasso discovered a significant flaw in Microsoft Copilot: repositories that had once been public on GitHub and were later switched to private remained accessible through the assistant. The exposure affected repositories belonging to more than 16,000 organizations, many of which contained sensitive data their owners had inadvertently published while the repositories were public. Lasso's researchers set out to measure the breadth of the phenomenon, which they termed "zombie repositories," and to determine how much data believed to be secured after being made private was still publicly reachable.

Upon further investigation, the researchers traced the cause to Bing's caching mechanism. Bing had indexed the repositories while they were public, and although it later stopped presenting the private content in its search results, Copilot could still draw on the cached data and serve it to users. This raised doubts about whether confidential information can ever be effectively withdrawn from public view, even after a repository is deleted from GitHub entirely. The investigation showed that even when repositories were removed in connection with legal disputes, including a lawsuit filed by Microsoft itself, Copilot could still retrieve their contents from the cache.

This dynamic poses a critical challenge for developers who rely on GitHub's repository privacy settings to safeguard sensitive information: making a repository private does not prevent its exposure through AI tools like Copilot once the data has been cached. The problem persists, with researchers repeatedly finding Copilot able to access cached data that human users can no longer reach. Similar cache-driven exposures have surfaced several times over the past decade, and this episode carries particular weight for Microsoft, which had gone to considerable lengths to remove a repository hosting tools it alleged violated multiple laws, including the Computer Fraud and Abuse Act and the Digital Millennium Copyright Act, only for Copilot to keep serving that repository's contents.
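Lasso has not published the exact tooling behind its survey, but the general detection idea can be sketched with public APIs: a repository is a candidate "zombie" if GitHub no longer serves it publicly while Bing's index still contains entries for it. The Python sketch below illustrates this under stated assumptions; the Bing Web Search API key and the example owner/repository names are hypothetical placeholders, and the script only inspects Bing's visible search results, not the deeper cache Copilot draws on.

```python
"""Sketch: flag GitHub repositories that are private or deleted yet still
indexed by Bing. Illustrative only; this is not Lasso's published tooling.
Assumes access to the Bing Web Search API (the BING_API_KEY value below
is a hypothetical placeholder)."""

import requests

BING_API_KEY = "your-bing-web-search-key"  # placeholder, not a real key
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"


def repo_is_public(owner: str, repo: str) -> bool:
    """GitHub's REST API returns 404 for private or deleted repositories
    when queried without credentials, so a 200 means the repo is public."""
    r = requests.get(f"https://api.github.com/repos/{owner}/{repo}", timeout=10)
    return r.status_code == 200


def bing_has_index_entries(owner: str, repo: str) -> bool:
    """Ask Bing for any pages indexed under the repository's GitHub URL."""
    r = requests.get(
        BING_ENDPOINT,
        headers={"Ocp-Apim-Subscription-Key": BING_API_KEY},
        params={"q": f"site:github.com/{owner}/{repo}"},
        timeout=10,
    )
    r.raise_for_status()
    return bool(r.json().get("webPages", {}).get("value"))


def is_zombie(owner: str, repo: str) -> bool:
    """A 'zombie repository': no longer publicly reachable on GitHub,
    but traces of it remain in Bing's index."""
    return (not repo_is_public(owner, repo)) and bing_has_index_entries(owner, repo)


if __name__ == "__main__":
    # Hypothetical example names; substitute repositories you control.
    print(is_zombie("example-org", "example-repo"))
```

Note the limitation: Lasso's central finding was that Copilot could return cached content even after Bing stopped listing it in search results, a layer no public API exposes, so a check like this can understate the true exposure.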