The generative AI models that power ChatGPT, Copilot, Gemini, and other assistants are trained on large amounts of data. Now, unless you specifically opt out of collection, Microsoft will start using your interactions with GitHub Copilot as another source of that data.
GitHub, the popular Microsoft-owned coding platform, announced today that interactions with GitHub Copilot will be used to “train and improve our AI models.” GitHub Copilot is an AI coding assistant integrated into Visual Studio Code, the GitHub website, the Copilot CLI tool (which competes with Claude Code), and other services. The collected data includes any inputs and outputs, code snippets, comments and documentation, filenames, repository structure, and other information.
If you’ve never used GitHub Copilot, this won’t change anything. However, if you’ve used code completion in Visual Studio Code, asked Copilot a question on the GitHub website, or used another related AI feature, your interactions and code snippets may be collected.
Importantly, automatic data collection applies to both free and paid accounts. This includes Copilot Free, Copilot Pro and Copilot Pro+ users, but not Copilot Business and Copilot Enterprise accounts.
The blog post explained that the initial AI models for GitHub Copilot were “built using a mix of public data and hand-crafted code samples” (which not everyone was happy about), and that the company has seen improvements from incorporating data from Microsoft employees. Now GitHub hopes the service will improve further as more user interactions are added to the training data.
GitHub said in an announcement: “This approach is consistent with established industry practices and will improve model performance for all users. By participating, you’ll help our models better understand development workflows, provide more accurate and secure code sample suggestions, and help you catch potential bugs before they reach production.”
How to opt out
You can turn off data collection on the Copilot features page in your GitHub account settings. After logging into your account, look in the Privacy section for the setting “Allow GitHub to use my data for AI model training”.
You just need to set that dropdown to “Disabled” and that’s it. If you have multiple GitHub accounts, be sure to do this for each account.
Source: GitHub Blog




