The new policy affirms Google’s intent to use this wealth of public information to enhance its services and develop novel products, features, and technologies. Notably, this data is instrumental in training Google’s AI models and refining products like Google Translate, Bard, and Cloud AI capabilities.
Historically, privacy policies have typically outlined how a company uses data posted on its own platforms. However, Google’s updated policy deviates from this norm, suggesting it reserves the right to harvest publicly available data from any online source, transforming the entire internet into an AI training playground.
This policy update pushes a new set of privacy debates to the forefront. The long-standing assumption has been that anything posted publicly is simply there to be seen. Now, it seems we need to reassess what it means to post content online. The question shifts from who can see the information to how it can be used, opening the door for AI systems like Bard and ChatGPT to regurgitate a version of our words in ways we could never predict or fully comprehend.
In the post-ChatGPT era, the source of data for these information-guzzling chatbots has become a pressing issue. Tech companies like Google and OpenAI have scoured vast expanses of the internet to feed their AI models, raising legal and ethical questions that would have seemed far-fetched just a few years ago.
Notably, Twitter and Reddit, having seen their data harvested at scale for AI training, have made contentious changes to their platforms. Both social media giants have ended free, open access to their APIs. The move ostensibly protects their intellectual property, but it also disrupts the third-party tools many users relied on.
Web scraping, thrust into the spotlight by these changes, has also become a convenient scapegoat, as exemplified by Elon Musk's recent troubles at Twitter. Musk attributed a limit on the number of tweets users could view per day to measures combating "data scraping" and "system manipulation". However, many experts suggested the restrictions were more likely the result of technical problems stemming from mismanagement or incompetence.