Bluesky's public API gives third parties access to users' public posts, raising concerns about unauthorized uses such as AI training.

An AI researcher recently collected 1 million public Bluesky posts into a machine learning dataset, illustrating this exposure. Although the dataset was later removed, the incident underscores the public nature of Bluesky posts. Open-source language models such as OLMo 2 could be trained on datasets like this one.

Bluesky has acknowledged the issue and is exploring ways for users to state whether they consent to external use of their data. Enforcing those preferences outside its own platform remains a challenge, however, and the company is in discussions with legal and engineering experts about how to address it. The situation echoes concerns the FTC has raised about data usage by smart devices.

Key Takeaways

  • Public Bluesky posts are accessible via the API, which exposes them to unauthorized use.
  • Third-party AI training on user data is a growing concern, similar to the concerns raised about Microsoft's use of Office data.
  • Bluesky is working on consent mechanisms but faces enforcement challenges.

Bluesky's growing popularity brings closer scrutiny of its data privacy and security practices, as it has for other major social platforms.