OpenAI has raised concerns about potential emotional reliance on the new Voice Mode feature of ChatGPT, which is powered by the updated GPT-4o model. This mode enables users to engage in more natural, real-time conversations with the chatbot. However, the company warns that some users may anthropomorphize the chatbot, attributing human-like characteristics to it.
During the testing phase before the model’s release in June, some users displayed behaviours suggesting they were forming emotional connections with the chatbot. For instance, testers would make remarks like “This is our last day together,” indicating a perceived bond with the AI. OpenAI researchers noted that while such comments may seem harmless, they underscore the need for further research into the long-term effects of these interactions.
The company is particularly concerned that prolonged use of the Voice Mode could lead some individuals to form social relationships with the AI, potentially diminishing their need for human interaction. While this might benefit lonely users, it also poses risks. Extended interactions could also influence social norms: the AI is deferential, allowing users to interrupt it at any time, which contrasts with typical human conversational etiquette.
Another feature of GPT-4o is its ability to remember user information and preferences across different chats. OpenAI cautions that this could increase users’ dependence on the chatbot to the point of over-reliance. The company plans to continue studying these effects and is considering how the deeper integration of various features with the audio modality might affect user behaviour.
In terms of safety, OpenAI has released a detailed system card for GPT-4o, highlighting the model’s functionalities and safety measures. This card is part of OpenAI’s commitment to the Safety Summit’s foundation model framework and its own Preparedness Framework, under which new models are evaluated for potential risks. GPT-4o received a “medium” overall risk score and was deemed suitable for deployment after rigorous internal evaluation.
The model received a “low” risk score for cybersecurity, biological threats, and model autonomy, but a “medium” score for its persuasiveness. Testing revealed that while GPT-4o’s text generation abilities were marginally more persuasive than human-generated content in some instances, its voice features were not found to be more persuasive than human interactions.
Additionally, OpenAI tested GPT-4o for its ability to identify speakers, generate unauthorized voices, and create content that violates ChatGPT’s terms of service. The company has implemented safeguards at both the model and system levels to address these risks.
Overall, OpenAI acknowledges the potential benefits and risks of GPT-4o’s advanced features and is committed to ongoing research and safety measures to ensure responsible use of the technology.