 

Claude Wins Hallucination-Rate Test, Far Ahead of GPT and Gemini

2025-12-21 Category: Frontier Outpost

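On the Linux.do forum, a user tested the web-search capabilities of the mainstream AI models Claude, GPT, and Gemini, measuring hallucination rates on questions with few available sources. In the results, Claude Sonnet 4.5 performed best, with a 0% hallucination rate and the correct information after three rounds of searching; GPT 5.2 had a 70% hallucination rate and searched inefficiently; Gemini 3 Pro had a hallucination rate above 90% and poor search results. The author stresses that Claude is far ahead in tool use, such as project management and file operations, and has switched from GPT to Claude as their primary model. The post calls on AI vendors to strengthen tool integration in order to raise productivity and break through model bottlenecks. The test offers AI users a practical reference and highlights performance differences between models and directions for future development.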

Original link: Linux.do

Reproduction without permission is prohibited: Toy Tech Blog » Claude Wins Hallucination-Rate Test, Far Ahead of GPT and Gemini

AI claude Gemini

News Briefs

Open Source Agent Skill: Code Runner Supports 35 Programming Languages

Based on Anthropic's latest Agent Skills standard, the developer has created an open-source Agent Skill called Code Runner that enables AI Agents to execute code in over 35 programming languages. This project is another impressive work from the author, following Code Runner for VS Code and Code Runner MCP Server. Notably, the author utilized Agent technology itself to develop this Agent skill, demonstrating technological innovation. The project is completely open source, and developers can obtain the source code through the GitHub repository and contribute to it. For developers looking to understand or use Agent Skills, this project provides a practical reference case, showcasing how to integrate existing toolchains into the AI Agent ecosystem. As an initial version, the project still has room for improvement, and all developers are welcome to try it out and provide feedback.

Original link: V2EX Share & Discover

49 minutes ago

AI Trust: 90% Reliable, But Risks Remain

A user reports roughly 90% trust in AI (ChatGPT), rating it above business partners in cognitive ability and strategic judgment and likening it to Zhang Liang and Fan Kuai. AI is still not 100% reliable and can err because of outdated data or misunderstanding. Trusting AI is unlike trusting a calculator; it is closer to trusting a friend, and the user must accept the consequences. The article introduces Bayesianism as a decision-making tool, stressing that in major decisions a trust level of 0% or 100% is taboo, and recommends learning Bayesian calculation to make better decisions. It also points out that AI's dangerous capability is generating content that looks well-founded but has no basis, which is especially risky for readers with weak critical thinking. The key is to compare decisions along different dimensions: AI may be more reliable than humans, who may engage in mutual deception.
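
To make the point about 0% and 100% trust concrete, here is a minimal sketch of a single Bayesian update in Python. The function name `bayes_update` and all of the probabilities are illustrative assumptions, not figures from the original post; the only number taken from the text is the roughly 90% prior trust.

```python
def bayes_update(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Return P(H | E) from P(H), P(E | H), and P(E | not H) using Bayes' rule."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1.0 - prior) * p_evidence_if_false
    if denominator == 0.0:
        raise ValueError("the evidence has zero probability under this prior")
    return numerator / denominator


# Hypothetical scenario: H = "the AI's answer is correct",
# E = "an independent source contradicts the answer".
# Assumed likelihoods (illustrative only): a contradiction shows up 10% of the
# time when the answer is correct and 80% of the time when it is wrong.
P_CONTRADICTION_IF_CORRECT = 0.1
P_CONTRADICTION_IF_WRONG = 0.8

for prior in (0.9, 1.0, 0.0):
    posterior = bayes_update(prior, P_CONTRADICTION_IF_CORRECT, P_CONTRADICTION_IF_WRONG)
    print(f"prior trust {prior:.2f} -> posterior trust {posterior:.3f}")
```

Under these assumed likelihoods, a 90% prior drops to roughly 53% after one contradicting source, while a prior of exactly 100% or 0% does not move at all. That immunity to evidence is one way to read the article's warning against absolute trust or distrust.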

Original link: V2EX Share & Discover

49 minutes ago

Nvidia Launches NitroGen AI to Automate Gameplay

Nvidia has recently released its latest AI model, NitroGen, trained on over 1,000 games and 40,000 hours of gameplay footage. This technology is specifically designed to help players automate controller-based games like Cyberpunk 2077. It allows players to complete game missions without manual control, significantly enhancing the gaming experience. The launch of NitroGen marks an innovative application of AI technology in the entertainment sector, showcasing Nvidia's leadership in artificial intelligence. The model has been open-sourced and released on the Hugging Face platform, further promoting the popularization and application of AI technology and bringing new development opportunities to the gaming industry.

Original link: Linux.do

49 minutes ago