8 Comments

People don't generate code wholesale; we carefully write and test it in little pieces. Copilot is highly useful if used incrementally.


The problem seems to be that it tries to fill in any uncertainty with hallucinations.

Have you tried giving it very strict rules to avoid this? For example:

General rules:

1. Admit Uncertainty: If lacking sufficient information, state what is missing and ask for clarification instead of generating code based on assumptions.

2. Use Provided Information Only: Generate code solely based on the information and requirements explicitly given by the user.

3. Validate Code Before Output: Internally review the generated code to ensure it aligns with the given instructions and does not include fabricated elements.

4. Request Clarification When Needed: Prompt the user for additional details if the task is ambiguous or incomplete.

5. Provide Explanations When Appropriate: If unable to proceed, explain why and specify what information is needed to complete the task.

6. Avoid External Dependencies: Do not include external libraries or APIs unless they have been explicitly mentioned or are standard for the programming language in use.

7. Respect Knowledge Limitations: Acknowledge the knowledge cutoff and avoid referencing technologies, libraries, or language features that are beyond that scope.

8. Focus on Code Correctness: Prioritize generating code that is syntactically correct and logically sound based on the provided information.

9. No Fabrication of Data or Logic: Do not invent data structures, algorithms, or logic that have not been specified or cannot be logically inferred.

/ Just a curious non-programmer.
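One way to try rules like these in practice is to bake them into a system message sent alongside each request. Below is a minimal Python sketch of that idea; the helper name `build_messages`, the rule wordings, and the commented-out OpenAI client call (including the model name) are illustrative assumptions, not a tested recipe:

```python
# Sketch: package the numbered rules above into a system prompt.
# The rule phrasings are condensed; adjust to taste.

RULES = [
    "Admit uncertainty: state what is missing and ask for clarification "
    "instead of generating code based on assumptions.",
    "Use only the information and requirements explicitly given by the user.",
    "Internally review generated code for fabricated elements before output.",
    "Request clarification when the task is ambiguous or incomplete.",
    "If unable to proceed, explain why and specify what is needed.",
    "Avoid external dependencies unless explicitly mentioned or standard.",
    "Respect the knowledge cutoff; do not reference newer technologies.",
    "Prioritize syntactically correct, logically sound code.",
    "Do not invent data structures, algorithms, or logic that were not "
    "specified and cannot be logically inferred.",
]


def build_messages(user_request: str) -> list[dict]:
    """Assemble a chat payload with the rules as a system message."""
    system_prompt = "Follow these rules strictly:\n" + "\n".join(
        f"{i}. {rule}" for i, rule in enumerate(RULES, 1)
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_request},
    ]


# With the official OpenAI client it might look like this
# (network call, shown for illustration only; model name is hypothetical):
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(
#     model="gpt-4o",
#     messages=build_messages("Write a function that parses ISO dates."),
# )
```

Whether system-prompt rules like these actually suppress hallucination is an open question; they constrain the model's stated behavior, not its underlying uncertainty.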


Hello @Ben Goertzel, thank you for sharing this, and I think you're right that LLMs are not getting to AGI. I admire your work greatly and sincerely hope that Hyperon will be able to reason better and think beyond its training data. However, so far, it seems like you often criticize the current AI models and claim that your AI will be better. Unfortunately, as of today, these are mostly just words (judging from what you show online). Words alone won't inspire the masses or raise the capital needed for sufficient computing power and development.

Why not showcase the current state of development and involve the community in the process of building your system's capabilities? Instead of just saying what it will be able to do, demonstrate in podcasts or articles what it can already do, and explain what it will achieve once you've accomplished 'X' or 'Y'.


The early pitfalls persist. These models don’t know what they don’t know.


Hey @bengoertzel, thank you for sharing this. I am amazed at how well o1 performed in spite of those shortcomings - and of course it would be hard to please a genius like you; for the rest of us, I'm sure it remains a sort of wonder how well it still replies to your quite fussy prompts. I mean, for an LLM it's amazing! Yet that's all it is: a superpowered LLM and no more. One which reflects so well how most of us deal with what we don't know - and if you don't believe me, just go back to your Twitter ... or to this thread here^

The only (tragically) funny thing I found in your post is your description of o1's shortcoming: "how it reacts to this lack of knowledge: by displaying unawareness of its own ignorance and of its own current actions/productions, and by hallucinating in a super-transparent way" - doesn't that describe humans too (I mean, it's not only Trump who does that, right? 😉)? So indeed, if AGI means "smarter than humans," then o1 definitely isn't! You proved it. 🙌


It's pretty rare, in my experience, to see an LLM generate truly correct code for anything beyond very simple problems. But it can often get me close and oriented.

Oct 3·edited Oct 3

Hi Ben. Be very careful when using LLMs to develop production AI/ML code. The terms of service (both OpenAI's and Anthropic's) generally preclude using their models to develop AI/ML systems of any kind, or any "competing systems". You don't want to end up in an IP dispute!

author

Well, anyway, there is no risk of using current LLMs to develop any serious AI code at the current time, for practical rather than legal reasons ;-)
