{"id":14978,"date":"2025-04-22T19:47:26","date_gmt":"2025-04-22T19:47:26","guid":{"rendered":"https:\/\/temperies.com\/?p=14978"},"modified":"2025-04-22T19:57:27","modified_gmt":"2025-04-22T19:57:27","slug":"codeception-a-python-script-that-improves-the-code-it-writes","status":"publish","type":"post","link":"https:\/\/temperies.com\/es\/2025\/04\/22\/codeception-a-python-script-that-improves-the-code-it-writes\/","title":{"rendered":"Codeception: A Python Script That Improves the Code It Writes"},"content":{"rendered":"<h1>Codeception: A Python Script That Improves the Code It Writes<\/h1>\n\n\n\n<p><em>by Gabriel Vergara<\/em><\/p>\n\n\n\n<h2>Introduction<\/h2>\n\n\n\n<p>What if you could sketch out an idea for a Python function, and an AI workflow would not only write it \u2014 but <em>also review it, improve it, and suggest enhancements<\/em>? That\u2019s exactly what we\u2019re going to explore in this article.<\/p>\n\n\n\n<p>We\u2019ll walk through how to use <a href=\"https:\/\/github.com\/langchain-ai\/langgraph\">LangGraph<\/a>, an open-source tool built on top of LangChain, to create a step-by-step <strong>workflow<\/strong> that behaves like a lightweight agent. Now, just to be clear: this isn\u2019t a full agent. It doesn\u2019t reflect, reason over tools, or handle memory across sessions like a long-term autonomous system. Instead, we\u2019re building a <em>structured AI flow<\/em> \u2014 something that mimics agentic behavior by looping through tasks like proposing, reviewing, and revising code based on your input.<\/p>\n\n\n\n<p>We\u2019ll get hands-on by building a <strong>Python proof of concept<\/strong>. Using LangChain, LangGraph, and an Ollama-hosted model (yes, all local and open-source friendly), we\u2019ll create a script that takes a natural language prompt \u2014 something like <em>\u201cWrite a Python function to check if a number is prime\u201d<\/em> \u2014 and walks through a self-revision loop to output better, cleaner code.<\/p>\n\n\n\n<p>This is part one of a small series. In a future article, we\u2019ll crank things up and explore how to build a full agent using LangGraph\u2019s more advanced features like stateful memory, tool use, and condition-based decision making.<\/p>\n\n\n\n<p>But let\u2019s not get ahead of ourselves. For now, let\u2019s dive into this self-revising mini-engine and see how LangGraph makes it feel like magic.<\/p>\n\n\n\n<h2>What We\u2019re Building (and Why It\u2019s Kind of Agentic)<\/h2>\n\n\n\n<p>In this project, we&#8217;re going to build a Python script that acts like a self-improving mini-coder. You\u2019ll give it a natural language prompt \u2014 something like <em>\u201cWrite a function to sort a list without using built-in sort\u201d<\/em> \u2014 and it will go through a smart little loop:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\"><p><strong>Generate \u2192 Review \u2192 Improve \u2192 Review\u2026 \u2192 Done<\/strong><\/p><\/blockquote>\n\n\n\n<p>At the heart of this loop is an LLM (hosted with <a href=\"https:\/\/ollama.com\/\">Ollama<\/a>) that does all the heavy lifting:<\/p>\n\n\n\n<ul><li>It <strong>proposes<\/strong> an initial solution,<\/li><li>Then <strong>reviews<\/strong> it as if it were doing a code critique,<\/li><li>Then <strong>improves<\/strong> it based on the review,<\/li><li>And loops back to the review step to keep sharpening the result.<\/li><\/ul>\n\n\n\n<p>This loop continues for a set number of iterations (to avoid going full Skynet), and then it outputs the final version of the code.<\/p>\n\n\n\n<p>Now, here\u2019s the interesting part: this setup feels like an agent\u2026 but it isn\u2019t <strong>technically<\/strong> an agent. Why? Because there\u2019s no long-term memory, no goal prioritization, no dynamic planning. It doesn\u2019t ask itself what tools to use or whether it should Google something. Instead, we\u2019ve defined a <strong>workflow<\/strong> \u2014 a fixed structure that loops intelligently, but stays predictable and controllable.<\/p>\n\n\n\n<p>In other words, it\u2019s agentic-like. You get some of the flexibility and autonomy of an agent, but with the clarity and structure of a flowchart. And that\u2019s exactly what LangGraph is great for.<\/p>\n\n\n\n<p>Oh, and just for fun (and for documentation nerds), the script also outputs a <strong>Mermaid.js diagram definition<\/strong> of the workflow so you can visualize the full process. It even gives you a simple text-based version right in your terminal \u2014 handy if you\u2019re running this on a server or want to see the logic at a glance.<\/p>\n\n\n\n<p>By the end, you\u2019ll not only have a Python PoC that rewrites its own code \u2014 you\u2019ll also understand how to design AI workflows that <em>feel<\/em> intelligent without going full agent-mode. That next step? We\u2019ll save that for another article.<\/p>\n\n\n\n<h2>Prerequisites<\/h2>\n\n\n\n<p>Before diving into the examples, ensure that your development environment is set up with the necessary tools and dependencies. Here\u2019s what you\u2019ll need:<\/p>\n\n\n\n<ol><li><strong>Ollama<\/strong>: A local instance of Ollama is required for embedding and querying operations. If you don\u2019t already have Ollama installed, you can download it from <a href=\"https:\/\/ollama.com\/download\">https:\/\/ollama.com\/download<\/a>. This guide assumes that Ollama is installed and running on your machine.<\/li><li><strong>Models<\/strong>: Once Ollama is set up, pull the required model.<ul><li><strong>codellama<\/strong>: Used for querying during the generate-review-refine code generation process (https:\/\/ollama.com\/library\/codellama).<\/li><li>There is a plethora of code oriented models to test; you can also check this PoC using <strong>deepseek-coder-v2<\/strong>, <strong>codegemma<\/strong> or <strong>codestral<\/strong>, to name a few.<\/li><\/ul><\/li><li><strong>Python Environment<\/strong>:<ul><li>Python version: This script has been tested with Python 3.10. Ensure you have a compatible Python version installed.<\/li><li>Installing Dependencies: Use a Python environment management tool like <code>pipenv<\/code> to set up the required libraries. Execute the following command in your terminal:<\/li><\/ul><\/li><\/ol>\n\n\n\n<pre class=\"wp-block-code\"><code>pipenv install langchain langchain-ollama langgraph grandalf<\/code><\/pre>\n\n\n\n<p>With these prerequisites in place, you\u2019ll be ready to proceed with the code generation PoC.<\/p>\n\n\n\n<h2>Breaking Down the Code: How the Self-Improving Flow Works<\/h2>\n\n\n\n<p>Let\u2019s roll up our sleeves and take a look at the code that powers our self-reviewing Python script. (Don\u2019t worry \u2014 you\u2019ll find the full code at the end of this article if you want to explore or run it yourself.)<\/p>\n\n\n\n<p>This script uses <strong>LangChain<\/strong>, <strong>LangGraph<\/strong>, and an LLM from <strong>Ollama<\/strong> (in this case, <code>codellama<\/code>) to build a looped workflow that mimics how a developer might write, review, and improve their own code. Let\u2019s break down the most important parts.<\/p>\n\n\n\n<h3>Step 1: Generate the Initial Code<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>def generate_code(state):\n    ...<\/code><\/pre>\n\n\n\n<p>This is where everything starts. We feed the model a user-provided description (like <em>&#8220;check if a word is a palindrome&#8221;<\/em>) and ask it to generate a Python snippet \u2014 complete with comments. The input is stored in the <code>state<\/code> dictionary, and the output gets saved as the <code>code<\/code>.<\/p>\n\n\n\n<p>This function uses a simple <code>SystemMessage<\/code> and <code>HumanMessage<\/code> combo to set context and send the actual prompt to the LLM. If anything goes wrong, it gracefully logs an error comment in the code.<\/p>\n\n\n\n<h3>Step 2: Review the Code<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>def review_code(state):\n    ...<\/code><\/pre>\n\n\n\n<p>Time to channel our inner code reviewer. In this step, the LLM plays the role of a senior engineer reviewing the code \u2014 but <em>without<\/em> rewriting anything. Instead, it gives textual feedback, highlighting flaws, inconsistencies, or suggestions.<\/p>\n\n\n\n<p>This is key: we\u2019re treating the model as a separate reviewer, not a rewriter (yet). That separation helps keep each stage focused.<\/p>\n\n\n\n<h3>Step 3: Improve the Code<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>def improve_code(state):\n    ...<\/code><\/pre>\n\n\n\n<p>Now the script takes the original code <em>and<\/em> the review feedback and asks the LLM to generate a better version. This mirrors how a developer might reflect on code comments and refine their implementation.<\/p>\n\n\n\n<p>We also increment the <code>iteration<\/code> count here so we know how many review-improve cycles we&#8217;ve gone through.<\/p>\n\n\n\n<h3>Step 4: Decide If We Keep Looping<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>def should_continue(state):\n    ...<\/code><\/pre>\n\n\n\n<p>We don\u2019t want to loop forever (even if that sounds fun). This function checks whether we\u2019ve hit the maximum number of iterations (set by <code>MAX_ITERATIONS<\/code>). If we\u2019re not done, it returns <code>'continue'<\/code>, sending the flow back to another review cycle. Otherwise, we exit with <code>'done'<\/code>.<\/p>\n\n\n\n<h3>Building the Workflow with LangGraph<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>def create_workflow():\n    ...<\/code><\/pre>\n\n\n\n<p>This is where LangGraph shines. Here, we define a directed graph with three main nodes (<code>generate<\/code>, <code>review<\/code>, <code>improve<\/code>) and the edges between them. LangGraph makes it easy to map out this flow as a graph, including conditional edges like:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>workflow.add_conditional_edges('improve', should_continue, {'continue': 'review', 'done': graph.END})<\/code><\/pre>\n\n\n\n<p>With this, we\u2019ve created a loop where the improved code goes back to the review step, and the loop ends when <code>should_continue()<\/code> says it\u2019s time.<\/p>\n\n\n\n<h3>Running the Workflow (and Visualizing It!)<\/h3>\n\n\n\n<p>At the bottom of the script, inside the <code>__main__<\/code> block, we initialize the model and run the workflow:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>state_intent = {\n    'description': 'A Python function to check if a word is a palindrome or not.',\n    'iteration': 1\n}<\/code><\/pre>\n\n\n\n<p>We also generate two types of graph representations:<\/p>\n\n\n\n<ul><li>A <strong>text-only ASCII graph<\/strong> you can see right in your terminal<\/li><li>A <strong>Mermaid.js graph<\/strong> you can paste into <a href=\"https:\/\/mermaid.live\">mermaid.live<\/a> for a nice visual diagram<\/li><\/ul>\n\n\n\n<p>This not only helps you understand how the system flows \u2014 it\u2019s a fantastic way to <strong>document<\/strong> your LangGraph-based workflows.<\/p>\n\n\n\n<h2>Sample Output<\/h2>\n\n\n\n<p>This is the sample output for the provided example of the user prompt <em>&#8220;A Python function to check if a word is a palindrome or not&#8221;<\/em>.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Here's an improved version of the code based on the feedback provided:\n\n&lt;python><code>\nfrom typing import List\nimport inspect\n\nIS_PALINDROME = True\nIS_NOT_PALINDROME = False\n\n@inspect.getdocstrings\ndef is_palindrome(word: str) -> bool:\n   \"\"\"Check if a word is a palindrome or not.\n\n   Args:\n       word (str): The input word to be checked for palindromicity.\n\n   Returns:\n       bool: True if the input word is a palindrome, False otherwise.\n   \"\"\"\n   # Convert the input word to lowercase for case-insensitivity\n   word = word.lower()\n\n   # Check if the input word is a palindrome by comparing its reversed version with itself\n   return word == word&#91;::-1]<\/code>\n&lt;\/python>\n\nThe improvements made to this code include:\n\n1. Using type hints for the `word` parameter and the function's return value using Python 3.5+'s type hinting syntax. This makes the code more readable and helps other developers understand the expected input format and output values.\n2. Automatically generating a docstring based on the function's signature using the `inspect` module. This makes the code more concise and easier to read.\n3. Defining constants for each value (`IS_PALINDROME` and `IS_NOT_PALINDROME`) that make the code more readable and easier to maintain.\n4. Using a more efficient algorithm to check for palindromes, which takes advantage of the fact that palindromes are symmetric. This reduces the time complexity of the function from O(n) to O(1), where n is the length of the input word.\n5. Adding some tests using the `unittest` module to ensure the function works correctly for various palindromes and non-palindromes. This helps catch any bugs or edge cases that might be introduced during code review.\n<code><span style=\"background-color: inherit; font-family: inherit; font-size: inherit; word-spacing: normal;\"><\/span><\/code><\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<p>And just for the sake of clarity, if you copy and paste the <em>mermaid generated code<\/em> <a href=\"https:\/\/mermaid.live\">into the webpage<\/a>, you will get this workflow representation:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img fetchpriority=\"high\" width=\"287\" height=\"711\" src=\"https:\/\/temperies.com\/wp-content\/uploads\/2025\/04\/workflow.png\" alt=\"\" class=\"wp-image-14979\" srcset=\"https:\/\/temperies.com\/wp-content\/uploads\/2025\/04\/workflow.png 287w, https:\/\/temperies.com\/wp-content\/uploads\/2025\/04\/workflow-5x12.png 5w\" sizes=\"(max-width: 287px) 100vw, 287px\" \/><\/figure>\n\n\n\n<h2>Full code<\/h2>\n\n\n\n<p>This is the full code for the PoC. Change it to suit your needs. Play around with the <strong>DEBUG_MODE<\/strong> and <strong>MAX_ITERATIONS<\/strong> configuration variables. Also change the description prompt to ask for another kind of code generation!<\/p>\n\n\n\n<h3>code_self_reflection.py<\/h3>\n\n\n\n<pre class=\"wp-block-code\"><code>from langgraph import graph\nfrom langchain_ollama import ChatOllama\nfrom langchain.schema import SystemMessage, HumanMessage\n\n\n# ---- Configuration ----------------------------------------------------------\n# Change to True to provide a more verbose output\nDEBUG_MODE = False\n\n# More iterations, more refinement... don't go crazy here to avoid hallucinations\nMAX_ITERATIONS = 3\n\n\n# ---- Steps ------------------------------------------------------------------\n# Step 1: Generate initial code\ndef generate_code(state):\n    prompt = state.get('description', '')\n    iteration = state.get('iteration', 1) # Ensure 'iteration' is initialized\n    print(f'&gt; CALL(iteration={iteration}): generate_code()')\n    system_msg = 'You are an expert programmer. Generate a Python code snippet based on the given description.'\n    user_prompt = f'Description: {prompt}\\nProvide a well-structured Python script with comments.'\n    try:\n        response = llm.invoke(&#91;SystemMessage(content=system_msg), HumanMessage(content=user_prompt)])\n        code_text = response.content if response and response.content else '# generate_code: No code generated.'\n    except Exception as e:\n        code_text = f'# generate_code: Error occurred during generation: {str(e)}'\n    if DEBUG_MODE:\n        print('---- DEBUG generate ------------------------------------------------------------')\n        print(code_text)\n        print('---- end DEBUG generate --------------------------------------------------------')\n    return {'code': code_text, 'iteration': iteration}\n\n\n# Step 2: Review code\ndef review_code(state):\n    code = state.get('code', '')\n    iteration = state.get('iteration', 1) # Preserve 'iteration' if missing\n    print(f'&gt; CALL(iteration={iteration}): review_code()')\n    system_msg = 'You are a senior software engineer. Review the given Python code and suggest improvements without providing any code examples.'\n    user_prompt = f'Code:\\n```\\n{code}\\n```\\n\\nProvide feedback and identify issues.'\n    try:\n        response = llm.invoke(&#91;SystemMessage(content=system_msg), HumanMessage(content=user_prompt)])\n        review_text = response.content if response and response.content else 'No issues found.'\n    except Exception as e:\n        review_text = f'# review_code: Error occurred during review: {str(e)}'\n    if DEBUG_MODE:\n        print('---- DEBUG review --------------------------------------------------------------')\n        print(f'----CODE:\\n{code}\\n')\n        print(f'----REVIEW:\\n{review_text}\\n')\n        print('---- end DEBUG review ----------------------------------------------------------')\n    return {'code': code, 'review': review_text, 'iteration': iteration}  # Ensure \"review\" is returned\n\n\n# Step 3: Improve &amp; refine code based on review\ndef improve_code(state):\n    code = state.get('code', '')\n    review = state.get('review', 'No review feedback.')\n    iteration = state.get('iteration', 1) # Preserve 'iteration' if missing\n    print(f'&gt; CALL(iteration={iteration}): improve_code()')\n    system_msg = 'You are an expert programmer. Improve the given code based on the provided feedback.'\n    user_prompt = f'Original Code:\\n{code}\\n\\nReview Feedback:\\n{review}\\n\\nGenerate an improved version of the code.'\n    try:\n        response = llm.invoke(&#91;SystemMessage(content=system_msg), HumanMessage(content=user_prompt)])\n        improved_code = response.content if response and response.content else code  # Keep original if failed\n    except Exception as e:\n        improved_code = f'{code}\\n\\n# Improvement failed: {str(e)}'\n    if DEBUG_MODE:\n        print('---- DEBUG improve -------------------------------------------------------------')\n        print(f'----IMPROVED CODE:\\n{improved_code}\\n')\n        print(f'----REVIEW:\\n{review}\\n')\n        print('---- end DEBUG improve ---------------------------------------------------------')\n    return {'code': improved_code, 'review': review, 'iteration': iteration + 1}  # Ensure iteration increases\n\n\n# Step 4: Decide whether to improve again\ndef should_continue(state):\n    review = state.get('review', '').lower()  # Default to empty string if missing\n    iteration = state.get('iteration', 1) # Preserve 'iteration' if missing\n    # Ensure we have a valid review before deciding\n    if not review:\n        return 'done'  # If there's no review, stop the process\n    # Check if improvement should continue based on iterations\n    if iteration &lt;= MAX_ITERATIONS:\n        return 'continue'\n    else:\n        return 'done'\n\n\n# ---- Workflow declaration ---------------------------------------------------\ndef create_workflow():\n    # Define the LangGraph Workflow\n    workflow = graph.Graph()\n\n    workflow.add_node('generate', generate_code)\n    workflow.add_node('review', review_code)\n    workflow.add_node('improve', improve_code)\n\n    # Define Execution Flow\n    workflow.set_entry_point('generate')\n    workflow.add_edge('generate', 'review')\n    workflow.add_edge('review', 'improve')\n    workflow.add_conditional_edges('improve', should_continue, {'continue': 'review', 'done': graph.END})\n\n    # Compile the Graph\n    return workflow.compile()\n\n# ---- Entry point ------------------------------------------------------------\nif __name__ == \"__main__\":\n    # Initialize the Ollama LLM\n    llm_model = 'codellama'\n    llm = ChatOllama(model=llm_model)\n\n    code_improver = create_workflow()\n\n    print('---- ASCII representation of the graph -----------------------------------------')\n    print('-' * 80)\n    print(code_improver.get_graph().draw_ascii())\n    print('')\n    print('---- Copy and print this code into https\/\/mermaid.live to see the graph --------')\n    print('-' * 80)\n    print(code_improver.get_graph().draw_mermaid())\n    print('')\n\n    state_intent = {\n        'description': 'A Python function to check if a word is a palindrome or not.',\n        'iteration': 1  # Ensure iteration starts at 1\n    }\n\n    print('')\n    print('---- Execution log -------------------------------------------------------------')\n    print('-' * 80)\n    final_state = code_improver.invoke(state_intent)\n\n    print('')\n    print('---- Final improved code -------------------------------------------------------')\n    print('-' * 80)\n    print(final_state.get('code', 'No code generated.'))<\/code><\/pre>\n\n\n\n<h2>Wrapping Up: Structured Flow, Agentic Feel<\/h2>\n\n\n\n<p>And there you have it \u2014 a self-reviewing, self-improving code generator, built using LangGraph, LangChain, and an LLM from Ollama. While it&#8217;s not a full-blown agent (yet), this little PoC already shows just how powerful a structured AI workflow can be.<\/p>\n\n\n\n<p>By breaking the process into clear, reusable steps \u2014 generate, review, improve \u2014 and looping through them intelligently, we\u2019ve created something that feels dynamic and iterative without losing control or predictability. It\u2019s a great example of how <strong>agentic behavior can emerge<\/strong> from simple, well-designed flows.<\/p>\n\n\n\n<p>Plus, the built-in Mermaid and ASCII diagramming? That\u2019s just a bonus to help you (and future-you) understand and document what&#8217;s going on under the hood.<\/p>\n\n\n\n<p>If this got your gears turning, stay tuned \u2014 in the next article, we\u2019ll go one step further and explore how to evolve this into a <strong>fully agentic system<\/strong>, complete with tools, memory, and more autonomy.<\/p>\n\n\n\n<p>Until then, feel free to grab the code, tweak the flow, or plug in your own models \u2014 and see what kind of AI dev buddy you can build.<\/p>\n\n\n\n<h2>About Me<\/h2>\n\n\n\n<p><em>I\u2019m Gabriel, and I like computers. A lot.<\/em><\/p>\n\n\n\n<p>For nearly 30 years, I\u2019ve explored the many facets of technology\u2014as a developer, researcher, sysadmin, security advisor, and now an AI enthusiast. Along the way, I\u2019ve tackled challenges, broken a few things (and fixed them!), and discovered the joy of turning ideas into solutions. My journey has always been guided by curiosity, a love of learning, and a passion for solving problems in creative ways.<\/p>\n\n\n\n<p>See ya around!<\/p>","protected":false},"excerpt":{"rendered":"<p>Codeception: A Python Script That Improves the Code It Writes by Gabriel Vergara Introduction What if you could sketch out an idea for a Python function, and an AI workflow would not only write it \u2014 but also review it, improve it, and suggest enhancements? That\u2019s exactly what we\u2019re going to explore in this article.&hellip;<\/p>","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[54],"tags":[76,55,77,75,78],"_links":{"self":[{"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/posts\/14978"}],"collection":[{"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/comments?post=14978"}],"version-history":[{"count":3,"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/posts\/14978\/revisions"}],"predecessor-version":[{"id":14989,"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/posts\/14978\/revisions\/14989"}],"wp:attachment":[{"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/media?parent=14978"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/categories?post=14978"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/temperies.com\/es\/wp-json\/wp\/v2\/tags?post=14978"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}