Giving AI coding assistants vision

AI generation disclosure: I wanted to make testing easier, so I asked ChatGPT and Claude how to give vision to Claude Code. They explained it and gave me a script to use. I ran Ollama to download LLaVA, then prompted Claude Code with the script to use LLaVA as its eyes for testing.

```python
import requests
import base64

def ask_local_vision(image_path, question):
    with open(image_path, 'rb') as f:
        image_data = base64.b64encode(f.read()).decode('utf-8')

    response = requests.post('http://localhost:11434/api/generate',
        json={
            'model': 'llava',
            'prompt': question,
            'images': [image_data]
        })

    return response.json()['response']
```

The code presented here was generated by Claude Code. I told it to document how it used AI for visual feedback; that way the user (me) can spend time on more productive things.

While testing my UI system, I got tired of checking whether all the items were rendered, so in a nutshell, I needed Claude Code to have vision, and I gave it vision. I had it write up what I did, and what it did, to use Ollama with the LLaVA model for testing visual output. It's long, but it works great. Hope this helps folks.
![image|690x488](upload://f7nYShS7lLnpIMLskzQBNm80zPy.png)
A sample image of the library.

---

Note: Claude Code is really proud of its work.. lol
# AI Vision Integration Documentation

## 🤖 AI Vision System for Mojo GUI Analysis

This project includes a cutting-edge **AI vision-assisted debugging system** that uses **Ollama with LLaVA (Large Language and Vision Assistant)** to analyze GUI screenshots and provide human-like feedback about rendering, layout, and visual issues.

## 📋 Table of Contents

1. [Overview](#overview)
2. [System Architecture](#system-architecture)
3. [Installation and Setup](#installation-and-setup)
4. [Core Components](#core-components)
5. [Usage Examples](#usage-examples)
6. [API Reference](#api-reference)
7. [Advanced Usage](#advanced-usage)
8. [Troubleshooting](#troubleshooting)

## 🔍 Overview

### What is AI Vision?

The AI Vision system allows you to:
- **Take screenshots** of your MojoGUI applications automatically
- **Analyze GUI rendering** using AI that can "see" like a human
- **Get natural language feedback** about visual issues, layout problems, or rendering bugs
- **Verify GUI functionality** by asking specific questions about what's visible
- **Debug text rendering** and font issues through AI analysis
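
The capabilities above all boil down to one round trip: base64-encode the screenshot bytes and POST them with a question to Ollama's `/api/generate` endpoint. As a minimal sketch of just the request construction (the helper name `build_vision_payload` is illustrative, not part of the project):

```python
import base64
import json

def build_vision_payload(image_bytes: bytes, question: str, model: str = 'llava') -> str:
    """Build the JSON body for an Ollama /api/generate vision request."""
    encoded = base64.b64encode(image_bytes).decode('utf-8')
    return json.dumps({
        'model': model,
        'prompt': question,
        'images': [encoded],  # Ollama accepts base64-encoded images in this field
    })

# Placeholder bytes standing in for real PNG screenshot data
payload = build_vision_payload(b'\x89PNG...', 'What do you see?')
print(json.loads(payload)['model'])  # prints: llava
```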

### Why Use AI Vision?

Traditional debugging shows you code errors, but **AI Vision shows you what users actually see**:
- ✅ **Human-like analysis** - AI describes what it "sees" in natural language
- ✅ **Visual verification** - Confirm that GUI elements are actually visible and correctly rendered
- ✅ **Layout debugging** - Identify spacing, alignment, and color issues
- ✅ **Cross-platform testing** - Works regardless of OpenGL drivers or graphics issues
- ✅ **Automated testing** - Verify GUI appearance programmatically

## 🏗️ System Architecture

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   MojoGUI App   │    │   Screenshot    │    │   Local LLaVA   │
│                 │───▶│     System      │───▶│      Model      │
│   (Advanced     │    │                 │    │                 │
│    Widgets)     │    │ • pyautogui     │    │ • Ollama Server │
└─────────────────┘    │ • PIL/ImageGrab │    │ • Vision Model  │
                       │ • ImageMagick   │    │ • API Interface │
                       └─────────────────┘    └─────────────────┘
                                                       │
                                                       ▼
                                              ┌─────────────────┐
                                              │   AI Analysis   │
                                              │                 │
                                              │ • Natural Lang. │
                                              │ • GUI Feedback  │
                                              │ • Issue Reports │
                                              └─────────────────┘
```


## 🚀 Installation and Setup

### Step 1: Install Ollama

```bash
# Install Ollama (AI model runner)
curl -fsSL https://ollama.ai/install.sh | sh
```

### Step 2: Download the LLaVA Model

```bash
# Download the vision model (this may take several minutes)
ollama pull llava
```

### Step 3: Start the Ollama Server

```bash
# Start the local AI server
ollama serve
# Server runs on http://localhost:11434
```

### Step 4: Install Python Dependencies

```bash
# Install screenshot and API libraries
pip install requests pillow pyautogui
```

### Step 5: Test the Setup

```bash
# Test that everything is working
python3 ollama_vision_setup.py
```

## :puzzle_piece: Core Components

### 1. `ollama_vision_setup.py` - Main Setup and Interface

The core class that manages Ollama and LLaVA integration:

```python
from ollama_vision_setup import OllamaVision

# Create vision interface
vision = OllamaVision(model_name='llava')

# Setup (downloads model if needed)
if vision.setup():
    # Analyze an image
    response = vision.ask_vision("screenshot.png", "What do you see?")
    print(response)
```

Key methods:

- `setup()` - Initialize Ollama and download LLaVA if needed
- `ask_vision(image_path, question)` - Analyze image with AI
- `check_ollama_running()` - Verify Ollama server status
- `test_vision()` - Test the vision model

### 2. `ai_vision_integration.py` - User-Provided Pattern

A simple, direct integration pattern provided by the user:

```python
import requests
import base64

def ask_local_vision(image_path, question):
    with open(image_path, 'rb') as f:
        image_data = base64.b64encode(f.read()).decode('utf-8')

    response = requests.post('http://localhost:11434/api/generate',
        json={
            'model': 'llava',
            'prompt': question,
            'images': [image_data]
        })

    return response.json()['response']
```

### 3. `ai_vision_debug.py` - Complete Debug Workflow

Comprehensive debugging that:

- Runs a MojoGUI test with high-contrast colors
- Takes a screenshot automatically
- Analyzes it with multiple diagnostic questions
- Provides detailed AI feedback
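
The steps above can be sketched as a small driver that wires the pieces together. The callables are injected here (an illustrative structure, not the project's actual code) so the real screenshot and Ollama calls can be swapped in:

```python
import time

def run_debug_workflow(launch_app, take_screenshot, ask,
                       questions, settle_seconds=3):
    """Launch the GUI, grab a screenshot, and collect AI feedback.

    launch_app:      callable that starts the GUI under test
    take_screenshot: callable returning the path to a saved screenshot
    ask:             callable(image_path, question) -> str, e.g. ask_local_vision
    """
    launch_app()
    time.sleep(settle_seconds)  # let the GUI finish rendering
    shot = take_screenshot()
    # One diagnostic question per pass keeps each answer focused
    return {q: ask(shot, q) for q in questions}

# Example with stand-in callables (no GUI or Ollama needed):
report = run_debug_workflow(
    launch_app=lambda: None,
    take_screenshot=lambda: "screenshot.png",
    ask=lambda path, q: f"analyzed {path}",
    questions=["Are all labels readable?"],
    settle_seconds=0,
)
print(report)
```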

### 4. Vision Test Scripts

- `simple_vision_test.py` - Basic screenshot and analysis
- `manual_vision_test.py` - Manual testing workflow
- `full_vision_analysis.py` - Comprehensive analysis

## :open_book: Usage Examples

### Basic Vision Analysis

```python
from ollama_vision_setup import OllamaVision

# Setup AI vision
vision = OllamaVision()
vision.setup()

# Take screenshot (your preferred method)
# screenshot_path = take_screenshot()

# Analyze GUI
response = vision.ask_vision("gui_screenshot.png",
    "What widgets and UI elements are visible in this GUI?")

print(f"AI sees: {response}")
```

### GUI Debugging Workflow

```python
# 1. Run your MojoGUI application
# mojo advanced_widgets_demo.mojo &

# 2. Take screenshot after GUI stabilizes
import time
time.sleep(3)  # Let GUI render

# 3. Analyze with specific questions
questions = [
    "Are all text labels clearly readable?",
    "Do you see any rendering issues or visual bugs?",
    "What colors are used in the interface?",
    "Are the buttons and panels properly aligned?"
]

for question in questions:
    answer = vision.ask_vision("screenshot.png", question)
    print(f"Q: {question}")
    print(f"A: {answer}\n")
```

### Professional GUI Verification

```python
# Verify specific widgets are working
verification_questions = [
    "Do you see a docking panel system with left, right, and bottom panels?",
    "Are there accordion sections that can expand and collapse?",
    "Is there a toolbar with buttons like New, Open, Save, Bold, Italic?",
    "Do you see a floating panel that can be dragged?",
    "Are all text elements using professional TTF fonts?"
]

for question in verification_questions:
    response = vision.ask_vision("screenshot.png", question)
    if "yes" in response.lower():
        print(f"✅ PASS: {question}")
    else:
        print(f"❌ ISSUE: {question}")
        print(f"   AI Response: {response}")
```
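
A plain `"yes" in response.lower()` check can misfire: it matches inside words like "yesterday", and an answer such as "No, but yes to the panels" would pass. A slightly more careful parser, sketched here as a hypothetical helper, matches only whole words and lets the first verdict win:

```python
import re

def parse_yes_no(response: str) -> bool:
    """Return True if the AI's answer reads as an affirmative.

    Matches 'yes'/'no' as whole words only, and treats the first
    occurrence as the verdict, so 'No, although...' is not a pass.
    """
    words = re.findall(r"\b(yes|no)\b", response.lower())
    return bool(words) and words[0] == "yes"

print(parse_yes_no("Yes, the toolbar is visible."))    # affirmative
print(parse_yes_no("No, but yes to the panels."))      # first verdict wins
print(parse_yes_no("Yesterday's build looked fine."))  # no whole-word match
```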

## :books: API Reference

### `OllamaVision` Class

```python
class OllamaVision:
    def __init__(self, model_name='llava', base_url='http://localhost:11434')
    def setup() -> bool
    def ask_vision(image_path: str, question: str) -> str
    def check_ollama_running() -> bool
    def list_models() -> list
    def download_model(model_name: str) -> bool
    def test_vision() -> bool
```

### Core Functions

```python
# User-provided simple pattern
def ask_local_vision(image_path: str, question: str) -> str

# Screenshot functions
def take_screenshot() -> str  # Returns path to screenshot
def take_screenshot_pyautogui() -> str
def take_screenshot_pil() -> str
```
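
The screenshot helpers above are listed as signatures only. One plausible shape for them (an assumption for illustration, not the project's actual code) is a fallback chain that tries each available backend in order:

```python
def take_screenshot_with(backends, out_path="screenshot.png"):
    """Try each (name, capture_fn) pair until one succeeds.

    capture_fn(out_path) should save an image, and may raise on failure
    (library not installed, no display, etc.).
    """
    for name, capture in backends:
        try:
            capture(out_path)
            return out_path
        except Exception as exc:
            print(f"{name} failed: {exc}")
    raise RuntimeError("No screenshot backend succeeded")

def _pyautogui_capture(path):
    import pyautogui              # may not be installed
    pyautogui.screenshot().save(path)

def _pil_capture(path):
    from PIL import ImageGrab     # X11/Windows/macOS only
    ImageGrab.grab().save(path)

# The real backends would be tried in this order:
# take_screenshot_with([("pyautogui", _pyautogui_capture), ("PIL", _pil_capture)])
```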

### Common Questions for GUI Analysis

```python
# Layout and structure
"What widgets and UI elements are visible?"
"Describe the overall layout and organization."
"Are there any panels, toolbars, or menus?"

# Visual quality
"Are all text elements clearly readable?"
"What colors are used in the interface?"
"Do you see any rendering issues or visual bugs?"

# Specific widget verification
"Do you see buttons labeled New, Open, Save?"
"Are there any accordion sections or collapsible panels?"
"Is there a docking system with moveable panels?"

# Professional assessment
"Does this look like a professional desktop application?"
"What desktop application does this interface remind you of?"
"Is the visual design modern and consistent?"
```

## :wrench: Advanced Usage

### Custom Screenshot Integration

```python
def custom_screenshot_analysis():
    # Your custom screenshot method
    screenshot_path = my_screenshot_function()

    # Analyze with AI
    vision = OllamaVision()

    # Multiple analysis passes
    layout_feedback = vision.ask_vision(screenshot_path,
        "Analyze the layout and organization of this GUI interface.")

    color_feedback = vision.ask_vision(screenshot_path,
        "Evaluate the color scheme and visual consistency.")

    usability_feedback = vision.ask_vision(screenshot_path,
        "From a usability perspective, how intuitive is this interface?")

    return {
        'layout': layout_feedback,
        'colors': color_feedback,
        'usability': usability_feedback
    }
```

### Automated Testing Integration

```python
def automated_gui_test():
    """Run automated GUI test with AI verification"""

    # Start your GUI application
    start_gui_application()

    # Wait for rendering
    time.sleep(3)

    # Take screenshot
    screenshot = take_screenshot()

    # Define test criteria
    test_cases = [
        ("Widget Visibility", "Are all expected widgets visible and properly rendered?"),
        ("Text Readability", "Is all text clear and readable?"),
        ("Layout Quality", "Is the layout professional and well-organized?"),
        ("Color Scheme", "Are colors consistent and appropriate?"),
        ("Interactive Elements", "Do buttons and controls look clickable and functional?")
    ]

    # Run AI analysis for each test case
    results = {}
    vision = OllamaVision()

    for test_name, question in test_cases:
        response = vision.ask_vision(screenshot, question)
        results[test_name] = {
            'question': question,
            'ai_response': response,
            'passed': 'yes' in response.lower() and 'no' not in response.lower()
        }

    return results
```

### Integration with Existing Test Frameworks

```python
import unittest

class AIVisionGUITests(unittest.TestCase):
    def setUp(self):
        self.vision = OllamaVision()
        self.vision.setup()

    def test_widget_rendering(self):
        """Test that all widgets render correctly"""
        screenshot = self.take_test_screenshot()
        response = self.vision.ask_vision(screenshot,
            "Are all GUI widgets visible and properly rendered?")

        self.assertIn("yes", response.lower())
        self.assertNotIn("missing", response.lower())

    def test_professional_appearance(self):
        """Test that GUI looks professional"""
        screenshot = self.take_test_screenshot()
        response = self.vision.ask_vision(screenshot,
            "Does this interface look professional and modern?")

        self.assertIn("professional", response.lower())
```

## :hammer_and_wrench: Troubleshooting

### Common Issues and Solutions

#### 1. Ollama Not Running

```bash
# Error: Connection refused
# Solution: Start Ollama server
ollama serve
```

#### 2. LLaVA Model Not Found

```bash
# Error: Model not found
# Solution: Download model
ollama pull llava
```

#### 3. Screenshot Failed

```bash
# Error: No screenshot library
# Solution: Install dependencies
pip install pyautogui pillow

# For Linux, may also need:
sudo apt install gnome-screenshot
```

#### 4. Image Analysis Failed

```python
# Check image file exists and is readable
import os
if not os.path.exists(image_path):
    print(f"Image not found: {image_path}")

# Check image format
from PIL import Image
try:
    img = Image.open(image_path)
    print(f"Image: {img.size}, {img.format}")
except Exception as e:
    print(f"Invalid image: {e}")
```

#### 5. API Connection Issues

```python
# Test Ollama connection
import requests
try:
    response = requests.get('http://localhost:11434/api/tags')
    print(f"Ollama status: {response.status_code}")
except Exception as e:
    print(f"Connection failed: {e}")
```

### Debugging AI Vision

```python
def debug_vision_system():
    """Debug the AI vision system step by step"""

    print("🔍 Debugging AI Vision System")

    # 1. Check Ollama
    vision = OllamaVision()
    if vision.check_ollama_running():
        print("✅ Ollama is running")
    else:
        print("❌ Ollama not running - start with: ollama serve")
        return

    # 2. Check model
    if vision.model_exists():
        print("✅ LLaVA model available")
    else:
        print("❌ LLaVA model missing - download with: ollama pull llava")
        return

    # 3. Test vision
    if vision.test_vision():
        print("✅ Vision system working")
    else:
        print("❌ Vision test failed")
        return

    print("🎯 AI Vision system is fully operational!")
```

### Performance Optimization

```python
# Optimize screenshot size for faster analysis
def optimize_screenshot(image_path, max_size=1024):
    from PIL import Image

    img = Image.open(image_path)
    if max(img.size) > max_size:
        # Resize while maintaining aspect ratio
        img.thumbnail((max_size, max_size), Image.Resampling.LANCZOS)
        optimized_path = image_path.replace('.png', '_optimized.png')
        img.save(optimized_path)
        return optimized_path
    return image_path

# Use optimized image for faster AI analysis
screenshot = take_screenshot()
optimized_screenshot = optimize_screenshot(screenshot)
response = vision.ask_vision(optimized_screenshot, question)
```

## :bullseye: Best Practices

### 1. Effective Questions

- Be specific about what you want to know
- Ask one question at a time for clarity
- Use descriptive language the AI can understand

### 2. Screenshot Quality

- Ensure the GUI is fully rendered before the screenshot
- Use high-contrast colors for better AI recognition
- Avoid overlapping windows or visual clutter

### 3. Error Handling

- Always check if the screenshot was successful
- Handle API timeouts gracefully
- Validate AI responses for consistency
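
The error-handling advice can be sketched by hardening the basic `ask_local_vision` pattern with a timeout and a graceful fallback. The wrapper name is hypothetical; the endpoint is the same local Ollama URL used throughout:

```python
import base64
import requests

def ask_local_vision_safe(image_path, question,
                          url='http://localhost:11434/api/generate',
                          timeout=60):
    """Like ask_local_vision, but returns None instead of raising."""
    try:
        with open(image_path, 'rb') as f:
            image_data = base64.b64encode(f.read()).decode('utf-8')
    except OSError as exc:
        print(f"Could not read screenshot: {exc}")
        return None

    try:
        response = requests.post(url, timeout=timeout, json={
            'model': 'llava',
            'prompt': question,
            'images': [image_data],
        })
        response.raise_for_status()   # surface HTTP errors as exceptions
        return response.json().get('response')
    except requests.RequestException as exc:
        print(f"Ollama request failed: {exc}")
        return None
```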

### 4. Performance

- Resize large screenshots for faster analysis
- Cache frequently used model responses
- Use appropriate timeout values

## :rocket: Future Enhancements

Potential improvements to the AI vision system:

1. **Multi-Model Support** - Support for other vision models beyond LLaVA
2. **Automated Testing** - Integration with CI/CD pipelines
3. **Visual Regression Testing** - Compare screenshots over time
4. **Performance Metrics** - Measure GUI rendering performance
5. **Accessibility Analysis** - AI-powered accessibility auditing

## :page_facing_up: Files in the AI Vision System

```
📁 AI Vision Files:
├── 🤖 ollama_vision_setup.py        # Main setup and interface
├── 👁️ ai_vision_debug.py           # Complete debug workflow
├── 🔧 ai_vision_integration.py     # User-provided pattern
├── 📸 simple_ai_vision.py          # Simple screenshot analysis
├── 🧪 vision_debug.py              # Vision-assisted debugging
├── 📋 manual_vision_test.py        # Manual testing workflow
├── 🎯 full_vision_analysis.py      # Comprehensive analysis
├── 📊 ai_vision_analysis.py        # Advanced widget analysis
├── 📚 AI_VISION_DOCUMENTATION.md   # This documentation
└── 🎮 vision_test_demo.mojo        # GUI test for vision analysis
```

## :tada: Conclusion

The AI Vision system represents a breakthrough in GUI debugging and testing. By leveraging the power of modern AI vision models like LLaVA, developers can now get human-like feedback about their graphical interfaces, identifying issues that traditional debugging might miss.

Key benefits:

- :magnifying_glass_tilted_left: Human-like analysis of GUI appearance and functionality
- :rocket: Automated visual testing capabilities
- :bullseye: Natural language feedback about rendering issues
- :hammer_and_wrench: Easy integration with existing development workflows
- :chart_increasing: Scalable testing for complex GUI applications

Start using AI Vision today to enhance your MojoGUI development process!


Per Rule 6, please disclose if some or all of the code is AI generated.


Done, and that's why I did it: to avoid repetitive tasks that are best left to machines.

What's the question here? Please note that posts of this sort, which have no relation to MAX and Mojo, will be closed.

Then I am sorry, I thought this would help. I will delete it.