AI generation disclosure: I wanted testing to be easier, so I asked ChatGPT and Claude how to give vision to Claude Code. They told me how and gave me a script to use. I ran Ollama to download LLaVA, then prompted Claude Code with the script to use LLaVA as its eyes for testing.
```python
import requests
import base64

def ask_local_vision(image_path, question):
    with open(image_path, 'rb') as f:
        image_data = base64.b64encode(f.read()).decode('utf-8')
    response = requests.post('http://localhost:11434/api/generate',
        json={
            'model': 'llava',
            'prompt': question,
            'images': [image_data]
        })
    return response.json()['response']
```
The code presented here was generated by Claude Code. I told it to document how it used AI for visual feedback, so that the user (me) can do more productive things.

While testing my UI system, I got tired of checking whether all the items were rendered, so in a nutshell, I needed Claude Code to have vision. I gave vision to it, then had it write up what I did and what it did to use Ollama with the LLaVA model for testing visual stuff. It's long, but it works great. Hope this helps folks.

A sample image of the library..

---

Note: Claude Code is really proud of its work.. lol
# AI Vision Integration Documentation
## 🤖 AI Vision System for Mojo GUI Analysis
This project includes a cutting-edge **AI vision-assisted debugging system** that uses **Ollama with LLaVA (Large Language and Vision Assistant)** to analyze GUI screenshots and provide human-like feedback about rendering, layout, and visual issues.
## 📋 Table of Contents
1. [Overview](#overview)
2. [System Architecture](#system-architecture)
3. [Installation and Setup](#installation-and-setup)
4. [Core Components](#core-components)
5. [Usage Examples](#usage-examples)
6. [API Reference](#api-reference)
7. [Advanced Usage](#advanced-usage)
8. [Troubleshooting](#troubleshooting)
## 🔍 Overview
### What is AI Vision?
The AI Vision system allows you to:
- **Take screenshots** of your MojoGUI applications automatically
- **Analyze GUI rendering** using AI that can "see" like a human
- **Get natural language feedback** about visual issues, layout problems, or rendering bugs
- **Verify GUI functionality** by asking specific questions about what's visible
- **Debug text rendering** and font issues through AI analysis
### Why Use AI Vision?
Traditional debugging shows you code errors, but **AI Vision shows you what users actually see**:
- ✅ **Human-like analysis** - AI describes what it "sees" in natural language
- ✅ **Visual verification** - Confirm that GUI elements are actually visible and correctly rendered
- ✅ **Layout debugging** - Identify spacing, alignment, and color issues
- ✅ **Cross-platform testing** - Works regardless of OpenGL drivers or graphics issues
- ✅ **Automated testing** - Verify GUI appearance programmatically
## 🏗️ System Architecture
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   MojoGUI App   │    │   Screenshot    │    │   Local LLaVA   │
│                 │───▶│     System      │───▶│      Model      │
│   (Advanced     │    │                 │    │                 │
│    Widgets)     │    │ • pyautogui     │    │ • Ollama Server │
└─────────────────┘    │ • PIL/ImageGrab │    │ • Vision Model  │
                       │ • ImageMagick   │    │ • API Interface │
                       └─────────────────┘    └─────────────────┘
                               │
                               ▼
                       ┌─────────────────┐
                       │   AI Analysis   │
                       │                 │
                       │ • Natural Lang. │
                       │ • GUI Feedback  │
                       │ • Issue Reports │
                       └─────────────────┘
```
## 🚀 Installation and Setup
### Step 1: Install Ollama
```bash
# Install Ollama (AI model runner)
curl -fsSL https://ollama.ai/install.sh | sh
```

### Step 2: Download the LLaVA Model
```bash
# Download the vision model (this may take several minutes)
ollama pull llava
```

### Step 3: Start the Ollama Server
```bash
# Start the local AI server
ollama serve
# Server runs on http://localhost:11434
```

### Step 4: Install Python Dependencies
```bash
# Install screenshot and API libraries
pip install requests pillow pyautogui
```

### Step 5: Test the Setup
```bash
# Test that everything is working
python3 ollama_vision_setup.py
```
## 📦 Core Components

### 1. `ollama_vision_setup.py` - Main Setup and Interface
The core class that manages Ollama and LLaVA integration:

```python
from ollama_vision_setup import OllamaVision

# Create the vision interface
vision = OllamaVision(model_name='llava')

# Setup (downloads the model if needed)
if vision.setup():
    # Analyze an image
    response = vision.ask_vision("screenshot.png", "What do you see?")
    print(response)
```

Key methods:
- `setup()` - Initialize Ollama and download LLaVA if needed
- `ask_vision(image_path, question)` - Analyze an image with AI
- `check_ollama_running()` - Verify Ollama server status
- `test_vision()` - Test the vision model
### 2. `ai_vision_integration.py` - User-Provided Pattern
Simple, direct integration pattern provided by the user:

```python
import requests
import base64

def ask_local_vision(image_path, question):
    with open(image_path, 'rb') as f:
        image_data = base64.b64encode(f.read()).decode('utf-8')
    response = requests.post('http://localhost:11434/api/generate',
        json={
            'model': 'llava',
            'prompt': question,
            'images': [image_data]
        })
    return response.json()['response']
```
### 3. `ai_vision_debug.py` - Complete Debug Workflow
Comprehensive debugging that:
- Runs a MojoGUI test with high-contrast colors
- Takes a screenshot automatically
- Analyzes it with multiple diagnostic questions
- Provides detailed AI feedback
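The workflow above could be sketched roughly like this (a hypothetical helper, not the actual contents of `ai_vision_debug.py`; the question list and the `vision`/`take_screenshot` arguments are placeholders):

```python
import time

# Placeholder diagnostics; the real script uses its own question set
DIAGNOSTIC_QUESTIONS = [
    "Is any text visible and readable on screen?",
    "Are widget borders rendered with high contrast?",
    "Do you see any blank or corrupted regions?",
]

def run_vision_debug(vision, take_screenshot, wait_seconds=3):
    """Capture the running GUI and collect AI feedback for each question."""
    # Assumes the MojoGUI test app was already launched with high-contrast colors
    time.sleep(wait_seconds)           # let the GUI finish rendering
    screenshot = take_screenshot()     # path to the captured image
    return {q: vision.ask_vision(screenshot, q) for q in DIAGNOSTIC_QUESTIONS}
```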
### 4. Vision Test Scripts
- `simple_vision_test.py` - Basic screenshot and analysis
- `manual_vision_test.py` - Manual testing workflow
- `full_vision_analysis.py` - Comprehensive analysis
## 💡 Usage Examples

### Basic Vision Analysis
```python
from ollama_vision_setup import OllamaVision

# Setup AI vision
vision = OllamaVision()
vision.setup()

# Take a screenshot (your preferred method)
# screenshot_path = take_screenshot()

# Analyze the GUI
response = vision.ask_vision("gui_screenshot.png",
    "What widgets and UI elements are visible in this GUI?")
print(f"AI sees: {response}")
```

### GUI Debugging Workflow
```python
# 1. Run your MojoGUI application
#    mojo advanced_widgets_demo.mojo &

# 2. Take a screenshot after the GUI stabilizes
import time
time.sleep(3)  # Let the GUI render

# 3. Analyze with specific questions
questions = [
    "Are all text labels clearly readable?",
    "Do you see any rendering issues or visual bugs?",
    "What colors are used in the interface?",
    "Are the buttons and panels properly aligned?"
]

for question in questions:
    answer = vision.ask_vision("screenshot.png", question)
    print(f"Q: {question}")
    print(f"A: {answer}\n")
```

### Professional GUI Verification
```python
# Verify that specific widgets are working
verification_questions = [
    "Do you see a docking panel system with left, right, and bottom panels?",
    "Are there accordion sections that can expand and collapse?",
    "Is there a toolbar with buttons like New, Open, Save, Bold, Italic?",
    "Do you see a floating panel that can be dragged?",
    "Are all text elements using professional TTF fonts?"
]

for question in verification_questions:
    response = vision.ask_vision("screenshot.png", question)
    if "yes" in response.lower():
        print(f"✅ PASS: {question}")
    else:
        print(f"❌ ISSUE: {question}")
        print(f"   AI Response: {response}")
```
## 📖 API Reference

### OllamaVision Class
```python
class OllamaVision:
    def __init__(self, model_name='llava', base_url='http://localhost:11434')
    def setup() -> bool
    def ask_vision(image_path: str, question: str) -> str
    def check_ollama_running() -> bool
    def list_models() -> list
    def download_model(model_name: str) -> bool
    def test_vision() -> bool
```

### Core Functions
```python
# User-provided simple pattern
def ask_local_vision(image_path: str, question: str) -> str

# Screenshot functions
def take_screenshot() -> str  # Returns path to screenshot
def take_screenshot_pyautogui() -> str
def take_screenshot_pil() -> str
```
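For reference, a minimal sketch of what a PIL-based screenshot helper might look like (an assumption, not the actual contents of the scripts above; `PIL.ImageGrab.grab()` works on Windows and macOS, and on X11 Linux with recent Pillow):

```python
import time

def screenshot_filename(prefix="gui_screenshot"):
    # Timestamped name so repeated captures don't overwrite each other
    return f"{prefix}_{time.strftime('%Y%m%d_%H%M%S')}.png"

def take_screenshot_pil():
    # Imported lazily so the name helper above has no Pillow dependency
    from PIL import ImageGrab
    path = screenshot_filename()
    ImageGrab.grab().save(path)
    return path
```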
### Common Questions for GUI Analysis
```python
# Layout and structure
"What widgets and UI elements are visible?"
"Describe the overall layout and organization."
"Are there any panels, toolbars, or menus?"

# Visual quality
"Are all text elements clearly readable?"
"What colors are used in the interface?"
"Do you see any rendering issues or visual bugs?"

# Specific widget verification
"Do you see buttons labeled New, Open, Save?"
"Are there any accordion sections or collapsible panels?"
"Is there a docking system with moveable panels?"

# Professional assessment
"Does this look like a professional desktop application?"
"What desktop application does this interface remind you of?"
"Is the visual design modern and consistent?"
```
## 🔬 Advanced Usage

### Custom Screenshot Integration
```python
def custom_screenshot_analysis():
    # Your custom screenshot method
    screenshot_path = my_screenshot_function()

    # Analyze with AI
    vision = OllamaVision()

    # Multiple analysis passes
    layout_feedback = vision.ask_vision(screenshot_path,
        "Analyze the layout and organization of this GUI interface.")
    color_feedback = vision.ask_vision(screenshot_path,
        "Evaluate the color scheme and visual consistency.")
    usability_feedback = vision.ask_vision(screenshot_path,
        "From a usability perspective, how intuitive is this interface?")

    return {
        'layout': layout_feedback,
        'colors': color_feedback,
        'usability': usability_feedback
    }
```

### Automated Testing Integration
```python
import time

def automated_gui_test():
    """Run an automated GUI test with AI verification."""
    # Start your GUI application
    start_gui_application()

    # Wait for rendering
    time.sleep(3)

    # Take a screenshot
    screenshot = take_screenshot()

    # Define test criteria
    test_cases = [
        ("Widget Visibility", "Are all expected widgets visible and properly rendered?"),
        ("Text Readability", "Is all text clear and readable?"),
        ("Layout Quality", "Is the layout professional and well-organized?"),
        ("Color Scheme", "Are colors consistent and appropriate?"),
        ("Interactive Elements", "Do buttons and controls look clickable and functional?")
    ]

    # Run AI analysis for each test case
    results = {}
    vision = OllamaVision()
    for test_name, question in test_cases:
        response = vision.ask_vision(screenshot, question)
        results[test_name] = {
            'question': question,
            'ai_response': response,
            'passed': 'yes' in response.lower() and 'no' not in response.lower()
        }
    return results
```
### Integration with Existing Test Frameworks
```python
import unittest

class AIVisionGUITests(unittest.TestCase):
    def setUp(self):
        self.vision = OllamaVision()
        self.vision.setup()

    def test_widget_rendering(self):
        """Test that all widgets render correctly."""
        screenshot = self.take_test_screenshot()
        response = self.vision.ask_vision(screenshot,
            "Are all GUI widgets visible and properly rendered?")
        self.assertIn("yes", response.lower())
        self.assertNotIn("missing", response.lower())

    def test_professional_appearance(self):
        """Test that the GUI looks professional."""
        screenshot = self.take_test_screenshot()
        response = self.vision.ask_vision(screenshot,
            "Does this interface look professional and modern?")
        self.assertIn("professional", response.lower())
```
## 🔧 Troubleshooting

### Common Issues and Solutions

**1. Ollama Not Running**
```bash
# Error: Connection refused
# Solution: start the Ollama server
ollama serve
```

**2. LLaVA Model Not Found**
```bash
# Error: model not found
# Solution: download the model
ollama pull llava
```

**3. Screenshot Failed**
```bash
# Error: no screenshot library
# Solution: install dependencies
pip install pyautogui pillow
# On Linux you may also need:
sudo apt install gnome-screenshot
```

**4. Image Analysis Failed**
```python
import os
from PIL import Image

# Check that the image file exists and is readable
if not os.path.exists(image_path):
    print(f"Image not found: {image_path}")

# Check the image format
try:
    img = Image.open(image_path)
    print(f"Image: {img.size}, {img.format}")
except Exception as e:
    print(f"Invalid image: {e}")
```

**5. API Connection Issues**
```python
# Test the Ollama connection
import requests
try:
    response = requests.get('http://localhost:11434/api/tags')
    print(f"Ollama status: {response.status_code}")
except Exception as e:
    print(f"Connection failed: {e}")
```
### Debugging AI Vision
```python
def debug_vision_system():
    """Debug the AI vision system step by step."""
    print("🔍 Debugging AI Vision System")

    # 1. Check Ollama
    vision = OllamaVision()
    if vision.check_ollama_running():
        print("✅ Ollama is running")
    else:
        print("❌ Ollama not running - start with: ollama serve")
        return

    # 2. Check the model
    if vision.model_exists():
        print("✅ LLaVA model available")
    else:
        print("❌ LLaVA model missing - download with: ollama pull llava")
        return

    # 3. Test vision
    if vision.test_vision():
        print("✅ Vision system working")
    else:
        print("❌ Vision test failed")
        return

    print("🎯 AI Vision system is fully operational!")
```
### Performance Optimization
```python
from PIL import Image

# Optimize screenshot size for faster analysis
def optimize_screenshot(image_path, max_size=1024):
    img = Image.open(image_path)
    if max(img.size) > max_size:
        # Resize while maintaining aspect ratio
        img.thumbnail((max_size, max_size), Image.Resampling.LANCZOS)
        optimized_path = image_path.replace('.png', '_optimized.png')
        img.save(optimized_path)
        return optimized_path
    return image_path

# Use the optimized image for faster AI analysis
screenshot = take_screenshot()
optimized_screenshot = optimize_screenshot(screenshot)
response = vision.ask_vision(optimized_screenshot, question)
```
## 📝 Best Practices

### 1. Effective Questions
- Be specific about what you want to know
- Ask one question at a time for clarity
- Use descriptive language the AI can understand

### 2. Screenshot Quality
- Ensure the GUI is fully rendered before taking a screenshot
- Use high-contrast colors for better AI recognition
- Avoid overlapping windows and visual clutter

### 3. Error Handling
- Always check whether the screenshot succeeded
- Handle API timeouts gracefully
- Validate AI responses for consistency
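The "handle API timeouts gracefully" advice could look like this, based on the user-provided `ask_local_vision` pattern (a sketch, not part of the shipped scripts; `stream: False` is a real Ollama API field that returns one JSON object instead of streamed chunks, and the 120-second timeout is a guess for CPU inference):

```python
import base64

def build_vision_payload(image_path, question, model="llava"):
    # The Ollama /api/generate endpoint expects base64-encoded images
    with open(image_path, "rb") as f:
        image_data = base64.b64encode(f.read()).decode("utf-8")
    return {"model": model, "prompt": question,
            "images": [image_data], "stream": False}

def ask_local_vision_safe(image_path, question, timeout=120):
    """Like ask_local_vision, but returns None instead of raising on failure."""
    # Imported here so the payload helper above has no third-party dependency
    import requests
    try:
        response = requests.post("http://localhost:11434/api/generate",
                                 json=build_vision_payload(image_path, question),
                                 timeout=timeout)
        response.raise_for_status()
        return response.json().get("response")
    except requests.RequestException as e:
        print(f"Vision query failed: {e}")
        return None
```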
### 4. Performance
- Resize large screenshots for faster analysis
- Cache frequently used model responses
- Use appropriate timeout values
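The "cache frequently used model responses" tip could be sketched like this (a hypothetical helper, not part of the shipped scripts; keying on the file's modification time means a fresh screenshot at the same path invalidates old answers):

```python
import os

_response_cache = {}

def cached_ask(vision, image_path, question):
    # A new screenshot at the same path changes mtime, so it misses the cache
    key = (image_path, os.path.getmtime(image_path), question)
    if key not in _response_cache:
        _response_cache[key] = vision.ask_vision(image_path, question)
    return _response_cache[key]
```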
## 🔮 Future Enhancements

Potential improvements to the AI vision system:
- **Multi-Model Support** - Support for other vision models beyond LLaVA
- **Automated Testing** - Integration with CI/CD pipelines
- **Visual Regression Testing** - Compare screenshots over time
- **Performance Metrics** - Measure GUI rendering performance
- **Accessibility Analysis** - AI-powered accessibility auditing
## 📁 Files in the AI Vision System

```
📁 AI Vision Files:
├── 🤖 ollama_vision_setup.py       # Main setup and interface
├── 👁️ ai_vision_debug.py           # Complete debug workflow
├── 🔧 ai_vision_integration.py     # User-provided pattern
├── 📸 simple_ai_vision.py          # Simple screenshot analysis
├── 🧪 vision_debug.py              # Vision-assisted debugging
├── 📋 manual_vision_test.py        # Manual testing workflow
├── 🎯 full_vision_analysis.py      # Comprehensive analysis
├── 📊 ai_vision_analysis.py        # Advanced widget analysis
├── 📚 AI_VISION_DOCUMENTATION.md   # This documentation
└── 🎮 vision_test_demo.mojo        # GUI test for vision analysis
```
## 🎉 Conclusion

The AI Vision system represents a breakthrough in GUI debugging and testing. By leveraging modern AI vision models like LLaVA, developers can get human-like feedback about their graphical interfaces, identifying issues that traditional debugging might miss.

**Key Benefits:**
- **Human-like analysis** of GUI appearance and functionality
- **Automated visual testing** capabilities
- **Natural language feedback** about rendering issues
- **Easy integration** with existing development workflows
- **Scalable testing** for complex GUI applications

Start using AI Vision today to enhance your MojoGUI development process!