This article gives a high-level description, design, code, and sample results for an agentic AI solution that leverages current agent frameworks, LLMs, prompt engineering, RAG, and related algorithms to automate a large git pull request for a code-refactoring-against-guidelines effort. It shows how a master orchestrator coordinates multiple worker agents that perform analysis tasks in parallel and then update the code using the generated diff files.
The objective is to demonstrate a master-worker agentic AI architecture that leverages LLMs, RAG, and standard agent frameworks to automate large manual or semi-automated tasks. It can be extended to other kinds of analysis, such as analyzing documents or data against requirements, in industries like finance, healthcare, technology, consulting, and legal, all of which analyze large amounts of data.
Many AI thought leaders and researchers doubt that agentic AI solutions can solve large, complex business problems at scale with consistent, replicable results. Current AI-enabled IDEs and coding tools also run into lots of issues when running multiple agents: they eventually crash, or produce incomplete or inconsistent analysis, due to their controlled, localized environment setups and lack of access to LLM API parameters. This solution overcomes these issues and can be scaled to thousands of agents.
Disclaimer: the design and code will require stress testing for scale, and most likely code updates to allow the orchestrator agent to work at that scale. Please adapt the design and code to the problem you are solving and enhance as required. I hope this helps people looking to solve industry problems using an agentic AI architecture. All feedback is welcome.
The Problem
Teams struggle to consistently enforce coding standards across large codebases. Manual reviews are slow, inconsistent, and don't scale, leading to technical debt, security risks, and poor maintainability. Depending on the size and type of the enterprise, the number of codebases, languages, technologies, and versions, and their scale, means the amount of code to be refactored can be huge, and analyzing it against different coding standards could require many person-years of effort. Done manually, it is a very difficult and costly exercise.
The Idea
An AI-powered agentic system that automatically scans, analyzes, and refactors code files against custom coding guidelines, then creates Git pull requests with fixes, all without human intervention. It turns a laborious, complex, and error-prone exercise into an automated solution in which multiple agents run in parallel in the background and perform their tasks with minimal human intervention, freeing up enterprise resources for critical business activities.
The Solution
- Input: Folder or Git repo URL + coding guidelines (CSV/text/web)
- AI Agents:
  - Master Orchestrator (LangGraph) coordinates
  - Parallel Worker Agents analyze one file each
- Analysis Engine:
  - Parses code with AST
  - Uses RAG + cosine similarity in PostgreSQL (pgvector) to find relevant rules
  - Prompts the LLM to generate precise diff fixes
  - Uses a Tree-of-Thought + Chain-of-Thought prompting strategy for multi-path planning and multi-step execution
- Output:
  - .diff files per updated file
  - New Git branch + Pull Request with refactored code
Technical Architecture
- RAG Pipeline: Coding guidelines stored as embeddings in PostgreSQL + pgvector
- AST Parser: Deep structural analysis of Python code
- LLM (OpenAI/Anthropic): Context-aware violation detection and fix generation
- LangGraph: Orchestrates parallel analysis via master-worker pattern
- Git Automation: Creates branches and PRs with diffs
Business Benefit
Drastically reduces review time, enforces standards at scale, improves code quality, accelerates development velocity, and integrates seamlessly into CI/CD, all in a secure, containerized, enterprise-ready system.
High Level Architecture Diagram

High Level Technical Architecture

Step-by-Step Design & Functionality
Step 1: Ingest & Store Coding Guidelines (RAG Setup)
- Input: Company-specific coding standards (e.g., PEP 8, internal style guides, security rules).
- Process:
  - Split guidelines into logical chunks (by rule, section, or pattern).
  - Generate vector embeddings using a sentence transformer or LLM encoder.
  - Store embeddings + metadata (rule ID, severity, category) in PostgreSQL with pgvector.
- Output: A searchable knowledge base of enforceable rules.
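The chunking step above can be sketched as follows. This is a minimal illustration, assuming (hypothetically) that each rule in the guidelines document starts with a header like `E101: ... [severity: high]`; the actual guideline format and chunking logic in the repo may differ.

```python
import re

def chunk_guidelines(text: str) -> list[dict]:
    """Split a guidelines document into one chunk per rule."""
    chunks = []
    # One chunk per rule header; this pattern is an assumption about the format.
    pattern = re.compile(r"^([A-Z]\d+):\s*(.+?)\s*\[severity:\s*(\w+)\]", re.MULTILINE)
    for m in pattern.finditer(text):
        chunks.append({
            "rule_id": m.group(1),
            "text": m.group(2),
            "severity": m.group(3),
            "category": "style",  # placeholder; a real pipeline would classify each rule
        })
    return chunks

guidelines = """E101: Indentation contains mixed spaces and tabs [severity: high]
E501: Line too long (over 79 characters) [severity: low]"""
rules = chunk_guidelines(guidelines)
# Each chunk would then be embedded (e.g., with a sentence transformer) and
# inserted into a pgvector column alongside its metadata.
```

Each resulting chunk carries the rule ID, text, and severity metadata that Step 5 later retrieves by similarity search.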
Step 2: Receive Code for Refactoring
- Trigger: Git hook, CLI command, or API call with a list of Python files or repo URL.
- Input: Multiple .py files (local or from Git).
- Preprocessing:
  - Clone repo (if remote).
  - Filter Python files.
  - Create refactoring job queue.
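The preprocessing steps above can be sketched as a small job-queue builder. This is a simplified stand-in, not the repo's actual code; the set of directories to skip is an assumption and will be project-specific.

```python
from pathlib import Path
from queue import Queue

def build_job_queue(root: str) -> Queue:
    """Filter Python files under root and enqueue one refactoring job per file."""
    q: Queue = Queue()
    for path in sorted(Path(root).rglob("*.py")):
        # Skip vendored/virtualenv code; which dirs to exclude is project-specific.
        if any(part in {".venv", "site-packages", ".git"} for part in path.parts):
            continue
        q.put({"file": str(path), "status": "pending"})
    return q
```

For a remote repo, a `git clone <url>` (e.g., via `subprocess.run`) would run first, and `build_job_queue` would then be pointed at the clone directory.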
Step 3: Parallel File Processing (LangGraph Master-Worker)
- Orchestrator (Master): Uses LangGraph to manage workflow state and task distribution.
- Workers: Spin up parallel agents (one per file or batch).
- Scalability: Horizontal scaling via Docker/Kubernetes.
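The master-worker fan-out/fan-in shape can be sketched framework-agnostically with a thread pool. Note this is a stand-in for the LangGraph graph the solution actually uses: LangGraph adds checkpointed workflow state, retries, and task distribution on top of this same pattern, and the worker here is stubbed rather than running the real AST + RAG + LLM analysis.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def analyze_file(path: str) -> dict:
    """Worker: analyze one file (stubbed; the real worker runs AST + RAG + LLM)."""
    return {"file": path, "violations": []}

def orchestrate(paths: list[str], max_workers: int = 8) -> list[dict]:
    """Master: fan work out to parallel workers and gather the results."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(analyze_file, p): p for p in paths}
        for fut in as_completed(futures):
            results.append(fut.result())
    return results

reports = orchestrate(["a.py", "b.py", "c.py"])
```

Under Docker/Kubernetes, the same shape scales horizontally by replacing the thread pool with worker containers pulling from the job queue.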
Step 4: Parse Code with AST (Abstract Syntax Tree)
- Tool: Python’s built-in ast module.
- Actions:
  - Parse each .py file into an AST.
  - Traverse the tree to extract:
    - Function/class definitions
    - Variable names
    - Control flow
    - Imports
    - Docstrings
    - Complexity metrics (cyclomatic, nesting)
- Output: Structured representation of code logic and style.
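A minimal sketch of this extraction using the built-in `ast` module is below; it covers function/class names, imports, and docstrings, while variable names, control flow, and complexity metrics would follow the same `ast.walk` pattern.

```python
import ast

def extract_structure(source: str) -> dict:
    """Walk the AST and pull out functions, classes, imports, and docstrings."""
    tree = ast.parse(source)
    info = {"functions": [], "classes": [], "imports": [], "docstrings": []}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            info["functions"].append(node.name)
            doc = ast.get_docstring(node)
            if doc:
                info["docstrings"].append(doc)
        elif isinstance(node, ast.ClassDef):
            info["classes"].append(node.name)
        elif isinstance(node, ast.Import):
            info["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            info["imports"].append(node.module or "")
    return info

sample = '''
import os

def fetch_data(url):
    """Fetch data from a URL."""
    return os.path.basename(url)
'''
structure = extract_structure(sample)
```

The resulting structure feeds Step 5, where each extracted element becomes a RAG query.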
Step 5: Retrieve Relevant Guidelines via RAG
- For each code element (function, class, block):
  - Generate contextual query embeddings from AST metadata (e.g., “naming convention for async functions”).
  - Perform a vector similarity search in PostgreSQL/pgvector.
  - Retrieve the top-k matching rules with confidence scores.
- Enrichment: Attach rule text, examples, and fix templates.
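The similarity search can be expressed as a single pgvector query. The `guidelines` table and column names here are hypothetical (the repo's schema may differ); pgvector's `<=>` operator returns cosine distance, so similarity is `1 - distance`.

```python
# Hypothetical table: guidelines(rule_id, rule_text, severity, embedding vector(384)).
TOP_K_RULES_SQL = """
SELECT rule_id, rule_text, severity,
       1 - (embedding <=> %(query_vec)s::vector) AS similarity
FROM guidelines
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT %(k)s;
"""

def retrieve_rules(cursor, query_vec: list[float], k: int = 5) -> list[tuple]:
    """Run the top-k similarity search with a psycopg-style cursor."""
    cursor.execute(TOP_K_RULES_SQL, {"query_vec": query_vec, "k": k})
    return cursor.fetchall()
```

Ordering by the distance operator (rather than the computed similarity) lets PostgreSQL use a pgvector index on the `embedding` column.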
Step 6: AI-Powered Violation Detection & Fix Suggestion
- LLM Prompt (per file or function): {Prompt text}
- Output:
  - List of violations
  - Refactored code snippet (if fixable)
  - Confidence score
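An illustrative way to assemble such a prompt is sketched below. The wording is my own and the repo's actual prompt (elided above) will differ; the point is that the retrieved rules, the code, and a structured JSON response contract are all packed into one request.

```python
def build_analysis_prompt(code: str, rules: list[dict]) -> str:
    """Assemble an illustrative per-function analysis prompt."""
    rule_lines = "\n".join(f"- {r['rule_id']}: {r['text']}" for r in rules)
    return (
        "You are a code reviewer. Check the code against these rules:\n"
        f"{rule_lines}\n\n"
        f"Code:\n```python\n{code}\n```\n\n"
        "Respond with JSON: {\"violations\": [...], "
        "\"refactored_code\": \"...\", \"confidence\": 0.0}"
    )

prompt = build_analysis_prompt(
    "def F(x):return x",
    [{"rule_id": "N802", "text": "Function names should be lowercase"}],
)
# The LLM's JSON reply is then parsed (e.g., json.loads) into the violations
# list, refactored snippet, and confidence score listed above.
```

Requesting a fixed JSON schema makes the worker's parsing deterministic and lets low-confidence fixes be filtered out before diff generation.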
Step 7: Generate Unified Diffs
- For each file:
  - Apply LLM-suggested changes to the original code.
  - Use difflib or git apply --check to validate.
  - Generate a patch/diff file (unified format).
- Safety:
  - Syntax validation via ast.parse()
  - Optional: Run unit tests (if provided)
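The validate-then-diff step can be sketched with the standard library alone: `ast.parse` rejects syntactically broken LLM output before any diff is written, and `difflib.unified_diff` produces the patch.

```python
import ast
import difflib

def make_validated_diff(original: str, refactored: str, filename: str) -> str:
    """Syntax-check the refactored code, then emit a unified diff."""
    ast.parse(refactored)  # raises SyntaxError if the LLM broke the file
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        refactored.splitlines(keepends=True),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
    )
    return "".join(diff)

patch = make_validated_diff(
    "def F(x):\n    return x\n",
    "def f(x):\n    return x\n",
    "utils.py",
)
```

The `a/`/`b/` path prefixes match git's convention, so the resulting `.diff` file can be checked with `git apply --check` and applied in Step 8.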
Step 8: Git Integration & PR Automation
- Actions:
  - Create a new branch: refactor/ai-analysis-<timestamp>
  - Apply diffs to files
  - Commit with a structured message: {Prompt text}
  - Push the branch
  - Open a Pull Request via the GitHub/GitLab API with:
    - Summary of changes
    - Violation report
    - Link to full analysis log
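The branch/apply/commit/push sequence can be sketched as a command plan; a runner would pass each entry to `subprocess.run`. The patch filename and commit message here are illustrative placeholders (the article's actual commit message is elided above), and the PR itself is opened afterwards via the GitHub/GitLab REST API rather than the git CLI.

```python
import time

def plan_git_commands(branch_prefix: str = "refactor/ai-analysis") -> list[list[str]]:
    """Return the git command sequence; a runner passes each to subprocess.run."""
    branch = f"{branch_prefix}-{int(time.time())}"
    return [
        ["git", "checkout", "-b", branch],
        ["git", "apply", "changes.patch"],  # the generated .diff files
        ["git", "commit", "-am", "refactor: apply AI guideline fixes"],
        ["git", "push", "-u", "origin", branch],
    ]

commands = plan_git_commands()
```

Keeping the plan as data (rather than executing immediately) makes it easy to log, dry-run, or gate behind an approval step in the feedback loop of Step 9.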
Step 9: Monitoring, Logging & Feedback Loop
- Enterprise Features:
  - Dockerized deployment
  - Prometheus metrics (files/sec, violations found, PRs created)
  - Audit logs (who triggered, what changed)
  - Feedback UI: approve/reject AI suggestions → retrain RAG rankings
- Continuous Improvement:
  - User-accepted fixes reinforce RAG relevance
  - Rejected ones downrank bad rules
The complete codebase, with design, requirements, installation instructions, and all code files, is in the GitHub repo. It also contains some sample Python code files and the Python PEP 8 guidelines used for the code refactoring.
https://github.com/datawisdomx1/AgenticAI_AutomatedGitPR_CodeRefactoring_Guidelines
