by Neverdecel
CodeRAG is an AI-powered tool for real-time codebase querying and augmentation using OpenAI and vector search.
# Add to your Claude Code skills
git clone https://github.com/Neverdecel/CodeRAGNote: This POC was innovative for its time, but modern tools like Cursor and Windsurf now apply this principle directly in IDEs. This remains an excellent educational project for understanding RAG implementation.
CodeRAG combines Retrieval-Augmented Generation (RAG) with AI to provide intelligent coding assistance. Instead of limited context windows, it indexes your entire codebase and provides contextual suggestions based on your complete project.
Most coding assistants work with limited scope, but CodeRAG provides the full context of your project by:
# Clone the repository
git clone https://github.com/your-username/CodeRAG.git
cd CodeRAG
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\\Scripts\\activate
# Install dependencies (installs the package with dev extras)
pip install -r requirements.txt
# Configure environment
cp example.env .env
# Edit .env with your OpenAI API key and settings
No comments yet. Be the first to share your thoughts!
The requirements file simply references
-e .[dev]; feel free to runpip install -e .[dev]directly if you prefer editable installs.
Create a .env file with your settings:
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_EMBEDDING_MODEL=text-embedding-ada-002
OPENAI_CHAT_MODEL=gpt-4
WATCHED_DIR=/path/to/your/code/directory
FAISS_INDEX_FILE=./coderag_index.faiss
EMBEDDING_DIM=1536
# Start the backend (indexing and monitoring)
python main.py
# In a separate terminal, start the web interface
streamlit run app.py
# Query the local index from the terminal (after indexing completes)
coderag-cli "how is faiss configured?"
graph LR
A[Code Files] --> B[File Monitor]
B --> C[OpenAI Embeddings]
C --> D[FAISS Vector DB]
E[User Query] --> F[Semantic Search]
D --> F
F --> G[Retrieved Context]
G --> H[OpenAI GPT]
H --> I[AI Response]
CodeRAG/
├── 🧠 coderag/ # Core RAG functionality
│ ├── config.py # Environment configuration
│ ├── embeddings.py # OpenAI embedding generation
│ ├── index.py # FAISS vector operations
│ ├── search.py # Semantic code search
│ └── monitor.py # File system monitoring
├── 🌐 app.py # Streamlit web interface
├── 🔧 main.py # Backend indexing service
├── 🔗 prompt_flow.py # RAG pipeline orchestration
└── 📋 requirements.txt # Dependencies
"How does the FAISS indexing work in this codebase?"
"Where is error handling implemented?"
"Show me examples of the embedding generation process"
"How can I optimize the search performance?"
"What are potential security issues in this code?"
"Suggest better error handling for the monitor module"
"Why might the search return no results?"
"How do I troubleshoot OpenAI connection issues?"
"What could cause indexing to fail?"
# Install pre-commit hooks
pip install pre-commit
pre-commit install
pre-commit run --all-files
# Test FAISS index functionality
python tests/test_faiss.py
# Test individual components
python scripts/initialize_index.py
python scripts/run_monitor.py
Search returns no results
coderag_index.faiss fileOpenAI API errors
.env fileFile monitoring not working
WATCHED_DIR path in .env.py filesgit checkout -b feature/amazing-feature)pre-commit run --all-files)git commit -m 'Add amazing feature')git push origin feature/amazing-feature)This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
⭐ If this project helps you, please give it a star!