Intelligent OCR System · Batch Processing · Multi-Mode Support · Bounding Box Visualization
Features • Quick Start • Version History • Documentation • Contributing
🍎 Now fully supports Mac M1/M2/M3/M4 with native MPS acceleration!
DeepSeek-OCR-WebUI v3.3 brings native Apple Silicon support, enabling Mac users to run high-performance OCR locally with:
- ✅ Native MPS Backend - Metal Performance Shaders acceleration
- ✅ Easy Setup - One-command conda environment installation
- ✅ Private Deployment - Run completely offline on your Mac
- ✅ Fast Inference - ~3s per image on M3 Pro
👉 Jump to Mac Deployment Guide
DeepSeek-OCR-WebUI is an intelligent image recognition web application based on the DeepSeek-OCR model, providing an intuitive user interface and powerful recognition capabilities.
- 🎯 7 Recognition Modes - Document, OCR, Chart, Find, Freeform, etc.
- 🖼️ Bounding Box Visualization - Find mode automatically annotates positions
- 📦 Batch Processing - Sequential recognition of multiple images in one run
- 📄 PDF Support - Upload PDF files, automatically convert to images
- 🎨 Modern UI - Cool gradient backgrounds and animation effects
- 🌐 Multilingual Support - Simplified Chinese, Traditional Chinese, English, Japanese
- 🍎 Apple Silicon Support - Native MPS acceleration for Mac M1/M2/M3/M4
- 🐳 Docker Deployment - One-click startup, ready to use
- ⚡ GPU Acceleration - High-performance inference based on NVIDIA GPU
- 🌏 ModelScope Fallback - Auto-switch to ModelScope when HuggingFace is unavailable
| Mode | Icon | Description | Use Cases |
|---|---|---|---|
| Doc to Markdown | 📄 | Preserve format and layout | Contracts, papers, reports |
| General OCR | 📝 | Extract all visible text | Image text extraction |
| Plain Text | 📋 | Pure text without format | Simple text recognition |
| Chart Parser | 📊 | Recognize charts and formulas | Data charts, math formulas |
| Image Description | 🖼️ | Generate detailed descriptions | Image understanding, accessibility |
| Find & Locate ⭐ | 🔍 | Find and annotate positions | Invoice field locating |
| Custom Prompt ⭐ | ✨ | Customize recognition needs | Flexible recognition tasks |
DeepSeek-OCR-WebUI now supports PDF file uploads! When you upload a PDF file, it automatically converts each page to a separate image, maintaining all subsequent processing logic (OCR recognition, batch processing, etc.).
Key Features:
- Multi-page PDF Conversion: Automatically converts each page to a separate image
- Real-time Progress: Shows conversion progress page by page
- Drag & Drop: Support drag & drop PDF upload
- Find Mode: PDF support in Find mode (uses first page automatically)
- Format Validation: Automatic file type detection and error prompts
- Seamless Integration: Converted images follow the same processing pipeline as regular images
- Auto-Switch: Automatically switches to ModelScope when HuggingFace is unavailable
- Smart Detection: Intelligently detects network errors and timeouts
- China-Friendly: Seamless experience for users in mainland China
- 5-minute Timeout: Configurable timeout for model loading
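The fallback logic reduces to a try/except around the download. A generic sketch (the loader callables are placeholders, not the project's actual functions):

```python
def load_with_fallback(load_primary, load_fallback,
                       network_errors=(ConnectionError, TimeoutError, OSError)):
    """Try the primary hub (e.g. HuggingFace); on a network failure,
    fall back to the mirror (e.g. ModelScope)."""
    try:
        return load_primary()
    except network_errors as exc:
        print(f"Primary hub unavailable ({exc!r}); falling back to mirror")
        return load_fallback()
```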
Left-Right Split Layout:
┌──────────────────────┬─────────────────────────────┐
│ Left: Control Panel │ Right: Result Display │
├──────────────────────┼─────────────────────────────┤
│ 📤 Image Upload │ 🖼️ Result Image (with boxes) │
│ 🎯 Search Input │ 📊 Statistics │
│ 🚀 Action Buttons │ 📝 Recognition Text │
│ │ 📦 Match List │
└──────────────────────┴─────────────────────────────┘
Bounding Box Visualization:
- 🟢 Colorful neon border auto-annotation
- 🎨 6 colors in rotation
- 📍 Precise coordinate positioning
- 🔄 Responsive auto-redraw
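Drawing the boxes requires rescaling model-space coordinates to display pixels. The sketch below assumes boxes arrive on a normalized grid (0..999, as in the upstream model's grounding output) and shows the 6-color rotation; both the helper and the palette values are illustrative:

```python
def to_pixel_box(box, img_w, img_h, grid=1000):
    """Rescale a normalized [x1, y1, x2, y2] box (assumed 0..grid-1) to image pixels."""
    x1, y1, x2, y2 = box
    sx, sy = img_w / grid, img_h / grid
    return (round(x1 * sx), round(y1 * sy), round(x2 * sx), round(y2 * sy))

# Rotate through a fixed palette so adjacent boxes get distinct neon colors
PALETTE = ["#39ff14", "#00e5ff", "#ff3cac", "#ffe600", "#ff6f00", "#b388ff"]

def box_color(index):
    return PALETTE[index % len(PALETTE)]
```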
- 🇨🇳 Simplified Chinese (zh-CN)
- 🇹🇼 Traditional Chinese (zh-TW)
- 🇺🇸 English (en-US) - Default
- 🇯🇵 Japanese (ja-JP)
Web UI:
- Click the language selector in the top-right corner
- Select your desired language
- Interface switches immediately, settings auto-save
For Docker (Recommended):
- Docker & Docker Compose
- NVIDIA GPU + Drivers (for GPU acceleration)
- 8GB+ RAM
- 20GB+ Disk Space
For Mac (Apple Silicon):
- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.11+
- 16GB+ RAM (recommended)
- 20GB+ Disk Space
For Linux (Native):
- Python 3.11+
- NVIDIA GPU + CUDA (optional, for acceleration)
- 8GB+ RAM
- 20GB+ Disk Space
Best for: Linux servers with NVIDIA GPU, production environments
```bash
# 1. Clone repository
git clone https://github.com/neosun100/DeepSeek-OCR-WebUI.git
cd DeepSeek-OCR-WebUI

# 2. Start service
docker compose up -d

# 3. Wait for model loading (about 1-2 minutes)
docker logs -f deepseek-ocr-webui

# 4. Access Web UI
# The service listens on all network interfaces (0.0.0.0:8001):
# - Local access:  http://localhost:8001
# - LAN access:    http://<server-ip>:8001
# - Domain access: http://<your-domain>:8001 (if configured)
```

Access Methods:
- Local machine: `http://localhost:8001`
- Remote server (no domain): `http://<server-ip>:8001`
  - Find your IP: `hostname -I` or `ip addr show`
  - Example: if the IP is `192.168.1.100`, access `http://192.168.1.100:8001`
- With a domain: `http://<your-domain>:8001` or `https://<your-domain>`
  - Configure your reverse proxy (nginx/caddy) to forward to `localhost:8001`
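For the domain option, a minimal nginx server block might look like this (a sketch assuming nginx; adjust `server_name`, TLS, and limits to your setup):

```nginx
server {
    listen 80;
    server_name your-domain.example;

    location / {
        proxy_pass http://localhost:8001;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        # OCR on large images can be slow; allow generous read timeouts
        proxy_read_timeout 300s;
        client_max_body_size 50m;  # room for multi-page PDF uploads
    }
}
```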
Best for: Mac M1/M2/M3/M4 users, local development
```bash
# Clone repository
git clone https://github.com/neosun100/DeepSeek-OCR-WebUI.git
cd DeepSeek-OCR-WebUI

# Create and activate conda environment (REQUIRED)
conda create -n deepseek-ocr-mlx python=3.11
conda activate deepseek-ocr-mlx

# Install PyTorch with MPS support
pip install torch torchvision

# Install required packages
pip install transformers==4.46.3 tokenizers==0.20.3
pip install fastapi uvicorn PyMuPDF Pillow
pip install einops addict easydict matplotlib

# Or install all dependencies at once
pip install -r requirements-mac.txt

# Verify installation (optional)
./verify_mac_env.sh
```

```bash
# IMPORTANT: Always activate the conda environment first
conda activate deepseek-ocr-mlx

# Start service (auto-detects MPS backend)
./start.sh

# Or manually
python web_service_unified.py
```

Access Methods:
- Local machine: `http://localhost:8001`
- Remote server: `http://<server-ip>:8001`
  - Find your IP: `ifconfig | grep "inet "` or `ip addr show`
  - Example: if the IP is `192.168.1.100`, access `http://192.168.1.100:8001`
- With a domain: configure your reverse proxy to forward to `localhost:8001`
Note: First run will download ~7GB model, please be patient.
Best for: Linux servers, custom configurations
```bash
# Install PyTorch with CUDA
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Install dependencies
pip install transformers==4.46.3 tokenizers==0.20.3
pip install fastapi uvicorn PyMuPDF Pillow
pip install einops addict easydict matplotlib

# Start service (auto-detects CUDA backend)
./start.sh
```

```bash
# Install PyTorch CPU version
pip install torch torchvision

# Install dependencies
pip install transformers==4.46.3 tokenizers==0.20.3
pip install fastapi uvicorn PyMuPDF Pillow
pip install einops addict easydict matplotlib

# Start service (auto-detects CPU backend)
./start.sh
```

```bash
# Check container status (Docker)
docker compose ps

# Check health status
curl http://localhost:8001/health

# Expected response:
# {
#   "status": "healthy",
#   "backend": "mps",      # or "cuda" or "cpu"
#   "platform": "Darwin",  # or "Linux"
#   "model_loaded": true
# }
```

The service automatically detects your platform and uses the optimal backend:
| Platform | Backend | Acceleration | Auto-Detected |
|---|---|---|---|
| Mac M1/M2/M3/M4 | MPS | Metal GPU | ✅ Yes |
| Linux + NVIDIA GPU | CUDA | CUDA GPU | ✅ Yes |
| Linux (CPU only) | CPU | None | ✅ Yes |
| Docker | CUDA | CUDA GPU | ✅ Yes |
Force specific backend (optional):

```bash
FORCE_BACKEND=mps ./start.sh   # Force MPS (Mac only)
FORCE_BACKEND=cuda ./start.sh  # Force CUDA (Linux + GPU)
FORCE_BACKEND=cpu ./start.sh   # Force CPU (any platform)
```
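Under the hood, the choice reduces to a priority check (MPS > CUDA > CPU) plus the `FORCE_BACKEND` override. A pure-Python sketch, with availability flags injected in place of the real `torch.backends.mps.is_available()` / `torch.cuda.is_available()` calls the service presumably makes:

```python
import os

def pick_backend(mps_available: bool, cuda_available: bool, env=os.environ) -> str:
    """Choose MPS > CUDA > CPU, unless FORCE_BACKEND overrides detection."""
    forced = env.get("FORCE_BACKEND")
    if forced in ("mps", "cuda", "cpu"):
        return forced
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda"
    return "cpu"
```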
```bash
# Check container status
docker compose ps

# Check health status
curl http://localhost:8001/health

# View logs
docker logs deepseek-ocr-webui
```

🍎 Apple Silicon Support:
- ✅ Native MPS (Metal Performance Shaders) backend for Mac M1/M2/M3/M4
- ✅ Automatic platform detection and backend selection
- ✅ Optimized float32 precision for MPS compatibility
- ✅ ~7GB model with automatic download and caching
🌍 Multi-Platform Architecture:
- ✅ Unified backend interface (MPS/CUDA/CPU)
- ✅ Smart platform detection (Mac/Linux/Docker)
- ✅ Independent backend implementations (no conflicts)
- ✅ Universal startup script (`./start.sh`)
🔧 Technical Improvements:
- ✅ Model revision: `1e3401a3d4603e9e71ea0ec850bfead602191ec4` (MPS support)
- ✅ Transformers 4.46.3 compatibility
- ✅ Fixed LlamaFlashAttention2 import issues
- ✅ Unified model inference interface across platforms
📚 Documentation:
- ✅ Multi-platform deployment guide
- ✅ Platform compatibility documentation
- ✅ Verification tools (`verify_platform.sh`)
📄 New Features:
- ✅ PDF upload support (auto-convert to images)
- ✅ Multi-page PDF conversion with real-time progress
- ✅ Drag & drop PDF upload
- ✅ ModelScope auto-fallback (when HuggingFace unavailable)
- ✅ Smart network error detection and retry
🐛 Bug Fixes:
- ✅ Fixed PDF conversion progress logging
- ✅ Fixed button text duplication in i18n
- ✅ Fixed system initialization log information
🔧 Technical Improvements:
- ✅ PyMuPDF integration for high-quality PDF conversion (144 DPI)
- ✅ Async PDF processing for real-time progress
- ✅ Enhanced error handling and logging
🌐 New Features:
- ✅ Added multilingual support (Simplified Chinese, Traditional Chinese, English, Japanese)
- ✅ Language selector UI component
- ✅ Localization persistence storage
- ✅ Multilingual documentation (README)
🐛 Bug Fixes:
- ✅ Fixed mode switching issues
- ✅ Fixed bounding boxes exceeding image boundaries
- ✅ Optimized image container layout
- ✅ Added rendering delay for alignment
🎨 UI Optimization:
- ✅ Centered image display
- ✅ Responsive bounding box redraw
- ✅ Language switcher integration
✨ Major Updates:
- ✅ New Find mode (find & locate)
- ✅ Dedicated left-right split layout
- ✅ Canvas bounding box visualization
- ✅ Colorful neon annotation effects
🔧 Technical Improvements:
- ✅ transformers engine (replacing vLLM)
- ✅ Precise coordinate conversion algorithm
- ✅ Responsive design optimization
Scenario: Find the "Total" amount in an invoice

Steps:
1. Select "🔍 Find & Locate" mode
2. Upload the invoice image
3. Enter the search term: Total
4. Click "🚀 Start Search"

Results:
✓ "Total" marked with a green border on the image
✓ Shows 1-2 matches found
✓ Provides precise coordinate information

Scenario: Batch recognize 20 contracts

Steps:
1. Select "📄 Doc to Markdown" mode
2. Drag and upload 20 images
3. Adjust the order (optional)
4. Click "🚀 Start Recognition"

Results:
✓ Processes each image sequentially
✓ Real-time progress display
✓ Auto-merges all results
✓ One-click copy or download
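The batch steps above reduce to a simple loop with progress reporting and a merged result. This sketch is illustrative only; the `recognize` callable and the merge format stand in for the WebUI's actual pipeline:

```python
def batch_recognize(image_paths, recognize, on_progress=None):
    """Run recognition sequentially and merge results.

    `recognize(path) -> str` is a placeholder for the real OCR call;
    `on_progress(done, total)` mimics the UI's progress display.
    """
    merged = []
    for i, path in enumerate(image_paths, start=1):
        text = recognize(path)
        merged.append(f"## Image {i}: {path}\n\n{text}")
        if on_progress:
            on_progress(i, len(image_paths))
    return "\n\n---\n\n".join(merged)
```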
```bash
# Environment variables (docker-compose.yml)
API_HOST=0.0.0.0                     # Listen address
MODEL_NAME=deepseek-ai/DeepSeek-OCR  # Model name
CUDA_VISIBLE_DEVICES=0               # GPU device
```

```yaml
# Memory configuration
shm_size: "8g"  # Shared memory

# GPU configuration
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
```

Contributions welcome! Please check the Contributing Guide.
- Fork this repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
- Check Troubleshooting
- Check Known Issues
- Submit an Issue
- Check Roadmap
- Submit a Feature Request
This project is licensed under the MIT License.
- DeepSeek-AI - DeepSeek-OCR model
- deepseek_ocr_app - Reference project
- All contributors and users
⭐ If this project helps you, please give it a Star! ⭐
Made with ❤️ by neosun100
DeepSeek-OCR-WebUI v3.3 | © 2025



