Hi everyone, I'm Lao Zhang of AI Learning.
I've recently deployed and tested the latest OCR large models locally, and I also wrote a unified API to interface with all three of them.
Many readers have asked: which one should I pick?
Why would a grown-up still be making choices? Take them all, of course.
I put together a simple OCR model comparison tool with the FastAPI framework: given the same prompt plus an image/PDF, it uses Python multithreading to call the DeepSeek, Paddle, and Hunyuan model APIs in parallel and displays the parsed results side by side.
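The parallel fan-out pattern behind this can be sketched independently of the OCR services. A minimal sketch, where `fake_ocr` is a stand-in for the real HTTP call, not the actual backend:

```python
from concurrent.futures import ThreadPoolExecutor

def fake_ocr(model_name: str, prompt: str) -> str:
    # Stand-in for a real HTTP request to one OCR backend.
    return f"{model_name} parsed with prompt: {prompt}"

def compare_all(prompt: str) -> dict:
    """Submit one task per model and collect results keyed by model name."""
    models = ["DeepSeek-OCR", "PaddleOCR", "HunyuanOCR"]
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {name: pool.submit(fake_ocr, name, prompt) for name in models}
        # .result() blocks until that model's call has returned,
        # so all three run concurrently but we gather them in order.
        return {name: fut.result() for name, fut in futures.items()}

results = compare_all("Convert to markdown")
print(results["PaddleOCR"])
```

Because the three backends are independent, total latency is roughly the slowest single model rather than the sum of all three.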
The frontend is plain HTML + CSS + JS; to allow intranet deployment, it depends on no CDN.
Usage is simple: upload an image/PDF and enter a prompt; if you have no special requirements, the default works fine.
Then click Run OCR Comparison.
All three are fast. A lightweight Markdown parser is built in, so results are rendered automatically.
You can also switch to the raw recognized Markdown, with one-click copy.
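The in-page renderer is JS, but the idea of a "lightweight Markdown parser" can be illustrated with a few lines of Python. A minimal sketch covering only headings, bold, and inline code (the real renderer handles more):

```python
import re

def mini_markdown(text: str) -> str:
    """Render a tiny Markdown subset: headings, bold, inline code."""
    html_lines = []
    for line in text.splitlines():
        # Inline spans first: **bold** and `code`.
        line = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", line)
        line = re.sub(r"`(.+?)`", r"<code>\1</code>", line)
        # Then block structure: #..###### headings, else a paragraph.
        m = re.match(r"(#{1,6})\s+(.*)", line)
        if m:
            level = len(m.group(1))
            html_lines.append(f"<h{level}>{m.group(2)}</h{level}>")
        else:
            html_lines.append(f"<p>{line}</p>")
    return "\n".join(html_lines)

print(mini_markdown("# Title\n**bold** and `code`"))
```

A regex-based subset like this is enough for OCR output preview; a spec-compliant parser would need a proper block/inline two-phase pass.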
The core code is below (the full version runs close to 600 lines, mostly HTML):
My setup runs the models locally on an intranet, so I didn't bother with an online deployment. If you're interested, replace the OCR model API part with official/third-party APIs; with minor changes the code can be deployed and run online.
#!/usr/bin/env python3
"""
OCR Comparison Web App - polished version, no external CDN dependencies
"""
import os
import re
import shutil
import tempfile
import requests
from concurrent.futures import ThreadPoolExecutor
import uvicorn
from fastapi import FastAPI, File, Form, UploadFile
from fastapi.responses import HTMLResponse
app = FastAPI(title="OCR Comparison")
# --- Configuration ---
MODELS = {
    "DeepSeek-OCR": "http://localhost:8002/models/v1/deepseek-ocr/inference",
    "PaddleOCR": "http://localhost:8003/models/v1/PaddleOCR/inference",
    "HunyuanOCR": "http://localhost:8004/models/v1/HunyuanOCR/inference",
"PaddleOCR": "http://localhost:8003/models/v1/PaddleOCR/inference",
"HunyuanOCR": "http://localhost:8004/models/v1/HunyuanOCR/inference",
}
def call_api(model_name, api_url, file_path, prompt):
    """Call a single OCR API."""
    print(f"[INFO] Calling {model_name}: {api_url}")
    try:
        with open(file_path, 'rb') as f:
            resp = requests.post(
                api_url,
                files={'file': (os.path.basename(file_path), f)},
                data={'prompt': prompt},
                timeout=300
            )
        print(f"[INFO] {model_name} status: {resp.status_code}")
        if resp.status_code == 200:
            data = resp.json()
            result = data.get("result", str(data))
            print(f"[INFO] {model_name} result length: {len(result)}")
            return result
        return f"HTTP Error: {resp.status_code}"
    except Exception as e:
        print(f"[ERROR] {model_name}: {e}")
        return f"Error: {e}"
HTML_PAGE = """
(omitted)
"""
@app.get("/", response_class=HTMLResponse)
async def index():
    return HTML_PAGE
@app.post("/api/compare")
async def compare(
    file: UploadFile = File(...),
    prompt: str = Form("Convert the document to markdown.")
):
    print(f"\n{'='*60}")
    print(f"[INFO] Received request: {file.filename}")
    print(f"[INFO] Prompt: {prompt[:50]}...")
    print(f"{'='*60}")
    temp_dir = tempfile.mkdtemp()
    temp_path = os.path.join(temp_dir, file.filename)
    try:
        with open(temp_path, "wb") as f:
            content = await file.read()
            f.write(content)
        print(f"[INFO] Saved to: {temp_path}, size: {len(content)} bytes")
        # Call the three APIs in parallel
        results = {}
        with ThreadPoolExecutor(max_workers=3) as executor:
            futures = {
                "deepseek": executor.submit(call_api, "DeepSeek-OCR", MODELS["DeepSeek-OCR"], temp_path, prompt),
                "paddle": executor.submit(call_api, "PaddleOCR", MODELS["PaddleOCR"], temp_path, prompt),
                "hunyuan": executor.submit(call_api, "HunyuanOCR", MODELS["HunyuanOCR"], temp_path, prompt),
            }
            for name, future in futures.items():
                try:
                    result = future.result(timeout=310)
                    results[name] = result
                    print(f"[INFO] {name} done, length: {len(result)}")
                except Exception as e:
                    results[name] = f"Error: {e}"
                    print(f"[ERROR] {name}: {e}")
        print("[INFO] All done. Returning results.")
        print(f"[DEBUG] Results keys: {list(results.keys())}")
        return results
    finally:
        shutil.rmtree(temp_dir, ignore_errors=True)
if __name__ == "__main__":
    print("\n" + "="*60)
    print("OCR Comparison Server")
    print("URL: http://0.0.0.0:8080")
    print("="*60 + "\n")
    uvicorn.run(app, host="0.0.0.0", port=8080)
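For the online-deployment variant mentioned above, `call_api` could be swapped for an OpenAI-compatible chat-completions request with the image inlined as a base64 data URL. A hedged sketch: the model name and base URL below are placeholders, not real endpoints, and the payload shape assumes the common OpenAI-style vision schema:

```python
import base64

def build_vision_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build an OpenAI-style chat-completions payload with an inline image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }

# Then send it with something like (BASE_URL / API_KEY are your provider's):
# resp = requests.post(f"{BASE_URL}/v1/chat/completions",
#                      headers={"Authorization": f"Bearer {API_KEY}"},
#                      json=build_vision_payload("some-ocr-model", prompt, img))
```

The rest of the app (parallel fan-out, temp-file handling, side-by-side display) stays unchanged; only the per-model transport differs.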
Notice: The content above (including the pictures and videos if any) is uploaded and posted by a user of NetEase Hao, which is a social media platform and only provides information storage services.