
Scikit-image in Practice: 10 Preprocessing Techniques for More Robust CV Models

2025-12-21 20:41:54　Source: deephub


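In real-world computer vision work we often see the same pattern: a model performs flawlessly on the validation set, but its accuracy drops inexplicably once it is deployed to production. This "performance decay" usually stems not from the model architecture but from a fragile preprocessing pipeline. Implicit dtype conversions, subtle differences between resizing algorithms, uncorrected geometric distortion: these seemingly trivial engineering details are often the root cause of system failure.

Compared with blindly tuning hyperparameters, building a deterministic preprocessing pipeline pays off far better. This article summarizes ten engineering patterns based on scikit-image, aimed at helping developers remove uncertainty from input data and turn messy raw images into high-quality tensors that are genuinely model-friendly.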

1. Standardize the data type (dtype)

Most filters in scikit-image assume floating-point input in the range [0, 1]. In practice, pick one internal dtype and convert at the boundary where data enters the pipeline, rather than bouncing back and forth between types in the middle.

import numpy as np
from skimage import img_as_float32, io

def load_and_normalize(path: str) -> np.ndarray:
    img = io.imread(path)        # could be uint8/uint16/RGBA
    img = img_as_float32(img)    # -> float32 in [0, 1]
    # Drop alpha if present. The ndim check keeps a 2-D grayscale image
    # that happens to be 4 pixels wide from being cropped by mistake.
    if img.ndim == 3 and img.shape[-1] == 4:
        img = img[..., :3]
    return img

This minimizes surprises (such as data being silently truncated), keeps behavior deterministic across machines, and makes debugging less painful.
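As a small illustration of why the conversion belongs in one place, the sketch below feeds two synthetic arrays (stand-ins for 8-bit and 16-bit files, not data from the article) through img_as_float32 and compares the result with a hard-coded division by 255.

```python
import numpy as np
from skimage import img_as_float32

# Synthetic "images" standing in for files of different bit depths.
img_u8 = np.array([[0, 128, 255]], dtype=np.uint8)
img_u16 = np.array([[0, 32768, 65535]], dtype=np.uint16)

# img_as_float32 scales by the maximum of the *input* dtype,
# so both arrays land in [0, 1].
f8 = img_as_float32(img_u8)
f16 = img_as_float32(img_u16)

# A hard-coded "/ 255" silently pushes 16-bit data far outside [0, 1].
naive16 = img_u16.astype(np.float32) / 255.0

print(f8.dtype, float(f8.max()), float(f16.max()), float(naive16.max()))
```

Both converted arrays share one dtype and one value range, while the naive division only works for 8-bit input.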

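2. Specify the color space and channel axis explicitly

Watch for API changes between library versions: many functions have switched from multichannel= to channel_axis. You must also be clear about whether the model expects grayscale or RGB.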

import numpy as np
from skimage.color import rgb2gray

def to_gray(img: np.ndarray) -> np.ndarray:
    # img: float32 [0, 1], shape (H, W, 3)
    g = rgb2gray(img)  # returns (H, W) float in [0, 1]
    return g

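If you keep three channels, prefer RGB order and pin it down in your documentation. When calling filters, remember to pass channel_axis=-1 so the algorithm treats the color dimension correctly.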

3. Resize with anti-aliasing and a consistent geometry policy

Downsampling without anti-aliasing is a disaster: it introduces moiré patterns and loses edge information.

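import numpy as np
from skimage.transform import resize

def resize_safe(img: np.ndarray, size=(224, 224)) -> np.ndarray:
    # Append the channel count for (H, W, C) input so channels are not resampled.
    out_shape = size + ((img.shape[-1],) if img.ndim == 3 else ())
    return resize(
        img, out_shape,
        anti_aliasing=True, preserve_range=False
    ).astype("float32")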

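In production, a consistent aspect-ratio policy matters more than a clever algorithm. If you choose center-padding, use it across the whole pipeline; if you choose letterboxing, stick with it all the way through.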
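One possible letterbox policy is sketched below. The function name letterbox, the pad_value of 0.5, and the symmetric placement are illustrative assumptions, not something the article prescribes; the point is only that training and serving call the same function.

```python
import numpy as np
from skimage.transform import resize

def letterbox(img: np.ndarray, size=(224, 224), pad_value=0.5) -> np.ndarray:
    """Fit img inside `size` keeping its aspect ratio, then pad symmetrically."""
    target_h, target_w = size
    h, w = img.shape[:2]
    scale = min(target_h / h, target_w / w)
    new_h = min(target_h, max(1, round(h * scale)))
    new_w = min(target_w, max(1, round(w * scale)))

    # img.shape[2:] is () for grayscale and (C,) for color input.
    resized = resize(img, (new_h, new_w) + img.shape[2:], anti_aliasing=True)

    canvas = np.full((target_h, target_w) + img.shape[2:], pad_value, dtype=np.float32)
    top = (target_h - new_h) // 2
    left = (target_w - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas

# A 100x200 image becomes a 112x224 strip centered in a 224x224 canvas.
out = letterbox(np.ones((100, 200, 3), dtype=np.float32))
print(out.shape)
```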

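4. Use adaptive contrast (CLAHE) on key regions

Global histogram equalization often overdoes it and tends to make the image look "overexposed". CLAHE (Contrast Limited Adaptive Histogram Equalization) is much better: it brings out local detail without blowing out the highlights.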

import numpy as np
from skimage import exposure

def local_contrast(img_gray: np.ndarray) -> np.ndarray:
    # img_gray: (H, W) float in [0, 1]
    return exposure.equalize_adapthist(img_gray, clip_limit=0.02)

This works especially well for documents, medical images, and dimly lit scenes. If the scene already has high contrast, skip it; otherwise you are only amplifying noise.
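The "skip it when contrast is already high" rule can be made explicit with a simple gate. The sketch below is one way to do it under stated assumptions: the helper names, the 1st/99th percentile spread as the contrast measure, and the min_spread threshold of 0.5 are all illustrative choices that would need tuning on real data.

```python
import numpy as np
from skimage import exposure

def percentile_spread(img_gray: np.ndarray, lo=1, hi=99) -> float:
    """Distance between two intensity percentiles, a rough contrast measure."""
    p_lo, p_hi = np.percentile(img_gray, [lo, hi])
    return float(p_hi - p_lo)

def maybe_clahe(img_gray: np.ndarray, min_spread=0.5, clip_limit=0.02) -> np.ndarray:
    # High-contrast input is returned untouched; only flat images are enhanced.
    if percentile_spread(img_gray) >= min_spread:
        return img_gray
    out = exposure.equalize_adapthist(img_gray, clip_limit=clip_limit)
    return out.astype(np.float32)

rng = np.random.default_rng(0)
flat = (0.45 + 0.1 * rng.random((64, 64))).astype(np.float32)           # low contrast
crisp = np.tile(np.linspace(0, 1, 64, dtype=np.float32), (64, 1))       # full range

enhanced = maybe_clahe(flat)
untouched = maybe_clahe(crisp)
print(percentile_spread(flat), percentile_spread(enhanced), untouched is crisp)
```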

5. Choose a denoiser whose prior matches the noise

Noise comes in many forms and there is no universal fix. Here are three practical defaults:

import numpy as np
from skimage.restoration import denoise_bilateral, denoise_tv_chambolle, estimate_sigma

def denoise(img_gray: np.ndarray, mode="tv") -> np.ndarray:
    if mode == "bilateral":
        return denoise_bilateral(img_gray, sigma_color=0.05, sigma_spatial=3)
    if mode == "tv":  # edges preserved, good for text/edges
        return denoise_tv_chambolle(img_gray, weight=0.1)
    if mode == "auto":
        # Estimate the noise level, then clamp the TV weight to [0.05, 0.2]
        sig = estimate_sigma(img_gray, channel_axis=None)
        w = min(0.2, max(0.05, sig * 2))
        return denoise_tv_chambolle(img_gray, weight=w)
    raise ValueError("unknown mode")

Denoising is more like a knob to be tuned per camera module or scene than a single global constant.
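One way to turn that knob deliberately is to sweep the denoiser strength against a reference pair for each camera. The sketch below assumes such a clean/noisy pair exists; here it is faked with a synthetic step edge plus Gaussian noise, and the candidate weights are arbitrary.

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle

rng = np.random.default_rng(0)

# Synthetic reference pair (an assumption): a step edge and a noisy copy of it.
clean = np.full((64, 64), 0.25, dtype=np.float32)
clean[:, 32:] = 0.75
noisy = (clean + rng.normal(0.0, 0.1, clean.shape)).astype(np.float32)

def mse(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.mean((a - b) ** 2))

# Score each candidate weight by reconstruction error and keep the best one.
weights = [0.02, 0.05, 0.1, 0.2]
scores = {w: mse(denoise_tv_chambolle(noisy, weight=w), clean) for w in weights}
best_weight = min(scores, key=scores.get)

print("baseline:", mse(noisy, clean), "best:", best_weight, scores[best_weight])
```

The selected weight is only valid for the noise level it was tuned on, which is why it is better stored per camera or per scene than hard-coded.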

6. Deskew before recognition

For OCR and barcode models even a small rotation can be fatal, so estimate the skew angle using image moments or the Hough transform (Hough lines), and then correct it.

import numpy as np
from skimage.feature import canny
from skimage.transform import hough_line, hough_line_peaks, rotate

def deskew(img_gray: np.ndarray) -> np.ndarray:
    # Assumes the dominant lines (text baselines, bars) are roughly horizontal.
    edges = canny(img_gray, sigma=2.0)
    hspace, angles, dists = hough_line(edges)
    _, angles_peaks, _ = hough_line_peaks(hspace, angles, dists, num_peaks=5)
    if len(angles_peaks):
        # Hough angles describe each line's normal. Fold them into [-90, 90)
        # degrees of skew so that peaks near -pi/2 and +pi/2 agree instead of
        # implying a rotation of about 180 degrees.
        skew = np.rad2deg(angles_peaks) % 180.0 - 90.0
        angle = float(np.median(skew))
        return rotate(img_gray, angle=angle, mode="edge", preserve_range=True)
    return img_gray

Even a correction of just 1-2 degrees can often lift text recognition accuracy a notch.

7. Remove uneven background (rolling ball or morphological opening)

When illumination is uneven, try subtracting a smoothed background layer.

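import numpy as np
from skimage.morphology import white_tophat, disk

def remove_background(img_gray: np.ndarray, radius=30) -> np.ndarray:
    # white_tophat = image - opening(image): keeps bright details smaller
    # than the footprint. For dark content on a bright background, invert
    # the image first (or use black_tophat).
    return white_tophat(img_gray, footprint=disk(radius))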

This technique is very useful for receipts, microscope slides, and product photos on a white background.

8. Smart binarization

Global Otsu thresholding is fine as the textbook answer, but in real scenes with shadows or illumination gradients, local thresholding methods often perform better.

import numpy as np
from skimage.filters import threshold_local, threshold_otsu

def binarize(img_gray: np.ndarray, method="local") -> np.ndarray:
    if method == "otsu":
        t = threshold_otsu(img_gray)
        return (img_gray > t).astype("uint8")  # {0, 1}
    # local "window" around each pixel; block_size must be odd
    T = threshold_local(img_gray, block_size=35, offset=0.01)
    return (img_gray > T).astype("uint8")

After binarization, you can follow up with morphological operations to clean up the remaining noise.
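To see why the illumination gradient matters, the sketch below compares the two thresholds on a synthetic page (an assumption for illustration): a left-to-right lighting ramp with thin "ink" stripes that are half as bright as the paper under them. Accuracy is the fraction of pixels whose paper/ink label matches the known layout.

```python
import numpy as np
from skimage.filters import threshold_local, threshold_otsu

h, w = 120, 200
illum = np.tile(np.linspace(0.2, 1.0, w, dtype=np.float32), (h, 1))

# Ground truth: thin vertical ink stripes every 20 pixels.
ink = np.zeros((h, w), dtype=bool)
for x0 in range(10, w - 10, 20):
    ink[:, x0:x0 + 3] = True

# Ink reflects half of the local illumination, so ink on the bright side
# is brighter than bare paper on the dark side.
img = np.where(ink, 0.5 * illum, illum).astype(np.float32)

paper_otsu = img > threshold_otsu(img)
paper_local = img > threshold_local(img, block_size=35, offset=0.01)

def accuracy(paper_mask: np.ndarray) -> float:
    return float(np.mean(paper_mask == ~ink))

acc_otsu, acc_local = accuracy(paper_otsu), accuracy(paper_local)
print(acc_otsu, acc_local)
```

A single global cut cannot separate shaded paper from brightly lit ink, whereas the local threshold follows the ramp.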

9. Morphology: clean, connect, and measure

The goal of this step is to remove isolated noise, connect broken strokes, and keep the meaningful regions (blobs).

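import numpy as np
from skimage.morphology import remove_small_objects, remove_small_holes, closing
from skimage.measure import label, regionprops

def clean_and_props(mask: np.ndarray, area_min=64) -> list:
    # 3x3 square footprint as a plain array, so the code does not depend on
    # version-specific footprint helpers.
    mask = closing(mask.astype(bool), np.ones((3, 3), dtype=bool))
    mask = remove_small_objects(mask, area_min)  # drop specks below area_min
    mask = remove_small_holes(mask, area_min)    # fill holes below area_min
    lbl = label(mask)
    return list(regionprops(lbl))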

Once the mask is clean, downstream object-level reasoning, such as counting pills, locating a logo, or measuring defect size, becomes very simple.
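A toy version of the counting case, on a synthetic mask invented for illustration: two square blobs, one of them split by a one-pixel crack, plus a few isolated speckles. Closing bridges the crack and the size filter removes the speckles, so the object count drops to the two real blobs.

```python
import numpy as np
from skimage.measure import label, regionprops
from skimage.morphology import closing, remove_small_objects

mask = np.zeros((80, 80), dtype=bool)
mask[10:30, 10:30] = True                         # blob A, 20x20
mask[50:70, 40:60] = True                         # blob B, 20x20 ...
mask[50:70, 50] = False                           # ... split by a 1-pixel crack
mask[5, 60] = mask[40, 5] = mask[75, 75] = True   # isolated speckles

raw_count = int(label(mask).max())

closed = closing(mask, np.ones((3, 3), dtype=bool))  # bridges the crack
cleaned = remove_small_objects(closed, 64)           # drops the speckles
regions = regionprops(label(cleaned))
areas = sorted(int(r.area) for r in regions)

print(raw_count, len(regions), areas)
```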

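10. Perspective and geometric normalization (making inputs comparable)

For documents or planar objects, normalizing the viewpoint before feature extraction is well worth it.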

import numpy as np
from skimage.transform import ProjectiveTransform, warp

def four_point_warp(img: np.ndarray, src_pts: np.ndarray, dst_size=(800, 1100)) -> np.ndarray:
    # src_pts: 4x2 float32 (tl, tr, br, bl) in image (x, y) coordinates
    w, h = dst_size
    dst = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]], dtype=np.float32)
    tform = ProjectiveTransform()
    # warp() expects the inverse map (output -> input), hence dst -> src.
    if not tform.estimate(dst, src_pts):
        raise ValueError("could not estimate a projective transform from src_pts")
    out = warp(img, tform, output_shape=(h, w), preserve_range=True)
    return out.astype("float32")

Note, however, that if you rely on a model or heuristic to detect the corners, you must record success/failure metrics for monitoring, because a wrong warp has serious consequences.
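A minimal sketch of such a guard, assuming corners arrive in (tl, tr, br, bl) order: reject quadrilaterals that are non-convex or implausibly small before warping, and count the outcomes. The helper names, the min_area threshold, and the in-memory warp_stats counter are hypothetical; a real system would forward the counts to its own metrics backend.

```python
import numpy as np

def quad_is_plausible(pts, min_area=1000.0) -> bool:
    """Cheap sanity check for four detected corners given as (x, y) rows."""
    pts = np.asarray(pts, dtype=np.float64)
    if pts.shape != (4, 2) or not np.all(np.isfinite(pts)):
        return False
    # Convexity: cross products of consecutive edges must share one sign.
    edges = np.roll(pts, -1, axis=0) - pts
    nxt = np.roll(edges, -1, axis=0)
    cross = edges[:, 0] * nxt[:, 1] - edges[:, 1] * nxt[:, 0]
    convex = bool(np.all(cross > 0) or np.all(cross < 0))
    # Area via the shoelace formula.
    x, y = pts[:, 0], pts[:, 1]
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    return convex and area >= min_area

warp_stats = {"ok": 0, "rejected": 0}  # hypothetical stand-in for real metrics

def record_corner_check(pts) -> bool:
    ok = quad_is_plausible(pts)
    warp_stats["ok" if ok else "rejected"] += 1
    return ok

good = [[10, 10], [200, 12], [205, 300], [8, 295]]
bowtie = [[10, 10], [205, 300], [200, 12], [8, 295]]  # two corners swapped
tiny = [[0, 0], [5, 0], [5, 5], [0, 5]]

results = [record_corner_check(q) for q in (good, bowtie, tiny)]
print(results, warp_stats)
```

Only quads that pass the check would be handed to the warp; the rejected count is the number to watch on a dashboard.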

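Summary

Preprocessing is the watershed where computer vision moves from "academic algorithm" to "engineering". With scikit-image, choosing the right patterns gives you speed, clarity, and control at the same time. Start simple: a unified dtype, anti-aliased resizing, and adaptive contrast. Then layer on deskewing, background removal, and morphology as needed. The model will seem to have become "smarter"; in fact the model has not changed, and it is the input data that has finally started to behave sensibly.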

https://avoid.overfit.cn/post/f9c16dc30adc4a52926b2831a9252d30

Author: Nexumo
