OCR & Text Extraction from Images — Study Pack Flashcards
Front
OCR
Back
Optical Character Recognition (OCR) is the process of converting images of text into machine-readable text. It combines image processing, pattern recognition, and language modeling to extract and correct text from images.
Front
Binarization
Back
Binarization converts grayscale images to black-and-white (foreground/background) to simplify text detection. Common methods include global thresholding, Otsu's method, and adaptive thresholding.
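As a minimal sketch of global thresholding (pure Python, with a hypothetical `binarize` helper; real pipelines would use OpenCV or Pillow on actual image arrays):

```python
def binarize(gray, threshold=128):
    """Global thresholding: pixels darker than the threshold become
    foreground ink (1), lighter pixels become background (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

# A tiny 2x4 "image": dark ink strokes (values near 0) on a light page.
page = [[250, 30, 40, 245],
        [248, 25, 35, 250]]
print(binarize(page))  # [[0, 1, 1, 0], [0, 1, 1, 0]]
```

Otsu's method (covered on a later card) picks the threshold automatically instead of hard-coding 128; adaptive thresholding computes a separate threshold per local window.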
Front
Deskewing
Back
Deskewing corrects rotation in scanned or photographed pages so text lines align horizontally. Techniques include Hough transform line detection and projection profiles.
Front
Denoising
Back
Denoising removes visual noise like speckles or Gaussian noise to improve readability. Filters such as median, Gaussian, or bilateral filters are commonly used.
Front
Layout Analysis
Back
Layout analysis segments a page into logical regions (text, images, tables) and identifies reading order. It is crucial for multi-column documents and complex page structures.
Front
Segmentation
Back
Segmentation breaks text regions into lines, words, or characters for the recognition stage. Approaches include connected component analysis and neural network–based proposals.
Front
CRNN
Back
CRNN is a hybrid model combining Convolutional Neural Networks for feature extraction and Recurrent Neural Networks for sequence modeling, often used for text recognition. It pairs well with CTC loss for alignment-free training.
Front
CTC loss
Back
Connectionist Temporal Classification (CTC) loss trains sequence models without requiring frame-level labels. It marginalizes over alignments, allowing variable-length outputs from image sequences.
Front
Transformer OCR
Back
Transformer-based OCR models use attention mechanisms to capture long-range dependencies in text recognition. They can replace RNNs and often achieve strong results on complex layouts and variable-length text.
Front
Language Model
Back
A language model predicts probable word or character sequences to improve OCR outputs. It can be integrated during decoding to prefer grammatically or statistically likely results.
Front
Post-processing
Back
Post-processing refines recognized text using spell-checkers, dictionary lookup, and language models to correct errors. It often reduces CER and WER in noisy outputs.
Front
Levenshtein Distance
Back
Levenshtein distance measures the minimum number of insertions, deletions, and substitutions to transform one string into another. It is used to compute edit distance-based error metrics and autocorrection.
Front
Character Error Rate
Back
Character Error Rate (CER) is the ratio of edit operations (substitutions, deletions, insertions) to the total number of reference characters. It quantifies character-level OCR performance.
Front
Word Error Rate
Back
Word Error Rate (WER) is similar to CER but computed at the word level, capturing word-level recognition mistakes. WER is sensitive to spacing and tokenization errors.
Front
Confidence Score
Back
A confidence score indicates how certain the OCR engine is about a recognized token. Thresholding these scores enables human review for low-confidence outputs.
Front
Otsu's Method
Back
Otsu's method is an automatic global thresholding technique that picks a threshold minimizing intra-class intensity variance. It works well when foreground and background intensities are distinct.
Front
Connected Components
Back
Connected component analysis groups adjacent pixels with similar values into regions, useful for identifying characters or blobs in binarized text images. It's a common step in classical OCR pipelines.
Front
Edge Detection
Back
Edge detection algorithms like Canny highlight boundaries and help in detecting text contours and regions. They assist in segmentation and layout analysis tasks.
Front
Data Augmentation
Back
Data augmentation synthetically expands training datasets using rotations, scaling, blur, brightness changes, and warps. It helps models generalize to varied real-world imaging conditions.
Front
Human-in-the-loop
Back
Human-in-the-loop integrates manual verification for uncertain OCR outputs, improving overall accuracy for critical data. It is common in production systems where errors are costly.