
OCR & Text Extraction from Images — Study Pack Flashcards


20 cards

Front

OCR

Back

Optical Character Recognition (OCR) is the process of converting images of text into machine-readable text. It combines image processing, pattern recognition, and language modeling to extract and correct text from images.

Front

Binarization

Back

Binarization converts grayscale images to black-and-white (foreground/background) to simplify text detection. Common methods include global thresholding, Otsu's method, and adaptive thresholding.
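Adaptive thresholding can be sketched in a few lines. This is a minimal, pure-Python illustration of the mean-based variant (the `window` and `offset` parameters are illustrative assumptions, not a specific library's API):

```python
def adaptive_threshold(img, window=3, offset=10):
    """Binarize a grayscale image (list of rows of 0-255 values) by comparing
    each pixel to the mean of its local window minus an offset.
    A sketch of mean-based adaptive thresholding."""
    h, w = len(img), len(img[0])
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Gather the local neighborhood, clamped at the image borders.
            vals = [img[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(vals) / len(vals)
            # Pixels darker than the local mean minus the offset become
            # foreground (1); everything else is background (0).
            out[y][x] = 1 if img[y][x] < mean - offset else 0
    return out
```

Unlike a single global threshold, the local mean adapts to uneven lighting across the page, which is why adaptive methods are preferred for photographed documents.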

Front

Deskewing

Back

Deskewing corrects rotation in scanned or photographed pages so text lines align horizontally. Techniques include Hough transform line detection and projection profiles.
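The projection-profile idea can be sketched with a vertical shear standing in for a small rotation. This is a toy illustration, not a production deskewer; the candidate slope list is an assumption:

```python
def profile_energy(img, slope):
    """Vertically shear a binary image (list of 0/1 rows) by `slope` pixels
    per column, then return the sum of squared row ink counts -- a peakiness
    score that is largest when text lines collapse into few dense rows."""
    counts = {}
    for y, row in enumerate(img):
        for x, ink in enumerate(row):
            if ink:
                r = y + round(slope * x)
                counts[r] = counts.get(r, 0) + 1
    return sum(c * c for c in counts.values())

def estimate_skew(img, slopes=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    # The best candidate slope is the one whose sheared projection profile
    # is most concentrated; negating it gives the correction to apply.
    return max(slopes, key=lambda s: profile_energy(img, s))
```

A Hough-transform deskewer instead votes for line angles directly; both approaches reduce to finding the rotation that makes horizontal projections peaky.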

Front

Denoising

Back

Denoising removes visual noise such as speckles or Gaussian noise to improve readability. Filters such as median, Gaussian, or bilateral filters are commonly used.
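A median filter, the classic speckle remover, is short enough to sketch in pure Python (lists of lists stand in for an image array):

```python
def median_filter(img, radius=1):
    """Replace each pixel with the median of its neighborhood.
    A simple speckle-removal sketch for grayscale images given as
    lists of rows; borders use a clamped (smaller) window."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = sorted(img[j][i]
                          for j in range(max(0, y - radius), min(h, y + radius + 1))
                          for i in range(max(0, x - radius), min(w, x + radius + 1)))
            # The median is robust to isolated outliers (salt-and-pepper noise).
            out[y][x] = vals[len(vals) // 2]
    return out
```

A single stray black or white pixel is outvoted by its neighbors, which is why the median filter removes speckles while preserving edges better than a mean filter.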

Front

Layout Analysis

Back

Layout analysis segments a page into logical regions (text, images, tables) and identifies reading order. It is crucial for multi-column documents and complex page structures.

Front

Segmentation

Back

Segmentation breaks text regions into lines, words, or characters for the recognition stage. Approaches include connected component analysis and neural network–based proposals.
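Line segmentation via a horizontal projection profile can be sketched directly: rows containing ink form lines, blank rows separate them (a simplification that assumes deskewed, non-overlapping lines):

```python
def segment_lines(img):
    """Split a binary page (list of 0/1 rows) into text lines using a
    horizontal projection profile. Returns (start, end) row-index pairs,
    with `end` exclusive."""
    ink = [any(row) for row in img]
    lines, start = [], None
    for y, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = y                 # a new line of text begins
        elif not has_ink and start is not None:
            lines.append((start, y))  # a blank row ends the current line
            start = None
    if start is not None:
        lines.append((start, len(img)))
    return lines
```

The same idea applied to column sums within each line band yields word and character boxes in classical pipelines.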

Front

CRNN

Back

CRNN is a hybrid model combining Convolutional Neural Networks for feature extraction and Recurrent Neural Networks for sequence modeling, often used for text recognition. It pairs well with CTC loss for alignment-free training.

Front

CTC loss

Back

Connectionist Temporal Classification (CTC) loss trains sequence models without requiring frame-level labels. It marginalizes over alignments, allowing variable-length outputs from image sequences.
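At inference time, the simplest CTC decoder (best-path/greedy decoding) takes the argmax label per frame, collapses repeats, and drops blanks. A minimal sketch, assuming index 0 is the blank symbol:

```python
def ctc_best_path_decode(frame_probs, alphabet, blank=0):
    """Greedy (best-path) CTC decoding: take the argmax label at each frame,
    collapse consecutive repeats, then drop blanks. `frame_probs` is a list
    of per-frame probability lists over [blank] + alphabet symbols."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out, prev = [], None
    for idx in best:
        # Emit a symbol only when it differs from the previous frame's
        # label (collapse repeats) and is not the blank.
        if idx != prev and idx != blank:
            out.append(alphabet[idx - 1])
        prev = idx
    return "".join(out)
```

The blank symbol is what lets CTC represent genuine doubled letters: `a _ a` decodes to "aa", while `a a` collapses to "a".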

Front

Transformer OCR

Back

Transformer-based OCR models use attention mechanisms to capture long-range dependencies in text recognition. They can replace RNNs and often achieve strong results on complex layouts and variable-length text.

Front

Language Model

Back

A language model predicts probable word or character sequences to improve OCR outputs. It can be integrated during decoding to prefer grammatically or statistically likely results.
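A character-bigram model is the smallest useful example of this. The sketch below scores OCR hypotheses with add-one smoothing and picks the likeliest; the count format and padding symbol are illustrative assumptions:

```python
import math

def bigram_score(text, bigram_counts, vocab_size=128, pad="^"):
    """Log-probability of a string under a character bigram model with
    add-one smoothing. `bigram_counts` maps (prev_char, char) pairs to
    counts gathered from a training corpus."""
    totals = {}
    for (prev, _), c in bigram_counts.items():
        totals[prev] = totals.get(prev, 0) + c
    score, prev = 0.0, pad
    for ch in text:
        c = bigram_counts.get((prev, ch), 0)
        score += math.log((c + 1) / (totals.get(prev, 0) + vocab_size))
        prev = ch
    return score

def pick_best(candidates, bigram_counts):
    # Among competing OCR hypotheses, prefer the one the language
    # model finds most probable.
    return max(candidates, key=lambda t: bigram_score(t, bigram_counts))
```

Real OCR decoders integrate such scores during beam search rather than rescoring whole strings afterward, but the principle is the same.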

Front

Post-processing

Back

Post-processing refines recognized text using spell-checkers, dictionary lookup, and language models to correct errors. It often reduces CER and WER in noisy outputs.
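Dictionary-based correction can be sketched with the standard library's `difflib`; the similarity cutoff is an assumption to tune, and real systems weight candidates by frequency and context:

```python
import difflib

def correct_tokens(tokens, dictionary, cutoff=0.8):
    """Dictionary-lookup post-correction sketch: snap each OCR token to its
    closest dictionary word (by difflib similarity) when a sufficiently
    close match exists; otherwise keep the token unchanged."""
    words = list(dictionary)
    out = []
    for tok in tokens:
        match = difflib.get_close_matches(tok.lower(), words, n=1, cutoff=cutoff)
        out.append(match[0] if match else tok)
    return out
```

A conservative cutoff matters: snapping aggressively can "correct" rare but valid words into common ones, trading one error type for another.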

Front

Levenshtein Distance

Back

Levenshtein distance measures the minimum number of insertions, deletions, and substitutions needed to transform one string into another. It is used to compute edit-distance-based error metrics and for autocorrection.
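The standard dynamic-programming implementation keeps only one previous row, so it runs in O(len(a) × len(b)) time and O(len(b)) space:

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    transforming string a into string b (classic DP, two rolling rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion from a
                           cur[j - 1] + 1,               # insertion into a
                           prev[j - 1] + (ca != cb)))    # substitution (free if equal)
        prev = cur
    return prev[-1]
```

For example, "kitten" to "sitting" requires three edits (k→s, e→i, insert g).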

Front

Character Error Rate

Back

Character Error Rate (CER) is the ratio of edit operations (substitutions, deletions, insertions) to the total number of reference characters. It quantifies character-level OCR performance.

Front

Word Error Rate

Back

Word Error Rate (WER) is similar to CER but computed at the word level, capturing word-level recognition mistakes. WER is sensitive to spacing and tokenization errors.
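Both metrics are the same edit-distance ratio, computed over characters for CER and over whitespace tokens for WER, which is a sketch like this makes explicit:

```python
def edit_distance(ref, hyp):
    """Edit distance over arbitrary sequences (characters or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]

def cer(ref, hyp):
    # Character Error Rate: edit operations / reference characters.
    return edit_distance(ref, hyp) / len(ref)

def wer(ref, hyp):
    # Word Error Rate: the same ratio over whitespace-split tokens,
    # which is why spacing errors hit WER much harder than CER.
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())
```

One wrong character in a seven-character reference gives a CER of 1/7 but a WER of 1/2 if it falls in one of two words, illustrating why the two metrics are reported together.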

Front

Confidence Score

Back

A confidence score indicates how certain the OCR engine is about a recognized token. Thresholding these scores enables human review for low-confidence outputs.
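The thresholding step is a one-liner in spirit; the sketch below routes (token, confidence) pairs, with the 0.90 cutoff as an assumption to tune against review capacity:

```python
def route_by_confidence(tokens, threshold=0.90):
    """Split recognized (text, confidence) pairs into auto-accepted tokens
    and tokens flagged for human review. The threshold trades review
    workload against the risk of accepting errors."""
    accepted = [t for t, c in tokens if c >= threshold]
    review = [t for t, c in tokens if c < threshold]
    return accepted, review
```

In practice the threshold is calibrated on a labeled sample so that accepted tokens meet a target accuracy.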

Front

Otsu's Method

Back

Otsu's method is an automatic global thresholding technique that picks a threshold minimizing intra-class intensity variance. It works well when foreground and background intensities are distinct.
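Otsu's criterion is usually implemented in its equivalent form of maximizing between-class variance over a 256-bin histogram, as in this sketch:

```python
def otsu_threshold(hist):
    """Otsu's method on a 256-bin grayscale histogram: return the lowest
    threshold t maximizing between-class variance (equivalently, minimizing
    weighted intra-class variance). Pixels <= t form one class."""
    total = sum(hist)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg = sum_bg = 0
    for t in range(256):
        w_bg += hist[t]               # background pixel count so far
        if w_bg == 0:
            continue
        w_fg = total - w_bg           # remaining (foreground) pixels
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (total_sum - sum_bg) / w_fg
        # Between-class variance (up to a constant factor of 1/total**2).
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

The single pass over the histogram makes the method O(256) regardless of image size once the histogram is built.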

Front

Connected Components

Back

Connected component analysis groups adjacent pixels with similar values into regions, useful for identifying characters or blobs in binarized text images. It's a common step in classical OCR pipelines.
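Labeling 4-connected regions reduces to flood fill from each unlabeled foreground pixel, as this sketch shows:

```python
def connected_components(img):
    """Label 4-connected foreground regions in a binary image (list of 0/1
    rows) via iterative flood fill. Returns (label_map, component_count),
    with 0 meaning background in the label map."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] and not labels[sy][sx]:
                next_label += 1
                stack = [(sy, sx)]
                labels[sy][sx] = next_label
                while stack:
                    y, x = stack.pop()
                    # Spread the current label to unlabeled 4-neighbors.
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and img[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_label
                            stack.append((ny, nx))
    return labels, next_label
```

Each labeled blob's bounding box then becomes a candidate character or glyph fragment for the recognition stage.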

Front

Edge Detection

Back

Edge detection algorithms like Canny highlight boundaries and help in detecting text contours and regions. They assist in segmentation and layout analysis tasks.
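Canny's first stage is a gradient estimate, commonly via 3x3 Sobel kernels. A pure-Python sketch of the gradient-magnitude step (using the |Gx| + |Gy| approximation, borders left at zero):

```python
def sobel_magnitude(img):
    """Approximate gradient magnitude (|Gx| + |Gy|) with 3x3 Sobel kernels
    on a grayscale image given as a list of rows. The one-pixel border is
    left at zero for simplicity."""
    h, w = len(img), len(img[0])
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertical edges
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to horizontal edges
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = abs(gx) + abs(gy)
    return out
```

Canny then adds non-maximum suppression and hysteresis thresholding on top of this gradient map to produce thin, connected edges.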

Front

Data Augmentation

Back

Data augmentation synthetically expands training datasets using rotations, scaling, blur, brightness changes, and warps. It helps models generalize to varied real-world imaging conditions.
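A single augmentation, brightness jitter, can be sketched in a few lines; the factor range is an assumption to tune per dataset, and real pipelines chain many such transforms:

```python
import random

def augment_brightness(img, low=0.7, high=1.3, rng=None):
    """Brightness-jitter sketch: scale every pixel of a grayscale image
    (list of rows of 0-255 values) by one random factor and clip the
    result back into [0, 255]."""
    rng = rng or random.Random()
    factor = rng.uniform(low, high)
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]
```

Passing a seeded `random.Random` makes augmentation reproducible, which is useful when debugging training runs.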

Front

Human-in-the-loop

Back

Human-in-the-loop integrates manual verification for uncertain OCR outputs, improving overall accuracy for critical data. It is common in production systems where errors are costly.
