





























These notes cover edge detection and corner detection in image processing. They define edges, blobs, and corners, explain their properties and detection methods, and introduce the concept of scale-space and its importance for detecting features at different scales. Different operators used for edge and corner detection are covered, including the Prewitt operator and the Harris corner detector, along with scale selection and keypoint matching. The material is useful for students studying computer vision, image processing, and pattern recognition.
Unit - 2
Edge: An edge in an image is a sharp variation of the intensity function. In grayscale images this refers to the intensity or brightness of pixels; in color images it can also refer to sharp variations of color. An edge is distinguished from noise by possessing long-range structure. The main properties of an edge are its gradient magnitude and its orientation.

Edge Detection: Edge detection is an image-processing technique used to identify points in a digital image with discontinuities, that is, sharp changes in the image brightness. The points where the image brightness varies sharply are called the edges (or boundaries) of the image.

Blob: A blob is, loosely speaking, any large object or bright region on a dark background. In images, a blob can be generalized as a group of pixel values forming a connected region that is distinguishable from its background. Using image processing, we can detect such blobs in an image.

Corner: A corner is a point whose local neighborhood contains two dominant and different edge directions. In other words, a corner can be interpreted as the junction of two edges, where an edge is a sudden change in image brightness. Corners are important features in an image and are generally termed interest points, which are invariant to translation, rotation, and illumination.
Corner Detection: Corner detection is an approach used within computer vision systems to extract certain kinds of features and infer the contents of an image. It is frequently used in motion detection, image registration, video tracking, image mosaicing, panorama stitching, 3D reconstruction, and object recognition. Corner detection overlaps with the topic of interest point detection.

Scale-space: Real-world objects are meaningful only at a certain scale. You can see a sugar cube perfectly well on a table, but when looking at the entire Milky Way it simply does not exist at that scale. This multi-scale nature of objects is quite common in nature, and a scale space attempts to replicate the concept for digital images.
EDGE DETECTION
Edge detection is used to determine the location and presence of edges from changes in the intensity of an image, and different operators are used in image processing to detect them. An edge detector responds to variations in grey level, but it also responds readily to noise. In image processing, edge detection is a very important task: it is the main tool in pattern recognition, image segmentation, and scene analysis. It is a type of filter applied to extract the edge points of an image. Sudden changes in an image occur at edges, where the brightness of the image changes across an object contour. In image processing, edges are interpreted as a single class of singularity; for a function, a singularity is characterized as a discontinuity at which the gradient approaches infinity. Since image data is in discrete form, the edges of an image are defined as the local maxima of the gradient.

Edges mostly exist between objects and other objects, between primitives and other primitives, and between objects and the background. Edge detection methods study the change of single pixels in the grey-level image, and edge detection is mostly used for the measurement, detection, and localization of changes in image grey level. Edges are a basic feature of an image: the clearest parts of an object are its edges and lines, and with their help the structure of an object becomes known. That is why extracting edges is such an important technique in graphics processing and feature extraction. The basic idea behind edge detection is as follows:
First, the image gradient is computed, for example with the Prewitt masks, giving the horizontal and vertical components $G_x$ and $G_y$. The gradient magnitude is

$$|\nabla f| = \sqrt{G_x^2 + G_y^2}.$$

Because it is much faster to compute, an approximate magnitude is often used instead:

$$|\nabla f| \approx |G_x| + |G_y|.$$
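As a concrete example (a sketch, not code from these notes), the gradient components can be obtained with the Prewitt masks and combined into the exact and approximate magnitudes; the image path and the threshold of 100 are illustrative assumptions:

```python
# A minimal sketch of gradient-magnitude edge detection using Prewitt kernels
# and the |Gx| + |Gy| approximation. "image.png" is a hypothetical input path.
import numpy as np
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Prewitt kernels for the horizontal and vertical derivatives
kx = np.array([[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]], dtype=np.float32)
ky = kx.T

gx = cv2.filter2D(img, cv2.CV_32F, kx)   # horizontal gradient Gx
gy = cv2.filter2D(img, cv2.CV_32F, ky)   # vertical gradient Gy

magnitude = np.sqrt(gx**2 + gy**2)       # exact gradient magnitude
approx = np.abs(gx) + np.abs(gy)         # faster approximation

edges = (approx > 100).astype(np.uint8) * 255   # simple threshold on the response
```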
Three 3x3 discrete approximations are commonly used for the Laplacian filter, such as

$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 1 & 1 \\ 1 & -8 & 1 \\ 1 & 1 & 1 \end{bmatrix}, \qquad \begin{bmatrix} -1 & 2 & -1 \\ 2 & -4 & 2 \\ -1 & 2 & -1 \end{bmatrix}.$$

The 2-D Laplacian of Gaussian (LoG) function with Gaussian standard deviation $\sigma$ is

$$\mathrm{LoG}(x, y) = -\frac{1}{\pi \sigma^4} \left[ 1 - \frac{x^2 + y^2}{2\sigma^2} \right] e^{-\frac{x^2 + y^2}{2\sigma^2}}.$$
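A minimal sketch (not from the notes) of LoG filtering using SciPy's gaussian_laplace; the image path and sigma = 2.0 are illustrative assumptions:

```python
# Laplacian-of-Gaussian filtering: Gaussian smoothing followed by the Laplacian.
import numpy as np
import cv2
from scipy.ndimage import gaussian_laplace

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

log_response = gaussian_laplace(img, sigma=2.0)   # 2-D LoG response at scale sigma

# Edges correspond to zero-crossings of the LoG response; a crude diagonal test:
zero_cross = np.sign(log_response[:-1, :-1]) != np.sign(log_response[1:, 1:])
```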
HARRIS CORNER DETECTOR
The Harris detector considers the sum of squared differences (SSD) between a small window of the image and a copy of that window shifted by $(u, v)$, which for small shifts is approximated through a structure matrix $M$:

$$E(u, v) = \sum_{x, y} w(x, y)\,[I(x+u, y+v) - I(x, y)]^2 \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}, \qquad M = \sum_{x, y} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix},$$

where $w(x, y)$ is a window weighting function. Remember that we want the SSD to be large for shifts in all eight directions, or conversely, for the SSD to be small in none of the directions. By solving for the eigenvectors of $M$, we obtain the directions of both the largest and smallest increase in SSD, and the corresponding eigenvalues give the actual amount of these increases. A score $R$ is calculated for each window:

$$R = \lambda_1 \lambda_2 - k(\lambda_1 + \lambda_2)^2 = \det(M) - k\,\operatorname{trace}(M)^2,$$

where $\lambda_1$ and $\lambda_2$ are the eigenvalues of $M$ and $k$ is a small empirical constant. The values of these eigenvalues decide whether a region is a corner, an edge, or flat:
● When $|R|$ is small, which happens when $\lambda_1$ and $\lambda_2$ are small, the region is flat.
● When $R < 0$, which happens when $\lambda_1 \gg \lambda_2$ or vice versa, the region is an edge.
● When $R$ is large, which happens when $\lambda_1$ and $\lambda_2$ are large and $\lambda_1 \sim \lambda_2$, the region is a corner.
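To make the score computation concrete, here is a minimal sketch (not from the notes) using OpenCV's cv2.cornerHarris, which evaluates R at every pixel; the image path, block size, aperture, k, and the 1% thresholds are illustrative assumptions:

```python
# Harris response R = det(M) - k * trace(M)^2, computed per pixel.
import cv2
import numpy as np

gray = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

R = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)

corners = R > 0.01 * R.max()             # large positive R  -> corner
edges = R < 0                            # negative R        -> edge
flat = np.abs(R) <= 0.01 * R.max()       # small |R|         -> flat region
```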
SCALE SPACE AND SCALE SELECTION
Scale-space theory is a framework for multi-scale signal representation developed by the computer vision, image processing, and signal processing communities, with complementary motivations from physics and biological vision. It is a formal theory for handling image structures at different scales by representing an image as a one-parameter family of smoothed images, the scale-space representation, parametrized by the size of the smoothing kernel used for suppressing fine-scale structures. The parameter t in this family is referred to as the scale parameter, with the interpretation that image structures of spatial size smaller than about √t have largely been smoothed away in the scale-space level at scale t.

The main type of scale space is the linear (Gaussian) scale space, which has wide applicability as well as the attractive property that it can be derived from a small set of scale-space axioms. The corresponding framework encompasses a theory of Gaussian derivative operators, which can be used as a basis for expressing a large class of visual operations for computerized systems that process visual information. It also allows visual operations to be made scale invariant, which is necessary for dealing with the size variations that may occur in image data: real-world objects may be of different sizes, and in addition the distance between the object and the camera may be unknown and may vary depending on the circumstances.

Scale selection: The theory presented so far describes a well-founded framework for representing image structures at multiple scales. In many cases, however, it is also necessary to select locally appropriate scales for further analysis. This need for scale selection originates from two major reasons: (i) real-world objects may have different sizes, and this size may be unknown to the vision system, and (ii) the distance between the object and the camera can vary, and this distance may likewise be unknown in advance.
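As an illustration of scale selection (a sketch based on the standard scale-normalized Laplacian approach, not code from these notes), the characteristic scale at a point can be picked as the scale where the normalized Laplacian response is strongest; the image path, the point, and the scale range below are assumptions:

```python
# Automatic scale selection at a single point: evaluate |sigma^2 * LoG| over a
# range of scales and keep the scale where the normalized response peaks.
import numpy as np
import cv2
from scipy.ndimage import gaussian_laplace

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
y, x = 120, 200                                      # hypothetical point of interest
sigmas = [1.0 * (2 ** (i / 4)) for i in range(16)]   # geometric range of scales

responses = [abs((s ** 2) * gaussian_laplace(img, sigma=s)[y, x]) for s in sigmas]
selected_scale = sigmas[int(np.argmax(responses))]   # characteristic scale at (x, y)
```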
SIFT (SCALE-INVARIANT FEATURE TRANSFORM)
The SIFT pipeline consists of the following major steps:
● Scale-space peak selection: potential locations for finding features.
● Keypoint localization: accurately locating the feature keypoints.
● Orientation assignment: assigning an orientation to each keypoint.
● Keypoint descriptor: describing each keypoint as a high-dimensional vector.
● Keypoint matching.

The scale space of an image is a function L(x, y, σ) produced by the convolution of a Gaussian kernel (blurring) at different scales with the input image. The scale space is separated into octaves, and the number of octaves and scales depends on the size of the original image. We therefore generate several octaves of the original image, where each octave's image size is half that of the previous one.

Blurring: Within an octave, images are progressively blurred using the Gaussian blur operator. Mathematically, "blurring" is the convolution of the Gaussian operator with the image; the Gaussian blur has a particular expression, or "operator", that is applied to each pixel, and the result is the blurred image:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \qquad G(x, y, \sigma) = \frac{1}{2\pi \sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}},$$

where G is the Gaussian blur operator, I is the image, x and y are the location coordinates, and σ is the "scale" parameter. Think of σ as the amount of blur: the greater its value, the greater the blur.

DoG (Difference of Gaussians): We now use these blurred images to generate another set of images, the Difference of Gaussians (DoG). These DoG images are great for finding interesting keypoints in the image. The difference of Gaussians is obtained as the difference of the Gaussian blurring of an image with two
different values of σ, say σ and kσ. This process is repeated for the different octaves of the image in the Gaussian pyramid.

Finding keypoints: Up to now, we have generated a scale space and used it to calculate the Difference of Gaussians, which serves as a scale-invariant approximation of the Laplacian of Gaussian. Each pixel in a DoG image is compared with its 8 neighbours at the same scale, as well as the 9 pixels in the next scale and the 9 pixels in the previous scale, for a total of 26 checks. If the pixel is a local extremum, it is a potential keypoint; this essentially means the keypoint is best represented at that scale.

Keypoint localization: The previous step produces a lot of keypoints. Some of them lie along an edge, or do not have enough contrast, and in both cases they are not as useful as features, so we eliminate the low-contrast keypoints and the keypoints that lie along edges.
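A minimal sketch (not from the notes) of the construction described above: one octave of progressively blurred images and its DoG stack. The values sigma0 = 1.6 and three intervals per octave are typical choices rather than values taken from these notes; "image.png" is a hypothetical path.

```python
# One octave of the Gaussian scale space and its Difference-of-Gaussians (DoG).
import cv2
import numpy as np

def gaussian_octave(img, sigma0=1.6, intervals=3):
    """Return a list of progressively blurred images for one octave."""
    k = 2 ** (1.0 / intervals)
    octave = []
    for i in range(intervals + 3):                 # s + 3 images per octave
        sigma = sigma0 * (k ** i)
        octave.append(cv2.GaussianBlur(img, (0, 0), sigmaX=sigma))
    return octave

def dog_stack(octave):
    """Difference of adjacent blurred images: D = L(k*sigma) - L(sigma)."""
    return [octave[i + 1].astype(np.float32) - octave[i].astype(np.float32)
            for i in range(len(octave) - 1)]

gray = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
dogs = dog_stack(gaussian_octave(gray))
# The next octave would start from a half-sized image:
# cv2.resize(gray, None, fx=0.5, fy=0.5)
```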
Orientation assignment: A neighbourhood is taken around the keypoint location, its size depending on the scale, and the gradient magnitude and direction are calculated in that region. An orientation histogram with 36 bins covering 360 degrees is created. Say the gradient direction at a certain point in the "orientation collection region" is 18.759 degrees; it then goes into the 10-19 degree bin, and the "amount" added to the bin is proportional to the gradient magnitude at that point. Once this has been done for all pixels around the keypoint, the histogram will have a peak at some point. The highest peak is taken, and any peak above 80% of it is also used to calculate an orientation. This creates keypoints with the same location and scale but different directions, which contributes to the stability of matching.

Keypoint descriptor: At this point each keypoint has a location, a scale, and an orientation. Next we compute a descriptor for the local image region around each keypoint that is highly distinctive and as invariant as possible to variations such as changes in viewpoint and illumination. To do this, a 16x16 window around the keypoint is taken and divided into 16 sub-blocks of 4x4 size.
For each sub-block an 8-bin orientation histogram is created, so 4x4 descriptors over the 16x16 sample array are used in practice. 4 x 4 x 8 directions give 128 bin values, which are represented as a feature vector to form the keypoint descriptor. This feature vector introduces a few complications, such as dependence on rotation and illumination, which we need to remove before finalizing the fingerprint.
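For practical use, OpenCV bundles the whole pipeline described above. Here is a minimal sketch (not from the notes) of detection, description, and keypoint matching with Lowe's ratio test; the image paths and the 0.75 ratio threshold are illustrative assumptions:

```python
# End-to-end SIFT: keypoints, 128-dim descriptors, and brute-force matching.
# cv2.SIFT_create requires OpenCV >= 4.4.
import cv2

img1 = cv2.imread("scene1.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input paths
img2 = cv2.imread("scene2.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)   # keypoints + 128-dim descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# Keypoint matching with a brute-force matcher and Lowe's ratio test
bf = cv2.BFMatcher(cv2.NORM_L2)
matches = bf.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
```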
SURF (SPEEDED-UP ROBUST FEATURES)
SURF builds on the integral image I_Σ, in which the entry I_Σ(x, y) stores the sum of all pixel intensities of the input image inside the rectangle spanned by the origin and (x, y). With I_Σ calculated, it takes only four additions to compute the sum of the intensities over any upright rectangular area, independent of its size.

Hessian matrix-based interest points: SURF uses the Hessian matrix because of its good performance in computation time and accuracy. Rather than using different measures for selecting the location and the scale (as the Hessian-Laplace detector does), SURF relies on the determinant of the Hessian matrix for both. Given a pixel, the Hessian at this pixel is

$$H = \begin{bmatrix} \frac{\partial^2 I}{\partial x^2} & \frac{\partial^2 I}{\partial x\,\partial y} \\ \frac{\partial^2 I}{\partial x\,\partial y} & \frac{\partial^2 I}{\partial y^2} \end{bmatrix}.$$

To adapt to any scale, the image is filtered with a Gaussian kernel, so given a point X = (x, y), the Hessian matrix H(x, σ) at x and scale σ is defined as

$$H(\mathbf{x}, \sigma) = \begin{bmatrix} L_{xx}(\mathbf{x}, \sigma) & L_{xy}(\mathbf{x}, \sigma) \\ L_{xy}(\mathbf{x}, \sigma) & L_{yy}(\mathbf{x}, \sigma) \end{bmatrix},$$

where L_xx(x, σ) is the convolution of the Gaussian second-order derivative with the image I at the point x, and similarly for L_xy(x, σ) and L_yy(x, σ). Gaussians are optimal for scale-space analysis, but in practice they have to be discretized and cropped. This leads to a loss in repeatability under image rotations around odd multiples of π/4. This weakness holds for Hessian-based detectors in general; nevertheless, the detectors still perform well, and the slight decrease in performance does not outweigh the advantage of the fast convolutions brought by discretization and cropping.

In order to calculate the determinant of the Hessian matrix, we first need to apply convolution with a Gaussian kernel and then a second-order derivative. After Lowe's success with LoG approximations (SIFT), SURF pushes the approximation (of both the convolution and the second-order derivative) even further with box filters. These approximate second-order Gaussian derivatives and can be evaluated at a very low computational cost using integral images, independently of size, and this is part of the reason why SURF is fast.
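A minimal sketch (not from the notes) of the integral image and the four-addition rectangle sum; "image.png" and the rectangle coordinates are illustrative assumptions:

```python
# Integral image and constant-time rectangle sums, the basis of SURF's box filters.
import cv2

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
I_sum = cv2.integral(img)        # (h+1) x (w+1) integral image, first row/column are zeros

def rect_sum(y0, x0, y1, x1):
    """Sum of pixel intensities over the rectangle [y0, y1) x [x0, x1)."""
    return (I_sum[y1, x1] - I_sum[y0, x1]
            - I_sum[y1, x0] + I_sum[y0, x0])   # exactly four table look-ups

print(rect_sum(10, 10, 50, 50))
```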
The 9x9 box filters are approximations of the Gaussian second-order partial derivatives (in the y and xy directions) with σ = 1.2; we denote these approximations by D_xx, D_yy, and D_xy. The determinant of the (approximated) Hessian can then be written as

$$\det(H_{\text{approx}}) = D_{xx} D_{yy} - (w D_{xy})^2, \qquad w = 0.9 \text{ (Bay's suggestion)}.$$

Scale-space representation: Scale spaces are usually implemented as image pyramids: the images are repeatedly smoothed with a Gaussian and subsequently sub-sampled to reach the next level of the pyramid. Due to the use of box filters and integral images, SURF does not have to apply the same filter iteratively to the output of a previously filtered layer; instead it can apply box filters of any size at exactly the same speed directly on the original image, and even in parallel. Therefore, the scale space is analyzed by up-scaling the filter size (9x9 → 15x15 → 21x21 → 27x27, etc.) rather than by iteratively reducing the image size. For each new octave, the filter-size increase is doubled, and simultaneously the sampling interval for the extraction of the interest points (σ) can be doubled as well, which allows the up-scaling of the filters at constant cost. In order to localize interest points in the image and over scales, non-maximum suppression in a 3 x 3 x 3 neighbourhood is applied.
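If an OpenCV build with the contrib non-free modules is available, SURF can be used directly. This is a hedged sketch rather than code from the notes; the Hessian threshold of 400 and the image path are assumptions:

```python
# SURF lives in opencv-contrib's xfeatures2d module and requires a build with the
# non-free algorithms enabled; otherwise SURF_create is unavailable.
import cv2

gray = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)    # threshold on det(H_approx)
keypoints, descriptors = surf.detectAndCompute(gray, None)  # 64-dim descriptors by default
```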
Descriptor components: Now it is time to extract the descriptor: over local
regions of the image, it generates histograms using the magnitudes and orientations of the gradient.

LOCAL BINARY PATTERNS (LBP)
Local Binary Pattern (LBP) is a simple yet very efficient texture operator which labels the pixels of an image by thresholding the neighbourhood of each pixel and interpreting the result as a binary number. Due to its discriminative power and computational simplicity, the LBP texture operator has become a popular approach in various applications. It can be seen as a unifying approach to the traditionally divergent statistical and structural models of texture analysis. Perhaps the most important property of the LBP operator in real-world applications is its robustness to monotonic grey-scale changes caused, for example, by illumination variations. Another important property is its computational simplicity, which makes it possible to analyze images in challenging real-time settings.

VISUAL MATCHING - BAG-OF-WORDS
In computer vision, the bag-of-words model (BoW model), sometimes called the bag-of-visual-words model, can be applied to image classification or retrieval by treating image features as words. In document classification, a bag of words is a sparse vector of word occurrence counts, that is, a sparse histogram over the vocabulary. In computer vision, a bag of visual words is a vector of occurrence counts over a vocabulary of local image features.

Image representation based on the BoW model: To represent an image using the BoW model, the image can be treated as a document. Similarly, "words" in images need to be defined, which usually involves three steps: feature detection, feature description, and codebook generation. A definition of the BoW model can thus be the "histogram representation based on independent features". Content-based image indexing and retrieval (CBIR) appears to be the early adopter of this image representation technique.

Feature representation: After feature detection, each image is abstracted by several local patches. Feature representation methods deal with how to represent the patches as numerical vectors, called feature descriptors. A good descriptor should be able to handle intensity, rotation, scale, and affine variations to some extent. One of the most famous descriptors is the Scale-Invariant Feature Transform (SIFT), which converts each patch into a 128-dimensional vector. After this step, each image is a collection of vectors of the same dimension (128 for SIFT), where the order of the different vectors is of no importance.
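Two minimal sketches (not from the notes) tie these last two sections to code. First, the LBP operator and its histogram descriptor via scikit-image; "texture.png" and the uniform-pattern mapping are assumptions:

```python
# Basic LBP with 8 neighbours on a circle of radius 1, plus its histogram descriptor.
import numpy as np
from skimage.feature import local_binary_pattern
from skimage import io, util

img = util.img_as_ubyte(io.imread("texture.png", as_gray=True))   # hypothetical path
lbp = local_binary_pattern(img, P=8, R=1, method="uniform")

# A histogram of LBP codes is the texture descriptor (10 uniform codes for P=8).
hist, _ = np.histogram(lbp, bins=np.arange(0, 11), density=True)
```

Second, a bag-of-visual-words pipeline built on SIFT descriptors and a k-means codebook; the image paths and the vocabulary size k = 100 are assumptions:

```python
# Bag of visual words: SIFT descriptors -> k-means codebook -> per-image histogram.
import cv2
import numpy as np
from sklearn.cluster import KMeans

paths = ["img1.png", "img2.png"]                 # hypothetical training images
sift = cv2.SIFT_create()

all_desc = []
for p in paths:
    gray = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(gray, None)  # 128-dim SIFT descriptors
    all_desc.append(desc)

k = 100                                          # codebook (vocabulary) size
codebook = KMeans(n_clusters=k, n_init=10).fit(np.vstack(all_desc))

def bow_histogram(desc):
    words = codebook.predict(desc)               # assign each descriptor to a visual word
    hist, _ = np.histogram(words, bins=np.arange(k + 1))
    return hist / hist.sum()                     # normalized word-count histogram

histograms = [bow_histogram(d) for d in all_desc]
```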