Seminars in 2014
2014-12-20
This paper proposes a method for detecting objects carried by pedestrians, such as backpacks and suitcases, from video sequences. In common with earlier work [14], [16] on the same problem, the method produces a representation of motion and shape (known as a temporal template) that has some immunity to noise in foreground segmentations and phase of the walking cycle. Our key novelty is for carried objects to be revealed by comparing the temporal templates against view-specific exemplars generated offline for unencumbered pedestrians. A likelihood map of protrusions, obtained from this match, is combined in a Markov random field for spatial continuity, from which we obtain a segmentation of carried objects using the MAP solution. We also compare the previously used method of periodicity analysis to distinguish carried objects from other protrusions with using prior probabilities for carried-object locations relative to the silhouette. We have reimplemented the earlier state-of-the-art method [14] and demonstrate a substantial improvement in performance for the new method on the PETS2006 data set. The carried-object detector is also tested on another outdoor data set. Although developed for a specific problem, the method could be applied to the detection of irregularities in appearance for other categories of object that move in a periodic fashion.
2014-12-13
Background subtraction is a fundamental low-level
processing task in numerous computer vision applications. The
vast majority of algorithms process images on a pixel-by-pixel
basis, where an independent decision is made for each pixel.
A general limitation of such processing is that rich contextual
information is not taken into account. We propose a block-based
method capable of dealing with noise, illumination variations, and
dynamic backgrounds, while still obtaining smooth contours of
foreground objects. Specifically, image sequences are analyzed on
an overlapping block-by-block basis. A low-dimensional texture
descriptor obtained from each block is passed through an
adaptive classifier cascade, where each stage handles a distinct
problem. A probabilistic foreground mask generation approach
then exploits block overlaps to integrate interim block-level
decisions into final pixel-level foreground segmentation. Unlike
many pixel-based methods, ad-hoc postprocessing of foreground
masks is not required. Experiments on the difficult Wallflower
and I2R datasets show that the proposed approach obtains on
average better results (both qualitatively and quantitatively) than
several prominent methods. We furthermore propose the use of
tracking performance as an unbiased approach for assessing
the practical usefulness of foreground segmentation methods,
and show that the proposed approach leads to considerable
improvements in tracking accuracy on the CAVIAR dataset. Attached files: Improved Foreground Detection via Block-Based Classifier Cascade With Probabilistic Decision Integration (Journal).pdf
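The integration of overlapping block-level decisions into a pixel-level mask can be illustrated with a minimal sketch. It assumes binary block labels and simple vote averaging; the function name `integrate_block_decisions` and the parameters `block_size` and `step` are hypothetical, and the paper's actual classifier cascade and probabilistic model are not reproduced here.

```python
import numpy as np

def integrate_block_decisions(block_labels, image_shape, block_size, step):
    """Turn block-level foreground decisions into a per-pixel probability map.

    block_labels[i, j] is 1 if the block whose top-left corner sits at
    (i * step, j * step) was classified as foreground, else 0.
    With step < block_size the blocks overlap, so each pixel receives
    several votes.
    """
    fg_votes = np.zeros(image_shape, dtype=float)  # foreground blocks covering each pixel
    coverage = np.zeros(image_shape, dtype=float)  # all blocks covering each pixel
    for i in range(block_labels.shape[0]):
        for j in range(block_labels.shape[1]):
            top, left = i * step, j * step
            fg_votes[top:top + block_size, left:left + block_size] += block_labels[i, j]
            coverage[top:top + block_size, left:left + block_size] += 1
    # Pixels not covered by any block keep probability 0.
    return np.divide(fg_votes, coverage,
                     out=np.zeros_like(fg_votes), where=coverage > 0)
```

Thresholding the returned map (for example at 0.5, an assumed value) would give a binary mask whose contours are smoother than the block grid, because neighbouring pixels share most of their votes.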
2014-12-06
This paper proposes a vacant parking slot detection
and tracking system that fuses the sensors of an Around View
Monitor (AVM) system and an ultrasonic sensor-based automatic
parking system. The proposed system consists of three stages:
parking slot marking detection, parking slot occupancy classification,
and parking slot marking tracking. The parking slot marking
detection stage recognizes various types of parking slot markings
using AVM image sequences. It detects parking slots in individual
AVM images by exploiting a hierarchical tree structure of parking
slot markings and combines sequential detection results. The
parking slot occupancy classification stage identifies vacancies of
detected parking slots using ultrasonic sensor data. Parking slot
occupancy is probabilistically calculated by treating each parking
slot region as a single cell of the occupancy grid. The parking
slot marking tracking stage continuously estimates the position of
the selected parking slot while the ego-vehicle is moving into it.
During tracking, AVM images and motion sensor-based odometry
are fused together in the chamfer score level to achieve robustness
against inevitable occlusions caused by the ego-vehicle. In the
experiments, it is shown that the proposed method can recognize
the positions and occupancies of various types of parking slot
markings and stably track them under practical situations in
a real-time manner. The proposed system is expected to help
drivers conveniently select one of the available parking slots and
support the parking control system by continuously updating the
designated target positions.
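Treating a parking slot as a single occupancy-grid cell can be sketched with the standard log-odds Bayes update used for occupancy grids. This is an illustration under assumptions, not the paper's formulation: the function name `update_slot_occupancy` and the per-measurement probabilities (an assumed inverse sensor model for the ultrasonic readings) are hypothetical.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def update_slot_occupancy(prior_prob, measurement_probs):
    """Fuse per-measurement occupancy estimates for one parking slot.

    prior_prob:        occupancy belief before any measurement.
    measurement_probs: P(occupied | single measurement) for each reading
                       (assumed inverse sensor model output).
    Returns the posterior occupancy probability of the slot.
    """
    prior_log_odds = logit(prior_prob)
    log_odds = prior_log_odds
    for p in measurement_probs:
        # Standard binary Bayes filter update in log-odds form.
        log_odds += logit(p) - prior_log_odds
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Two consistent readings of 0.7 raise the belief above either reading alone, while contradictory readings (0.7 and 0.3) cancel out and return the prior.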
2014-11-29
Abstract: Video text, which contains rich semantic information,
can be utilized for video indexing and summarization. However,
compared with scanned documents, text recognition for video
text is still a challenging problem due to complex background.
Segmenting text line into single characters before text extraction
can achieve higher recognition accuracy, since background of
single character is less complex compared with whole text line.
Therefore, we first perform character segmentation, which can
accurately locate the character gap in the text line. More
specifically, we get a fusion map which fuses the results of
color gradient and log-gabor filter. Then, candidate segmentation
points are obtained by vertical projection analysis of the fusion
map. We get segmentation points by finding minimum projection
value of candidate points in a limited range. Finally, we get the
binary image of the single character image by applying K-means
clustering and combine their results to form binary image of the
whole text line. The binary image is further refined by inward
filling and the fusion map. The experimental results on a large
amount of data show that the proposed method can contribute
to better binarization result which leads to a higher character
recognition rate of OCR engine. Attached files: Video text extraction using the fusion of color gradient and log garbor filter.pdf
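The vertical projection step can be sketched as follows, assuming the fusion map is a 2D array with high values on character strokes. The function name `segmentation_points` and the threshold parameter `max_ratio` are hypothetical; the paper's exact candidate rule and search range are not given in the abstract.

```python
import numpy as np

def segmentation_points(fusion_map, max_ratio=0.2):
    """Pick one cut column per low-energy run of the vertical projection.

    Columns whose projection is at most max_ratio * max(projection) are
    treated as candidate points; within each run of consecutive
    candidates (the "limited range") the column with the minimum
    projection value is kept.
    """
    projection = np.asarray(fusion_map, dtype=float).sum(axis=0)
    if projection.max() == 0:
        return []
    is_candidate = projection <= max_ratio * projection.max()
    points, start = [], None
    for col, flag in enumerate(is_candidate):
        if flag and start is None:
            start = col                      # a run of candidates begins
        elif not flag and start is not None:
            run = projection[start:col]      # the run ended at col - 1
            points.append(start + int(np.argmin(run)))
            start = None
    if start is not None:                    # run reaching the right border
        points.append(start + int(np.argmin(projection[start:])))
    return points
```

Runs that touch the left or right border are reported too, so a caller interested only in gaps between characters would discard those.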
2014-11-22
People reidentification is one of the most challenging tasks in computer vision, and considerable efforts have been directed toward providing solutions to this problem. The existence of extensive camera networks and surveillance systems increases the amount of people images obtained, but, on the other hand, implies the need for new algorithms to enable reidentification of people captured by the cameras. There is no one optimal model that solves the entire problem, but a set of distinctive features can be used to help in the matching process. Our proposal consists of using the orientation of each person captured in the surveillance scene to considerably improve the reidentification process. An iterative algorithm maximizes the number of successful matches and speeds up the process. A comparison with other earlier relevant studies is presented using available datasets. Attached files: people reidentification.pdf
2014-11-08
This paper provides a real-time vehicle classification and counting system based on WSNs, namely, EasiSee.
- Accurate vehicle classification.
- Low-delay real-time performance.
- Low resource consumption.
Proposes an event trigger mechanism, CSM (collaborative sensing mechanism), which activates the camera sensor node only when a vehicle is detected, to avoid keeping the camera sensor node working all the time.
Proposes a robust vehicle image processing algorithm with low computational complexity, including vehicle image segmentation and physical feature extraction. Attached files: 20141108-report-Yang Yu.pdf
2014-11-01
We present a stereo algorithm designed for speed and efficiency that uses local slanted plane sweeps to propose disparity hypotheses for a semi-global matching algorithm. Our local plane hypotheses are derived from initial sparse feature correspondences followed by an iterative
clustering step. Local plane sweeps are then performed around each slanted plane to produce out-of-plane parallax and matching-cost estimates. A final global optimization stage, implemented using semi-global matching, assigns each pixel to one of the local plane hypotheses. By
only exploring a small fraction of the whole disparity space volume, our technique achieves significant speedups over previous algorithms and achieves state-of-the-art accuracy
on high-resolution stereo pairs of up to 19 megapixels. Attached files: CVPR2014_Efficient High-Resolution Stereo Matching using Local Plane Sweeps.pdf
2014-10-11
Recent years have seen greater interest in the use of discriminative classifiers in tracking systems, owing to their success in object detection. They are trained online with samples collected during tracking. Unfortunately, the potentially large number of samples becomes a computational burden, which directly conflicts with real-time requirements. On the other hand, limiting the samples may sacrifice performance. Interestingly, we observed that, as we add more and more samples, the problem acquires circulant structure. Using the well-established theory of Circulant matrices, we provide a link to Fourier analysis that opens up the possibility of extremely fast learning and detection with the Fast Fourier Transform. This can be done in the dual space of kernel machines as fast as with linear classifiers. We derive closed-form solutions for training and detection with several types of kernels, including the popular Gaussian and polynomial kernels. The resulting tracker achieves performance competitive with the state-of-the-art, can be implemented with only a few lines of code and runs at hundreds of frames-per-second. MATLAB code is provided in the paper. Attached files: csk_tracker_eccv2012.pdf
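The link between circulant structure and the Fourier transform can be checked numerically with a small sketch. It uses plain (linear, primal) ridge regression on a 1D signal rather than the kernelized dual form derived in the paper, and the function names are hypothetical; it is not the paper's MATLAB code.

```python
import numpy as np

def circulant(x):
    """Data matrix whose i-th row is x cyclically shifted by i positions."""
    return np.stack([np.roll(x, i) for i in range(len(x))])

def ridge_direct(x, y, lam):
    """Ridge regression on all cyclic shifts of x, solved as a dense system."""
    X = circulant(x)
    return np.linalg.solve(X.T @ X + lam * np.eye(len(x)), X.T @ y)

def ridge_fft(x, y, lam):
    """Same solution, computed element-wise in the Fourier domain.

    The DFT diagonalizes circulant matrices, so the n x n solve above
    reduces to n independent scalar divisions.
    """
    x_hat = np.fft.fft(x)
    y_hat = np.fft.fft(y)
    w_hat = x_hat * y_hat / (np.abs(x_hat) ** 2 + lam)
    return np.real(np.fft.ifft(w_hat))
```

The dense solve costs on the order of n^3 operations while the FFT route costs n log n, which is the source of the speed-up described in the abstract.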
2014-10-04
Automatic pedestrian detection for advanced driver assistance systems (ADASs) is still a challenging task. Major reasons are dynamic and complex backgrounds in street scenes and variations in clothing or postures of pedestrians. We propose a simple yet effective detector for robust pedestrian detection. Observing that pedestrians usually appear upright in video data, we employ a statistical model of the upright human body in which the head, upper body, and lower body are treated as three distinct components. Our main contribution is to systematically design a pool of rectangular features that are tailored to this shape model. As we incorporate different kinds of low-level measurements, the resulting multimodal and multichannel Haar-like features represent characteristic differences between parts of the human body but are robust against variations in clothing or environmental settings. Our approach neither performs exhaustive searches over all possible configurations of rectangular features nor relies on random sampling. It thus marks a middle ground among recently published techniques and yields efficient low-dimensional yet highly discriminative features. Experimental results on the well-established INRIA, Caltech, and KITTI pedestrian data sets show that our detector reaches state-of-the-art performance at low computational costs and that our features are robust against occlusions. Attached files: Efficient Pedestrian Detection via Rectangular Feature Based on a Statistical Shape Model (ITS-Sept 2014).pdf
2014-09-27
Most of the existing traffic sign recognition (TSR) systems make use of the inner region of the signs or the local features such as Haar, histograms of oriented gradients (HOG), and scale-invariant feature transform for recognition, whereas these features are still limited to deal with the rotation, illumination, and scale variations situations. A good feature of a traffic sign is desired to be discriminative and robust. In this paper, a novel Color Global and Local Oriented Edge Magnitude Pattern (Color Global LOEMP) is proposed. The Color Global LOEMP is a framework that is able to effectively combine color, global spatial structure, global direction structure, and local shape information and balance the two concerns of distinctiveness and robustness. The contributions of this paper are as follows: 1) color angular patterns are proposed to provide the color distinguishing information; 2) a context frame is established to provide global spatial information, due to the fact that the context frame is established by the shape of the traffic sign, thus allowing the cells to be aligned well with the inside part of the traffic sign even when rotation and scale variations occur; and 3) a LOEMP is proposed to represent each cell. In each cell, the distribution of the orientation patterns is described by the HOG feature, and then, each direction of HOG is represented in detail by the occurrence of local binary pattern histogram in this direction. Experiments are performed to validate the effectiveness of the proposed approach with TSR systems, and the experimental results are satisfying, even for images containing traffic signs that have been rotated, damaged, altered in color, or undergone affine transformations or images that were photographed under different weather or illumination conditions.
2014-09-20
In this paper, we present a novel object detection approach that is capable of regressing the aspect ratio of objects. This results in accurately predicted bounding boxes having high overlap with the ground truth. In contrast to most recent works, we employ a Random Forest for learning a template-based model but exploit the nature of this learning algorithm to predict arbitrary output spaces. In this way, we can simultaneously predict the object probability of a window in a sliding window approach as well as regress its aspect ratio with a single model. Furthermore, we also exploit the additional information of the aspect ratio during the training of the Joint Classification-Regression Random Forest, resulting in better detection models. Our experiments demonstrate several benefits: (i) Our approach gives competitive results on standard detection benchmarks. (ii) The additional aspect ratio regression delivers more accurate bounding boxes than standard object detection approaches in terms of overlap with ground truth, especially when tightening the evaluation criterion. (iii) The detector itself becomes better by only including the aspect ratio information during training.
2014-09-13
This paper presents a robust and efficient text detection algorithm for news video. The proposed
algorithm uses the temporal information of video and a logical AND operation to remove most of the irrelevant
background. Then a window-based method that counts black-and-white transitions is applied to the
resulting edge map to obtain rough text blobs. A line deletion technique is used twice to refine the text blocks.
The proposed algorithm is applicable to multiple languages (English, Japanese and Chinese), robust to text
polarities (positive or negative), various character sizes (from 4×7 to 30×30), and text alignments (horizontal or
vertical). Three metrics, recall (R), precision (P), and quality of bounding preciseness (Q), are adopted to
measure the efficacy of text detection algorithms. According to the experimental results on various multilingual
video sequences, the proposed algorithm achieves 96% or above in all three metrics. Compared with
existing methods, our method performs better, especially in the quality of bounding preciseness, which is
crucial to later binarization process. Attached files: Robust news video text detection based on edges and line deletion.pdf
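The two low-level operations named in this abstract, the temporal logical AND and the counting of black-and-white transitions, can be sketched as below. The sketch assumes binary edge maps stored as NumPy arrays; the function names are hypothetical, and window sizes, thresholds and the line deletion step are left out.

```python
import numpy as np

def stable_edges(edge_maps):
    """Keep only edge pixels present in every frame (logical AND over time).

    Static overlay text survives across frames, while edges of a moving
    background tend to be removed.
    """
    return np.logical_and.reduce(np.asarray(edge_maps, dtype=bool), axis=0)

def transitions_per_row(window):
    """Count 0->1 and 1->0 changes along each row of a binary window.

    Text regions produce many transitions per row; smooth regions few.
    """
    w = np.asarray(window, dtype=np.int8)
    return np.abs(np.diff(w, axis=1)).sum(axis=1)
```

A window would then be kept as a rough text blob when its transition count exceeds some threshold, which has to be tuned and is not specified here.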
2014-09-06
Computer-aided sports analysis is demanded by coaches and the media. Image processing and machine learning techniques that allow for "live" recognition and tracking of players exist. But these methods are far from collecting and analyzing event data fully autonomously. To generate accurate results, human interaction is required at different stages including system setup, calibration, supervision of classifier training, and resolution of tracking conflicts. Furthermore, the real-time constraints are challenging: in contrast to other object recognition and tracking applications, we cannot treat data collection, annotation, and learning as an offline task. A semi-automatic labeling of training data and robust learning given few examples from unbalanced classes are required. We present a real-time system acquiring and analyzing video sequences from soccer matches. It estimates each player's position throughout the whole match in real-time. Performance measures derived from these raw data allow for an objective evaluation of physical and tactical profiles of teams and individuals. The need for precise object recognition, the restricted working environment, and the technical limitations of a mobile setup are taken into account. Our contribution is twofold: (1) the deliberate use of machine learning and pattern recognition techniques allows us to achieve high classification accuracy in varying environments. We systematically evaluate combinations of image features and learning machines in the given online scenario. Switching between classifiers depending on the amount of training data and available training time improves robustness and efficiency. (2) A proper human-machine interface decreases the number of required operators who are incorporated into the system's learning process. Their main task reduces to the identification of players in uncertain situations.
Our experiments showed high performance in the classification task, achieving an average error rate of 3 % on three real-world datasets. The system has proved to collect accurate tracking statistics throughout different soccer matches in real-time by incorporating two human operators only. We finally show how the resulting data can be used instantly for consumer applications and discuss further development in the context of behavior analysis. Attached files: Adaptive pattern recognition in real-time video-based soccer analysis.pdf
2014-09-13
In this paper, we present a novel object detection approach that is capable of regressing the aspect ratio of objects. This results in accurately predicted bounding boxes having high overlap with the ground truth. In contrast to most recent works, we employ a Random Forest for learning a template-based model but exploit the nature of this learning algorithm to predict arbitrary output spaces. In this way, we can simultaneously predict the object probability of a window in a sliding window approach as well as regress its aspect ratio with a single model. Furthermore, we also exploit the additional information of the aspect ratio during the training of the Joint Classification-Regression Random Forest, resulting in better detection models. Our experiments demonstrate several benefits: (i) Our approach gives competitive results on standard detection benchmarks. (ii) The additional aspect ratio regression delivers more accurate bounding boxes than standard object detection approaches in terms of overlap with ground truth, especially when tightening the evaluation criterion. (iii) The detector itself becomes better by only including the aspect ratio information during training.
2014-08-09
Recent years have seen greater interest in the use of discriminative
classifiers in tracking systems, owing to their success in object detection.
They are trained online with samples collected during tracking.
Unfortunately, the potentially large number of samples becomes a computational
burden, which directly conflicts with real-time requirements.
On the other hand, limiting the samples may sacrifice performance.
Interestingly, we observed that, as we add more and more samples, the
problem acquires circulant structure. Using the well-established theory
of Circulant matrices, we provide a link to Fourier analysis that opens
up the possibility of extremely fast learning and detection with the Fast
Fourier Transform. This can be done in the dual space of kernel machines
as fast as with linear classifiers. We derive closed-form solutions
for training and detection with several types of kernels, including the
popular Gaussian and polynomial kernels. The resulting tracker achieves
performance competitive with the state-of-the-art, can be implemented
with only a few lines of code and runs at hundreds of frames-per-second.
MATLAB code is provided in the paper (see Algorithm 1). Attached files: csk_tracker_eccv2012.pdf
2014-08-02
Traffic sign detection and recognition has been thoroughly studied for a long time. However, traffic panel detection and recognition still remains a challenge in computer vision due to its different types and the huge variability of the information depicted in them. This paper presents a method to detect traffic panels in street-level images and to recognize the information contained on them, as an application to intelligent transportation systems (ITS). The main purpose can be to make an automatic inventory of the traffic panels located in a road to support road maintenance and to assist drivers. Our proposal extracts local
descriptors at some interest keypoints after applying blue and white color segmentation. Then, images are represented as a "bag of visual words" and classified using Naïve Bayes or support vector machines. This visual appearance categorization method is a new approach for traffic panel detection in the state of the art. Finally, our own text detection and recognition method is applied on those images where a traffic panel has been detected, in order to automatically
read and save the information depicted in the panels. We propose a language model partly based on a dynamic dictionary for a limited geographical area using a reverse geocoding service. Experimental results on real images from Google Street View prove the efficiency of the proposed method and give way to using street-level images for different applications on ITS. Attached files: 06587069.pdf
2014-07-26
This paper proposes a novel method for tracking failure detection. The detection is based on the Forward-Backward error, i.e. the tracking is performed forward and backward in time and the discrepancies between these two trajectories are measured. We demonstrate that the proposed error enables reliable detection of tracking failures and selection of reliable trajectories in video sequences. We demonstrate that the approach is complementary to commonly used normalized cross-correlation (NCC). Based on the error, we propose a novel object tracker called Median Flow. State-of-the-art performance is achieved on challenging benchmark video sequences which include non-rigid objects. Attached files: Forward-Backward Error, Automatic Detection of Tracking Failures.pdf
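The Forward-Backward error described here can be sketched for a set of tracked points. The two tracker callables are hypothetical stand-ins for any point tracker, and keeping the points whose error does not exceed the median is an assumption made for illustration rather than a rule stated in the abstract.

```python
import numpy as np

def forward_backward_error(points, track_forward, track_backward):
    """Distance between each point and its forward-then-backward track.

    points:         (n, 2) array of positions in frame t.
    track_forward:  callable mapping frame-t points to frame t+1.
    track_backward: callable mapping frame-(t+1) points back to frame t.
    """
    forward = track_forward(points)
    back = track_backward(forward)
    return np.linalg.norm(points - back, axis=1)

def select_reliable(fb_error):
    """Boolean mask of points whose error is at most the median error."""
    return fb_error <= np.median(fb_error)
```

A point that is tracked consistently returns to its starting position and gets an error near zero, while a point that drifted onto another structure does not.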
2014-07-19
We describe a state-of-the-art system for finding objects in cluttered images. Our system is based on deformable models that represent objects using local part templates and geometric constraints on the locations of parts. We reduce object detection to classification with latent variables. The latent variables introduce invariances that make it possible to detect objects with highly variable appearance. We use a generalization of support vector machines to incorporate latent information during training. This has led to a general framework for discriminative training of classifiers with latent variables. Discriminative training benefits from large training datasets. In practice we use an iterative algorithm that alternates between estimating latent values for positive examples and solving a large convex optimization problem. Practical optimization of this large convex problem can be done using active set techniques for adaptive subsampling of the training data. Attached files: Visual Object Detection with Deformable Part Models ACM2013.pdf
2014-07-12
Computer vision applications have come to rely increasingly on superpixels in recent years, but it is not always clear what constitutes a good superpixel algorithm. In an effort to understand the benefits and drawbacks of existing methods, we empirically compare five state-of-the-art superpixel algorithms for their ability to adhere to image boundaries, speed, memory efficiency, and their impact on segmentation performance. We then introduce a new superpixel algorithm, simple linear iterative clustering (SLIC), which adapts a k-means clustering approach to efficiently generate superpixels. Despite its simplicity, SLIC adheres to boundaries as well as or better than previous methods. At the same time, it is faster and more memory efficient, improves segmentation performance, and is straightforward to extend to supervoxel generation. Attached files: PAMI(2012) SLIC Superpixels Compared to State-of-the-Art Superpixel Methods.pdf
2014-07-12
We describe a state-of-the-art system for finding objects in cluttered images. Our system is based on deformable models that represent objects using local part templates and geometric constraints on the locations of parts. We reduce object detection to classification with latent variables. The latent variables introduce invariances that make it possible to detect objects with highly variable appearance. We use a generalization of support vector machines to incorporate latent information during training. This has led to a general framework for discriminative training of classifiers with latent variables. Discriminative training benefits from large training datasets. In practice we use an iterative algorithm that alternates between estimating latent values for positive examples and solving a large convex optimization problem. Practical optimization of this large convex problem can be done using active set techniques for adaptive subsampling of the training data. Attached files: Visual Object Detection with Deformable Part Models ACM2013.pdf
2014-06-28
Catadioptric omnidirectional view sensors have found increasing adoption in various robotic and surveillance applications due to their 360° field of view. However, the inherent distortion caused by the sensors prevents their direct utilisations using existing image processing techniques developed for perspective images. Therefore, a correction processing known as "unwrapping" is commonly performed. However, the unwrapping process incurs additional computational loads on central processing units. In this paper, a method to reduce this burden in the computation is investigated by exploiting the parallelism of graphical processing units (GPUs) based on the Compute Unified Device Architecture (CUDA). More specifically, we first introduce a general approach of parallelisation to the said process. Then, a series of adaptations to the CUDA platform is proposed to enable an optimised usage of the hardware platform. Finally, the performances of the unwrapping function were evaluated on a high-end and low-end GPU to demonstrate the effectiveness of the parallelisation approach. Attached files: gpu_unwrap.pdf
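Why unwrapping parallelises well can be seen from a minimal NumPy sketch of a polar-to-rectangular mapping: every output pixel is computed from its own (radius, angle) pair, independently of all others. This is a CPU illustration under assumptions (nearest-neighbour sampling, a simple polar model, hypothetical function and parameter names), not the paper's CUDA implementation.

```python
import numpy as np

def unwrap_omni(image, center, r_min, r_max, out_width):
    """Unwrap the ring r_min <= r < r_max of an omnidirectional image.

    Output row i corresponds to radius r_min + i, output column j to the
    angle 2*pi*j/out_width. Each output pixel depends only on its own
    (radius, angle) pair, which is what a GPU thread would compute.
    """
    cy, cx = center
    radius = np.arange(r_min, r_max, dtype=float)
    theta = 2.0 * np.pi * np.arange(out_width) / out_width
    rr, tt = np.meshgrid(radius, theta, indexing="ij")
    # Nearest-neighbour source coordinates, clipped to the image bounds.
    src_y = np.clip(np.rint(cy + rr * np.sin(tt)).astype(int), 0, image.shape[0] - 1)
    src_x = np.clip(np.rint(cx + rr * np.cos(tt)).astype(int), 0, image.shape[1] - 1)
    return image[src_y, src_x]
```

In a CUDA version, the two index computations and the final gather would form the body of a kernel launched with one thread per output pixel.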
2014-06-21
Abstract: Detecting text and caption from videos is important
and in great demand for video retrieval, annotation, indexing, and
content analysis. In this paper, we present a corner based approach
to detect text and caption from videos. This approach is inspired
by the observation that there exist dense and orderly presences of
corner points in characters, especially in text and caption. We use
several discriminative features to describe the text regions formed
by the corner points. The usage of these features is in a flexible
manner, thus, can be adapted to different applications. Language
independence is an important advantage of the proposed method.
Moreover, based upon the text features, we further develop a novel
algorithm to detect moving captions in videos. In the algorithm, the
motion features, extracted by optical flow, are combined with text
features to detect the moving caption patterns. The decision tree is
adopted to learn the classification criteria. Experiments conducted
on a large volume of real video shots demonstrate the efficiency and
robustness of our proposed approaches and the real-world system.
Our text and caption detection system was recently highlighted in
a worldwide multimedia retrieval competition, Star Challenge, by
achieving the superior performance with the top ranking. Attached files: Text from corners_a novel approach to detect text and captions in videos.pdf
2014-06-07
Training a generic objectness measure to produce a small set of candidate object windows has been shown to speed up the classical sliding window object detection paradigm. We observe that generic objects with well-defined closed boundary can be discriminated by looking at the norm of gradients, with a suitable resizing of their corresponding image windows into a small fixed size. Based on this observation and computational reasons, we propose to resize the window to 8x8 and use the norm of the gradients as a simple 64D feature to describe it, for explicitly training a generic objectness measure.
We further show how the binarized version of this feature, namely binarized normed gradients (BING), can be used for efficient objectness estimation, which requires only a few atomic operations (e.g. ADD, BITWISE SHIFT, etc.). Experiments on the challenging PASCAL VOC 2007 dataset show that our method efficiently (300fps on a single laptop CPU) generates a small set of category-independent, high quality object windows, yielding 96.2% object detection rate (DR) with 1,000 proposals. Increasing the numbers of proposals and color spaces for computing BING features, our performance can be further improved to 99.5% DR. Attached files: ObjectnessBING.pdf
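The 64D normed-gradient feature can be sketched as follows. Several details are assumptions made for illustration: nearest-neighbour resizing, `np.gradient` central differences as the gradient operator, and clipping of the L1 gradient norm at 255. The function name is hypothetical, and the binarization step that gives BING its speed is not covered.

```python
import numpy as np

def normed_gradient_feature(window, size=8):
    """Resize a grayscale window to size x size and return its gradient norms.

    The result is a flat vector of size * size values (64 for size=8).
    """
    window = np.asarray(window, dtype=float)
    # Nearest-neighbour resize by sampling evenly spaced rows and columns.
    rows = np.arange(size) * window.shape[0] // size
    cols = np.arange(size) * window.shape[1] // size
    small = window[np.ix_(rows, cols)]
    # Gradients along the vertical (gy) and horizontal (gx) directions.
    gy, gx = np.gradient(small)
    # L1 norm of the gradient, clipped to the 8-bit range.
    normed = np.minimum(np.abs(gx) + np.abs(gy), 255)
    return normed.ravel()
```

A linear classifier trained on such vectors would then score how object-like a window is; a window with a closed boundary produces a ring of high values around a flat interior.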
2014-05-24
In this paper, we study the problem of detecting sudden pedestrian crossings to assist drivers in avoiding accidents. This application has two major requirements: to detect crossing pedestrians as early as possible just as they enter the view of the car-mounted camera and to maintain a false alarm rate as low as possible for practical purposes. Although many current sliding-window-based approaches using various features and classification algorithms have been proposed for image-/video-based pedestrian detection, their performance in terms of accuracy and processing speed falls far short of practical application requirements. To address this problem, we propose a three-level coarse-to-fine video-based framework that detects partially visible pedestrians just as they enter the camera view, with low false alarm rate and high speed. The framework is tested on a new collection of high-resolution videos captured from a moving vehicle and yields a performance better than that of state-of-the-art pedestrian detection while running at a frame rate of 55 fps.
Index Terms—Coarse to fine, pedestrian detection, performance evaluation, spatiotemporal refinement, sudden pedestrian crossing Attached files: 06093757.pdf
2014-05-17
We consider the problem of detection and tracking of multiple people in crowded street scenes. State-of-the-art methods perform well in scenes with relatively few people, but are severely challenged by scenes with many subjects that partially occlude each other. This limitation is due to the fact that current people detectors fail when persons are strongly occluded. We observe that typical occlusions are due to overlaps between people and propose a people detector tailored to various occlusion levels. Instead of treating partial occlusions as distractions, we leverage the fact that person/person occlusions result in very characteristic appearance patterns that can help to improve detection results. We demonstrate the performance of our occlusion-aware person detector on a new dataset of people with controlled but severe levels of occlusion and on two challenging publicly available benchmarks outperforming single person detectors in each case. Attached files: Detection and Tracking of Occluded People _IJCV2013.pdf
2014-05-10
Background subtraction has been widely investigated in recent
years. Most previous work has focused on stationary cameras. Recently,
moving cameras have also been studied since videos from mobile
devices have increased significantly. In this paper, we propose a unified
and robust framework to effectively handle diverse types of videos,
e.g., videos from stationary or moving cameras. Our model is inspired
by two observations: 1) background motion caused by orthographic cameras
lies in a low rank subspace, and 2) pixels belonging to one trajectory
tend to group together. Based on these two observations, we introduce
a new model using both low rank and group sparsity constraints. It is
able to robustly decompose a motion trajectory matrix into foreground
and background ones. After obtaining foreground and background trajectories,
the information gathered on them is used to build a statistical
model to further label frames at the pixel level. Extensive experiments
demonstrate very competitive performance on both synthetic data and
real videos. Attached files: ECCV12-background.pdf
2014-05-03
Since the initial comparison of Seitz et al. [48], the accuracy of dense multiview stereovision methods has been increasing steadily. A number of limitations, however, make most of these methods not suitable to outdoor scenes taken under uncontrolled imaging conditions. The present work consists of a complete dense multiview stereo pipeline which circumvents these limitations,
being able to handle large-scale scenes without sacrificing accuracy. Highly detailed reconstructions are produced within very reasonable time thanks to two key stages in our pipeline: a minimum s-t cut optimization over an adaptive domain that robustly and efficiently filters a quasidense point cloud from outliers and reconstructs an initial surface by integrating visibility constraints, followed by a mesh-based variational refinement that captures small details, smartly handling photo-consistency, regularization, and adaptive resolution. The pipeline has been tested over a wide range of scenes: from classic compact objects taken in a laboratory setting, to
outdoor architectural scenes, landscapes, and cultural heritage sites. The accuracy of its reconstructions has also been measured on the dense multiview benchmark proposed by Strecha et al. [59], showing the results to compare more than favorably with the current state-of-the-art methods. Attached files: PAMI-2012 High Accuracy and Visibility-Consistent Dense Multiview Stereo.pdf
2014-04-19
This paper develops a theoretical model for the formation of transparent
overlays and proposes a temporal algorithm to detect them
independent of their degree of transparency. The proposed algorithm
exploits our novel observation that the appearance of a transparent
overlay results in a proportionally constant decrease in the
intensity variance. In order to detect transparent regions, we first
compute intensity variances about each pixel. After that, the ratios
of the variances between the pixels of the consecutive frames are
computed to form variance ratio images. Because the degree of
transparency is unknown and may vary, we generate binary images
by thresholding variance ratio images for every possible fine interval
of the degree of transparency. Various morphological, textural,
and contextual information are applied to every candidate binary
image to detect spatial location of transparent overlays. We can
also accurately detect the color and the degree of transparency of
the transparent overlay so that we can remove the transparency or
apply user-specific enhancement operations. We also demonstrate
the application of the algorithm to video indexing and retrieval. Attached files: Temporal detection and processing of transparent overlays for video indexing and enhancement.pdf
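The variance-ratio step in the abstract above can be illustrated with a minimal sketch. It assumes NumPy, grayscale frames, a k x k window, and a simple blending model (overlaid = alpha * original + (1 - alpha) * constant colour), under which the local variance scales by alpha squared. The function names, the window size, and the blending model are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def local_variance(frame, k=3):
    """Intensity variance in a k x k window around each pixel (edge-padded)."""
    pad = k // 2
    padded = np.pad(frame.astype(np.float64), pad, mode="edge")
    # windows has shape (H, W, k, k): one k x k neighbourhood per pixel
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return windows.var(axis=(-2, -1))

def variance_ratio(prev_frame, next_frame, k=3, eps=1e-9):
    """Per-pixel ratio of local variances between two consecutive frames."""
    return local_variance(next_frame, k) / (local_variance(prev_frame, k) + eps)

# Hypothetical example: a half-transparent overlay of constant colour
rng = np.random.default_rng(0)
before = rng.uniform(0, 255, size=(16, 16))
after = 0.5 * before + 0.5 * 200.0      # assumed blending model, alpha = 0.5
ratio = variance_ratio(before, after)   # close to alpha**2 = 0.25 everywhere
```

Under these assumptions, thresholding the ratio image in a narrow band around alpha squared, for each candidate alpha, would yield the candidate binary images the abstract mentions.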
2014-04-12
The efficiency and quality of a feature descriptor are critical to the user
experience of many computer vision applications. However, the existing
descriptors are either too computationally expensive to achieve real-time
performance, or not sufficiently distinctive to identify correct matches from a large
database with various transformations. In this paper, we propose a highly efficient
and distinctive binary descriptor, called local difference binary (LDB). LDB directly
computes a binary string for an image patch using simple intensity and gradient
difference tests on pairwise grid cells within the patch. A multiple-gridding strategy
and a salient bit-selection method are applied to capture the distinct patterns of the
patch at different spatial granularities. Experimental results demonstrate that
compared to the existing state-of-the-art binary descriptors, primarily designed for
speed, LDB has similar construction efficiency, while achieving a greater accuracy
and faster speed for mobile object recognition and tracking tasks. Attached files: Local Difference Binary for Ultrafastand Distinctive Feature Description.pdf
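The idea of building a binary string from difference tests on pairwise grid cells can be sketched as follows. This is a simplified illustration, not the LDB implementation: it uses only mean-intensity tests on a single n x n grid (the abstract also mentions gradient tests, multiple gridding and bit selection, all omitted here), and every function name is hypothetical.

```python
import numpy as np
from itertools import combinations

def grid_means(patch, n):
    """Mean intensity of each cell in an n x n grid laid over the patch."""
    h, w = patch.shape
    ys = np.linspace(0, h, n + 1).astype(int)
    xs = np.linspace(0, w, n + 1).astype(int)
    return np.array([patch[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
                     for i in range(n) for j in range(n)])

def binary_descriptor(patch, n=3):
    """One bit per pair of grid cells: 1 if the first cell is brighter."""
    m = grid_means(patch.astype(np.float64), n)
    return np.array([m[a] > m[b] for a, b in combinations(range(n * n), 2)],
                    dtype=np.uint8)

def hamming(d1, d2):
    """Number of differing bits between two descriptors."""
    return int(np.count_nonzero(d1 != d2))

# Hypothetical usage on a random 24 x 24 patch
rng = np.random.default_rng(1)
patch = rng.integers(0, 256, size=(24, 24))
desc = binary_descriptor(patch)         # 36 bits for a 3 x 3 grid
```

Because each bit depends only on the ordering of two cell means, a uniform brightness offset leaves the descriptor unchanged, and matching reduces to a Hamming distance, which is what makes binary descriptors of this kind cheap to compare.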
2014-03-29
Multioriented text detection in video frames is not as easy as detection of captions, graphics, or overlaid texts, which usually appear in the horizontal direction and have high contrast compared to their background. Multioriented text generally refers to scene text that makes text detection more challenging and
interesting due to unfavorable characteristics of scene text. Therefore, conventional text detection methods may not give good results for multioriented scene text detection. Hence, in this paper, we present a new enhancement method that includes the product of Laplacian and Sobel operations to enhance text
pixels in videos. To classify true text pixels, we propose a Bayesian classifier without assuming a priori probability about the input frame but estimating it based on three probable matrices. Three different ways of clustering are performed on the output of the enhancement method to obtain the three probable matrices. Text candidates are obtained by intersecting the output of the Bayesian
classifier with the Canny edge map of the input frame. A boundary growing method is introduced to traverse the multioriented scene text lines using text candidates. The boundary growing method works based on the concept of nearest neighbors. The robustness of the method has been tested on a variety of datasets
that include our own created data (nonhorizontal and horizontal text data) and two publicly available data, namely, video frames of Hua and complex scene text data of ICDAR 2003 competition (camera images). Experimental results show that the performance of the proposed method is encouraging compared with results of
existing methods in terms of recall, precision, F-measures, and computational times.
2014-03-22
Part-based models have demonstrated their merit in object detection. However, there is a key issue to be solved on how to integrate the inaccurate scores of part detectors when there are occlusions or large deformations. To handle the imperfectness of part detectors, this paper presents a probabilistic pedestrian detection framework. In this framework, a deformable part-based model is used to obtain the scores of part detectors and the visibilities of parts are modeled as hidden variables. Unlike previous occlusion handling approaches that assume independence among visibility probabilities of parts or manually define rules for the visibility relationship, a discriminative deep model is used in this paper for learning the visibility relationship among overlapping parts at multiple layers. Experimental results on three public datasets (Caltech, ETH and Daimler) and a new CUHK occlusion dataset specially designed for the evaluation of occlusion handling approaches show the effectiveness of the proposed approach. Attached files: A Discriminative Deep Model for Pedestrian Detection with Occlusion Handling.pdf
2014-03-15
The automatic extraction of line-networks from images is a well-known computer vision issue. Appearance and shape considerations have been deeply explored in the literature to improve accuracy in presence of occlusions, shadows, and a wide variety of irrelevant objects. However most existing works have ignored the structural aspect of the problem. We present an original method which provides structurally-coherent solutions. Contrary to the pixel-based and object-based methods, our result is a graph in which each node represents either a connection or an ending in the line-network. Based on stochastic geometry, we develop a new family of point processes consisting in sampling junction-points in the input image by using a Monte Carlo mechanism. The quality of a configuration is measured by a probability density which takes into account both image consistency and shape priors. Our experiments on a variety of problems illustrate the potential of our approach in terms of accuracy, flexibility and efficiency.
2014-03-08
There are few fully automated methods for liver segmentation in magnetic resonance images (MRI) despite the benefits of this type of acquisition in comparison to other radiology techniques such as computed tomography (CT). Motivated by medical requirements, liver segmentation in MRI has been carried out. For this purpose, we present a new method for liver segmentation based on the watershed transform and stochastic partitions. The classical watershed over-segmentation is reduced using a marker-controlled algorithm. To improve accuracy of selected contours, the gradient of the original image is successfully enhanced by applying a new variant of stochastic watershed. Moreover, a final classifier is performed in order to obtain the final liver mask. Optimal parameters of the method are tuned using a training dataset and then they are applied to the rest of studies (17 datasets). The obtained results (a Jaccard coefficient of 0.91 ± 0.02) in comparison to other methods demonstrate that the new variant of stochastic watershed is a robust tool for automatic segmentation of the liver in MRI. Attached files: 1-s2.0-S0169260713004124-main.pdf
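For readers unfamiliar with the evaluation measure quoted above, the Jaccard coefficient of two binary masks is the size of their intersection divided by the size of their union. The sketch below assumes NumPy and binary masks of equal shape; the function name and the toy masks are illustrative and unrelated to the paper's data.

```python
import numpy as np

def jaccard(mask_a, mask_b):
    """Jaccard coefficient |A and B| / |A or B| of two binary masks."""
    a = np.asarray(mask_a).astype(bool)
    b = np.asarray(mask_b).astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # convention chosen here: two empty masks agree fully
    return float(np.logical_and(a, b).sum() / union)

# Toy example: two 8-pixel masks that share 4 pixels
predicted = np.zeros((4, 4), dtype=np.uint8)
predicted[:2, :] = 1
reference = np.zeros((4, 4), dtype=np.uint8)
reference[:, :2] = 1
score = jaccard(predicted, reference)   # 4 shared / 12 in union
```

A value of 1 means the segmentation matches the reference mask exactly, and 0 means the two masks do not overlap at all.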
2014-03-01
In this paper, we propose a depth-map merging based multiple view stereo method for large-scale scenes which takes both accuracy and efficiency into account. In the proposed method, an efficient patch-based stereo matching process is used to generate depth-map at each image with acceptable
errors, followed by a depth-map refinement process to enforce consistency over neighboring views. Compared to state-of-the-art methods, the proposed method can reconstruct quite accurate and
dense point clouds with high computational efficiency. Besides, the proposed method could be easily parallelized at image level, i.e., each depth-map is computed individually, which makes it
suitable for large-scale scene reconstruction with high resolution images. The accuracy and efficiency of the proposed method are evaluated quantitatively on benchmark data and qualitatively on large data sets. Attached files: Accurate Multiple View 3D Reconstruction Using Pathch-Based Stereo for Large-Scale Scenes.pdf
2014-02-08
This work introduces a novel descriptor called
Binary Robust Appearance and Normals Descriptor (BRAND),
that efficiently combines appearance and geometric shape
information from RGB-D images, and is largely invariant to
rotation and scale transform. The proposed approach encodes
point information as a binary string providing a descriptor
that is suitable for applications that demand speed performance
and low memory consumption. Results of several experiments
demonstrate that as far as precision and robustness are concerned,
BRAND achieves improved results when compared to
state of the art descriptors based on texture, geometry and
combination of both information. We also demonstrate that
our descriptor is robust and provides reliable results in a
registration task even when a sparsely textured and poorly
illuminated scene is used. Attached files: BRAND-ARobustAppearanceandDepthDescriptorforRGB-DImages .pdf
2014-01-25
There is a growing body of work addressing the problem of localizing printed text regions occurring in natural scenes, all of it focused on images in which the text to be localized is resolved clearly enough to be read by OCR. This paper introduces an alternative approach to text localization based on the fact that it is often useful to localize text that is identifiable as text but too blurry or small to be read, for two reasons. First, an image can be decimated and processed at a coarser resolution than usual, resulting in faster localization before OCR is performed (at full resolution, if needed). Second, in real-time applications such as a cell phone app to find and read text, text may initially be acquired from a lower-resolution video image in which it appears too small to be read; once the text’s presence and location have been established, a higher-resolution image can be taken in order to resolve the text clearly enough to read it.
We demonstrate proof of concept of this approach by describing a novel algorithm for binarizing the image and extracting candidate text features, called “blobs,” and grouping and classifying the blobs into text and non-text categories. Experimental results are shown on a variety of images in which the text is resolved too poorly to be clearly read, but is still identifiable by our algorithm as text. Attached files: Localizing Blurry and Low-Resolution Text in Natural Images-IEEE Version-Applications of Computer Vision (WACV), 2011 IEEE Workshop on.pdf
2014-01-18
In robotics, vertical lines have been always very useful for autonomous robot localization and navigation in structured environments. This paper presents a robust method for matching vertical lines in omnidirectional images. Matching robustness is achieved by creating a descriptor which is very distinctive and is invariant to rotation and slight changes of illumination. We characterize the performance of the descriptor on a large image dataset by taking into account the sensitiveness to the different parameters of the descriptor. The robustness of the approach is also validated through a real navigation experiment with a mobile robot equipped with an omnidirectional camera. Attached files: Performance Evaluation of a Vertical Line Descriptor for Omnidirectional Images.pdf
2014-01-11
A full-automatic method for recognizing parking slot markings is proposed. The proposed method recognizes various types of parking slot markings by modeling them as a hierarchical tree structure. This method mainly consists of two processes: bottom-up and top-down. First, the bottom-up process climbs up the hierarchical tree structure to excessively generate parking slot candidates so as not to lose the correct slots. This process includes corner detection, junction and slot generation, and type selection procedures. After that, the top-down process confirms the final parking slots by eliminating falsely generated slots, junctions, and corners based on the properties of the parking slot marking type by climbing down the hierarchical tree structure. The proposed method was evaluated in 608 real-world parking situations encompassing a variety of different parking slot markings. The experimental result reveals that the proposed method outperforms the previous semiautomatic method while requiring a small amount of computational costs even though it is fully automatic. Attached files: Full-automatic recognition of various parking slot markings using a hierarchical tree structure.pdf
2014-01-04
We present a system able to predict the future behavior of the ego-vehicle in an inner-city environment. Our system learns the mapping between the current perceived scene (information about the ego-vehicle and the preceding vehicle, as well as information about the possible traffic lights) and the future driving behavior of the ego-vehicle. We improve the prediction accuracy by estimating the prediction confidence and by discarding unconfident samples. The behavior of the driver is represented as a sequence of elementary states termed behavior primitives. These behavior primitives are abstractions from the raw actuator states. Behavior prediction is therefore considered to be a multi-class learning problem.
In this contribution, we explore the possibilities of situation-specific learning. We show that decomposing the perceived complex situation into a combination of simpler ones, each of them with a dedicated prediction, allows the system to reach a performance equivalent to a system without situation-specificity. We believe that this is advantageous for the scalability of the approach to the number of possible situations that the driver will encounter. The system is tested on a real world scenario, using streams recorded in inner-city scenes. The prediction is evaluated for a prediction horizon of 3s into the future, and the quality of the prediction is measured using established evaluation methods.