Seminars in 2013
Recent research has focused on systems for obtaining automatic 3D reconstructions of urban environments from video acquired at street level. These systems record enormous amounts of video; therefore a key component is a stereo matcher which can process this data at speeds comparable to the recording frame rate. Furthermore, urban environments are unique in that they exhibit mostly planar surfaces. These surfaces, which are often imaged at oblique angles, pose a challenge for many window-based stereo matchers which suffer in the presence of slanted surfaces.
We present a multi-view plane-sweep-based stereo algorithm which correctly handles slanted surfaces and runs in real-time using the graphics processing unit (GPU). Our algorithm consists of (1) identifying the scene's principal plane orientations, (2) estimating depth by performing a plane-sweep for each direction, and (3) combining the results of each sweep. The latter can optionally be performed using graph cuts. Additionally, by incorporating priors on the locations of planes in the scene, we can increase the quality of the reconstruction and reduce computation time, especially for uniform textureless surfaces. We demonstrate our algorithm on a variety of scenes and show the improved accuracy obtained by accounting for slanted surfaces.
Attached files: CRVP(2007) Real-time Plane-Sweeping Stereo with Multiple Sweeping Directions.pdf
The article presents an efficient method for detecting
critical 3-D feature points for efficient and accurate data
registration required in real-time indoor environment mapping
by using RGB-D cameras. To achieve fast and accurate data
correspondence between different 3-D scanned images, in the
proposed method, RGB images are first used to detect two-dimensional
(2-D) sparse color features for estimating matched
pairs between successive scanned depth images. Critically,
detected 2-D sparse features are mapped with their
corresponding depth information. Consequently, sub-sets of
matched pairs in 3-D depth space are established. Moreover, due
to potential sensing noise, not all pairs are valid and useful for
3-D matched pair establishment. Invalid pairs are detected and
eliminated using a proposed angle-based filter for 2-D matched
pairs, as well as filters based on Euclidean distance, neighboring
area and surface curvature for 3-D matched pairs. The
experimental results show that the method is efficient and
invariant to pose, robust for large-scale indoor environments,
and feasible for real-time 3-D indoor environment mapping.
Attached files: Real-time 3-D Feature Detection and CorrespondenceRefinement for Indoor Environment-Mapping usingRGB-D cameras .pdf
This paper proposes a novel system for the automatic detection and recognition of traffic signs. The proposed system detects candidate regions as maximally stable extremal regions (MSERs), which offer robustness to variations in lighting conditions. Recognition is based on a cascade of support vector machine (SVM) classifiers that were trained using histogram of oriented gradient (HOG) features. The training data are generated from synthetic template images that are freely available from an online database; thus, real footage of road signs is not required as training data. The proposed system is accurate at high vehicle speeds, operates under a range of weather conditions, runs at an average speed of 20 frames per second, and recognizes all classes of ideogram-based (nontext) traffic symbols from an online road sign database. Comprehensive comparative results to illustrate the performance of the system are presented.
Mobile robots equipped with an omnidirectional camera have gained considerable attention over the last decade. Having an entire view of the scene can be very advantageous in numerous applications as all information is stored in a single frame. This paper is primarily concerned with the detection of moving objects from the optical flow field in cluttered indoor environments, which is necessary for safe navigation and collision avoidance. The algorithm is based on the comparison of the measured optical flow vectors with the generated ones. As depth information is not available, a novel method is proposed which iteratively generates optical flow vectors for different possible real-world coordinates of the objects in the scene. This is necessary in order to incorporate motion estimates given by motor encoders. Back-projection into the image is then used to generate the synthetic optical flow vectors needed for comparison. The algorithm was tested on a real system and was able to successfully localize a moving object under both artificial and natural lighting. The proposed algorithm can be implemented in real-time on any system with a known calibrated model of the omnidirectional sensor and reliable motion estimation.
Attached files: Real-Time Detection of Moving Objects by a Mobile Robot with an Omnidirectional Camera (ISPA 2011).pdf
This paper proposes a novel algorithm for the parking motion of a car-like mobile robot. The algorithm presented here addresses calculating equations for planning a parking path in real time. Moreover, by incorporating the constraints of the mechanical and kinematic characteristics of the car and the geometry of the parking lot in the path planning, we can turn a parking problem into solving algebraic equations. By tracking a planned path, the car-like mobile robot can drive into the parking area without hitting any boundaries. The efficiency of the proposed algorithm is demonstrated by simulation.
Attached files: Seminar Alexander_Filonenko.pdf
Recently, vision-based advanced driver-assistance systems (ADAS) have received renewed interest as a means to enhance driving safety. In particular, due to their high performance–cost ratio, mono-camera systems are emerging as the main focus of this field of work. In this paper we present a novel on-board road modeling and vehicle detection system, which is part of the result of the European I-WAY project. The system relies on a robust estimation of the perspective of the scene, which adapts to the dynamics of the vehicle and generates a stabilized rectified image of the road plane. This rectified plane is used by a recursive Bayesian classifier, which classifies pixels as belonging to different classes corresponding to the elements of interest of the scenario. This stage works as an intermediate layer that isolates subsequent modules since it absorbs the inherent variability of the scene. The system has been tested on-road, in different scenarios, including varied illumination and adverse weather conditions, and the results have proved to be remarkable even for such complex scenarios.
Attached files: Road environment modeling using robust perspective analysis and recursive Bayesian segmentation.pdf
In this paper, we describe a new approach for the extrinsic calibration of a camera with a 3D laser range finder that can be done on the fly. This approach does not require any calibration object. Only a few point correspondences are used, which are manually selected by the user from a scene viewed by the two sensors. The proposed method relies on a novel technique to visualize the range information obtained from a 3D laser scanner. This technique converts the visually ambiguous 3D range information into a 2D map where natural features of a scene are highlighted. We show that by enhancing the features the user can easily find the points corresponding to the camera image points. Therefore, visually identifying laser-camera correspondences becomes as easy as image pairing. Once point correspondences are given, extrinsic calibration is done using the well-known PnP algorithm followed by a nonlinear refinement process. We show the performance of our approach through experimental results. In these experiments, we use an omnidirectional camera. The implication of this method is important because it brings 3D computer vision systems out of the laboratory and into practical use.
Attached files: Extrinsic Self Calibration of a Camera and a 3D Laser Range Finder from Natural Scenes IROS2007.pdf 2013-10-12-Van_Dung_Hoang.pptx
In this paper we propose a novel approach to binocular stereo
for fast matching of high-resolution images. Our approach builds a prior
on the disparities by forming a triangulation on a set of support points
which can be robustly matched, reducing the matching ambiguities of
the remaining points. This allows for efficient exploitation of the disparity
search space, yielding accurate dense reconstruction without the need
for global optimization. Moreover, our method automatically determines
the disparity range and can be easily parallelized. We demonstrate the
effectiveness of our approach on the large-scale Middlebury benchmark,
and show that state-of-the-art performance can be achieved with significant
speedups. Computing the left and right disparity maps for a one
Megapixel image pair takes about one second on a single CPU core.
Attached files: Effcient Large-Scale Stereo Matching.pdf
This paper presents an automatic road-sign detection and recognition system based on support vector machines (SVMs). In automatic traffic-sign maintenance and in a visual driver-assistance system, road-sign detection and recognition are two of the most important functions. Our system is able to detect and recognize circular, rectangular, triangular, and octagonal signs and, hence, covers all existing Spanish traffic-sign shapes. Road signs provide drivers important information and help them to drive more safely and more easily by guiding and warning them and thus regulating their actions. The proposed recognition system is based on the generalization properties of SVMs. The system consists of three stages: 1) segmentation according to the color of the pixel; 2) traffic-sign detection by shape classification using linear SVMs; and 3) content recognition based on Gaussian-kernel SVMs. Because the segmentation stage works on red, blue, yellow, white, or combinations of these colors, all traffic signs can be detected, and some of them can be detected by several colors. Results show a high success rate and a very low number of false positives in the final recognition stage. From these results, we can conclude that the proposed algorithm is invariant to translation, rotation, scale, and, in many situations, even to partial occlusions.
Attached files: its07-maldonado.pdf
We describe a real-time pedestrian detection system intended for use in automotive applications. Our system demonstrates superior detection performance when compared to many state-of-the-art detectors and is able to run at a speed of 14 fps on an Intel Core i7 computer when applied to 640×480 images. Our approach uses an analysis of geometric constraints to efficiently search feature pyramids and increases detection accuracy by using a multiresolution representation
of a pedestrian model to detect small pixel-sized pedestrians
normally missed by a single representation approach. We have
evaluated our system on the Caltech Pedestrian benchmark
which is the largest publicly available pedestrian
dataset at the time of this publication. Our system shows
a detection rate of 61% with 1 false positive per image
(FPPI) whereas other recent state-of-the-art detectors show
a detection rate of 50% to 61% under the ‘reasonable’ test
scenario (explained later). Furthermore, we also demonstrate
the practicality of our system by conducting a series of use case
experiments on selected videos of the Caltech dataset.
Attached files: Real-time Pedestrian Detection with Deformable Part Models (IV-2012).pdf
The RRT∗ algorithm has recently been proposed as an optimal extension to the standard RRT algorithm. However, like RRT, RRT∗ is difficult to apply in problems with complicated or underactuated dynamics because it requires the design of two domain-specific extension heuristics: a distance metric and a node extension method. We propose automatically deriving these two heuristics for RRT∗ by locally linearizing the domain dynamics and applying linear quadratic regulation (LQR). The resulting algorithm, LQR-RRT∗, finds optimal plans in domains with complex or underactuated dynamics without requiring domain-specific design choices. We demonstrate its application in domains that are successively torque-limited, underactuated, and in belief space.
Attached files: LQR-RRT_ Optimal Sampling-Based Motion Planning with Automatically Derived Extension Heuristics ICRA2012.pdf
Connected component labeling is an important but computationally expensive operation required in many fields of research. The goal in the present work is to label connected components on a 2D binary map. Two different iterative algorithms for doing this task are presented. The first algorithm (Row–Col Unify) is based upon directional propagation labeling, whereas the second algorithm uses the Label Equivalence technique. The Row–Col Unify algorithm uses a local array of references and the reduction technique intrinsically. Extensive use of shared memory makes the code efficient. The Label Equivalence algorithm is an extended version of the one presented by Hawick et al. (2010). Finally, a performance comparison of the two algorithms is presented.
Attached files: Connected component labeling on a 2D grid using CUDA.pdf
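To make the label-equivalence idea in the abstract above concrete, here is a minimal sequential sketch in Python. It is not the paper's CUDA kernel: it assumes 4-connectivity, runs on the CPU, and the names `label_components`, `find` and `union` are illustrative only. The first pass assigns provisional labels and records which labels turn out to be equivalent; the second pass resolves each label to its representative.

```python
def label_components(grid):
    """Two-pass 4-connected labeling of a 2D binary grid.

    Returns (labels, count), where labels has 0 for background and
    1..count for the connected components. Sequential sketch only.
    """
    rows = len(grid)
    cols = len(grid[0]) if grid else 0
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # equivalence table; index 0 is reserved for background

    def find(i):
        # Follow the equivalence chain to its representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Pass 1: provisional labels from the upper and left neighbours.
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))  # new label is its own root
                labels[r][c] = len(parent) - 1
            else:
                labels[r][c] = min(l for l in (up, left) if l)
                if up and left:
                    union(up, left)  # record that the two labels are equivalent

    # Pass 2: replace every label by a compact id of its representative.
    remap = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = remap.setdefault(root, len(remap) + 1)
    return labels, len(remap)


if __name__ == "__main__":
    demo = [
        [1, 1, 0, 0, 1],
        [0, 1, 0, 1, 1],
        [1, 0, 0, 0, 0],
        [1, 1, 0, 1, 0],
    ]
    out, n = label_components(demo)
    for row in out:
        print(row)
    print("components:", n)
```

A GPU version would replace the sequential scan with per-pixel threads that repeatedly propagate and resolve labels until nothing changes, which is why the abstract describes the algorithms as iterative.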
This paper addresses the problem of segmenting an image into regions. We define a predicate for measuring the evidence for a boundary between two regions using a graph-based representation of the image. We then develop an efficient segmentation algorithm based on this predicate, and show that although this algorithm makes greedy decisions it produces segmentations that satisfy global properties. We apply the algorithm to image segmentation using two different kinds of local neighborhoods in constructing the graph, and illustrate the results with both real and synthetic images. The algorithm runs in time nearly linear in the number of graph edges and is also fast in practice. An important characteristic of the method is its ability to preserve detail in low-variability image regions while ignoring detail in high-variability regions.
Attached files: IJCV(2004) Efficient Graph-Based Image Segmentation.pdf
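The greedy merging described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration, assuming the commonly cited form of the boundary predicate: two components are merged when the connecting edge weight is at most min(Int(C1) + k/|C1|, Int(C2) + k/|C2|), where Int(C) is the largest edge weight inside a component and k is a scale parameter. The function name `segment_graph` and the 1D toy signal are assumptions made for the example; graph construction from a real image is left out.

```python
def segment_graph(num_nodes, edges, k):
    """Greedy graph-based segmentation sketch.

    edges: iterable of (weight, u, v) tuples.
    k: scale parameter; larger k favours larger components.
    Returns a list mapping each node to its component representative.
    """
    parent = list(range(num_nodes))
    size = [1] * num_nodes
    internal = [0.0] * num_nodes  # Int(C): max edge weight merged into C so far

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process edges in nondecreasing weight order (the greedy step).
    for w, u, v in sorted(edges):
        a, b = find(u), find(v)
        if a == b:
            continue
        # Merge only when the edge is no heavier than the minimum internal
        # difference of the two components, relaxed by the threshold k / |C|.
        if w <= min(internal[a] + k / size[a], internal[b] + k / size[b]):
            parent[b] = a
            size[a] += size[b]
            internal[a] = w  # edges are sorted, so w is the new maximum
    return [find(i) for i in range(num_nodes)]


if __name__ == "__main__":
    # Toy 1D "image": two flat regions separated by a large jump.
    signal = [10, 11, 12, 50, 51, 52]
    chain = [(abs(signal[i] - signal[i + 1]), i, i + 1)
             for i in range(len(signal) - 1)]
    print(segment_graph(len(signal), chain, k=5))
```

With a small k the jump between 12 and 50 is kept as a boundary; with a very large k the threshold term dominates and everything merges into one region.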
The authors propose a vision-based automatic system to detect preceding vehicles on the highway under various lighting and weather conditions. To adapt to the different characteristics of vehicle appearance under various lighting conditions, four cues (underneath shadow, vertical edge, symmetry and taillight) are fused for vehicle detection. The authors achieve this goal by generating a probability distribution of vehicles under a particle filter framework through the processes of initial sampling, propagation, observation, cue fusion and evaluation. Unlike a normal particle filter, which focuses on a single target distribution in a state space, the authors detect multiple vehicles with a single particle filter through a high-level tracking strategy using clustering. In addition, the data-driven initial sampling technique helps the system detect new objects and prevents the multi-modal distribution from collapsing to the local maxima. Experiments demonstrate the effectiveness of the proposed system.
In this paper we present two real-time methods
for estimating surface normals from organized point cloud
data. The proposed algorithms use integral images to perform
highly efficient border- and depth-dependent smoothing and
covariance estimation. We show that this approach makes it
possible to obtain robust surface normals from large point
clouds at high frame rates and therefore, can be used in real-
time computer vision algorithms that make use of Kinect-like
Attached files: Adaptive Neighborhood Selection for Real-Time Surface Normal.pdf
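The abstract above relies on integral images to average over a neighborhood in constant time. The sketch below shows only that basic building block (a summed-area table and an O(1) box sum) in plain Python; the border- and depth-dependent window sizing and the covariance estimation from the paper are not reproduced, and the function names are illustrative.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros.

    ii[r][c] holds the sum of img over rows [0, r) and columns [0, c).
    """
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, r0, c0, r1, c1):
    """Sum of img over rows [r0, r1) and columns [c0, c1) using four lookups."""
    return ii[r1][c1] - ii[r0][c1] - ii[r1][c0] + ii[r0][c0]


def box_mean(ii, r0, c0, r1, c1):
    """Mean over the same half-open window; the window must be non-empty."""
    return box_sum(ii, r0, c0, r1, c1) / ((r1 - r0) * (c1 - c0))


if __name__ == "__main__":
    depth = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9],
    ]
    ii = integral_image(depth)
    print(box_sum(ii, 0, 0, 3, 3))   # whole image
    print(box_mean(ii, 1, 1, 3, 3))  # lower-right 2x2 window
```

Because every window sum costs four lookups regardless of window size, the smoothing area can change from pixel to pixel without affecting the running time, which is what makes per-pixel adaptive neighborhoods practical.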
In this paper, we propose a method for dot text detection based on FAST points. This problem is different from general scene text detection because of the discontinuous text strokes. Unlike many other methods which assume that text is horizontally oriented, our method is able to deal with slanted dot text. We extract interesting patches from FAST points and define four features based on the stroke and gray value similarity of dot text to describe a patch. Then, we generate some candidate regions from these patches and utilize an SVM to filter out non-dot-text regions using the first and second order moments of the FAST points in them. Experimental results show that the proposed method is effective and fast at detecting dot text.
Attached files: Dot Text Detection Using FAST Point.pdf
The complexity of human detection increases significantly with a growing density of humans populating a scene. This paper presents a Bayesian detection framework using shape and motion cues to obtain a maximum a posteriori (MAP) solution for human configurations consisting of many, possibly occluded pedestrians viewed by a stationary camera. The paper contains two novel contributions for the human detection task: (1) computationally efficient detection based on shape templates using contour integration by means of integral images which are built by oriented string scans; (2) a non-parametric approach using an approximated version of the Shape Context descriptor which generates informative object parts and infers the presence of humans despite occlusions. The outputs of the two detectors are used to generate a spatial configuration of hypothesized human body locations. The configuration is iteratively optimized while taking into account the depth ordering and occlusion status of the hypotheses. The method achieves fast computation times even in complex scenarios with a high density of people. Its validity is demonstrated on a substantial amount of image data using the CAVIAR and our own datasets. Evaluation results and a comparison with the state of the art are presented.
Attached files: Fast human detection in crowded scenes by contour integration and local shape estimation (CVPR 2009).pdf
The Hough transform is a well-known and popular algorithm for detecting lines in raster images. The standard Hough transform is too slow to be usable in real time, so different accelerated and approximated algorithms exist. This study proposes a modified accumulation scheme for the Hough transform, using a new parameterization of lines, “PClines”. This algorithm is suitable for computer systems with a small but fast read-write memory, such as today’s graphics processors. The algorithm requires no floating-point computations or goniometric functions. This makes it suitable for special and low-power processors and special-purpose chips. The proposed algorithm is evaluated both on synthetic binary images and on complex real-world photos of high resolutions. The results show that using today’s commodity graphics chips, the Hough transform can be computed at interactive frame rates, even with a high resolution of the Hough space and with the Hough transform fully computed.
Attached files: Real-time detection of lines using parallel coordinates and CUDA.pdf
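The parameterization named in the abstract above builds on the point-line duality of parallel coordinates, which can be checked with exact arithmetic. The sketch below is an illustration of that duality only, not the paper's accumulation scheme or its CUDA implementation. It assumes two parallel axes placed at u = 0 (for x) and u = d (for y); the function names and the choice d = 1 are assumptions made for the example. An image point (x, y) becomes the line through (0, x) and (d, y), and all points on the image line y = m*x + b produce lines that meet at the single point (d / (1 - m), b / (1 - m)).

```python
from fractions import Fraction


def point_to_pc_line(x, y, d=1):
    """Map an image point to its line in parallel-coordinate space.

    The line passes through (0, x) and (d, y); it is returned as
    (slope, intercept) of v = slope * u + intercept.
    """
    return Fraction(y - x, d), Fraction(x)


def line_to_pc_point(m, b, d=1):
    """Map the image line y = m*x + b to its dual point (u, v).

    Slope m == 1 has no finite dual point in this axis arrangement,
    so it is rejected here; handling it needs machinery not shown.
    """
    m, b = Fraction(m), Fraction(b)
    if m == 1:
        raise ValueError("slope 1 maps to a point at infinity in this space")
    return d / (1 - m), b / (1 - m)


if __name__ == "__main__":
    m, b = -2, 3
    u, v = line_to_pc_point(m, b)
    print("dual point:", u, v)
    # Every point on the image line yields a PC-line through (u, v).
    for x in range(-3, 4):
        slope, intercept = point_to_pc_line(x, m * x + b)
        assert slope * u + intercept == v
    print("all PC-lines pass through the dual point")
```

Since each image point is drawn as a straight line, accumulation reduces to rasterizing line segments, which is consistent with the abstract's claim that no trigonometric functions are needed.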
In this paper, we present a framework for 6D absolute scale motion and structure estimation of a multi-camera system in challenging indoor environments. It operates in real-time and employs information from two cameras with non-overlapping fields of view. Monocular Visual Odometry supplying up-to-scale 6D motion information is carried out in each of the cameras, and the metric scale is recovered via a linear solution by imposing the known static transformation between both sensors. The redundancy in the motion estimates is finally exploited by a statistical fusion to an optimal 6D metric result. The proposed technique is robust to outliers and able to continuously deliver a reasonable measurement of the scale factor. The quality of the framework is demonstrated by a concise evaluation on indoor datasets, including a comparison to accurate ground truth data provided by an external motion tracking system.
Attached files: Real-Time 6D Stereo Visual Odometry with Non-Overlapping Fields of View CVPR2012.pdf 2013-06-15-Van_Dung_Hoang.pptx
We present an original multiview stereo reconstruction algorithm which allows the 3D-modeling of urban scenes as a combination of meshes and geometric primitives. The method provides a compact model while preserving details: Irregular elements such as statues and ornaments are described by meshes, whereas regular structures such as columns and walls are described by primitives (planes, spheres, cylinders, cones, and tori). We adopt a two-step strategy consisting first in segmenting the initial mesh-based surface using a multilabel Markov Random Field-based model and second in sampling primitive and mesh components simultaneously on the obtained partition by a Jump-Diffusion process. The quality of a reconstruction is measured by a multi-object energy model which takes into account both photo-consistency and semantic considerations (i.e., geometry and shape layout). The segmentation and sampling steps are embedded into an iterative refinement procedure which provides an increasingly accurate hybrid representation. Experimental results on complex urban structures and large scenes are presented and compared to state-of-the-art multiview stereo meshing algorithms.
Attached files: A Hybrid Multiview Stereo Algorithm for Modeling Urban Scenes.pdf
Over the past years, there has been tremendous progress in the area of robot navigation. Most of the systems developed thus far, however, are restricted to indoor scenarios, non-urban outdoor environments, or road usage with cars. Urban areas introduce numerous challenges to autonomous mobile robots as they are highly complex and, in addition, dynamic. In this paper, we present a navigation system for pedestrian-like autonomous navigation with mobile robots in city environments. We describe different components including a SLAM system for dealing with huge maps of city centers, a planning approach for inferring feasible paths that also takes into account the traversability and type of terrain, and a method for accurate localization in dynamic environments. The navigation system has been implemented and tested in several large-scale field tests in which the robot Obelix managed to autonomously navigate from our university campus over a 3.3 km long route to the city center of Freiburg.
Real-time 3D perception of the surrounding environment is a
crucial precondition for the reliable and safe application of mobile service
robots in domestic environments. Using an RGB-D camera, we present a
system for acquiring and processing 3D (semantic) information at frame
rates of up to 30Hz that allows a mobile robot to reliably detect obstacles
and segment graspable objects and supporting surfaces as well as the
overall scene geometry. Using integral images, we compute local surface
normals. The points are then clustered, segmented, and classified in both
normal space and spherical coordinates. The system is tested in different
setups in a real household environment.
The results show that the system is capable of reliably detecting obstacles
at high frame rates, even in the case of obstacles that move fast or do not
stick out considerably from the ground. The segmentation of all planes in
the 3D data even allows for correcting characteristic measurement errors
and for reconstructing the original geometry in far ranges.
Attached files: Real-Time Plane Segmentation using RGB-D Cameras.pdf
This paper focuses on the study of the motion activity descriptor for shot boundary detection in video sequences. We
are interested in validating this descriptor with the aim of a real-time implementation with reasonably high performance in shot
boundary detection. The motion activity information is extracted in the uncompressed domain based on the adaptive rood pattern
search (ARPS) algorithm. In this context, the motion activity descriptor was applied to different video sequences.
Attached files: Video shot boundary detection using motion activity descriptor.pdf
Reading text from scene images is a challenging problem that is receiving much attention, especially since the appearance of imaging devices in low-cost consumer products like mobile phones. This paper presents an easy and fast method to recognize individual characters in images of natural scenes that is applied after an algorithm that robustly locates text on such images. The recognition is based on a gradient direction feature. Our approach also computes the output probability for each class of the character to be recognized. The proposed feature is compared to other features typically used in character recognition. Experimental results with a challenging dataset show the good performance of the proposed method.
Attached files: A Character Recognition Method in Natural Scenes Images ICPR 2012.pdf
Significant research has been devoted to detecting people in images and videos. In this paper we describe a human detection method that augments widely used edge-based features with texture and color information, providing us with a much richer descriptor set. This augmentation results in an extremely high-dimensional feature space (more than 170,000 dimensions). In such high-dimensional spaces, classical machine learning algorithms such as SVMs are nearly intractable with respect to training. Furthermore, the number of training samples is much smaller than the dimensionality of the feature space, by at least an order of magnitude. Finally, the extraction of features from a densely sampled grid structure leads to a high degree of multicollinearity. To circumvent these data characteristics, we employ Partial Least Squares (PLS) analysis, an efficient dimensionality reduction technique, one which preserves significant discriminative information, to project the data onto a much lower dimensional subspace (20 dimensions, reduced from the original 170,000). Our human detection system, employing PLS analysis over the enriched descriptor set, is shown to outperform state-of-the-art techniques on three varied datasets including the popular INRIA pedestrian dataset, the low-resolution gray-scale DaimlerChrysler pedestrian dataset, and the ETHZ pedestrian dataset consisting of full-length videos of crowded scenes.
Attached files: PLS_xvid.avi Human Detection Using Partial Least Squares Analysis.pdf
In this paper we propose a new image event detection method for
identifying flood in videos. Traditional image-based flood detection
is often used in remote sensing and satellite imaging applications.
In contrast, the proposed method is applied for retrieval of
flood catastrophes in newscast content, which present great variation
in flood and background characteristics, depending on the video
instance. Different flood regions in different images share some
common features which are reasonably invariant to lightness, camera
angle or background scene. These features are texture, relation
among color channels and saturation characteristics. The method
analyses the frame-to-frame change in these features and the results
are combined according to the Bayes classifier to achieve a decision (i.e. flood happens, flood does not happen). In addition, because the flooded region is usually located around the lower and middle parts of an image, a model for the probability of occurrence of flood as a function of the vertical position is proposed, significantly improving the classification performance. Experiments illustrated the applicability of the method and the improved performance in comparison to other techniques.
Attached files: flood.pdf
This paper proposes a novel algorithm for multiview stereopsis that outputs a dense set of small rectangular patches covering the surfaces visible in the images. Stereopsis is implemented as a match, expand, and filter procedure, starting from a sparse set of matched keypoints, and repeatedly expanding these before using visibility constraints to filter away false matches. The keys to the performance of the proposed algorithm are effective techniques for enforcing local photometric consistency and global visibility constraints. Simple but effective methods are also proposed to turn the resulting patch model into a mesh which can be further refined by an algorithm that enforces both photometric consistency and regularization constraints. The proposed approach automatically detects and discards outliers and obstacles and does not require any initialization in the form of a visual hull, a bounding box, or valid depth ranges. We have tested our algorithm on various data sets including objects with fine surface details, deep concavities, and thin structures, outdoor scenes observed from a restricted set of viewpoints, and “crowded” scenes where moving obstacles appear in front of a static structure of interest. A quantitative evaluation on the Middlebury benchmark shows that the proposed method outperforms all others submitted so far for four out of the six data sets.
Attached files: Accurate, Dense, and Robust Multiview Stereopsis.pdf
In structure-from-motion with a single camera it is well known that the scene can only be recovered up to a scale. In order to compute the absolute scale, one needs to know the baseline of the camera motion or the dimension of at least one element in the scene. In this paper, we show that there exists a class of structure-from-motion problems where it is possible to compute the absolute scale completely automatically without using this knowledge, that is, when the camera is mounted on wheeled vehicles (e.g. cars, bikes, or mobile robots). The construction of these vehicles puts interesting constraints on the camera motion, which are known as “nonholonomic constraints”. The interesting case is when the camera has an offset to the vehicle's center of motion. We show that by just knowing this offset, the absolute scale can be computed with good accuracy when the vehicle turns. We give a mathematical derivation and provide experimental results on both simulated and real data over a large image dataset collected during a 3 km path. To our knowledge this is the first time nonholonomic constraints of wheeled vehicles are used to estimate the absolute scale. We believe that the proposed method can be useful in those research areas involving visual odometry and mapping with vehicle-mounted cameras.
Attached files: Absolute Scale in Structure from Motion from a Single Vehicle Mounted Camera by Exploiting Nonholonomic Constraints ICCV2009.pdf
One of the more startling effects of road related accidents is the economic and social burden they cause. Between 750,000 and 880,000 people died globally in road related accidents in 1999 alone, with an estimated cost of US$518 billion. One way of combating this problem is to develop Intelligent Vehicles that are self-aware and act to increase the safety of the transportation system. This paper presents the development and application of a novel multiple-cue visual lane tracking system for research into Intelligent Vehicles (IV).
Particle filtering and cue fusion technologies form the basis of the lane tracking system which robustly handles several of the problems faced by previous lane tracking systems such as shadows on the road, unreliable lane markings, dramatic lighting changes and discontinuous
changes in road characteristics and types. Experimental results of the lane tracking system running at 15Hz will be discussed, focusing on the particle filter and cue fusion technology used.
Decomposing sensory measurements into relevant parts is a fundamental prerequisite for solving complex tasks, e.g., in the field of mobile manipulation in domestic environments. In this paper, we present a fast approach to surface reconstruction in range images by means of approximate polygonal meshing. The obtained local surface information and neighborhoods are then used to 1) smooth the underlying measurements, and 2) segment the image into planar regions and other geometric primitives. An evaluation using publicly available data sets shows that our approach does not rank behind state-of-the-art algorithms while allowing range images to be processed at high frame rates.
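The segmentation step can be pictured as region growing over per-pixel surface normals. The sketch below is a simplified stand-in and not the paper's algorithm: it assumes unit normals are already available on the image grid, uses 4-connectivity, and compares each candidate only against the seed normal (a fuller plane test would also check the distance to the plane). The function name and threshold are hypothetical.

```python
import math
from collections import deque

def segment_planar_regions(normals, max_angle_deg=10.0):
    """Label 4-connected pixels whose unit normal lies within
    `max_angle_deg` of the normal of the region's seed pixel."""
    rows, cols = len(normals), len(normals[0])
    min_dot = math.cos(math.radians(max_angle_deg))
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue  # already assigned to a region
            seed = normals[r0][c0]
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == -1:
                        candidate = normals[nr][nc]
                        dot = sum(a * b for a, b in zip(seed, candidate))
                        if dot >= min_dot:
                            labels[nr][nc] = next_label
                            queue.append((nr, nc))
            next_label += 1
    return labels, next_label

# Toy range image: a floor-like patch (normal +z) next to a wall-like patch (normal +x).
floor, wall = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
normals = [[floor] * 3 + [wall] * 3 for _ in range(4)]
labels, region_count = segment_planar_regions(normals)
print(region_count)
```

Because every pixel is enqueued at most once, the cost is linear in the number of pixels, which is the kind of property that makes region growing attractive for high frame rates.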
Attached files: Fast Range Image Segmentation and Smoothing using Approximate Surface Reconstruction and Region Growing.pdf
The captions in videos are closely related to the video contents, so research on automatic caption detection contributes to video content analysis and content-based retrieval. In this paper, a novel phase-based static caption detection approach is proposed. Our phase-based algorithm consists of two processes: candidate caption region detection and candidate caption region refinement. Firstly, the candidate caption regions are extracted from the caption saliency map, which is mainly generated by phase-only Fourier synthesis. Secondly, the candidate regions are refined by text region shape features. Experimental comparisons with existing methods show that the proposed approach performs better.
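Phase-only Fourier synthesis can be sketched in a few lines with NumPy: keep the phase of the 2-D spectrum, discard the magnitude, and transform back. This is the generic technique only. The abstract says the caption saliency map is "mainly" generated this way, so any additional smoothing, thresholding or temporal processing in the paper is not reproduced here, and the function name is hypothetical.

```python
import numpy as np

def phase_only_saliency(image):
    """Saliency map from the phase spectrum of a grayscale image."""
    spectrum = np.fft.fft2(image)
    magnitude = np.abs(spectrum)
    # Normalise every frequency to unit magnitude, guarding against division by zero.
    phase_only = spectrum / np.maximum(magnitude, 1e-12)
    reconstruction = np.fft.ifft2(phase_only)
    # Squared magnitude of the phase-only reconstruction.
    return np.abs(reconstruction) ** 2

# Toy frame: a single bright pixel on a dark background.
frame = np.zeros((16, 16))
frame[5, 7] = 1.0
saliency = phase_only_saliency(frame)
peak = np.unravel_index(np.argmax(saliency), saliency.shape)
print(peak)
```

Discarding the magnitude flattens the spectrum, which suppresses large smooth regions and emphasises localised structure such as edges and text strokes.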
Attached files: A phase based approach for caption detection in videos.pdf
With the increasing popularity of practical vision systems and smart phones, text detection in natural scenes becomes a critical yet challenging task. Most existing methods have focused on detecting horizontal or near-horizontal texts. In this paper, we propose a system which detects texts of arbitrary orientations in natural images. Our algorithm is equipped with a two-level classification scheme and two sets of features specially designed for capturing both the intrinsic characteristics of texts. To better evaluate our algorithm and compare it with other competing algorithms, we generate a new dataset, which includes various texts in diverse real-world scenarios; we also propose a protocol for performance evaluation. Experiments on benchmark datasets and the proposed dataset demonstrate that our algorithm compares favorably with the state-of-the-art algorithms when handling horizontal texts and achieves significantly enhanced performance on texts of arbitrary orientations in complex natural scenes.
Attached files: Detecting Texts of Arbitrary Orientations in Natural Images-CVPR2012.pdf
Omnidirectional cameras are becoming increasingly popular in computer vision and robotics. Camera calibration is a step before performing any task involving metric scene measurement, required in nearly all robotics tasks. In recent years many different methods to calibrate central omnidirectional cameras have been developed, based on different camera models and often limited to a specific mirror shape. In this paper we review the existing methods designed to calibrate any central omnivision system and analyze their advantages and drawbacks through an in-depth comparison using simulated and real data. We choose methods that are available as open source and that do not require a complex pattern or scene. The evaluation protocol of calibration accuracy also considers 3D metric reconstruction combining omnidirectional images. Comparative results are shown and discussed in detail.
Attached files: Calibration of Omnidirectional Cameras in Practice (2012).pdf
Automated fire detection is an active research topic in computer vision. In this paper, we propose and analyze a new method for identifying fire in videos. Computer vision-based fire detection algorithms are usually applied in closed-circuit television surveillance scenarios with controlled background. In contrast, the proposed method can be applied not only to surveillance but also to automatic video classification for retrieval of fire catastrophes in databases of newscast content. In the latter case, there are large variations in fire and background characteristics depending on the video instance. The proposed method analyzes the frame-to-frame changes of specific low-level features describing potential fire regions. These features are color, area size, surface coarseness, boundary roughness, and skewness within estimated fire regions. Because of flickering and random characteristics of fire, these features are powerful discriminants. The behavioral change of each one of these features is evaluated, and the results are then combined according to the Bayes classifier for robust fire recognition. In addition, a priori knowledge of fire events captured in videos is used to significantly improve the classification results. For edited newscast videos, the fire region is usually located in the center of the frames. This fact is used to model the probability of occurrence of fire as a function of the position. Experiments illustrated the applicability of the method.
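One way to picture the combination of per-feature evidence with a position-dependent prior is a naive Bayes update in odds form. The sketch below is only an illustration under stated assumptions: the Gaussian-shaped prior, its parameters, the independence of features, and the likelihood ratios are all hypothetical and are not taken from the paper.

```python
import math

def position_prior(x, y, width, height, peak=0.5, floor=0.1, spread=0.25):
    """Hypothetical prior P(fire) as a function of position:
    highest at the frame centre, decaying towards the borders."""
    dx = (x - width / 2.0) / (spread * width)
    dy = (y - height / 2.0) / (spread * height)
    return floor + (peak - floor) * math.exp(-0.5 * (dx * dx + dy * dy))

def fire_posterior(likelihood_ratios, prior):
    """Naive Bayes in odds form: each ratio is P(feature | fire) / P(feature | no fire)."""
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)

# Hypothetical likelihood ratios for the five features named in the abstract:
# color, area size, surface coarseness, boundary roughness, skewness.
ratios = [2.0, 1.5, 3.0, 0.8, 1.2]
width, height = 640, 480

centre_posterior = fire_posterior(ratios, position_prior(320, 240, width, height))
corner_posterior = fire_posterior(ratios, position_prior(0, 0, width, height))
print(centre_posterior, corner_posterior)
```

With identical feature evidence, a candidate region at the frame centre receives a higher posterior than one in a corner, which is the effect the position-dependent prior is meant to capture for edited newscast footage.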
Attached files: 05430942.pdf
Simultaneous Localization and Mapping (SLAM) and Visual SLAM (V-SLAM) in particular have been an active area of research lately. In V-SLAM the main focus is most often laid on the localization part of the problem, allowing for a drift-free motion estimate. To this end, a sparse set of landmarks is tracked and their position is estimated. However, this set of landmarks (rendering the map) is often too sparse for tasks in autonomous driving such as navigation, path planning, obstacle avoidance etc. Some methods keep the raw measurements for past robot poses to address the sparsity problem, often resulting in a pose-only SLAM akin to laser scanner SLAM. For the stereo case, this is however impractical due to the high noise of stereo reconstructed point clouds.
In this paper we propose a dense stereo V-SLAM algorithm that estimates a dense 3D map representation which is more accurate than raw stereo measurements. To this end, we run a sparse V-SLAM system and use the resulting pose estimates to compute a locally dense representation from dense stereo correspondences. This dense representation is expressed in local coordinate systems which are tracked as part of the SLAM estimate. This allows the dense part to be continuously updated. Our system is driven by visual odometry priors to achieve high robustness when tracking landmarks. Moreover, the sparse part of the SLAM system uses recently published submapping techniques to achieve constant runtime complexity most of the time. The improved accuracy over raw stereo measurements is shown in a Monte Carlo simulation. Finally, we demonstrate the feasibility of our method by presenting outdoor experiments of a car-like robot.
Attached files: Visual SLAM for Autonomous Ground Vehicles ICRA2011.pdf
In this paper, we propose an efficient technique to detect changes in the geometry of an urban environment using some images observing its current state. The proposed method can be used to significantly optimize the process of updating the 3D model of a city changing over time, by restricting this process to only those areas where changes are detected. With this application in mind, we designed our algorithm to specifically detect only structural changes in the environment, ignoring any changes in its appearance, and also ignoring all changes which are not relevant for update purposes, such as cars, people etc. As a by-product, the algorithm also provides a coarse geometry of the detected changes. The performance of the proposed method was tested on four different kinds of urban environments and compared with two alternative techniques.
Attached files: ICCV2011_Image Based Detection of Geometric Changes in Urban Environments.pdf