Seminars in 2015
    The determination of flame or fire edges is the process of identifying a boundary between areas where there is a thermochemical reaction and those without. It is a precursor to image-based flame monitoring, early fire detection, fire evaluation, and the determination of flame and fire parameters. Several traditional edge-detection methods have been tested for identifying flame edges, but the results have been disappointing. Some research on flame and fire edge detection has been reported for different applications; however, those methods do not emphasize the continuity and clarity of the flame and fire edges. A computing algorithm is thus proposed to define flame and fire edges clearly and continuously. The algorithm first detects the coarse and superfluous edges in a flame/fire image, then identifies the edges of the flame/fire and removes the irrelevant artifacts. The autoadaptive feature of the algorithm ensures that the primary symbolic flame/fire edges are identified for different scenarios. Experimental results for different flame images and video frames prove the effectiveness and robustness of the algorithm.
    Attached files: An Autoadaptive Edge-Detection Algorithm.pdf
    This paper explores a remarkable approach to edge detection. Edge detection is the most important aspect of image segmentation: it provides a meaningful interpretation of intensity discontinuities in image analysis [1]. The Laplacian of Gaussian (LoG) filter is a conventional edge-detection tool. The threshold on neighboring pixel values (T) and the standard deviation parameter are optimized with Cuckoo Search optimization in order to augment the edge-detection capability of the LoG filter. Pratt's Figure of Merit (PFOM) is used as the quality measure for the edge-detection analysis.
    Attached files: 06959172.pdf
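    PFOM scores an edge map by how close each detected edge pixel lies to the nearest ideal edge pixel. A minimal sketch of the measure, with edge pixels as (x, y) tuples and the conventional scaling constant a = 1/9:

```python
def pfom(detected, ideal, a=1/9):
    """Pratt's Figure of Merit: 1.0 means a perfect match; each detected
    pixel contributes 1/(1 + a*d^2), d being the distance to the nearest
    ideal edge pixel, normalized by the larger of the two pixel counts."""
    if not detected or not ideal:
        return 0.0
    s = 0.0
    for (x, y) in detected:
        d2 = min((x - ix) ** 2 + (y - iy) ** 2 for (ix, iy) in ideal)
        s += 1.0 / (1.0 + a * d2)
    return s / max(len(detected), len(ideal))
```

    An optimizer such as Cuckoo Search can then treat PFOM as the fitness function when tuning the LoG parameters.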
    The traditional Canny operator cannot adaptively select the variance of its Gaussian filtering; this selection requires human intervention and affects both edge preservation and the denoising effect. An improved edge-detection algorithm is proposed in this paper, in which the Gaussian filtering is replaced with morphological filtering. Experimental results show that the improved Canny operator filters salt-and-pepper noise effectively, improves the accuracy of edge detection, and achieves good results in both objective evaluation and visual effect.
    Attached files: Image Edge Detection Algorithm Based on Improved Canny Operator.pdf
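    The paper does not detail its morphological filter, but one common variant that handles salt-and-pepper noise is a grayscale opening (removes bright salt) followed by a closing (removes dark pepper). A minimal pure-Python sketch with a 3x3 square structuring element, assuming images as nested lists:

```python
def erode(img, k=1):
    """Grayscale erosion: each pixel becomes the minimum of its
    (2k+1)x(2k+1) neighborhood (clamped at the image border)."""
    h, w = len(img), len(img[0])
    return [[min(img[ny][nx]
                 for ny in range(max(0, y - k), min(h, y + k + 1))
                 for nx in range(max(0, x - k), min(w, x + k + 1)))
             for x in range(w)] for y in range(h)]

def dilate(img, k=1):
    """Grayscale dilation: neighborhood maximum instead of minimum."""
    h, w = len(img), len(img[0])
    return [[max(img[ny][nx]
                 for ny in range(max(0, y - k), min(h, y + k + 1))
                 for nx in range(max(0, x - k), min(w, x + k + 1)))
             for x in range(w)] for y in range(h)]

def open_close(img):
    """Opening (erode then dilate) removes salt noise; the subsequent
    closing (dilate then erode) removes pepper noise."""
    opened = dilate(erode(img))
    return erode(dilate(opened))
```

    The filtered image would then be fed to the gradient and hysteresis stages of the Canny pipeline in place of the Gaussian-smoothed image.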
    In general, the problem of change detection is studied in color space. Most proposed methods aim at dynamically finding the best color thresholds to detect moving objects against a background model. Background models are often complex in order to handle noise affecting pixels. Because pixels are considered individually, changes that involve groups of pixels may go undetected, and some individual pixels may have the same appearance as the background. To solve this problem, we propose to formulate the problem of background subtraction in feature space. Instead of comparing the color of pixels in the current image with colors in a background model, features in the current image are compared with features in the background model. The use of a feature at each pixel position allows accounting for changes affecting groups of pixels, and at the same time adds robustness to local perturbations. With the advent of binary feature descriptors such as BRISK or FREAK, it is now possible to use features in various applications at low computational cost. We thus propose to perform background subtraction with a small binary descriptor that we named Local Binary Similarity Patterns (LBSP). We show that this descriptor outperforms color, and that a simple background subtractor using LBSP outperforms many sophisticated state-of-the-art methods in baseline scenarios.
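    The idea behind LBSP can be sketched as a binary string of similarity tests against the patch center, compared across frames with a Hamming distance. This simplified version uses a 3x3, 8-bit pattern (the descriptor in the paper uses a larger 16-bit layout):

```python
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
           (0, 1), (1, -1), (1, 0), (1, 1)]

def lbsp(patch, t=30):
    """8-bit similarity pattern over a 3x3 patch: bit i is set when
    the i-th neighbour is within t of the center value."""
    c = patch[1][1]
    bits = 0
    for i, (dy, dx) in enumerate(OFFSETS):
        if abs(patch[1 + dy][1 + dx] - c) <= t:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of differing bits between two descriptors."""
    return bin(a ^ b).count("1")
```

    A pixel is flagged as changed when the Hamming distance between the descriptor in the current frame and the one stored in the background model exceeds a threshold, so a change in local texture is detected even when the pixel's own color matches the background.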
    Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features.
    Attached files: r-cnn-cvpr.pdf rcnn_pami.pdf
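    Because R-CNN scores roughly 2000 region proposals per image, overlapping detections of the same object must be merged; greedy non-maximum suppression over intersection-over-union is the standard post-processing step. A sketch, with boxes as (x1, y1, x2, y2, score) tuples:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2, ...) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, thresh=0.3):
    """Keep the highest-scoring box, drop boxes overlapping it by more
    than thresh, then repeat on the remainder."""
    kept = []
    rest = sorted(boxes, key=lambda b: b[4], reverse=True)
    while rest:
        best = rest.pop(0)
        kept.append(best)
        rest = [b for b in rest if iou(best, b) <= thresh]
    return kept
```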
    Kevin Lin et al., Abandoned Object Detection via Temporal Consistency Modeling and Back-Tracking Verification for Visual Surveillance, IEEE Trans. Information Forensics and Security, July 2015.
    Attached files: 20151114-Saturday Seminar-Wahyono.pdf
    Achieving good accuracy for text detection and recognition is a challenging and interesting problem in the field of video document analysis because of the presence in video frames of both graphics text, which has good clarity, and scene text, which is unpredictable. Therefore, in this paper, we present a novel method for classifying graphics text and scene text by exploiting temporal information and finding the relationship between them in video. The method proposes an iterative procedure to identify Probable Graphics Text Candidates (PGTC) and Probable Scene Text Candidates (PSTC) in video, based on the fact that graphics text in general does not move much, especially compared with scene text, which is usually embedded in the background. In addition to identifying PGTC and PSTC, the iterative process automatically determines the number of video frames to use with the help of a convergence criterion. The method further explores the symmetry between intra- and inter-character components to identify graphics text candidates and scene text candidates. A boundary-growing method is employed to restore the complete text line. For each segmented text line, we finally introduce eigenvalue analysis to classify graphics and scene text lines based on the distribution of their respective eigenvalues. Experimental comparisons with existing methods show that the proposed method is effective and useful for improving the accuracy of text detection and recognition.
    Attached files: Garphics and scene text classification.pdf
    It is very challenging to accurately detect smoke from images because of large variances of smoke colour, textures, shapes and occlusions. To improve performance, the authors combine dual-threshold AdaBoost with a staircase searching technique to propose and implement an image smoke detection method. First, extended Haar-like features and statistical features are efficiently extracted from integral images of both the intensity and saturation components of RGB images. Then, a dual-threshold AdaBoost algorithm with a staircase searching technique is proposed to classify the features of smoke for smoke detection. The staircase searching technique aims at keeping training and classification as consistent as possible. Finally, dynamic analysis is proposed to further validate the existence of smoke. Experimental results demonstrate that the proposed system has good robustness in terms of early smoke detection and a low false alarm rate, and it can detect smoke from videos with a size of 320 × 240 in real time.
    Attached files: seminar_sasha.pdf
    In this work, we propose to use attributes and parts for recognizing human actions in still images. We define action attributes as the verbs that describe the properties of human actions, while the parts of actions are objects and poselets that are closely related to the actions. We jointly model the attributes and parts by learning a set of sparse bases that are shown to carry much semantic meaning. Then, the attributes and parts of an action image can be reconstructed from sparse coefficients with respect to the learned bases. This dual sparsity provides a theoretical guarantee for our basis learning and feature reconstruction approach. On the PASCAL action dataset and a new "Stanford 40 Actions" dataset, we show that our method extracts meaningful high-order interactions between attributes and parts in human actions while achieving state-of-the-art classification performance.
    Attached files: Human Action Recognition by Learning Bases of Action Attributes and Parts(iccv2011_yao).pdf
    Visual tracking is a challenging problem in computer vision. Most state-of-the-art visual trackers either rely on luminance information or use simple color representations for image description. Contrary to visual tracking, for object recognition and detection, sophisticated color features combined with luminance have been shown to provide excellent performance. Due to the complexity of the tracking problem, the desired color feature should be computationally efficient, and possess a certain amount of photometric invariance while maintaining high discriminative power. This paper investigates the contribution of color in a tracking-by-detection framework. Our results suggest that color attributes provide superior performance for visual tracking. We further propose an adaptive low-dimensional variant of color attributes. Both quantitative and attribute-based evaluations are performed on 41 challenging benchmark color sequences. The proposed approach improves the baseline intensity-based tracker by 24% in median distance precision. Furthermore, we show that our approach outperforms state-of-the-art tracking methods while running at more than 100 frames per second.
    Attached files: Adaptive Color Attributes for Real-Time Visual Tracking.pdf
    The detection of moving pedestrians is of major importance for intelligent vehicles, since information about such persons and their tracks should be incorporated into reliable collision avoidance algorithms. In this paper, we propose a new approach to detect moving pedestrians aided by motion analysis. Our main contribution is to use motion information in two ways: on the one hand, we localize blobs of moving objects for region-of-interest (ROI) selection by segmenting an optical flow field in a pre-processing step, so as to significantly reduce the number of detection windows that a subsequent people classifier needs to evaluate, resulting in a fast method suitable for real-time systems. On the other hand, we designed a novel kind of features called Motion Self Difference (MSD) features as a complement to single-image appearance features, e.g., Histograms of Oriented Gradients (HOG), to improve distinctness and thus classifier performance. Furthermore, we integrate our novel features in a two-layer classification scheme combining a HOG+Support Vector Machine (SVM) and an MSD+SVM detector. Experimental results on the Daimler mono moving pedestrian detection benchmark show that our approach obtains a log-average
    Attached files: Fast moving pedestrian detection based on motion segmentation and new motion features.pdf
    Video text usually provides a lot of useful information that is important for video analysis, indexing and retrieval. However, it is still challenging to detect text in video images due to the variation of text patterns and the complexity of backgrounds. In this paper, an automatic video text detection method is proposed. Firstly, K-means is utilized to classify pixels in gradient images into text and non-text regions. Subsequently, morphological operations are performed on text regions to form connected candidate text components, followed by projection-profile boundary refinement. Finally, the detection results are verified by geometry and BP-AdaBoost identification. The experimental results on our manually selected dataset and the publicly available Microsoft Asia dataset show the effectiveness and feasibility of the proposed method.
    Attached files: An automatic video text detection.pdf
    We present a novel vanishing point detection algorithm for uncalibrated monocular images of man-made environments. We advance the state of the art with a new model of measurement error in line segment extraction and by minimizing its impact on the vanishing point estimation. Our contribution is twofold: 1) Beyond existing hand-crafted models, we formally derive a novel consistency measure, which captures the stochastic nature of the correlation between line segments and vanishing points due to the measurement error, and use this new consistency measure to improve the line segment clustering. 2) We propose a novel minimum-error vanishing point estimation approach that optimally weights the contribution of each line segment pair in the cluster towards the vanishing point estimation. Unlike existing works, our algorithm provides an optimal solution that minimizes the uncertainty of the vanishing point in terms of the trace of its covariance, in closed form. We test our algorithm and compare it with the state of the art on two public datasets: the York Urban Dataset and the Eurasian Cities Dataset. The experiments show that our approach outperforms the state of the art.
    In this paper, a novel approach for detecting multiscale vehicles with time-varying vehicle features, based on a multiscale AND-OR graph (AOG) model, is proposed. Our approach consists of two steps, i.e., construction of a multiscale AOG model and an inference process for vehicle detection. The multiscale model uses global features to describe low-scale vehicles and local features to represent high-scale vehicles. Meanwhile, multiple appearances, such as sketch, flatness, texture, and color, are used to represent the global and local features. By virtue of the use of both global and local features as well as multiple appearances, our model is more suitable for describing multiscale vehicles in complex urban traffic conditions. Based on this multiscale model, an inference process using local features (local process) is integrated with a process using global features (global process) to detect multiscale vehicles. To evaluate the performance of our proposed method, a validation experiment, a quantitative evaluation, and a contrasting experiment are conducted. The experimental results show that our proposed approach can efficiently detect multiscale vehicles. In addition, the results also demonstrate that our approach is able to handle partial vehicle occlusion and various vehicle shapes and has great potential for real-world applications.
    Attached files: !A Novel Approach for Vehicle Detection Using an.pdf
    A correct perception of road signalization is required for autonomous cars to follow traffic codes. Road markings are signalization painted on road surfaces, commonly used to indicate the correct lane cars must keep. Cameras have been widely used for road marking detection; however, they are sensitive to environment illumination. Some LIDAR sensors return infrared reflective intensity information, which is insensitive to illumination conditions. Existing road marking detectors that analyze reflective intensity data focus only on lane markings and ignore other types of signalization. We propose a road marking detector based on the Otsu thresholding method that makes it possible to segment LIDAR point clouds into asphalt and road marking. The results show the possibility of detecting any road marking (crosswalks, continuous lines, dashed lines). The road marking detector has also been integrated with a Monte Carlo localization method so that its performance could be validated. According to the results, adding road markings onto curb maps leads to a lateral localization error of 0.3119 m.
    Attached files: Road Marking Detection Using LIDAR Reflective Intensity Data and its Application to Vehicle Localization.pdf
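    Otsu's method picks the threshold that maximizes the between-class variance of an intensity histogram, which is what separates low-reflectance asphalt from high-reflectance paint here. A minimal sketch over a list of integer intensity values:

```python
def otsu_threshold(values, levels=256):
    """Otsu's threshold over integer values in [0, levels): returns the
    t maximizing between-class variance, with 'background' = values <= t."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_b = 0.0   # background weight (pixel count so far)
    sum_b = 0.0 # background intensity sum so far
    for t in range(levels):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b = sum_b / w_b                 # background mean
        m_f = (sum_all - sum_b) / w_f     # foreground mean
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

    Points above the threshold would be labeled as road marking candidates; the paper applies this to the reflective intensity channel of the segmented road surface.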
    Recently, dense trajectories were shown to be an efficient video representation for action recognition and achieved state-of-the-art results on a variety of datasets. This paper improves their performance by taking camera motion into account to correct them. To estimate camera motion, we match feature points between frames using SURF descriptors and dense optical flow, which are shown to be complementary. These matches are then used to robustly estimate a homography with RANSAC. Human motion is in general different from camera motion and generates inconsistent matches. To improve the estimation, a human detector is employed to remove these matches. Given the estimated camera motion, we remove trajectories consistent with it. We also use this estimation to cancel out camera motion from the optical flow. This significantly improves motion-based descriptors, such as HOF and MBH. Experimental results on four challenging action datasets (i.e., Hollywood2, HMDB51, Olympic Sports and UCF50) significantly outperform the current state of the art.
    Attached files: wang_iccv13.pdf
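    The robust fit of camera motion against match outliers is the key step. The paper fits a full homography; the sketch below is a hypothetical simplification using a pure-translation motion model so the RANSAC loop stays short. `matches` is a list of ((x1, y1), (x2, y2)) point correspondences:

```python
import random

def ransac_translation(matches, iters=200, tol=2.0, seed=0):
    """RANSAC with a one-match translation hypothesis: repeatedly guess
    (tx, ty) from a random match and keep the hypothesis with the most
    inliers within tol pixels."""
    rng = random.Random(seed)
    best_t, best_inliers = (0, 0), []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.choice(matches)
        tx, ty = x2 - x1, y2 - y1
        inliers = [((a, b), (c, d)) for (a, b), (c, d) in matches
                   if abs(c - a - tx) <= tol and abs(d - b - ty) <= tol]
        if len(inliers) > len(best_inliers):
            best_t, best_inliers = (tx, ty), inliers
    return best_t, best_inliers
```

    Matches inside the inlier set are attributed to camera motion; trajectories consistent with the estimated motion are discarded, and the remaining matches are candidates for independently moving humans.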
    To model a scene for background subtraction, Gaussian mixture modeling (GMM) is a popular choice for its capability of adapting to background variations. However, GMM often suffers from a tradeoff between robustness to background changes and sensitivity to foreground abnormalities, and is inefficient at managing this tradeoff across surveillance scenarios. By reviewing the formulations of GMM, we identify that such a tradeoff can be easily controlled by adaptively adjusting the GMM's learning rates for image pixels at different locations and with distinct properties. A new rate control scheme based on high-level feedback is then developed to provide better regularization of background adaptation for GMM and to help resolve the tradeoff. Additionally, to handle lighting variations that change too fast to be caught by GMM, a heuristic rooted in frame difference is proposed to assist the rate control scheme in reducing false foreground alarms. Experiments show that the proposed learning rate control scheme, together with the heuristic for adapting to overly quick lighting changes, gives better performance than conventional GMM approaches.
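    The tradeoff being controlled is visible in the per-pixel update rule. Below is a single-Gaussian stand-in for one mixture component (an assumption for brevity; the paper uses a full GMM), with the learning rate passed per update so a feedback mechanism can raise it during rapid lighting change or lower it to stay sensitive to foreground:

```python
class PixelGaussian:
    """One background-model Gaussian at a pixel."""

    def __init__(self, mu, var=100.0):
        self.mu, self.var = mu, var

    def observe(self, x, alpha):
        """Return True if x matches the background model; if so, update
        the mean/variance with learning rate alpha. A larger alpha adapts
        faster (robust to background change) but risks absorbing real
        foreground; a smaller alpha keeps the model sensitive."""
        matches = (x - self.mu) ** 2 <= 6.25 * self.var  # within 2.5 sigma
        if matches:
            d = x - self.mu
            self.mu += alpha * d
            self.var += alpha * (d * d - self.var)
        return matches
```

    The paper's contribution is deciding, per pixel and per frame, which alpha to pass in, driven by high-level feedback rather than a single frame-wide constant.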
    Foreground/background segmentation via change detection in video sequences is often used as a stepping stone in high-level analytics and applications. Despite the wide variety of methods that have been proposed for this problem, none has been able to fully address the complex nature of dynamic scenes in real surveillance tasks. In this paper, we present a universal pixel-level segmentation method that relies on spatiotemporal binary features as well as color information to detect changes. This allows camouflaged foreground objects to be detected more easily while most illumination variations are ignored. Moreover, instead of using manually set, frame-wide constants to dictate model sensitivity and adaptation speed, we use pixel-level feedback loops to dynamically adjust our method's internal parameters without user intervention. These adjustments are based on continuous monitoring of model fidelity and local segmentation noise levels. This new approach enables us to outperform all 32 previously tested state-of-the-art methods on the 2012 and 2014 versions of the ChangeDetection.net dataset in terms of overall F-measure. The use of local binary image descriptors for pixel-level modeling also facilitates high-speed parallel implementations: our own version, which uses no low-level or architecture-specific instructions, reached real-time processing speed on a mid-level desktop CPU. A complete C++ implementation based on OpenCV is available online.
    Attached files: seminar_june.pdf
    We propose a new method for achieving robust text segmentation in images by using a stroke filter. Segmenting text accurately and robustly from a complex background is known to be a very difficult task. Most existing methods are sensitive to text color, size, font, and background clutter, because they use simple segmentation methods or require prior knowledge about text shape. In this paper, we attempt to exploit the intrinsic characteristics of text by using the stroke filter, and design a new and robust algorithm for text segmentation. First, we briefly describe the stroke filter, which is based on local region analysis. Second, the determination of text color polarity and a local region growing procedure are performed successively based on the stroke filter response. Finally, a feedback procedure driven by the recognition score of an optical character recognition (OCR) module is used to improve the performance of text segmentation. Through experiments on a large database, we demonstrate that our method performs impressively in terms of both accuracy and robustness.
    Attached files: A new apporach for text segmentation using a stroke filter.pdf
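    The stroke filter's intuition in one dimension: the response is high only when a strip of the expected stroke width is brighter than the flanking regions on both sides, which distinguishes strokes from step edges. A hypothetical minimal version (the actual filter works in 2-D at several orientations and both polarities):

```python
def stroke_response(row, x, w, d):
    """Response of a bright-stroke filter at column x of a 1-D intensity
    row: mean of a center strip of width w minus the mean of each flank
    of width d, taking the weaker side (both flanks must be darker)."""
    lo = x - w // 2
    center = sum(row[lo:lo + w]) / w
    left = sum(row[lo - d:lo]) / d
    right = sum(row[lo + w:lo + w + d]) / d
    return min(center - left, center - right)
```

    A step edge raises only one of the two differences, so the min() keeps its response low; only true stroke-like structures score highly.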
    We propose a novel system for the automatic detection and recognition of text in traffic signs. Scene structure is used to define search regions within the image, in which traffic sign candidates are then found. Maximally stable extremal regions (MSERs) and hue, saturation, and value color thresholding are used to locate a large number of candidates, which are then reduced by applying constraints based on temporal and structural information. A recognition stage interprets the text contained within detected candidate regions. Individual text characters are detected as MSERs and are grouped into lines, before being interpreted using optical character recognition (OCR). Recognition accuracy is vastly improved through the temporal fusion of text results across consecutive frames. The method is comparatively evaluated and achieves an overall F-measure of 0.87.
    Attached files: Recognizing Text-Based Traffic Signs.pdf
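    Temporal fusion of per-frame OCR results can be as simple as a per-position majority vote across consecutive readings of the same sign. This is a hypothetical simplification of the paper's fusion scheme, assuming the readings are already associated with one sign and padded to equal length:

```python
from collections import Counter

def fuse_ocr(readings):
    """Majority-vote fusion of per-frame OCR strings: for each character
    position, keep the most frequent character across readings."""
    if not readings:
        return ""
    n = max(len(r) for r in readings)
    padded = [r.ljust(n) for r in readings]
    fused = "".join(Counter(col).most_common(1)[0][0] for col in zip(*padded))
    return fused.strip()
```

    A single misread frame ("SPEFD") is outvoted by the correct readings around it, which is why fusing across frames lifts recognition accuracy so much.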
    State-of-the-art Multi-View Stereo (MVS) algorithms deliver dense depth maps or complex meshes with very high detail and redundancy over regular surfaces. In contrast, our interest lies in an approximate but light-weight method that is better suited to large-scale applications, such as urban scene reconstruction from ground-based images. We present a novel approach for producing dense reconstructions from multiple images and from the underlying sparse Structure-from-Motion (SfM) data in an efficient way. To overcome the problems of SfM sparsity and textureless areas, we assume piecewise planarity of man-made scenes and exploit both sparse visibility and a fast over-segmentation of the images. Reconstruction is formulated as an energy-driven, multi-view plane assignment problem, which we solve jointly over superpixels from all views while avoiding expensive photoconsistency computations. The resulting planar primitives, defined by detailed superpixel boundaries, are computed in about 10 seconds per image.
    Images captured during sandstorm conditions frequently feature degraded visibility and undesirable color cast effects. In such situations, traditional visibility restoration approaches usually cannot adequately restore images due to poor estimation of haze thickness and the persistence of color cast problems. In this paper, we present a novel Laplacian-based visibility restoration approach to effectively solve inadequate haze thickness estimation and alleviate color cast problems. By doing so, a high-quality image with clear visibility and vivid color can be generated. Experimental results via qualitative and quantitative evaluations demonstrate that the proposed method can dramatically improve images captured during inclement weather conditions and produce results superior to those of other state-of-the-art methods.
    A novel approach to detection of stationary objects in a video stream is presented. Stationary objects are those separated from the static background but remaining motionless for a prolonged time. Extraction of stationary objects from images is useful in automatic detection of unattended luggage. The proposed algorithm is based on detecting image regions containing foreground pixels whose values are stable in time and checking their correspondence with the detected moving objects. In the first stage of the algorithm, the stability of individual pixels belonging to moving objects is tested using a model constructed from vectors. Next, clusters of pixels with stable color and brightness are extracted from the image and related to contours of the detected moving objects. This way, stationary (previously moving) objects are detected. False contours of objects removed from the background are also found and discarded from the analysis. The results of the algorithm may be analyzed further by a classifier, separating luggage from other objects, and by the decision system for unattended luggage detection. The main focus of the paper is on the algorithm for extracting stable image regions. However, a complete framework for unattended luggage detection is also presented in order to show that the proposed approach provides data for successful event detection. The results of experiments in which the proposed algorithm was validated using both standard datasets and video recordings from a real airport security system are presented and discussed.
    Attached files: art%3A10.1007%2Fs11042-014-2324-4.pdf
    Future vehicle systems for active pedestrian safety will not only require a high recognition performance but also an accurate analysis of the developing traffic situation. In this paper, we present a study on pedestrian path prediction and action classification at short, subsecond time intervals. We consider four representative approaches: two novel approaches (based on Gaussian process dynamical models and probabilistic hierarchical trajectory matching) that use augmented features derived from dense optical flow, and two baseline approaches that use positional information only (a Kalman filter and its extension to interacting multiple models). In experiments using stereo vision data obtained from a vehicle, we investigate the accuracy of path prediction and action classification at various time horizons, the effect of various errors (image localization, vehicle egomotion estimation), and the benefit of the proposed approaches. The scenario of interest is that of a crossing pedestrian who might stop or continue walking at the road curbside. Results indicate similar performance of the four approaches on walking motion, with near-linear dynamics. During stopping, however, the two newly proposed approaches, with nonlinear and/or higher-order models and augmented motion features, achieve a more accurate position prediction of 10-50 cm at a time horizon of 0-0.77 s around the stopping event.
    Attached files: Will the Pedestrian Crossing (IEEE ITS 2014).pdf
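    The positional baseline in the study is a Kalman filter with linear dynamics; a minimal 1-D constant-velocity version can be sketched as below (state [position, velocity], position-only measurements; the noise parameters q and r are illustrative values, not the paper's):

```python
def kalman_cv(measurements, dt=1.0, q=1e-3, r=0.25):
    """Minimal 1-D constant-velocity Kalman filter. Returns the final
    filtered state [position, velocity]."""
    x = [float(measurements[0]), 0.0]       # initial state
    P = [[1.0, 0.0], [0.0, 1.0]]            # state covariance
    for z in measurements[1:]:
        # predict: x <- F x with F = [[1, dt], [0, 1]]; P <- F P F' + Q
        x = [x[0] + dt * x[1], x[1]]
        P = [[P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q,
              P[0][1] + dt * P[1][1]],
             [P[1][0] + dt * P[1][1],
              P[1][1] + q]]
        # update with position measurement z (H = [1, 0])
        s = P[0][0] + r                     # innovation covariance
        k = [P[0][0] / s, P[1][0] / s]      # Kalman gain
        y = z - x[0]                        # innovation
        x = [x[0] + k[0] * y, x[1] + k[1] * y]
        P = [[(1 - k[0]) * P[0][0], (1 - k[0]) * P[0][1]],
             [P[1][0] - k[1] * P[0][0], P[1][1] - k[1] * P[0][1]]]
    return x
```

    Path prediction at horizon h is then just the linear extrapolation x[0] + h * x[1], which is exactly why such baselines degrade around nonlinear events like stopping.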
    3D shape is a crucial but heavily underutilized cue in today's computer vision systems, mostly due to the lack of a good generic shape representation. With the recent availability of inexpensive 2.5D depth sensors (e.g. Microsoft Kinect), it is becoming increasingly important to have a powerful 3D shape model in the loop. Apart from object recognition on 2.5D depth maps, recovering these incomplete 3D shapes to full 3D is critical for analyzing shape variations. To this end, we propose to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid, using a Convolutional Deep Belief Network. Our model, 3D ShapeNets, learns the distribution of complex 3D shapes across different object categories and arbitrary poses. It naturally supports joint object recognition and shape reconstruction from 2.5D depth maps, and further, as an additional application, it allows active object recognition through view planning. We construct a large-scale 3D CAD model dataset to train our model, and conduct extensive experiments to study our new representation.
    Attached files: 3D ShapeNets A Deep Representation for Volumetric Shape Modeling.pdf
    This paper presents an effective and efficient approach to extracting captions from videos. The robustness of our system comes from two contributions. First, we propose a novel stroke-like edge detection method based on contours, which can effectively remove the interference of non-stroke edges in complex backgrounds, making the detection and localization of captions much more accurate. Second, our approach highlights the importance of temporal (i.e., inter-frame) features in the task of caption extraction (detection, localization, segmentation). Instead of regarding each video frame as an independent image, we fully utilize the temporal features of video together with spatial analysis in caption localization, segmentation and post-processing, and demonstrate that the use of inter-frame information can effectively improve the accuracy of caption localization and caption segmentation. In our comprehensive evaluation, experimental results on two representative datasets show the robustness and efficiency of our approach.
    Attached files: Robustly extracting captions in videos.pdf
    Every year, a large number of wildfires all over the world burn forested lands, causing adverse ecological, economic, and social impacts. Beyond taking precautionary measures, early warning and immediate response are the only ways to avoid great losses. To this end, in this paper we propose a computer vision approach for fire-flame detection to be used by an early warning fire monitoring system. Initially, candidate fire regions in a frame are defined using background subtraction and color analysis based on a nonparametric model. Subsequently, the fire behavior is modeled by employing various spatio-temporal features, such as color probability, flickering, spatial energy, and spatio-temporal energy, while dynamic texture analysis is applied in each candidate region using linear dynamical systems and a bag-of-systems approach. To increase the robustness of the algorithm, the spatio-temporal consistency energy of each candidate fire region is estimated by exploiting prior knowledge about the possible existence of fire in neighboring blocks from the current and previous video frames. As a final step, a two-class support vector machine classifier is used to classify the candidate regions. Experimental results have shown that the proposed method outperforms existing state-of-the-art algorithms.
    Attached files: 20150411-report-Alexander_Filonenko.pdf
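    The first stage of the pipeline, defining candidate fire regions from motion and color, can be sketched in a few lines. This is a deliberately crude stand-in: simple frame differencing and a fixed red-dominance rule, not the nonparametric color model the abstract refers to; all thresholds are illustrative.

```python
import numpy as np

def candidate_fire_mask(frame, background, motion_thresh=25, red_thresh=150):
    """Minimal sketch of the candidate-region stage: candidate fire pixels
    are moving pixels that also look fire-colored. The thresholds and the
    color rule are illustrative, not the paper's nonparametric model."""
    frame = frame.astype(int)
    # Background subtraction: any channel deviating strongly from the model.
    moving = np.abs(frame - background.astype(int)).max(axis=2) > motion_thresh
    r, g, b = frame[..., 0], frame[..., 1], frame[..., 2]
    # Crude flame-color rule: red-dominant, bright pixels (R >= G >= B).
    fire_colored = (r > red_thresh) & (r >= g) & (g >= b)
    return moving & fire_colored

# Tiny example: a static grey background with one bright orange "flame" pixel.
bg = np.full((4, 4, 3), 100, dtype=np.uint8)
frame = bg.copy()
frame[1, 2] = (255, 140, 20)  # an orange pixel appears
mask = candidate_fire_mask(frame, bg)
print(int(mask.sum()), bool(mask[1, 2]))
```

    The spatio-temporal features and the SVM classifier described in the abstract would then operate on connected regions of this mask.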
    This paper presents a survey of the literature on road feature extraction and gives a detailed description of a Mobile Laser Scanning (MLS) system (RIEGL VMX-450) for transportation-related applications. The paper describes the development of automated algorithms for extracting road features (road surfaces, road markings, and pavement cracks) from MLS point cloud data. The proposed road surface extraction algorithm detects road curbs from a set of profiles that are sliced along vehicle trajectory data. Based on segmented road surface points, we create Geo-Referenced Feature (GRF) images and develop two algorithms for extracting, respectively: 1) road markings with high retroreflectivity and 2) cracks with low contrast against their surroundings, low signal-to-noise ratio, and poor continuity.
    Attached files: 20150404-report-Yang Yu.pptx
    Motivated by the 2013 International UAV Innovation Grand Prix, we design and implement a real-time vision system for an unmanned helicopter autonomously transferring cargoes between two platforms. In the competition, four cargoes are initially placed inside four circles on one platform, respectively. They are required to be transferred one by one into the four circles on the other platform. This paper presents the core algorithms of the proposed vision system on ellipse detection, ellipse tracking, and single-circle-based position estimation. Experiments and the great success of our team in the competition have verified the efficiency, accuracy, and robustness of the algorithms. Our team was ranked first in the final round competition.
    Tracking-based approaches for abandoned object detection often become unreliable in complex surveillance videos due to occlusions, lighting changes, and other factors. We present a new framework to robustly and efficiently detect abandoned and removed objects based on background subtraction and foreground analysis, complemented by tracking to reduce false positives. In our system, the background is modeled by three Gaussian mixtures. In order to handle complex situations, several improvements are implemented for shadow removal, quick lighting change adaptation, fragment reduction, and keeping a stable update rate for video streams with different frame rates. Then, the same Gaussian mixture models used for background subtraction are employed to detect static foreground regions without extra computation cost. Furthermore, the types of the static regions (abandoned or removed) are determined by a method that exploits context information about the foreground masks, which significantly outperforms previous edge-based techniques. Based on the type of the static regions and user-defined parameters (e.g., object size and abandoned time), a matching method is proposed to detect abandoned and removed objects. A person-detection process is also integrated to distinguish static objects from stationary people. The robustness and efficiency of the proposed method are tested on IBM Smart Surveillance Solutions for public safety applications in big cities and evaluated on several public databases, such as the i-LIDS and PETS2006 datasets. The tests and evaluation demonstrate that our method runs in real time while remaining robust to quick lighting changes and occlusions in complex environments.
    Attached files: SMCC-Abandoned.pdf
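    The per-pixel Gaussian background modeling underlying the system above can be sketched with a single-mode simplification. The real system maintains a mixture of three Gaussians per pixel; this illustrative class keeps just one running Gaussian, with assumed learning-rate and threshold values.

```python
import numpy as np

class RunningGaussianBackground:
    """Per-pixel running Gaussian background model -- a single-mode
    simplification of the three-component mixture the abstract describes."""

    def __init__(self, first_frame, alpha=0.05, k=2.5):
        self.mean = first_frame.astype(float)
        self.var = np.full(first_frame.shape, 100.0)  # initial variance guess
        self.alpha, self.k = alpha, k  # learning rate, match threshold

    def apply(self, frame):
        """Return a boolean foreground mask and update the model."""
        frame = frame.astype(float)
        d2 = (frame - self.mean) ** 2
        foreground = d2 > (self.k ** 2) * self.var
        # Update the model only where the pixel matched the background.
        bg = ~foreground
        self.mean[bg] += self.alpha * (frame - self.mean)[bg]
        self.var[bg] += self.alpha * (d2 - self.var)[bg]
        return foreground

model = RunningGaussianBackground(np.full((3, 3), 100.0))
frame = np.full((3, 3), 100.0)
frame[0, 0] = 200.0  # a new object appears at one pixel
fg = model.apply(frame)
print(int(fg.sum()), bool(fg[0, 0]))
```

    A pixel that stays foreground for longer than the user-defined abandoned time would then become a static-region candidate.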
    We present a novel Dynamic Bayesian Network for pedestrian path prediction in the intelligent vehicle domain. The model incorporates the pedestrian's situational awareness, situation criticality, and the spatial layout of the environment as latent states on top of a Switching Linear Dynamical System (SLDS) to anticipate changes in the pedestrian dynamics. Using computer vision, situational awareness is assessed by the pedestrian's head orientation, situation criticality by the distance between vehicle and pedestrian at the expected point of closest approach, and spatial layout by the distance of the pedestrian to the curbside. Our particular scenario is that of a crossing pedestrian, who might stop or continue walking at the curb. In experiments using stereo vision data obtained from a vehicle, we demonstrate that the proposed approach results in more accurate path prediction than SLDS alone at the relevant short time horizon (1 s), and slightly outperforms a computationally more demanding state-of-the-art method.
    Attached files: Context-Based Pedestrian Path Prediction-eccv14.pdf
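    One of the linear dynamical systems an SLDS switches between can be sketched as a constant-velocity model rolled forward to the 1 s horizon the abstract mentions. The time step and speed below are illustrative; the actual model switches between dynamics (e.g., walking vs. stopping) driven by the latent states.

```python
import numpy as np

def predict_path(state, dt=0.1, horizon_s=1.0):
    """Roll a constant-velocity linear dynamical system forward:
    x_{t+1} = A x_t with state [px, py, vx, vy]. A real SLDS would
    switch between such models (e.g., 'walking' vs. 'stopping')."""
    A = np.eye(4)
    A[0, 2] = A[1, 3] = dt  # position += velocity * dt
    states = []
    x = np.asarray(state, dtype=float)
    for _ in range(int(horizon_s / dt)):
        x = A @ x
        states.append(x.copy())
    return np.array(states)

# Pedestrian at (0, 0) walking 1.5 m/s toward the curb along x.
path = predict_path([0.0, 0.0, 1.5, 0.0])
print(path[-1][:2])  # predicted position after 1 s
```

    The latent states (awareness, criticality, curb distance) bias which dynamic is active, which is what anticipates the stop-or-cross decision.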
    The depth information of RGB-D sensors has greatly simplified some common challenges in computer vision and enabled breakthroughs for several tasks. In this paper, we propose to use depth maps for object detection and design a 3D detector to overcome the major difficulties for recognition, namely the variations of texture, illumination, shape, viewpoint, clutter, occlusion, self-occlusion, and sensor noise. We take a collection of 3D CAD models and render each CAD model from hundreds of viewpoints to obtain synthetic depth maps. For each depth rendering, we extract features from the 3D point cloud and train an Exemplar-SVM classifier. During testing and hard-negative mining, we slide a 3D detection window in 3D space. Experimental results show that our 3D detector significantly outperforms the state-of-the-art algorithms for both RGB and RGB-D images, and achieves about a 1.7× improvement in average precision compared to DPM and R-CNN. All source code and data are available online.
    Attached files: Sliding Shapes for 3D Object Detection in Depth Images.pdf
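    The "slide a 3D detection window in 3D space" step can be sketched over a voxelized volume: each window placement is scored with a linear (Exemplar-SVM-style) function w·x + b. The toy weights below simply count occupied voxels; trained exemplar weights would replace them.

```python
import numpy as np

def slide_3d(volume, window=4, stride=2, w=None, b=0.0):
    """Slide a 3D detection window over a voxel volume and score each
    placement with a linear classifier w.x + b. The weights here are
    illustrative (score = number of occupied voxels), not trained."""
    if w is None:
        w = np.ones(window ** 3)
    scores = {}
    for i in range(0, volume.shape[0] - window + 1, stride):
        for j in range(0, volume.shape[1] - window + 1, stride):
            for k in range(0, volume.shape[2] - window + 1, stride):
                feat = volume[i:i + window, j:j + window, k:k + window].ravel()
                scores[(i, j, k)] = float(w @ feat + b)
    return max(scores, key=scores.get), scores

vol = np.zeros((8, 8, 8))
vol[2:6, 2:6, 2:6] = 1.0  # a 4x4x4 "object"
best, _ = slide_3d(vol)
print(best)  # window position with the highest score
```

    In the actual detector, the features are richer than raw occupancy and one Exemplar-SVM is trained per rendered CAD view.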
    We present in this article a video OCR system that detects and recognizes overlaid text in video, as well as its application to person identification in video documents. We proceed in several steps. First, text detection and temporal tracking are performed. After adapting the images to a standard OCR system, a final post-processing step combines multiple transcriptions of the same text box. The semi-supervised adaptation of this system to a particular video type (broadcasts from a French TV channel) is proposed and evaluated. The system is efficient, running 3 times faster than real time (including the OCR step) on a desktop Linux box. Both text detection and recognition are evaluated individually and through a person recognition task, where it is shown that combining OCR and audio (speaker) information can greatly improve the performance of a state-of-the-art audio-based person identification system.
    Attached files: From text detection in videos to person identification.pdf
    This paper presents an application for counting people through a single fixed camera. The system distinguishes between people entering and leaving the supervised area. The counter requires two steps: detection and tracking. Detection is based on finding people's heads by correlating the preprocessed image with several circular patterns. Tracking applies a Kalman filter to determine the trajectory of each candidate. Finally, the system updates the counters based on the direction of the trajectories. Tests on a set of real video sequences taken from different indoor areas yield accuracies ranging between 87% and 98%, depending on the volume of people flowing through the counting zone. Problematic situations, such as occlusions, people grouped in different ways, and scene luminance changes, were used to validate the performance of the system.
    Attached files: 20150124-report-Alexander_Filonenko.pdf
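    The head-detection step, correlating the image with circular patterns, can be sketched with a binary disc template and exhaustive correlation. The radius and the unnormalized score are illustrative; a real system would use normalized correlation over several radii.

```python
import numpy as np

def circle_template(radius):
    """Binary disc template -- one of the 'circular patterns' used for
    head detection in the abstract (the size is illustrative)."""
    r = radius
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    return (x * x + y * y <= r * r).astype(float)

def best_match(image, template):
    """Exhaustive correlation: return the top-left offset where the
    template overlaps the most foreground, plus its score."""
    th, tw = template.shape
    best, pos = -1.0, (0, 0)
    for i in range(image.shape[0] - th + 1):
        for j in range(image.shape[1] - tw + 1):
            score = float((image[i:i + th, j:j + tw] * template).sum())
            if score > best:
                best, pos = score, (i, j)
    return pos, best

img = np.zeros((9, 9))
tmpl = circle_template(2)   # 5x5 disc
img[3:8, 4:9] = tmpl        # paste a "head" at offset (3, 4)
pos, score = best_match(img, tmpl)
print(pos)
```

    Each detected head position would then be fed to the Kalman filter as a measurement for trajectory estimation.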
    In this paper, we present a moving object detection system named Flux Tensor with Split Gaussian models (FTSG) that exploits the benefits of fusing a motion computation method based on a spatio-temporal tensor formulation, a novel foreground and background modeling scheme, and a multi-cue appearance comparison. This hybrid system can handle challenges such as shadows, illumination changes, dynamic background, and stopped and removed objects. Extensive testing on the CVPR 2014 Change Detection benchmark dataset shows that FTSG outperforms state-of-the-art methods.
    We present a method to extract the contour of geometric objects embedded in binary digital images using techniques from computational geometry. Rather than directly dealing with pixels as in traditional contour extraction methods, we operate on an object point set extracted from the image. The proposed algorithm works in four phases: point extraction, Euclidean graph construction, point linking, and contour simplification. In the point extraction phase, all pixels that represent the object pattern are extracted as a point set from the input image; we use color segmentation to distinguish the object pixels from the background pixels. In the second phase, a geometric graph G=(V,E) is constructed, where V consists of the extracted object point set and E consists of all edges whose Euclidean distance is less than a threshold parameter l, which can be derived from the information available in the point set. In the point linking phase, all border points are connected to generate the contour using orientation information inferred from the clockwise turn angle at each border point. Finally, the extracted contour is simplified using a collinearity check. Experiments on various standard binary images show that the algorithm is capable of constructing contours with high accuracy and achieves a high compression ratio on both noisy and noise-free binary images.
    Attached files: CADA-ContourExtraction.pdf 20150124-report-Yang Yu.pptx
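    The second phase, Euclidean graph construction, is easy to sketch: connect every pair of object points whose distance falls below the threshold l. This brute-force version is quadratic; the threshold value below is illustrative.

```python
import numpy as np
from itertools import combinations

def euclidean_graph(points, l):
    """Phase two of the pipeline in the abstract: E contains every pair
    of object points whose Euclidean distance is below the threshold l.
    Brute force, O(n^2); fine for a sketch."""
    pts = np.asarray(points, dtype=float)
    return [(i, j) for i, j in combinations(range(len(pts)), 2)
            if np.linalg.norm(pts[i] - pts[j]) < l]

# Four points of a unit square: with l = 1.2 only the sides connect,
# while the sqrt(2)-length diagonals stay out.
pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
edges = euclidean_graph(pts, l=1.2)
print(edges)
```

    The point linking phase would then walk this graph along the border, choosing successors by the clockwise turn angle.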
    This research is conducted to detect crosswalks and traffic lights with low false positive and false negative error rates, and we propose an integral framework for the two algorithms. Vision-based crosswalk and traffic light detection is challenging because the brightness and hue values of a scene can change easily, causing many false positive and false negative errors. To address this, we integrate the two algorithms under the assumption that traffic lights and crosswalks frequently appear together. The proposed algorithm is tested on videos captured by vehicle-mounted cameras, and the results are compared with manually annotated ground truth. The proposed algorithms run in real time and can be easily installed on a vehicle.
    Multi-view stereo (MVS) algorithms now produce reconstructions that rival laser range scanner accuracy. However, stereo algorithms require textured surfaces and therefore work poorly for many architectural scenes (e.g., building interiors with textureless, painted walls). This paper presents a novel MVS approach that overcomes these limitations for Manhattan-world scenes, i.e., scenes that consist of piecewise-planar surfaces with dominant directions. Given a set of calibrated photographs, we first reconstruct textured regions using an existing MVS algorithm, then extract dominant plane directions, generate plane hypotheses, and recover per-view depth maps using Markov random fields. We have tested our algorithm on several datasets ranging from office interiors to outdoor buildings, and demonstrate results that outperform the current state of the art for such texture-poor scenes.
    Attached files: [5] CVPR 2009_Manhattan-world Stereo_presentation.pptx [5] CVPR 2009_Manhattan-world Stereo.pdf
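    The "extract dominant plane directions" step can be sketched as a greedy clustering of surface normals: keep a new direction only when it is far from every direction found so far, treating opposite normals as the same plane direction. The angle threshold and greedy scheme are illustrative, not the paper's method.

```python
import numpy as np

def dominant_directions(normals, thresh_deg=30.0):
    """Greedy sketch of dominant-direction extraction: cluster unit
    normals, keeping a new direction only when its angle to every kept
    direction exceeds the threshold. abs() treats opposite normals
    (e.g., the two faces of a wall) as one plane direction."""
    dirs = []
    cos_t = np.cos(np.radians(thresh_deg))
    for n in np.asarray(normals, dtype=float):
        n = n / np.linalg.norm(n)
        if all(abs(n @ d) < cos_t for d in dirs):
            dirs.append(n)
    return dirs

# Noisy normals from a Manhattan-world scene: walls along x and y, floor along z.
normals = [(1, 0, 0), (0.98, 0.1, 0), (0, 1, 0), (0, 0, 1), (-1, 0.05, 0)]
print(len(dominant_directions(normals)))  # three dominant directions
```

    Plane hypotheses are then generated along each recovered direction and selected per pixel by the Markov random field.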