Seminars in 2012
    The scorebox plays an important role in understanding the content of sports videos. However, a tiny scorebox can make it uncomfortable for small-display viewers to grasp the game situation. In this paper, we propose a novel framework to extract the scorebox from sports video frames. We first extract candidates by using accumulated intensity and edge information after a short learning period. Since there are various types of scoreboxes inserted in sports videos, multiple attributes need to be used for efficient extraction. Based on those attributes, the optimal information gain is computed and the three top-ranked attributes in terms of information gain are selected as a three-dimensional feature vector for a Support Vector Machine (SVM) to distinguish the scorebox from other candidates, such as logos and advertisement boards. The proposed method is tested on various sports videos, and experimental results show its efficiency and robustness.
    Attached files: Scorebox extraction from mobile sports videos using support vector machines.pdf
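    The attribute-ranking step in the abstract above can be illustrated with a minimal sketch. It assumes the candidate attributes have already been discretized into a small number of values and that each candidate carries a binary label (scorebox or not). The function names, the `table` layout and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting the labels on one discrete attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_k_attributes(table, labels, k=3):
    """Return the names of the k attributes with the highest information gain.

    `table` maps an attribute name to its per-candidate values (assumed layout).
    """
    gains = {name: information_gain(col, labels) for name, col in table.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]


# Toy data: attribute "a" separates the classes perfectly, "b" not at all.
labels = [1, 1, 0, 0]
table = {"a": [1, 1, 0, 0], "b": [1, 0, 1, 0]}
best = top_k_attributes(table, labels, k=1)
```

    The selected attribute names would then index the feature vector handed to a classifier; the SVM stage itself is not sketched here.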
    Abstract: Automatic face recognition is one of the most challenging tasks in computer vision and pattern recognition, and face detection is the first critical step in a fully automatic face recognition system. Skin color is an effective feature, but it is easily disturbed by interference. This paper proposes a method of face detection in a picture based on an improved skin-color model. First, an improved “reference white” method is used to remove the interference of non-skin-color regions. Then a color classifier is designed from statistics over a large number of skin-color pixels, and each pixel in the color picture is classified as skin-color or non-skin-color. Finally, faces are detected in the candidate regions, the non-face regions are removed, and the face regions are located. Experimental results show that the algorithm can effectively detect faces with skin-color interference under complex backgrounds. Keywords: Face detection; Skin-color model; Skin-color classifier; Reference white; Non-face regions
    Attached files: Fast face detection algorithm based on improved skin-color model and adaptive threshold.pdf
    This paper introduces a novel framework for estimating the motion of a robotic car from image information, a scenario widely known as visual odometry. Most current monocular visual odometry algorithms rely on a calibrated camera model and recover relative rotation and translation by tracking image features and applying geometrical constraints. This approach has some drawbacks: translation is recovered up to a scale, it requires camera calibration which can be tricky under certain conditions, and uncertainty estimates are not directly obtained. We propose an alternative approach that involves the use of semi-parametric statistical models as means to recover scale, infer camera parameters and provide uncertainty estimates given a training dataset. As opposed to conventional non-parametric machine learning procedures, where standard models for egomotion would be neglected, we present a novel framework in which the existing parametric models and powerful non-parametric Bayesian learning procedures are combined. We devise a multiple output Gaussian Process (GP) procedure, named Coupled GP, that uses a parametric model as the mean function and a non-stationary covariance function to map image features directly into vehicle motion. Additionally, this procedure is also able to infer joint uncertainty estimates (full covariance matrices) for rotation and translation. Experiments performed using data collected from a single camera under challenging conditions show that this technique outperforms traditional methods in trajectories of several kilometers.
    Attached files: 0 Semi-parametric Models for Visual Odometry ICRA2012.pdf
    In this paper, we present a method for extracting consistent foreground regions when multiple views of a scene are available. We propose a framework that automatically identifies such regions in images under the assumption that, in each image, background and foreground regions present different color properties. To achieve this task, monocular color information is not sufficient and we exploit the spatial consistency constraint that several image projections of the same space region must satisfy. Combining the monocular color consistency constraint with multiview spatial constraints allows us to automatically and simultaneously segment the foreground and background regions in multiview images. In contrast to standard background subtraction methods, the proposed approach does not require a priori knowledge of the background nor user interaction. Experimental results under realistic scenarios demonstrate the effectiveness of the method for multiple camera set ups.
    Attached files: Silhouette Segmentation in Multiple Views.pdf Silhouette Segmentation in Multiple Views.pptx
    In this paper we present a method for 3D urban reconstruction from a single catadioptric omnidirectional image. First, we classify the catadioptric omnidirectional image into horizontal ground, vertical building surface and vertical background surface by registering the catadioptric omnidirectional image with a remote sensing image. According to the classification results, we recover the geometry based on the catadioptric projection model. The experiment shows that our method is feasible and achieves a precise 3D reconstruction of city scenes.
    Attached files: Automatic 3D Reconstruction.pdf
    Due to the limited dynamic range, a single still image is usually insufficient to describe a high-contrast scene. Fusing multi-exposure images of the same scene can produce an image with details in both bright and dark regions. However, such fusion methods may be sensitive to the exposure parameters of the input images. To improve robustness, a novel layer-based exposure fusion algorithm is proposed in this paper. In our algorithm, a global-layer is introduced to preserve the overall luminance of a real scene and avoid possible luminance reversion. Then details are recovered in the gradient domain by a Poisson solver. Experimental results show the superior performance of our approach in terms of robustness and color consistency.
    Attached files: hdr.pdf
    The last generation of consumer electronic devices is endowed with Augmented Reality (AR) tools. These tools require moving object detection strategies, which should be fast and efficient, to carry out higher level object analysis tasks. We propose a lightweight spatio-temporal-based non-parametric background-foreground modeling strategy in a General Purpose Graphics Processing Unit (GPGPU), which provides real-time high-quality results in a great variety of scenarios and is suitable for AR applications.
    Attached files: Moving Object Detection.pdf
    This paper presents two new, efficient solutions to the two-view, relative pose problem from three image point correspondences and one common reference direction. This three-plus-one problem can be used either as a substitute for the classic five-point algorithm, using a vanishing point for the reference direction, or to make use of an inertial measurement unit commonly available on robots and mobile devices where the gravity vector becomes the reference direction. We provide a simple, closed-form solution and a solution based on algebraic geometry which offers numerical advantages. In addition, we introduce a new method for computing visual odometry with RANSAC and four point correspondences per hypothesis. In a set of real experiments, we demonstrate the power of our approach by comparing it to the five-point method in a hypothesize-and-test visual odometry setting.
    Attached files: 2012 Two Efficient Solutions for Visual OdometryUsing Directional Correspondence.pdf
    Occlusion and lack of visibility in crowded and cluttered scenes make it difficult to track individual people correctly and consistently, particularly in a single view. We present a multiview approach to solve this problem. In our approach, we neither detect nor track objects from any single camera or camera pair; rather, evidence is gathered from all of the cameras into a synergistic framework and detection and tracking results are propagated back to each view. Unlike other multiview approaches that require fully calibrated views, our approach is purely image-based and uses only 2D constructs. To this end, we develop a planar homographic occupancy constraint that fuses foreground likelihood information from multiple views to resolve occlusions and localize people on a reference scene plane. For greater robustness, this process is extended to multiple planes parallel to the reference plane in the framework of plane to plane homologies. Our fusion methodology also models scene clutter using the Schmieder and Weathersby clutter measure, which acts as a confidence prior, to assign higher fusion weight to views with lesser clutter. Detection and tracking are performed simultaneously by graph cuts segmentation of tracks in the space-time occupancy likelihood data. Experimental results with detailed qualitative and quantitative analysis are demonstrated in challenging multiview crowded scenes.
    Attached files: Tracking Multiple Occluding People by Localizing on Multiple Scene Planes.pptx Tracking Multiple Occluding People by Localizing on Multiple Scene Planes.pdf
    In this paper, we propose a structural method for Hangul grapheme segmentation, developed on machine-printed characters in widely known fonts such as Myunjo, Gulim and Gothic and then applied to much more deformed fonts. The process is composed of two steps. One is a structural grapheme segmentation of characters classified into 20 types, a more reasonable classification than 6 types in that the segmentation algorithm can be simpler and more effective thanks to the stronger common features within the 20 types. Furthermore, characters classified into the 20 types are quite easy to postprocess using the connected components separated by the boundary information. With the proposed method, we obtained a 99% correct segmentation rate with very high execution speed.
    Attached files: Graphe Hangul Segmentation.pdf
    Abstract. Most commercial television channels use video logos, which can be considered a form of visible watermark, as a declaration of intellectual property ownership. They are also used as a symbol of authorization to rebroadcast when original logos are used in conjunction with newer logos. An unfortunate side effect of such logos is the concomitant decrease in viewing pleasure. In this paper, we use the temporal correlation of video frames to detect and remove video logos. In the video-logo-detection part, as an initial step, the logo boundary box is first located by using a distance threshold of video frames and is further refined by employing a comparison of edge lengths. Second, our proposed Bayesian classifier framework locates fragments of logos called logolets. In this framework, we systematically integrate the prior knowledge about the location of the video logos and their intrinsic local features to achieve a robust detection result. In our logo-removal part, after the logo region is marked, a matching technique is used to find the best replacement patch for the marked region within that video shot. This technique is found to be useful for small logos. Furthermore, we extend the image inpainting technique to videos. Unlike the use of 2D gradients in the image inpainting technique, we inpaint the logo region of video frames by using 3D gradients exploiting the temporal correlations in video. The advantage of this algorithm is that the inpainted regions are consistent with the surrounding texture and hence the result is perceptually pleasing. We present the results of our implementation and demonstrate the utility of our method for logo removal.
    Attached files: Automatic video logo detection and removal.pdf
    Abstract: Face detection is the image-processing task of determining the location, size and number of faces. It is also the premise of face recognition, human-computer interaction and so on. This paper presents a new face detection method, which first clusters a skin-color model in the YCbCr chrominance space from collected templates, then locates candidate face areas using that skin-color model. After the candidate face areas are normalized, the Hausdorff distance between the given template and each candidate is calculated. Finally, the distance determines whether the given area is a face or not. Extensive experiments indicate that this method achieves high accuracy. Keywords: skin-color clustering, template matching, face detection
    Attached files: Face Detection Technology Based on Skin Color Segmentation and Template Matching (2010).pdf
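    The template-matching step above relies on the Hausdorff distance between two point sets. The sketch below shows the standard symmetric definition on 2D points. Representing the template and the candidate as lists of (x, y) edge points is an assumption made for illustration, and the paper may use a modified variant of the distance.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest neighbour in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical point sets standing in for a face template and a candidate region.
template = [(0, 0), (1, 0)]
candidate = [(0, 0), (4, 0)]
d = hausdorff(template, candidate)
```

    A candidate would be accepted as a face when `d` falls below a threshold. The threshold value is not given in the abstract, so it is left out here.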
    We propose bridge routing based on network coding for wireless mesh networks. Our bridge routing exploits network coding to minimize time-slot usage. We present feasible and practical ways to study the performance of routing with network coding, as compared to conventional shortest path algorithms. Bridge routing consists of two procedures, a node coordination procedure that builds the bridge and a routing procedure, and it works in a decentralized way. Simulation results show that our bridge routing is more efficient than the shortest path algorithm, whose performance depends on the network connectivity.
    Attached files: network coding-based bridge routing in wireless mesh network.pdf
    Vision sensors give mobile robots a relatively cheap means of obtaining rich 3D information of their environment, but lack the depth information that a laser range finder can provide. This paper describes a novel composite sensor approach that combines the information given by an omnidirectional camera and a laser range finder to efficiently solve the indoor Simultaneous Localization and Mapping problem and reconstruct a 3D representation of the environment. We report the results of validating our methodology using a mobile robot equipped with a 2D laser range finder and an omnidirectional camera.
    Attached files: Indoor SLAM Based on Composite Sensor Mixing Laser Scans and.pdf
    Text detection in natural images has gained much attention in recent years as it is a primary step towards fully autonomous text recognition. Understanding visual text content is of vital importance in many application areas, from internet search engines to PDA signboard translators. Images of natural scenes, however, pose numerous difficulties compared to traditional scanned documents. They mainly contain diverse complex text of different sizes, styles and colors with complex backgrounds. Furthermore, such images are captured under variable lighting conditions and are often affected by skew distortion and perspective projection. In this article an improved edge profile based text detection method is presented. It uses a set of heuristic rules to eliminate detection of non-text areas. The method is evaluated on CVL OCR DB, an annotated image database of text in natural scenes.
    Attached files: improved_eurocon_2011[1].pdf
    Abstract: The authors propose a vision-based automatic system to detect preceding vehicles on the highway under various lighting and different weather conditions. To adapt to different characteristics of vehicle appearance under various lighting conditions, four cues including underneath shadow, vertical edge, symmetry and taillight are fused for the vehicle detection. The authors achieve this goal by generating probability distribution of vehicle under particle filter framework through the processes of initial sampling, propagation, observation, cue fusion and evaluation. Unlike normal particle filter focusing on single target distribution in a state space, the authors detect multiple vehicles with a single particle filter through a high-level tracking strategy using clustering. In addition, the data-driven initial sampling technique helps the system detect new objects and prevent the multi-modal distribution from collapsing to the local maxima. Experiments demonstrate the effectiveness of the proposed system.
    Attached files: Vehicle detection and tracking under various lighting conditions using a particle filter.pdf
    We integrate the cascade-of-rejectors approach with the Histograms of Oriented Gradients (HoG) features to achieve a fast and accurate human detection system. The features used in our system are HoGs of variable-size blocks that capture salient features of humans automatically. Using AdaBoost for feature selection, we identify the appropriate set of blocks, from a large set of possible blocks. In our system, we use the integral image representation and a rejection cascade which significantly speed up the computation. For a 320 × 280 image, the system can process 5 to 30 frames per second depending on the density in which we scan the image, while maintaining an accuracy level similar to existing methods.
    Attached files: 20120428-the Saturday Seminar-AdaBoost-Multiple HOG.pptx
    We propose a novel localization method for outdoor mobile robots using High Dynamic Range (HDR) vision technology. To obtain an HDR image, multiple images at different exposures are typically captured and combined. However, since mobile robots can be moving during a capture sequence, images cannot be fused easily. Instead, we generate a set of keypoints that incorporates those detected in each image. The position of the robot is estimated using the keypoint sets to match measured positions with a map. We conducted experimental comparisons of HDR and auto-exposure images, and our HDR method showed higher robustness and localization accuracy.
    Attached files: A High Dynamic Range Vision Approach to Outdoor Localization.pdf hdr + localization.ppt Monte Carlo Localization forMobile Robots.pdf
    Visual surveillance using multiple cameras has attracted increasing interest in recent years. Correspondence between multiple cameras is one of the most important and basic problems which visual surveillance using multiple cameras brings. In this paper, we propose a simple and robust method, based on principal axes of people, to match people across multiple cameras. The correspondence likelihood reflecting the similarity of pairs of principal axes of people is constructed according to the relationship between “ground-points” of people detected in each camera view and the intersections of principal axes detected in different camera views and transformed to the same view. Our method has the following desirable properties: 1) Camera calibration is not needed. 2) Accurate motion detection and segmentation are less critical due to the robustness of the principal axis-based feature to noise. 3) Based on the fused data derived from correspondence results, positions of people in each camera view can be accurately located even when the people are partially occluded in all views. The experimental results on several real video sequences from outdoor environments have demonstrated the effectiveness, efficiency, and robustness of our method.
    Attached files: Principal Axis-Based Correspondence between Multiple Cameras for People Tracking.pptx principal axis-based correspondence between multiple cameras for people tracking.pdf
    ABSTRACT Most computer graphics pictures have been computed all at once, so that the rendering program takes care of all computations relating to the overlap of objects. There are several applications, however, where elements must be rendered separately, relying on compositing techniques for the anti-aliased accumulation of the full image. This paper presents the case for four-channel pictures, demonstrating that a matte component can be computed similarly to the color channels. The paper discusses guidelines for the generation of elements and the arithmetic for their arbitrary compositing.
    Attached files: Compositing digital images.pdf
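    The idea that the matte channel can be computed like the color channels is easiest to see with the "over" operator on premultiplied pixels, where all four channels go through the same arithmetic. The following is a minimal sketch under that assumption. The `Pixel` alias and the function names are illustrative and do not come from the paper.

```python
from typing import Tuple

# Premultiplied pixel: (r * alpha, g * alpha, b * alpha, alpha), each in [0, 1].
Pixel = Tuple[float, float, float, float]


def premultiply(r: float, g: float, b: float, alpha: float) -> Pixel:
    """Convert a straight-alpha color into premultiplied form."""
    return (r * alpha, g * alpha, b * alpha, alpha)


def over(fg: Pixel, bg: Pixel) -> Pixel:
    """Composite `fg` over `bg`; both pixels must be premultiplied.

    Every channel, including alpha, uses the same formula:
    out = fg + (1 - fg_alpha) * bg.
    """
    k = 1.0 - fg[3]
    return tuple(f + k * b for f, b in zip(fg, bg))


# Half-transparent red over opaque blue.
red_half = premultiply(1.0, 0.0, 0.0, 0.5)
blue = premultiply(0.0, 0.0, 1.0, 1.0)
result = over(red_half, blue)
```

    Because the operands are premultiplied, the result is again a valid premultiplied pixel and can be fed into further `over` operations without conversion.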
    Tilt correction is an important part of a license plate recognition system. In practice, many license plates appear inclined for reasons such as perspective distortion and uneven or curved road surfaces. The usual rotation methods are often based on a single theory, which makes it difficult to combine the advantages of different methods and gives no indication of whether the rotation result is correct. We propose a mutual correction method based on a pair of fitted parallel straight lines, which helps verify the credibility of the line-fitting result by measuring the parallelism of the two lines. If this method fails, we can use another method for tilt correction or give up. The proposed method provides reliable correction results and utilizes the advantages of different rotation algorithms. The experimental results are better than those obtained using only one method.
    Attached files: license plate correction.ppt lp tilt correction.pdf
    This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the “Integral Image” which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.
    Attached files: 1_Robust Real-Time Face Detection.pdf
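    The "Integral Image" mentioned above stores, for each position, the sum of all pixels above and to the left, so the sum over any rectangle needs only four lookups. Below is a minimal pure-Python sketch. The zero-padded layout and the function names are choices made for this illustration, not details from the paper.

```python
def integral_image(img):
    """Build a (h + 1) x (w + 1) table where ii[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


img = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(img)
whole = rect_sum(ii, 0, 0, 3, 3)         # all nine pixels
lower_right = rect_sum(ii, 1, 1, 2, 2)   # 5 + 6 + 8 + 9
```

    A rectangle feature of the kind used by such detectors can then be evaluated as a difference of a few `rect_sum` calls, at a cost that does not depend on the rectangle size.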