
Publication Tag: Machine Learning

An overview of all publications that have the tag you selected.

2022
3 citations
Proximally Sensitive Error for Anomaly Detection and Feature Learning
A. Gudi, F. Büttner, J. van Gemert
Mean squared error (MSE) is widely used to measure differences between multi-dimensional entities, including images. However, MSE lacks local sensitivity, as it does not consider the spatial arrangement of pixel differences, which is crucial for structured data like images. Such spatial arrangements carry information about the source of the differences; an error function that incorporates the location of errors can therefore offer a more meaningful distance measure. We introduce Proximally Sensitive Error (PSE), suggesting that emphasizing regions in the error measure can highlight semantic differences between images over syntactic or random deviations. We demonstrate that this emphasis can be leveraged for anomaly or occlusion detection. Additionally, we explore its utility as a loss function that helps models focus on learning representations of semantic objects instead of minimizing syntactic reconstruction noise.
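A minimal sketch of how such a locally sensitive error might be computed, assuming (as a simplification of the paper's formulation) that local sensitivity comes from smoothing the squared-error map with a Gaussian kernel; the function name, the aggregation step, and all parameter values here are illustrative, not the paper's exact definition:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def proximally_sensitive_error(x, y, sigma=2.0):
    """Illustrative spatially sensitive error between two images.

    Plain MSE treats every pixel difference independently; here the
    squared-error map is pooled over local neighborhoods with a Gaussian
    so that spatially clustered differences (likely semantic) reinforce
    each other, while scattered, noise-like deviations are attenuated.
    A simplified stand-in for the paper's PSE, not its exact form.
    """
    err = (np.asarray(x, float) - np.asarray(y, float)) ** 2
    local = gaussian_filter(err, sigma=sigma)  # pool errors over neighborhoods
    # Squaring the pooled map before averaging makes clustered errors
    # dominate scattered ones of similar total energy (an illustrative
    # aggregation choice).
    return (local ** 2).mean()

# Toy comparison: a contiguous occlusion vs. noise of similar total energy.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
occluded = img.copy()
occluded[20:30, 20:30] = 0.0                    # clustered, semantic-like change
noisy = img + rng.normal(0.0, 0.09, img.shape)  # scattered deviations
print(proximally_sensitive_error(img, occluded))  # larger: errors are clustered
print(proximally_sensitive_error(img, noisy))     # smaller: errors are diffuse
```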
2004
5 citations
Real time automatic scene classification
M. Israël, E.L. van den Broek, P. van der Putten, M.J. den Uyl
This work, part of the EU VICAR and SCOFI projects, aimed to develop a real-time video indexing, classification, annotation, and retrieval system. The authors introduced a generic approach for visual scene recognition using “typed patches”: groups of adjacent pixels characterized by local pixel distribution, brightness, and color. Each patch is described by an HSI color histogram and texture features. A fixed grid is overlaid on the image; each grid cell is segmented into patches, which are categorized by a classifier. The frequency vectors of these classified patches are concatenated to represent the entire image. Testing on eight scene categories from the Corel database showed 87.5% accuracy in patch classification and 73.8% in scene classification. The method’s advantages include low computational complexity and versatility for image classification, segmentation, and matching. However, manual classification of training patches is a drawback, prompting the development of algorithms for automatic extraction of relevant patch types. The approach was implemented in the VICAR project’s video indexing system for the Netherlands Institute for Sound and Vision and in the SCOFI project’s real-time Internet pornography filter, which achieved 92% accuracy with minimal overblocking and underblocking.
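A minimal sketch of the patch-description step under stated assumptions: HSV is used below as a stand-in for the paper's HSI color space, texture features are omitted, and the grid and histogram parameters are illustrative:

```python
import numpy as np
from skimage.color import rgb2hsv  # HSV as an approximation of the paper's HSI space

def grid_patches(image, cell=16):
    """Cut an RGB image into fixed-grid cells, echoing the fixed grid above."""
    h, w = image.shape[:2]
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            yield image[y:y + cell, x:x + cell]

def patch_histogram(patch, bins=8):
    """Describe a patch by its color distribution (texture features omitted)."""
    hsv = rgb2hsv(patch)
    hist, _ = np.histogramdd(hsv.reshape(-1, 3), bins=(bins,) * 3,
                             range=((0, 1),) * 3)
    hist = hist.ravel()
    return hist / hist.sum()
```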
2004
29 citations
Automating the Construction of Scene Classifiers for Content-Based Video Retrieval
M. Israël, E.L. van den Broek, P. van der Putten, M.J. den Uyl
This paper introduces a real-time automatic scene classifier within content-based video retrieval. In the proposed approach, end users such as documentalists, rather than image processing experts, build classifiers interactively by simply indicating positive examples of a scene. Classification is a two-stage procedure: first, small image fragments called patches are classified; second, the frequency vectors of these patch classifications are fed into a second classifier for global scene classification. The first-stage classifiers can be seen as a set of highly specialized, learned feature detectors, serving as an alternative to having an image processing expert determine features a priori. The paper presents results from experiments on a variety of patch and image classes. The scene classifier has been used successfully within television archives and for Internet porn filtering.
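A minimal sketch of this two-stage procedure, reusing the hypothetical grid_patches/patch_histogram helpers from the sketch above; the classifier choices (k-NN for patches, logistic regression for scenes) are stand-ins, not the paper's:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

def scene_vector(image, patch_clf, n_patch_classes, cell=16):
    """Stage 1: classify every patch; return the frequency vector of labels."""
    feats = np.array([patch_histogram(p) for p in grid_patches(image, cell)])
    counts = np.bincount(patch_clf.predict(feats), minlength=n_patch_classes)
    return counts / counts.sum()

def train_two_stage(patch_feats, patch_labels, scene_images, scene_labels):
    """Train the patch classifier, then the scene classifier on top of it."""
    patch_clf = KNeighborsClassifier(n_neighbors=5).fit(patch_feats, patch_labels)
    n_classes = len(set(patch_labels))
    X = np.array([scene_vector(im, patch_clf, n_classes) for im in scene_images])
    scene_clf = LogisticRegression(max_iter=1000).fit(X, scene_labels)  # stage 2
    return patch_clf, scene_clf
```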
2006
15 citations
Learning a Sparse Representation from Multiple Still Images for On-Line Face Recognition in an Unconstrained Environment
J.W.H. Tangelder, B.A.M. Schouten
In a real-world environment, a face detector can be applied to extract multiple face images from multiple video streams without constraints on pose and illumination. The extracted face images will have varying image quality and resolution. Moreover, the detected faces will not be precisely aligned. This paper presents a new approach to on-line face identification from multiple still images obtained under such unconstrained conditions. Our method learns a sparse representation of the most discriminative descriptors of the detected face images according to their classification accuracies. On-line face recognition is supported using a single descriptor of a face image as a query. We apply our method to our newly introduced BHG descriptor, the SIFT descriptor, and the LBP descriptor, which obtain limited robustness against illumination, pose, and alignment errors. Our experimental results, using a video face database of pairs of unconstrained low-resolution video clips of ten subjects, show that our method achieves a recognition rate of 94% with a sparse representation containing 10% of all available data, at a false acceptance rate of 4%.
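A minimal sketch of the selection idea under stated assumptions: each descriptor is scored by how often its nearest neighbors share its label (a stand-in for the per-descriptor classification accuracy in the paper), the top fraction is kept, and a query descriptor is matched by nearest neighbor with a rejection threshold; descriptor extraction (BHG/SIFT/LBP) is abstracted away:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def select_discriminative(descs, labels, keep_frac=0.10):
    """Keep the fraction of descriptors whose neighborhoods are most
    consistently of the same subject (illustrative selection criterion)."""
    nn = NearestNeighbors(n_neighbors=6).fit(descs)
    _, idx = nn.kneighbors(descs)                 # idx[:, 0] is the point itself
    agree = (labels[idx[:, 1:]] == labels[:, None]).mean(axis=1)
    keep = np.argsort(-agree)[: max(1, int(keep_frac * len(descs)))]
    return descs[keep], labels[keep]

def identify(query_desc, gallery, gallery_labels, reject_dist=0.8):
    """On-line query with a single descriptor; distances above the
    threshold are rejected (controlling the false acceptance rate)."""
    nn = NearestNeighbors(n_neighbors=1).fit(gallery)
    d, i = nn.kneighbors(query_desc[None, :])
    return gallery_labels[i[0, 0]] if d[0, 0] < reject_dist else None
```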
2007
14 citations
Distance Measures for Gabor Jets-Based Face Authentication: A Comparative Evaluation
D. González-Jiménez, M. Bicego, J.W.H. Tangelder, B.A.M. Schouten, O. Ambekar, J.L. Alba-Castro, E. Grosso, M. Tistarelli
Local Gabor features (jets) have been widely used in face recognition systems. Once the sets of jets have been extracted from the two faces to be compared, a proper measure of similarity between corresponding features must be chosen. For instance, in the well-known Elastic Bunch Graph Matching approach and other Gabor-based face recognition systems, the cosine distance was used as the similarity measure. In this paper, we provide an empirical evaluation of seven distance measures for jet comparison, using a recently introduced face recognition system based on Shape Driven Gabor Jets. Moreover, we evaluate different normalization factors that are used to pre-process the jets. Experimental results on the BANCA database suggest that the concrete type of normalization applied to the jets is a critical factor, and that some combinations of normalization and distance achieve better performance than the classical cosine measure for jet comparison.
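A minimal sketch of the kind of measures being compared, assuming jets are real-valued vectors of Gabor filter responses; the specific seven measures and normalization factors evaluated in the paper are not reproduced here:

```python
import numpy as np

def cosine_similarity(j1, j2):
    """The classical measure used in Elastic Bunch Graph Matching."""
    return np.dot(j1, j2) / (np.linalg.norm(j1) * np.linalg.norm(j2))

def euclidean_distance(j1, j2):
    return np.linalg.norm(j1 - j2)

def l2_normalize(jet):               # one possible pre-processing of the jets
    return jet / np.linalg.norm(jet)

def zscore_normalize(jet):           # another: zero mean, unit variance
    return (jet - jet.mean()) / jet.std()

def face_similarity(jets_a, jets_b, norm=l2_normalize, sim=cosine_similarity):
    """Aggregate over corresponding jets (one per fiducial point)."""
    return float(np.mean([sim(norm(a), norm(b)) for a, b in zip(jets_a, jets_b)]))
```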
2016
56 citations
Recognizing Semantic Features in Faces using Deep Learning
A. Gudi
The human face constantly conveys information, both consciously and subconsciously. However, as natural as it is for humans to visually interpret this information, it is a considerable challenge for machines. Conventional semantic facial feature recognition and analysis techniques, based on physiological heuristics, are already in use, but they suffer from a lack of robustness and high computation times. This thesis explores ways for machines to learn to interpret the semantic information available in faces in an automated manner, without requiring the manual design of feature detectors, using the approach of Deep Learning. It studies the effects of various factors and hyper-parameters of deep neural networks in determining an optimal network configuration for the task of semantic facial feature recognition, and evaluates the effectiveness of the resulting system in recognizing the various semantic features present in faces. Furthermore, the relation between high-level concepts and low-level features is explored through an analysis of the similarities in the low-level descriptors of different semantic features. The thesis also demonstrates a novel idea: using a deep network to generate 3-D Active Appearance Models of faces from real-world 2-D images.
2016
44 citations
Human Pose Estimation in Space and Time using 3D CNN
A. Grinciunaite, A. Gudi, E. Tasli, M. Den Uyl
This paper explores the capability of convolutional neural networks to deal with a task that is easily managed by humans: perceiving the 3D pose of a human body from varying angles. Our approach, however, is restricted to a monocular vision system. For this purpose, we apply a convolutional neural network to RGB videos and extend it to three-dimensional convolutions: the time dimension of the video is encoded as the third dimension in convolutional space, and the network regresses directly to human body joint positions in 3D coordinate space. This research shows that such a network can achieve state-of-the-art performance on the selected Human3.6M dataset, demonstrating that temporal data can be successfully represented with an additional dimension in the convolutional operation.
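A minimal PyTorch sketch of the idea: time becomes the third convolutional dimension and the network regresses joint coordinates directly; the depth, filter counts, and input size below are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

class Pose3DCNN(nn.Module):
    """Toy 3D CNN: video clip in, 3D joint coordinates out."""
    def __init__(self, n_joints=17):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),  # convolve over (T, H, W)
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),                     # pool space, keep time
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.regress = nn.Linear(64, n_joints * 3)       # direct (x, y, z) regression

    def forward(self, clip):                             # clip: (B, 3, T, H, W)
        f = self.features(clip).flatten(1)
        return self.regress(f).view(clip.size(0), -1, 3)

model = Pose3DCNN()
clip = torch.randn(2, 3, 5, 64, 64)  # two 5-frame RGB clips
print(model(clip).shape)             # torch.Size([2, 17, 3])
```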
2017
16 citations
Object Extent Pooling for Weakly Supervised Single-Shot Localization
A. Gudi, N. van Rosmalen, M. Loog, J. van Gemert
In the face of scarce detailed training annotations, the ability to perform object localization in real time with weak supervision is very valuable. However, the computational cost of generating and evaluating region proposals is heavy. We adapt the concept of Class Activation Maps (CAMs) into the first weakly supervised ‘single-shot’ detector that does not require region proposals. To facilitate this, we propose a novel global pooling technique called Spatial Pyramid Averaged Max (SPAM) pooling for training this CAM-based network for object extent localization with only weak image-level supervision. We show that this global pooling layer possesses a near-ideal flow of gradients for extent localization, offering a good trade-off between the extremes of max and average pooling. Our approach requires only a single network pass and uses a fast backprojection technique, completely omitting any region proposal steps. To the best of our knowledge, this is the first approach to do so. As a result, we are able to perform inference in real time at 35 fps, an order of magnitude faster than all previous weakly supervised object localization frameworks.
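A minimal sketch of a pooling layer matching this description: per-cell max pooling over a spatial pyramid of grid sizes, averaged within and across levels, interpolating between global max (level 1) and near-average pooling (fine levels); the pyramid levels chosen here are a guess, not the paper's configuration:

```python
import torch
import torch.nn.functional as F

def spam_pool(cam, levels=(1, 2, 4)):
    """Spatial-pyramid averaged max pooling over class activation maps.

    cam: (B, C, H, W) activation maps -> (B, C) class scores.
    Level 1 is global max pooling; finer levels behave more like average
    pooling, so the mean over levels sits between the two extremes.
    """
    per_level = []
    for n in levels:
        cell_max = F.adaptive_max_pool2d(cam, n)      # (B, C, n, n): max per cell
        per_level.append(cell_max.mean(dim=(2, 3)))   # average the cell maxima
    return torch.stack(per_level, dim=0).mean(dim=0)  # average across levels

cam = torch.randn(2, 20, 14, 14)  # e.g., 20 class activation maps
print(spam_pool(cam).shape)       # torch.Size([2, 20])
```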
2021
26 citations
Efficiency in real-time webcam gaze tracking
A. Gudi, X. Li, J. van Gemert
Efficiency and ease of use are essential for practical applications of camera-based eye/gaze-tracking. Gaze tracking involves estimating where a person is looking on a screen based on face images from a computer-facing camera. In this paper, we investigate two complementary forms of efficiency in gaze tracking: 1. The computational efficiency of the system, which is dominated by the inference speed of a CNN predicting gaze-vectors; 2. The usability efficiency, which is determined by the tediousness of the mandatory calibration of the gaze-vector to a computer screen. To do so, we evaluate the computational speed/accuracy trade-off for the CNN and the calibration effort/accuracy trade-off for screen calibration. For the CNN, we evaluate the full face, two-eyes, and single eye input. For screen calibration, we measure the number of calibration points needed and evaluate three types of calibration: 1. pure geometry, 2. pure machine learning, and 3. hybrid geometric regression. Results suggest that a single eye input and geometric regression calibration achieve the best trade-off.
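A minimal sketch of the ‘pure machine learning’ flavor of calibration, assuming the CNN already produces 3D gaze vectors: a small regression fitted on a handful of known on-screen targets maps gaze vectors to screen coordinates (the geometric and hybrid variants, which involve the screen plane explicitly, are not shown):

```python
import numpy as np
from sklearn.linear_model import Ridge

def fit_calibration(gaze_vectors, screen_points):
    """gaze_vectors: (N, 3) unit vectors from the CNN;
    screen_points: (N, 2) known target positions in pixels."""
    return Ridge(alpha=1.0).fit(gaze_vectors, screen_points)

def predict_screen(calib, gaze_vector):
    return calib.predict(gaze_vector[None, :])[0]

# Usage with nine illustrative calibration targets on a 1920x1080 screen.
rng = np.random.default_rng(0)
g = rng.normal(size=(9, 3))
g /= np.linalg.norm(g, axis=1, keepdims=True)       # pretend CNN gaze vectors
p = rng.uniform([0, 0], [1920, 1080], size=(9, 2))  # their screen targets
calib = fit_calibration(g, p)
print(predict_screen(calib, g[0]))                  # ~p[0] on training data
```

Fewer calibration points mean less tedium but a weaker fit, which is exactly the effort/accuracy trade-off the paper measures.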
2015
216 citations
Deep learning based FACS Action Unit occurrence and intensity estimation
A. Gudi, H.E. Tasli, T.M. den Uyl, A. Maroulis
Ground truth annotation of the occurrence and intensity of FACS Action Unit (AU) activation requires a great amount of attention. The efforts towards achieving a common platform for AU evaluation have been addressed in the FG 2015 Facial Expression Recognition and Analysis (FERA 2015) challenge, in which participants are invited to estimate AU occurrence and intensity on a common benchmark dataset. Conventional approaches to automating this task train multi-class classifiers or use regression models. In this paper, we propose a novel application of a deep convolutional neural network to recognize AUs as part of the FERA 2015 challenge. The 7-layer network is composed of 3 convolutional layers and a max-pooling layer; the final fully connected layers provide the classification output. For the selected tasks of the challenge, we trained two different networks for the two different datasets: one focuses on AU occurrences, the other on both occurrences and intensities of the AUs. The occurrence and intensity of AU activation are estimated using specific neuron activations of the output layer. This way, we are able to create a single network architecture that can simultaneously be trained to produce binary and continuous classification output.
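A minimal PyTorch sketch following the description above (three convolutional layers, one max-pooling stage, fully connected layers, one output neuron per AU); filter counts, kernel sizes, the input resolution, and the number of AUs are guesses, not the paper's values:

```python
import torch
import torch.nn as nn

class AUNet(nn.Module):
    """Toy AU network: one output activation per Action Unit, usable as
    a binary occurrence score (via sigmoid) or a continuous intensity."""
    def __init__(self, n_aus=11):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 5), nn.ReLU(),
            nn.MaxPool2d(2),                 # the single pooling stage
            nn.Conv2d(32, 64, 5), nn.ReLU(),
            nn.Conv2d(64, 64, 3), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 256), nn.ReLU(),
            nn.Linear(256, n_aus),           # one neuron per AU
        )

    def forward(self, x):                    # x: (B, 1, 48, 48) face crops
        return self.fc(self.conv(x))

model = AUNet()
out = model(torch.randn(4, 1, 48, 48))
occurrence = torch.sigmoid(out) > 0.5  # binary reading of the output neurons
intensity = out.clamp(0, 5)            # continuous reading (FACS 0-5 scale)
```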
