
RESEARCH INTERESTS:


Biometrics: biometric recognition, biometric template protection, cross-modality, multimodal biometrics
Computer vision: domain adaptation, dictionary learning, object recognition, activity recognition, shape representation, multiview face recognition
Machine learning: deep learning, dimensionality reduction, clustering, kernel methods, semi-supervised learning, transfer learning, active learning
Signal & image processing: sparse representation, compressive sampling, automatic target recognition, multispectral & hyperspectral imaging, acoustic and seismic signal processing
Biometrics & Identity and Innovation Center (BIIC):
Director: Nasser M. Nasrabadi
http://biic.wvu.edu/

RESEARCH PROJECTS:

1. Cross Audio-to-Visual Speaker Identification in the Wild Using Deep Learning:  

Speaker recognition technology has achieved significant performance in some real-world applications. However, its performance still degrades greatly in noisy environments. One approach to improving speaker recognition/identification is to combine the video and audio sources, linking the visual features of lip motion with vocal features; the two modalities are correlated and convey complementary information. In this project, we are interested in identifying an individual face from a coupled video/audio clip of several individuals based on data collected in an unrestricted environment (in the wild). For this effort, we propose to use the visual lip-motion features of each face in a video clip, together with the co-recorded audio features from several speakers, to identify the individual who uttered the audio recorded along with the video. To solve this problem, we propose an auto-associative deep neural network architecture (shown in Fig. 1), which is a data-driven model and does not explicitly model phonemes or visemes (the visual equivalent of phonemes). A speech-to-video auto-associative deep network will be used, where the network has learned to reconstruct the visual lip features given only speech features as input. The visual lip feature vector generated by our deep network for an input test speech signal will be compared with a gallery of individual visual lip features for speaker identification. The proposed speech-to-video deep network will be trained on our current WVU voice and video dataset, using the corresponding audio and video features from individuals as inputs to the network. For the audio signal we will use Mel-frequency cepstral coefficients (MFCC), and for the video we will extract static and temporal visual features of the lip motion.

Fig. 1. An auto-associative deep neural network architecture connecting phonemes to visemes.
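
As a concrete illustration of the speech-to-video mapping described above, the following is a minimal sketch in PyTorch. It assumes hypothetical feature sizes (39-dimensional MFCCs stacked over a 15-frame window, 64-dimensional lip features) and uses synthetic tensors in place of the WVU voice and video data; the actual network in Fig. 1 may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

MFCC_DIM, LIP_DIM, WINDOW = 39, 64, 15   # assumed feature sizes, not taken from the project data

class SpeechToLipNet(nn.Module):
    """Maps a stacked window of MFCC frames to a visual lip-feature vector."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(MFCC_DIM * WINDOW, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, LIP_DIM))

    def forward(self, mfcc_window):                   # (batch, WINDOW * MFCC_DIM)
        return self.decoder(self.encoder(mfcc_window))

def identify_speaker(model, mfcc_window, gallery):
    """Compare the reconstructed lip feature against a gallery of per-subject lip features."""
    with torch.no_grad():
        probe = model(mfcc_window)                    # (1, LIP_DIM)
        sims = F.cosine_similarity(probe, gallery)    # (num_subjects,)
    return int(sims.argmax())

# Training sketch on synthetic (audio, lip) pairs standing in for the real dataset.
model = SpeechToLipNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(100):
    mfcc = torch.randn(32, MFCC_DIM * WINDOW)         # placeholder MFCC windows
    lip = torch.randn(32, LIP_DIM)                    # placeholder co-recorded lip features
    loss = F.mse_loss(model(mfcc), lip)               # learn to reconstruct lip features from speech
    optimizer.zero_grad(); loss.backward(); optimizer.step()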

2. Nonlinear Mapping Using Deep Learning for Thermal-to-Visible Night-Time Face Recognition:

Infrared (IR) thermal cameras are important for night-time surveillance and security applications. They are especially useful in night-time scenarios when the subject is far away from the camera. The motivation behind thermal face recognition (FR) is the need for enhanced intelligence-gathering capabilities in darkness, where active illumination is impractical and surveillance with visible cameras is not feasible. However, the acquired thermal face images have to be identified using images from existing visible face databases. Therefore, cross-spectral face matching between the thermal and visible spectra is a much desired capability. In cross-modal face recognition, identifying a thermal probe image against a visible face database is especially difficult because of the wide modality gap between thermal and visible physical phenomenology. In this project we address the cross-spectral (thermal vs. visible) and cross-distance (50 m, 100 m, and 150 m vs. 1 m standoff) face matching problem for night-time FR applications. Previous research has mainly concentrated on extracting hand-crafted features (e.g., SIFT, SURF, HOG, LBP, wavelets, Gabor jets, kernel functions) by assuming that the two modalities share the same extracted features. However, the relationship between the two modalities is highly nonlinear. In this project we will investigate nonlinear mapping techniques based on deep neural network (DNN) learning procedures (shown in Fig. 2) to bridge the modality gap between the visible and thermal spectra while preserving the subject identity information. The nonlinear coupled DNN features will be used by an FR classifier.

Fig. 2. Deep coupled CNN performing the mapping from thermal to visible.
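
The coupled mapping of Fig. 2 can be sketched as follows (PyTorch, with hypothetical 112x112 single-channel face crops and synthetic tensors in place of registered visible/thermal pairs); this illustrates the coupling idea only, not the exact architecture used in the project.

import torch
import torch.nn as nn
import torch.nn.functional as F

def make_branch():
    """Small CNN feature extractor; one copy per spectrum."""
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, 128))

visible_net, thermal_net = make_branch(), make_branch()
optimizer = torch.optim.Adam(
    list(visible_net.parameters()) + list(thermal_net.parameters()), lr=1e-3)

for step in range(100):
    vis = torch.randn(16, 1, 112, 112)   # placeholder visible gallery faces
    thm = torch.randn(16, 1, 112, 112)   # placeholder thermal images of the same subjects
    z_vis = F.normalize(visible_net(vis), dim=1)
    z_thm = F.normalize(thermal_net(thm), dim=1)
    loss = F.mse_loss(z_thm, z_vis)      # coupling loss: pull same-subject embeddings together
    optimizer.zero_grad(); loss.backward(); optimizer.step()

# At test time a thermal probe is embedded by thermal_net and matched (e.g., by cosine
# similarity) against visible-gallery embeddings before the FR classifier.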

3. Face Recognition for a Mobile Passive Video Acquisition at a Distance Scenario:

In this scenario we consider a situation where a stationary or moving body-worn camera is observing or monitoring a crowded scene at a distance, as shown in Fig. 3. Our objective is to detect and perform face recognition at a distance, either on the sensor or at a remote site. We propose to do this by applying a super-resolution algorithm to the video footage to increase the number of pixels on the target and by identifying key-frames, which are then passed to a face detection algorithm to obtain face-chips, as shown in Fig. 3. Once face-chips are extracted, they are either matched against a search list on the sensor or transmitted to a dedicated remote server with a much larger search list to identify each face-chip. This problem is more challenging than the close-up scenario, since there are a large number of faces and the resolution on each face may not be sufficient for face identification. To improve the video resolution we propose to use super-resolution algorithms to increase the number of pixels on the face. Also, there might be multiple body-worn cameras observing the same physical event from different vantage points. We propose to exploit this multi-view scenario to improve face recognition by using multi-view face recognition techniques. In the multi-view scenario, association procedures between the same face-chips extracted from different body-worn cameras also need to be developed.

Fig. 3. Multi-platform video at a distance; face-chips are extracted from key-frames after super-resolution.
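
A minimal sketch of this processing chain is given below (Python with OpenCV). The video file name is hypothetical, plain bicubic upsampling stands in for the project's super-resolution algorithm, and an off-the-shelf Haar cascade stands in for the face detector that would actually be deployed.

import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_face_chips(video_path, keyframe_stride=30, scale=2):
    """Return cropped face-chips from every keyframe_stride-th frame of the video."""
    chips, cap, idx = [], cv2.VideoCapture(video_path), 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % keyframe_stride == 0:                       # simple key-frame selection
            big = cv2.resize(frame, None, fx=scale, fy=scale,
                             interpolation=cv2.INTER_CUBIC)  # placeholder for super-resolution
            gray = cv2.cvtColor(big, cv2.COLOR_BGR2GRAY)
            for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 5):
                chips.append(big[y:y + h, x:x + w])          # face-chip for local or remote matching
        idx += 1
    cap.release()
    return chips

chips = extract_face_chips("crowd_at_distance.mp4")          # hypothetical file name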

4. Use of Body-Worn Video Cameras for Facial and Activity Recognition by Law Enforcement and Military Personnel:

A body-worn camera is a small device clipped to the uniform, or possibly the headgear, of a security guard, soldier, or police officer. It can record video of the area in front of the wearer and audio of the surrounding environment. The deployment of body-worn cameras in police forces has gained particular attention in recent years. The footage from body-worn cameras can be used for biometric tasks performed instantly on the sensor (live) or off the sensor (remotely). For example, a live face recognition (FR) algorithm on a smart body-worn camera can be used in day- and night-time police traffic-stop scenarios. FR can be used by soldiers to identify insurgents as they come across individuals and crowds. Security agents can passively observe crowd gatherings and collect video for real-time or off-line face identification. However, the video footage from body-worn cameras presents new issues and challenges that do not exist, or have not been addressed, in traditional video FR using hand-held or stationary surveillance devices. In the case of body-worn cameras, the video is very shaky due to the rapid body movements of the officer/soldier, who is continuously recording while performing his/her normal duties and therefore cannot deliberately capture all the relevant activities. The video may sometimes be focused not on the scene at large but on nearby objects. The poor image quality of many body-worn camera videos effectively renders them useless for the purpose of identifying a person or an action. Video stabilization, background clutter/subtraction, video summarization, selection of representative key-frames, face detection/identification, video anomaly detection, and action recognition are serious problems for wearable-camera biometrics that are addressed in this project. We envision a smart body-worn camera that will capture, detect, and match face images locally on-device against a watch list of suspected personnel stored within the camera system. The device can also transmit key-frames from the video footage, or detected face regions (face-chips), to a remote device (server) that can launch a search against much larger offline databases. It can also record video events for post-event analysis, for officer/soldier accountability, or for multi-platform surveillance using video analytics toolboxes. We believe that by enhancing the functions of body-worn cameras we can significantly improve the capability, situational awareness, and operational effectiveness of soldiers, police, and security officers in challenging environments. Adding FR and video analytics functions to smart body-worn camera devices will also be very useful for special covert operations and SWAT teams, in addition to war fighters.
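
As an illustration of the on-device watch-list matching step, the following sketch (NumPy, with placeholder 128-dimensional embeddings and hypothetical subject names) returns a local match when the best similarity clears a threshold and otherwise signals that the face-chip should be transmitted to the remote server with the larger database.

import numpy as np

def match_against_watchlist(chip_embedding, watchlist, names, threshold=0.6):
    """Return (name, score) of the best watch-list match, or (None, score) if below threshold."""
    w = watchlist / np.linalg.norm(watchlist, axis=1, keepdims=True)
    q = chip_embedding / np.linalg.norm(chip_embedding)
    scores = w @ q                        # cosine similarity against every enrolled subject
    best = int(np.argmax(scores))
    if scores[best] >= threshold:
        return names[best], float(scores[best])
    return None, float(scores[best])      # no local hit: queue chip/key-frame for the remote server

# Placeholder embeddings standing in for features produced by an on-device face matcher.
watchlist = np.random.randn(5, 128)
name, score = match_against_watchlist(np.random.randn(128), watchlist,
                                      ["subject_%d" % i for i in range(5)])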

 

5. Multi-Sensor Classification for Border Patrol Using Seismic, Acoustic, and PIR Sensors (UGS):

The objective of this project is to detect and classify different targets (e.g., humans, vehicles, and animals led by humans), where unattended ground sensors (UGS) (i.e., seismic, acoustic, and PIR sensors) are used to capture the characteristic target signatures. For example, in the movement of a human or an animal across the border, the oscillatory motions of the body appendages provide the respective characteristic signatures. The efficacy of UGS systems is often limited by high false alarm rates, because the onboard data processing algorithms may not be able to correctly discriminate between different types of targets (e.g., humans from animals). Power consumption is a critical consideration in UGS systems; therefore, power-efficient sensing modalities, low-power signal processing algorithms, and efficient methods for exchanging information between the UGS nodes are needed. In the detection and classification problem at hand, the targets usually include humans, vehicles, and animals. For example, discriminating human footstep signals from other targets and noise sources is a challenging task, because the signal-to-noise ratio (SNR) of footsteps decreases rapidly with the distance between the sensor and the pedestrian. Furthermore, the footstep signals may vary significantly for different people and environments. Often the weak and noise-contaminated signatures of humans and light vehicles may not be clearly distinguishable from each other, in contrast to heavy vehicles that radiate loud signatures. In this project we demonstrate the effectiveness of using multiple sensors, rather than a single sensor, for discriminating between human and human-animal footsteps. We propose a nonlinear technique for multi-sensor classification, which relies on sparsely representing a test sample in terms of all the training samples in a feature space induced by a kernel function. Our approach takes into account correlations as well as complementary information between homogeneous/heterogeneous sensors simultaneously, while enforcing joint sparsity within each sensor's observations in the feature space. This approach can be seen as a generalized model of multitask and multivariate Lasso in the feature space, where data from all the sensors representing the same physical event are jointly represented by a sparse linear combination of the training data. Experiments will be conducted on real data sets (from ARL), and the results will be compared with conventional discriminative classifiers to verify the effectiveness of the proposed methods for automatic border patrol, where it is required to discriminate between human and animal footsteps.
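
The sketch below (NumPy and scikit-learn, on synthetic footstep-like features) illustrates a simplified, single-sensor kernel sparse representation classifier of the kind described above; the project's multi-sensor formulation additionally enforces joint sparsity across sensors, which is omitted here.

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics.pairwise import rbf_kernel

def ksrc_classify(X_train, y_train, x_test, gamma=0.5, alpha=0.01):
    """Sparsely code the test sample over all training samples in an RBF-induced
    feature space and assign the class with the smallest reconstruction residual."""
    K = rbf_kernel(X_train, X_train, gamma=gamma)                 # Gram matrix of training data
    k = rbf_kernel(X_train, x_test[None, :], gamma=gamma).ravel() # kernel vector of the test sample
    code = Lasso(alpha=alpha, max_iter=5000).fit(K, k).coef_      # sparse coefficients
    classes = np.unique(y_train)
    residuals = [np.linalg.norm(k - K[:, y_train == c] @ code[y_train == c]) for c in classes]
    return classes[int(np.argmin(residuals))]

# Synthetic feature vectors for two classes standing in for human vs. human-animal footsteps.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 20)), rng.normal(1.5, 1, (50, 20))])
y = np.array([0] * 50 + [1] * 50)
print(ksrc_classify(X, y, X[3]))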

 

6. Deep Transfer Learning for Automatic Target Classification: MWIR to LWIR:

Transfer learning is a powerful tool that can mitigate the divergence across different domains (MWIR vs. LWIR) through knowledge transfer. Recent research efforts on transfer learning have exploited deep neural network (NN) structures for discriminative feature representation to better tackle cross-domain disparity. Cross-domain disparity can be due to the difference between the source and target distributions, or to different modalities, such as going from midwave IR to longwave IR. However, few of these techniques are able to jointly learn deep features and train a classifier in a unified transfer learning framework. To this end, we propose a task-driven deep transfer learning framework for automatic target classification, where the deep features and the classifier are obtained simultaneously for optimal classification performance. The proposed deep structure can therefore generate more discriminative features by using the classifier performance as a guide, and the classifier performance is in turn improved because it is optimized on these more discriminative deep features. The developed supervised formulation is a task-driven scheme, which will provide better learned features for the target classification task. By assigning pseudo-labels to the target data using semi-supervised algorithms, we can transfer knowledge from the source (i.e., MWIR) to the target (i.e., LWIR) through the deep structures. Experimental results on a real database of MWIR and LWIR targets demonstrate the superiority of the proposed algorithm in comparison with competing methods.
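
The following sketch (PyTorch, with synthetic feature vectors standing in for MWIR/LWIR target chips and an assumed class count) illustrates the pseudo-labeling idea: a shared network is trained on labeled source data while confident predictions on unlabeled target data are folded back in as pseudo-labels. It is only a schematic of the task-driven framework described above.

import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES, FEAT_DIM = 10, 256                   # assumed class count and input feature size
net = nn.Sequential(nn.Linear(FEAT_DIM, 128), nn.ReLU(),
                    nn.Linear(128, 64), nn.ReLU(),
                    nn.Linear(64, NUM_CLASSES))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(200):
    # Placeholder batches: labeled source (MWIR) and unlabeled target (LWIR).
    x_src = torch.randn(32, FEAT_DIM)
    y_src = torch.randint(0, NUM_CLASSES, (32,))
    x_tgt = torch.randn(32, FEAT_DIM)

    loss = F.cross_entropy(net(x_src), y_src)      # supervised, task-driven term on the source

    with torch.no_grad():                          # assign pseudo-labels to confident target samples
        probs = F.softmax(net(x_tgt), dim=1)
        conf, pseudo = probs.max(dim=1)
        keep = conf > 0.9
    if keep.any():
        loss = loss + 0.5 * F.cross_entropy(net(x_tgt[keep]), pseudo[keep])

    optimizer.zero_grad(); loss.backward(); optimizer.step()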

7. Multimodal Super-Resolution to Improve Spatial/Spectral Information for Surveillance and Object Detection:

Hyperspectral cameras are known to provide better discriminative information for object recognition than visible-spectrum cameras. For example, spectral measurements in the infrared allow sensing of object structures that are invariant to orientation and reflection. Hyperspectral imagery can also be used for material classification and identification. A major problem with hyperspectral cameras is their insufficient spatial resolution for object/face recognition at a distance. This project will address the problem of increasing the spatial resolution of a hyperspectral imaging (HSI) sensor using a high-resolution panchromatic image of the same scene. The difference in spatial resolution between panchromatic and hyperspectral sensors is generally a result of the fundamental tradeoff between spatial resolution, spectral resolution, and radiometric sensitivity in the design of electro-optical sensor systems. Multimodal super-resolution enhancement refers to the joint processing of data from these two sensors in order to produce an HSI image that exhibits, ideally, the spectral characteristics of the observed HSI image at the spatial resolution and sampling of the higher-resolution panchromatic image. This project will develop novel algorithms based on a coupled dictionary learning framework: (a) a multimodal super-resolution algorithm using a panchromatic image whose spectral range is similar to that of the HSI image, to improve the resolution for hyperspectral FR (which we refer to as homogeneous multimodal super-resolution), and (b) a heterogeneous multimodal super-resolution algorithm using, for example, a high-resolution visible image and a low-resolution multispectral/hyperspectral MWIR or LWIR image. Since the spectral ranges of these sensors are very different, innovative coupled dictionary-based learning algorithms will be developed to exploit the nonlinear relationship between these heterogeneous sensors. This project will enhance the ability of HSI cameras to detect and classify objects/faces at a distance.
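
The coupled dictionary idea can be sketched as follows (NumPy and scikit-learn, with synthetic patch pairs and assumed patch/dictionary sizes): a joint dictionary is learned over stacked high-resolution panchromatic and low-resolution HSI patches so that both share one sparse code; a low-resolution patch is then super-resolved by coding it over the low-resolution sub-dictionary and synthesizing with the high-resolution sub-dictionary.

import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode

HI_DIM, LO_DIM, N_ATOMS = 64, 16, 32                   # assumed patch sizes and dictionary size
rng = np.random.default_rng(0)
pairs = np.hstack([rng.standard_normal((500, HI_DIM)), # placeholder co-registered patch pairs
                   rng.standard_normal((500, LO_DIM))])

dico = DictionaryLearning(n_components=N_ATOMS, alpha=1.0, max_iter=50).fit(pairs)
D_hi = dico.components_[:, :HI_DIM]                    # high-resolution sub-dictionary
D_lo = dico.components_[:, HI_DIM:]                    # low-resolution sub-dictionary

def super_resolve(lo_patch, n_nonzero=5):
    """Sparse-code a low-res patch over D_lo and synthesize its high-res counterpart with D_hi."""
    code = sparse_encode(lo_patch[None, :], D_lo,
                         algorithm="omp", n_nonzero_coefs=n_nonzero)
    return (code @ D_hi).ravel()

hi_estimate = super_resolve(rng.standard_normal(LO_DIM))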

 

8. Using LWIR Hyperspectral Imagery for Camouflaged Sniper Detection:

Hyperspectral imagery provides both spatial and spectral information, and, in principle, different materials produce different spectral responses. In this project we propose to develop HSI algorithms to build an effective system for camouflaged sniper detection using LWIR HSI cameras. HSI spectral signatures make it possible to detect camouflaged sniper activities in the infrared spectrum. We will develop unique hyperspectral unmixing methods to detect camouflaged targets, and we will extend our previous technique for detecting camouflaged military targets to sniper detection. We will study the performance limits of the proposed algorithm as well as its physical limitations. Several activity scenarios of a camouflaged sniper will be studied, and the effect of the sniper's camouflage type on detection performance will be investigated.
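
A minimal unmixing sketch is shown below (NumPy and SciPy, with synthetic spectra): each LWIR pixel is unmixed over a small endmember library using nonnegative least squares, and pixels with a high abundance of an assumed camouflage-material endmember are flagged. The project's unmixing methods would replace this simple linear model.

import numpy as np
from scipy.optimize import nnls

BANDS = 100
rng = np.random.default_rng(1)
# Placeholder endmember spectra: background materials plus a camouflage material (last column).
endmembers = np.abs(rng.standard_normal((BANDS, 4)))    # columns: soil, vegetation, rock, camouflage

def unmix(pixel_spectrum):
    """Return approximately sum-to-one, nonnegative abundances of each endmember for one pixel."""
    abundances, _ = nnls(endmembers, pixel_spectrum)
    return abundances / max(abundances.sum(), 1e-12)

def detect_camouflage(cube, threshold=0.3):
    """Flag pixels whose camouflage abundance exceeds the threshold (cube: H x W x BANDS)."""
    h, w, _ = cube.shape
    mask = np.zeros((h, w), dtype=bool)
    for i in range(h):
        for j in range(w):
            mask[i, j] = unmix(cube[i, j])[-1] > threshold
    return mask

cube = np.abs(rng.standard_normal((8, 8, BANDS)))        # placeholder LWIR hyperspectral cube
print(detect_camouflage(cube).sum(), "candidate pixels")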

 

9. Multi-modal dictionary design:

10. Multimodal sensor fusion:

11. Hyperspectral unmixing:
