3. SHARED SPACE LEARNING VIA JOINT SMOOTH NMF (LJNMF) Our joint space learning method is formulated under the NMF framework. This section will introduce our adaptation of NMF for extracting shared latent spaces. We call this approach as freely joint nonnegative matrix factorization, or LJNMF. The purpose of using the NMF framework for image annotation is to explain the underlying latent factors existing in a collection of images that created different objects in the images by representing the occurrence of these factors for each image. In multimodal problems, different modalities come from the same collection and so we expect the truths that create latent factors in those modalities to be almost similar. But each mode has different characteristics, so these factors are done differently. The similarity in the representation of the factors is interpreted as the similarity in the coefficient matrix, and the difference in the way the factors are constituted implies different basic matrices. So forcing the cost function to find exactly the same coefficient matrix for both modes is not reasonable and also has restrictive constraints that do not allow the modes to find the best factors and therefore increase the approximation error. Then we will factorize both modes, such that the factor matrices were different for them. But in reality they are poorly articulated and the similarity between the coefficient matrices is encouraged by reducing the distance between their factors. The objective function for freely joint nonnegative matrix factorization or LJNMF in general mode can be written as below.(9)where dist(H1, H2) is a metric to measure the distance between two coefficients.3.1. Notation In our problem there are two data resources as two modes. One is the visual information embedded in images and… in the center of the paper… on a Corel 5K dataset.6. CONCLUSION The problem addressed in this paper is to build a multimodal automatic image annotation system that combines two data modalities: visual features extracted from images and text terms collected from attached tags. It is done by extracting latent factors that explain the patterns that create the content of images, in a unified and freely articulated space. Both modes are factorized simultaneously while a relationship is considered to exist between them. We relaxed the constraint that makes the coefficient matrices of both modes exactly the same and allowed the representation of latent factors that have some differences. This was implemented by minimizing the nonlinear distance between two coefficient matrices. The proposed LJNMF algorithm could achieve comparable performance to state-of-the-art works while having a smaller feature vector dimension.
tags