Classification is one of the essential tasks in machine learning; its aim is to assign each instance in a dataset to a class based on its characteristics. It is often difficult to determine which features are useful without prior knowledge, so a large number of features are usually introduced into the dataset, many of which may be irrelevant or redundant. Feature selection is the process of selecting a small subset of relevant features from the large original set. This subset contains fewer redundant or irrelevant features, which simplifies the learning process, reduces training time, and improves performance. Other benefits of feature selection are improved prediction performance, scalability, understandability, and generalization ability of the classifier. It also reduces computational and storage complexity, and provides faster and cheaper modeling and knowledge discovery. Furthermore, it offers new insights into which features are most relevant or informative. The main challenge in feature selection is the large search space: for a dataset with n features there are 2^n candidate feature subsets. Feature selection consists of complex steps that are usually computationally expensive, and the model parameters tuned on the full feature set may need to be re-estimated several times to obtain optimal parameters for the selected feature subsets. Feature selection also involves two main objectives, maximizing classification accuracy and minimizing the number of features, which conflict with each other. Therefore, feature selection is considered a multi-objective problem whose trade-off solutions lie between these two objectives. Some examples of feature selection techniques are information gain, chi-square, lasso, and Fisher score. Feature selection can be used to find key genes (i.e.
biomarkers) from a large number of candidate genes in biological and biomedical problems, to discover key indicators or features that describe a dynamic business environment, to select key terms such as words or phrases in text mining, and to choose or construct important visual content such as pixels, color, texture, and shape in image analysis. Compared to other dimensionality reduction techniques, such as those based on projection (e.g., principal component analysis, PCA) or compression, feature selection does not modify the original representation of the variables but simply selects a subset of them. It therefore preserves the original semantics of the variables, offering interpretability. Feature selection applied to gene expression data, which typically has a small sample size, is called gene selection. Gene selection can be used to find key genes in biological and biomedical problems. This type of feature selection is important for disease detection and discovery, such as tumor detection, which leads to better diagnoses and treatments. Gene expression data can be fully labeled, unlabeled, or partially labeled, which has led to the development of supervised, unsupervised, and semi-supervised gene selection.
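To make the filter-style techniques named above concrete, here is a minimal sketch of feature ranking with the Fisher score on a toy dataset. The data, function name, and class layout are illustrative assumptions, not taken from any particular library; a real pipeline would use an established implementation and cross-validate the chosen subset.

```python
def fisher_score(X, y):
    """Return one Fisher score per feature (column of X).

    score_j = sum_c n_c * (mean_cj - mean_j)^2 / sum_c n_c * var_cj,
    where c ranges over classes; higher means more discriminative.
    Toy from-scratch sketch, not a library API.
    """
    n_features = len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(n_features):
        col = [row[j] for row in X]
        mean_all = sum(col) / len(col)
        num = den = 0.0
        for c in classes:
            vals = [row[j] for row, label in zip(X, y) if label == c]
            n_c = len(vals)
            mean_c = sum(vals) / n_c
            var_c = sum((v - mean_c) ** 2 for v in vals) / n_c
            num += n_c * (mean_c - mean_all) ** 2
            den += n_c * var_c
        scores.append(num / den if den > 0 else 0.0)
    return scores

# Toy dataset: feature 0 separates the two classes, feature 1 is noise.
X = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0],
     [5.0, 4.5], [5.1, 3.5], [4.8, 5.5]]
y = [0, 0, 0, 1, 1, 1]

scores = fisher_score(X, y)
best = max(range(len(scores)), key=scores.__getitem__)
print(best)  # feature 0, the class-separating feature, scores highest
```

This is a filter method: features are scored independently of any classifier, so it is cheap but ignores feature interactions; wrapper and embedded methods (such as lasso, mentioned above) trade more computation for subset-level evaluation.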