Predicting ICU Patient-Specific Mortality Based on an RF-FR Classifier

Predicting intensive care unit (ICU) patient mortality facilitates hospital benchmarking and gives healthcare providers useful summaries of patients' health at the bedside. Developing new models for predicting mortality is a popular task in machine learning, where researchers typically try to maximize performance measures. We present a modified binary classification method designed to address the class imbalance problem common in clinical datasets. Our methods exploit class imbalance to obtain a feature transformation under which the transformed features are well separated. We derive new combinations that further improve the classification accuracy of our methods. We demonstrate the effectiveness of our methods on the MIMIC dataset used in the Computing in Cardiology Challenge 2012. An advantage of our methods is that they are based on partial or full optimization of traditional learning algorithms, which still perform better than advanced nonlinear learning algorithms such as multi-kernel SVMs and deep neural networks.
Introduction
Cardiovascular disease (CVD) is the leading cause of death worldwide. In low-income countries, people struggle with poor or non-existent healthcare, which is the main reason for deaths from cardiovascular disease in these countries. The electrocardiogram (ECG) is used to evaluate the state of the patient's heart, and the quality of the measurement is a fundamental requirement for the usability of the record. Timely diagnosis of heart disease increases the chances of recovery. The lack of specialists in many countries increases the need for simple and efficient measuring devices capable of sending the measured data to a specialist.
We have developed a scoring system to inform the user about the quality of the measured ECG. This method can be used to quickly detect unusable records and to reduce the number of poor-quality records sent to the specialist. Our approach reduces the experience a user needs in order to evaluate an electrocardiogram. We focused on unusable signals because we believe that repeating a measurement costs less than sending useless records to specialists. The scoring system algorithm is divided into three stages: (1) separating the signal into bins, (2) applying four rules to the bins, and (3) calculating the mortality score.
Background
Intensive care units (ICUs) provide support to the most seriously ill patients in a hospital, offering radical life-saving treatments. Patients are monitored closely within the ICU to assist in the early detection and correction of deterioration before it becomes fatal, an approach that has been demonstrated to improve outcomes. Quantifying patient health and predicting future outcomes is an important area of critical care research. One of the most immediately relevant outcomes for ICUs is patient mortality, which has led many studies toward the development of mortality prediction models. Typically, researchers seek to improve on previously published performance measures such as sensitivity and specificity, but other goals may include improved model interpretability and the extraction of new features. Recent advances in both machine learning and hospital networking have facilitated better prediction models using more detailed, granular data. Interpreting studies that report advances in mortality prediction performance is often challenging, however, because the high degree of heterogeneity between studies prevents like-for-like comparison. For example, approaches may differ in exclusion criteria, data cleaning, creation of training and test sets, and so on, making it unclear where performance improvements have been achieved.
In many areas of machine learning, datasets like ImageNet have facilitated benchmarking and comparison between studies. The key to these datasets is that they are publicly available to researchers, allowing code and data to be shared to create reproducible studies. Barriers to data sharing in healthcare have limited the accessibility of highly granular clinical data and have largely prevented the publication of reproducible studies, but with freely available datasets such as the Medical Information Mart for Intensive Care (MIMIC-III), end-to-end reproducible studies are within reach. The use of mortality prediction models to evaluate ICUs as a whole has seen great success, both for identifying useful policies and for comparing patient populations. In order to focus contributions to the state of the art in mortality prediction, however, it should be clear where results are being achieved and where further improvements could be made. In this study, we review publications that have reported the performance of mortality prediction models based on the MIMIC database and attempt to reproduce their studies. We then compare the performance reported in those studies with gradient boosting and logistic regression models using features extracted from MIMIC. The objective of this exercise is twofold: the primary hypothesis is that the textual description of the patient selection criteria is insufficient to reproduce the studies; the secondary hypothesis is that data mining using domain knowledge remains an often overlooked but useful tool for improving model performance.
Data Mining Methods
Association rule (AR): AR is another important branch of data mining (DM) techniques. Rather than directly searching for a satisfactory classification result, its main goal is to uncover the relationships between different attributes. Viewing ICU treatment as a manufacturing process, the outcome becomes more predictable when each step of the process is refined.
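As an illustration of how the strength of a relationship between attributes can be quantified, the following minimal sketch computes the support and confidence of a single rule. Support and confidence are the standard association rule measures but are not named in the text above, and the attribute names and the four toy records are hypothetical, chosen only for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = frozenset(items)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Hypothetical toy records: each set lists the attributes present for one stay.
records = [
    frozenset({"low_gcs", "ventilated", "died"}),
    frozenset({"low_gcs", "ventilated"}),
    frozenset({"ventilated"}),
    frozenset({"low_gcs", "died"}),
]

rule_support = support(records, {"low_gcs", "died"})
rule_confidence = confidence(records, {"low_gcs"}, {"died"})
```

A rule is usually kept only when both measures exceed thresholds chosen by the analyst; no such thresholds are given in the text.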
Decision tree (DT): DT is a typical supervised learning approach in which decisions are made in multiple stages. The tree starts from the condition that is usually the most informative and, following the branches selected by the conditions, builds subtrees iteratively until the class of the object is determined at a leaf node. Each split of a tree node increases the probability of certain classes.
Fuzzy rule (FR): an FR is defined as a conditional statement of the form IF x is A THEN y is B, where x and y are linguistic variables and A and B are linguistic values determined by fuzzy sets over the universes of discourse X and Y, respectively. Fuzzy logic is used in clinical support systems because it is a powerful approach for approximate reasoning.
AdaBoost: AdaBoost, short for Adaptive Boosting, performs classification by generating a group of weak classifiers and determining the result with a voting strategy. When the weak classifiers are built, the sample weights are adjusted after each iteration; increasing the weights of incorrectly classified samples makes subsequent classifiers focus on them.
Random forest (RF): RF is similar to AdaBoost except for two differences. First, it is an ensemble of DTs; second, the sample drawn for each tree has the same size as the original set of samples. However, the increase in accuracy comes with some loss of model interpretability.
ICU Database
The data consist of records of 12,000 ICU patients who remained in the ICU for at least 48 hours. The records were divided into three sets, A, B and C, each consisting of 4,000 records. Set A was used to develop the predictor, while sets B and C were used for validation. Up to 41 variables were recorded once, more than once, or not at all during the first 48 hours after ICU admission. These variables were divided into three groups: general descriptors, outcome descriptors and time series.
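Because a variable may be recorded once, several times, or not at all, each record has to be reduced to a fixed set of values before a classifier can use it. The sketch below shows one possible way to do this; the tuple layout, the choice of summary statistics, and the variable names in the example are assumptions for illustration, not the method used in the study.

```python
import math
from collections import defaultdict


def summarise_record(measurements, variables, window_minutes=48 * 60):
    """Reduce (minutes_since_admission, variable, value) tuples to features.

    A variable that was never recorded gets a count of 0 and NaN summaries.
    """
    by_var = defaultdict(list)
    for minutes, name, value in measurements:
        if 0 <= minutes <= window_minutes:  # keep the first 48 hours only
            by_var[name].append((minutes, value))

    features = {}
    for name in variables:
        series = sorted(by_var.get(name, []))  # chronological order
        values = [v for _, v in series]
        features[f"{name}_count"] = len(values)
        features[f"{name}_first"] = values[0] if values else math.nan
        features[f"{name}_last"] = values[-1] if values else math.nan
        features[f"{name}_mean"] = (
            sum(values) / len(values) if values else math.nan
        )
    return features


# Hypothetical record: heart rate twice, temperature once, pH never.
record = [(70, "HR", 90.0), (10, "HR", 80.0), (30, "Temp", 37.2)]
features = summarise_record(record, ["HR", "Temp", "pH"])
```

Keeping the count alongside the summaries lets a downstream model distinguish a variable that was never measured from one that was measured often.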
General descriptors were mainly defined as age (AGE), sex (GEN), height (HEI), ICU type (ICU), and weight (WEI). These descriptors were collected upon admission of the patient to intensive care and appear at the beginning of each record. Outcome descriptors were defined as SAPS score, SOFA score, length of hospital stay (LOS), number of days between admission and death (SUR), and in-hospital death. The outcome descriptors were available only for training set A. The mean age, uncorrected height, and uncorrected initial weight were 64.5 years, 169.5 cm, and 81.2 kg; 43% of patients were female and 56.1% were male. The largest number of patients were admitted to medical intensive care (35.8%), followed by surgical intensive care (28.4%), cardiac surgical recovery (21.1%) and coronary (14.7%) units.
Scoring criteria
Because of its unambiguous definition and its use in previous similar studies, we used in-hospital death as the outcome variable to predict in the challenge. We defined the scoring criteria as follows. Algorithms were required to classify each case as a survivor (at least until discharge from hospital) or as a non-survivor. The final event score obtained by each algorithm depended on the counts of true positives (TP), false negatives (FN), and false positives (FP) when tested on set C. We defined sensitivity (Se) and positive predictivity (+P) as usual, Se = TP / (TP + FN) and +P = TP / (TP + FP), and the score as the smaller of these two measures, min(Se, +P). This criterion was chosen as a reasonable compromise between discrimination accuracy and prognostic value.
The data were first converted from time-stamped measurements into features usable in a supervised classification setting.
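The event score described above, the smaller of sensitivity TP / (TP + FN) and positive predictivity TP / (TP + FP), can be sketched as follows. The function name, the label encoding (1 for non-survivor, 0 for survivor), and the example labels are assumptions made for illustration.

```python
def event_score(y_true, y_pred):
    """Return min(sensitivity, positive predictivity) for binary labels.

    Assumes 1 marks a non-survivor (the positive class) and 0 a survivor.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    positive_predictivity = tp / (tp + fp) if (tp + fp) else 0.0
    return min(sensitivity, positive_predictivity)


# Hypothetical labels: 3 non-survivors among 8 patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
balanced = event_score(y_true, [1, 1, 0, 1, 0, 0, 0, 0])      # TP=2, FN=1, FP=1
conservative = event_score(y_true, [1, 0, 0, 0, 0, 0, 0, 0])  # TP=1, FN=2, FP=0
```

Taking the minimum penalises a classifier that flags very few patients: the second prediction has perfect positive predictivity but is scored by its low sensitivity.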
The overall development process involved preprocessing, classification, feature extraction, decision, training and validation.
Preprocessing
The preprocessing method focused on removing outliers using thresholds and domain knowledge. When an outlier was detected, its value was set to missing and subsequently replaced by the assigned mean value. Domain-knowledge preprocessing first involved correcting human transcription errors (such as recording the temperature in degrees Fahrenheit instead of degrees Celsius).
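The preprocessing steps above can be sketched for a single variable, temperature. The plausible range of 25 to 45 degrees Celsius, the function names, and the use of the mean of the remaining valid values as the replacement are assumptions for illustration; the thresholds actually used are not given in the text.

```python
import math


def fahrenheit_to_celsius(value):
    return (value - 32.0) * 5.0 / 9.0


def clean_temperature(values, low=25.0, high=45.0):
    """Correct likely Fahrenheit entries, then set outliers to missing (NaN).

    `low` and `high` are hypothetical plausibility thresholds in Celsius.
    """
    cleaned = []
    for v in values:
        if v is None or math.isnan(v):
            cleaned.append(math.nan)
        elif low <= v <= high:
            cleaned.append(v)
        elif low <= fahrenheit_to_celsius(v) <= high:
            # Plausible only when read as Fahrenheit: treat as a unit error.
            cleaned.append(fahrenheit_to_celsius(v))
        else:
            cleaned.append(math.nan)  # outlier: mark as missing
    return cleaned


def impute_mean(values):
    """Replace missing values with the mean of the valid ones."""
    valid = [v for v in values if not math.isnan(v)]
    fill = sum(valid) / len(valid) if valid else math.nan
    return [fill if math.isnan(v) else v for v in values]


raw = [37.0, 98.6, 0.0, math.nan]
cleaned = clean_temperature(raw)  # 98.6 is converted, 0.0 becomes missing
imputed = impute_mean(cleaned)
```

Correcting unit errors before thresholding keeps values that are recoverable, so only entries that are implausible in either unit are discarded.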