Review of “Crime Data Mining: an Overview and Case Studies”

The paper “Crime Data Mining: An Overview and Case Studies” by Chen et al. reports the findings of several small studies that explore how data mining can be applied to fighting crime. Law enforcement seems an obvious area for such assistance, especially given that data mining “has the promise of making the exploration of very large databases easy, convenient and practical”. The paper cites growing concerns about national security after the 9/11 terrorist attacks, as well as “information overload”, as motivations for the project and its associated studies. Referring to these two problems, Chen and colleagues note that data mining for “law enforcement and intelligence analysis promises to alleviate such problems.”

Contents

Chen et al. note that “It is useful to review crime data mining in two dimensions: crime types and security concerns, and technical crime data mining approaches and methods.” This framing seems well suited to anyone interested in crime data mining, since various studies show that the best-performing technique depends on the type of criminal activity under investigation.

Chen et al. describe multiple techniques that can be used in crime data mining. Entity extraction has been used to automatically identify people, addresses, vehicles, drugs and personal property from narrative police reports (Chau et al., 2002). Clustering techniques such as “concept space” have been used to automatically associate different objects (such as people, organizations, vehicles) in criminal records (Hauck et al., 2002).
Deviation detection has been applied to fraud detection, network intrusion detection, and other crime analysis that involves tracking anomalous activity. Classification has been used to detect email spam and to find the authors of unsolicited emails (de Vel et al., 2001). String comparators have been used to detect deceptive information in criminal records (Wang et al., 2002). Social network analysis has been used to analyze criminals' roles and the associations between entities in a criminal network. The paper presents four case studies, describing how each was performed and what it found.

Entity Extraction for Narrative Police Reports

This study proposes a neural network-based entity extractor for police reports that has three parts. The first part is “Noun Phrasing”, which “extracts noun phrases as named entities from documents based on syntactic analysis.” The second part is “Finite State Machine and Lexical Search”, which uses a finite state machine to check each phrase and its surrounding text in the police report for matches against reference terms. The third part is a neural network that uses “feedforward/backpropagation” to predict the most likely type of entity (e.g. name, address, etc.). Chen et al. found that their technique “achieved encouraging accuracy and recall rates for names of people and drugs (74 – 85%), but did not perform as well for addresses and personal properties (47 – 60%) (Chau et al., 2002).”

The next approach covered was designed to detect deceptive identity information that criminals give to law enforcement. Using a database from the Tucson Police Department, Chen and colleagues' research team was able to construct a taxonomy of deceptive identity information "that consisted of deceptions related to names, addresses, dates of birth, and identity number". This taxonomy revealed that criminals typically altered their actual identity information with small variations in spelling and/or out-of-sequence digits.
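
To make the taxonomy concrete, here is a minimal sketch of how small spelling variations and out-of-sequence digits might be scored for a single field. It assumes a normalized Levenshtein edit distance as the comparator; this review does not say which comparator Chen et al. used, and the function names and sample values below are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def disagreement(a: str, b: str) -> float:
    """Normalized disagreement in [0, 1]; 0 means the values are identical."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


# Hypothetical sample values illustrating the two deception patterns.
name_score = disagreement("Jonathan Smith", "Jonathon Smith")  # one letter changed
dob_score = disagreement("19850312", "19850321")               # last two digits swapped
print(f"name: {name_score:.3f}, date of birth: {dob_score:.3f}")
```

Values that differ by only one or two characters score close to 0, which is the kind of slight alteration the taxonomy describes.
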
To detect such deception, the team developed an algorithm that compares corresponding fields across multiple records by “calculating the Euclidean distance of disagreement measures across all attribute fields”. This distance was then compared with a predetermined threshold to identify deceptive records. Using a sample from the Tucson Police Department, Chen and colleagues showed that their algorithm was 94% accurate in detecting deceptive identity information.

The third approach covered was designed to automatically identify the authors of messages posted online. The authors noted that the anonymous nature of online activity makes cybercrime very difficult to investigate, so a tool to assist investigators would be useful. Chen et al. developed a framework composed of “three types of message characteristics, including style indicators, structural characteristics, and content-specific characteristics.” This framework was then tested on experimental datasets of emails and online messages. During testing, three algorithms including “decision trees, backpropagation neural networks and support vector machines” were used to attribute authorship of online material. They identified authors with accuracy ranging from 70 to 97 percent, depending on the type of online message. Chen and colleagues found that the support vector machine performed best in their analysis.

Criminal Network Analysis

The fourth topic covered was the analysis of criminal networks. The approach builds on social network analysis, with the premise that organized criminal groups form networks to carry out their illegal activities. Analyzing these networks may reveal structural relationships and/or hierarchies. To uncover the underlying structure of a network, Chen and colleagues used a four-part method built on social network analysis.
The first part was Network Extraction, which used existing records from the Tucson Police Department to construct networks "because criminals who committed crimes together were usually related." [1] The next part was Subgroup Detection, designed to detect hierarchical subgroups based on the strength of the identified relationships. The third part was Interaction Pattern Discovery, used to “reveal patterns of interaction between groups.” [1] The final part was Central Member Identification, which identified the central members of the criminal organization by computing measures over the previously identified relationships. The authors show a figure depicting a network of 60 criminals, which mostly appears as one large network. A second figure shows the structure derived from that network through the steps above, which reveals a strong chain structure.

Chen et al. provide a brief conclusion in which they highlight their belief that crime data mining has a promising future. They also note that many other data mining applications could be explored further.
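
As a rough illustration of the network extraction and central member identification steps in the criminal network analysis, the sketch below links people who appear in the same incident and ranks them by their number of distinct associates (degree centrality). The incident data is made up, the function names are hypothetical, and degree centrality is an assumed stand-in, since this review does not state which measures Chen et al. computed.

```python
from collections import Counter
from itertools import combinations


def build_network(incidents):
    """Count how often each pair of people appears in the same incident."""
    links = Counter()
    for people in incidents:
        # sorted(set(...)) gives each unordered pair one canonical key
        for a, b in combinations(sorted(set(people)), 2):
            links[(a, b)] += 1
    return links


def degree_centrality(links):
    """Number of distinct associates each person is linked to."""
    degree = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    return degree


# Hypothetical incident reports: each list holds the people named together.
incidents = [["A", "B"], ["A", "C"], ["A", "B", "D"], ["D", "E"], ["A", "E"]]

links = build_network(incidents)
degree = degree_centrality(links)
most_central, associates = degree.most_common(1)[0]
print(most_central, associates)
```

The pair counts in `links` could also serve as a crude relationship strength for the subgroup detection step, though that use is likewise an assumption.
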