Monday, June 3, 2019

Application Survey on Data Mining and Data Warehousing

Aishwarya.R

Survey Report on Bank-Loan Risk Prediction

Introduction

Data mining has been one of the most explored topics of the past decade and has given rise to several new enhancements and techniques across many industries. One such thought-provoking area of high interest is credit risk analysis, or simply bank-loan risk prediction. Many banks now have a pressing need to employ credit risk analysis to make sure that the money they lend to customers, as a loan or in any other form, goes to legitimate customers who are capable of repaying it, and to avoid fraudulent scenarios. Several data mining techniques have been explored to analyze customers' creditworthiness, and a few of them are analyzed and emphasized in the following sections.

Discussion on Selected Papers

In this section, I have listed the journal and IEEE papers referenced for my study and analysis of bank-loan risk prediction, and categorized various factors for each in Table 1.

Table 1. Sources used that focused on bank-loan risk prediction using different data mining techniques

Ref. | Objective and Data Mining Techniques Employed | Authors | Number of Citations
[1] | SAS Enterprise Miner 5.3, logistic regression model and decision tree employed in credit scoring models for assessing credit risk | Bee Wah Yap, Seng Huat Ong, Nor Huselina Mohamed Husain | 77
[2] | Decision tree model for credit assessment in a bank | I Gusti Ngurah Narindra Mandala, Catharina Badra Nawangpalupi, Fransiscus Rian Praktikto | 15
[3] | Predictive modelling technique and Naïve Bayes algorithm for loan risk assessment | Rob Gerritsen | 34
[4] | Multilayer Feed Forward Neural Network, Support Vector Machines, Genetic Programming, Logistic Regression, Group Method of Data Handling and Probabilistic Neural Network techniques for financial fraud assessment | P. Ravisankar, V. Ravi, G. Raghava Rao, I. Bose | 147

Expert Systems with Applications: Using Data Mining to Improve Assessment of Credit Worthiness via Credit Scoring Models

Problem Description

Bee Wah Yap et al. [1] found that a recreational club had been facing difficulties in identifying the defaulters who do not pay their monthly subscription fee, which made it hard for the club to manage its funds effectively and to allocate funds for further activities or events. The management decided to evaluate the creditworthiness of club members, using past members' data as the dataset, analyzed with three different data mining techniques in order to determine the best of the three [1].

Solution Technology

Bee Wah Yap et al. [1] employed a credit scorecard model, a logistic regression model and a decision tree model using SAS Enterprise Miner, a versatile tool for applying several data mining techniques, in order to better identify the potential defaulters in the club.

Solution Evaluation

In the credit scorecard model, Bee Wah Yap et al. [1] identified the factors that characterize a defaulter, based on age, number of dependents, number of cars and region of address, together with, most importantly, the classification of members into defaulters and non-defaulters based on payment status.
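Scorecard attributes such as age or number of dependents are commonly screened with the Information Value (IV) statistic, which compares how each attribute group is distributed among good (non-defaulting) and bad (defaulting) members. The sketch below assumes the standard weight-of-evidence definition, IV = sum over groups of (%good - %bad) x ln(%good / %bad), and applies the 0.02 cutoff mentioned in the study; the function name and the age-band counts are hypothetical and are not Yap et al.'s data or their SAS Enterprise Miner workflow.

```python
import math

def information_value(good_counts, bad_counts):
    """Information Value (IV) of one binned scorecard attribute.

    good_counts[i] and bad_counts[i] are the numbers of non-defaulters and
    defaulters that fall into group i of the attribute.
    """
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    iv = 0.0
    for g, b in zip(good_counts, bad_counts):
        pct_good = g / total_good
        pct_bad = b / total_bad
        # Weight of evidence (WoE) of the group. This sketch assumes every
        # group has g > 0 and b > 0; a zero count raises an error here, and
        # real scorecards merge or smooth such groups first.
        woe = math.log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * woe
    return iv

# Hypothetical attribute with three age bands (not data from the study).
good = [120, 300, 280]   # non-defaulters per band
bad = [60, 50, 20]       # defaulters per band

iv = information_value(good, bad)
print(f"IV = {iv:.3f}; include on scorecard: {iv > 0.02}")
```

An attribute whose groups have the same good/bad mix contributes an IV of zero, which is why a small positive cutoff such as 0.02 filters out attributes with no discriminating power.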
They then computed the Information Value (IV) of each attribute, which sums, over the attribute's groups, the difference between the proportions of good (non-defaulting) and bad (defaulting) members weighted by the logarithm of the ratio of those proportions, and treated attributes with an IV greater than 0.02 as eligible for inclusion on the scorecard. Next, they found the stepwise selection method the most suitable among the logistic regression models, and it revealed a wide range of information and conclusions about the type of defaulters. Finally, they applied the decision tree algorithm to split the large dataset into smaller segments described by if-then rules, and obtained the profile of defaulters. Based on the results of the three techniques, they concluded that the decision tree is a better approach for prediction, although the three do not differ greatly, and that a credit scoring model built on inadequate or improper datasets and outdated data cannot perform well in prediction.

Further Enhancements

The study employed several techniques in order to find a better prediction model as a substitute for the credit scoring model, but it overlooked the fact that the datasets used throughout come from past customers, which may or may not be a legitimate basis for prediction. This is not a tenable basis for concluding that the decision tree is better than credit scoring, as neither argument has been validated and the outcome may differ when a large amount of current, real-time data is used to predict future defaulters.

Assessing Credit Risk: an Application of Data Mining in a Rural Bank

Problem Description

I Gusti Ngurah Narindra Mandala et al. [2] argued that for rural banks to stay healthy, certain benchmarks have to be met on many factors, among which the non-performing loan (NPL) ratio plays an important role. They noted that the lower the NPL rate, the healthier the rural bank.
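The NPL rate used as the health benchmark is, in its usual definition, the share of the outstanding loan portfolio that is non-performing. The sketch below assumes that definition, measured by outstanding balance; the `Loan` type and the portfolio figures are hypothetical, and only the 5% threshold that the authors treat as healthy comes from the study.

```python
from dataclasses import dataclass

@dataclass
class Loan:
    outstanding: float     # remaining balance of the loan
    non_performing: bool   # True if the loan is classified as non-performing

def npl_ratio(loans):
    """Share of the outstanding portfolio (by balance) that is non-performing."""
    total = sum(loan.outstanding for loan in loans)
    bad = sum(loan.outstanding for loan in loans if loan.non_performing)
    return bad / total

HEALTHY_NPL = 0.05  # benchmark from the study: below 5% is considered healthy

# Hypothetical portfolio of four loans.
portfolio = [Loan(1000, False), Loan(2500, False), Loan(800, True), Loan(5700, False)]
ratio = npl_ratio(portfolio)
print(f"NPL = {ratio:.2%}, healthy: {ratio < HEALTHY_NPL}")
```

Here 800 of the 10,000 outstanding is non-performing, an NPL of 8%, which would fail the 5% benchmark.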
To achieve this, they proposed that banks should approve only the right applicants, thereby increasing profit and credibility and serving the improvement of the local communities where such banks are most used. They were confident that banks with an NPL below 5% are in better condition than those with a higher NPL.

Solution Technology

I Gusti Ngurah Narindra Mandala et al. [2] chose the decision tree technique for a rural bank in Bali and scrutinized the various factors currently considered when lending to a customer.

Solution Evaluation

I Gusti Ngurah Narindra Mandala et al. [2] found that the current NPL of the rural bank in Bali is 11.99%, much higher than the expected value for a well-performing bank. They used 84% of a sample dataset of 1028 records for evaluation and identified approximately 13 parameters for evaluating NPL customers. They developed a decision tree based on the existing parameters, but reordered the factors so that collateral value became the determining one, and obtained an NPL of 3%, comfortably below the 5% benchmark for a healthy bank.

Further Enhancements

Although the above assessment and the conclusion of a healthy bank seem appealing, the authors could have placed further emphasis on other factors that also contribute to a bank's health and its NPL, and could have predicted creditworthiness further using other predictive and descriptive modeling techniques that might give a better analysis of this scenario than what was obtained.

Assessing Loan Risks: A Data Mining Case Study

Problem Description

Rob Gerritsen [3] noted that if the customers who would be unable to repay their bank loans could be predicted before lending, using data mining techniques, that information would be very valuable.
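Flagging, before approval, the applicants likely to default is a binary classification problem, and this study tackles it with Naïve Bayes over binned attributes. A minimal sketch of that idea follows: a hand-written categorical Naïve Bayes with Laplace smoothing, assuming equal-frequency (quantile) bins so that no single bin swallows most applicants. The attributes (loan amount, income), the bin count and the records are hypothetical; this is not Gerritsen's USDA data or his exact procedure.

```python
import math
from collections import Counter, defaultdict

def quantile_edges(values, n_bins):
    """Equal-frequency bin edges, so each bin holds roughly the same number of records."""
    s = sorted(values)
    return [s[len(s) * k // n_bins] for k in range(1, n_bins)]

def to_bin(x, edges):
    """Index of the bin that x falls into (0 .. len(edges))."""
    return sum(x >= e for e in edges)

class NaiveBayes:
    """Categorical Naive Bayes with Laplace smoothing over binned features."""

    def fit(self, rows, labels):
        self.classes = Counter(labels)
        self.n = len(labels)
        self.counts = defaultdict(Counter)  # (feature index, class) -> bin counts
        self.values = defaultdict(set)      # feature index -> bin values seen
        for row, y in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[(j, y)][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, row):
        best, best_score = None, -math.inf
        for y, cy in self.classes.items():
            score = math.log(cy / self.n)  # log prior P(class)
            for j, v in enumerate(row):
                k = len(self.values[j])
                # Laplace-smoothed log likelihood log P(bin | class)
                score += math.log((self.counts[(j, y)][v] + 1) / (cy + k))
            if score > best_score:
                best, best_score = y, score
        return best

# Hypothetical training records: loan amount, income, and whether the loan defaulted.
loan_amounts = [50, 60, 55, 200, 210, 70, 65, 220, 58, 190]
incomes = [40, 45, 42, 20, 22, 50, 48, 18, 41, 25]
defaulted = [False, False, False, True, True, False, False, True, False, True]

loan_edges = quantile_edges(loan_amounts, 3)
income_edges = quantile_edges(incomes, 3)
rows = [(to_bin(a, loan_edges), to_bin(i, income_edges))
        for a, i in zip(loan_amounts, incomes)]
model = NaiveBayes().fit(rows, defaulted)

applicant = (to_bin(205, loan_edges), to_bin(21, income_edges))
print("predicted default:", model.predict(applicant))
```

Equal-frequency bins are one common remedy for crowded bins: equal-width bins over a skewed attribute such as loan amount tend to put most records into a single bin.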
He found that the USDA's Rural Housing Service has been lending money to people in rural areas, and the USDA realized that the large number of applicants approved for loans may or may not be capable of repaying them. Hence the USDA decided to apply data mining techniques to gather information and predict the vulnerabilities of its customers [3].

Solution Technology

Rob Gerritsen [3] decided to use predictive modeling techniques along with the Naïve Bayes algorithm to arrive at a solution to this problem.

Solution Evaluation

Rob Gerritsen [3] was given a sample of 12,000 records based on existing single-family mortgages, and had to train the model on this dataset and then predict future scenarios. He first classified the dataset and applied Naïve Bayes with binning to divide the customers by the loan amounts to be paid by each. Initially, he found this ineffective: because the bin ranges were continuous and uniform, a huge number of people fell into a single bin, which made it difficult to identify the actual defaulters precisely. He then reorganized the bin range distribution and built a decision tree from the results obtained to determine the major factors behind defaulting.

Further Enhancements

Rob Gerritsen [3] himself acknowledged that the dataset was too small to draw firm conclusions, and that a wider range of data, along with additional factors, would have to be considered for the USDA to obtain a verified solution to its problem.

Decision Support Systems: Detection of Financial Statement Fraud and Feature Selection Using Data Mining Techniques

Problem Description

P. Ravisankar et al. [4] conducted a study on 202 Chinese companies using a variety of data mining techniques to determine whether financial statements, such as income statements, cash flow and various other factors, could, when combined, give a better picture of the companies, and also to decide, based on the results, whether a loan should be given to customers.

Solution Technology

P. Ravisankar et al. [4] employed a variety of data mining techniques, namely Support Vector Machines (SVM), Group Method of Data Handling (GMDH), Genetic Programming (GP), Logistic Regression (LR), Multilayer Feed Forward Neural Network (MLFF) and Probabilistic Neural Network (PNN). They applied all of these techniques to the same dataset in order to identify the best solution to the problem.

Solution Evaluation

P. Ravisankar et al. [4] noted that of the 202 Chinese companies in the dataset, 101 were fraudulent and the remaining 101 were non-fraudulent. They then applied genetic programming driven by a fitness function, SVM to obtain the support vectors, GMDH to obtain a feed-forward network (polynomial) model, and PNN, each with and without feature selection, in order to identify the features of fraudulent companies. They observed that, across the techniques used, the main consideration is that the size of the dataset should match the capability of the technique, while keeping the time needed for training and obtaining results low.

Further Enhancements

I agree with P. Ravisankar et al.'s [4] conclusion of classifying the dataset with if-then rules, and would apply other hybrid data mining techniques in order to further enhance the solutions.

REFERENCES

[1] Yap, B. W., Ong, S. H., & Husain, N. H. M. (2011). Using data mining to improve assessment of credit worthiness via credit scoring models.
Expert Systems with Applications, 38, 13274-13283.
[2] Mandala, I. G. N. N., Nawangpalupi, C. B., & Praktikto, F. R. (2012). Assessing credit risk: An application of data mining in a rural bank. Procedia Economics and Finance, 4, 406-412.
[3] Gerritsen, R. (1999). Assessing loan risks: A data mining case study. IEEE IT Professional, 16-21.
[4] Ravisankar, P., Ravi, V., Rao, G. R., & Bose, I. (2011). Detection of financial statement fraud and feature selection using data mining techniques. Decision Support Systems, 50(2), 491-500.

Questions and Answers

Why are DM and DW technologies becoming important tools in today's business world?

Today's business world is a competitive environment where the right decisions need to be taken at the right time, by knowing what has happened and by predicting what will happen in the future. Data warehousing helps us answer questions such as what, which and how through aggregations. Data mining, often discussed as part of knowledge discovery in databases (KDD), helps us predict what may happen in the future by discovering and analyzing hidden patterns. Both DM and DW derive their results from large sets of data records drawn from the same or different data sources.

What are the main differences between data mining, traditional statistical data analysis, and information retrieval?

Data mining is the process of deriving or discovering new information from existing information by observing the data, identifying patterns and producing meaningful analytics that can be used in business. Traditional statistical data analysis is a method of testing a proposed phenomenon or hypothesis in order to validate it and provide statistically significant evidence for accepting the outcome. Information retrieval, in simple terms, is the process of collecting or retrieving required data from existing information available in any form.

How is the data warehouse model different from the relational database model?
Why is DW technology more advanced in supporting business management?

Aspect | Relational Database Model | Data Warehouse Model
Used for | Online Transaction Processing (OLTP) | Online Analytical Processing (OLAP)
Data stored | Generally facts in a single operational database | Generally consolidated (aggregated) data from multiple databases or sources
Table design | Normalized | De-normalized
Queried with | SQL | OLAP tools

The key difference between the DW model and the relational database model is that a DW is a layer on top of other databases, whereas a relational database is a database in itself.

DW technology is more advanced in supporting business management because it provides quick answers to questions such as what, which and how, which helps management act accordingly when making decisions; that is, it generates reports answering management queries much faster.

What are the main differences between using OLAP on a DW and using SQL on a traditional database for supporting business decision making?

The main difference is that complex questions involving multiple aggregations can be answered easily and quickly in ad-hoc environments (i.e., over data from different sources) using OLAP on a DW.
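The contrast can be sketched with Python's built-in `sqlite3` module, with SQLite standing in for both kinds of system. The keyed lookup is a typical transaction-style SQL query, while the per-region, per-year aggregation is the kind of question an OLAP tool answers; materializing that aggregate once into a summary table imitates how a warehouse prepares data so that later ad-hoc slices are cheap. The `loans` table, its columns and its figures are hypothetical, and a real warehouse would use dedicated OLAP tooling or cubes rather than this single-file setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical loan fact table. It is denormalised in warehouse style:
# every row already carries the region and year it will be analysed by.
cur.execute(
    "CREATE TABLE loans (loan_id INTEGER PRIMARY KEY, region TEXT, "
    "year INTEGER, amount REAL, defaulted INTEGER)"
)
cur.executemany(
    "INSERT INTO loans (region, year, amount, defaulted) VALUES (?, ?, ?, ?)",
    [
        ("North", 2018, 1000, 0), ("North", 2018, 500, 1), ("North", 2019, 800, 0),
        ("South", 2018, 1200, 1), ("South", 2019, 700, 0), ("South", 2019, 300, 1),
    ],
)

# Transaction-style SQL: fetch one record by its key.
lookup = cur.execute(
    "SELECT region, amount FROM loans WHERE loan_id = ?", (2,)
).fetchone()

# Analytical question: loan count, volume and default rate per region and year.
by_region_year = cur.execute(
    "SELECT region, year, COUNT(*), SUM(amount), AVG(defaulted) "
    "FROM loans GROUP BY region, year ORDER BY region, year"
).fetchall()

# Warehouse-style: materialise the aggregate once, so later ad-hoc slices
# (here, one region) read the small summary table instead of the raw rows.
cur.execute(
    "CREATE TABLE loan_summary AS "
    "SELECT region, year, COUNT(*) AS n, SUM(amount) AS volume, "
    "AVG(defaulted) AS default_rate FROM loans GROUP BY region, year"
)
south = cur.execute(
    "SELECT year, volume, default_rate FROM loan_summary "
    "WHERE region = 'South' ORDER BY year"
).fetchall()

print(lookup)
print(by_region_year)
print(south)
conn.close()
```

With six rows the precomputed summary gains nothing, but on millions of transactions the same design lets management slice results by region or year without rescanning the raw records each time.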
