DATA MINING AND DATA WAREHOUSE
UNIT-1
Data Mining:
Data mining is the process of extracting useful information from large sets of data. Industry holds large volumes of data, but this data is of no use until it is converted into useful information, so it must be analyzed to extract that information.
Data mining is sometimes also referred to as:
Knowledge Extraction
Knowledge Mining
Pattern Analysis
Data Archaeology
Areas of Data mining:
Financial Data Analysis:
Financial data in the banking and financial industry is generally reliable and of high quality, which facilitates systematic data analysis and data mining. Some typical cases are as follows:
Loan payment prediction and customer credit policy analysis.
Classification and clustering of customers for targeted marketing
Detection of money laundering and other financial crimes
Retail Industry:
Data mining in the retail industry helps identify customer buying patterns and trends, which leads to improved quality of customer service and better customer retention and satisfaction.
Telecommunication Industry:
Data mining in the telecommunication industry helps identify telecommunication patterns, catch fraudulent activities, make better use of resources, and improve quality of service.
Biological Data Analysis:
In recent times, there has been tremendous growth in fields of biology such as genomics, proteomics, functional genomics, and biomedical research. Biological data mining is a very important part of bioinformatics.
UNIT-2
What is Data Warehouse?
Data warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions.
The term "Data Warehouse" was coined by William H. Inmon in 1990. According to Inmon, a data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data. This data helps analysts make informed decisions in an organization.
Characteristics of a Data Warehouse:
Subject Oriented − A data warehouse is subject oriented because it provides information around a subject rather than the organization's ongoing operations. These subjects can be products, customers, suppliers, sales, revenue, etc. A data warehouse does not focus on the ongoing operations; rather, it focuses on modelling and analysis of data for decision making.
Integrated − A data warehouse is constructed by integrating data from heterogeneous sources such as relational databases, flat files, etc. This integration enhances the effective analysis of data.
Time Variant − The data collected in a data warehouse is identified with a particular time period. The data in a data warehouse provides information from a historical point of view (e.g., data from the past 5–10 years).
Non-volatile − Non-volatile means the previous data is not erased when new data is added. A data warehouse is kept separate from the operational database, and therefore frequent changes in the operational database are not reflected in the data warehouse.
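The four characteristics above can be illustrated with a small sketch. This is only a toy model, not how a real warehouse is built: the class TinyWarehouse, its methods, the source names, and the sample records are all hypothetical and chosen for illustration.

```python
from datetime import date


class TinyWarehouse:
    """Toy append-only store organized by subject (all names are hypothetical)."""

    def __init__(self):
        # Rows are only ever appended, never updated or deleted (non-volatile).
        self._rows = []

    def load(self, subject, records, source, as_of):
        """Integrate records from one source and stamp them with a time period."""
        for record in records:
            self._rows.append(
                {"subject": subject, "source": source, "as_of": as_of, **record}
            )

    def history(self, subject):
        """Return every stored row for a subject, oldest period first."""
        rows = [r for r in self._rows if r["subject"] == subject]
        return sorted(rows, key=lambda r: r["as_of"])


warehouse = TinyWarehouse()
# Two different (made-up) sources feed the same subject: integration.
warehouse.load("sales", [{"product": "milk", "amount": 120}],
               source="relational_db", as_of=date(2023, 1, 31))
warehouse.load("sales", [{"product": "milk", "amount": 150}],
               source="flat_file", as_of=date(2023, 2, 28))

# The January row is still present after the February load.
sales_history = warehouse.history("sales")
```

Here sales_history holds both monthly rows, so an analyst can compare periods without touching the operational systems.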
UNIT-3
Mining Frequent Patterns and Associations
Frequent Itemset Mining:
Frequent Itemset Mining (FIM) is one of the best-known techniques for extracting knowledge from data. FIM is widely used in application areas of data mining such as finance and health care.
Example 1: Important uses of FIM include customer segmentation in marketing, shopping cart analysis, relationship management, web usage mining, and player tracking.
Example 2: FIM in Market Basket Analysis:
This process analyzes customer buying habits by finding associations between the different items that customers place in their “shopping baskets” as shown in the adjacent figure.
The discovery of such associations can help retailers develop marketing strategies by gaining insight into which items are frequently purchased together by customers. For instance, if customers are buying milk, how likely are they to also buy bread (and what kind of bread) on the same trip to the supermarket?
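The milk-and-bread question can be made concrete with two standard measures: support (how often an itemset appears) and confidence (how often the consequent appears given the antecedent). The following is a minimal sketch; the transactions are made up for illustration, and the function names are assumptions rather than part of any library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for this example.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    seen = count(antecedent, transactions)
    if seen == 0:
        return 0.0
    return count(set(antecedent) | set(consequent), transactions) / seen


def frequent_pairs(transactions, min_support):
    """Item pairs (sorted tuples) whose support is at least min_support."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


milk_bread_support = support({"milk", "bread"}, transactions)       # 3 of 5 baskets
milk_to_bread_conf = confidence({"milk"}, {"bread"}, transactions)  # 3 of 4 milk baskets
pairs = frequent_pairs(transactions, min_support=0.5)
```

In this toy data, milk and bread appear together in 3 of 5 baskets, and 3 of the 4 baskets with milk also contain bread. Counting every pair directly only works for small data; practical FIM algorithms such as Apriori prune the search space instead.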
UNIT-4
Classification and Prediction
Classification is a data mining technique that assigns categories to a collection of data in order to support accurate analysis. It assigns data to predefined groups or classes. Because classification is supervised learning, the classes are determined before the data is examined.
Example: A bank loan officer analyzes loan applications to decide whether to make a loan and to identify credit risks. Classification consists of predicting a certain outcome based on a given input. Prediction is a method used to estimate future data based on past and current data.
In order to predict the outcomes, the algorithm processes a training set containing a set of attributes and the respective outcomes, usually called the goal or prediction attribute. The algorithm tries to discover relationships between the attributes that would make it possible to predict the outcomes.
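As an illustration of learning from a training set, the sketch below uses k-nearest neighbours, which is one of many classification algorithms and is chosen here only because it is short; these notes do not prescribe a specific algorithm. The loan data, attribute choice, and class labels are hypothetical.

```python
import math
from collections import Counter

# Hypothetical training set: (annual income, existing debt) in thousands,
# paired with the known outcome (the goal attribute).
training_set = [
    ((60.0, 5.0), "low risk"),
    ((75.0, 10.0), "low risk"),
    ((80.0, 2.0), "low risk"),
    ((25.0, 20.0), "high risk"),
    ((30.0, 25.0), "high risk"),
    ((20.0, 15.0), "high risk"),
]


def predict(attributes, training_set, k=3):
    """Predict a class by majority vote among the k nearest training examples."""
    by_distance = sorted(
        training_set, key=lambda example: math.dist(attributes, example[0])
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# A new applicant whose outcome is unknown.
new_applicant = (70.0, 8.0)
prediction = predict(new_applicant, training_set)
```

The classes ("low risk", "high risk") are fixed before any new data is seen, which is what makes this supervised learning. In practice the attributes would usually be scaled first so that no single attribute dominates the distance.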
Classification and prediction have numerous applications, including fraud detection, target marketing, performance prediction, manufacturing, and medical diagnosis.
UNIT-5
Cluster Analysis
Clustering is the process of grouping related objects that lie close together.
"The process of organizing objects into groups whose members are similar in some way" could be a definition of clustering. A cluster is therefore a group of objects that are "similar" to one another and "dissimilar" to objects in other clusters. Clustering is the most important unsupervised learning problem.
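One common way to form such groups is k-means, shown below as a minimal sketch; it is only one of several clustering methods and is not singled out by these notes. The points and starting centroids are made up, and "similar" is taken to mean close in Euclidean distance.

```python
import math


def k_means(points, centroids, iterations=10):
    """Group points around starting centroids; returns (centroids, clusters).

    Assumes iterations >= 1. No class labels are used: groups emerge from
    distances alone, which is why this is unsupervised learning.
    """
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
final_centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

With these starting centroids, the three points near the origin end up in one cluster and the three points near (8.5, 9) in the other. The result of k-means depends on the starting centroids, so they are passed in explicitly here to keep the example deterministic.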