## Introduction

### What is Correspondence Analysis?

• Correspondence Analysis (CA) is a perceptual mapping technique related to multidimensional scaling (MDS), with similar objectives, in which each object is evaluated in non-metric terms on a series of attributes.
• It deals with categorical data.

## Correspondence Analysis compared to MDS

1. It is a compositional method rather than a decompositional approach.
2. The data are measured on a nominal scale.
3. It simultaneously represents rows and columns (for example, brands and attributes) in a joint space.

## Also known as

• Dual scaling
• Method of reciprocal averages
• Optimal scaling
• Canonical analysis of contingency tables
• Categorical discriminant analysis
• Homogeneity analysis
• Quantification of qualitative data

## How is it measured?

• It is based on frequency counts between objects and variables.
• The counts are arranged in a contingency table (cross-tabulation), where $n_{ij}$ is the frequency count in row $i$ and column $j$.
• The table is used to assess whether there is correspondence between the variables, independently or jointly.
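
The steps above can be sketched in base R. This is a minimal illustration, not the deck's own example: it assumes the built-in `HairEyeColor` data set (also used in the Practical section) as the source of counts, and the names `N`, `P`, `row_mass`, `col_mass`, `E`, and `chi_sq` are chosen here for illustration only.

```r
# Collapse the built-in 3-way HairEyeColor table over Sex to get a
# two-way contingency table of Hair (rows) by Eye (columns).
N <- margin.table(HairEyeColor, c(1, 2))

n <- sum(N)               # grand total of all frequency counts n_ij
P <- N / n                # relative frequencies (correspondence matrix)
row_mass <- rowSums(P)    # row marginal proportions
col_mass <- colSums(P)    # column marginal proportions

# Expected counts if rows and columns were independent
E <- n * outer(row_mass, col_mass)

# Pearson chi-square statistic: the larger it is, the further the
# observed counts are from independence
chi_sq <- sum((N - E)^2 / E)

print(N)
print(round(E, 1))
print(chi_sq)
```

Comparing `N` with `E` cell by cell shows which row and column categories occur together more or less often than independence would predict, which is the kind of correspondence CA goes on to map.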

## ANOVA

• Main Effects - Factors
• Interaction – between the factors

## Compared to Factor Analysis

• While factor analysis captures linear relationships, CA captures non-linear relationships between the variables represented in contingency tables.

## Overview of flow

### Note:

1. Multiple Correspondence Analysis.
2. Bivariate CA

## Cross-tabulated data detailing product sales by age category

### Interpretation of the map

1. Product B and young adults are close to each other.
2. Product A and the middle-aged group are close to each other.
3. Product C and mature individuals are close to each other.

### Assessing the number of dimensions

1. The maximum number of dimensions that can be estimated is one less than the smaller of the number of rows and the number of columns. For example, with six columns and eight rows, the maximum number of dimensions is five.
2. Each dimension added to the solution increases the explained variance, but the additional variance explained by each successive dimension decreases.
3. Adding dimensions increases the complexity of the interpretation process. Perceptual maps with more than three dimensions become increasingly complex to analyze.
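
The rules above can be checked numerically. The sketch below is an illustration in base R, assuming the built-in `HairEyeColor` data (collapsed to a 4 x 4 hair-by-eye table) and the standard formulation of CA as a singular value decomposition of the standardized residuals; the variable names are illustrative.

```r
# Worked example from the text: eight rows and six columns
min(8, 6) - 1

# Hair (rows) by Eye (columns) counts as a plain numeric matrix
N <- unclass(margin.table(HairEyeColor, c(1, 2)))
P <- N / sum(N)          # relative frequencies
r <- rowSums(P)          # row masses
cm <- colSums(P)         # column masses

# Maximum number of dimensions for this table
max_dims <- min(nrow(N), ncol(N)) - 1

# Standardized residuals from independence; their squared singular
# values are the inertias (explained variance) of the dimensions
expected <- outer(r, cm)
S <- (P - expected) / sqrt(expected)
inertia <- svd(S)$d[seq_len(max_dims)]^2

pct <- 100 * inertia / sum(inertia)    # share explained by each dimension
cum_pct <- cumsum(pct)                 # cumulative share

print(max_dims)
print(round(pct, 2))
print(round(cum_pct, 2))
```

The per-dimension shares in `pct` shrink from one dimension to the next while `cum_pct` keeps rising, which is the trade-off described in point 2.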

### Assumptions

• The data should be non-metric, that is, frequency, nominal, or ordinal data.

### Key questions

1. What are the similarities and differences among the 3 age groups with respect to the categories of products?
2. What are the similarities and differences among the 8 categories of products with respect to the 3 vendors?
3. What is the relationship between age and products?
4. Can these relationships be represented graphically in a joint low-dimensional space?

### Overall measures of fit

• Total inertia explained by each of the components extracted.
• Total inertia is defined as the “weighted sum of square distances from the points to their respective centroids”
• The inertia explained is analogous to the percentage of variation explained by R-squared in regression.
• The number of components to be extracted is decided based on the cumulative percentage of inertia explained (e.g., >= 90%).
• Inertia accounted for by each of the vendors as well as by each of the categories of defects.
• Contribution of the vendors and the categories of defects to the principal components (PCs) extracted.
• Inertia of a point (each of the vendors or categories of defects) explained by each PC (the column labelled "correlation").
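
The quoted definition of total inertia can be verified directly. The following base R sketch is an illustration only: it assumes the built-in `HairEyeColor` data rather than the vendor-by-defect table discussed above, takes the distances to be chi-square distances between row profiles and their centroid (the usual choice in CA), and uses illustrative variable names.

```r
# Hair (rows) by Eye (columns) counts as a plain numeric matrix
N <- unclass(margin.table(HairEyeColor, c(1, 2)))
n <- sum(N)
P <- N / n
r <- rowSums(P)          # row masses (the weights)
cm <- colSums(P)         # column masses; also the centroid of the row profiles

# Row profiles: each row of P divided by its own mass
profiles <- sweep(P, 1, r, "/")

# Squared chi-square distance from each row profile to the centroid
dev <- sweep(profiles, 2, cm, "-")
d2 <- rowSums(sweep(dev^2, 2, cm, "/"))

# Weighted sum of squared distances = total inertia
row_inertia <- r * d2
total_inertia <- sum(row_inertia)

# The same quantity obtained as chi-square divided by the grand total
E <- n * outer(r, cm)
chi_sq <- sum((N - E)^2 / E)

print(total_inertia)
print(chi_sq / n)
print(round(100 * row_inertia / total_inertia, 1))   # share of inertia per row
```

The last line gives the percentage of total inertia accounted for by each row category, the same kind of per-point breakdown listed above for vendors and categories of defects.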

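### Practical

    library(ca)       # provides ca() and its plot method
    data("author")    # load the author data set shipped with ca

    # Hair colour by eye colour, collapsing the 3-way table over sex
    print(HairEyeColor)
    haireye <- margin.table(HairEyeColor, 1:2)
    haireye.ca <- ca(haireye)
    plot(haireye.ca)
    plot(haireye.ca, lines = TRUE)

    # The same analysis on the author data
    data_auth <- author
    fit_auth <- ca(data_auth)
    plot(fit_auth)
    plot(fit_auth, lines = TRUE)
    plot(fit_auth, arrows = c(TRUE, FALSE))   # arrows for rows, points for columns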

### Quiz

1. What kind of data is used in CA?
a) Metric b) Non-metric c) Continuous d) Interval
2. What is the difference between MDS and CA?
3. What are the other names for CA?
4. Create a process flow.
5. How is Correspondence Analysis different from Factor Analysis?
6. How do we assess the number of dimensions?
7. What is inertia?