IS291B2 Week 1 Reading Notes
Part 1, An introduction to inquiry
1: Human inquiry and science
2: Paradigms, theory, and social research
Ch 16: Statistical analyses
Descriptive statistics is a medium for describing data in manageable forms.
Inferential statistics, on the other hand, assists researchers in drawing conclusions from their observations.
Table 16-1 (p. 468 in 12th)
Partial raw-data matrix
summarizing univariate data: averages (mode, median, mean), measures of dispersion (range, standard deviation); describing two or more variables: measures of association among variables
the choice of an appropriate measure depends on the nature of the two variables you’re trying to describe.
Measures of association based on the proportionate reduction of error (PRE) model
PRE compares the errors made guessing values of one variable with and without knowledge of a second variable: the stronger the relationship, the greater the reduction of error (p. 469 in 12th)
the basic PRE model is modified to account for different measurement levels
Babbie presents measures of association for three levels of measurement: nominal, ordinal, and interval
Nominal variables: If the two variables consist of nominal data (for example, gender, religious affiliation, race), lambda (λ) would be one appropriate measure. Lambda is based on your ability to guess values on one of the variables. Lambda, then, represents the reduction in errors as a proportion of the errors that would have been made on the basis of the overall distribution. In other words, λ is the proportion of errors you avoid by knowing something about the other variable. Instead of x errors based on the distribution, you only make y errors because you have additional information.
use this equation: λ = (x-y)/x, where x is the # of errors you would make going by the distribution alone, and y is the number of errors you would make using the additional information.
If λ = 0, then the variables are independent. If λ = 1, then there is a perfect statistical association.
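The λ calculation above can be sketched in Python. This is a minimal illustration with an invented cross-tabulation (gender by party preference; the counts are made up, not from the textbook):

```python
# Goodman and Kruskal's lambda for two nominal variables.
# Hypothetical cross-tabulation: gender (rows) by party (columns).
table = {
    "male":   {"democrat": 200, "republican": 300},
    "female": {"democrat": 350, "republican": 150},
}

# Column totals give the overall distribution of the variable being guessed.
col_totals = {}
for row in table.values():
    for col, n in row.items():
        col_totals[col] = col_totals.get(col, 0) + n
total = sum(col_totals.values())

# x: errors made by always guessing the overall modal category.
x = total - max(col_totals.values())

# y: errors made when guessing the modal category within each row
# (i.e., using the additional information about the other variable).
y = sum(sum(row.values()) - max(row.values()) for row in table.values())

lam = (x - y) / x
print(round(lam, 3))  # proportion of errors avoided
```

Here guessing "democrat" for everyone makes 450 errors; guessing the modal party within each gender makes 350, so λ ≈ 0.222.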
If the variables being related are ordinal (for example, social class, religiosity, alienation), gamma (γ) is one appropriate measure of association. Whereas lambda is based on guessing exact values, gamma is based on guessing the ordinal arrangement of values. γ is the proportion of paired comparisons that fits the described pattern. In other words, γ is the proportion of all of the cases in which your proposed correlation (positive, negative, perfect, or independent) is true.
γ = (same-opposite)/(same+opposite)
γ ranges from -1 to 1, representing the magnitude and direction of the association.
An ordinal sequence is a series of numbers connecting a pair of opposed characteristics, for example: Good 1 2 3 4 5 6 7 Bad
If you want to find out the extent to which several measures are related to one another, use γ and present the results in a correlation matrix like the one on page 472 of the 12th edition.
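The γ formula can also be sketched in Python by counting pairs of cases ordered the same way versus the opposite way on both variables. The paired scores below (social class and religiosity, each on a 1–3 ordinal scale) are invented for illustration:

```python
from itertools import combinations

# Goodman and Kruskal's gamma for two ordinal variables.
# Hypothetical scores for eight respondents (1 = low, 3 = high).
social_class = [1, 1, 2, 2, 2, 3, 3, 3]
religiosity  = [1, 2, 1, 2, 3, 2, 3, 3]

same = opposite = 0
for i, j in combinations(range(len(social_class)), 2):
    dx = social_class[j] - social_class[i]
    dy = religiosity[j] - religiosity[i]
    if dx * dy > 0:        # pair ordered the same way on both variables
        same += 1
    elif dx * dy < 0:      # pair ordered in opposite directions
        opposite += 1
    # tied pairs (dx * dy == 0) are ignored in gamma

gamma = (same - opposite) / (same + opposite)
print(round(gamma, 3))
```

For these data, 13 untied pairs agree in order and 2 disagree, so γ = 11/15 ≈ 0.733, a fairly strong positive association.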
Interval or ratio variables
If interval or ratio variables (for example, age, income, grade point average, and so forth) are being associated, one appropriate measure of association is Pearson’s product-moment correlation (r). r reflects how closely you can guess the value of one variable through your knowledge of the value of another. For interval or ratio data, you would minimize your errors by always guessing the mean value of the variable. Although this practice produces few if any perfect guesses, the extent of your errors will be minimized. In the case of r, errors are measured in terms of the sum of the squared differences between the actual value and the mean. This sum is called the total variation.
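Pearson's r can be computed from first principles using the deviations from each mean, as described above. A minimal sketch with invented interval data (years of education and income in $1,000s):

```python
import math

# Pearson's product-moment correlation (r) from first principles.
# Hypothetical data: education (years) vs. income ($1,000s).
education = [10, 12, 12, 14, 16, 16, 18, 20]
income    = [25, 30, 28, 40, 45, 50, 55, 60]

n = len(education)
mean_x = sum(education) / n
mean_y = sum(income) / n

# Covariation, and the total (squared) variation around each mean --
# the same "sum of squared differences from the mean" the notes describe.
cov   = sum((x - mean_x) * (y - mean_y) for x, y in zip(education, income))
var_x = sum((x - mean_x) ** 2 for x in education)
var_y = sum((y - mean_y) ** 2 for y in income)

r = cov / math.sqrt(var_x * var_y)
print(round(r, 3))
```

For these made-up numbers r is close to 1, reflecting a strong positive linear association.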
The general formula for describing the association between two variables is Y = f(X). This formula is read “Y is a function of X,” meaning that values of Y can be explained in terms of variations in the values of X. Stated more strongly, we might say that X causes Y, so the value of X determines the value of Y. Regression analysis is a method of determining the specific function relating Y to X. Regression analysis is a technique for establishing the regression equation representing the geometric line that comes closest to the distribution of points on a graph. The regression equation provides a mathematical description of the relationship between the variables, and it allows us to infer values of Y when we have values of X.
Linear regression analysis: perfect linear association between two variables
has descriptive and inferential value: illustrates the relationship and gives a basis for prediction
The sum of squared differences between actual and estimated values of Y is called the unexplained variation because it represents errors that still exist even when estimates are based on known values of X. The explained variation is the difference between the total variation and the unexplained variation. Dividing the explained variation by the total variation produces a measure of the proportionate reduction of error.
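The total/unexplained/explained breakdown above maps directly onto a least-squares regression. A sketch, reusing the same invented education/income figures, fits Y = a + bX and computes the PRE measure (r²) as explained variation divided by total variation:

```python
# Least-squares linear regression Y = a + bX, plus the proportionate
# reduction of error (r squared) as explained / total variation.
# Hypothetical data (same invented figures as the Pearson's r sketch).
xs = [10, 12, 12, 14, 16, 16, 18, 20]
ys = [25, 30, 28, 40, 45, 50, 55, 60]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope and intercept of the line closest to the points.
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = mean_y - b * mean_x

# Total variation: squared errors from always guessing the mean of Y.
total = sum((y - mean_y) ** 2 for y in ys)
# Unexplained variation: squared errors left after using the line.
unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
explained = total - unexplained

r_squared = explained / total
print(round(b, 2), round(r_squared, 3))
```

With these numbers about 97% of the variation in Y is explained by X, i.e., the regression line reduces guessing errors by that proportion.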
Multiple regression analysis
Use when a given dependent variable is affected simultaneously by several independent variables. Helps you find the relative influence multiple factors have on a variable as well as any residual variance not accounted for by the variables analyzed.
You end up with a multiple-correlation coefficient, R, as an indicator of the extent to which the analyzed variables predict the value of the dependent variable, such that R represents the proportion of the variance explained by the analyzed independent variables.
The equation summarizing the relationship between variables is computed on the basis of the test variables remaining constant. As in the case of the elaboration model, the result may then be compared with the uncontrolled relationship between the two variables to clarify further the overall relationship.
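A multiple regression with two independent variables can be sketched by solving the normal equations directly. Everything here is invented for illustration (income predicted from education and a made-up "experience" variable); real analyses would use a statistics package:

```python
# Multiple regression via the normal equations (X'X)b = X'y,
# solved with simple Gauss-Jordan elimination. Hypothetical data:
# income predicted from education and years of experience.
education  = [10, 12, 12, 14, 16, 16, 18, 20]
experience = [2, 1, 4, 3, 2, 6, 5, 4]
income     = [25, 30, 28, 40, 45, 50, 55, 60]

# Design matrix with an intercept column.
rows = [[1.0, e, x] for e, x in zip(education, experience)]

k = 3
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * y for r, y in zip(rows, income)) for i in range(k)]

# Solve the 3x3 system with partial pivoting.
aug = [xtx[i] + [xty[i]] for i in range(k)]
for col in range(k):
    pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
    aug[col], aug[pivot] = aug[pivot], aug[col]
    for r in range(k):
        if r != col:
            f = aug[r][col] / aug[col][col]
            aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
coefs = [aug[i][k] / aug[i][i] for i in range(k)]  # intercept, b1, b2

# R squared: proportion of variance in income explained by both predictors.
mean_y = sum(income) / len(income)
pred = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
r_squared = 1 - (sum((y - p) ** 2 for y, p in zip(income, pred))
                 / sum((y - mean_y) ** 2 for y in income))
print(round(r_squared, 3))
```

R² here plays the role of the multiple-correlation coefficient R squared: the proportion of variance in the dependent variable accounted for by the analyzed independent variables together.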
In the real world there is no reason to assume that the relationship among every set of variables will be linear. In some cases, then, curvilinear regression analysis can provide a better understanding of empirical relationships than any linear model can. Basically, Babbie says curvilinear regression analysis may be more accurate empirically, but it is less helpful inferentially.
regression lines are good for interpolation (estimating cases between those observed), but less trustworthy for extrapolation (estimating cases beyond the range of observation).
Inferential statistics: statistical measures used for making inferences from findings based on sample observations