Indexes, scales, typologies
unit of measurement is key
operationalization: the actual wording of the question is important; each question needs to be framed so it measures only one thing in your research question (RQ)
so you're comparing independent things
Index
items of equal value so you can count on an interval scale: equidistant
Scale -- ordinal; weighted score
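A minimal sketch of the index vs. scale distinction above, assuming an index is a plain count of equally valued items and a scale applies a weight to each item. The function names, answers, and weights are hypothetical, made up for illustration only.

```python
def index_score(responses):
    """Additive index: every item has equal value, so just count them up."""
    return sum(responses)


def scale_score(responses, weights):
    """Weighted score: each item contributes according to its assigned weight."""
    if len(responses) != len(weights):
        raise ValueError("need exactly one weight per item")
    return sum(r * w for r, w in zip(responses, weights))


# Hypothetical respondent: four yes/no items coded 1/0.
answers = [1, 0, 1, 1]
# Hypothetical weights: later items are treated as "stronger" indicators.
weights = [1, 2, 3, 4]

print(index_score(answers))           # 3
print(scale_score(answers, weights))  # 8
```

Same answers, different scores: the index only counts how many items were endorsed, the weighted score also reflects which ones.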
When you're designing a study, you need to use your time with your subjects as efficiently as you can. Make every question count. So if you can test your questions and determine that two are really measuring the same thing, then you can discard one. Need to validate them in some way. In operationalization, this question is my measure of this construct. This question represents this larger construct.
Often pick up questions used in other studies/surveys. Look for measures that are already used and tested.
Handling Missing Data
If there are few cases with missing data, you may decide to exclude them from the construction of the index and analyses.
Treat missing data as one of the available responses.
Analyze the missing data to interpret their meaning.
Assign missing data the middle value, or the mean value
Assign values to the proportion of variables scored.
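A rough sketch of three of the options above (exclude incomplete cases, assign the mean value, score on the proportion of items answered). It assumes missing answers are stored as None and items are coded 0/1; the helper names and the tiny data set are hypothetical.

```python
from statistics import mean


def drop_incomplete(rows):
    """Exclude any case that has at least one missing answer."""
    return [r for r in rows if None not in r]


def fill_with_item_mean(rows):
    """Replace each missing answer with the mean of that item's observed answers."""
    n_items = len(rows[0])
    item_means = []
    for j in range(n_items):
        observed = [r[j] for r in rows if r[j] is not None]
        item_means.append(mean(observed))
    return [
        [item_means[j] if v is None else v for j, v in enumerate(r)]
        for r in rows
    ]


def proportion_score(row):
    """Score a case on the share of answered items that were endorsed."""
    answered = [v for v in row if v is not None]
    if not answered:
        return None  # nothing to score
    return sum(answered) / len(answered)


# Hypothetical data: three respondents, four yes/no items, None = skipped.
rows = [
    [1, 0, 1, 1],
    [1, None, 0, 1],
    [0, 0, None, None],
]

print(drop_incomplete(rows))         # only the first respondent survives
print(fill_with_item_mean(rows)[2])  # [0, 0, 0.5, 1]
print(proportion_score(rows[1]))     # 2 of 3 answered items endorsed
```

Each choice has a cost: dropping cases shrinks the sample, mean values flatten the variation, and proportion scores rest on fewer items for some people than others.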
Missing data from a survey of 50 items to 500 people
400 of those ppl answered all 50 questions completely
The other 100 ppl skipped 1-10 questions or gave odd answers to some number of questions
Choice: throw out 20%, you've lost the representative sample and they're not normally distributed OR do something abt these gaps-->"inference for missing values"
you could run the numbers on all the pairs of ppl who skipped and impute the answer-->imputing missing values
in designing your questions in the first place, look for questions that are as clean as possible to begin with
EPSEM (Equal Probability of Selection Method)
Randomization Sampling
statistical concept
law of large numbers
truth value (validity)
reliability (replication)
more randomization, more external validity
not nec. internal validity-->want to start out with each group at the same place (look at Shadish p. 25*)
p. 248 Shadish list of bullets
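To make the law of large numbers concrete, here is a small simulation sketch. It assumes a made-up population of 100,000 values and uses simple random sampling without replacement, so every member has an equal probability of selection (EPSEM). The population, seed, and sample sizes are all hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the demo is repeatable

# Hypothetical population: 100,000 values spread evenly between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(100_000)]
true_mean = sum(population) / len(population)


def sample_mean(n):
    """Mean of a simple random sample of size n (each member equally likely)."""
    sample = rng.sample(population, n)
    return sum(sample) / n


# How far is the sample mean from the population mean at each sample size?
for n in (10, 100, 1_000, 10_000):
    error = abs(sample_mean(n) - true_mean)
    print(n, round(error, 3))
```

The errors tend to shrink as n grows, though any single run can bounce around; the law describes the long-run tendency, not every individual draw.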
Error variance
general spread of the variance in your sample
if you have a heterogeneous population, you're going to have a highly distributed variance of error, and will need a larger sample
then you can go back and control this with probability sampling, assigning ppl to groups in diff ways (gender, age, other attributes on which we can get the indicators in advance)
probability and multi-stage cluster sampling
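One way to see why a more heterogeneous population needs a larger sample is the textbook sample-size formula for estimating a mean, n = (z * sigma / E)^2. This is a sketch under stated assumptions, not something from the lecture: simple random sampling, a known standard deviation, and z = 1.96 for roughly 95% confidence. The function name and the numbers are hypothetical.

```python
import math


def required_sample_size(std_dev, margin_of_error, z=1.96):
    """Sample size needed to estimate a mean within +/- margin_of_error.

    Assumes simple random sampling and a known population standard deviation.
    """
    return math.ceil((z * std_dev / margin_of_error) ** 2)


# Same margin of error, two hypothetical populations:
print(required_sample_size(std_dev=5, margin_of_error=1))   # 97
print(required_sample_size(std_dev=15, margin_of_error=1))  # 865
```

Tripling the spread multiplies the required sample by about nine, which is the motivation for grouping ppl on known attributes so that each group is more homogeneous.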
Confidence Levels and Confidence Intervals
Confidence Level – the estimated probability that a population parameter lies within a given confidence interval.
Confidence Interval – the range of values within which a population parameter is estimated to lie.
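A worked sketch of the two definitions above, using the usual normal-approximation interval for a sample proportion. The survey numbers are hypothetical, and z = 1.96 is the conventional multiplier for a 95% confidence level.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a sample proportion."""
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p - z * standard_error, p + z * standard_error


# Hypothetical survey: 200 of 400 respondents say "yes".
low, high = proportion_confidence_interval(200, 400)
print(round(low, 3), round(high, 3))  # 0.451 0.549
```

Read as: at the 95% confidence level, the population proportion is estimated to lie between about 45% and 55%. A higher confidence level (larger z) gives a wider interval; a larger sample gives a narrower one.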
Population
have to figure out from where you're going to get your subjects
Atul Butte: Dry lab research where he has the hypothesis and uses open data to run studies
labs will sell their excess capacity; you can order online
blood tests done, after 48 hrs degraded
a whole subindustry that collects that blood before it degrades and runs other tests on it
runs it in two different labs
aggregated samples exploited by this whole new subindustry
https://med.stanford.edu/profiles/Atul_Butte