How do I look for data?
There are different ways to think about data to be used in research. First, consider the type of data needed to address your research question. Next, review the study documentation, such as questionnaires and codebooks, to help determine which data will be useful. Here is a visual of ways to think about the kinds of sources where you can find data.
You can look for information in the popular press discussing recent studies. You can find news articles in newspaper indexes and by using LexisNexis. The UCLA Library has access to LexisNexis; you can also search individual newspaper indexes, such as that of the New York Times, or use periodical indexes.
Sometimes we read about things in the news and want something more extensive to use in research. For this, we can use article databases to locate analyses and find research about the studies discussed in the news. Example: Social Sciences Citation Index.
But maybe we want more detail than the scholarly articles we read provide; sometimes information in tabular form (tables, charts, etc.) can be useful. Government sites provide a great deal of statistical information; remember to use local, state, and national government sites. For example, you might use the LA County Election Office for local data, or a statistical abstract for the US and/or for individual states.
Sometimes we look at a chart or a table and want to know about the data that produced it; here we are talking about the raw data. The codebook or other documentation that serves as the human-readable guide to the data can be as useful as the data files themselves.
Finally, sometimes you won’t have the data points you seek until you actually analyze your data with a software tool or statistical package.
Next, it is important to be aware of the units of analysis, types of variables, and the data structure or format. These are described below.
Units of Analysis
What is the unit of measurement in the data file? Examples might be:
Why is this important? Say you want to look at the income of each person in a household. A study that contains only household-level information will not be useful to you, since it does not provide income amounts for individuals. Keep in mind that some data files have more than one unit of analysis or measurement.
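The point about units of analysis is an asymmetry: person-level data can be combined up to the household level, but a household total cannot be split back into individual amounts. The short Python sketch below illustrates this with made-up records; the field layout and the `household_income` helper are hypothetical, chosen only for illustration.

```python
# Hypothetical person-level records: (household_id, person_id, income in dollars).
persons = [
    ("H1", 1, 32000),
    ("H1", 2, 18000),
    ("H2", 1, 54000),
]

def household_income(records):
    """Aggregate person-level income up to the household level."""
    totals = {}
    for household_id, _person_id, income in records:
        totals[household_id] = totals.get(household_id, 0) + income
    return totals

print(household_income(persons))  # {'H1': 50000, 'H2': 54000}
```

Going from persons to households is a simple sum. A file that stored only the household totals (50000 for H1) would give no way to recover the 32000 and 18000 earned by the two individuals.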
Types of Variables
Alpha vs. Numeric: alpha data are in text format; numeric data are stored as numbers, such as coded responses to questions.
Continuous vs. Discrete: continuous data represent a 'real' piece of information, such as age in actual years or income in actual dollars; discrete data are coded responses to questions.
Summary vs. Micro: summary data have been tabulated, averaged, or otherwise combined so that single units of data cannot be evaluated; micro data contain information about individuals or individual units and are organized for analysis by single units.
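The three pairs of variable types can be seen side by side in a tiny data set. In this hypothetical Python sketch, `age_years` is a continuous variable, `age_group` is a discrete numeric code, the codebook labels are alpha (text), the respondent list is micro data, and the final dictionary is summary data. The variable names and the age-group codes are invented for illustration, not taken from any real codebook.

```python
from statistics import mean

# Hypothetical codebook entry: numeric codes mapped to alpha (text) labels.
AGE_GROUP_CODES = {1: "18-34", 2: "35-54", 3: "55+"}

# Micro data: one record per respondent.
# age_years is continuous (actual years); age_group is discrete (a coded response).
respondents = [
    {"id": 101, "age_years": 30, "age_group": 1},
    {"id": 102, "age_years": 45, "age_group": 2},
    {"id": 103, "age_years": 60, "age_group": 3},
]

# Decoding the discrete numeric codes into their alpha labels.
labels = [AGE_GROUP_CODES[r["age_group"]] for r in respondents]

# Summary data: the records combined into single figures. If only this
# dictionary were distributed, individual respondents could not be evaluated.
summary = {"n": len(respondents), "mean_age": mean(r["age_years"] for r in respondents)}

print(labels)   # ['18-34', '35-54', '55+']
print(summary)  # {'n': 3, 'mean_age': 45}
```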
Finally, study the codebook to determine the physical format of the file, that is, how it is arranged electronically. You will need to describe this arrangement to your statistical software: in SPSS, the description goes in the DATA LIST command; in SAS, it is part of your INPUT statement. The three most common structures are rectangular, hierarchical, and relational.
Rectangular files are the most common form of data structure. In a survey, each respondent's answers are arranged in the same order. If the data were printed, they would resemble an array of persons and their responses to questions, as illustrated.
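Describing the physical format usually means telling the software which columns hold each variable; SPSS DATA LIST and SAS INPUT both accept starting and ending column positions for this. The Python sketch below mirrors that idea for a rectangular, fixed-width file: a hypothetical codebook layout gives each variable's 1-based column range, every line is cut the same way, and a variable can then be read down the rows of the array. The layout, the raw lines, and the helper names are all assumptions for illustration.

```python
# Hypothetical codebook layout for a rectangular, fixed-width file:
# (variable name, first column, last column), columns numbered from 1.
LAYOUT = [("id", 1, 3), ("age", 4, 5), ("income", 6, 11)]

# Hypothetical raw records: one line per respondent, every line in the same layout.
RAW = [
    "10134 52000",
    "10261 18500",
    "10327 41000",
]

def parse_record(line, layout):
    """Cut one record into named integer fields using 1-based, inclusive column ranges."""
    return {name: int(line[start - 1:end]) for name, start, end in layout}

def column(records, name):
    """Return one variable's values across all respondents (a column of the array)."""
    return [record[name] for record in records]

records = [parse_record(line, LAYOUT) for line in RAW]
print(records[0])                # {'id': 101, 'age': 34, 'income': 52000}
print(column(records, "age"))    # [34, 61, 27]
```

Because every record shares one layout, a single description of the columns is enough for the whole file; that is what makes the rectangular form the simplest to read in.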
Hierarchical files are also described as having a tree-like structure. Data in this example are organized by household. Within households, it is possible to study individuals, and for each individual it is possible to study sources of income. Data items are linked via their household identifier.
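A hierarchical file mixes several record types in one file. The hypothetical Python sketch below assumes each line begins with a record-type letter (H for household, P for person, I for income source) followed by the household identifier that links it into the tree, and it rebuilds the household, person, and income levels from those records. Real files typically use fixed column positions rather than spaces, and their record-type codes come from the codebook; the spaces and letters here are simplifications.

```python
# Hypothetical hierarchical file. Each record starts with a record type and the
# household identifier that links it to the rest of the tree:
#   H <household>                            household record
#   P <household> <person>                   person within that household
#   I <household> <person> <source> <amount> income source for that person
LINES = [
    "H H001",
    "P H001 1",
    "I H001 1 wages 30000",
    "I H001 1 interest 500",
    "P H001 2",
    "I H001 2 wages 18000",
    "H H002",
    "P H002 1",
    "I H002 1 pension 22000",
]

def parse_hierarchical(lines):
    """Build a household -> person -> income-sources tree from typed records.

    Assumes each household record appears before its persons, and each
    person record before that person's income records.
    """
    households = {}
    for line in lines:
        rec_type, household_id, *fields = line.split()
        if rec_type == "H":
            households[household_id] = {}
        elif rec_type == "P":
            households[household_id][int(fields[0])] = []
        elif rec_type == "I":
            person, source, amount = fields
            households[household_id][int(person)].append((source, int(amount)))
        else:
            raise ValueError(f"unknown record type: {rec_type!r}")
    return households

tree = parse_hierarchical(LINES)
print(tree["H001"][1])  # [('wages', 30000), ('interest', 500)]
```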
Relational files are most commonly created using a database program, such as Microsoft Access. In this example, there is a household file, a person file, and an income file. Each of these may be analyzed separately from the others. The files are linked by keys, or identification pointers; for example, the household file contains identifiers for persons and income, and so forth.
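The same household, person, and income data can be stored as separate relational tables. The sketch below uses Python's built-in `sqlite3` module as a stand-in for a database such as Microsoft Access; the table and column names are hypothetical. In this sketch each person row carries its household key and each income row carries its person key, which is one common way to arrange such links. Each table can be queried on its own, or the tables can be joined through the keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE household (hh_id TEXT PRIMARY KEY);
CREATE TABLE person (person_id INTEGER PRIMARY KEY,
                     hh_id TEXT REFERENCES household(hh_id));
CREATE TABLE income (person_id INTEGER REFERENCES person(person_id),
                     source TEXT, amount INTEGER);
INSERT INTO household VALUES ('H001'), ('H002');
INSERT INTO person VALUES (1, 'H001'), (2, 'H001'), (3, 'H002');
INSERT INTO income VALUES (1, 'wages', 30000), (1, 'interest', 500),
                          (2, 'wages', 18000), (3, 'pension', 22000);
""")

# Each table can be analyzed separately...
n_persons = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]

# ...or linked through the keys, here to total income per household.
rows = conn.execute("""
    SELECT h.hh_id, SUM(i.amount)
    FROM household AS h
    JOIN person AS p ON p.hh_id = h.hh_id
    JOIN income AS i ON i.person_id = p.person_id
    GROUP BY h.hh_id
    ORDER BY h.hh_id
""").fetchall()

print(n_persons)  # 3
print(rows)       # [('H001', 48500), ('H002', 22000)]
conn.close()
```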