
7 Steps to Prepare Data for Analysis by Richard Pink March 2, 2010

We researchers spend a lot of time interviewing our clients to determine their needs. Then we go about carefully creating a plan to collect the data that will be most useful. Having done that, we carefully craft the appropriate instrument, one that will generate data that can ultimately be transformed into knowledge. All this up-front work necessitates a lot of time and effort. And well it should! But sooner or later we will have collected data and need to start the grunt work of data preparation.

So what is involved in data preparation? There are several simple, but sometimes overlooked, steps required to properly prepare data. They are:

Questionnaire checking: Questionnaire checking involves eliminating unacceptable questionnaires. A questionnaire may be unacceptable because it is incomplete, its instructions were not followed, its answers show little variance, pages are missing, it arrived past the cutoff date or the respondent was not qualified.

Editing: Editing looks to correct illegible, incomplete, inconsistent and ambiguous answers.

Coding: Coding typically assigns alpha or numeric codes to answers that do not already have them so that statistical techniques can be applied.

Transcribing: Transcribing data involves transferring data so as to make it accessible to people or applications for further processing.

Cleaning: Cleaning reviews data for inconsistencies. Inconsistencies may arise from faulty logic, out-of-range values or extreme values.

Statistical adjustments: Statistical adjustments apply to data that require weighting and scale transformations.

Analysis strategy selection: Finally, selection of a data analysis strategy is based on earlier work in designing the research project but is finalized after consideration of the characteristics of the data that has been gathered.

Not all of these steps occur in every market research study. But as situations dictate, none of these steps should be overlooked in the name of expediency or economy. Later articles will drill down into the details of these important steps in data preparation.

Data Preparation: Questionnaire Checking by Richard Pink March 5, 2010

Data preparation is one of the very necessary, albeit not the most fun, parts of conducting a marketing research study. We have already interviewed the stakeholders in the study to determine what information they need in order to make some key decisions (and hopefully take some actions). We have started and/or completed our field work and are now sitting on top of dozens, hundreds or thousands of completed questionnaires. Now is the time to separate the wheat from the chaff. So what are we looking for?

Questionnaire checking involves eliminating unacceptable questionnaires. There are several reasons why a questionnaire may be unacceptable for use in a study. A questionnaire may be incomplete. This is fairly common. A person may have started to take a questionnaire and then, for reasons of fatigue, interruption or disinterest, stopped providing information before reaching the end. (Please see earlier blogs about surveys that are too long.)

On review, it may become apparent that the respondent did not understand (or follow) the instructions. This becomes evident when questions are answered that should have been skipped or when appropriate "branching" of questions was not followed. For example, if Q1 directs "If you answered Yes to this question, proceed to Q3" and Q2 was answered anyway, inclusion of the answer to Q2 would be inappropriate. Note: If you are conducting an online survey and using a professional survey software tool, this branching will be handled for you automatically.
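The branching check in the Q1/Q2 example can be sketched in a few lines of Python. This is only a minimal sketch: it assumes each response is a dict keyed by question ID with None for an unanswered question, and the helper name `violates_skip` and the sample data are hypothetical.

```python
# Hypothetical rule from the example: a "Yes" on Q1 means Q2 must be skipped.
def violates_skip(response):
    """Return True when Q2 was answered although Q1 == "Yes" said to skip it."""
    return response.get("Q1") == "Yes" and response.get("Q2") is not None

# Made-up responses for illustration only.
responses = [
    {"id": 1, "Q1": "Yes", "Q2": None, "Q3": 4},  # followed the instruction
    {"id": 2, "Q1": "Yes", "Q2": 3, "Q3": 5},     # answered Q2 anyway
    {"id": 3, "Q1": "No", "Q2": 2, "Q3": 1},      # Q2 legitimately answered
]

flagged = [r["id"] for r in responses if violates_skip(r)]
print(flagged)  # [2]
```

Whether a flagged questionnaire is dropped entirely or only its Q2 answer is excluded remains the researcher's call.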

Questionnaire responses that have little or no variance may also be flawed. Some respondents will answer all multiple choice questions with the same answer, e.g., only 3s. It would seem that the respondent wants to be done with the survey as soon as possible, perhaps to qualify for an incentive, and will quickly select answers without contemplating or even reading the questions.
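A simple screen for this "same answer everywhere" pattern might look like the sketch below. It assumes ratings are stored as a list per respondent with None for skipped items; the function name `is_straight_lined` and the respondent IDs are made up for illustration.

```python
def is_straight_lined(answers):
    """True when every answered rating question has the same value."""
    given = [a for a in answers if a is not None]
    return len(given) > 1 and len(set(given)) == 1

# Made-up ratings for illustration only.
ratings = {
    "r01": [3, 3, 3, 3, 3, 3],     # only 3s
    "r02": [4, 2, 5, 3, 4, 1],     # varied answers
    "r03": [5, 5, None, 5, 5, 5],  # one skip, otherwise all 5s
}

suspects = sorted(rid for rid, ans in ratings.items() if is_straight_lined(ans))
print(suspects)  # ['r01', 'r03']
```

A flag like this is best treated as a prompt to review the questionnaire, since a respondent can sincerely hold the same opinion on every item.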

Some questionnaires may be returned physically incomplete. Perhaps there are five pages to a mail survey and only three of the pages have been returned. This happens mostly with hardcopy questionnaires and can result from physically mishandling the survey itself.

Questionnaires that arrive after a cutoff date may need to be eliminated from the study. What matters most may be when the survey was completed, or it may be when the researcher actually received it. Depending on the reason for a cutoff date, it may be either inappropriate or impractical to include a late survey response.

Finally, a participant may not be qualified to have participated in a market research study. Regardless of how or why a study may have ended up in the wrong hands, some people may participate in a study for which their answers are irrelevant.

The sometimes tedious process of questionnaire checking is necessary for having only valid data included in a study. If a researcher is sloppy in this regard or takes the position that "the more data, the better", incorrect information may result when one moves on to the next step, analysis.

Data Preparation: Questionnaire Editing by Richard Pink March 10, 2010

In our continuing review of data preparation, we will now look further into the topic of questionnaire editing. Editing a questionnaire can greatly enhance both the number of survey responses that a researcher may receive in a study and the quality of the responses to individual questions.

It is important to limit the size of a study so that potential respondents do not lose motivation to participate. But if we do a good job at limiting the length of a survey to only the most necessary questions, then we must also make sure that we get the most from each of the questions. One of the best tools for fine-tuning a question is a pretest. Pretests involve having a limited number of people answer survey questions and then studying the responses to make sure that the results are what we might normally expect.

Potential flaws in questionnaires include ambiguous questions, double-barreled questions (asking for two pieces of information in one question), overlapping answers and offering choices that are not inclusive of all possible answers. These problems should be handled by the researcher before a questionnaire is ever fielded. But too often, researchers do not take the time and effort to pretest surveys.

Once a questionnaire has been carefully crafted and fielded for data collection, problems can also arise from the respondent side. These potential problems include illegible, incomplete, ambiguous and inconsistent answers. When this occurs, the researcher must decide how to remedy them. Solutions can include returning to the field for further data collection, assigning missing values or discarding some or all of the unsatisfactory answers. There is much debate regarding the proper handling of unsatisfactory responses, so it is well worthwhile for researchers to invest time up front in order to field the best possible questionnaires.

Data Preparation: Questionnaire Coding by Richard Pink March 15, 2010

Now that we've completed our questionnaire checking and editing (the first two steps in market research data preparation), thus minimizing bad data as best we could, we may need to code some responses in order to enable analysis. Coding data is the process of assigning numeric or alpha information to question responses that do not ordinarily return to the researcher in that format.

Coding typically assigns a number (sometimes a letter) to answers that do not already have them so that statistical techniques can be applied. Types of data that usually require coding are often demographic: age, gender, marital status, household size; you get the idea. As you can see, some of the original data, such as age, may already be numeric, but coding will aggregate it into a much smaller and therefore more useful set of categories.

Let's say that we are interested in exploring the size of family for participants in a store feedback study. Number of children can range from 0 to 12 or even more. However, such granularity is seldom useful for analysis. Therefore, we may want to assign an A to no children, a B to one or two children, a C to three to five children and a D to six or more children. Now instead of having 14 or more categories, we have four. The resulting survey data analysis will likely be much more digestible to clients and stakeholders.
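The family-size coding scheme above translates directly into code. The following is a minimal sketch; the function name `code_children` and the sample counts are illustrative assumptions, while the A to D cut points come from the example.

```python
def code_children(n):
    """Map a count of children to the A-D codes from the example."""
    if n < 0:
        raise ValueError("number of children cannot be negative")
    if n == 0:
        return "A"  # no children
    if n <= 2:
        return "B"  # one or two children
    if n <= 5:
        return "C"  # three to five children
    return "D"      # six or more children

# Made-up raw answers for illustration only.
counts = [0, 1, 2, 3, 5, 6, 12]
codes = [code_children(n) for n in counts]
print(codes)  # ['A', 'B', 'B', 'C', 'C', 'D', 'D']
```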

In addition to assigning codes to answers returned from survey participants, we may also want to assign codes that capture other questionnaire information. Possibilities include: project code, interviewer code, date and time codes, and location.

Once we are satisfied with our coding technique, it is often useful to develop a code book. Code books will provide other necessary information particularly useful for computer programs. This code map will indicate where the columns are located on a data sheet, size of data fields and code type. A bit of thought and planning before actual coding can save significant time in data processing and analysis.
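To make the idea of a code book concrete, here is a small sketch that stores column position, field size and type for each variable and uses them to read one fixed-width record. The variable names, column layout and sample record are all hypothetical; a real code book would follow the study's own data sheet.

```python
# Hypothetical code book: variable name -> (start column, width, type).
CODE_BOOK = {
    "project": (0, 3, str),
    "respondent": (3, 4, int),
    "children": (7, 1, str),      # A-D family-size code
    "satisfaction": (8, 1, int),  # rating from 1 to 5
}

def parse_record(line):
    """Slice one fixed-width line into typed fields using the code book."""
    record = {}
    for name, (start, width, cast) in CODE_BOOK.items():
        record[name] = cast(line[start:start + width])
    return record

row = parse_record("P170042B4")
print(row)
```

Keeping the layout in one place means the same code book can drive both data entry checks and the analysis program.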

Remember: Many professional survey software tools will automatically code your survey data for you, as long as you set it up in the export values area.

Data Preparation: Questionnaire Transcribing by Richard Pink March 16, 2010

We have recently reviewed several steps in the market research process of data preparation. The process has taken us from preparing a plan for analysis to checking the questionnaire, editing and coding results. The next step is, in my opinion, somewhat mislabeled as transcribing. Transcribing involves the process of taking raw data and creating transformed data that can be used for further analysis.

Technology has done wondrous things for the transcribing process. It is with some nostalgia that I remember back in the 1960s when we researchers would have to create computer (IBM) cards by having them keypunched from questionnaires that were manually completed by respondents using their trusty number two pencil. Tragedy struck if one had the great misfortune of dropping their "deck" of responses before submitting it to data processing. Furthermore, one could verify the correctness of keypunching by using a wire specifically created for insertion through a deck via a single similar hole that was present on every computer card in the deck (this wire was also useful for picking locks).

So forgetting about decks and hanging chads, we now have much more trustworthy methods of transforming data into a format that can be used by computer programs. CATI stands for computer-assisted telephone interviewing, and CAPI for computer-assisted personal interviewing. Both kinds of program enable an interviewer to enter information directly into a computer or database to be used by a computer program. They can automatically eliminate many entry errors and incorrect skip patterns, randomize the ordering of questions and answers, and help avoid some biases. Don't forget the ability to now use online market research surveys, where the respondent automatically takes care of the data entry as they answer the questions.

There are many other innovations that have greatly aided transcribing of data. For one, grocery stores capture a mountain of information on each of us and our purchases every time we use our customer cards and have our product codes scanned. So the researcher should no longer think of transcribing as someone writing down information that is being fed to them by a respondent. Transcribing is now quite high tech and mostly invisible to us researchers. Thank goodness!

Data Preparation: Questionnaire Cleaning by Richard Pink March 22, 2010

If you've been following the articles regarding data preparation and those steps in the market research process, you are now at the fifth part of that seven-part process. So far we have reviewed:

Questionnaire checking

Editing

Coding

Transcribing

Some researchers also count an initial step: preparing a "preliminary plan for analysis." However, since the selection of a data analysis plan is the final step in data preparation, and some believe it to be more a part of the overall project plan, the analysis plan will be covered in the final data preparation step.

Today's article deals with the process of data cleaning. Cleaning reviews data for inconsistencies. Inconsistencies may arise from data that are out of range, logically inconsistent or contain extreme values.

Out-of-range data is often a result of poor questionnaire design or faulty data entry. Let's say that a survey respondent is asked to rate satisfaction with an attribute from one to five. If the numeral six shows up on the data sheet, there has likely been an incorrect transcription of data. The fix is to go back to the original data capture instrument, e.g., the paper questionnaire (hopefully uniquely coded for just such an event), and see what the respondent actually entered for the answer.
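A range check like the one in this example is easy to automate. The sketch below lists the questionnaires whose rating falls outside one to five so that the originals can be pulled; the field names and questionnaire IDs are hypothetical.

```python
VALID_RANGE = range(1, 6)  # ratings 1 through 5, as in the example

# Made-up data sheet rows for illustration only.
entries = [
    {"questionnaire_id": "Q-0101", "satisfaction": 4},
    {"questionnaire_id": "Q-0102", "satisfaction": 6},  # likely a keying error
    {"questionnaire_id": "Q-0103", "satisfaction": 1},
]

# Collect the unique codes of questionnaires that need a manual re-check.
to_recheck = [e["questionnaire_id"] for e in entries
              if e["satisfaction"] not in VALID_RANGE]
print(to_recheck)  # ['Q-0102']
```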

Logically inconsistent data is a little more complicated since it can require thought and investigation into what and how survey questions are related and dependent. For example, if a respondent claims to pay for most of their purchases via checking account but then answered a different question stating that they do not have a checking account, we have an obvious inconsistency. Proper survey question logic (such as branch logic) can sometimes help avoid data inconsistencies but not always.
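Cross-question rules such as the checking account example can be written as small predicates. This is a sketch under assumed field names (`main_payment`, `has_checking_account`), which are not from any particular survey tool.

```python
def is_inconsistent(resp):
    """Pays mostly by check but reports having no checking account."""
    return resp["main_payment"] == "check" and not resp["has_checking_account"]

# Made-up answers for illustration only.
answers = [
    {"id": 1, "main_payment": "check", "has_checking_account": True},
    {"id": 2, "main_payment": "check", "has_checking_account": False},
    {"id": 3, "main_payment": "cash", "has_checking_account": False},
]

conflicts = [a["id"] for a in answers if is_inconsistent(a)]
print(conflicts)  # [2]
```

Each pair of related questions needs its own rule, which is why this kind of check takes more thought than a simple range test.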

Finally, extreme values arise when data is entered or transcribed that is out of the range of likelihood. If, for example, age is captured in a non-categorical question as 250 years, we know that this extreme value is incorrect. The true age is likely either 25 years or 50 years. A review of answers throughout the rest of the questionnaire may point to the proper age. If it does not, the best action for analysis is to code the data as missing.
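When the rest of the questionnaire does not settle the true value, recoding to missing can be sketched as follows. The cutoff of 120 years is an assumption chosen for illustration, and None stands in for the missing-value code.

```python
MAX_PLAUSIBLE_AGE = 120  # assumed cutoff for illustration

def clean_age(age):
    """Return the age, or None (missing) when it is outside a plausible range."""
    if age is None or not 0 <= age <= MAX_PLAUSIBLE_AGE:
        return None
    return age

# Made-up ages for illustration only; 250 is the extreme value from the example.
ages = [34, 250, 61, None]
cleaned = [clean_age(a) for a in ages]
print(cleaned)  # [34, None, 61, None]
```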

Data cleaning is not difficult. Data cleaning is not fun. Data cleaning is necessary. Without data cleaning, suspect survey results will likely occur, casting doubt on the accuracy and usefulness of a research study.

Data Preparation: Questionnaire Statistical Adjustments by Richard Pink March 25, 2010

Statistical adjustments are one of the final steps in the market research process to prepare data for analysis. We have already gone through several steps to obtain a high quality database with which to study the survey results and provide keen insight into the problems and possible solutions that clients are having us explore. Even without applying adjustments to the data, the researcher is likely in good shape to start extracting knowledge. However, some added adjustments may enable a bit more information or a more elegant delivery of that knowledge. The most common statistical adjustments are applied for the purpose of weighting, variable respecification and/or scale transformation.

Weighting is simply adjusting data so that a particular respondent or case is given more (or less) importance than other respondents or cases. Often this is done in order to make the data more representative of the target population on various attributes. An example of weighting might be when a client is interested in taking action aimed at improving his offering to heavy users of his product or service. In order to have a closer look at those heavy users, a weight of 3 might be given to a heavy user while a moderate user gets a 2 and the light user weights as a 1.
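The effect of the 3/2/1 weights can be seen in a weighted average. The sketch below uses made-up satisfaction scores; only the weights come from the example, and the names `USAGE_WEIGHTS` and `weighted_mean` are illustrative.

```python
USAGE_WEIGHTS = {"heavy": 3, "moderate": 2, "light": 1}  # weights from the example

# Made-up respondents for illustration only.
respondents = [
    {"usage": "heavy", "satisfaction": 2},
    {"usage": "moderate", "satisfaction": 4},
    {"usage": "light", "satisfaction": 5},
    {"usage": "light", "satisfaction": 5},
]

def weighted_mean(rows):
    """Average satisfaction with each respondent counted by usage weight."""
    total_weight = sum(USAGE_WEIGHTS[r["usage"]] for r in rows)
    weighted_sum = sum(USAGE_WEIGHTS[r["usage"]] * r["satisfaction"] for r in rows)
    return weighted_sum / total_weight

unweighted = sum(r["satisfaction"] for r in respondents) / len(respondents)
weighted = weighted_mean(respondents)
print(unweighted, round(weighted, 2))  # 4.0 3.43
```

In this toy data the lone heavy user is less satisfied than the others, so giving that case more importance pulls the average down.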

Variable respecification creates new variables or changes existing variables. One of the most common uses of variable respecification is when a number of categories used to answer a question are collapsed into fewer categories. Using the example above, we may ask a respondent to indicate their product usage by selecting from a ten-category scale. However, we may not really need that degree of granularity. So we might combine some of the ten categories to arrive at three categories: low, medium and heavy users.

Finally, scale transformation is sometimes required when a survey employs scales of differing lengths and types. In order to have some degree of comparability or even compatibility, scales may be transformed so that a logical relationship exists between the various scales.
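One common way to make scales of different lengths comparable is a linear rescaling onto a shared range. This is a sketch of that single technique, not the only possible transformation; the function name `rescale` and the five-point and seven-point scales are assumptions for illustration.

```python
def rescale(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map value from [old_min, old_max] onto [new_min, new_max]."""
    if old_max == old_min:
        raise ValueError("scale must have distinct endpoints")
    fraction = (value - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)

# A 4 on a five-point scale and a 6 on a seven-point scale, both mapped to 0..1.
five_point = rescale(4, 1, 5)
seven_point = rescale(6, 1, 7)
print(five_point, round(seven_point, 3))

# The midpoint of a five-point scale expressed on a seven-point scale.
midpoint_on_seven = rescale(3, 1, 5, 1, 7)
print(midpoint_on_seven)  # 4.0
```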

The last article in the series addressing data preparation will cover survey data analysis strategy. All of the work in collecting and treating data could become a great waste if researchers do not recognize and select the appropriate analysis.

Data Preparation: Data Analysis Selection by Richard Pink April 2, 2010

The final step in the data preparation process is the selection of the data analysis strategy. This selection should be based on earlier work in designing the research project and is finalized after consideration of the characteristics of the data that has been gathered, properties of statistical methods and the knowledge and discretion of the researcher.

The basic qualitative and quantitative research design started with defining the study objectives to address a problem. Next the stakeholders constructed an overall approach to solving the problem and the researcher designed a research project to aid that approach. As part of this, the researcher has already formulated an idea about how the data will be analyzed.

The characteristics of the data play a role in dictating how an analysis will be conducted. If the researcher has an adequate idea of which statistical techniques he may want to employ, then he has likely attempted to gather the data in a form appropriate to those market research or survey analysis techniques. However, sometimes the data arrives in a form a bit different than the ideal. In those cases, the data may need to be coded or transcribed, as discussed in previous posts.

Most statistical techniques have unique characteristics that can either make them useful for an analysis or render them inappropriate or unusable. Regressions are commonly used to establish relationships between a dependent variable and one or more independent variables. Technically, all variables except the dependent one should be independent of each other. In practice, the assumption of independence is often broken. How often do you see analyses which include the "independent" variables of age, family size and income? Certainly these attributes are not independent of each other.
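Before running a regression, a quick look at the pairwise correlations among the would-be independent variables can reveal this problem. The sketch below computes Pearson correlations by hand; the age, income and family size values are made up purely for illustration and are not survey results.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Made-up illustrative values, not survey results.
predictors = {
    "age": [25, 32, 41, 48, 56, 63],
    "income": [31, 40, 52, 58, 66, 70],  # in thousands
    "family_size": [1, 2, 4, 4, 3, 2],
}

# Print the correlation for every pair of predictors.
names = list(predictors)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        r = pearson(predictors[first], predictors[second])
        print(first, second, round(r, 2))
```

A correlation near 1 or -1 between two predictors is a warning that they carry overlapping information and that the regression results should be read with care.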

Finally, the statistical techniques that a researcher knows, feels comfortable using and prefers to use will strongly point toward a particular plan for analysis. In fact, it is quite common for research houses to advertise certain techniques at which they are particularly adept and for which they are well known.
