- inter-rater reliability (to examine the extent to which different observers agree on the data collected from the observation);
- test-retest reliability (to check stability of data collection over time);
- regrounding (to repeat the data collection and compare both results);
- parallel form (to examine the extent to which two versions of the same data collection procedure are really collecting the same data).
To assure reliability, different methodologists suggest involving at least two observers, either to carry out a sequential analysis (Becker 1970:79) or to achieve inter-observer agreement (Croll 1986:150). The idea of the former procedure is to carry out the analysis concurrently with data collection, in the sense that one may step back from the data so as to reflect on their possible meaning (Fielding 2001:158). Subsequent data gathering will thus direct the observer either to abandon or to pursue the original hypothesis. In the latter procedure, two observers look at the same events from different locations, categorise these events, and compare the outcomes. Using systematic schemes with pre-specified categories, they refine, or index (Fielding 2001:159), the definitions and categories of observation by applying in a consistent manner the procedures for data selection, collection, grouping, inclusion, exclusion, etc. (Simpson and Tuson 1995:65).
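The comparison of two observers' outcomes can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: the category labels and the two coding records are invented, and Cohen's kappa is used as one common chance-corrected agreement statistic, not necessarily the measure that Croll (1986) or the other authors cited here prescribe.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of events that both observers placed in the same category."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both observers must code the same non-empty set of events")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    # Chance agreement: product of each observer's marginal proportions,
    # summed over the categories.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both observers used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical records: ten classroom events coded independently by two observers.
observer_a = ["question", "response", "question", "feedback", "response",
              "question", "response", "feedback", "question", "response"]
observer_b = ["question", "response", "feedback", "feedback", "response",
              "question", "response", "question", "question", "response"]

agreement = percent_agreement(observer_a, observer_b)
kappa = cohens_kappa(observer_a, observer_b)
print(f"percent agreement: {agreement:.2f}, kappa: {kappa:.4f}")
```

In this invented record the observers disagree on two of the ten events. Raw agreement overstates consistency when a few categories dominate, which is why a chance-corrected figure is often reported alongside it.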
2.3.2 Types and evidences of validity
Just as there are different types of reliability, Seliger and Shohamy (1989:102) suggest that there are different types of evidence for validity. Their typology of evidences of validity comprises
- evidence on content validity which demonstrates appropriateness of data collection against the content to be measured;
- criterion validity which provides an indication as to whether the instrument can be measured against some other criterion and compared with the previous results (concurrent validity), and whether the procedure is capable of foretelling certain behaviour (predictive validity);
- construct validity which examines whether the data collection procedure is a good representation of and consistent with current theories underlying the variable being measured.
Chaudron (1988:24) gives another term for content validity and suggests treatment validity, which relates to the process component of a process-product study and demonstrates that the treatment was in fact implemented and that it was identifiably different from whatever it was being compared with.
For the results of second language research, Seliger and Shohamy (1989:104) distinguish internal and external validity. They propose that a study has internal validity if the outcomes of the observational data can be directly and unambiguously attributed to the treatment that is applied to the observed group, and if the interpretation of these data is not dependent on the subjective judgement of an individual researcher. Internal validity in this sense relates to three areas: representativeness, retrievability, and confirmability of the data (Seliger and Shohamy 1989:104). External validity concerns the extent to which the findings of a study can be generalized and applied to another situation; here the categories of study are treated as basic, applied, and practical.
To obtain evidences of validity, the items or questions of an instrument must be analyzed in the process of data collection. A researcher or observer should obtain information on whether the items are low-inference or high-inference (Long 1980), whether they are too easy or too difficult, and whether they are phrased so that they are easily understood by the respondents. It is recommended that all these aspects be examined in the pilot phase of the research, where they are likely to be supported by evidence from a variety of sources, such as additional questionnaire data from pupils or teachers, interviews, and surveys. Another way of examining the validity of observation is to ask colleagues to study the categories and to define the purpose of the observation. Simpson and Tuson (1995:65) treat this method as a useful check on face validity. Thus, to achieve reliable and valid observation, an evaluator should take into account the spatial location of an observer, engage more than one observer, use low-inference categories that do not require complex interpretation, and check agreement on key aspects against independent studies.
2.4 Items of observation
2.4.1 The importance of items
Language classroom observation does not simply mean watching classes (Wallace 1991:123). An observer may record either very narrowly defined data, such as a specific speech act, or more general kinds of language learning activity, such as turn-taking or group work.
Any scientific research or observation is characterised by such terms as structured, organised, methodical, and systematic. To meet these characteristics, any data collection is given a structure or format and is guided by questions or variables. Croll (1986:55) defines a variable as a basic unit that represents the process by which a concept of interest is turned into a set of working definitions whereby the results of observation or some other data collecting process can be categorized and measured.
2.4.2 Items of observation in the language classroom
For classroom observation as a learning tool Richards (1998:143) proposes three perspectives on a lesson for pre-service training to develop a deeper understanding of how and why teachers teach the way they do and the different ways teachers approach their lessons. They are:
- Teacher-centered focus: the teacher is the primary focus; factors include the teacher's role, classroom management skills, questioning skills, presence, voice quality, manner, and quality of instructions.
- Curriculum-centered focus: the lesson as an instructional unit is the primary focus; factors include lesson goals, opening, structuring, task types, flow, development, and pacing.
- Learner-centered focus: the learners are the primary focus; factors include the extent to which the lesson engaged them, participation patterns, and extent of language use.
Wallace (1998:68) substitutes the focus on the curriculum with the focus on the context in which the teacher teaches: the classroom layout, the teaching aids available and how they are used.
Low-inference and high-inference categories
The presentation of items involves constructing sets of categories into which occurrences must be coded unambiguously. In this respect Long (1980:3) introduces low-inference and high-inference measures. Low-inference categories include things that can be counted or coded without the observer having to infer their meaning from observable behaviour. Such categories, according to Allwright and Bailey (2000:73), involve the number of times a student raises her/his hand, or the frequency with which the teacher uses a student's name. High-inference items demand that the observer make a judgement that goes beyond what is immediately observed. Examples of this type of category cover factors such as learners' attention or the social climate. I can conclude that observation data should cover categories of observable behaviour that do not require much interpretation.
2.5 Typology of observation
A typology of classroom observation instruments is worked out by Wallace (1991:66), who presents the following oppositions:
- system-based, ethnographic or ad-hoc
- global or specific
- evaluative, formative or research-related
- teacher-focused, learner-focused or neutral in focus
- quantitative or qualitative
He admits that some of the oppositions are not clear-cut and overlap. For example, observation techniques which are primarily evaluative may be employed for formative purposes, and the ethnographic approach is treated as global and qualitative. A system-based approach can focus on both the teacher's and the learners' activities. The system-based (systematic), ethnographic and ad-hoc approaches encompass the other characteristics of the classification provided. I therefore outline the features of the first opposition.
2.5.1 System-based approach
By system-based observation Wallace (1991:67) means observation that is based on a system of fixed and pre-specified categories. Such systems are global in nature, i.e. they are intended to give general coverage of the most salient aspects of the classroom process (Wallace 1991:110). Any system contains a finite array of categories. The aim of all system-based observation instruments is the analysis of teacher-class interaction. The two most influential systems were devised by Bellack (1966:267) and by Flanders (1970:314). Wallace (1991:112) has identified the characteristic features of the first system as:
- the data are measured from a transcript, i.e. the data have to be first recorded and then transcribed;
- the central labelled units of discourse are structure, solicit, response, and reaction.
In the Flanders tradition there is a form of documented recall where tallies are made every three seconds under one of a range of categories. In chapter 2.6 a range of interaction schemes, with their advantages and disadvantages, is analysed in more detail. Such systems are widely used by researchers because they are ready-made and well known, and they do not need to be trialled and validated (Wallace 1991:111).
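The tallying procedure of a system-based instrument can be pictured as counting how often each pre-specified category is recorded across fixed observation intervals. The following is a minimal sketch under stated assumptions: the four category labels and the coded record are hypothetical and do not reproduce the actual categories of the Flanders or Bellack systems.

```python
from collections import Counter

# Hypothetical, fixed set of pre-specified categories (not the Flanders set).
CATEGORIES = ("teacher_question", "teacher_lecture", "student_response", "silence")


def tally_intervals(codes, categories=CATEGORIES):
    """Return {category: (tally, share of all intervals)} for a coded record.

    Each element of `codes` is the single category the observer recorded
    for one observation interval.
    """
    unknown = set(codes) - set(categories)
    if unknown:
        # A closed system admits only its own categories.
        raise ValueError(f"Codes outside the system: {sorted(unknown)}")
    counts = Counter(codes)
    total = len(codes)
    return {
        category: (counts[category], counts[category] / total if total else 0.0)
        for category in categories
    }


# Hypothetical record of eight consecutive intervals.
record = ["teacher_lecture", "teacher_lecture", "teacher_question",
          "student_response", "student_response", "teacher_question",
          "silence", "student_response"]

summary = tally_intervals(record)
for category, (tally, share) in summary.items():
    print(f"{category:17s} {tally:2d}  {share:.3f}")
```

Rejecting any code that falls outside the list reflects the finite, closed array of categories that defines a system-based instrument; an ad-hoc scheme would instead allow categories to be added as the observation proceeds.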
2.5.2 Ethnographic approach
The observation techniques share many of qu