The Effects of Explicit Film-based Instruction on EFL Teacher Trainees’

Pragmatic competence is an indispensable dimension of overall language ability, and proper interpretation of implied meanings is a major constituent of pragmatic competence. In this regard, this study aimed to investigate the efficiency of a film-based instruction program devised to facilitate the interpretation of implied meanings in English. It was conducted with a quasi-experimental design. First, a multiple-choice discourse completion test was given to 144 English language teacher trainees, with 77 in the experimental group and 67 in the control group. After the 5-week instruction given to the experimental group, the test was administered to both groups again. The results revealed significant differences in favor of the experimental group. This makes the program a promising one, as it left the participants, who as prospective English as a Foreign Language (EFL) teachers will also be responsible for helping their own students develop pragmatic competence, significantly better equipped to process implied meanings.


Introduction
Since Hymes (1971) laid the foundations of "communicative competence", "pragmatic competence", i.e. the ability to process and use language in context, has kept consolidating its position as an essential constituent in the evolving communicative competence models (Bachman & Palmer, 2010; Bagaric & Mihaljevic-Djigunovic, 2007; Canale & Swain, 1981; Council of Europe, 2001). In this regard, language education practices are worth examining in terms of pragmatic competence development. This examination becomes all the more worthwhile when we consider the reported neglect of "pragmatic competence as an instructional target" (Segueni, 2014) and the need for pragmatic competence instruction, which is felt more strongly in foreign language contexts with limited exposure to authentic input (Li, 2015).
Within the wider notion of pragmatic competence, among its five main areas (Levinson, 1983), the study of pragmatics has focused on speech acts (Bella, 2014; Roever, 2013) and to a lesser extent on implicature (indirectly conveyed meanings). Accordingly, considering also the aforementioned neglect of pragmatics in language teaching, it is reasonable to expect that implied meanings have not often been the focus of attention in language education practices either.
In light of these considerations, together with the fact that languages like Turkish are not among the first language (L1) backgrounds frequently represented in pragmatic studies (Rose, 2005), "implied meanings" as an instructional target for prospective Turkish EFL teachers formed the rationale for this study. Besides being learners of English with Turkish as their L1, the participants were future EFL teachers expected to help their own students acquire pragmatic competence too. Accordingly, the present study set out to equip them better in a major constituent of pragmatic competence through a film-based teaching program. This added a "material development dimension" to the study, given the dearth of studies utilizing video-vignettes as an input source (Birjandi & Derakhshan, 2014) that is ideal both for learning about pragmatic strategies and as a springboard for language use (Cohen, 2005; Tatsuki & Nishizawa, 2005).

Literature Review
Pragmatics and implicature (implied meanings)
Together with "deixis", "presupposition", "speech acts" and "conversational structure" (Levinson, 1983), implicature (implied meanings), i.e. cases where what is meant is distinct from what is said (Davis, 2007), is among the main areas of the study of pragmatics. Within the general notion of implicature, which covers "conventional" and "conversational" implicatures (Grice, 1975), Bouton (1994) divided the latter into idiosyncratic and formulaic ones. The implicature labeled "Indirect Criticism" in the formulaic category can be considered a representative example. There is a semantic formula to it in that it is often used in response to a request for a value judgment like "How do you like my new shoes?" When that judgment might prove offensive to the person asking, the speaker often responds with a positive remark about some peripheral feature of whatever s/he is asked to evaluate. For instance, a response of "They certainly look comfortable" might be indirect criticism if the shoes are expensive dress shoes, for which the most important characteristic would be their appearance (Bouton, 1994). Another characteristic example is (verbal) irony, where there must be "some discrepancy between the reality and the utterance, and the listener must recognize this discrepancy to interpret the utterance" (Kreuz & Roberts, 1995, p. 22). It happens when, for instance, someone responds with "Well, thanks for the help." after his/her request for a small favor is rejected.
All the other implied meanings within the present study are discussed in detail in Cetinavci and Ozturk (2017), which is one of the studies interrelated with this one in a large project. Furthermore, Appendix A shows the test item specifications so that each item can be considered also a different example of the implied meaning it represents.

Teaching pragmatics and implied meanings
Within the framework of fundamental issues regarding "pragmatics and language learning", we could first note that teaching practices and assessment in EFL contexts such as Turkey tend to be grammar-oriented (Erkmen, 2014; Tercanlioglu, 2005), and even a high level of grammatical competence does not guarantee a comparable level of pragmatic competence (Bardovi-Harlig, 1996; Jianda, 2006). Instructional interventions can compensate for this, but the literature suggests an air of "neglect" about handling "pragmatics as a learning target", especially in EFL classrooms (Rose, 2005; Segueni, 2014). In this regard, the development of processing control for accurate pragmatic performance should not be left merely to exposure, especially in EFL contexts, where there is limited chance to observe and use the target language in natural contexts (Kasper, 2001; Li, 2015; Taguchi, 2008). The literature offers parallel findings specifically about the "speed" of pragmatic processing too. Taguchi (2008) reported that the longitudinal gain in the speed of pragmatic information processing is smaller than the gain in accurate understanding in EFL environments. By contrast, ESL environments support the development of processing speed more strongly than accurate comprehension of pragmatic meaning. This could be attributed once again to the lack of abundant incidental processing practice in foreign language environments (Taguchi, 2011a). In this context, textbooks can be expected to step in, but the authenticity of the language presented in such materials is questionable (Economidou-Kogetsidis, 2015; Vasquez & Sharpless, 2009), and they fall short of providing opportunities for learning L2 pragmatics (Li, 2015; Vellenga, 2004). These flaws have the potential to put learners at a disadvantage, "for pragmatic failures are sometimes not recognized as such by non-linguists" (Economidou-Kogetsidis, 2015, p. 1) and could even be attributed to impoliteness (Bardovi-Harlig & Dornyei, 1998; Crandall & Basturkmen, 2004; Thomas, 1983).
In interlanguage pragmatics, i.e. the study of non-native speakers' acquisition of L2 pragmatics (Kasper, 1996), Bouton (1988, 1992, 1994, 1999) was the first scholar to underscore the significance of implied meanings as a communicative tool whose neglect could lead to communicative failure. Taking the demonstrated importance of implicature in daily interaction as the departure point (Bouton, 1994), which takes on added importance in light of the fact that the study of pragmatics has focused on speech acts (Bella, 2014; Roever, 2013) and to a lesser extent on implicature, he conducted studies in which he tested whether non-native speakers (NNSs) of English derive the same meaning from implicatures as native speakers (NSs) do. He found that NNSs' ability to interpret implicatures varied and could differ significantly from that of NSs. He also observed that NNSs approached NS performance only slowly, after ample communicative experience in the target language country. Bouton (1999) reported further findings from his instructional/experimental studies. He found that formulaic implicatures, which are based on a formula of some structural, semantic or pragmatic sort, proved very much teachable though they might be considerably difficult for NNSs and less susceptible to exposure effects. With reference to the instructional sessions he had designed, Bouton also reported that the more explicit instruction language learners get on implicatures, the better results they achieve in interpreting them, which was confirmed by subsequent studies focused on several different pragmatic constructs (Taguchi, 2011b).
Claiming that pragmatic competence is a neglected part of the English curriculum in Japan, Kubota (1995) designed a study in which three groups were given a multiple-choice test and a sentence-combining test. In one group, the explanations of rules were provided by a teacher; in the second, consciousness-raising tasks grew out of group discussion, while the third group functioned as a control. All the subjects received a pre-test and two post-tests. The results confirmed Bouton's finding that teaching implicature through explicit explanations and consciousness-raising tasks was highly facilitative. Among the other relevant studies with an instructional perspective are Blight (2002), who discusses his self-developed procedure for raising pragmatic awareness by providing explicit instruction in NS use of implicature, and Murray (2011), which lends empirical support to the claim that Grice's model is valuable for the training of both English language learners and teachers on implied meanings.
In light of the abovementioned considerations, explicit teaching (direct explanation of target features followed by practice), whose crucial role in pragmatic development has also been reported by such scholars as Jeon and Kaya (2006) and Taguchi (2015), was adopted as the methodological paradigm in this instructional/experimental study. The teaching materials had long been conceived as filmic materials, inspired by the premise that "films can be used effectively for analyzing speakers' language, specifically pragmatic aspects of language" (Abrams, 2014, p. 58). The rationale is that films can provide the richly contextualized exchanges (Abrams, 2014) essential for meaningful pragmatics instruction (Felix-Brasdefer, 2007; Kasper, 2006). In a similar vein, Washburn (2001) indicates that films enable learners to hear and see pragmatics accompanied by the sociopragmatic aspects of interaction. Eslami-Rasekh (2005, p. 201) specifies that "filmic materials boosted with discovery activities would let students identify what to look for, formulate and test hypotheses about language use and become reflective observers of language use in both L1 and L2". Drawing on his comparison between compliments in films and natural speech data, Rose (2001, p. 318) concluded that films can be used as "a useful source of pragmalinguistic information". Motivated by the neglect of aural-oral skills development in FL teaching in Turkey, Aydin (2005) posits that TV series acquaint learners with linguistic diversity, showing them the contribution of register and context to communication.
After specifying the methodological paradigm as explicit teaching and the teaching instruments as filmic materials, and given the myriad of implied meanings in the literature, I had to decide which ones to include in the present study. This decision also steered the development of the data collection instrument used as the pretest, posttest and delayed posttest. The following implied meaning groups were selected for the study: "Pope Questions", "Indirect Criticism", "(Verbal) Irony", "Indirect Refusals", "Topic Change", "Disclosures", "Indirect Requests (Requestive Hints)" and "Indirect Advice". It is worthwhile to explain here why they were specifically chosen.
First of all, Pope Questions, Indirect Criticism, Irony, Topic Change, Disclosures and Indirect Refusals had been included in several other studies (Bouton, 1994; Roever, 2005; Taguchi, 2005) as overtly labelled "implicatures" or "implied meanings". The speech acts of indirect requests (Rinnert & Kobayashi, 1999) and indirect advice (Matsumura, 2001, 2007), which have not been grouped together with the abovementioned implicatures before, were included in this study on the basis of considerations like those of Verschueren (2009), who observes that Grice's (1975) account of implicatures and Searle's (1975) definition of indirect speech acts are very similar, and Birner (2012), who posits that indirect speech acts are a subtype of implicature. Second, after being grounded in the literature that labels them "implied meanings", the abovementioned implicatures needed to be shown to have specifically the qualities of "formulaic" ones, as this was the type that had proved very much teachable though considerably difficult for NNSs (Bouton, 1999; 2015). In this regard, we should emphasize that Pope Questions, Indirect Criticism, Irony, Topic Change and Indirect Refusals had already been defined as having a formula of some sort (Bouton, 1994; Roever, 2005; Taguchi, 2005). For the rest, i.e. Disclosure, Requestive Hints and Indirect Advice, which have not been overtly declared formulaic, my claim was that some of their variations can be deemed formulaic, or tentatively formulaic at least, and are thus worth including in the instruction and testing in terms of teachability. This was a risk for the present study, but one worth taking, as the intention was to respond to Bouton's (1994, p. 106) call that we should be "alert to implicature types of which we are not fully aware with an eye to including them in instruction programs".

Research Questions
Within the framework set so far, the following research questions guided the study:
1) Does instruction based on filmic materials make a difference in Turkish EFL teacher trainees' comprehension accuracy of implied meanings in English?
2) Does instruction based on filmic materials make a difference in the trainees' comprehension speed of implied meanings in English?

Methodology
This study set out to test the efficiency of an instruction program devised to facilitate the better and faster interpretation of implied meanings. Accordingly, it employed a "pretest-treatment-posttest" procedure in a quasi-experimental design, where an "interventional treatment effect" was investigated by comparing control and referent groups.

The data collection instrument
An online multiple-choice test was used as the pretest, posttest and delayed posttest. Following Roever (2005), its initial version was piloted with groups at different levels of English proficiency to obtain varying perspectives. The first one consisted of two subgroups: 69 first-year EFL teacher trainees at Uludag University and 13 Turkish citizens who had been schooled and lived in an English-speaking country. The second group was 23 EFL learners at the School of Foreign Languages at Uludag University. They had been ranked at beginner/elementary level a year earlier by an official placement test. They participated in the study after a year's intensive EFL instruction. The third group was 12 NSs (5 American, 4 British, 1 Canadian, 1 Australian and 1 South African). As seven of them were later interviewed about each test item, they also functioned like Roever's (2005) NS participants producing verbal protocols.
The results seemed promising in that the test proved generally suitable for the EFL teacher trainees, to whom the implied meanings would be taught, and it reflected the variability between the different proficiency groups. The doubts that arose were resolved when seven NSs were interviewed about the test items. Where their comments overlapped, they led to warranted changes in wording, distractors and grammar.
The modified version was examined with four of the NSs who had contributed to the previous debriefing sessions. What they had in common was that they were trained and experienced in the field of language teaching (in Turkey too). New ideas that came up during the talk with any of them were later shared with the others, and consensus was sought. Apart from the abridgement/simplification work, extra characters, statements and linguistic units characterizing the interactive nature of spoken English, such as discourse markers, interjections and hesitation markers, were added to some items.
Consequently, the new test was administered online to 43 EFL teacher trainees at Uludag University, 21 NSs (13 American, 3 British, 2 Australian, 2 Canadian, 1 New Zealander), 14 EFL learners at the School of Foreign Languages at Uludag University, who had been ranked at pre-intermediate level four months earlier, and 11 high school students, who were receiving language-intensive education in order to enroll in such university programs as English Language Teaching (ELT), Translation/Interpreting Studies etc. The data were analyzed with SPSS 22. Cronbach's alpha reliability coefficient was calculated as .777. To see if there were any significant differences between the participant groups, one-way ANOVA was performed. As the assumption of homogeneity of variances was not satisfied (p<0.01), the non-parametric Kruskal-Wallis test was conducted. The test showed significant differences among the groups: χ2 = 54.589, p<0.01. The findings revealed statistically significant differences between the NSs' performance and that of all the other groups (p<0.01). This suggests that the study addressed a problem that merits pragmatic assessment and instruction.
Apart from the comparison between the NSs and the other groups, the similarities and differences in performance between particular pairs of groups could be attributed to their proficiency levels. Only between the high school students and the EFL teacher trainees was there no significant difference (p>0.05). This is predictable, as students like the former are the primary source of intake for university programs such as ELT. Therefore, it can be postulated that the teacher trainees had been in the position of the high school students a few years earlier, while some of the latter would probably be students of different ELT departments a couple of months later. For all the other pairs, significant differences emerged (all with p<0.01, except for p=0.02 between the high school students and those at the School of Foreign Languages). This is a strength of the test, as it reflected the performance variability between participants at different proficiency levels. This is important since educational assessment is supposed to discriminate between those who are assessed, and a good test should produce scores that vary between high and low performers (Biggs, 1996).
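As an illustration of the reliability coefficient reported above, the following is a minimal sketch of how Cronbach's alpha can be computed from dichotomously scored items. It is not the SPSS procedure used in the study; the function name `cronbach_alpha` and the small response matrix are hypothetical and serve only to show the formula at work.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix of item scores.

    scores: one row per test taker, one column per item
            (e.g. 1 = correct, 0 = incorrect).
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across test takers.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of the total scores.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: four test takers, three items (NOT the study's data).
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(example), 3))
```

The same variance estimator is used for the items and for the totals, so the ratio inside the formula does not depend on whether population or sample variance is chosen.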
The piloting phases are documented in even greater detail in Cetinavci and Ozturk (2017).

Research site and participants
The main study was conducted in the ELT Program of Uludag University in Bursa, Turkey. The participants comprised 144 first-year EFL teacher trainees (40 males and 104 females aged between 18 and 21). Of the larger group of 220 students who had participated at the very beginning, only their data were subject to the final analyses, as they were the ones who took the pretest, missed none of the instructional sessions (in the case of the experimental group) and then also took the posttest. They had very similar educational backgrounds and achievements in English. For practical and administrative reasons, it was not possible to use an exam like TOEFL at the beginning. Hence there are no factual data showing that the groups were comparable regarding their language proficiency, which is a limitation.

Administration of the pretest
Among the 249 students enrolled in the Contextual Grammar course taught by me, 220 took the test first as a pretest. It was administered in a large computer laboratory to five groups on five consecutive days of the same week.
A professional programmer had cooperated to develop the test as a web-based one that would run on any web browser. He wrote the code so that the system would control all the functionality. The time allotted, in minutes, equaled the number of items. Although the responding time was rigorously limited, the test takers could use as much time as they wished to read through the instructions on how to take the test. Every test taker went through the same steps: a welcoming page categorizing the participants, a background questionnaire, an instructions page and the main test section consisting of sequential pages for each item.
In addition to the introductory statement, conversation, question and four answer choices, the item pages had a standardized frame set. At the top left and right were the item number and the time left to complete the test. Below the response options was the "Submit" button to finalize the decision and move on to the following item. At the bottom of every item page were the "End Test" and "Instructions" buttons. On clicking the "End Test" button, the test takers were asked if they were sure they wanted to abandon the test, and with the newly appearing "Yes" and "Cancel" buttons, they were given the chance to resume.
Another important feature was that the system stored all the responses and recorded the average time spent for each item. Besides, a certain design feature was incorporated so that the test takers could not get back to the earlier items to change their recorded responses.
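The behaviour described in the preceding paragraphs (a total time budget of one minute per item, forward-only navigation, and stored responses with timing) can be sketched as follows. This is a hypothetical illustration and not the programmer's actual code, which is not reproduced in this article; the class name `TimedSession`, its methods and the injectable `clock` parameter are all assumptions made for the example.

```python
import time


class TimedSession:
    """Hypothetical forward-only test session with a global time limit."""

    def __init__(self, n_items, clock=time.monotonic):
        self.n_items = n_items
        # Assumption: the total budget in minutes equals the number of items.
        self.time_limit = n_items * 60  # seconds
        self._clock = clock
        self._start = clock()
        self._item_start = self._start
        self.responses = []  # chosen options, in item order
        self.seconds = []    # time spent on each item

    @property
    def current_item(self):
        return len(self.responses) + 1

    def time_left(self):
        return max(0.0, self.time_limit - (self._clock() - self._start))

    def submit(self, choice):
        """Record the answer to the current item and move on.

        There is deliberately no method for revisiting an earlier item.
        """
        if self.current_item > self.n_items:
            raise RuntimeError("the test is already finished")
        if self.time_left() <= 0:
            raise RuntimeError("time is up")
        now = self._clock()
        self.responses.append(choice)
        self.seconds.append(now - self._item_start)
        self._item_start = now

    def mean_seconds_per_item(self):
        return sum(self.seconds) / len(self.seconds) if self.seconds else 0.0
```

Passing the clock in as a parameter is a design choice for the sketch only: it lets the timing logic be checked with a fake clock instead of waiting in real time.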
Starting from the piloting phases, a main concern was to minimize the effects of differences in vocabulary knowledge and general language proficiency. To this end, besides the language of the items being simplified to a significant extent, all the salient vocabulary items were displayed underlined on the screen, and whenever a test taker positioned his/her cursor on one, the related definition from an online dictionary automatically appeared.
All the technical/practical details of the test, the process of developing it and the comparative results that it yielded between teacher trainees and NSs are documented in greater detail in Cetinavci and Ozturk (2017) and Cetinavci (2018) respectively. Its full version can be seen in Appendix B.

Experimental phase
The experimental work began after the pretest. The students had been enrolled in six different classes (officially coded with letters from A to F) according to their preferences as to when they wanted to attend the lessons, which were finalized by the university's course registration system. Among the total of 220 test takers in those six classes, the population of four classes (n=141 with 39 males and 102 females) was assigned to the experimental group while that of two classes (n=79 with 26 males and 53 females) was designated as the control group. The selection was random with every third class, i.e. classes C and F, being taken as the control groups.
The random assignment of the intact class populations was deemed appropriate as there was no reason to think that any of the classes was at a decisive advantage or disadvantage compared with any other one. Besides, as described earlier in the Methodology section, the language of the test items went through simplification work with the collaboration of English NSs and some software precautions were taken to eliminate the vocabulary knowledge problems. The aim was to put the participants face to face with the challenge of implicature interpretation only, as isolated as possible from the effects of proficiency differences. However, as mentioned before as a limitation, there was no factual data to ensure that the groups were comparable regarding their language proficiency.
The experimental group was significantly larger than the control group. It was anticipated that the number of the participants in the former who would regularly attend the instruction sessions would eventually be close to the number of the latter. The outcomes vindicated the anticipation. The number of the experimental group participants who took the pretest, regularly attended all the sessions and finally took the posttest was 77 (17 males and 60 females) while the number of the control group members who were able to take both the pretest and posttest was 67 (23 males and 44 females).

Instructional materials
Out of the 238 episodes of the sitcom "Friends (1994)", the scripts of 103 episodes had been perused over a period of more than one year preceding the conduct of the study. The aim was to extract the best conversations that exemplify the use of the implied meanings covered. The dialogues chosen from the scripts served as the template on which the treatment sessions were built. All were ultimately negotiated with and confirmed by another researcher in the field of pragmatics.
The reasons why "Friends" was chosen are several. First, I was familiar with it as I had watched it as a keen fan. That is why perusing the scripts for useful details was "just an arduous task" rather than mental and physical torture. Second, one can justifiably claim that "Friends" is one of the finest shows in television history. As Quaglio (2009) puts it, its popularity affected the public in various ways, from hairstyles to language use. The closeness of that language to everyday American English and its influence on regular conversation was most probably a key to its success. Furthermore, excerpts from it were used to exemplify features of conversational English in ESL classrooms.
The materials prepared for the treatment were not based solely on "Friends". With some help from "tvtropes.org", a wiki that collects and expands upon tropes found within creative works like TV series, films, novels, plays, video games, anime, manga, comic strips/books etc., some other conversations were used as support. The rationale was to add variety to the treatment and show that the content of the instruction is not peculiar to the language in "Friends". The sources utilized are given below:
(2002)
Orange Is the New Black (2013)
The Prince of Tides (1991)
Batman (1989)
* The course the participants were taking was geared towards American English in connection with the textbooks. In this regard, like "Friends", all the materials above but "About a Boy" are American-oriented. "About a Boy", with a statement in it by an English actor, was used for "(Verbal) Irony", which must be common to both British and American English.
** To a limited extent, the genre of commercials was used too, for GEICO's is a campaign based on Pope Questions. In each episode, an actor walks into a room and asks the viewer "Could switching to GEICO really save you 15% or more on car insurance?" After that, he asks a Pope Question like "Does it take two to tango?", immediately followed by a cut to a funny scene on the subject.

The instructional procedure
With the materials finalized, the treatment began for the experimental group. No teacher other than me was involved, so that the same methodology was applied in each class. The participants had no chance to work on the videos or transcriptions before or after the sessions. This was to minimize self-study effects and raise the possibility of attributing the results mainly to the classroom treatment. Two types of implied meanings were studied each week in one 40-minute class hour, so all eight types were covered in four weeks in 160 minutes. The last week was allocated to a revision of each type in one more 40-minute class hour, which makes the whole period 200 minutes of work completed in five class hours. Arguments could be developed on whether the treatment could have been shorter or longer. However, as Koike and Pearson (2005, p. 495) put it, "more time spent on a particular pragmatic construct during a semester is unlikely to occur, since the demands of the curriculum for the other elements of language study are unlikely to allow". Accordingly, devoting more hours to a program like the one in this study would be impractical. This could sound like an arbitrary decision. However, as Taguchi (2015, p. 32) notes, "decisions on treatment length have typically been arbitrary, reflecting practicality and convenience" in the given context.
The study had been conceived at the outset to meet the eligibility criteria in Taguchi's (2015) state-of-the-art article, which brings together the developments of instructed pragmatics over the preceding decades. Accordingly, in addition to a pre-/posttest design with a control group, fully described participants and comprehensive data showing the outcomes of the instruction, this study aimed to include also detailed information about the teaching methods employed in it. The following paragraphs and appendices are intended to do that.
The pedagogical rationale behind the program was to provide metapragmatic opportunities in which learners can reflect on cross-cultural differences and their understanding of pragmatics (Taguchi, 2015). For the practice of instruction, the template adopted was Ishihara's suggestions on how implicature might be addressed in advanced ESL/EFL classrooms:
1) Introduction of each type of implicature with the label, definition, and several examples;
2) discussion of new examples of implicature:
   - identification of the implicature;
   - explanation of how literal meaning did not hold and how the implicature was detected;
   - identification of what is actually implied in the messages;
   - illustration of learners' experiences with implicature;
   - identification of similar implicature in learners' L1s;
3) group work creating dialogues containing implicature;
4) analysis of new examples of implicature provided by the teacher or by the learners. (Ishihara, 2010, pp. 154-155)
The exemplificative figures in Appendix D show how Ishihara's (2010) steps were adapted to teach the covered implied meanings.

Administration of the posttest
Nearly four months after the pretest and 10 days after the treatment, all the available experimental and control group members took the posttest. The experimental group took it in two subgroups simultaneously in two computer laboratories. The control group took it collectively the following day in a large computer laboratory just like in the pretest conditions. The results of the experimental group participants missing even one treatment session were excluded from the comparisons. The aim was to be able to attribute the performance changes primarily to participation in the instruction.

Administration of the delayed posttest
In the literature on teaching pragmatics, "whether the pragmatic gains from instruction could be retained over time is questionable" (Koike and Pearson, 2005, p. 482). This concern was inspirational for the use of a delayed posttest in the present study.
It should be mentioned here that the experimental group had taken the posttest for 31% of their final exam assessment, whereas the delayed posttest group took the test voluntarily and only for research purposes, not collectively but under less controlled conditions, wherever and whenever they felt free to. If this is raised as a legitimate concern, the first counterargument is that there was no other way, as the participants were no longer taught by me and it was not possible to demand anything from them with reference to course credits. Secondly, the procedure for the delayed posttest was followed as such with a particular expectation: if a significant positive difference is found between the pretest and posttest scores, it is completely understandable to discuss the possibility that the difference is mainly due to the participants' motivation to obtain a certain percentage of the course credits. However, if that difference is more or less maintained in the delayed posttest, taken with no worries about failing the course, the argument that the difference arose mainly from the efficiency of the treatment would be more convincing.

Results in terms of the whole test
As the sample sizes were relatively large, the Kolmogorov-Smirnov test was used to check normality, with "p" values above "0.05" taken to indicate a normal distribution (Can, 2013). The normality tests showed that the scores were not normally distributed (p = .006 and p = .025 for the experimental and control group respectively). Therefore, the nonparametric Mann-Whitney U test was applied to compare the pre- to posttest score differences of the experimental and control group participants. The experimental group's progress exceeded that of the control group by over 15% in proportion to the scope of the test. Table 2 shows whether this was a statistically significant difference. As Table 2 reveals, a significant difference was found between the pre- and posttest score differences of the groups (p < 0.01). This suggests that the instruction had a positive effect on overall comprehension.
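The gain-score comparison described above can be sketched in Python. This is only a minimal illustration, not the analysis pipeline used in the study: the function names and the scores are hypothetical, and the "p" value relies on the normal approximation with a tie correction and no continuity correction, which is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions are 0-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, z, p), where U is the smaller of the two U statistics.
    No continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    tie_counts = {}
    for v in combined:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return min(u1, u2), z, p


# Hypothetical (pretest, posttest) scores, NOT data from the study.
experimental = [(12, 18), (10, 17), (14, 19), (11, 15), (13, 20), (9, 16)]
control = [(12, 13), (11, 11), (13, 14), (10, 12), (14, 13), (9, 10)]

# Gain score = posttest minus pretest for each participant.
exp_gains = [post - pre for pre, post in experimental]
ctl_gains = [post - pre for pre, post in control]

u, z, p = mann_whitney_u(exp_gains, ctl_gains)
print(f"U = {u:.1f}, z = {z:.2f}, p = {p:.4f}")
```

Comparing gain scores rather than raw posttest scores keeps the focus on change within each participant, which matters when the groups may already differ at the pretest. With real data, a statistics package would normally also offer an exact "p" value for small samples.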

Results in terms of the item subsets
Following the same procedural steps as those for the whole-test analyses, the Mann-Whitney U test was applied to compare the performances regarding the interpretation of each subset. As Table 3 displays, except for "Disclosure", the experimental group made progress in all the subsets, with varying degrees of superiority over the control group in proportion to the number of items in each subset. According to these results and those in the preceding subsection, the treatment produced a noticeable positive effect on overall comprehension and, more specifically, on six out of the eight implied meaning types. The "p" value was lower than "0.01" for four of the six subsets where a statistically significant difference arose. For the other two (Indirect Advice and Pope Questions), the "p" value was lower than "0.05". In this regard, the hypothesis that the instruction would make a positive difference in trainees' comprehension accuracy for implied meanings is confirmed to a considerable extent.

Retention of the treatment effects
As the scores were not normally distributed in two out of the three tests (p = .052, p < .001 and p = .002 for the pretest, posttest and delayed posttest respectively) and because the statistical assumptions for the one-way repeated measures ANOVA were mostly violated, the Friedman test was used as the non-parametric alternative to compare the pretest, posttest and delayed posttest performances of the 47 delayed posttest participants. Tables 4 and 5 display the results. The output suggests that there is an overall statistically significant difference between the mean ranks in the three tests under investigation (χ2(2) = 67.640, p < .001). As the next step, post hoc tests were run on the following combinations: posttest - pretest scores / delayed posttest - posttest scores / delayed posttest - pretest scores.
The results are presented in the table that follows. The findings display a significant difference for the pretest/posttest and pretest/delayed posttest comparisons (p < 0.01), which corroborates the previously evidenced efficacy of the instruction. Moreover, there is no significant difference between the posttest and delayed posttest scores (p > 0.05), which suggests that the gains were retained.
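The Friedman comparison of the three test administrations can be sketched as follows. This is a hedged illustration rather than the procedure actually run for the study: the function names and the scores are hypothetical. The sketch exploits the fact that, with three measurements, the statistic has 2 degrees of freedom, for which the chi-square upper-tail probability is simply exp(-x/2).

```python
from math import exp


def rank_row(row):
    """Rank the scores within one participant's row; ties share the mean rank."""
    return [
        sum(v < x for v in row) + (sum(v == x for v in row) + 1) / 2
        for x in row
    ]


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return exp(-x / 2)


def friedman_three_tests(rows):
    """Friedman test for exactly three related measurements per participant.

    rows: list of (pretest, posttest, delayed_posttest) tuples.
    Returns (chi_square, p) with the usual tie correction.
    """
    n, k = len(rows), 3
    rank_sums = [0.0] * k
    tie_term = 0
    for row in rows:
        for col, r in enumerate(rank_row(row)):
            rank_sums[col] += r
        for value in set(row):
            t = row.count(value)
            tie_term += t ** 3 - t
    chi_square = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    chi_square /= 1 - tie_term / (n * k * (k * k - 1))
    return chi_square, chi2_sf_df2(chi_square)


# Hypothetical (pretest, posttest, delayed posttest) scores, NOT study data.
scores = [(10, 15, 14), (12, 16, 16), (9, 14, 15),
          (11, 17, 16), (13, 15, 15), (8, 13, 12)]
chi_square, p_value = friedman_three_tests(scores)
print(f"chi-square(2) = {chi_square:.3f}, p = {p_value:.4f}")

# The statistic reported in the text, 67.640 with 2 degrees of freedom,
# corresponds to a p value far below .001.
print(chi2_sf_df2(67.640) < 0.001)
```

A significant Friedman result only says that at least one administration differs from the others, which is why pairwise post hoc comparisons are needed to locate the difference.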

Results in terms of pretest-posttest comparisons
As the normality tests showed normally distributed response times for both groups (p = .200), the independent-samples t-test was employed for comparison, the results of which are given in Table 6. As Table 6 shows, the t-test did not find a significant difference between the groups' pre- to posttest changes in item response time (p > 0.05). This suggests that the instruction did not make the experimental group quicker to respond to the implied meanings.

Results in terms of the delayed posttest perspective
To add this perspective to the findings, the one-way repeated measures ANOVA was used, as the normality tests indicated normally distributed scores in all three tests (p = .200, .070 and .200 for the pretest, posttest and delayed posttest respectively) and the necessary assumptions were met. Mauchly's Test indicated that the assumption of sphericity had not been violated (p = .220), which matters because a violation could make the repeated measures ANOVA yield erroneous results. After that, "Tests of Within-Subjects Effects" suggested a statistically significant difference between the response-time scores in the three tests under investigation (p = .004). Table 7 shows for which specific test pairs the differences occurred. Table 7 displays that, within the cluster of experimental group members who voluntarily took the delayed posttest, there is a significant difference between the pretest/posttest and pretest/delayed posttest response-time scores (p < 0.05), which corroborates the efficacy of the instruction in making one significantly quicker to respond to implicatures. Moreover, there is no significant difference between the posttest/delayed posttest response-time scores (p > 0.05), which suggests that the speed gains were retained.
At this point, it is worth noting that those 47 participants might be viewed as the "strong" students of the experimental group, who willingly took the delayed posttest. It might therefore seem predictable that those 47 trainees would naturally get faster between the tests and retain their gains. In that case, the following is also worth noting: while the experimental group as a whole had already proved faster than the control group in the pretest (t-test p = .014), those particular 47 did not differ significantly in speed from the control group participants (t-test p = .066). Yet although those 47 participants did not differ in speed from the control group participants in the pretest, they did significantly better in the posttest (t-test p = .019). Furthermore, their delayed posttest response-time scores were significantly better than both the pretest and posttest scores of the control group (t-test p < .001 and p = .009 respectively).
In light of these results, the treatment appears at least promising for making one significantly quicker to cognitively process implied meanings.

Discussion
This experimental study was conceived to test the effects of a film-based instructional kit devised to facilitate the comprehension of implied meanings. A significant difference was found between the pre- to posttest score differences of the experimental and control group participants in favor of the former, which amounted to a superiority of almost 15% in proportion to the whole test. As for the item subsets, the experimental group made progress in seven subsets with varying degrees of superiority. In addition, in light of the delayed posttest nearly seven months after the posttest, a significant difference was found between the pretest/posttest and pretest/delayed posttest scores, which confirms the efficacy of the instruction. Moreover, no significant difference emerged between the posttest/delayed posttest scores, which suggests that the gains from instruction were retained too.
Overall, the considerable success of the instruction based on direct explanation of the target features accords with the reported superiority of explicit approaches in teaching pragmatics (Jeon & Kaya, 2006; Taguchi, 2015; Takahashi, 2010). This success is also in line with the reports of instructional studies specifically on implied meanings (Blight, 2002; Bouton, 1994, 1999; Kubota, 1995). They confirm the central role of explicit metapragmatic explanation in helping learners properly interpret implicatures in English.
When we look at the findings on an item-subset basis, the success of the instruction specifically in "Pope Questions", "Indirect Criticism", "(Verbal) Irony", "Indirect Refusals" and "Topic Change", which had previously been reported as formulaic in the literature (Bouton, 1994, 1999; Roever, 2011), supports the inference that the effectiveness of instruction rests upon the focus on formulaic implicatures, as less formulaic forms prove resistant to instruction (Bouton, 1999). The novelty here lies in the new instruction program, which fits the established pattern in the literature based on the following premise: the formulaicness of an implied meaning can beget its teachability. The instruction engendered a significant performance increase even in "Indirect Refusals", which had been reported as relatively easy (Taguchi, 2005) and where the experimental group of this study had already scored over 90% in the pretest. The instruction proved able to improve even that performance to the extent that the experimental group participants became significantly differentiated from the control group.
The instruction also engendered positive performance changes in the variations of "Indirect Advice" and "Indirect Requests" included in the study, which had not been labelled "formulaic" in the literature. If the teachability of an implied meaning type can attest to its formulaicness, the present study's conceptualization of formulaicness and its teaching approach for "Indirect Advice" and "Indirect Requests" appear tenable. More broadly, these findings turned out to be an appropriate response to Bouton's call. As the first scholar who experimentally investigated implicature comprehension in L2, Bouton (1992) highlighted the need to broaden our understanding of the different implied meaning types that exist and to learn which could be troublesome to learners and why. This is supported by Taguchi (2005) too, who specified that integrating different implied meanings into study designs could help us better understand and learn more about pragmatic comprehension in a target language.
In this study, including Disclosure as an implicature type was another attempt to properly respond to the abovementioned calls. Like Indirect Advice and Indirect Requests, Disclosure had not been explicitly labeled "formulaic" in the literature. In this regard, including them was a risk for the present study, but one worth taking as the intention was to respond to another call by Bouton (1994, p. 106) that we should be "alert to implicature types of which we are not fully aware with an eye to including them in instruction programs". Nevertheless, Disclosure was the type on which the treatment turned out to be the least influential. The effects of the treatment could even be deemed detrimental to the interpretation of disclosure situations. On the one hand, we could postulate that the results might have been more positive if there had been more test items on Disclosure and/or if the metapragmatic explanations had been combined with supplementary production practices (Taguchi, 2015). On the other hand, if we look at the situation within the framework of the aforementioned relationship between formulaicness and teachability, we should first conclude that the conceptualization of formulaicness applied to Disclosure in this study was based on erroneous assumptions, and that disclosures can hardly be considered formulaic in a sense compatible with instruction. In this regard, the present study indicates that Disclosure is among the less formulaic implied meaning types that are resistant to formal instruction (Bouton, 1999), like those that "should not be taught at all until the need arises when specific cases prove difficult" (Bouton, 1994, p. 105). Seen from a different perspective, these results about Disclosure are still a theoretical and pedagogical contribution to the field when we reconsider the calls by Bouton (1992, 1994) and Taguchi (2005) that our understanding of different implied meanings and pragmatic comprehension should be broadened.
As for how the instruction affected the comprehension speed of implicatures, the primary finding was the lack of a significant difference between the experimental and control groups' pre- to posttest changes in item response time. This suggests that the treatment did not make the participants significantly quicker to respond to implied meanings. This could be attributed largely to the fact that explicit instruction of the kind used in the present study may be effective in developing declarative pragmatic knowledge in a relatively short time, whereas the development of procedural pragmatic knowledge, and thus "speed", takes longer and requires the abundant incidental processing practice available in ESL environments (Taguchi, 2015). Besides the shortcomings of the instruction, a plausible reason for this result is that the experimental group members felt the need to respond "slowly", as the posttest items amounted to about 30% of their final exam content. This explanation is supported by the results of the delayed posttest conducted with the volunteering 61% of the experimental group, which did point to "quickness in responses". The analyses within that cluster showed a significant difference between the pretest/posttest and pretest/delayed posttest response-time scores, which corroborates the efficacy of the treatment in also making one significantly "quicker" to respond to implied meanings. This finding came to light when the concern for grades was removed. Moreover, no significant difference was detected between the posttest/delayed posttest response-time scores, which suggests that the speed gains were retained as well.
At this point, it is worth remarking that those 47 participants might be considered the "good" students of the experimental group, who did not mind taking the delayed posttest after the whole experience had ended. One might therefore find it fairly predictable that those 47 people would naturally get speedier and preserve their pragmatic gains too. In that case, the following is also worth noting: while the experimental group as a whole (n=77) had already proved faster than the control group in the pretest, those particular 47 trainees did not differ significantly in speed from the control group participants. They were far from being a subgroup whose mere presence made the experimental group notably "quick". Moreover, although those 47 participants did not differ in speed from the control group participants in the pretest, they did significantly better in the posttest. Furthermore, their delayed posttest response-time scores were significantly better than both the pretest and posttest scores of the control group. All this suggests that the treatment has the potential to quicken those who need it. In light of these results, the instructional kit could be deemed at least promising for also making one significantly quicker to respond to implied meanings.
To sum up, we can postulate that this interventional study, which examines the effect of a particular instructional treatment on acquisition of a targeted feature (Kasper, 1999), produced some remarkable results. The participants were found to have become better equipped to deal with indirectly conveyed meanings through an inspiring instruction program. More broadly, the study took a step to bridge the gap voiced in Wyner and Cohen (2015, p. 542): "Few L2/FL teacher development courses provide practical techniques for teachers to integrate pragmatics instruction into their respective classrooms". In this regard, one should not overlook the fact that the results in question were obtained at the "first" implementation of the program. Subsequent implementations could improve with the experience gained. For instance, for the implied meanings on which the treatment proved less influential, the number of audiovisual examples could be increased and/or the explanatory notes could be revised. Apart from such details, it must be noted that the instruction, even at its initial step, drew comments from teacher candidates saying that it can provide people with a foundation for the pragmatic and intercultural communication aspects of the language and inspire language teachers to integrate the teaching philosophy and procedures of the study into their own teaching practice. The two-phased interviews that gathered such comments, the details on how the conversations proceeded, the template developed to analyze the interview data and some interviews characterizing the whole process will hopefully form another academic report.

Conclusion
To delineate the significance of the study, we should emphasize that it is a pioneering attempt to devise a special instruction program as a tangible product for facilitating the comprehension of implied meanings and to test its effects. Its content can be exploited for both explicit teaching approaches and implicit ones (withheld explanation but input and practice opportunities for implicit understanding). In that respect, the study gained a "material development" dimension. This is particularly important in light of the postulations that films are optimal for teaching pragmatic strategies and also serve as a jumping-off point for language use. Besides, the significance of this study grows as the instruction was conducted in a foreign language context, where a learner's opportunities to come into contact with the target language are limited and instruction is considered especially necessary in developing pragmatic awareness. Moreover, the instruction addressed NNS teacher trainees, who are reported to be in a disadvantageous position when compared to NS teachers in many areas including pragmatics. Given that teacher training is critical as it inevitably influences how instructional practices are used in the future, it is important that this study aimed to teach a major area of pragmatics to prospective EFL teachers, who will be expected to help their own students develop pragmatic competence too. Another point that enhances the significance of the study is that it was conducted with participants from a relatively less studied L1 background, which was a response to the call in the literature that the range of L1 and target languages needs to be extended so that researchers and educators are better able to evaluate to what extent findings from studies of a particular L1 or target language could be transferable to other language combinations.
In view of the limitations of this study and the experience gained through its conduct, some recommendations can be made for further research. First, as this study measured pragmatic comprehension with a reading instrument while people mostly "see and hear" in communication, the data collection procedures in future studies could be based on a sufficient number of corpus-based conversations, ready-made video extracts and/or dramas fictionalized for the purpose. Another recommendation concerns the identification and integration of more implied meaning types into study designs so that we can add to our understanding of pragmatic interpretation and learn which ones could be troublesome to learners and why, an attempt made by this study with the integration of "indirect advice" and "indirect requests". What is more, the range of L1s and target languages in studies on pragmatic interpretation and instruction could be expanded to help assess to what extent findings from studies of a particular L1 or target language may be valid for other language combinations. Besides all these, further research could also be conducted on how competent language learners are in terms of "producing" implicatures. This would provide a new perspective beyond the focus merely on comprehension/interpretation. As a further step, one could investigate to what extent it is possible to teach learners so that they can employ implied meanings as part of their productive repertoire when needed or possible. This would directly contribute to their general communicative competence. To that end, reconsidering the postulations that films would be an ideal medium for teaching pragmatics as a springboard for language use (Cohen, 2005), the efficiency of the film-based instruction could be tested.
Within the framework of enlightening us as to "whether understanding of one pragmatic area facilitates understanding of others" (Taguchi, 2015, p. 40), further research could investigate also if the promising instructional program developed in this study on formulaic implicatures would prove effective in the interpretation of non-formulaic implicatures too, which were considered in the literature to be more frequent when compared to formulaic ones.

Item 6:
Jack sees his classmate Jane in the faculty hallway.
Jack: "Oh, Jane. I'm so glad I ran into you. I need your help!"
Jane: "What's up?"
Jack: "I have a paper due tomorrow, but I'm working tonight in the cafe. Can you type my paper?"

• In "Friends"; Cheryl, Ross's beautiful girlfriend, asks him if he wants to come into her house!
• Ross gives an answer about "homo-erectus hunting with his wooden tool". . .

[Screenshot labels on the slide: Cheryl; Ross; Homo-erectus with his wooden tool]

As the figure suggests, the context of the exemplary situation was introduced first. The characters were shown with their names in a screenshot, which was always taken from the scene to come so that the participants could gain familiarity with it. Also, any element likely to be new was pre-taught with visual aids whenever possible (as with "homo-erectus" above). The slide above shows the first frame of the first scene used to exemplify the situations where Pope Questions can occur. With video-editing software, every scene was cut from the episode or movie including it. The beginnings and ends of the scenes were determined considering how much the participants would need for a good grasp of the context.

European Journal of Educational Research 605
After the participants saw the starting frame, the scene was played as a linked video, with an amplifier used so that everybody could follow it. Whenever the participants requested it because of a comprehension problem, the scene was played a second or third time.
• Cheryl: Um, would you like to come in?
• Ross: Did homo-erectus hunt with wooden tools?
• As we know, homo-erectus did not have automatic guns. He, of course, hunted with wooden tools. SO, asking a new question with the clear answer "YES", Ross answers Cheryl's question with an indirect but humorous "YES".
As seen, after a slide introducing the context of the scene to come and then watching the scene itself, the participants saw the conversation transcribed. The primary aim was to clear up any persisting miscomprehension. The figure gives the impression that everything in the slide was shown in one go, which was not the case. Firstly, the turns of the conversation appeared so that the participants could produce their preliminary ideas on the identification of the implicature examined. Then came the explanation of how the literal meaning did not hold and how the implicature was detected. If sought, any further clarification was given. Later, the explanation to identify what was actually implied in the message was provided. These steps were followed just as described here for two or three additional examples for all the implied meaning types. Illustration of the learners' experiences with the implied meaning and identification of any similar implicatures in Turkish came as the concluding steps.