Appendixes



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




OCR for page 185
Systems for State Science Assessment Appendixes


Appendix A
Practical Tips

In the December 1, 2004, issue of Education Week, writer Lynn Olson described the No Child Left Behind Act (NCLB) as a bounty for test publishers. Citing a General Accounting Office study estimating that anywhere from $1.9 billion to $5.3 billion will be spent on test development and administration by 2010, Olson details the rapid growth both in the number of test publishers and in the number of contracts that states are letting to testing companies and their subcontractors. In 2002, Matt Gandal (Achieve, 2002) indicated that the testing industry would have to develop more than 200 new tests in the required subjects and grade levels just to meet NCLB requirements, and would have to do so within a window of approximately five years. Partnerships between states and test publishers are key to getting the task done.

In an effort to stimulate thinking about a number of important issues, members of the assessment directors and state science supervisors working groups that collaborated with the committee asked us to outline some of these issues in our report. The information in this appendix is drawn from the experiences of members of the committee, the working groups, and the design teams, as well as from conversations at a June 2004 meeting in Boston, sponsored by the U.S. Department of Education, at which test developers and state testing directors discussed issues of mutual interest related to science assessment. We also drew on the design team report “Building Partnerships,” in which the authors discuss what test developers need from states to build quality assessments. We encourage readers to consider these ideas; our aim is to stimulate thinking, and we make no claim that the issues we raise form an exhaustive list or that other approaches to working with contractors could not also succeed.

We would like to see more systematic attention paid to helping states and testing companies work together effectively, and we encourage organizations such as the Council of Chief State School Officers to organize regular opportunities for states to discuss these issues and to share perspectives with each other and with test publishers, state contracting officers, and representatives of state technical advisory committees.

WRITING STATE ASSESSMENT REQUESTS FOR PROPOSALS

The request for proposals (RFP) is the way that states communicate to test publishers what they expect in the design of their state science assessments. Miscommunication at this stage can lead to costly mistakes in the testing process. Below are a series of questions that states should consider before letting a contract via RFP.

General

One approach is to call for a prime contractor who will be responsible for the performance of any subcontractors. This approach has the advantage of allowing state staff to deal with one contractor, who in turn handles any problems, issues, and communication with subcontractors. It may be essential if the state does not have enough staff or capacity to manage multiple vendors. It is particularly efficient in eliminating disputes among vendors about hand-offs at transition points (whether the first vendor met the timeline, whether the material was in final form, etc.), because the prime contractor is responsible for meeting the overall deadlines and quality requirements.

A prime contractor approach may also work with more than one vendor if the testing program is divided into several stand-alone projects, with each vendor having full responsibility for an entire section of the testing program, for example, an entire grade level or subject. This assumes that students receive a separate score report for each tested subject.
If there are sufficient state staff and capacity to manage multiple vendors, contracting with several vendors directly may have several advantages. Vendors with particular specialties could bid on the part of the program for which they are uniquely qualified, potentially offering a higher quality proposal. Cost competition among vendors bidding on only one piece of a larger program may result in a lower overall price for the assessment. Small vendors or vendors with innovative approaches that might otherwise not be part of a prime contractor’s package may bring interesting ideas and cost savings to the project. State staff would communicate directly with each vendor, reducing the potential for miscommunication of directions and decisions. Finally, state staff would have access to multiple teams of psychometricians and other test development staff, who could offer a variety of solutions to problems and issues as they arise.

The following are some of the important issues that states should consider when entering into contractual agreements for the development of assessments.

Questions

Eligible Offerers
What types of entities are allowed to bid?
What is the basic product/service to be provided? Is the assessment to be paper-and-pencil or online?
How many contracts will be awarded? For what products/services?
Is the state calling for a prime contractor with subcontractors under its direction, or will individual contractors be permitted to bid on a piece of the assessment?

Contract Period
When will the work start?
When does the contractor assume authority for the administration of tests?
What is the total length of the contract?

Budget
What is the amount available or allocated, if there is a certain sum?
Will the contract run beyond the state annual budget cycle, and, if so, what are the expectations for continuation?

Authority
Who has sign-off authority in the state department of education?
Who will be the primary contact in the state department of education?

Applicable Laws, Rules, and Guidelines
What are the controlling state/federal laws or rules governing the testing program in the state?
What are the controlling state/federal laws or rules governing test security and student confidentiality?

Ownership of Test Items
Who owns test items?
Who is responsible for obtaining copyright permission for the state to use copyrighted material or art?

If any copyrighted material or art will be publicly released, who is responsible for obtaining the necessary permissions?

Test Development
Technical and quality standards to be met: What is the state’s test development process? Describe the steps: who is involved in each step, who approves each step, what the time frame is for each step, and who sets performance standards and how.
Specify standards for technical quality: Will the contractor adhere to the standards developed by the American Psychological Association, American Educational Research Association, and National Council on Measurement in Education? If not, which standards will be used?

Specification of Products
Grade levels and subjects must be specified.
Specify numbers of students to be tested annually by grade level.

Timeline
What is the timeline for test development? When are the first live tests to be administered?

Background and Contextual Information
The state should provide as much information as possible about expectations for the basic content of the assessment. If curriculum standards are very general or banded across multiple grade levels, considerable work will be needed to decide on the content of the assessment, because the standards may be too general to assess directly. Questions to be answered include: Will the assessment cover a single grade’s content, or will it be a cumulative assessment of multiple grades? What balance of content versus process is desired? States should also think through the number of standards to be assessed relative to the length of the test. A general guideline is 3–4 test items for each objective tested, so 200 standards would require a test of roughly 600 to 800 items. The question of how much the test is expected to drive instruction should also receive consideration.
Bidders need to plan for additional training, telephone assistance, and other resources if, for example, a state wants most of the assessment to consist of performance tasks that require physical materials to be supplied and used during testing, at a time when most teachers do not teach in this manner on a regular basis.

Questions

Purpose of the Assessment
Is it high stakes for students? If so, what are the consequences?
Are multiple administrations expected for each student? If so, over what span of each student’s schooling?
What type of historical files will be required or are maintained for each student? Will the contractor be expected to match individual student files over multiple years or across multiple districts?
Are the tests high stakes for districts or campuses? If so, what are the consequences?

Standards Being Assessed
What content standards are being assessed? (These should be attached or links provided.)
If standards are long or not conducive to direct assessment, have assessment objectives been established and specific test-eligible content determined? If not, what responsibilities would the contractor have in making this determination?
Describe special issues that may arise in the development of objectives.
List state groups that need to be involved and the expected numbers of reviewers, responders to drafts, etc.
What is the expected length of time for objectives to be developed and finalized?

Interface with the Current Assessment Program
Is this a stand-alone program, or will it be another assessment in an ongoing assessment program? If the latter, what requirements are there to produce an assessment that “looks like” the existing assessments? What requirements are there to produce the same type of score reports as the existing program?
Is there an expectation to have a separate set of score reports, or will score reports be integrated with other subject area score reports?
Is this program to be integrated into the existing program? If not integrated, what is expected in terms of coordination with the existing contractor?

Is this a contract that will “take over” from an existing contractor? If so, what are the timelines? Will the existing contractor be expected to transfer files? What other transition or phase-out arrangements are planned?

Any Anticipated Changes
What changes to the program are planned over the life of the contract?
What changes to the program are possible once it has started (e.g., pending state board or legislative changes that may affect the contractor’s work plan)?
What are the state growth rates in terms of numbers of students, additions of new educational entities (e.g., charter schools), and other infrastructural issues that the contractor will be required to address?

ADDITIONAL DETAILS

Questions

Test Development
What is the basic test design desired (e.g., a 1-parameter or 3-parameter item response theory model)? Are tests expected to be vertically linked or aligned between grade levels (elementary, middle, and high school)? Are scores on one test expected to be correlated with scores on another?
What is the anticipated blueprint and length of the test?
Is a custom test or an augmented norm-referenced test desired? Note: If an augmented norm-referenced test is planned, an alignment study of the “base” test items should be required. The RFP should require a test design, including how many items will be provided for each curriculum standard and the anticipated blueprint. The RFP should also require evidence of the contractor’s experience in developing augmented assessments and a sample design showing how this will be accomplished for the state.
Is the test expected to be released to the public? How often? The complete test or just a sample of the items? Are answer keys required to be provided with the released tests? Must test items on the released test be coded by the curriculum objective being tested? Is any other information expected to be made available?
What item types are expected? Performance items, multiple choice, constructed response?
Who writes the items? Who reviews them? Who trains the item writers? What are the item specifications? How will they be developed? How will universal design or alternative assessments be incorporated?

Is a field test expected? What sample size is expected? Will it be separate or embedded? Who is required to solicit district or student participation?
If performance items are expected: Who provides materials, the district or the contractor? How will the contractor know how many materials to send and where? Is the contractor responsible for reshipping by the test day if the materials don’t arrive? Are the items restocked each year? Note: States should consider that the more open-ended and performance items are included, the more costly the test.
For item review procedures prior to live use: Does the contractor or the state call meetings? What is the purpose of the meetings? How many times is each group expected to meet and for what purpose? Who selects the participants? What are the parameters of selection? Who is expected to participate (teachers, university professors, parents, members of the public, etc.)? How many people are expected to participate? How long are the meetings expected to be, and where are they expected to be held? What is the frequency of the meetings?
What stipulations are there for linguistic or cognitive demands in the items (e.g., English and Spanish, universal design, “plain language”)?
Is an item bank expected to be made available? If so, what are the specifications? Must it be queryable? By what parameters should the items be coded (e.g., curriculum objective, field test statistics)?
Quality of test instruments: What is the expected standard for graphics, print quality, paper weight, ancillary materials and equipment, etc.? What are the expectations for “sealing” sections of the test booklets? What oversight or review is expected by state staff or outside reviewers prior to final production? What are the quality control procedures and checks for test production, accuracy, etc.? Who has the final sign-off on page proofs and test booklet production?
What is the anticipated timing or timeline for these critical review tasks? Quality of scaling or equating: describe the procedures for conducting studies for scaling or equating custom tests, and describe the plan to ensure accuracy of scaling or equating of augmented tests. What review or oversight is expected on these procedures by state staff or outside experts? What is the anticipated timeline for these studies?
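The test design question earlier in this list (1-parameter versus 3-parameter) refers to item response theory (IRT) models, which also underlie many of the scaling and equating procedures just described. As a rough illustration only, the sketch below implements the standard logistic item response function: the probability that a student of ability theta answers an item correctly. The item parameter values are made up, and operational programs may use other models or include a scaling constant (commonly D = 1.7) that is omitted here. In the 1-parameter model only item difficulty b varies; the 3-parameter model adds discrimination a and a lower asymptote c.

```python
import math


def p_correct(theta: float, b: float, a: float = 1.0, c: float = 0.0) -> float:
    """Logistic IRT item response function (no D scaling constant).

    theta: student ability; b: item difficulty;
    a: item discrimination; c: lower asymptote ("pseudo-guessing").
    With a fixed at a common value and c = 0 this is the 1-parameter
    model; letting a, b, and c vary by item gives the 3-parameter model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item of average difficulty (b = 0) under each model
one_pl = p_correct(theta=0.0, b=0.0)                  # 0.5
three_pl = p_correct(theta=0.0, b=0.0, a=1.2, c=0.2)  # about 0.6

# Compare the two curves for low-, average-, and high-ability students
for theta in (-2.0, 0.0, 2.0):
    print(theta,
          round(p_correct(theta, b=0.0), 3),
          round(p_correct(theta, b=0.0, a=1.2, c=0.2), 3))
```

Because c sets a floor on the probability of a correct answer, the 3-parameter model allows for low-ability students answering some multiple-choice items correctly by guessing. Which model a program uses affects how items are scaled and equated, which is one reason the choice deserves review by state staff or outside experts.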

Test Administration
Timing of the test: When is the test to be given? If dates are not yet decided, what is the process for deciding, and when will the decision be made? Is the test given on a single day or within a testing window? Is the test to be secure or not?
Who must be tested? How many students are anticipated on any single day? Are there any student exemptions? How will accommodations be handled?
Is the contractor expected to provide customer service phone lines on test day? What types of questions will need to be prepared for?
What sampling procedures (if applicable) will be used?
How will shipping of the proper supply of materials or online connections be handled? How will the contractor obtain enrollment data? Who is responsible at the local level? What are the administration instructions, the procedures for security of materials, and the procedures for checking quantities and obtaining additional materials, if needed? What are the procedures for problem resolution (paper-and-pencil and online testing have different issues)?
Training at the local level: eligibility for testing, data collection procedures, standardization of administration, allowable accommodations.
How is the return of materials handled? What are the procedures for breaches of test security?
Test security and confidentiality of student information: Describe the current procedures in place at the local level, and explain the procedures the vendor will follow, including confidentiality procedures, secure storage requirements, numbering and sealing test booklets, the disposition of answer documents and test booklets at the end of administration, and records storage over multiple years.

Scoring and Reporting
What is the expectation for standard setting? Identify the procedure, if it has been decided, or have bidders provide a plan for recommendation.
Consider overall data collection needs: coordination with existing state collections, coordination with any other vendors, and coordination with other state assessments. What scores or data are to be reported and to whom? What data elements (e.g., demographic information) need to be collected with individual assessment responses? What form must score reports take: paper, online, or a combination?

Do score reports need to be joined to other assessments (either current or historical), to other vendors’ data, or both? What are the deadlines for scores to be returned? What are the form and content of data files that are expected to be provided to the state? What are the procedures for updating data or error correction?
Appeals: Can district personnel or parents ask for rescoring? If so, what is the process for maintaining confidentiality, and who pays?
Scoring procedures, especially open-ended scoring: When will the standards be set? How, and by whom? Will they use the whole test or subtest scores? What rubrics will be used? What training is planned? Will scoring be done online, from scanned images, or on paper? What are the procedures to ensure interrater reliability?
What are the quality control procedures, including internal tracking procedures, to ensure that the correct score is transferred to the correct student’s score report? Quality control procedures should include:
Delineation of who has sign-off authority for equating and production of score reports.
Ensuring correct scoring.
Ensuring correct output to reports.
Ensuring accurate equating.
How the score reports will be shipped and delivered to the district, the campus, and the student.
Customer service. What public relations arrangements have been made?

Contractor Issues
Stipulate as much detail as possible, or consider a two-stage process with a request for information first.
Provide ways for bidders to acquire more information or to clarify areas of potential misunderstanding: hold a vendor conference, allow for published Q&A documents, and give plenty of time between clarification and due dates.
Articulate the proposal review process: specify the timelines for review and selection and who has ultimate authority to enter into a contract.
Identify all costs for which the contractor will be responsible, including travel for committees, times when contractor staff must be available on-site, etc.

Determine the cost basis expected: per student, or based on the activity or deliverable? Clearly specify activities in detail. A cost for a fixed set of services or products may provide the best cost comparison among bidders. However, if the state has not specified every detail of the expected services, then having bidders detail what they would provide at multiple cost levels may be more helpful.
Identify areas in which the contractor may (or must) offer services to districts for a fee, if any. Identify issues of potential marketing conflict or prohibitions.
Specify possible financial penalties or incentives for the vendor for missing or meeting deadlines, etc.
Identify how changes of plan or modifications of the contract can occur: who has authority, whether notice is required, etc.
Identify any current amounts allowed or expected for specifications that may be helpful, for example, current square footage of warehouse space, number of toll-free customer service phone lines, meeting space requirements, and the location and amount of office space, if required.

Other Questions
Public relations: Describe any documents or materials expected to be developed to explain the testing program to various audiences, such as parents, the media, and legislators. Describe the media expected and the time frame and quantity anticipated.
Legal defensibility: Describe whether the contractor is expected to assist with legal defense under the current contract or whether this would require an addendum to the contract. If the former, describe the potential types of assistance expected, for example, explaining how the test development process met applicable legal and psychometric standards.
Required reports: Describe all reports expected, the audience for each, the interval expected for the reports, and the medium and quantity expected (technical digest, program activities, status reports, etc.).
Committees:
Technical advisory committee: A technical advisory committee (TAC) is needed for a high-stakes testing program. It can also be convened for other purposes, for example, review of tests against psychometric standards, item development or item review decision procedures, research and options for a variety of issues, and support for decisions that are necessary for sound testing but not popular with or accepted by policy makers or the public. Describe how many members, whether the committee is expected to be in-state or national, whether the vendor is expected to make arrangements for meetings, etc.
Ad hoc technical committees: Describe any expected to be needed for such areas as hand scoring, evaluation of the program as a whole, advice on laboratory equipment use in the testing process, and accommodations for students with disabilities or English language learners.
Management: What are the expectations for communication between the state staff and the contractor? How often are face-to-face meetings expected? (Once a month is typical, but meetings could be more or less frequent depending on the project’s complexity.) What staff are expected to be available? Are there other meetings or events for which the contractor staff is expected to be available (legislative committees, board meetings, teacher or administrator association meetings, testing conferences, training seminars, etc.)?
Specify a timeline of project deliverables (anticipated events, completions, due dates), or require bidders to propose one. Require regular reports (and specify how often) against the project deliverables. Require regular reports on problems or issues to be resolved.
Require names and résumés of the key staff to be dedicated to the project; require state approval for staff changes.

GETTING THE BEST FROM TEST CONTRACTORS

Test contractors are key to getting a state testing program into operation. Even states that plan to design their own unique assessment systems often work with test contractors or consultants who are responsible for many aspects of test design. Here we provide suggestions for making the relationship work.

Try to set up a collegial, not adversarial, relationship with the publisher. Do things to engender communication and cooperation.

Staffing is very important.
There need to be project managers on both sides (at the state and at the contractor) who can easily reach each other to make the day-to-day decisions. Both should be experienced and knowledgeable about testing and measurement. Specify a framework for the decisions that can be made at this management level and those that cannot.

Set up a decision-making hierarchy so that tough decisions can be taken to a higher level. This is important for both sides (state and contractor). The state project manager should be clear about what sorts of decisions he or she expects to be involved in. The state manager should not micromanage (let the publisher do its job) but should be specific about the things the state wants to weigh in on.

Communication and cooperation are key. Make sure the project managers on both sides communicate on a regular basis. Set up opportunities for staff at all levels on each side to meet with each other and talk about the test. Stress the teamwork aspect of the project.

It often helps to familiarize the contractor with the state (its priorities, students, and teachers) to humanize the process. Do whatever it takes to remind contractor staff that this is about children and learning, not the bottom line. (Some states told us that they invite representatives from the publisher to visit classrooms, meet teachers, and talk with students and parents.)

The state project manager should visit the test publisher’s office regularly (2–3 times a year) to check on affairs, to see firsthand how things are managed, to meet staff, and to emphasize the state’s interest in the project. It was suggested that states arrange for a kickoff meeting when the contract is awarded, a kickoff meeting and two regular meetings each year, and a postadministration meeting.

Review the contract carefully and have everything specified in detail in writing. No detail is too small to specify. (Think about the contract as if you were spending your own money; study it as you would a lease or a home sale contract.) If you expect a publisher to do something, put it in writing. For instance, if you expect a written postadministration report, specify it, along with the topics to be covered, the general length of the report, and the deadline.
Some states find it useful to build in time for the project manager to review and comment on first drafts of the report before it goes to higher levels.

Make sure that the appropriate subject matter experts are working on the test. You have a right to know about the staff assigned to your project, meet them, review their credentials, etc. Several states mentioned that it is imprudent to accept a nonscience expert as a replacement, even temporarily, unless the length of the temporary replacement is specified.

Review the proposed staffing plan very carefully. Look at who is on the project and what their credentials are, and pay close attention to the time allocations. If the bidder has put a well-known person in the staffing plan (e.g., to do the equating), make sure enough time has been allocated for that person to actually do the work and not just delegate it.

Try to negotiate the right to have final approval of all staff working on a project. This is tough to get. Most of the time the best you can get is to be informed about the staffing and their credentials and to be informed in advance of any changes to be made. However, most publishers will agree to inform you in advance of a change in the project manager. Sometimes they allow the state to comment and offer suggestions, although they won’t always take them.

There are a variety of alignment procedures; make sure the contractor uses one. Ask how they plan to evaluate the alignment between your science standards and the test items. Get specifics: What strategy will they use, who will be involved, and how will they analyze the results? There should be some alignment sessions; ask to observe or participate. If it makes sense, involve teachers from the state in the process.

Insist that they use teachers in as many ways as possible and appropriate. Teachers can be involved in alignment studies, they can help write or review items, and they can score constructed responses. This is a tremendous enrichment opportunity for teachers (but be sure that you pay teachers or give them administrative leave in exchange for their help). If the contractor is out of state, consider asking them to maintain office sites in your state, so they can easily run item writing, item reviewing, and scoring sessions in the state. (Having teachers write and review items can be problematic because of test security issues, but there are ways to do it.)

Find out exactly how they plan to handle item development. How many items will they use from their existing item banks? How many do they plan to write? Who are their item writers? Who are the item reviewers? What is the acceptance ratio for items? If it is too high (e.g., they accept most items that are written), they may not be reviewing the items carefully enough. What are the criteria for determining whether an item is acceptable?

If they’re using external item writers, specify in the contract that you will receive a list. This is important if you decide to change contractors.
Using the same item writers over time provides continuity to the testing program, and you want to be sure that the contractor won’t claim the list of item writers is proprietary.

Find out exactly how they plan to handle setting achievement standards (basic, proficient, advanced): What method will they use, who will participate, and how will the process be handled? Again, insist that they involve teachers to the extent possible and feasible. Insist on observing the process. One of the key statistics that comes out of state testing programs is the percentage of students who are proficient, and the integrity of this statistic depends on how the standard-setting process is handled.

Equating is one area in which problems with state tests have occurred (e.g., New York’s equating error). Understanding equating procedures requires specialized knowledge. Pay careful attention to the proposed plans and solicit outside expert opinion about them.

Invest in a technical advisory committee (TAC). A TAC is absolutely essential. This should be a group of people who advise the state on the more technical measurement issues associated with a testing program. Specify in the contract that the publisher is required to participate in TAC meetings. Some states require the publisher to organize the meetings, develop the agenda, hold the meetings, and prepare the minutes. This can save state personnel a great deal of time, but be sure the meetings serve the state’s needs and answer its questions. Make sure that neither the state nor the contractor holds back important information at TAC meetings, such as information that might forewarn of potential errors. Foster the teamwork aspect of the project, and make sure everyone sees the TAC meetings as a way to improve the testing program, not a way to find fault with the contractor.

There are crucial times during test development, especially the initial stages of getting a program up and running, when states need to pay close attention to the contractor’s work. If necessary, TAC members or other outside experts should be called in to help provide close oversight of what the contractor is doing.

PERSPECTIVES FROM TEST PUBLISHERS

The following material was drawn from presentations made at a U.S. Department of Education–sponsored meeting on science assessment for NCLB that was held in Boston in June 2004. At that meeting, test publishers agreed that partnerships with states are key to developing an effective system. They described, from their perspective, things that states could do to help test publishers do an effective job. We list these below.

States should clearly articulate, in advance of developing assessment systems, what type of data they want the system to generate so that the assessments can be designed to meet the goals and provide the needed results.
States should begin the assessment development process by describing what reports the assessment system needs to generate once it is in place.

States should take care to make their RFPs clear and precise. At a minimum, in their RFPs, states should:
Be very explicit as to what content standards need to be assessed, including which grade levels should be tested.
Describe what types of items should be in the assessment and how many of each type of item is desired.
Stipulate whether or not the assessment should include the use of manipulatives.
Tell the test developers if the state wants them to do validity studies.
Define who will develop items and who will train the item developers.

Describe what the needs are for data from the assessment system, including how the data will be reported and to whom. If some types of data are not needed, that should be stated in the RFP as well.
Specify the level of cognitive demand and the dimensions of performance that are to be assessed.
Articulate the minimum needs and where there is room for fresh ideas or creative approaches.

States should keep all bidders informed during the submission process. Some strategies that have worked include:
Hosting a forum at which there are ongoing opportunities to clarify and ask questions about the RFP.
Establishing a process by which prospective bidders can access all of the questions asked by other bidders and the answers given by state assessment officials.
Avoiding last-minute changes to the RFP after it has been released; if changes must be made, not making them close to the submission deadline and making sure all prospective bidders have access to them.

States should develop a realistic timetable for the RFP and the decision-making process. Make sure there is enough time between when the RFP is released and when the proposals need to be submitted. Leave sufficient time between the end of the question-and-answer period and the final submission deadline. Do not change the length of the proposal review period or the final vendor selection deadline; such changes make it difficult for the test development companies to plan and staff accordingly.

States should consider using a two-stage process in deciding on a contractor. If a state has a lot of uncertainty about its assessment system, it should consider first releasing a request for information that can later help it to shape an RFP. States should also consider awarding a small contract to a developer to help them define components of their RFP or assessment system that are new.