What do outcome measures mean?




















Requiring a primary outcome measure to be set a priori prevents such cherry-picking. The larger the number of secondary outcomes, the higher the likely false-positive error rate. This is the reason why, ideally, investigators and readers should pay most attention to the primary outcome measure in the study results and why secondary outcomes should be viewed with caution.
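The way false-positive risk grows with the number of outcomes tested can be illustrated with a short calculation. This is a minimal sketch, assuming that every outcome is tested independently at the same significance level (0.05 here) and that no treatment effect truly exists; real secondary outcomes are usually correlated, so the figures are only indicative. The function name is hypothetical.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among n_tests tests.

    Assumes the tests are independent and that every null hypothesis
    is true (an illustrative simplification, not a claim about any trial).
    """
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        rate = familywise_error_rate(k)
        print(f"{k:>2} outcomes tested: chance of a false positive = {rate:.3f}")
```

Under these assumptions, a single primary outcome keeps the error rate at the nominal level, whereas each additional outcome tested pushes it higher.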

We must decide a priori which one of these will be the primary outcome measure. We cannot decide after seeing the results which to project as the primary outcome, because that might well have been a chance finding. A clinical trial that is conducted should answer the research question that was set; for example, whether a particular treatment is effective for a particular purpose.

If the sample size is small and the study fails to show that the treatment is effective, then one of 2 explanations is possible: either the treatment is truly ineffective, or the sample was too small to detect a real effect. Studies that fail because of inadequate sample size are unethical and wasteful (see sidebar). Therefore, investigators need to estimate a priori the minimum sample size required to answer a particular research question.

This value can be calculated. However, the estimated sample size will vary depending on what the research question is. As already discussed, every clinical trial will have a number of efficacy and safety outcome measures; therefore, every clinical trial will also have a number of research questions, related to the many outcome measures. Estimating the sample size needed to adequately power each research question would lead to the absurd situation of requiring many different sample sizes for the same study.

The investigator cannot solve the problem by taking the largest value for sample size, because the really important questions, such as those related to efficacy of the experimental treatment, may be answered through a much smaller sample.

To cut a long story short, the investigator chooses 1 safety or efficacy question that best justifies the trial and sets this as the primary outcome measure. The sample size is then estimated for this outcome. The study will then be adequately powered for the primary outcome but not necessarily for the secondary outcomes detailed in the plan of analysis.

Thus, statistical testing of secondary outcomes may yield false-negative results. For example, an investigator may define treatment efficacy as HDRS scores falling by at least 3 points more in the experimental antidepressant group as compared with the placebo group at the 8-week treatment endpoint in an intent-to-treat analysis. A trial whose sample size is estimated for this outcome would be adequately powered for it.
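Estimating the sample size for a primary outcome can be sketched with the standard normal-approximation formula for comparing two means. The 3-point HDRS difference comes from the example above; the standard deviation of 8 points, the two-sided significance level of 0.05, and the power of 80% are assumptions chosen purely for illustration, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(delta: float, sd: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per arm to detect a mean difference.

    Uses n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2, which
    assumes equal group sizes, a common standard deviation, and a
    two-sided test based on the normal approximation.
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * sd ** 2 / delta ** 2
    return ceil(n)


if __name__ == "__main__":
    # 3-point difference from the text; SD of 8 is an assumed value.
    print(sample_size_per_group(delta=3, sd=8))
```

A secondary outcome with a smaller expected difference or a larger variance would need more participants than this, which is why a trial sized for its primary outcome may be underpowered for the others.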

Original research articles that do not specify a primary outcome have been published and continue to be published. What should a reader make of the results of these studies? In such articles, readers should consider that there is a higher than average likelihood that the article emphasizes results that support the objectives of the study, underplaying or omitting the inconvenient findings.

As stated earlier, what the primary outcome was can be ascertained from the clinical trial registry if the article describes the results of a clinical trial and if the clinical trial was registered in the country in which it was conducted, or elsewhere. This information can be identified through a simple Internet search.

Some studies are exploratory and have no primary outcome. Such studies are at increased risk of false-positive errors. Data mining and data dredging studies are particular examples of studies at high risk of false-positive findings. Primary and secondary outcomes are specified a priori in clinical trial protocols. Sometimes, investigators test hypotheses that are set a posteriori, that is, after discovering a pattern in the findings. The results of such hypothesis testing must be viewed with caution because the pattern may represent a chance finding and may not exist in other sets of data.

Occasionally, a posteriori hypotheses may represent genuine results that were serendipitously discovered. Some guidance is available for a posteriori testing of results in subgroups. In most contexts in psychiatry, there is no gold standard for the choice of primary outcome. The primary outcome is then selected by expert consensus or by following conventional practice.

After the study is completed and the data are analyzed, it may be found that, whereas the primary outcome for efficacy is not statistically significant, 1 or more secondary outcomes for efficacy are significant. This result can be interpreted in several ways, and there is no way of knowing for certain which interpretation is correct. Readers are reminded, for example, that lamotrigine separated from placebo on most secondary outcome measures but not on the primary outcome measure in the first trial in bipolar depression (9); later, however, a meta-analysis showed that the drug is effective for this indication.

It would be a travesty of research principles to regard secondary outcome measures as outcomes to fall back on in case the primary outcome measure fails to yield results that satisfy the investigators.

Similar procedures have been developed to read blood smears for malaria. All these options have important consequences for trial logistics and cost, so careful consideration needs to be given to them when designing the trial.

Issues concerning laboratory tests of relevance to diagnosis in field trials are outlined in a separate chapter. Some trials use lay workers (fieldworkers) to measure a study outcome. These types of outcomes are usually captured in questionnaires or study forms.

Interviewing techniques and questionnaire design are discussed in a separate chapter. Fieldworkers may also measure a clinical indicator such as the body temperature or respiratory rate. Because of the high cost of using physicians, in many trials, lay workers or paramedical workers are trained to assess clinical signs and symptoms. When using lay workers or professional fieldworkers, such as nutritionists, auxiliary nurses, or nurse technicians, it is essential to train them and standardize the methods they use, with good supervision and QC procedures, in order to assure uniform implementation of these procedures in the field throughout the study.

In some trials, such as in phase IV trials, existing surveillance systems may be used to define a study outcome. These secondary data sources, in which trial outcomes are not measured directly by study staff, will have the limitations intrinsic to the quality of the existing surveillance system. Examples of such study outcomes are post-marketing passive surveillance of vaccine or drug-related SAEs, such as hospitalizations of any type, after the introduction of the intervention into general use.

Such systems could also be used to evaluate the efficacy of a new vaccine or intervention on an important outcome which, for reasons of cost or ethics, could not be measured in a phase III trial, such as the impact of a new vaccine on mortality.

All study outcomes to be used in a clinical trial need to be properly standardized. These standardization exercises could be done with real patients or mock subjects who may be trained actors. Standardization of this sort is not easy; it requires resources, time, and, in many cases, patients or volunteers willing to be examined by multiple persons. Ideally, the same set of samples, films, blood smears, subjects, or videos would be evaluated again by the same individual in a random order, under code, to allow the calculation of intra-observer reproducibility.
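Intra-observer reproducibility for a categorical reading (for example, a blood smear read as positive or negative) is often summarized with Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The source does not name a statistic, so kappa is a swapped-in choice here, and the readings below are invented purely for illustration.

```python
from collections import Counter


def cohen_kappa(first_reading, second_reading):
    """Cohen's kappa for two sets of categorical readings of the same items.

    For intra-observer reproducibility, both sequences come from the same
    observer reading the same coded samples on two occasions.
    """
    if len(first_reading) != len(second_reading) or not first_reading:
        raise ValueError("readings must be non-empty and of equal length")
    n = len(first_reading)
    # Proportion of items on which the two readings agree.
    observed = sum(a == b for a, b in zip(first_reading, second_reading)) / n
    # Agreement expected by chance, from the marginal category frequencies.
    c1, c2 = Counter(first_reading), Counter(second_reading)
    expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / n ** 2
    if expected == 1.0:
        return 1.0  # Only one category was ever used; agreement is trivial.
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical first and second readings of ten coded smears.
    first = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "neg", "pos", "neg"]
    second = ["pos", "neg", "neg", "neg", "pos", "neg", "neg", "pos", "pos", "neg"]
    print(f"kappa = {cohen_kappa(first, second):.2f}")
```

The same function applies to inter-observer agreement if the two sequences come from different observers reading the same material.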

All these procedures need to be carefully described in operating manuals and recorded, so they can be reviewed by investigators. In studies that last for several years, it is important to re-standardize observers every 6 to 12 months or if any observer needs to be replaced, to assure that the quality of the study is maintained.

An important component of an outcome definition is the description of the inclusion and exclusion criteria for the subjects to be evaluated in the trial. Ideally, the trial results should be able to be generalized to the whole population in which the intervention will be used. Under ideal circumstances, nobody should be excluded from the trial.

However, for ethical, logistic, or analytical reasons, most trials establish stringent inclusion and exclusion criteria to exclude certain persons from participation. These criteria could be established on the basis of factors such as age, gender, literacy, health status, the absence of chronic diseases other than the study outcome, or the absence of other conditions such as abnormal baseline laboratory results.

All these criteria need careful evaluation and discussion not only within the research team and the sponsor of the trial, but also with the ethics committees, the regulatory agencies overseeing the trial, and the communities in which the trial will take place, to assure that the trial results can be generalized to the intended population.

It is common practice to exclude persons who are very sick from a trial unless, of course, the trial intervention is directed at such persons. This is done because early deaths, or other SAEs in such persons, may occur independently of the trial intervention but may complicate interpretation of the effects of the intervention.

Signing a written informed consent form is now a standard inclusion criterion in most clinical trials (see Chapter 6). However, such a requirement will select a subgroup of the population who agree to sign such a form and participate in the study, generating a potential selection bias. To measure how strong that bias may be, it is important to register all eligible subjects who were considered as potential participants in the trial, indicating the reasons for refusal for those who did not enter the trial.

Preventing deaths or severe disabilities is one of the most important public health outcomes of any type of treatment or preventive intervention. It is the most important outcome in driving disease control policies and the introduction of new interventions or treatments into the population, once they have been found to be safe and effective. These types of outcomes have the heaviest weight in terms of disability-adjusted life-years (DALYs) when undertaking cost-effectiveness analyses of new drugs or interventions (discussed in a separate chapter). Therefore, trials designed to evaluate these outcomes are very important.

But, for many reasons, they may be difficult and costly to conduct, and, in many cases, they may not be feasible or ethical to do. Counting deaths in the conduct of a trial is a very sensitive issue, particularly in developing countries with poor health systems.

It may create moral issues or generate political tension that may stop the trial. Therefore, few trials are done with these important outcomes, despite their major importance. However, those trials that are done with this endpoint and which demonstrate that an intervention significantly reduces mortality are most likely to influence a policy decision on a more widespread introduction of the intervention.

In many low- and middle-income countries (LMICs), vital registration systems are of poor quality or non-existent, precluding their use. Therefore, methods are needed to identify deaths, as well as to establish causes of death. A verbal autopsy is a structured interview, conducted with the relatives of the deceased person, with the intention of reconstructing the series of events that led to the death or severe complication or disability.

Standard verbal autopsy questionnaires have been developed (World Health Organization). This interview is then analysed in a standardized way, either by physicians or using a computer algorithm, to classify the likely cause of the death, following a predefined set of criteria (Lopez et al.).

The reliability of verbal autopsy methods varies according to the cause of death, as some causes of death may be confused because signs and symptoms in the illness leading up to death may be similar. The usefulness of verbal autopsies is also dependent on the culture of the population under surveillance.

It is essential to pilot-test the translated questionnaire to assure that appropriate local words are used to ascertain signs or symptoms of the causes of death.

In many populations, there could be a wide range of reasons why deaths may not be reported, and therefore special care should be taken to ensure that ascertainment is as complete as possible. This becomes crucial when the study outcome is death in the perinatal period, since an important proportion of live-born infants who die in the minutes or hours after birth could be either missed or wrongly reported as stillbirths. In some trials, members of the study community may be hired as local informants to report any deaths.

Other techniques include enumerating all members in a community and checking for the absence of any of them in frequently conducted cross-sectional surveys. Special attention should be paid to households for which all members are absent during one of these follow-up surveys, because the death of an adult may lead to dissolution of a household or migration of household members. Enquiries should be made with neighbours in such circumstances.

Training and standardization of interviewers are essential. The frequency of surveillance will be a critical decision in designing trials with mortality outcomes, since a long recall period such as 1 year may miss deaths, particularly of children or infants; but each additional surveillance round will be expensive.

Non-clinical case definitions can also be used in trials, such as quality of life in trials of the use of chemotherapy for advanced cancer, antibiotic use in children in settings where they are available without prescription, satisfaction of users of a health service, and economic outcomes (costs), which are discussed in a separate chapter. They may also include outcomes that come directly from patients about how they feel or function in relation to a health condition and its therapy (so-called patient-reported outcomes), without interpretation by health care professionals or anyone else.

Some trials may select outcome measures that are associated with the outcome of interest (such as reported risky sexual behaviour) and that are easier to measure, cheaper, or more socially acceptable. Such measures, however, may be subject to invalidity and bias (for example, misreporting, or differential degrees of desirability bias between trial arms). A behaviour thought to be critical to reduce the disease of interest might be selected as a study outcome.

For example, in a study to investigate the effectiveness of a health education campaign to promote the use of latrines, where the ultimate objective was to reduce diarrhoeal disease, the frequency of use of latrines might be measured.

Sometimes, health-related behaviours may be measured by direct observation. Changes in knowledge or attitudes are sometimes an important initial step before a behaviour is changed, which, once changed, should reduce the risk of the disease of interest. Knowledge or attitudes can be assessed with reasonable reliability, using questionnaires or other interview methods, but observational studies may be required to determine if behavioural changes have actually occurred.

For example, in a study to investigate the effectiveness of a health education campaign to promote the use of latrines, it may be relatively straightforward to assess, after the campaign, whether individuals have a better knowledge of why using latrines is desirable, but observational studies, before and after the campaign, may be necessary to ascertain whether or not the frequency of use of latrines had actually changed, let alone whether behavioural change led to a reduction in the incidence of diarrhoea.

Similar issues arise with respect to the evaluation of a hand-washing intervention campaign. Further studies may then be needed to determine whether the changed behaviour has led to a reduction of diarrhoeal diseases. Some trials have the incidence of a self-reported behaviour as one of their outcomes. For example, in evaluating the effectiveness of sexual behaviour change interventions, it is not possible to observe sexual behaviours directly, so self-reported behaviours are frequently recorded.

But such measures are very open to desirability bias where the respondent reports the behaviour that they think the investigator would judge to be the desirable one. Furthermore, the desirability bias may be differential between the trial arms. Self-reported behaviours, though sometimes the only practical outcome for a trial, are potentially misleading and should be avoided, at least as the primary outcome measure in a trial, if at all possible. The purpose of interventions, based on vector control or environmental alteration, may be to reduce or interrupt transmission of the infectious agent of interest.

For example, in trials in which insecticides are applied to reduce vector populations in order to reduce the transmission of some infectious agent, the first step would be to determine the impact of the intervention on the vector population. If the vector population is little affected, it may be reasonable to conclude that any impact on human disease is unlikely. However, if there is a reduction in vector population, it may be erroneous to conclude that the human disease load will also fall.

A further study to determine the impact on disease may be required. Similarly, if interventions are being evaluated that may reduce indoor air pollution as a measure against respiratory disease, it may be best to focus initial studies on the assessment of changes in pollution levels, before assessing the impact on respiratory diseases.

Usually, it will be more efficient to carry out trials to monitor the impact on disease only after there is evidence of an effect on the vector or on the agent against which the intervention is directed. In order to assess a change in transmission, any or all of several different outcomes may be used. Changes in these outcomes will happen at different intervals after the intervention is in place, and studies over time may be required to measure the overall impact.

For instance, in an onchocerciasis control programme, the first evidence that intensive larviciding of Simulium damnosum (black fly) breeding sites is having an effect may be a dramatic drop in fly-biting rates in the intervention area.

Over the next several years, there may be a steady fall in the intensity of microfilarial infections among those living in the endemic area, but only after some years might it be possible to detect evidence of a fall in the prevalence of infection, and later still an impact on rates of blindness, which is the major adverse health consequence of onchocercal infection.

An important outcome of all trials is to assess the safety of the intervention under evaluation (for example, of a new drug or vaccine). Adverse events (AEs) are defined as any untoward clinical or laboratory medical occurrence in a patient or clinical investigation subject, related or not to the use of an intervention in a trial.

Serious adverse events (SAEs) include patient hospitalization or prolongation of existing hospitalization, events that result in persistent or significant debilitation or incapacity, and congenital anomalies and birth defects.

Usually, two types of study outcomes are defined: (1) the active, prospective evaluation of a set of predefined potential AEs known or suspected to be associated with the type of drug, vaccine, or product under evaluation, and (2) the recording of all clinical or laboratory abnormalities.

For both types of safety outcomes, criteria must be developed to assess the severity, as well as the incidence of AEs associated with the drug, vaccine, or product under evaluation. Severity can be measured by the magnitude of a laboratory or clinical test abnormality, or by the subjective perception on how much the AE altered the function or quality of life of the individual.

For instance, a reaction at the site of injection of a vaccine could be graded as mild if only a colour change is noted with mild pain, without induration and without any restriction on the arm or leg movement; moderate if, in addition to colour change of the skin, induration is noted and there is some restriction of movement; and severe if the subject cries out or winces if the area is touched and the arm or leg cannot be moved without pain.

In many studies, a diary card may be provided to the study subject (or, in the case of children, to the mother or caretaker) to record these reactions during a 7- or day period after the administration of a vaccine or during the drug therapy. To aid measuring an injection site reaction, a ruler may be provided to the subject, and, to standardize the measurement of temperature, a digital thermometer may be provided as well. In addition to their severity, these reactions are usually classified as unrelated, unlikely to be related, or possibly related to the intervention under evaluation.

The criteria used for this classification may include proximity of the event to the administration of the intervention (for instance, a rash developing within 20 minutes of an injection would most likely be classified as possibly related), the unusualness of the clinical event (a disease which normally occurs in that age group or a complication expected to happen in the disease under study), or even the subjective interpretation of the investigator.

Whatever criteria are used should be stated. The incidences of AEs, graded by the severity and likelihood of being related to the interventional product, are later compared between the study group exposed to the intervention and the control group using placebo or an active comparator to assess statistically if AEs of different kinds were or were not associated with the drug, vaccine, or product.
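The statistical comparison of AE incidence between the intervention and control groups can be sketched with a two-sided Fisher exact test on a 2x2 table. The source does not say which test to use, so Fisher's test is an assumption made here because AE counts are often small; the counts in the example are invented and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(ae_int: int, no_ae_int: int,
                           ae_ctl: int, no_ae_ctl: int) -> float:
    """Two-sided Fisher exact p-value for a 2x2 table of AE counts.

    Rows are the intervention and control groups; columns are subjects
    with and without the adverse event. The p-value sums the probabilities
    of all tables (with the same margins) that are no more likely than
    the observed one.
    """
    n_int = ae_int + no_ae_int
    n_ctl = ae_ctl + no_ae_ctl
    total_ae = ae_int + ae_ctl
    denom = comb(n_int + n_ctl, total_ae)

    def prob(x: int) -> float:
        # Hypergeometric probability of x events in the intervention group.
        return comb(n_int, x) * comb(n_ctl, total_ae - x) / denom

    p_observed = prob(ae_int)
    lowest = max(0, total_ae - n_ctl)
    highest = min(n_int, total_ae)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts: 12 of 200 with the AE vs 4 of 200 in controls.
    p = fisher_exact_two_sided(12, 188, 4, 196)
    print(f"two-sided p = {p:.4f}")
```

In practice the comparison would be repeated for each AE category and severity grade, which raises the same multiple-testing concerns as any set of outcomes.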

All safety measurements need careful definition in the study protocol, study forms to record them, standardized measurements and codes to register them, and active monitoring of their occurrence. Most trials require those AEs that are considered serious to be individually reported to the sponsor, to an ethics review board, to the regulatory agency overseeing the trial, and to an independent data and safety monitoring board (DSMB) for careful evaluation during the conduct of the trial. This allows the trial to be stopped or modified before its completion if it is suspected that SAEs are associated with the drug, vaccine, or product under investigation.

The choice of the outcome measures in a specific trial largely depends on the purpose of the trial and how relevant, feasible, and acceptable the measures will be in a particular study population. Furthermore, the choice may be constrained by economic, logistic, or ethical considerations.

The outcome measures chosen should reflect these objectives as fully as possible, but, when intermediate variables are used, rather than those of main interest, care must be taken to choose variables of direct relevance to the main outcome.

Core outcome sets represent the minimum that should be measured and reported in all clinical trials, audits of practice or other forms of research for a specific condition. They do not imply that outcomes in a particular study should be restricted to those in the core outcome set.

Rather, there is an expectation that the core outcomes will be collected and reported to allow the results of trials and other studies to be compared, contrasted and combined as appropriate; and that researchers will continue to collect and explore other outcomes as well. Patient-reported outcome (PRO) data, such as quality of life and symptoms, if collected, analysed and reported appropriately, can be used to inform shared decision-making, clinical guidelines and health policy.

Key considerations for researchers designing studies involving PROs, and for patient advocates and reviewers, can be found on the Centre for Patient Reported Outcomes Research web-based information resource.
