Author information
- James E. Udelson, MD, FACC
- Reprint requests and correspondence: Dr. James E. Udelson, Tufts Medical Center, 750 Washington Street, Box 70, Boston, Massachusetts 02111.
A class of agonists relatively specific for the adenosine A2a receptor is under development for pharmacologic stress testing in conjunction with myocardial perfusion imaging (1–4). The hope is that A2a receptor selectivity will reduce the side effects that accompany the use of adenosine or dipyridamole for pharmacologic stress, possibly allow safe use in patients with reactive airways disease, and, at the same time, provide imaging data similar to those obtained with nonspecific adenosine-receptor agonists. Two of these agents, regadenoson and binodenoson, are well along in development, and in this issue of JACC: Cardiovascular Imaging, data from the second pivotal trial of regadenoson are reported (3). The investigators conclude that perfusion imaging with regadenoson is noninferior to adenosine with regard to the extent of reversible defects produced, and that side effects are reduced, consistent with the agent's A2a receptor selectivity.
Do these agents work?
Stimulation of adenosine A2a receptors, given appropriate dosing, should result in coronary arteriolar vasodilation, reduction in coronary resistance, and, thus, in an increase in coronary flow velocity and reserve. With that as a metric, all 3 of the agents that have recently reached human trials—regadenoson, binodenoson, and apodenoson—work well. They have all been studied using coronary flow velocity or reserve measures in patients undergoing catheterization, and all have shown similar increments in coronary flow measures when compared to intracoronary adenosine (5–7). Thus, they all clearly do what they are supposed to do.
For regulatory purposes, however, these agents must demonstrate efficacy when paired with the tracers and imaging methodology that will accompany them in clinical practice: in this case, radionuclide perfusion tracers, single-photon emission computed tomography (SPECT) imaging, and some method of clinical analysis of those images. Therein lies the rub.
Our repertoire of approaches to demonstrate efficacy of any new imaging modality to a regulatory standard is limited. Assessing sensitivity and specificity for detecting a certain threshold of anatomic coronary disease is plagued by issues of referral bias, among other problematic issues, such that a true measure of performance is difficult to obtain. Thus, development strategies regarding efficacy for regadenoson and binodenoson have focused on concordance, the concept that the image data using the new stress agent should provide clinically similar imaging information as that derived from a standard approved agent, usually considered in these trials to be adenosine.
When are 2 images concordant, and how do you measure that objectively?
When reading paired images side by side, say when comparing a patient's old SPECT or echo study with a new study, experienced readers generally feel comfortable judging whether the new study is similar or different when compared to the old study. However, where is the border distinguishing similar and different? Our perhaps arrogant “feeling” that we readers know concordance or discordance when we see it does not translate well into the clinical trial or regulatory environment, where objective definitions need to be specified clearly and prospectively. Moreover, side-by-side reading, in the context of a trial examining whether a new agent is similar to an approved agent, is biased toward finding agreement.
With SPECT imaging, a semiquantitative segmental scoring system for perfusion at stress and rest is usually incorporated, with the difference between the stress and rest scores representing reversible defects on a segmental basis, and the summed difference score for the 17 segments giving 1 number that incorporates the extent and severity of reversible defects (8). We like to believe that this approach is “validated,” insofar as there is a general correlation between the summed stress and the summed difference scores and various natural history outcome events during follow-up, based on the extensive literature (9). These scores are often grouped into categories labeled normal, mild, moderate, and severe to correspond to clinically relevant nomenclature. Thus, a sheen of objectivity is added to the image analysis, though it remains the product of a human eyeball endeavor.
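The arithmetic of this segmental scoring scheme can be sketched in a few lines. This is an illustrative sketch only, assuming the conventional 0-to-4 segmental scale; the function and variable names are ours, not those of any clinical software:

```python
# Sketch of the 17-segment semiquantitative scoring described above.
# Each segment is scored 0 (normal) to 4 (absent uptake) at stress and rest;
# the summed difference score (SDS) collapses the extent and severity of
# reversible defects into a single number.

def summed_scores(stress, rest):
    """Return (SSS, SRS, SDS) for paired 17-segment score lists."""
    assert len(stress) == len(rest) == 17
    sss = sum(stress)                                        # summed stress score
    srs = sum(rest)                                          # summed rest score
    # Per-segment reversibility: stress worse than rest, clamped at zero.
    sds = sum(max(s - r, 0) for s, r in zip(stress, rest))   # summed difference score
    return sss, srs, sds
```

For example, a study with 4 segments scored 2 at stress and 0 at rest (all other segments normal) yields SSS = 8, SRS = 0, SDS = 8, all from human eyeball assignments of the underlying segmental scores.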
The current study
In the current regadenoson study (3), patients underwent a clinically indicated adenosine study and were then randomized, in a 2:1 ratio, to a regadenoson study or a second adenosine study within 4 weeks. The stress and rest images were analyzed in a central core lab, with expert readers assigning segmental perfusion scores. A segment was defined as reversible if it had a difference score of ≥1 (i.e., reversible did not distinguish the severity of the defect). The investigators then grouped the number of reversible segments into categories representing the extent of reversibility for the patient: none-to-minimal (0 or 1 reversible segments), small-to-moderate (2 to 4 reversible segments), or large (≥5 reversible segments). The hypothesis was that agreement between patients' adenosine and regadenoson scans, defined as the percentage of patients whose initial adenosine study and subsequent randomized regadenoson study fell into the same category of extent of reversible defects, would be no worse (noninferior) than the agreement observed when adenosine scans were done twice. The resulting agreement fell within the investigators' pre-specified noninferiority boundaries, so the investigators considered the trial positive for efficacy.
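The agreement end point just described can be sketched as follows. This is a minimal illustration under the category definitions given above; the function names and the example data are ours, not the trial's:

```python
# Sketch of the trial's agreement end point: count reversible segments per
# study, map the count to an extent category, and compute the percentage of
# paired studies falling into the same category.

def extent_category(n_reversible):
    """Map a count of reversible segments to the trial's extent categories."""
    if n_reversible <= 1:
        return "none-to-minimal"    # 0 or 1 reversible segments
    if n_reversible <= 4:
        return "small-to-moderate"  # 2 to 4 reversible segments
    return "large"                  # 5 or more reversible segments

def percent_agreement(pairs):
    """pairs: list of (n_reversible in study 1, n_reversible in study 2)."""
    agree = sum(extent_category(a) == extent_category(b) for a, b in pairs)
    return 100.0 * agree / len(pairs)
```

Note that the statistic is computed on the categories, not on the raw segment counts, which is exactly where the splitting and lumping issues discussed below arise.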
So from the investigators' pre-specified perspective, the trial is positive. However, one could also say that 62% agreement, when the exact same adenosine test is done a second time, sets a low bar for analyzing a new test. How did that modest agreement happen? We like to think that SPECT imaging is reasonably reproducible, based on published literature mostly emanating from single centers in small numbers of patients, using varying analytic methodologies (10–12). I believe the data in the current study have exposed some of the important issues in how we analyze SPECT images for clinical trial purposes for any agent and, thus, represent a very important contribution.
Splitting and lumping
The imperative to categorize results to generate ordinal data for analytic purposes may actually create difference out of similarity. Given the category cut points, a patient whose 2 studies had 2 and 4 reversible segments would be labeled agreement, whereas another patient whose 2 studies had 4 and 5 reversible segments would be labeled disagreement. This is simply a consequence of the need to impose arbitrary cut points. Thus, the agreement between regadenoson and adenosine may actually be better than as analyzed by the investigators.
On the other hand, collapsing the segmental difference scores into reversible or not reversible per segment (essentially making any severity of difference scores of 1, 2, 3, or 4 equivalent) masks variability in the scoring by minimizing the range of scores being categorically compared. It also creates possibilities for the 2 studies from an individual patient to theoretically be very different with regard to the extent and severity of ischemia but be called agreement. A study with 2 segments with difference scores of 2 (summed score = 4) would be considered to agree with a second study with 4 segments with a difference score of 4 (summed score = 16), because both fall into the 2-to-4-reversible-segments category, though the full extent and severity of reversible defects (reflected by the summed difference scores) is not at all similar. Hence, the analytic lumping method leaves open the possibility that the agents' efficacy may not be similar. Because the catheterization lab flow studies show a similar increment in coronary flow with regadenoson compared with adenosine (5), dissimilarity of efficacy is not likely to be the case. However, the analytic approach for the primary agreement end point in this analysis may not be conclusive on this point because of these issues, which could apply to any new SPECT imaging agent.
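Both artifacts can be made concrete with a few lines of code. This restates the category cut points described above; the worked numbers come directly from the examples in the text, and the variable names are ours:

```python
# Illustration of the splitting and lumping problems with categorical
# analysis of reversible-segment counts.

def category(n):
    """Extent category for a count of reversible segments."""
    return "none-to-minimal" if n <= 1 else "small-to-moderate" if n <= 4 else "large"

# Splitting: near-identical pairs can straddle a cut point.
print(category(2) == category(4))  # 2 vs 4 reversible segments -> agreement
print(category(4) == category(5))  # 4 vs 5 reversible segments -> disagreement

# Lumping: very different ischemic burdens share a category.
study_a = [2, 2]            # 2 segments with difference score 2 -> SDS = 4
study_b = [4, 4, 4, 4]      # 4 segments with difference score 4 -> SDS = 16
print(category(len(study_a)) == category(len(study_b)))  # both "small-to-moderate"
print(sum(study_a), sum(study_b))                        # summed difference scores differ fourfold
```

A 1-segment difference thus flips the verdict at a boundary, while a fourfold difference in summed difference score does not, which is the core of the analytic concern.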
Analytic variability and the need for quantitation
The adenosine–adenosine data illustrate the variability that ensues when humans (even highly expert humans) assign semiquantitative scores to a 17-segment model for 2 separate SPECT studies from the same patient, in a rigorous analytic environment that must be in keeping with the Food and Drug Administration's (FDA) guidance document for imaging in clinical trials (13).
There are many potential sources of variability, including biologic variability of coronary flow response to adenosine-receptor stimulation and acquisition variables. However, it is likely that human variability in assigning the segmental scores plays a large role. Intra- and inter-reader reproducibility data are not reported for this data set to help understand that element. Other modalities also have substantial variability when rigorously analyzed (14,15). The potential advantage of radionuclide imaging is its inherently digital nature, which lends itself to more objective quantitation. Why was automated quantitative analysis not used here? Indeed, in the regadenoson phase 2 published study (16), agreement with adenosine was better with quantitative analysis compared with the human analysis. Quantitative programs are almost universally available and, although not without flaws, would at least remove the human variability element and possibly would have allowed the investigators to more closely approach the truth of potential concordance of the new agent compared with the old.
Consistent with regadenoson's degree of selectivity for the A2a receptor, the investigators report a modest reduction in some of the common side effects of adenosine testing. Of the 3 most common, chest pain and flushing were lower, and dyspnea was numerically greater with regadenoson in what appears to be a combined analysis from the 2 pivotal trials. In the previously reported initial pivotal trial (4), dyspnea prevalence was higher with regadenoson. Overall, in the combined analysis, a patient tolerability score favored regadenoson (p < 0.05). The investigators do not clearly state how the 3 major or the 7 total side effect categories were to be analyzed, nor do they indicate whether their approach called for corrections for multiple testing. It is stated that a pre-defined composite severity score was calculated for chest pain, dyspnea, and flushing, and that this favored regadenoson, but it is not stated whether the score was a patient or a physician assessment. The strength of this finding is somewhat softened by the concepts that a composite result is strongest when all of its components move in a directionally similar manner (17) and that the comparative data emanate from different patients in this parallel design study.
Based on the coronary flow data alone, it appears that regadenoson, binodenoson, and apodenoson all result in the expected and adequate increase in coronary flow when administered intravenously (5–7) and, thus, should be efficacious as pharmacologic stress agents for SPECT imaging. The problem well illustrated by the 2 published pivotal trials of regadenoson is related to analytic methodology of SPECT perfusion images, in that a signal of concordance is hard to discern because of the noise incurred by segmental human scoring and the splitting and lumping of the resultant numerical data. This would likely be no different if binodenoson or apodenoson had been the agent under study using these methods. It is unfortunate that quantitation was not used, as that would have introduced some objectivity into the analysis and removed at least 1 source of variability (at least theoretically).
It is important to note, however, that the role of fully automated quantitative analysis in a development program for a new imaging agent (at least for cardiac purposes) is not clearly defined in the FDA's guidance documents, and thus, commercial sponsors of new agents are appropriately wary of relying too heavily on that method. The most important lesson to be learned here is that it is time for the American College of Cardiology, the American Society of Nuclear Cardiology, and our other imaging society colleagues to engage with the FDA to critically assess analytic methodology for cardiac imaging, with an eye toward incorporating quantitative methods appropriately into future FDA guidance documents. All cardiovascular imaging fields would benefit, and the development of new agents would be facilitated without the distraction and delay engendered by individual sponsors trying to figure out how best to analyze images.
Dr. Udelson is both a consultant and the principal investigator for the binodenoson trial program, and he has received compensation for his time from King Pharmaceuticals R&D.
⁎ Editorials published in JACC: Cardiovascular Imaging reflect the views of the authors and do not necessarily represent the views of JACC: Cardiovascular Imaging or the American College of Cardiology.
References
- Glover D.K., Ruiz M., Takehana K., et al.
- Udelson J.E., Heller G.V., Wackers F.J., et al.
- Cerqueira M.D., Nguyen P., Staehr P.
- Underwood S.R., Iskandrian A.E., ADVANCE-MPI Trial Investigators
- Hendel R.C., Taillefer R., Crane P.D., Widner P.J.
- Cerqueira M.D., Weissman N.J., Dilsizian V., et al.
- Klocke F.J., Baird M.G., Lorell B.H., et al.
- Johansen A., Gaster A.L., Veje A., Thayssen P., Haghfelt T., Holund-Carlsen P.F.
- Food and Drug Administration
- Hoffmann R., Lethen H., Marwick T., et al.
- Hoffmann R., Marwick T.H., Poldermans D., et al.
- Hendel R.C., Bateman T.M., Cerqueira M.D., et al.