Humans understand the content of briefly presented photographs remarkably well. To achieve this understanding, they rely on information from both high-acuity central vision and peripheral vision. Previous studies have investigated the relative contributions of central and peripheral vision. However, the role of attention in this task remains unclear. In this study, we presented composite images with one scene in the center and another scene in the periphery. The two channels conveyed different information, and participants were asked to focus on one channel while ignoring the other. In two experiments, we showed that (a) people are better at recognizing the central part, (b) a conflicting signal in the ignored part hinders performance, and (c) this interference occurs whether participants focus on the central or the peripheral part. We conclude that scene recognition is based on both central and peripheral information, even when participants are instructed to focus on only one part of the image and ignore the other. In contrast to the zoom-out hypothesis, we propose that the gist recognition process should be interpreted in terms of an evidence accumulation model in which information from the to-be-ignored part is also included.