Oddly, we all tend to think of seeing as if the human brain worked like a camera: it records the surroundings as optical images and interprets them instantly, without needing any extra time to correct images that arrive the wrong way up. The optical images projected onto the retina are literally upside down, as in a telescope. So if the brain really used those optical images directly to perceive the real world, it would have to turn the upside-down inputs the right way up before interpreting them.
That is practically what happens, in terms of optical physics, when we record a video or take a photo with a camera. But if that is the case for the human brain, then there has to be a detection mechanism that evaluates the upside-down images on the retina and restores them to their original orientation, as if a tiny human perceived the optical images before the brain itself interprets them. As a thought experiment: if the brain has to rely on this little detection mechanism to perceive the outside world instantly, then the imaginary little human must in turn have another detection mechanism akin to itself. As a result, every sub-detector needed to interpret the retinal image calls for yet another one, like a self-repeating algorithm in an endless loop. Not surprisingly, this is why the homunculus theory is a logical fallacy: it makes the ability to see harder to explain instead of more coherent.
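The regress can be made concrete with a small sketch. The code below is only an illustration of the thought experiment, not a model of the brain: the function name `observers_needed` and the `max_depth` cutoff are assumptions introduced here. Each "observer" can only see the image by handing it to another observer inside itself, so the recursion has no natural stopping point, and the cutoff exists purely to keep the demonstration finite.

```python
def observers_needed(image, depth=0, max_depth=100):
    """Hypothetical homunculus model: to 'see' the image, each observer
    must delegate to another observer inside itself."""
    if depth == max_depth:
        # The theory itself offers no base case; we stop here only so the
        # demonstration terminates. No observer has explained seeing yet.
        return depth
    # Delegate to the next inner observer, which faces the same problem.
    return observers_needed(image, depth + 1, max_depth)


# However large the cutoff, every observer up to it gets used,
# and none of them ever produces a percept.
print(observers_needed("upside-down retinal image"))
```

Raising `max_depth` only raises the count of observers used, which is the point of the fallacy: adding more inner viewers postpones the explanation without ever supplying it.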
As we already know, the homunculus fallacy is not the answer we are looking for when trying to comprehend our ability to see, because the homunculus idea is wrong at its core. The human brain does not use the optical images projected onto the retina to rebuild the outside world as a literal copy inside itself. Instead, it uses those images to reconstruct the outside world artificially, in an intricate but not impossible way. So, for the human brain, turning the upside-down images back is not a prerequisite for perceiving the outside world. We perceive a symbolic world that represents real objects. Astonishingly, in that case, each of us has our own reality, created individually by our own brain. As Ramachandran explains remarkably well below:
"In order to understand perception, you need to first get rid of the notion that the image at the back of your eye simply gets ‘relayed’ back to your brain to be displayed on a screen. Instead, you must understand that as soon as the rays of light are converted into neural impulses at the back of your eye, it no longer makes any sense to think of the visual information as being an image. We must think, instead, of symbolic descriptions that represent the scenes and objects that had been in the image." (1)
(1) Ramachandran, V. S. "Phantom Limbs and Plastic Brains." The Tell-Tale Brain: A Neuroscientist's Quest for What Makes Us Human. New York: W. W. Norton & Company, 2012. 47. Print.
Figure - 32.1 https://s.hswstatic.com/gif/optical-illusions-2.jpg
Figure - 32.2 https://readingacts.files.wordpress.com/2015/03/upside-down.jpg
Figure - 32.3 https://qph.fs.quoracdn.net/main-qimg-8f24c74a6ed3781963c385af11e42825