As we look at the world around us, we have immediate access to the composition of the visual scene into objects. Similarly, in listening to speech, we are more immediately aware of words and meaning than of the frequencies and timing of the underlying auditory features. The processing of elementary stimulus features such as edges or syllables into object recognition necessarily involves multiple stages of nonlinear computation. However, most characterizations of the function of sensory neurons -- such as the concept of a receptive field -- are based on linear assumptions, and in my view this has prevented connections from being made between theories of perception and their experimental validation. Here I will introduce a statistical framework that uses recordings from visual and auditory neurons to characterize crucial elements of their nonlinear computations. The approach is based on maximum likelihood estimation of point processes, and casts common elements of neuronal processing in a form that can be efficiently estimated from commonly available spike train data. I will present an overview of this approach, as well as examples of results from neurons at both lower and higher levels of processing in the visual and auditory pathways.
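To give a flavor of the kind of estimation this framework involves, below is a minimal sketch (not the speaker's actual method) of maximum likelihood fitting of a linear-nonlinear-Poisson point-process model to simulated spike counts. The stimulus, the "true" receptive-field filter, and all variable names are illustrative assumptions; a real analysis would use recorded spike trains and a more robust optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical white-noise stimulus: each row holds the D most recent
# stimulus samples seen by the neuron in that time bin.
T, D = 5000, 8
X = rng.normal(size=(T, D))

# Assumed "true" linear receptive field (exponential decay) and baseline.
w_true = np.exp(-np.arange(D) / 2.0)
w_true /= np.linalg.norm(w_true)
b_true = -1.0

# Linear-nonlinear-Poisson generative model: rate = exp(X @ w + b),
# spike counts drawn from a Poisson distribution in each bin.
rate = np.exp(X @ w_true + b_true)
y = rng.poisson(rate)

def negloglik(theta):
    """Poisson point-process negative log-likelihood (up to constants)."""
    w, b = theta[:-1], theta[-1]
    eta = X @ w + b
    return np.exp(eta).sum() - y @ eta

def grad(theta):
    """Gradient of the negative log-likelihood w.r.t. (w, b)."""
    w, b = theta[:-1], theta[-1]
    r = np.exp(X @ w + b) - y
    return np.concatenate([X.T @ r, [r.sum()]])

# Plain gradient descent on the (convex) objective; a real analysis would
# typically use scipy.optimize or iteratively reweighted least squares.
theta = np.zeros(D + 1)
lr = 1e-4
for _ in range(2000):
    theta -= lr * grad(theta)

w_hat, b_hat = theta[:-1], theta[-1]
```

Because the exponential-nonlinearity Poisson likelihood is concave in the parameters, this simple descent recovers the filter reliably; with enough data, `w_hat` closely matches the assumed `w_true`.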