Before a Neural Network Understands an Image, It Must First Lose It

By Randley Morales


There is a strange moment inside deep learning where an image stops being an image.

A photograph enters the network as pixels.

Then, layer by layer, it begins to disappear.

It becomes edges.
It becomes textures.
It becomes patterns.
It becomes channels.
It becomes a feature map.

To a human, the original image may feel clear from the beginning. We see a cat, a face, a guitar, a room, a shadow, a shape. We recognize the world almost instantly.

But a neural network does not understand the world all at once.

It learns through transformation.

That is the idea behind my video:

Before a neural network understands an image, it must first lose it.

This project explores the connection between VGGNet, feature maps, deep learning, computer vision, cinematic guitar, and ambient sound design. It is both a technical reflection and a musical experiment.

It asks one question:

What if a neural network could hear a guitar?

What Is VGGNet?

VGGNet is one of the most influential convolutional neural network architectures in deep learning and computer vision. It was introduced by Karen Simonyan and Andrew Zisserman in the paper Very Deep Convolutional Networks for Large-Scale Image Recognition. Their work studied how increasing network depth, while using very small 3×3 convolution filters throughout, affects image recognition accuracy, and showed strong results with networks of 16 to 19 weight layers.
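The trade-off between depth and filter size can be checked with a little arithmetic. The VGG paper points out that a stack of three 3×3 convolution layers covers the same 7×7 input region as a single 7×7 layer, while using fewer weights and adding more nonlinearities. The Python sketch below reproduces that count under simplifying assumptions (stride 1, C channels in and out of every layer, biases ignored); the channel count C = 256 is an arbitrary illustrative value.

```python
# Why small filters: compare a stack of n 3x3 convolutions with one larger
# convolution that covers the same input region. Simplified setting (an
# assumption for illustration): stride 1, no dilation, C input and C output
# channels in every layer, biases ignored.

def stacked_receptive_field(n_layers: int, kernel: int = 3) -> int:
    """Side length of the input region seen by n stacked stride-1 convolutions."""
    return 1 + n_layers * (kernel - 1)


def conv_weights(kernel: int, channels: int) -> int:
    """Weights in one kernel x kernel layer mapping C channels to C channels."""
    return kernel * kernel * channels * channels


C = 256  # hypothetical channel count, chosen only for illustration
n = 3

rf = stacked_receptive_field(n)        # 1 + 3 * 2 = 7
stack_params = n * conv_weights(3, C)  # 3 * 9 * C^2 = 27 * C^2
single_params = conv_weights(rf, C)    # 7 * 7 * C^2 = 49 * C^2

print(f"{n} stacked 3x3 layers see a {rf}x{rf} region")
print(f"stack: {stack_params:,} weights, single {rf}x{rf} layer: {single_params:,} weights")
```

With these numbers the stack needs 27C² = 1,769,472 weights against 49C² = 3,211,264 for the single layer, roughly 45% fewer, while seeing the same region.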

What makes VGGNet powerful is not only its performance.

It is the simplicity of its idea.

Small filters.
Repeated layers.
Deep structure.
Hierarchical abstraction.

VGGNet does not try to understand the whole image in a single step. Instead, it builds meaning through a sequence of transformations. Each layer receives a representation of the image, extracts something from it, and passes a new representation forward.
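The recipe of small filters and repeated layers is compact enough to write down as data. This sketch encodes VGG-16 (configuration D in the paper) as a list in the style many open-source implementations use: an integer is a 3×3 convolution with that many output channels, and "M" is 2×2 max pooling. The list format and the function names are illustrative assumptions, not an official API.

```python
# VGG-16 convolutional part: ints are 3x3 conv layers (output channels),
# "M" is 2x2 max pooling with stride 2.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
VGG16_FC = [4096, 4096, 1000]  # fully connected layers; 1000 = ImageNet classes


def count_weight_layers(conv_cfg, fc_cfg) -> int:
    # Pooling has no weights, so only conv and FC layers count toward the "16".
    conv_layers = sum(1 for v in conv_cfg if v != "M")
    return conv_layers + len(fc_cfg)


def describe(conv_cfg):
    # Yield one readable line per transformation, in the order they are applied.
    in_ch = 3  # RGB input
    for v in conv_cfg:
        if v == "M":
            yield "maxpool 2x2, stride 2"
        else:
            yield f"conv 3x3: {in_ch} -> {v} channels, ReLU"
            in_ch = v


for step in describe(VGG16_CONV):
    print(step)
print("weight layers:", count_weight_layers(VGG16_CONV, VGG16_FC))  # 13 conv + 3 FC
```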

This is why VGGNet feels so interesting from both a mathematical and artistic perspective.

It turns vision into a process.

It reminds us that perception is not instant.
It is layered.

What Are Feature Maps?

A feature map is one of the hidden languages of deep learning.

When an image passes through a convolutional neural network, each convolutional layer applies a bank of learned filters and produces one activation map per filter. Together, these maps represent what the network is responding to at that stage.

In mathematical terms, a feature map can be thought of as a tensor:

X ∈ ℝᴴˣᵂˣᶜ

Where:

H = height
W = width
C = number of channels
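To make the tensor notation concrete, this sketch traces the shape H × W × C of the feature map after each stage of VGG-16's convolutional part. It assumes the setup described in the paper: a 224 × 224 RGB input, 3×3 convolutions with stride 1 and padding 1 (so height and width are preserved), and 2×2 max pooling with stride 2 (which halves them). The helper name is hypothetical.

```python
# Same layer list as VGG-16: ints are 3x3 convs, "M" is 2x2 max pooling.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]


def feature_map_shapes(cfg, h=224, w=224, c=3):
    """Return the (H, W, C) shape of X after the input and after every layer."""
    shapes = [(h, w, c)]
    for v in cfg:
        if v == "M":
            h, w = h // 2, w // 2  # pooling halves height and width
        else:
            c = v                  # padded 3x3 conv keeps H, W; sets channels
        shapes.append((h, w, c))
    return shapes


shapes = feature_map_shapes(VGG16_CONV)
for h, w, c in shapes:
    print(f"X in R^({h}x{w}x{c})  ->  {h * w * c:,} numbers")
```

The image starts as 224 × 224 × 3 = 150,528 numbers and ends as a 7 × 7 × 512 tensor of 25,088 numbers: much smaller spatially, far richer in channels.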

Each channel is a different way of seeing.

One channel might respond to edges.
Another might respond to contrast.
Another might respond to texture.
Another might respond to curves, shapes, or repeated patterns.
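Here is a toy version of one channel per way of seeing. Two hand-designed 3×3 filters, one for vertical edges and one for horizontal edges, slide across a tiny made-up grayscale image, and each produces one channel of the output feature map. This is an illustration under loud assumptions: VGGNet learns its filters from data instead of using fixed kernels like these.

```python
def conv2d_valid(image, kernel):
    """2D cross-correlation (what CNN libraries call convolution), no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    # Keep positive responses only, as VGGNet does after every convolution.
    return [[max(0, v) for v in row] for row in fmap]


# 5x5 image: dark left part, bright right part (a single vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical fixed filters standing in for learned ones.
filters = {
    "vertical_edge":   [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],
    "horizontal_edge": [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],
}

# One channel per filter: together they form a 3x3x2 feature map.
feature_map = {name: relu(conv2d_valid(image, k)) for name, k in filters.items()}
for name, channel in feature_map.items():
    print(name, channel)
```

Because the image contains only a vertical edge, the vertical-edge channel responds with values of 3 next to the boundary while the horizontal-edge channel stays at zero everywhere.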

A feature map is not the object itself.

It is what the network has learned to notice.

That idea became the emotional center of this video.

Because music works in a similar way.

A guitar note is not just a note. It is a layered representation of touch, timing, pressure, tone, memory, and space.

The Guitar as an Input Signal

In deep learning, an image enters the network as input.

In this project, the guitar becomes the input.

A single nylon string vibrates.
The vibration becomes signal.
The signal becomes tone.
The tone becomes delay.
The delay becomes reverb.
The reverb becomes space.
The space becomes emotion.

The guitar starts as something physical, but it does not remain physical for long. Once it enters the signal chain, it becomes a system of transformations.

For this video, I used my Godin ACS Nylon SA Extreme Koa HG as the source. The guitar carries the warmth of nylon strings, but also the clarity and precision needed for modern sound design.

From there, the signal moved through a carefully built chain:

Neural DSP Nano Cortex
Chase Bliss Mood MK2
Gamechanger Audio Auto Delay Pedal
Gamechanger Audio Auto Reverb Pedal
Boss XS-100
Apollo Twin X

The Walrus Audio Canvas Power 5 powered the pedals.

Every device in the signal path changed the signal.

Not randomly.

Each effect became a layer.

The guitar was the input.
The pedals were the hidden layers.
The final sound became a feature map of emotion.
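The idea of pedals as hidden layers maps directly onto function composition. In this sketch each effect is a function from a list of samples to a new list, and chain applies them in order, the way a network applies layers. The three effects (gain, a feedback delay, and a one-pole low-pass filter) are simplified stand-ins chosen for illustration; they do not model the Nano Cortex, the Mood MK2, or any other device in the chain above.

```python
from typing import Callable, List

Signal = List[float]
Effect = Callable[[Signal], Signal]


def gain(amount: float) -> Effect:
    # Scale every sample: the simplest possible "layer".
    return lambda x: [amount * s for s in x]


def feedback_delay(delay: int, feedback: float, mix: float) -> Effect:
    def apply(x: Signal) -> Signal:
        wet = [0.0] * len(x)
        for n in range(delay, len(x)):
            # Echo of the input `delay` samples ago plus a scaled copy of the previous echo.
            wet[n] = x[n - delay] + feedback * wet[n - delay]
        return [dry + mix * w for dry, w in zip(x, wet)]
    return apply


def lowpass(alpha: float) -> Effect:
    def apply(x: Signal) -> Signal:
        y, state = [], 0.0
        for s in x:
            # One-pole smoothing: move a fraction alpha toward the new sample.
            state += alpha * (s - state)
            y.append(state)
        return y
    return apply


def chain(*effects: Effect) -> Effect:
    def apply(x: Signal) -> Signal:
        for fx in effects:  # each effect's output is the next effect's input
            x = fx(x)
        return x
    return apply


impulse = [1.0] + [0.0] * 9  # a single "pluck"
echoes = chain(gain(0.5), feedback_delay(3, 0.5, 1.0))(impulse)
print(echoes)

toy_board = chain(gain(0.8), feedback_delay(3, 0.5, 0.6), lowpass(0.3))
print([round(s, 3) for s in toy_board(impulse)])
```

Feeding a single impulse through gain(0.5) and a 3-sample feedback delay gives the dry sound followed by echoes at samples 3, 6, and 9, with the feedback halving each repeat: [0.5, 0, 0, 0.5, 0, 0, 0.25, 0, 0, 0.125].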

Deep Learning and Sound Design

In a convolutional neural network, every layer transforms information.

The early layers capture simple patterns.
The middle layers create more complex relationships.
The deeper layers move toward abstraction.
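The claim that early layers see small, local patterns while deeper layers see larger structures can be made quantitative with the receptive field: the size of the input region that can influence one activation. This sketch computes the theoretical receptive field after every layer of VGG-16 with the standard recurrence r_out = r_in + (k − 1) · j_in, where j is the spacing between neighboring activations in input pixels (it doubles at each stride-2 pooling). Layer names follow the common conv1_1 through pool5 convention, and the empirically effective receptive field is usually smaller than this theoretical bound.

```python
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]


def receptive_fields(cfg, conv_k=3, pool_k=2, pool_stride=2):
    rf, jump = 1, 1  # start: one input pixel, neighbors 1 pixel apart
    block, conv_idx = 1, 0
    layers = []
    for v in cfg:
        if v == "M":
            rf += (pool_k - 1) * jump
            jump *= pool_stride  # pooling spreads later activations further apart
            layers.append((f"pool{block}", rf))
            block, conv_idx = block + 1, 0
        else:
            conv_idx += 1
            rf += (conv_k - 1) * jump  # 3x3 conv with stride 1: jump unchanged
            layers.append((f"conv{block}_{conv_idx}", rf))
    return layers


for name, rf in receptive_fields(VGG16_CONV):
    print(f"{name:8s} sees a {rf}x{rf} patch of the input")
```

Under these assumptions conv1_1 responds to a 3 × 3 patch, conv3_3 to 40 × 40, and conv5_3 to 196 × 196, nearly the whole 224 × 224 image.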

Sound design follows a similar logic.

A clean guitar tone is only the beginning. Once it enters a chain of effects, the sound becomes something else. It becomes memory, movement, atmosphere, and depth.

The Neural DSP Nano Cortex shaped the tone and dynamic response.

The Chase Bliss Mood MK2 introduced memory, fragmentation, and surreal texture.

The Gamechanger Audio Auto Delay created rhythmic space and time-based motion.

The Gamechanger Audio Auto Reverb expanded the sound into atmosphere.

The Boss XS-100 supported performance flexibility and additional sonic control.

The Apollo Twin X captured the final sound with clarity and depth.

The result was not just a guitar tone.

It was a representation.

A sonic tensor.

A multidimensional emotional map.

When a Note Becomes a Feature Map

A single note contains more information than we usually realize.

It contains:

attack
decay
resonance
timbre
harmonic color
finger pressure
room reflection
silence
emotion
time
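Some items on this list, such as attack, decay, timbre, and harmonic color, have direct signal-level counterparts. This sketch synthesizes a toy note as a sum of harmonics (timbre and harmonic color) shaped by an envelope with a linear attack and an exponential decay. It is simple additive synthesis with made-up parameter values, not a model of a nylon-string guitar; finger pressure, room reflection, and emotion are not modeled.

```python
import math


def pluck(f0=220.0, seconds=1.0, sr=8000, attack_s=0.01, decay_rate=4.0,
          harmonics=(1.0, 0.5, 0.25, 0.125)):
    """Toy note: harmonic partials of f0 under an attack/decay envelope."""
    n = int(seconds * sr)
    out = []
    for i in range(n):
        t = i / sr
        # Envelope: linear rise during the attack, then exponential decay.
        if t < attack_s:
            env = t / attack_s
        else:
            env = math.exp(-decay_rate * (t - attack_s))
        # Timbre: weighted sum of integer multiples of the fundamental.
        tone = sum(a * math.sin(2 * math.pi * (k + 1) * f0 * t)
                   for k, a in enumerate(harmonics))
        out.append(env * tone)
    return out


note = pluck()  # about one second of a 220 Hz (A3) toy note
print(len(note), "samples, peak", round(max(abs(s) for s in note), 3))
```

Raising decay_rate shortens the note, and shifting weight toward the later entries of harmonics brightens it.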

In the same way that VGGNet extracts features from an image, a musician extracts feeling from a note.

The first layer is touch.

The second layer is tone.

The third layer is space.

The fourth layer is memory.

The fifth layer is emotion.

By the time the listener hears the final sound, the original note has already passed through many transformations. It has been reshaped by gear, performance, room, intention, and time.

This is why the idea of a feature map feels so close to music.

A feature map is not the original image.

A finished guitar tone is not the original vibration.

Both are transformed versions of reality.

Both reveal what the system has learned to emphasize.

Computer Vision and Cinematic Guitar

This project was not only about sound.

It was also about image.

I shot the video using the Sony FX3 with the Sony 35mm GM lens because I wanted the visual world to feel cinematic, intimate, and precise.

The 35mm field of view gives the frame a natural perspective. It feels close to the subject without becoming distorted. It allows the guitar, the hands, the screen, and the atmosphere to live together in the same visual space.

The screenshots of VGGNet feature maps became more than technical references. They became part of the visual language.

The screen shows the machine’s hidden perception.

The guitar shows the human gesture.

The camera connects both worlds.

Computer vision becomes visual poetry.
Guitar becomes mathematical expression.
Deep learning becomes atmosphere.

Music, Mathematics, and Machine Learning

My work has always lived between disciplines.

Mathematics teaches structure.
Machine learning teaches representation.
Music teaches emotion.
Photography teaches light.
Guitar teaches touch.

This video brings those worlds together.

VGGNet is not just an architecture. It is a way of thinking about perception.

A neural network learns by transforming data into representations. A musician learns by transforming experience into sound.

One uses filters.
One uses fingers.

One learns from tensors.
One learns from silence, repetition, instinct, pain, memory, and discipline.

But both are searching for the same thing:

meaning beneath the surface.

That is why deep learning and music belong together in this project.

Not because artificial intelligence replaces the artist.

But because artificial intelligence gives the artist another metaphor for understanding perception, abstraction, and transformation.

Maybe the solution isn’t just in the code.

Maybe it’s in the silence between the notes.

AI Music Is Not Only About Generating Songs

When people hear the phrase AI music, they often think of artificial intelligence generating melodies, voices, or full compositions.

But AI music can mean something deeper.

It can mean using machine learning as a creative lens.

It can mean thinking about music through representation, layers, embeddings, feature spaces, transformations, and patterns.

In this video, VGGNet does not generate the music.

Instead, VGGNet inspires the structure of the music.

The concept of feature maps becomes a way to think about sound.

The idea of hidden layers becomes a way to think about effects.

The movement from pixels to meaning becomes a way to think about the guitar signal moving from vibration to emotion.
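For audio, the feature-map analogy can be taken literally: a spectrogram is a single-channel feature map, an H × W × 1 grid whose rows are frequency bins and whose columns are moments in time. This sketch computes a magnitude spectrogram with a Hann window and a naive discrete Fourier transform, written in plain Python for readability rather than speed (a real project would use an FFT library). The frame and hop sizes and the 1 kHz test tone are arbitrary choices, and this is not presented as part of the video's workflow.

```python
import cmath
import math


def spectrogram(x, frame=256, hop=128):
    """Magnitude spectrogram as a 2D list: rows = frequency bins, columns = time frames."""
    # Hann window tapers each frame's edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    columns = []
    for start in range(0, len(x) - frame + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame)]
        # Naive DFT, keeping bins 0 to frame/2 (the rest mirror them for real input).
        column = [
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame) for n in range(frame)))
            for k in range(frame // 2 + 1)
        ]
        columns.append(column)
    # Transpose: H = frequency bins (frame/2 + 1), W = number of frames.
    return [list(row) for row in zip(*columns)]


sr = 8000
sine = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(2000)]  # 0.25 s at 1 kHz
S = spectrogram(sine)
peak_bin = max(range(len(S)), key=lambda k: S[k][0])
print(f"feature map shape: {len(S)} x {len(S[0])} x 1")
print(f"strongest bin in first frame: {peak_bin} (~{peak_bin * sr / 256:.0f} Hz)")
```

With a 256-sample frame at 8 kHz each bin spans 31.25 Hz, so the 1 kHz tone lands exactly in bin 32 of a 129 × 14 × 1 map.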

This is where artificial intelligence becomes art direction.

Not a replacement for the human.

A mirror for the human.

The Philosophy of Layers

A neural network sees in layers.

A musician feels in layers.

A photographer captures light in layers.

A mathematician studies structure in layers.

A composer builds emotion in layers.

The deeper we go, the less obvious the object becomes. But sometimes, losing the surface is the only way to discover what is underneath.

That is what VGGNet teaches me.

The first layer sees edges.
The next layer sees patterns.
The deeper layers see abstraction.

Music works the same way.

The first layer is the note.
The next layer is the tone.
The next layer is the space.
The deeper layer is the feeling.

At the surface, we hear a guitar.

But underneath, there is a whole architecture of emotion.

Why This Video Matters

This video is a meditation on perception.

It is about how machines see.
It is about how musicians feel.
It is about how mathematics, sound, and image can become part of the same creative system.

The screenshots show a neural network transforming an image into feature maps.

The guitar performance transforms a sound into atmosphere.

The camera captures the relationship between both.

This is the intersection where I feel most alive creatively:

deep learning
computer vision
mathematics
cinematic guitar
ambient sound design
visual storytelling

It is not only a music video.

It is an experiment in representation.

A cat becomes a feature map.
A note becomes a universe.
A network learns to see.
A guitar remembers.

And somewhere between mathematics and emotion, the machine begins to dream.

Gear Used in This Video

Camera:
Sony FX3

Lens:
Sony 35mm GM

Guitar:
Godin ACS Nylon SA Extreme Koa HG

Effects and Sound Design:
Neural DSP Nano Cortex
Chase Bliss Mood MK2
Gamechanger Audio Auto Delay Pedal
Gamechanger Audio Auto Reverb Pedal
Boss XS-100

Power:
Walrus Audio Canvas Power 5

Recording:
Apollo Twin X

Final Reflection

Before a neural network understands an image, it must first lose it.

Before a musician understands a note, maybe the same thing must happen.

The note must stop being just a note.

It must become memory.
It must become texture.
It must become space.
It must become pain.
It must become atmosphere.
It must become meaning.

That is the real connection between VGGNet and guitar.

Both begin with raw input.

Both pass through layers.

Both search for something hidden.

And both remind us that the deepest forms of understanding often happen after the original image disappears.