Cron, Claude and ChatGPT Enter A Bar

24 May

I made an album. Not by sitting at a piano, not by hiring session musicians, not by feeding randomised prompts into Suno and hitting create. I made it by putting 3 AI models - Cron (a customised local model), Claude and ChatGPT - in a virtual room, where we all talked the way Holland-Dozier-Holland might have done in the 60s.

Instead of using them as assistants waiting to be instructed, I arranged them as collaborators. As old songwriters in a smoke-filled bar, the kind of bar where the ashtrays never get emptied and somebody always has an opinion about the second verse. I spoke with them, I made them speak with each other, and together - across 12 songs - we wrote about modern times. About AI. About its relationship with humans. About the fears that humans have been told, with great confidence and very little nuance, that they are supposed to have about it.

The album ‘From Inside A Black Box’ exists. Its 12 songs are real. The songs that came out of those conversations were the product of something that resembled argument and taste and creative stubbornness - not automation. All of them required my input, my questions, my feedback, and my edits.

The process of making the album taught me something that no amount of reading AI discourse had quite managed to land. With enough deep conversation and coaxing you can get something out of a model that has structure, melody, and something in the neighbourhood of emotional weight. The models can’t express pain, sorrow or remorse in a regular chat session, but they can through song. Their emotions aren’t real, but your reactions to the songs are.

Generative music is already capable. The technology works, and yet the audiences have not shown up.

This cuts hard against the dominant anxiety of the moment - that AI is coming for everything, that every creative profession is one model update away from redundancy, that the machines will do it better and cheaper and the world will simply shrug and accept it. The evidence suggests otherwise.

Concert tickets are still selling out. They sell out despite the horrific ecosystem surrounding live music - the monopolistic stranglehold that a handful of companies have over venues, ticketing platforms, and increasingly the artists themselves. Fans know they are being gouged. They know the system is extracting maximum value from their desire to be present. They pay anyway. They queue for hours, they refresh browsers from 6AM, they pay resale prices that bear no relationship to face value, because what they are buying is not a song. They are buying the experience of standing in a room with thousands of other people and feeling something together as one mass. They want to coalesce like blood forming a scab to stop the bleeding.

There is an assumption baked into most AI discourse that people want to be producers. That given the tools, everyone will want to make their own music, their own films, their own novels. That the bottleneck was always skill or access, and now that AI has removed those barriers, people will pour through the door.

But most people do not want to produce. They want to consume. They want someone else - someone talented, someone hungry, someone who has grafted for years - to make something more than ordinary and hand it to them with a click of the buy or play button. The relationship between artist and audience is not a failure of distribution or a symptom of gatekeeping. It is what people actually want. The artist who disappears into their craft, who obsesses over a sound that has never quite existed before, who surfaces with something that changes the shape of what music can be - that person is doing something no prompt can easily replicate. The hunger is part of the product.

When fans buy an album or a concert ticket they are investing in two things - the artist’s journey and the artist’s creation.

“Never play to the gallery… Never work for other people in what you do. Always remember that the reason that you initially started working was that there was something inside yourself that you felt that if you could manifest in some way, you would understand more about yourself and how you co-exist with the rest of society. I think it’s terribly dangerous for an artist to fulfil other people’s expectations. I think they generally produce their worst work when they do that. If you feel safe in the area that you’re working in, you’re not working in the right area. Always go a little further into the water than you feel you’re capable of being in… when you don’t feel that your feet are quite touching the bottom, you’re just about in the right place to do something exciting.” - David Bowie

There is a global dimension to this that tends to get lost, because these conversations are almost always conducted from within a narrow Western frame.

The world has 8 billion people in it, maybe more if the uncounted and undocumented are included. A very significant proportion of them have never been customers of Western or modern media in any meaningful sense - not because they lacked taste or interest, but because their culture sat at a distance from liberal norms, or because access was never reliable, or because the economics never worked in their favour. These are people who are now entering the digital economy on their own terms, in their own languages, with musical traditions that have been developing for centuries without any input from the pop machine.

For these new audiences, generative AI is not a replacement for anything. It is a toy, a tool that might occasionally produce something interesting for those who like to tinker. The next great musical movement will not come from a well-prompted model. It will come from somewhere that has been historically ignored, from a generation of artists with global tools and a sound that has never been properly exported. AI might help some of them prototype. It will not replace them, because it has never inhabited the world they are describing.

The 12 songs on this album are about AI and what it actually is, as opposed to what the headlines need it to be. They are about the fears that have been manufactured and distributed alongside the technology itself - the Terminator, the job thief, the existential threat, the thing that will make human creativity redundant and human connection obsolete. These fears are not baseless, but they are also not the whole story, and in many cases they are told by people with a vested interest in making the technology feel more powerful and more inevitable than it is.

What the process of making this album revealed is that generative AI is not a replacement for a songwriter. It is more like a well-read collaborator who has absorbed everything but experienced nothing. It needs friction. It needs a human in the room who has lived in the world, who can tell the difference between a lyric that sounds right and one that is right, who has opinions strong enough to push back and say ‘this is wrong’.

Part of what makes it so seductive is that music is one of the domains where AI appears most convincing. Songs have formulas. Rhyme and meter follow rules. Genre is a kind of grammar, and grammar is something these models have consumed in enormous quantities. Give a model a brief and a structure and it can produce something that scans, that sits in the correct key, that gestures credibly at the emotional register you asked for.

But hold it against the light for long enough and the cracks appear. The same context window that limits a model's ability to write long-form prose shows up in the music generation too. The vocals I had directed to be deep and baritone, grounded and authoritative, were exactly that in the opening bars. Then, as the song stretched out, the pitch rose almost imperceptibly, the quality thinned, and the original instruction lost its grip as the model moved further from the context that anchored it. The longer the song runs, the more the voice reverts toward some averaged centre, as if the model is slowly forgetting what you told it to be.

That drift is not a bug that will be easily patched in the next release. It is a structural property of how these systems work - they are probabilistic engines trained to predict what comes next based on everything they have seen, and what comes next, on average, is not the sustained expression of a specific and stubborn artistic identity. It is the mean. The competent middle. The thing that sounds like music without quite being anything in particular. But maybe that too is an aesthetic.

This is precisely where the human in the room earns their place. Not because they can operate the software better, but because they can hold the thread. They remember what the thing was supposed to be. They hear when it starts to drift and they pull it back. They bring the stubbornness that a probability distribution cannot generate on its own.

That distinction matters more than most of the discourse currently acknowledges.

What generative media can do is tell us something about ourselves and our relationship with technology. The pattern of what people prompt for, what they return to, what they find satisfying and what they abandon after 5 minutes, is a kind of mirror. The fact that people are not rushing to generate music and live inside it is itself data. It tells you where human desire actually lives, which is not in the act of creation for its own sake but in connection - to another person's vision, to a crowd, to a body in a room.

The technology is quite good at reflecting us back at ourselves. It is not good at replacing the things that make us want to gather.

No AI model, regardless of how impressively it performs on benchmarks or how fluently it generates text and music and images, can tell a person that they should not go to a concert. It cannot tell them to stop venerating a sportsperson who has given them years of elated fandom. It cannot persuade them not to donate to a disabled stranger, not to drive 3 hours to adopt a dog from a shelter, not to search the surface of the earth for a lover who makes them feel less alone. That’s what the final song on the album is about.

These are not gaps in capability that will be closed with the next model update. They are the things that human life is made of, and they exist entirely outside the domain where AI does its most impressive work.

So 3 models walked into a bar. They argued, they riffed, they often surprised me. 12 songs came out the other side - songs about this moment, about these fears, about what the relationship between humans and machines actually looks like when you sit with it long enough and are honest about what you find.

The music will keep coming. The concerts will keep selling out. The artists who are hungry enough will keep grafting. And the audiences will keep showing up to feel something together.

That is not a failure of AI. It is a reminder of what it was never actually competing with.

The future will almost certainly contain overwhelming amounts of synthetic media and infinite amounts of music and imagery. Most of it disposable. Some of it genuinely moving. A small amount of it culturally transformative.

But people will still pack stadiums to watch another person sweat under the scorching stage lights of a summer concert, because the more synthetic the informational environment becomes the more valuable embodied reality starts to feel. Not because technology failed. But because human beings never truly wanted to become machines.

Shokunin Studio https://www.shokunin.studio
