January 4, 2024

Visualising novels using Midjourney v6 and GPT-4 - Part 1: Characters

Update 15/01/2024: Read part 2 on visualising key locations and events here

Since finishing the last Harry Potter book in 2007, I hadn’t read much fiction. Last year, inspired by my fiancée who is an avid fiction reader, I started reading fiction again and quickly realised what I’d been missing (let’s connect on Goodreads? :)).

Perhaps the biggest attraction of reading fiction is imagining and visualising things and rendering them in our head as we go–the characters, the locations, the events, and so forth. At least for me that’s a crucial part of feeling connected to the story, being able to put myself in the characters’ shoes, and feel the passage of time as they experience it.

Indeed, reading a sentence like the iconic first line of One Hundred Years of Solitude (which I read and immensely enjoyed recently) fills one’s head with images and visuals of the scene and the characters involved.

Many years later, as he faced the firing squad, Colonel Aureliano Buendía was to remember that distant afternoon when his father took him to discover ice.

It’s natural then to wonder how our imaginations and renderings of the characters, scenes, or places would look if they were drawn on paper. And how would they compare to those of others? Is there one most agreed-upon version, or are these completely unique and idiosyncratic?

Well, since we can’t reconstruct an accurate image by scanning our brains (yet), perhaps we can utilise generative AI tools such as Midjourney and ChatGPT to create visualisations that are close to our imagination.

I’m sure we’ve all been disappointed by poor casting in adaptations of novels. Given the rate of progress in generative AI, we should not be surprised if within the next 3-5 years we have AI models that can generate entire movies from novels. In that case, we might even be able to choose the depictions and faces that resonate best with us. Scary, but cool.

Inspired by Midjourney’s latest release–the 6th major version of their model–I built a small app over Christmas that helps users visualise characters from a book in a way that closely resembles how they’d imagined them.

Below I will explain how Melquíades¹ is built, and share its source code. If you end up using it, or if you just read this blog post, I would love to hear your thoughts and feedback (you can write to me at parsa.ghaffari@gmail.com).

Let’s get started.

High-level flow diagram of Melquíades

Step 1. It all starts with a character

Our first step is to pick a book and get a list of the prominent characters that appear in it. To do so, we will use OpenAI’s GPT-3.5 model² (the model family behind ChatGPT), which seems to have a decent knowledge of which characters appear in which book, at least for well-known and established novels.

To simplify calling GPT-3.5 and parsing its output, we use LMQL, a handy library for interacting with various large language models (LLMs) programmatically.

GPT listing out the characters

Here’s a Python function structured as an LMQL query which constructs a prompt with our book name and author, and generates N characters from that book using GPT-3.5:

@lmql.query(model="openai/gpt-3.5-turbo-instruct")
def get_characters(book, author="", num_chars=5):
    '''lmql
    """Answering the following questions about the book {book} by {author}:

    Here's a list of major characters from the book: \n"""
    chars=[]
    for i in range(num_chars):
        "-[CHARACTER]" where STOPS_AT(CHARACTER, "\n")
        chars.append(CHARACTER.strip())
    return chars
    '''

Let’s get 5 characters from One Hundred Years of Solitude by running this function. The output will be a list of characters:

- José Arcadio Buendía
- Úrsula Iguarán
- Colonel Aureliano Buendía
- Remedios the Beauty
- Amaranta
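
Model output is not always this tidy: names can come back with stray bullets, blank lines, or repeats. A minimal sketch of a clean-up step, assuming we want unique names in their original order (`clean_character_list` is a hypothetical helper, not part of the app's source):

```python
def clean_character_list(raw_lines):
    """Normalise raw model output lines into a list of unique character names."""
    seen = set()
    cleaned = []
    for line in raw_lines:
        # Drop leading bullet characters and surrounding whitespace.
        name = line.strip().lstrip("-•*").strip()
        if not name:
            continue  # skip blank lines
        key = name.casefold()  # case-insensitive duplicate check
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(name)
    return cleaned


raw = ["- José Arcadio Buendía", "Úrsula Iguarán ", "", "-Amaranta", "- amaranta"]
print(clean_character_list(raw))
# ['José Arcadio Buendía', 'Úrsula Iguarán', 'Amaranta']
```

The first spelling of a name wins, so the list keeps whatever casing the model produced first.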

Step 2. Describing the characters to our AI painter using GPT

Now that we have a list of characters, we need to describe them in such a way that a painter (human or AI) could draw their portrait. To do this we use GPT-4 and instruct it to include specific visual details about the character, such as their face, hair, expression, attire and so on. Again, we’re relying on GPT-4’s knowledge of well-known novels for this task.

GPT describing a character visually

@lmql.query(model="gpt-4")
def get_character_description(character, book, author):
    '''lmql
    """Here's an accurate and concise visual description of {character} from {book} by {author} which can be used to paint their portrait, broken down into face, hair, expression, attire, accessories, and background: [DESCRIPTION]"""
    '''

If we run this for one of the prominent characters of One Hundred Years of Solitude, Colonel Aureliano Buendía, we get a description like the one below, which looks like something a skilled painter could use to draw his portrait.

Face: Colonel Aureliano Buendía has a stern, weathered face, marked by the hardships of war and time. His skin is tanned from the sun, with deep-set, piercing blue eyes that hold a hint of melancholy. His nose is sharp and prominent, and his lips are thin and firm.

Hair: His hair is a salt-and-pepper mix of black and grey, cut short and neat, showing signs of his military discipline. His beard is similarly colored, trimmed close to his face, adding to his authoritative appearance.

Expression: His expression is one of solemnity and introspection, reflecting his contemplative nature and the weight of his experiences. His eyes hold a far-off gaze, as if he's constantly lost in thought or reminiscing about the past.

Attire: He is dressed in a military uniform, a testament to his rank as a colonel. The uniform is a faded green, adorned with various medals and insignia that speak of his accomplishments and battles fought. His shirt is buttoned up to the collar, and he wears a matching pair of trousers.

Accessories: On his left breast, he wears a medal of honor, a symbol of his bravery and leadership. He also wears a pair of worn leather boots, and in his hand, he holds a weathered military cap.

Background: The background is a blurred image of a war-torn landscape, with the remnants of a battlefield. The sky is a mix of oranges and purples, suggesting either dawn or dusk. This backdrop serves as a constant reminder of the battles he has fought and the solitude he has endured.

Step 3. Visualising the characters

We’re ready to paint some portraits. Before we submit our character’s description to our AI painter (aka Midjourney), however, we need to add a short introduction that describes the visual style we have in mind. Let’s prepend our character descriptions with the following:

Painting style. Detailed and realistic. Fine detailed textures of the skin and clothing. Strong interplay of light and shadow.  Face: Colonel Aureliano Buendía has a stern, weathered face, marked by the hardships of war and time. His skin is tanned from the sun, with deep-set, piercing blue eyes that hold a hint of melancholy. His nose is sharp and prominent, and his lips are thin and firm.

This is of course subjective and personal and you can change it to whatever works best for you. :)
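
As a rough sketch, prepending the style can be a one-line string operation. The helper name `build_portrait_prompt` is hypothetical, and collapsing the description onto a single line is an assumption about what makes a convenient prompt string, not something taken from the app's source:

```python
STYLE_PREFIX = (
    "Painting style. Detailed and realistic. "
    "Fine detailed textures of the skin and clothing. "
    "Strong interplay of light and shadow."
)


def build_portrait_prompt(description, style=STYLE_PREFIX):
    """Prepend the style preamble and flatten the description to one line."""
    # str.split() with no arguments collapses newlines and repeated spaces.
    flat_description = " ".join(description.split())
    return f"{style} {flat_description}"


description = """Face: stern, weathered face.

Hair: salt-and-pepper, cut short."""
print(build_portrait_prompt(description))
```

Swapping in a different `style` string is all it takes to try another look.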

Okay, we now have our prompt, consisting of our styling preferences followed by the character description, and we’re ready to submit it to Midjourney. Since Midjourney doesn’t have an official API yet, we will use GoAPI, a third-party service.

def mj_imagine(prompt, stylize=100, chaos=0, weird=0):
    """Generates an image from a prompt"""
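    data = {
        # Truncate long prompts, then append the parameters; the leading space keeps them separate from the prompt text
        "prompt": prompt[:1900] + f" --stylize {stylize} --chaos {chaos} --weird {weird} --v 6.0",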
        "skip_prompt_check": True
    }
    return make_mj_api_call(IMAGINE_ENDPOINT, data)

If we run this with the prompt we created before, we get something like this:

Not bad for a start.

Our prompt already contains keywords that guide the visual style of the generated images. To control the style further, we can tweak the stylize, chaos and weird parameters that Midjourney accepts:

High stylize value

High chaos and weird values
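
Each of these parameters only accepts values within a fixed range, so it can help to check them before spending credits on a request. A minimal sketch, assuming the ranges listed in Midjourney's documentation at the time of writing (stylize 0 to 1000, chaos 0 to 100, weird 0 to 3000); `build_mj_suffix` is a hypothetical helper, not part of the app's source:

```python
# Assumed ranges; check Midjourney's parameter documentation before relying on them.
PARAM_RANGES = {
    "stylize": (0, 1000),
    "chaos": (0, 100),
    "weird": (0, 3000),
}


def build_mj_suffix(stylize=100, chaos=0, weird=0, version="6.0"):
    """Return the parameter suffix for a prompt, rejecting out-of-range values."""
    values = {"stylize": stylize, "chaos": chaos, "weird": weird}
    for name, value in values.items():
        low, high = PARAM_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    flags = " ".join(f"--{name} {value}" for name, value in values.items())
    # The leading space keeps the first flag separate from the prompt text.
    return f" {flags} --v {version}"


print("a portrait" + build_mj_suffix(stylize=750))
# a portrait --stylize 750 --chaos 0 --weird 0 --v 6.0
```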

Rerolling

One of the great features of Midjourney is its ability to reroll an image. If you don’t like what it generates, no drama, you can simply ask it to regenerate it. This will make it really easy to quickly navigate a large number of potential candidates, and focus on ones that are closer to our vision of how a character should look.

def mj_reroll(task_id):
    """Rerolls an image"""
    data = {"origin_task_id": task_id}
    return make_mj_api_call(REROLL_ENDPOINT, data)

A reroll of Colonel Buendía’s portrait

Variations

Let’s say we’ve finally found a rendering that we sort of like, but we’d like to create variations of it and pick the one closest to how we feel this character looks. Midjourney makes this easy: all we need to do is pick the index of the image that we’d like to create variations of (one of 1/2/3/4, numbered left to right, top to bottom, starting from the top left).

def mj_variate(task_id, index):
    """Creates variations of one image in the grid"""
    data = {
        "origin_task_id": task_id,
        "index": index
    }
    return make_mj_api_call(VARIATE_ENDPOINT, data)

Let’s create variations of the 4th image:

Upscaling

Finally, after enough rerolling and variations, we arrive at a portrait that’s close to our imagination. We can then ask Midjourney to single it out from the 2x2 grid and render it with a higher resolution.

Let’s upscale the image at index 2 of the last rendering:

def mj_upscale(task_id, index):
    """Upscales an image"""
    data = {
        "origin_task_id": task_id,
        "index": index
    }
    return make_mj_api_call(UPSCALE_ENDPOINT, data)
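
To see how the task IDs thread through these steps, here is a minimal sketch of the whole chain run against a fake backend. Everything in it is an assumption made for illustration: the stand-in `make_mj_api_call`, the endpoint labels, and the idea that every call returns a dictionary with a `task_id` key. It is not the app's actual code.

```python
import itertools

_task_counter = itertools.count(1)
calls = []  # every (endpoint, payload) pair the fake backend receives


def make_mj_api_call(endpoint, data):
    """Stand-in for the real HTTP helper: records the call, returns a new task id."""
    calls.append((endpoint, data))
    return {"task_id": f"task-{next(_task_counter)}"}


def check_index(index):
    """A 2x2 grid has four images, so only indices 1 to 4 make sense."""
    if index not in (1, 2, 3, 4):
        raise ValueError(f"index must be 1, 2, 3 or 4, got {index!r}")
    return index


def portrait_workflow(prompt, variation_index, upscale_index):
    """Imagine, reroll once, vary one image, then upscale one of the variations."""
    grid = make_mj_api_call("imagine", {"prompt": prompt})
    rerolled = make_mj_api_call("reroll", {"origin_task_id": grid["task_id"]})
    varied = make_mj_api_call(
        "variation",
        {"origin_task_id": rerolled["task_id"], "index": check_index(variation_index)},
    )
    upscaled = make_mj_api_call(
        "upscale",
        {"origin_task_id": varied["task_id"], "index": check_index(upscale_index)},
    )
    return upscaled["task_id"]


final_task = portrait_workflow("a portrait", variation_index=4, upscale_index=2)
print(final_task)  # task-4
```

Each step takes the task ID returned by the previous one. A real run would also have to wait for each task to finish before starting the next; that waiting is left out of this sketch.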

Step 4. Putting it all together

I’ve put all of this together in a simple Streamlit app, the source code of which you can access here: https://github.com/parsaghaffari/Melquiades

The app allows users to easily generate portraits of characters from their favourite novels, and I’m hoping to extend it to prominent scenes and landmarks as well.

Final words

Generative AI has been improving at an astounding rate, and the way we think about media and content such as text, images, and video is about to change drastically. I’m excited to see where this goes in the coming years. Here’s the output of the same prompt to Midjourney versions 1 to 6, courtesy of u/Algoartist on Reddit:

If you end up using Melquíades, I would love to get your feedback and maybe even collaborate on improving it. Drop me an email.

In particular, it would be interesting to turn this idea into an app for people to create and vote on the visualisations that resonate with them, to see if we can get to “reference renderings” of every character in every novel (ambitious, I know!). Some sort of a novel character wiki. 🤔

¹ I’m calling the app Melquíades, after the gypsy who brings the latest and craziest of innovations, such as a pair of magnets and an alchemist’s lab, into Macondo in One Hundred Years of Solitude.
² GPT models are large language models: models trained on vast amounts of text, much of it from the internet, which gives them extensive knowledge of human languages and broad general knowledge, including of books.
