AI is so hot right now. And, as a UX Designer, Product Designer, PM, Developer, CEO or Founder, you’re probably asking what it can do for your product.
I’ve been lucky to have worked on designs that use a number of areas of AI: LLMs, neural networks, and related techniques. These and other models are popularly lumped together under ‘AI’ these days, so I’ll use that term here, though the terminology was different when I worked on some of the earlier products. Let’s say that you’re interested in, or have been encouraged to look into, adding AI to one of your products, be it an app, website, etc.
Great design is working backwards from the customers’ problem.
Of course, trying to find a place to ‘stick’ AI into a product is a bit like that old saying ‘if all you have is a hammer, everything looks like a nail.’ So, the first thing is the problem. Then you work out how you’re going to solve it. But I will assume we do have a genuine problem, and we’re thinking that ‘AI’ might be the way to solve it.
You might not need to use AI.
There are product features that simply cannot exist without AI, in some form:
Find a document in this photo
Look at a screenshot and OCR the text
Listen to this audio and tell me what song this is
However, there are other features you could build without AI:
Guess what word the user is trying to type on a keyboard
Make me a workout plan based on my weight and height
Detect footsteps in vibration data
The iPhone keyboard’s auto-complete and tap-correct (as detailed by Ken Kocienda in his excellent book Creative Selection) were built without any AI/neural network. Of course, that tech didn’t exist in a form that could run performantly on a tiny iPhone CPU, but regardless, the phone’s autocorrection worked well for many millions of users. It was based on statistical language models and rule-based algorithms; to grossly oversimplify, a dictionary and a database of word-usage frequency in English.
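To oversimplify even further, a dictionary-plus-frequency autocorrector can be sketched like this. All the words, frequencies, and keyboard-neighbour data below are hypothetical toy values, not anything from Apple’s implementation:

```python
# Toy word-frequency dictionary (hypothetical counts).
WORD_FREQUENCIES = {
    "the": 22038615, "them": 1229236, "they": 3375191,
    "hello": 25399, "help": 155222, "held": 28556,
}

# Keys adjacent on a QWERTY keyboard: a tiny, hypothetical subset.
NEIGHBOURS = {"q": "wa", "w": "qes", "e": "wrd", "r": "etf", "m": "nk", "n": "bm"}

def candidates(typed: str) -> set:
    """Words reachable by swapping one letter for a keyboard neighbour."""
    results = {typed} if typed in WORD_FREQUENCIES else set()
    for i, ch in enumerate(typed):
        for alt in NEIGHBOURS.get(ch, ""):
            candidate = typed[:i] + alt + typed[i + 1:]
            if candidate in WORD_FREQUENCIES:
                results.add(candidate)
    return results

def autocorrect(typed: str) -> str:
    """Pick the most frequent plausible word, or keep the input as-is."""
    options = candidates(typed)
    return max(options, key=WORD_FREQUENCIES.get) if options else typed
```

No training, no network: just lookups and a frequency comparison, which is why this kind of model can run instantly on very modest hardware.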
A workout plan could be a set of rules on how to adjust a set of known workouts according to basic rules of thumb. For example, reducing the dumbbell weights based on the gender, weight and height of the user.
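A minimal sketch of that rules-of-thumb approach, where the base plan, reference weight, and clamping factors are all hypothetical values for illustration:

```python
# Hypothetical base plan: (exercise, dumbbell weight in kg).
BASE_PLAN = [("dumbbell curl", 10.0), ("shoulder press", 12.5)]

def adjust_plan(plan, body_weight_kg: float, reference_kg: float = 75.0):
    """Scale weights proportionally to the user's body weight,
    clamped so adjustments never exceed +/-40% of the base plan."""
    factor = max(0.6, min(1.4, body_weight_kg / reference_kg))
    return [(name, round(kg * factor, 1)) for name, kg in plan]
```

The clamp is the kind of manually tuned fudge factor the next paragraph describes: it stops an outlier input from producing an absurd plan.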
And footsteps might be detected using Fast Fourier Transforms (FFTs) or similar mathematical functions that uncover repeating cyclic behaviour, like the cadence of walking.
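As a sketch, here is NumPy’s FFT finding the dominant cadence in a synthetic vibration signal. The 2 Hz ‘footstep’ signal is fabricated test data, not a real sensor trace:

```python
import numpy as np

def dominant_cadence_hz(signal: np.ndarray, sample_rate: float) -> float:
    """Return the strongest repeating frequency in a vibration signal.
    Walking cadence typically falls around 1.5-2.5 Hz."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))  # drop DC offset
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]

# Synthetic test data: 2 Hz "footsteps" plus noise, sampled at 100 Hz.
t = np.linspace(0, 10, 1000, endpoint=False)
signal = np.sin(2 * np.pi * 2.0 * t) \
    + 0.3 * np.random.default_rng(0).normal(size=t.size)
```

A few lines of math, no model training, and it runs comfortably on a watch-class CPU.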
Usually the model will be a set of heuristics: simple rules of thumb with some fudge factors (arbitrary values) that are adjusted manually over time to match known data.
The benefit of these simpler, handcrafted models is that they’re extremely fast and efficient. Often the amount of computation means they could run on your watch, rather than requiring a PC worth of power and RAM. In practice, this means they can run in realtime, or be used abundantly in an experience without worrying about CPU, battery, or API-call costs.
They’re also constrained in their output; a model that simply adjusts a database of workouts is not going to start inserting racist words by accident.
The downsides of these models are that they take skill and time to develop and hone. And, to be honest, they’re not ‘AI’, so you can’t slap that on your pitch deck to venture capital. And, of course, they simply can’t do much of what AI can achieve.
So you’ve decided to use AI
There are three core ways that I’ve seen AI used in apps.
Feature Extraction. For example, finding the edges of a table so AR knows where to put the 3D shoe model. Or extracting the features of a Rembrandt painting so they can be merged with the features of my puppy photo.
Classification. Is this a chair? Is this chair an antique? Is this handwritten letter an ‘a’ or a ‘q’?
Generative models. These range from Large Language Models, like ChatGPT, to image diffusion models, like DALL·E. In these cases, the model is ‘guessing’ what the next token (a part-word or idea) should be, based on having seen billions of sentences. Or the same for the next pixel, having seen millions of images.
What you may not know is that neural networks are a really old idea, dating from the 1950s. The first practical implementations started in the 1980s, but the most recent, attention-getting models have been enabled by amazing computer science, and also by amazing spending. For example, GPT-5 is rumoured to have cost around $1.2 billion to train.
Smaller, simpler models have long been doing work whose results you’ve enjoyed. Point-and-shoot cameras have used face detection for 20 years to optimise focus. Handwriting recognition was made famous, famously less-than-reliably, by the Apple Newton. (As a side note, I have a friend who worked on that implementation who swears they halved the CPU and RAM at the last minute, which handicapped the model.)
The benefit of many of these smaller neural network models is that they are often highly optimised and efficient. They have smaller jobs to do, like finding the edges of a document in a photo, or finding where a face is, and are incredibly robust and fast. They’re also often included with the OS you’re building for, or in commonly available libraries that can be used off-the-shelf.
Some of the tasks these models handle include:
Detecting a trigger word or sound
Detecting direction of motion from a video feed
Classifying a kind of fruit or vegetable at your local self-serve checkout
‘Sentiment analysis’ applied to text, to get a sense of emotion of the writer/speaker
Try to use as small a model as possible. This will make for a faster, more reliable experience, with less unexpected racism. You can, of course, train your own models. This requires your own set of data, whether you’re trying to detect cancerous cells, or which wash stage a washing machine is in based on its electrical noise.
We haven’t got to the bigger models yet, including LLMs, but it is time to look at handling errors.
How to deal with model mistakes
All models will make mistakes. After all, they’re not reality; they’re just models, simplified versions of the real world. At best, they’re oversimplified; at worst, they’re inaccurate in some way. As product designers, we want to give the very best experience to our users, every time. And any model, whether a simple state machine or a giant LLM, can and will produce wrong results. Here are my tactics:
Try to find the far edges of the experience that you want to support. When I worked on Scannable, a paper document scanning iPhone app, we looked at all the different use cases. Of course, the vast majority of people have a white US Letter, US Legal or A4 piece of paper in front of them that they want to capture. Some wanted to capture a magazine article (but not the rest of the magazine page). Some wanted to digitise a physical photo, or a whiteboard session, or a movie poster. These all have very different properties in terms of edge detection, orientation correction, and the UI guidance needed to help the user capture the correct content.
Offer UI feedback to help the user work with the model, optimally in realtime. This might be on-screen arrows to show that part of a document is out of view. Or example text prompts that a user could ask an LLM. They might be extra overlays showing the intermediate states of detection, for example showing that no chair seems to be in view. And they might be tips to help a user get better results, for example ‘try holding your phone closer to the speaker to hear the song.’
Try to build a library of experiences, or data, that you can personally test as a product owner. There are many factors that change how a model will perform: a different phone with a different lens; the voice of a child vs. that of an adult. What you’re handed by your development team won’t account for all of these.
Design an experience that matches the real-world performance of the model. For example, at my local grocery franchise, Woolworths, there are self-checkout machines. If I place some bananas on the scanner, then press ‘Produce’, an autocomplete for ‘bananas’ appears above an on-screen keyboard where I can type the name manually. About 90% of the time, it’s correct. However, there are plenty of times when it’s wrong; for example, it will mistake cucumbers for zucchinis, especially if they’re in a plastic bag. This is good design: if the check-out app had automatically selected ‘bananas’ every time, it would be frustrating. With even more confidence, the UI might have offered ‘bananas’ on the main display. And with maximum hubris, the machine would automatically add ‘bananas’ to my purchase list, even when the real item was corn, or a long bottle of mustard. A great design is only as helpful as it really can be in the real world. So, scale the experience to the model’s real accuracy, for example by offering a soft, non-blocking prompt instead of leaping into a modal ‘hey, I know what it is!’ screen.
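One way to make this scaling explicit is a confidence-to-UI mapping. The thresholds below are hypothetical, not from any real checkout system:

```python
def ui_treatment(confidence: float) -> str:
    """Offer less intrusive UI the less certain the model is.
    Thresholds are illustrative and would be tuned against real data."""
    if confidence >= 0.95:
        return "preselect"  # show the guess prominently, one tap to confirm
    if confidence >= 0.80:
        return "suggest"    # offer as an autocomplete above the keyboard
    return "manual"         # fall back to manual entry only
```

Note that even at the highest confidence the design still asks for a confirming tap; nothing is ever silently added to the purchase list.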
Always offer a way to override an AI classification. AI will get things wrong. Offer a Back, an Undo, an Edit, or some other override method.
Consider combining a neural network with a set of heuristics to keep the model in check. Heuristics design is a whole other article I need to write, but it works best when you can combine multiple sensory inputs. For example, if your self-checkout scale is only detecting 50 grams, then the item is unlikely to be a bunch of bananas, despite what your visual model thinks. Perhaps it’s a box of tissues with a banana print.
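As a sketch, here is a weight-range sanity check that can veto the vision model’s guess. The produce labels and gram ranges are made-up illustrative values:

```python
# Hypothetical plausible weight ranges per produce label, in grams.
TYPICAL_WEIGHT_G = {
    "bananas": (300, 2000),
    "cucumber": (150, 600),
}

def plausible(label: str, weight_g: float) -> bool:
    """Accept the visual classifier's label only if the scale agrees.
    Unknown labels pass through unchecked."""
    low, high = TYPICAL_WEIGHT_G.get(label, (0, float("inf")))
    return low <= weight_g <= high
```

When the check fails, the UI can drop back to a lower-confidence treatment, such as offering the keyboard instead of preselecting the guess.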
The big models, or ‘here be dragons’
ChatGPT, Claude, Gemini. These models are incredibly powerful, but misunderstood.
The most popular models, today, have no state (or very, very limited state). When you chat with GPT, every time you type a line, your entire conversation is sent through. The model then guesses the next thing to say, based on that entire conversation. It’s not keeping track; the model ‘forgets’ the context every time. So, to be truly helpful, an experience needs to remember as much context for the user as possible and re-feed it into the prompt. This might be localization data, personal preferences, the context of recent conversations, etc. It is early days here in terms of best practices.
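A minimal sketch of what re-feeding context looks like, using the common chat-completion message-list shape. The helper name and profile fields are hypothetical, and no real API is called:

```python
def build_messages(history, user_profile, new_message):
    """Assemble the full context a stateless model needs every turn:
    standing user context, prior turns, and the new message."""
    system = (
        "You are a helpful assistant. "
        f"User locale: {user_profile['locale']}. "
        f"Preferences: {user_profile['preferences']}."
    )
    return ([{"role": "system", "content": system}]
            + list(history)
            + [{"role": "user", "content": new_message}])

# Hypothetical stored state for this user.
profile = {"locale": "en-AU", "preferences": "metric units"}
history = [
    {"role": "user", "content": "Plan my run."},
    {"role": "assistant", "content": "How far would you like to go?"},
]
msgs = build_messages(history, profile, "About 5 km.")
```

Every turn, the whole list is rebuilt and sent again; the product, not the model, is what remembers.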
LLMs like ChatGPT really are a black box and, for a product designer, a highly explosive one. For example, there are examples on the web of users talking to insurance (and similar) chatbots and instructing them to write JavaScript. These web chatbots are just a large language model with some extra hidden prompts telling them about the products they need to answer questions about, but they have no hard limits stopping them from drawing on all of their general knowledge. An open text chat input field is a very dangerous way to offer a product feature, because you never know what you’re going to get. And it’s currently very difficult to separate the truth from the completely plausible, confidently expressed falsehoods these models generate.
What these models are actually wonderful at is helping refine user content; a kind of super-spellchecker. ‘Given my block of text, rewrite it to be more professional.’ ‘Summarise this conversation.’ And you, as product designer, might choose to only offer the user a set of predefined styles. These are sent along to the LLM as part of the prompt, and wonderfully limit the range of output. When working with large models, a consistently great experience needs guardrails, which might mean pre-setting the prompts and offering that as functionality.
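A sketch of that guardrail: the user picks from predefined styles, and only vetted instructions ever reach the model. The style names and wording here are hypothetical:

```python
# Predefined, product-approved rewrite instructions (hypothetical wording).
STYLES = {
    "professional": "Rewrite the text in a formal, professional tone.",
    "friendly": "Rewrite the text in a warm, conversational tone.",
    "concise": "Summarise the text in at most two sentences.",
}

def build_rewrite_prompt(style: str, user_text: str) -> str:
    """Combine a vetted style instruction with the user's content.
    The user never types instructions directly, only picks a style."""
    if style not in STYLES:
        raise ValueError(f"Unsupported style: {style}")
    return f"{STYLES[style]}\n\nText:\n{user_text}"
```

The user’s text is treated as content to operate on, not as an open instruction channel, which dramatically narrows what the model can be asked to do.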
To wrap up, here are the general approaches I’ve found work well:
Know your material, your experience, your content. Know the range of experiences you’re willing to offer, and not.
Decide how sophisticated a model you need. It’s likely to be smaller and simpler than you think.
Constrain the output of the model through additional heuristics and the UI it offers.
Allocate lots of time to test, and refine.
BONUS UPDATE
As I finish this article, ChatGPT seems to be unresponsive. And of course, if you design any experience that relies heavily on a 3rd-party API, you’ll want a plan B. One option is a local LLM. Large models are, counterintuitively, getting compressed enough to run on desktop machines. Models like Meta’s Llama, for example, have very impressive world knowledge and language skills, and platforms like iOS are beginning to offer LLMs out of the box, with no need to make network calls.