Human performance, cubed

My young son is into solving Rubik's cubes. But it's a very different thing than in my childhood. When I was a kid, I was happy if I could solve a single side after some extended tinkering. He, on the other hand, can solve the entire cube in just minutes [update: his latest time is one minute, three seconds] — and is working to improve that. He's a smart kid, but most of his ability has come from the infrastructure of on-demand learning.

But let's start at the beginning, and see how we got to where we are today…because maybe that will give us a clue to where we'll be tomorrow.

In 1974, Ernő Rubik was a teacher of architecture at the Budapest College of Applied Arts. His desire to simplify the teaching of three-dimensional geometry and spatial relationships led him to create a small teaching model, or puzzle made of wood and elastic bands. It could rotate independently on all axes while maintaining its shape.

Importantly, to this story, when Ernő Rubik first created the Rubik's Cube, it took him over a month to solve it.

Rubik transitioned to a robust plastic puzzle design, selling it locally until, in 1980, the American company Ideal Toy Corp. licensed the cube, rebranded it as the "Rubik's Cube" and launched it globally. It was a craze in the 1980s, but has seen a resurgence, including sophisticated 'GAN' cubes (named after speedcuber Ganyuan Jiang) that use internal magnet arrays to speed the physical act of solving the puzzle.

The world record is now 3.47 seconds, yes, seconds, set by Yusheng Du in 2018.

So, how did we get from 'over a month' to solve the cube, to 3.47 seconds? Yes, there are some manufacturing improvements that enable this. But even if it took thirty seconds using a regular cube, the ratio between 30 days and 30 seconds is vast: an 86,400x improvement.

The answer, of course, is YouTube. My kid, after all, sees YouTube as the source of all on-demand learning. Techniques, walkthroughs, and a set of algorithms that guarantee a solved puzzle in the fewest steps.

There’s also motivation. Once the YouTube algorithm sees your interest in speedcubing, then you’re inundated with videos on the topic. A kind of fake peer-group, where everyone talks about speedcubing, looks cool, and does cool things (and only cool things).

He doesn't know anyone else who is into Rubik's Cubes. He was given one, which sat on the shelf for years, until one day he started his YouTube journey.

The delta between him and me is instant, on-demand learning, including instant synthetic peers. I don't think we really understand how to direct that firehose yet. Traditional education certainly doesn't. But the rise of cohort-based learning with online materials is on the path.

Creative Jobs

In a Steve Jobs interview from years ago, Jobs observed that 'we no longer have secretaries'. By which he meant: the invention of desktop publishing, with its easy and forgiving interface, simple editing and layout, had (and has) let the executives and managers of the world create their own documents, unaided.

Even further back, we can note that the job of "computer" has been replaced by machines called computers. "Computers" were once people who did routine mathematical operations for science and engineering, like those at NASA, until electromechanical and electronic computers were invented.

Even further, we are reminded of the telegraph. The telegraph was a huge success in business, outracing the post. When the telephone was invented, it was dismissed as a toy, but it allowed more people to communicate using familiar, and richer, language. As Sir William Preece, Chief Engineer of the British Post Office, put it: "The Americans have need of the telephone, but we do not. We have plenty of messenger boys." Well, we don't have messenger boys now, either.

Pixar's computing prowess improved phenomenally year over year. But Monsters, Inc. wasn't made in half the time, or at half the price, of Toy Story. Instead, the animators were able to make richer and more expressive animation.

Great products are ones that help people get closer to expressing, iterating on, and communicating an idea.

AI: A Product Design Primer

AI is so hot right now. And, as a UX or Product Designer, PM, Developer, CEO or Founder, you're probably asking what it can do for your product.

I've been lucky to have worked in design that uses a number of areas of AI, LLMs, neural networks, and related fields. All these and other models are popularly lumped under 'AI' these days, so I'll use that term, although it isn't the term we used when I worked on some of the earlier products. Let's say that you're interested, or have been encouraged, to look into adding AI to one of your products, be it an app, website, etc.

Great design is working backwards from the customer's problem.

Of course, trying to find a place to ‘stick’ AI into a product is a bit like that old saying ‘if all you have is a hammer, everything looks like a nail.’ So, the first thing is the problem. Then you work out how you’re going to solve it. But I will assume we do have a genuine problem, and we’re thinking that ‘AI’ might be the way to solve it.

You might not need to use AI.

There are product features that simply cannot exist without AI, in some form:

  • Find a document in this photo

  • Look at a screenshot and OCR the text

  • Listen to this audio and tell me what song this is

However, there are other features you could build without AI:

  • Guess what word the user is trying to type on a keyboard

  • Make me a workout plan based on my weight and height

  • Detect footsteps in vibration data

The iPhone keyboard's auto-complete and tap-correct (as detailed by Ken Kocienda in his excellent book Creative Selection) were built without any AI/neural network. Of course, that tech didn't exist in a form that could run performantly on a tiny iPhone CPU, but regardless, the phone's autocorrection worked well for many millions of users. It was based on statistical language models and rule-based algorithms. To grossly oversimplify: a dictionary and a database of word usage frequency in English.
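
To make that concrete, here's a toy sketch of frequency-based correction. This is not Apple's actual algorithm (which also weighs which keys are physically near the one you tapped); the word list and counts are made up for illustration:

```python
# Toy frequency-based autocorrect: rank dictionary words near the typed string
# by how common they are. The dictionary and counts are invented.
WORD_FREQUENCY = {"the": 23_135_851, "they": 3_187_640, "then": 1_107_723, "than": 903_372}

def one_edit_away(word: str) -> set[str]:
    """All strings one delete, replace, or insert away from `word`."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + replaces + inserts)

def autocorrect(typed: str) -> str:
    candidates = [w for w in one_edit_away(typed) | {typed} if w in WORD_FREQUENCY]
    # Prefer the most frequent plausible word; otherwise keep what was typed.
    return max(candidates, key=WORD_FREQUENCY.get, default=typed)

print(autocorrect("thw"))  # -> "the"
```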

A workout plan could be a set of rules on how to adjust a set of known workouts according to basic rules of thumb. For example, reducing the dumbbell weights based on the gender, weight and height of the user.

And footsteps might be detected using a Fast Fourier Transform (FFT) or similar mathematical functions that uncover repeating cyclic behaviour, like the cadence of walking.
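
As a sketch of that idea, here's NumPy estimating a walking cadence from a vibration trace. The signal is synthetic; real data would come from a sensor, and the sample rate and frequency band are assumptions:

```python
import numpy as np

sample_rate = 50.0                        # samples per second (assumed)
t = np.arange(0, 10, 1 / sample_rate)     # 10 seconds of data
signal = np.sin(2 * np.pi * 1.8 * t) + 0.3 * np.random.randn(t.size)  # ~1.8 steps/s plus noise

# FFT of the (mean-removed) signal, and the frequency each bin corresponds to.
spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(signal.size, d=1 / sample_rate)

# Only consider plausible step rates (roughly 1-3 steps per second).
band = (freqs >= 1.0) & (freqs <= 3.0)
cadence_hz = freqs[band][np.argmax(spectrum[band])]
print(f"Estimated cadence: {cadence_hz:.2f} steps per second")
```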

Usually the model will be a set of heuristics: simple rules of thumb with some fudge factors (arbitrary values) that are adjusted manually over time to match known data.
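
The workout example above might be nothing more than a known plan plus a couple of fudge factors, roughly like this sketch (the exercises, baseline, and clamp values are invented, not sports science):

```python
# Hand-tuned heuristic: scale a known workout by the user's body weight.
# The 70 kg baseline and the 0.7-1.3 clamp are the "fudge factors" you would
# adjust over time as you compare the output against real coaching data.
BASE_WORKOUT = [
    {"exercise": "dumbbell press", "weight_kg": 12.0, "reps": 10},
    {"exercise": "goblet squat", "weight_kg": 16.0, "reps": 8},
]

def adjust_workout(user_weight_kg: float) -> list[dict]:
    scale = min(1.3, max(0.7, user_weight_kg / 70.0))
    return [{**item, "weight_kg": round(item["weight_kg"] * scale, 1)} for item in BASE_WORKOUT]

print(adjust_workout(user_weight_kg=55))  # lighter user -> lighter dumbbells
```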

The benefit of these simpler, handcrafted models is that they're extremely fast and efficient. Often the amount of computation is small enough that they could run on your watch, rather than requiring a PC's worth of power and RAM. In practice, this means they can run in realtime, or be used abundantly in an experience without worrying about CPU, battery, or API-call costs.

They’re also constrained in their output; a model that simply adjusts a database of workouts is not going to start inserting racist words by accident.

The downsides of these models are that they take skill and time to develop and hone. And, to be honest, they’re not ‘AI’, so you can’t slap that on your pitch deck to venture capital. And, of course, they simply can’t do much of what AI can achieve.

So you’ve decided to use AI

There are three core ways that I've seen AI used in apps.

  1. Feature Extraction. For example, finding the edges of a table, so AR knows where to put the 3D shoe model. Or extracting the features of a Rembrandt painting so they can be merged with the features of my puppy photo.

  2. Classification. Is this a chair? Is this chair an antique? Is this handwritten letter an 'a' or a 'q'?

  3. Generative models. This ranges from Large Language Models, like ChatGPT, to image diffusion models, like DALL·E. In these cases, the model is 'guessing' what the next token (or part-word/idea) should be, based on having seen billions of sentences. Or the visual equivalent, having seen millions of images. (A toy illustration of the idea follows this list.)
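
Here's the promised toy illustration of the 'guess the next token' idea. It's a bigram table built from two sentences rather than billions, and nothing like a real LLM's architecture, but the generate-one-piece-at-a-time loop has the same shape:

```python
import random

# A tiny "corpus": which word tends to follow which?
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

following: dict[str, list[str]] = {}
for current, nxt in zip(corpus, corpus[1:]):
    following.setdefault(current, []).append(nxt)

def generate(start: str, length: int = 8) -> str:
    words = [start]
    for _ in range(length):
        options = following.get(words[-1])
        if not options:
            break
        words.append(random.choice(options))  # sample the next "token"
    return " ".join(words)

print(generate("the"))  # e.g. "the dog sat on the mat . the cat"
```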

What you may not know is that neural networks are a really old idea, dating from the 1950s. The first practical implementations started in the 1980s, but these most recent, attention-getting models have been enabled not just by amazing computer science, but also by amazing spending. For example, ChatGPT 5 is rumoured to cost around $1.2 billion to train.

Smaller and simpler models have long existed, doing work whose results you've enjoyed. Point-and-shoot cameras have used face detection for 20 years to optimise focus. Handwriting recognition was made famous by the Apple Newton, famously less than reliably. (As a side note, I have a friend who worked on that implementation who swears they halved the CPU and RAM at the last minute, which handicapped the model.)

The benefit of many of these smaller neural network models is that they are often highly optimised and efficient. They have smaller jobs to do, like finding the edges of a document in a photo, or finding where a face is, and they are incredibly robust and fast. They're also often included with the OS you're building for, or in commonly available libraries that can be used off-the-shelf.
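
Face detection is one example that ships off-the-shelf. A minimal sketch using OpenCV's bundled Haar-cascade detector (the photo path is a placeholder):

```python
import cv2

# OpenCV ships this cascade file with the library -- no training, no network calls.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("photo.jpg")  # placeholder path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Returns one (x, y, width, height) box per detected face.
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
print(f"Found {len(faces)} face(s)")
```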

Some of these models include things like:

  • Detecting a trigger word or sound

  • Detecting direction of motion from a video feed

  • Classifying a kind of fruit or vegetable at your local self-serve checkout

  • ‘Sentiment analysis’ applied to text, to get a sense of emotion of the writer/speaker

Try to use as small a model as possible. This will make for a faster, more reliable experience, and less unexpected racism. You can, of course, train your own models. This requires your own set of data, whether you are trying to detect cancerous cells or work out what wash stage a washing machine is in, based on electrical noise.
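
At the small end, 'train your own model' can look like this scikit-learn sketch. The washing-machine noise features and wash-stage labels here are invented; yours would come from real measurements:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Made-up feature vectors (e.g. noise power in two frequency bands) and labels.
features = [[0.2, 1.1], [0.3, 1.0], [2.4, 0.1], [2.6, 0.2], [1.1, 3.3], [0.9, 3.1]]
labels = ["fill", "fill", "spin", "spin", "agitate", "agitate"]

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.33, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

print("Held-out accuracy:", model.score(X_test, y_test))
print("Prediction for a new reading:", model.predict([[2.5, 0.15]]))
```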

We haven’t got to the bigger models yet, including LLMs, but it is time to look at handling errors.

How to deal with model mistakes

All models will make mistakes. After all, they're not reality; they're just a model, a simplified version of the real world. At best they're oversimplified, at worst they're inaccurate in some way. As product designers, we want to give the very best experience to our users, every time. And any model, whether a simple state machine or a giant LLM, can and will produce wrong results. Here are my tactics:

  1. Try to find the far edges of the experience that you want to support. When I worked on Scannable, a paper document scanning iPhone app, we looked at all the different use cases. Of course, the vast majority of people have a white US Letter, US Legal, or A4 piece of paper in front of them that they want to capture. Some wanted to capture a magazine article (but not the rest of the magazine page). Some wanted to digitise a physical photo, or a whiteboard session, or a movie poster. These all have very different properties in terms of edge detection, orientation correction, and UI guidance to help the user capture the correct content.

  2. Offer UI feedback to help the user work with the model, optimally in realtime. This might be on-screen arrows to show that part of a document is out of view. Or example text prompts that could be asked of an LLM. They might be extra overlays to show the intermediate states of detection, for example showing that no chair seems to be in view. And they might be tips to help a user get better results, for example 'try holding your phone closer to the speaker in order to hear the song'.

  3. Try to build a library of experiences, or data, that you can personally test as a product owner. There are many factors that change how a model will work: a different phone with a different lens, the voice of a child vs that of an adult. What you're handed by your development team won't account for all of these.

  4. Design an experience that matches the real-world performance of the model. For example, at my local grocery franchise, Woolworths, there are self-checkout machines. If I place some bananas on the scanner, then press 'Produce', an autocomplete for 'bananas' is made available to me, above an on-screen keyboard where I can type in the name manually. About 90% of the time, it's correct. However, there are plenty of times when it's wrong; for example, it will mistake cucumbers for zucchinis, especially if they're in a plastic bag. So this is good design: if the check-out app had automatically selected bananas every time, it would be frustrating. And if the UI designer had had even more confidence, then it might have offered 'bananas' on the main display. Or if the UI designer had maximum hubris, then the machine would automatically add 'bananas' to my purchase list, even though the real item was corn, or a long bottle of mustard. A great design is only as helpful as it really can be in the real world. So, scale the experience, for example by offering a soft, non-blocking prompt instead of leaping into a modal 'hey, I know what it is!' screen (see the sketch after this list).

  5. Always offer a way to override AI classification. AI will get things wrong. Offer a Back, an Undo, an Edit, or some other override method.

  6. Consider combining a neural network with a set of heuristics to keep the model in check. Heuristics design is a whole other article I need to write, but it works best when you can combine multiple sensory inputs. For example, if your self-checkout scale is only detecting 50 grams, then the item is unlikely to be a bunch of bananas, despite what your visual model thinks. Perhaps it is a box of tissues with a banana print (sketched below).
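
Here's the sketch promised above: a hypothetical way to cross-check a vision classifier's confidence against a weight-plausibility heuristic, and to scale the UI response to match. The classifier is a stand-in and the thresholds are invented:

```python
# Hypothetical checkout logic: cross-check the vision model against the scale,
# then pick a UI behaviour that matches how confident we really are.
TYPICAL_WEIGHT_G = {"bananas": (400, 1500), "cucumber": (200, 500)}

def classify_produce(image) -> tuple[str, float]:
    """Stand-in for a real vision model; returns (label, confidence)."""
    return "bananas", 0.92

def weight_is_plausible(label: str, weight_g: float) -> bool:
    low, high = TYPICAL_WEIGHT_G.get(label, (0, float("inf")))
    return low <= weight_g <= high

def checkout_suggestion(image, weight_g: float) -> dict:
    label, confidence = classify_produce(image)
    if not weight_is_plausible(label, weight_g):
        confidence *= 0.5  # the scale disagrees, so trust the vision model less
    if confidence > 0.95:
        return {"action": "preselect", "label": label}  # shown prominently, still confirmable
    if confidence > 0.70:
        return {"action": "suggest", "label": label}    # autocomplete above the keyboard
    return {"action": "ask", "label": None}             # fall back to manual entry

print(checkout_suggestion(image=None, weight_g=50))  # 50 g can't be bananas -> "ask"
```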

The big models, or ‘here be dragons’

ChatGPT, Claude, Gemini. These models are incredibly powerful, but misunderstood.

  1. The most popular models, today, have no state (or very, very limited state). When you chat to GPT, every time you type a line, your entire conversation is sent through. The model then guesses the next thing to say based on your entire conversation. It's not keeping track; the model 'forgets' the context every time. So an experience, to be truly helpful, needs to remember as much context for the user as possible and re-feed that into the prompt. This might be localization data, personal preferences, the context of recent conversations, etc. It is early days here in terms of best practices (see the sketch after this list).

  2. LLMs like ChatGPT really are a very black box, and as a product designer, a highly explosive one. For example, there are examples on the web of users talking to insurance (etc.) chat bots and instructing them to write JavaScript for them. These web chat bots are just a large language model, with some extra hidden prompts telling them about the products they need to answer questions about. But they have no hard limits that stop them from accessing all of their general knowledge. An open text chat input field is a very dangerous way to offer a product feature, because you never know what you're going to get. And it's currently very difficult to separate the truth from the completely plausible and confidently expressed falsehoods generated by these models.

  3. What these models are actually wonderful at is helping with refining user content. A kind of super-spellchecker. Given my block of text, rewrite it to be more professional. Summarise this conversation, etc. And, of course, you as product designer might choose to only offer the user a set of predefined styles. These are sent along to the LLM as part of the prompt, and wonderfully limit the range of output. When working with large models, a consistently great experience needs guardrails, which might mean pre-setting the prompts and offering that as functionality.
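
A rough sketch of both ideas together, using the openai Python package: the conversation history is re-sent on every call (point 1), and a predefined style preset becomes the system prompt that constrains the output (point 3). The model name, presets, and prompts are placeholders, not recommendations:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

STYLE_PRESETS = {
    "professional": "Rewrite the user's text in a concise, professional tone.",
    "friendly": "Rewrite the user's text in a warm, friendly tone.",
}

history: list[dict] = []  # the model is stateless, so the app keeps the state

def rewrite(text: str, style: str) -> str:
    # The preset becomes a system prompt: a guardrail that narrows the output.
    messages = [{"role": "system", "content": STYLE_PRESETS[style]}]
    messages += history                                  # re-send prior turns every call
    messages.append({"role": "user", "content": text})

    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    reply = response.choices[0].message.content

    history.append({"role": "user", "content": text})
    history.append({"role": "assistant", "content": reply})
    return reply

print(rewrite("hey can u send the thing asap", style="professional"))
```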

To wrap up, here are the general approaches I've found work well:

  1. Know your material, your experience, your content. Know the range of experiences you're willing to offer, and those you're not.

  2. Decide how sophisticated a model you need. It's likely to be smaller and simpler than you think.

  3. Constrain the output of the model through additional heuristics and the UI it offers.

  4. Allocate lots of time to test and refine.

BONUS UPDATE

As I finish this article, ChatGPT seems to be unresponsive. And of course, if you design any experience that relies very heavily on a third-party API, then you'll want a plan B. One option is considering local LLMs. Counterintuitively, large LLMs are getting compressed enough to run on desktop machines. Models like Meta's Llama, for example, have very impressive world knowledge and language skills, and platforms like iOS are beginning to offer LLMs out of the box, with no need to make network calls.
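
A hypothetical plan B might look like this: try the hosted API, and fall back to a locally running model if the call fails. This sketch assumes the openai and ollama Python packages and a locally pulled Llama model; every name here is a placeholder:

```python
from openai import OpenAI
import ollama

def complete(prompt: str) -> str:
    try:
        # Plan A: the hosted API (reads OPENAI_API_KEY from the environment).
        client = OpenAI()
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
    except Exception:
        # Plan B: a local model served by Ollama, no network required.
        local = ollama.chat(model="llama3", messages=[{"role": "user", "content": prompt}])
        return local["message"]["content"]

print(complete("Summarise this paragraph in one sentence: ..."))
```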

Design and AI: Fret or Not?

I came across this tweet thread today.

[Screenshot of the tweet thread]

The thread basically questions whether Bohemian Coding's Sketch has good reasons to be a native Mac app, rather than a web app. Yes, the web is more accessible. Yes, Figma (the competition) does a much better job in some areas, especially anything to do with collaboration. It's faster in some ways, and has some fantastic vector tools, and other unique things.

On the plus side for Sketch, I love the fact that Sketch not only fits into, but enables, all kinds of flow in my Mac experience. I can drag and drop stuff in and out with zero friction. But I don't think that's the killer reason for Sketch to double down on the native experience.

And it has to do with Rock’n’Roll, baby. Let’s get started.

Clippit, or Clippy, the annoying Microsoft Helper.

It's looking pretty inevitable that Machine Learning/AI is going to play a big part in all of our computing lives. I'd like to think I know what this doesn't look like. It doesn't look like Clippit, the unhelpful Microsoft paperclip. It's not a helper, and it's not a one-click solution.

Have you ever hit the DEMO button on a small Casio music keyboard? Yes, you can press it and music plays. But it's not the experience we're hoping for. Instead, I think of great ML being like a fret on a guitar. If you don't know them, the frets are the thin metal strips across a guitar's neck, dividing up where you put your fingers. They make it easy to choose notes on each string. There are many stringed instruments that don't have frets, like violins, cellos, and double basses. There are still fretless guitars, but they're very much a niche. The fretted guitar is the winning instrument in terms of popularity for most people.

Frets on a guitar neck

A fret is very much a constraint that you work with as you play. It helps guide where to put your fingers, and once you hold your fingers down, it guarantees that the notes will be in tune. In this way, playing a guitar is much easier than playing a violin, since you can start producing pleasant (if simple) music in just days, rather than months or even years.

Note that the fret doesn't play the music for you. You don't interact with a fret and wait a second for the result to come back. And it's not simply an on/off button. It's a set of edges, shapes that you can interact with in infinite ways.

And this unlocks whole other art forms. Because you can play full, rich chords reliably, it’s possible to sing songs with rich harmony. The guitar is a portable instrument that goes back to the Renaissance lute, and shares that long history of popular songwriters and singers.

A lute and player.

And without the ability to play guitar and sing, you wouldn't have Rock'n'Roll. And about a zillion other popular music styles. The fret, because it makes a clean, strong connection with the string, also enables instruments like the electric guitar to produce long sustained notes (think Jimi Hendrix's Star Spangled Banner), and complex tapping work.

Bringing this back to design, I see the coming age of integrated ML best implemented like frets on a guitar. It is not a Casio keyboard's demo button, but a rich interaction that the creator works with intimately to enable faster creation, and whole new art forms. And we are seeing Apple, in particular, invest incredibly heavily in building powerful Neural Engines into each and every iPhone and iPad. And soon, I expect, the Mac.

This is where I think Sketch can shine. This is where you want the lowest latency possible, and the biggest bandwidth between you and the tool.

The web is an amazing platform. Figma is a fantastic tool. And I love competition between design tools. Trust me, doing UI design in Photoshop sucked in comparison.

What I want is my design tools to be tactile, rich, solid experiences. And I think there’s a fantastic opportunity for native, close-to-the-metal apps to deliver this.