PoC: A talking camera
I wanted to test out a cool game mechanic: an automatic image feature detection system that tells you what's in an image. You can get the idea by trying the Kittydar site, which tells you whether there's a cat in an image.
Now I think it would be a cool game mechanic if the user could "take a photo" and a voice would call out what it sees in the image.
Fake it 'til you make it
While detecting cat faces is such a trivial task that it can be done in a browser using crude tools, I'd like to have a Google-level AI that makes valid sentences about what's in the picture.
Fortunately, as a game creator I'm also in full control of the surrounding world, so I think it would be pretty easy to fake this effect by labelling all the "important" stuff in the world and listing only the things that are visible to the camera.
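Here's a minimal sketch of the labelling idea in Python. It is not the demo's actual code: the `Labelled` and `Camera` classes, the flat 2D world, and the "within range and inside the field of view" visibility test are all assumptions made up for illustration. A real engine would more likely use its own frustum or occlusion checks.

```python
import math
from dataclasses import dataclass


@dataclass
class Labelled:
    """A world object the 'AI' is allowed to talk about (hypothetical type)."""
    label: str
    x: float
    y: float


@dataclass
class Camera:
    """A simplified 2D camera (hypothetical type)."""
    x: float
    y: float
    heading_deg: float        # direction the camera faces, 0 = along +x
    fov_deg: float = 60.0     # total field of view
    max_range: float = 50.0   # objects further away are ignored


def is_visible(cam: Camera, obj: Labelled) -> bool:
    """Assumed visibility rule: close enough and inside the view cone."""
    dx, dy = obj.x - cam.x, obj.y - cam.y
    dist = math.hypot(dx, dy)
    if dist > cam.max_range:
        return False
    if dist == 0:
        return True
    angle = math.degrees(math.atan2(dy, dx))
    # Signed angle difference, normalised to [-180, 180)
    diff = (angle - cam.heading_deg + 180) % 360 - 180
    return abs(diff) <= cam.fov_deg / 2


def visible_labels(cam: Camera, objects: list) -> list:
    """List the labels of everything the camera can currently see."""
    return [o.label for o in objects if is_visible(cam, o)]


world = [
    Labelled("lion", 10, 0),     # straight ahead
    Labelled("tree", 10, 2),     # slightly off-centre, still in view
    Labelled("monkey", -5, 0),   # behind the camera
    Labelled("bird", 100, 0),    # ahead, but too far away
]
camera = Camera(x=0, y=0, heading_deg=0)
print(visible_labels(camera, world))
```

Only the lion and the tree end up in the list, because the monkey is behind the camera and the bird is out of range.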
Priorities
Now sometimes you see a lot of stuff on the screen. You might be in a jungle where there are fireflies, bushes, trees, monkeys, birds... and a lion that's sprinting towards you. A normal person only describes the most important things, and so should my "AI".
I figured that I could give every labelled object a priority number, so the description can focus on the most important things happening on the screen.
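A rough sketch of how that could look in Python, using the jungle scene above. The `Labelled` class, the specific priority values, the "higher number means more important" convention, and the limit of two objects per sentence are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Labelled:
    """A visible, labelled object with a priority (hypothetical type)."""
    label: str
    priority: int  # assumption: a higher number means more important


def most_important(visible: list, limit: int = 2) -> list:
    """Return the labels of the `limit` highest-priority objects."""
    ranked = sorted(visible, key=lambda o: o.priority, reverse=True)
    return [o.label for o in ranked[:limit]]


def describe(visible: list, limit: int = 2) -> str:
    """Build a short sentence from the most important objects only."""
    names = most_important(visible, limit)
    if not names:
        return "I see nothing interesting."
    return "I see " + " and ".join(names) + "."


# Everything currently on screen in the jungle, with made-up priorities
jungle = [
    Labelled("some fireflies", 1),
    Labelled("some bushes", 1),
    Labelled("some trees", 2),
    Labelled("some monkeys", 5),
    Labelled("some birds", 3),
    Labelled("a lion", 10),
]
print(describe(jungle))
```

With these numbers the sentence mentions only the lion and the monkeys; the fireflies and bushes never make the cut.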
Sprinkle some details
Removing all the details leaves us with a very specific list of the most important objects in the scene. But the relations between objects often carry details that help us figure out what's happening in the scene.
We can achieve this effect by linking two visible objects in a scene and describing the relationship between them ("inside", "over", "under", etc.). This way we can get the extra detail by making a "hop" to the most important visible link of the most important visible object.
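The "hop" can be sketched in Python like this. Again, this is an illustration rather than the demo's code: the `Thing` class, the precomputed `visible` flag, and the rule that a link's importance equals the priority of the object it points to are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Thing:
    """A labelled object with links to other objects (hypothetical type)."""
    label: str
    priority: int          # assumption: higher means more important
    visible: bool = True   # assumption: visibility is computed elsewhere
    links: list = field(default_factory=list)  # (relation, Thing) pairs


def describe(things: list) -> str:
    """Name the top visible object plus one hop along its best visible link."""
    seen = [t for t in things if t.visible]
    if not seen:
        return "I see nothing interesting."
    main = max(seen, key=lambda t: t.priority)
    # Only links whose target is also visible to the camera count
    visible_links = [(rel, other) for rel, other in main.links if other.visible]
    if not visible_links:
        return f"I see {main.label}."
    # Assumed ranking: a link is as important as the object it points to
    rel, other = max(visible_links, key=lambda link: link[1].priority)
    return f"I see {main.label} {rel} {other.label}."


box = Thing("a box", 2)
table = Thing("a table", 3)
lamp = Thing("a lamp", 9, visible=False)   # important, but off screen
cat = Thing("a cat", 5)
cat.links = [("inside", box), ("under", table), ("next to", lamp)]

print(describe([box, table, lamp, cat]))
```

The lamp outranks everything else but is not visible, so the cat becomes the main object and the hop goes to the table, its most important visible link.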
Try it out
That's it! I think this was a good proof of concept for a system describing what's visible to a camera like a good "AI" would describe it.
Check out the interactive demo to test how it works yourself!