Ever struggled to get the perfect composition, or found yourself frustratedly relying on cumbersome tripods or tricky timers? Researchers have now come up with a unique way of getting the perfect picture in the form of PhotoBot, a robot photographer that can take instructions and use a reference photo to create the desired mood of a photograph.
“We introduce PhotoBot, a framework for fully automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer,” researchers explain.
“We propose to communicate photography suggestions to the user via reference images that are selected from a curated gallery. We leverage a visual language model (VLM) and an object detector to characterize the reference images via textual descriptions and then use a large language model (LLM) to retrieve relevant reference images based on a user's language query through text-based reasoning.”
Oliver Limoyo, one of the creators of PhotoBot, described the process as “really fun.” He worked on the project while at Samsung with his manager, Jimmy Li. While struggling to find a good metric for aesthetics, the pair were inspired by the Getty Image Challenge, a COVID lockdown activity devised by the media company in which people attempted to recreate famous images using three common household items.
Say cheese! We introduce PhotoBot, a framework for fully automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. PhD candidate @OliverLimoyo will present this work at #IROS2024! Paper: https://t.co/DHGFvfOKJf pic.twitter.com/BPrxDkMxlD (October 3, 2024)
This gave Limoyo and Li the idea to have the robot select a reference image to inspire the photograph, leaving them with the challenge of deciding how the PhotoBot would find the reference images, and how to adjust the camera to match that reference.
To use PhotoBot, you first provide it with a written description of what you want, for example, “an image of me looking confident.” PhotoBot then scans the environment around the subject, identifying the objects and people it can detect, and pulls photos containing similar objects from a preprogrammed database of labeled images.
Then, an LLM compares the written description and the objects in the environment with a smaller set of labeled images, offering the closest matches to use as reference images. The LLM can be programmed to return any number of reference photographs.
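For readers who like to see the idea in code, the two-stage selection described above can be sketched roughly as follows. This is a minimal illustration and not the researchers' implementation: the gallery entries, the function names and the Jaccard overlap score are all assumptions, and a simple word-overlap score stands in for the LLM's text-based reasoning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GalleryImage:
    name: str
    objects: frozenset  # labels an object detector might assign (hypothetical)
    caption: str        # description a VLM might produce (hypothetical)


# Tiny made-up gallery, purely for illustration.
GALLERY = [
    GalleryImage("portrait_01", frozenset({"person", "cup"}),
                 "a happy person smiling over a cup of coffee"),
    GalleryImage("portrait_02", frozenset({"person", "cup"}),
                 "a serious person staring at a cup"),
    GalleryImage("portrait_03", frozenset({"person", "book"}),
                 "a person reading a book"),
    GalleryImage("landscape_01", frozenset({"tree", "bench"}),
                 "an empty bench under a tree"),
]


def jaccard(a, b):
    """Overlap between two label sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def shortlist(gallery, scene_objects, k):
    """Stage 1: keep the k gallery images whose objects best match the scene."""
    ranked = sorted(gallery, key=lambda g: jaccard(g.objects, scene_objects),
                    reverse=True)
    return ranked[:k]


def rank_by_query(candidates, query, n):
    """Stage 2: stand-in for the LLM, scoring captions by shared words."""
    query_words = set(query.lower().split())

    def score(image):
        return len(query_words & set(image.caption.lower().split()))

    return sorted(candidates, key=score, reverse=True)[:n]


def suggest(query, scene_objects, k=3, n=1):
    """Return the names of the n suggested reference images."""
    candidates = shortlist(GALLERY, scene_objects, k)
    return [image.name for image in rank_by_query(candidates, query, n)]


suggestions = suggest("a picture of me looking happy", {"person", "cup"})
```

The `n` argument mirrors the point that the system can be set to return any number of reference photographs.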
For example, if you input the written command “a picture of me looking happy,” it might identify a person with curly hair holding a cup of coffee, in a specific environment. PhotoBot would then produce numerous reference images. Once you select the appropriate image you want your photograph to mimic, it would then move its robotic arm to position the camera and take a similar image.
To make sure PhotoBot moves to the correct position, it identifies features that appear in both its current camera view and the chosen reference image, for example the positioning of someone's chin or hands. Then it solves a “perspective-n-point” (PnP) problem, which relates points in a camera’s 2D view to their 3D positions in space in order to work out where the camera is. It then solves how to move the robot’s arm to transform its view to look like the reference image, repeating the process while making incremental adjustments as it gets closer to the desired pose.
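To give a feel for the geometry, here is a heavily simplified sketch of that idea. It assumes the camera only translates (its rotation is fixed and known), uses an assumed focal length, and invents the keypoints, so it is a toy version of a PnP solve rather than the solver PhotoBot actually uses. Given the 3D positions of a few keypoints and the pixels where they appear in the reference image, it estimates the camera position that would reproduce that view, then steps toward it a little at a time.

```python
FOCAL = 800.0  # assumed focal length in pixels (hypothetical value)


def project(point, cam):
    """Pinhole projection of a 3D point for a camera at `cam` (no rotation)."""
    x, y, z = (p - c for p, c in zip(point, cam))
    return (FOCAL * x / z, FOCAL * y / z)


def solve3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def estimate_camera_position(points_3d, pixels):
    """Least-squares camera position from 2D-3D matches (translation only)."""
    # Each match gives two equations that are linear in (cx, cy, cz):
    #   FOCAL*cx - u*cz = FOCAL*X - u*Z   and   FOCAL*cy - v*cz = FOCAL*Y - v*Z
    rows, rhs = [], []
    for (X, Y, Z), (u, v) in zip(points_3d, pixels):
        rows += [[FOCAL, 0.0, -u], [0.0, FOCAL, -v]]
        rhs += [FOCAL * X - u * Z, FOCAL * Y - v * Z]
    # Normal equations: (A^T A) c = A^T b
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(3)]
    return solve3(ata, atb)


def move_towards(current, target, gain=0.5, tol=1e-3, max_steps=50):
    """Move a fraction of the remaining error each step until within `tol`."""
    steps = 0
    while max(abs(t - c) for t, c in zip(target, current)) > tol and steps < max_steps:
        current = [c + gain * (t - c) for c, t in zip(current, target)]
        steps += 1
    return current, steps


# Made-up keypoints on the subject (metres) and a made-up reference viewpoint.
keypoints = [(0.0, 0.0, 2.0), (0.1, -0.2, 2.1), (-0.1, -0.2, 2.0), (0.0, 0.3, 1.9)]
reference_camera = (0.2, -0.1, 0.5)
reference_pixels = [project(p, reference_camera) for p in keypoints]

target = estimate_camera_position(keypoints, reference_pixels)
final_position, steps_taken = move_towards([0.0, 0.0, 0.0], target)
```

A full PnP solver also recovers the camera's rotation, and a real robot would re-detect the keypoints and re-estimate the pose after each small move rather than trusting a single estimate.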
To test whether the images taken with PhotoBot were favored over human photography, Limoyo’s team had eight people try the process. They then asked 20 people to judge which of the photographs were more appealing. Overall, the PhotoBot took the preferred image 242 times out of 360 photographs, roughly 67% of the time.
Photographers need not worry about PhotoBot coming for their jobs just yet, as the project is no longer in development. However, Li believes someone should create an app based on the underlying programming to allow people to take better photos of themselves and each other.
“Imagine right on your phone, you see a reference photo,” he said. “But you also see what the phone is seeing right now, and then that allows you to move around and align.”