Drag Your GAN: Interactive Point-Based Manipulation of Images (mpg.de)
184 points by waqasy on May 19, 2023 | hide | past | favorite | 86 comments


The semantic understanding feels much richer than diffusion based modeling, e.g. the trees on the shore growing to match the manipulated reflection, the sun changing shape as it's moved up on the horizon, the horse's leg following proper biomechanics as its position is changed. I haven't gotten such a cohesive world model when doing text-guided in-painting with stable diffusion etc. This feels like it could very conceivably be guided by an animation rig with temporally consistent results.


Temporal consistency for a guided scene is a separate problem. It was kind of solved a couple of years ago. [0] It can be used with animation rigs and simplistic tagged geometries, and it even works in near real time. "Kind of" because training a model from scratch on a large dataset is not something you want to do for the actual job; what you want is a good style transfer mechanism that can extract features from as few references as possible.

[0] https://isl-org.github.io/PhotorealismEnhancement/



Has anyone built the "online photoshop that incorporates all of the latest AI image editing tools asap and sells access as a premium subscription with lots of GPU access for smooth editing" business yet? I'd be curious to know.



Adobe AI is crap compared to Midjourney


Because one is trained on properly licensed stock art and the other on anything, without asking the creators for authorization?

"AI art tools Stable Diffusion and Midjourney targeted with copyright lawsuit" - https://www.theverge.com/2023/1/16/23557098/generative-ai-ar...


Search on Twitter, the artists in the Adobe dataset are also livid (justifiably) because they didn't consent to their work being used for AI training. The Adobe license agreement is broad enough that Adobe is covered legally, but it isn't enthusiastic consent in any sense of the word. Many, many artists would never have submitted their work to Adobe if generative AI had been a known possibility, so using Adobe's product is really not any better at respecting creators' wishes.


So like two drunk people having sex and the law trying to figure out who consented


Midjourney is crap compared to Stable Diffusion with all the features in AUTOMATIC1111


It's trained on their stock art and under-performs Stable Diffusion and Midjourney.

It's really poor, comparatively.


It makes far fewer visual mistakes (like the wrong number of limbs) than Stable Diffusion, or even Bing's DALL-E ~3. The latter is still the best at understanding your prompt, though.


I think less effort has gone into image editing compared to image generation so far. That said, we're building some photorealistic image editing tools at https://www.faceshape.com, focused on face editing for now. The current models don't perform as well as we'd like, but the next generation, currently in training, should.

I'm always curious to know what kind of AI image editing people are interested in. Can you share what kind of edits you'd like to do? There are the usual ones like background removal or object removal, but those are more general tools that are getting incorporated natively into lots of apps (say, Google Photos).


Tried your app, I liked the product idea but think the execution could use much more work. Personally, I am obsessed with FaceApp’s filters. If you could make an app with even more interesting filters but with the same (or better) realistic quality, I’d definitely use it :)


Playground AI (https://playgroundai.com/) does a lot of this.


There are a million of these. It's a super crowded space.

https://civitai.com/

https://lexica.art/

https://openart.ai/

(Many more)


They are all for image creation. I would love to have one to edit my photos, not generate images from a prompt.


You can edit your photos with the models and tools for image creation, because those tools can work on a source image (img2img, inpainting, outpainting) as well as just a prompt and blank canvas.



It would be cool if https://www.photopea.com supported those types of AI manipulation plugins.


Isn't that stability.ai's business model?


I like the direction of github.com/invoke-ai/. It isn't a business but an open source project.


Hello, post-truth world!

More seriously, I think that digital photos, and especially low-res surveillance camera footage, will soon be inadmissible in any reasonable court, because tools like this allow forging such evidence in very natural-looking ways.


Being able to fake something really doesn't matter all that much: you'd still need to get that fake video onto the surveillance camera system, in the time between committing the crime and the police arriving, without leaving a trace, all while hoping that whoever you're trying to incriminate doesn't have an alibi.

Fakes will be relevant for Twitter, TikTok and Co., where random videos are posted and distributed without sources, heavily edited and compressed, such that it is impossible to tell if that video ever started out as a real video or a fake. But in court the whole thing starts to fall apart the moment they ask where that video came from.


Ah, but for law enforcement types to say "hey, look what we found on the surveillance video server!" is a potential problem. The power asymmetry in society is ripe for exploitation by the powerful. They just need some goons - of which there are plenty - to do the dirty work.


Without knowing what the current ones are, I'd guess we need improved chain-of-custody laws for evidence immediately!


There are probably ways to embed cryptographic hashes within images. Any device that creates images from the real world could have secret keys that can be used to validate any image created by said device.

We will still need a centralized party that holds the secret keys for validation, though.


What about the recording device itself? Refeed a video/frame/hash back to a security camera and it tells you if it was originally sourced from that specific camera or not.


The sci-fi dystopian answer would be entangled photon lights and off site image recording that preserves entanglement :-)


This is the right approach, but there’s lots of complexity around transcoding. In the courtroom that’s less of an issue if you can get the original unmodified outputs, but broader applications need to think through what it means to be “verified”


You don't need a centralized database. You have three companies with three different keys. They catch Bob stealing on three different cameras, and each piece of footage can be independently verified as authentic and undoctored. That would prevent anonymous "found footage" suggesting someone committed a crime.

You definitely don't want one single leakable entity.


What shall we do with millions upon millions of existing mobile phones, and also surveillance cameras and dashcams?

Some of them possibly could be updated, but this will take time. Securing the keys within them is also going to be a problem; not all of them have a TPM.


There's no way this would work: either the master encryption key would be leaked, or someone would reverse-engineer the chip.

Also, what about someone putting a screen just in front of the sensor?


No need for that.

A private key is generated on device and never leaves it. It sits inside a TPM or equivalent.

The public key is pushed to a well-known site, visible to all.

Every shot is signed: hash the bits into a reasonably short string (say, with SHA-512), then sign the hash with the private key.

Anyone can now verify the signature with the public key and compare the hash against one they compute from the bits themselves.

The problem, of course, is that any transformation whatsoever breaks the signature. You can't adjust levels and contrast, you can't even crop. Maybe it's a good property.
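The scheme above can be sketched in a few lines. This is a toy illustration, not a production design: it uses an HMAC secret as a stand-in for the asymmetric key pair a real TPM-backed camera would hold (e.g. Ed25519), and every function name here is hypothetical.

```python
import hashlib
import hmac
import secrets

# Toy sketch of the per-device signing scheme described above. A real camera
# would keep an asymmetric private key inside a TPM and publish the public
# key; an HMAC secret stands in for it here so the example is stdlib-only.

def make_device_key() -> bytes:
    """Generated once on-device; in the real scheme it never leaves the TPM."""
    return secrets.token_bytes(32)

def sign_image(device_key: bytes, image_bytes: bytes) -> bytes:
    # Hash the raw sensor bits down to a short digest, then sign the digest.
    digest = hashlib.sha512(image_bytes).digest()
    return hmac.new(device_key, digest, hashlib.sha512).digest()

def verify_image(device_key: bytes, image_bytes: bytes, signature: bytes) -> bool:
    # Recompute the digest from the bits we were handed, check the signature.
    digest = hashlib.sha512(image_bytes).digest()
    expected = hmac.new(device_key, digest, hashlib.sha512).digest()
    return hmac.compare_digest(expected, signature)

key = make_device_key()
shot = b"raw sensor data from one exposure"
sig = sign_image(key, shot)

assert verify_image(key, shot, sig)             # untouched image verifies
assert not verify_image(key, shot + b"x", sig)  # any edit breaks the signature
```

Note that the last assertion is exactly the fragility mentioned above: any change to the bits, including cropping or re-encoding, invalidates the signature.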


Why would you trust the key if it was generated on the device? Anyone could generate a key. You would need a centralized server to sign that key, in which case you have to trust a centralized company or government agency.


You have always been able to forge photographic evidence in natural looking ways (most easily by using completely real photography with misrepresented time or context).

And, of course, the easiest to falsify evidence (eyewitness testimony) is still admissible.

That's why you have to provide support for provenance, all of which is subject to examination and counterevidence.


Seeing how slowly laws move, the cynic says: it should be inadmissible in any reasonable court, but it will continue to be admissible, and it will take a major set of incidents for changes to be enacted across many countries.


Yes. The story of admissible DNA evidence is instructive and terrifying.

https://daily.jstor.org/forensic-dna-evidence-can-lead-wrong...



Longer video with more examples from one of the paper authors:

https://twitter.com/XingangP/status/1659483374174584832


Looks like it can't keep the background stable, so I guess this is not suitable for animations.


Neat concept. I wonder if something like that can be applied to diffusion models (the new kid on the block that is outshining GANs right now) - especially since the technique doesn't seem to be too dependent on the generative image implementation.

Also, it's interesting that they're submitting to SIGGRAPH - kind of expected this to be in a more ML-ish conference.


Probably goes to show where SIGGRAPH is headed.


The main link you provided is either not loading, or loading with missing video links, for me - maybe hugged to death at the moment?

Github may be more resilient: https://github.com/XingangPan/DragGAN


As a technology this is tremendously impressive, straight out of an SF movie. However, I wonder how it will impact our culture, fashion, standards of beauty, etc., as more and more artists accept the generated output rather than creating their own. Just as with music, where the invention of MIDI, synths, and sequencers brought new styles but also boring, imagination-numbing standardization.


DragGAN consists of two main components: 1) feature-based motion supervision that drives the handle points toward their target positions, and 2) a new point-tracking approach that leverages discriminative GAN features to keep localizing the handle points.
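The point-tracking half can be illustrated with a toy nearest-neighbour search in feature space. This is a simplified sketch of the idea, not the authors' code: in the paper the features come from an intermediate generator layer, and the search is restricted to a small patch around the previous handle position, as below.

```python
# Toy sketch of point tracking via nearest-neighbour search in feature space.
# feat is an H x W grid of feature vectors (plain lists); target is the
# feature vector recorded at the handle point before the image was updated.
# We look for the position, within a small window around the old handle,
# whose feature is closest to that recorded vector.

def track_handle(feat, target, old_pos, radius=2):
    h, w = len(feat), len(feat[0])
    y0, x0 = old_pos
    best, best_dist = old_pos, float("inf")
    for y in range(max(0, y0 - radius), min(h, y0 + radius + 1)):
        for x in range(max(0, x0 - radius), min(w, x0 + radius + 1)):
            # squared L2 distance between candidate feature and the target
            d = sum((a - b) ** 2 for a, b in zip(feat[y][x], target))
            if d < best_dist:
                best, best_dist = (y, x), d
    return best

# Tiny 4x4 grid of 2-D features; the distinctive feature [1.0, 1.0] has
# moved from (1, 1) to (2, 2) after one motion-supervision step.
feat = [[[0.0, 0.0] for _ in range(4)] for _ in range(4)]
feat[2][2] = [1.0, 1.0]

assert track_handle(feat, [1.0, 1.0], (1, 1)) == (2, 2)
```

The local window is what keeps the tracking cheap and stable: the handle is assumed to move only a little per optimization step, so a full-image search is unnecessary.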


There's also an older work called PuppetGAN http://ai.bu.edu/puppetgan/


It's similar but different.

Your paper makes an existing body a puppet.

The other one adjusts features.


I'm thinking that soon video and images are going to be just dead weight in journalism, adding nothing other than decoration.


It seems to be a pretty common thing now on news network websites that at the top of the story, or perhaps somewhere in the middle, there is a video. But you click on the video, and it's something entirely unrelated to the article. I feel like this has been going on for quite a while; it has nothing to do with synthetic media, just an annoying pattern I've noticed.


Stock photos are already like that. To me they're worse than having no pictures at all. Generated images are better than nothing because you can make them very specific.


I think there has to be some kind of cryptographic signature solution, like "The BBC verifies that this image is authentic".


We already have most of the required cryptographic primitives for this. PKI, trusted timestamping, secure hardware with remote attestation are some of the necessary building blocks. All that's really missing are camera sensors with built-in cryptography.

And societal care. Our society seems to really like to whine about "the danger of deepfake images", but our actions reveal that we don't really give a crap, as we could solve this problem today if we really wanted to.


I think it would also need to have some kind of embedded version control from camera through Photoshop so you know what was changed.


If you trust how the bbc tells the story, you also trust them when they say the image is true. Not a technical problem IMO.


Sure, but then the image gets copied, shared, reposted, etc. If images had cryptographic provenance included, sites like Reddit or Twitter could verify and display it, or the lack thereof.

I would use the same infrastructure as HTTPS, so an image could be signed by bbc.co.uk. That way if you trust that domain you could trust the document, wherever you come across it.


But why? If it is on the BBC's website, the BBC is already saying it is good. Same for any other outlet. The technical approach is interesting, but not needed IMO in this case.


Like I said, because you might come across the image on Reddit or Twitter but have no idea if it's real or not.


Why just the video and images? Half of the stories on the Internet are created out of whole cloth by AI now, never mind the embedded media...


Best demo would be to take some famous photo, find its position in gan-space then move it around.


Would love to see this for architecture!


What do we need human actors for at this point? Everything can be generated by AI now.


Please try to make an original movie that meets the standard of, say, The Godfather, with AI.


I understand your point, but I think it would be easier with AI than without. Many movies are not made to the standard of The Godfather because they wouldn't sell like MCU Movie #53, and the more humans you include in the creation, the more likely you are to run into the current system's restrictions.

Making a movie as beloved as the Godfather would still be challenging of course.


As long as they don't suck the air out of funding for real movies I can live with it, but I'd still be sad that people are being trained to like auto-generated junk. Like how people are losing their ability to concentrate on long-form content due to overexposure to addictive short-form content.


Of course it would be easier. I agree. I just take issue with the "why humans" thing because if anything, the recent advancements highlight just how big the human element really is.

Can you imitate a Bach prelude? Sure. And only people who aren't actually familiar with his music would be impressed.

Much of AI approaching "human performance" is it approaching the lowest bar. There's a Wittgenstein thing going on here: that an LLM can ace the LSAT or GMAT is mostly an indictment of those tests.

A little off topic.


>That an LLM can ace the LSAT or GMAT is mostly an indictment of those tests.

These kinds of comments are always the funniest. You can just tell the person who makes them has never looked at those tests, never mind attempted them.


I scored 159 on the LSAT in 2014, so I am not claiming the tests are easy. I am pointing out that when an AI aces them, it says more about the test than anything else.


No it doesn't lol because you can insert any test you like into your equation. GPT-4 performs well above average on almost anything you throw at it.

"Says more about [insert test]" is not an intelligent argument. It doesn't even make sense. Can you tell me exactly what this mysterious thing is ?

If you have this secret test for "true" intelligence and understanding the entire world is missing on then please share it with us and get your acclaim.


We must be speaking past each other. I am not out for acclaim and sorry for any confusion. Everything you say is exactly the point I'm trying to make, evidently clumsily. The "mysterious thing" is the human element. I don't know what else to call it? Humans that ace tests prove only that they are good at acing tests. Not that they're good at running businesses or practicing law. Not creating films (in this example), or music, etc.

I am not knocking the advancements, the capabilities are incredible. But machines have been doing what humans cannot since the dawn of time. I'm just pointing out what I think (thought?) was obvious: machines will soon be able to do just about everything that doesn't really matter.

PS. Are you familiar with Wittgenstein's ruler? Ask ChatGPT about it.


I didn't say test to mean just standardized tests lol. I meant that as problems you throw at it.

Sure seems good enough at law that multiple of the biggest law firms have partnered with the OpenAI-backed Harvey https://twitter.com/ai__pub/status/1644735555752853504 https://www.lawnext.com/2023/04/harvey-ai-raises-21m-in-a-se...

And then there's what Microsoft is doing with GPT-4 in medicine. https://arstechnica.com/information-technology/2023/04/gpt-4...

There is no "human element" lol. That's the point. That's how you know the argument has no ground. People resort to the "human element" when they have nothing to actually say, and because "human element" has no meaning, the goalposts for it just keep getting moved further and further. Apparently now we're at "make The Godfather".


I'm not a critic of AI or moving any goal posts. I'm not lobbing comments in a vacuum. I was responding directly to the comical proposal that we don't need actors anymore, to which my Godfather comment has every relevance. Thanks anyway!


Guess I just don't think your comment has as much relevance as you think it does. Remove the "with AI" and nothing actually changes.

"Please try to make an original movie that meets the standard of The Godfather, without AI," and let's see how well that goes.

Is the human that fails this task also missing the "human element" ?



Please try to make an original movie that meets the standard of The Godfather, with or without AI.


> What do we need human actors for at this point? Everything can be generated by AI now.

For blockbusters? If the tech still isn't there, it may be soon, and we won't need actors.

For cinema where we care it was made by humans, for humans? Actors will always be needed. Also, theater still exists and people enjoy it.


Theater is a niche now. And whenever I go to the theater with my wife we are the only members of the audience under 50. Doesn't look like this form of art will survive very long. I don't think people care very much whether characters are played by real humans. It used to be that dangerous stunts were performed live. Nobody cares that they were replaced by CGI. Nobody cares that Tom Cruise doesn't really jump out of the burning helicopter.


What do you mean, "a niche"?

I mean, sure. Reading is a niche, too. HN is a niche as well. Almost everything you care about is a niche.

But back to acting: Broadway and off Broadway exist. Maybe it's not doing so well, I wouldn't know: I don't live in the US... but theaters exist in my city. Both big and indie plays are conducted by young people, for young people. People watch them. People act in them.

It's mistaken to believe that tech will replace things that people value other human beings doing. Theater and cinema -- barring blockbusters -- are not "processes" to "optimize". They exist for their own sake. People love watching other people act.

Want to know what else people love doing: acting themselves! Acting classes are everywhere.

So excuse my extreme skepticism: human actors aren't going anywhere.

Maybe Tom Cruise in Top Gun 5 will be auto-generated by AI, who cares? Those blockbusters sure are within reach of AI, since it's all about the thrills and no-one really cares about the acting behind all those CGI scenes.

> Doesn't look like this form of art will survive very long.

Art is more resilient than you give it credit for. Art has been with us -- mankind -- since our beginnings, and it will never be gone. It's something humans crave doing.


According to this logic marvel superhero movies will soon be the only existing form of art.


Not even close by several orders of magnitude across two dozen disciplines. But yea, we're heading there.


The only thing that's new is the interactive interface; the rest of it is old tech. You can use it on artbreeder.com, and Photoshop even has it in its neural face filter. GANs are not feasible for a variety of reasons: you need models specific to your subject (that's why they switch to an elephant model to manipulate the elephant), and they are not style-agnostic. But it's a great demo, released at the right time, just before the summit of the hype curve. I bet one VC is dumb enough to throw millions at them.


With ChatGPT, the only thing new was the chat interface. In fact, even Sam Altman mentioned this on Lex Fridman's podcast, IIRC: he said what most surprised him was the outsized effect the interface had on bringing LLMs to the forefront of public consciousness, despite the existing maturity of the underlying GPT models. At least in that case it was OpenAI adding interactivity to its own existing models. But similarly, from a more holistic viewpoint, OpenAI productized existing research from Google. Transformer models were "old tech" since Google published "Attention Is All You Need" in 2017... and yet, when OpenAI managed to turn them into a usable product, they suddenly became the first mover and the company to beat. So I'm not convinced that only a "dumb" investor would fund a team with a proven ability to productize "old tech."


The "I don't understand the technology but will ramble about stuff until they just give up reading the comment" approach ¯\_(ツ)_/¯


Are you referring to my comment? I'm certainly no expert on AI, and if I'm misunderstanding the technology I'd like to know. What is wrong about what I wrote?


Yes, I am referring to your comment and I am not going to explain why everyone and their mother jumped ship from GANs and went all in on transformers. Well, there is still the Alan Turing Institute in the UK but even they gave up and are into NFTs now :D


This reads a lot like "Dropbox is trivial, rsync already exists".



