Drag Your GAN: Interactive Point-Based Manipulation of Images (mpg.de)
184 points by waqasy on May 19, 2023 | hide | past | favorite | 86 comments


The semantic understanding feels much richer than diffusion based modeling, e.g. the trees on the shore growing to match the manipulated reflection, the sun changing shape as it's moved up on the horizon, the horse's leg following proper biomechanics as its position is changed. I haven't gotten such a cohesive world model when doing text-guided in-painting with stable diffusion etc. This feels like it could very conceivably be guided by an animation rig with temporally consistent results.


Temporal consistency for a guided scene is a separate problem. It was kind of solved a couple of years ago. [0] It can be used with animation rigs and simplistic tagged geometries, and it even works in near real time. "Kind of" because training a model from scratch on a large dataset is not something you want to do for the actual job; what you want is a good style transfer mechanism that can extract features from as few references as possible.

[0] https://isl-org.github.io/PhotorealismEnhancement/



Has anyone built the "online photoshop that incorporates all of the latest AI image editing tools asap and sells access as a premium subscription with lots of GPU access for smooth editing" business yet? I'd be curious to know.



Adobe AI is crap compared to Midjourney


Because one is trained on properly licensed stock art and the other on anything, without asking the creators for authorization?

"AI art tools Stable Diffusion and Midjourney targeted with copyright lawsuit" - https://www.theverge.com/2023/1/16/23557098/generative-ai-ar...


Search on Twitter, the artists in the Adobe dataset are also livid (justifiably) because they didn't consent to their work being used for AI training. The Adobe license agreement is broad enough that Adobe is covered legally, but it isn't enthusiastic consent in any sense of the word. Many, many artists would never have submitted their work to Adobe if generative AI had been a known possibility, so using Adobe's product is really not any better at respecting creators' wishes.


So like two drunk people having sex and the law trying to figure out who consented


Midjourney is crap compared to Stable Diffusion with all the features in AUTOMATIC1111


It's trained on their stock art and under-performs Stable Diffusion and Midjourney.

It's really poor, comparatively.


It makes far fewer visual mistakes (like the wrong number of limbs) than Stable Diffusion, or even Bing's DALL-E ~3. The latter is still the best at understanding your prompt, though.


I think less effort has gone into image editing compared to image generation so far. That said, we're building some photorealistic image editing tools at https://www.faceshape.com, focused on face editing for now. The current models don't perform as well as we'd like, but the next generation, currently in training, should.

I'm always curious to know what kind of AI image editing people are interested in. Can you share what kind of edits you'd like to do? There are the usual ones like background removal or object removal, but those are more general tools that are getting incorporated natively into lots of apps (say, Google Photos).


Tried your app, I liked the product idea but think the execution could use much more work. Personally, I am obsessed with FaceApp’s filters. If you could make an app with even more interesting filters but with the same (or better) realistic quality, I’d definitely use it :)


Playground AI (https://playgroundai.com/) does a lot of this.


There are a million of these. It's a super crowded space.

https://civitai.com/

https://lexica.art/

https://openart.ai/

(Many more)


They are all for image creation. I would love to have one to edit my photos, not generate images from a prompt.


You can edit your photos with the models and tools for image creation, because those tools can work on a source image (img2img, inpainting, outpainting) as well as just a prompt and blank canvas.



It would be cool if https://www.photopea.com supported those types of AI manipulation plugins.


Isn't that stability.ai's business model?


I like the direction of github.com/invoke-ai/. It isn't a business but an open source project.


Hello, post-truth world!

More seriously, I think that digital photos, and especially low-res surveillance camera footage, will soon be inadmissible in any reasonable court, because tools like this allow forging such evidence in very natural-looking ways.


Being able to fake something really doesn't matter all that much: you'd still need to get that fake video onto the surveillance camera system, in the time between committing the crime and the police arriving, without leaving a trace, all while hoping that whoever you're trying to incriminate doesn't have an alibi.

Fakes will be relevant for Twitter, TikTok and Co., where random videos are posted and distributed without sources, heavily edited and compressed, such that it is impossible to tell if that video ever started out as a real video or a fake. But in court the whole thing starts to fall apart the moment they ask where that video came from.


Ah, but for law enforcement types to say "hey, look what we found on the surveillance video server!" is a potential problem. The power asymmetry in society is ripe for exploitation by the powerful. They just need some goons - of which there are plenty - to do the dirty work.


Without knowing what the current ones are, I'd guess we need improved chain-of-custody laws for evidence immediately!


There are probably ways to embed cryptographic hashes within images. Any device that creates images from the real world could have secret keys that can be used to validate any image created by said device.

We will still need a centralized party that holds the secret keys for validation, though.


What about the recording device itself? Refeed a video/frame/hash back to a security camera and it tells you if it was originally sourced from that specific camera or not.


The sci-fi dystopian answer would be entangled photon lights and off site image recording that preserves entanglement :-)


This is the right approach, but there’s lots of complexity around transcoding. In the courtroom that’s less of an issue if you can get the original unmodified outputs, but broader applications need to think through what it means to be “verified”


You don't need a centralized database. You have three companies with three different keys. They catch Bob stealing on three different cameras, and each piece of footage can be independently verified as authentic and undoctored. That would prevent anonymous "found footage" suggesting someone committed a crime.

You definitely don't want one single leakable entity.


What shall we do with millions upon millions of existing mobile phones, and also surveillance cameras and dashcams?

Some of them possibly could be updated, but this will take time. Securing the keys within them is also going to be a problem; not all of them have a TPM.


There's no way this would work: either the master encryption key would be leaked, or someone would reverse-engineer the chip.

Also, what about someone putting a screen just in front of the sensor?


No need for that.

A private key is generated on device and never leaves it. It sits inside a TPM or equivalent.

The public key is pushed to a well-known site, visible to all.

Every shot is signed: hash the bits into a reasonably short string (say, with SHA-512), then sign the hash with the private key.

Anyone can now verify the signature with the public key and compare the hash against one they compute from the bits themselves.

The problem, of course, is that any transformation whatsoever breaks the signature. You can't adjust levels and contrast, you can't even crop. Maybe it's a good property.
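The scheme above can be sketched in a few lines. This is a toy illustration, not a production design: it uses an HMAC secret as a stand-in for the asymmetric key pair a real TPM-backed camera would hold (e.g. Ed25519), and every function name here is hypothetical.

```python
import hashlib
import hmac
import secrets

# Toy sketch of the per-device signing scheme described above. A real camera
# would keep an asymmetric private key inside a TPM and publish the public
# key; an HMAC secret stands in for it here so the example is stdlib-only.

def make_device_key() -> bytes:
    """Generated once on-device; in the real scheme it never leaves the TPM."""
    return secrets.token_bytes(32)

def sign_image(device_key: bytes, image_bytes: bytes) -> bytes:
    # Hash the raw sensor bits down to a short digest, then sign the digest.
    digest = hashlib.sha512(image_bytes).digest()
    return hmac.new(device_key, digest, hashlib.sha512).digest()

def verify_image(device_key: bytes, image_bytes: bytes, signature: bytes) -> bool:
    # Recompute the digest from the bits we were handed, check the signature.
    digest = hashlib.sha512(image_bytes).digest()
    expected = hmac.new(device_key, digest, hashlib.sha512).digest()
    return hmac.compare_digest(expected, signature)

key = make_device_key()
shot = b"raw sensor data from one exposure"
sig = sign_image(key, shot)

assert verify_image(key, shot, sig)             # untouched image verifies
assert not verify_image(key, shot + b"x", sig)  # any edit breaks the signature
```

Note that the last assertion is exactly the fragility mentioned above: any change to the bits, including cropping or re-encoding, invalidates the signature.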


Why would you trust the key if it was generated on the device? Anyone could generate a key. You would need a centralized server to sign that key, in which case you have to trust a centralized company or government agency.


You have always been able to forge photographic evidence in natural looking ways (most easily by using completely real photography with misrepresented time or context).

And, of course, the easiest to falsify evidence (eyewitness testimony) is still admissible.

That's why you have to provide support for provenance, all of which is subject to examination and counterevidence.


Seeing how slowly laws move, the cynic says: it should be inadmissible in any reasonable court, but it will continue to be admissible, and it will take a major set of incidents for changes to be enacted across many countries.


Yes. The story of admissible DNA evidence is instructive and terrifying.

https://daily.jstor.org/forensic-dna-evidence-can-lead-wrong...



Longer video with more examples from one of the paper authors:

https://twitter.com/XingangP/status/1659483374174584832


Looks like it can't keep the background stable, so I guess this is not suitable for animations.


Neat concept. I wonder if something like that can be applied to diffusion models (the new kid on the block that is outshining GANs right now) - especially since the technique doesn't seem to be too dependent on the generative image implementation.

Also, it's interesting that they're submitting to SIGGRAPH - kind of expected this to be in a more ML-ish conference.


Probably goes to show where SIGGRAPH is headed.


The main link you provided is either not loading, or loading with missing video links, for me - maybe hugged to death at the moment?

Github may be more resilient: https://github.com/XingangPan/DragGAN


As a technology this is tremendously impressive, straight out of an SF movie. However, I wonder how it will impact our culture, fashion, standards of beauty, etc., as more and more artists accept the generated output rather than creating their own. Just as with music, where the invention of MIDI, synths, and sequencers brought new styles but also boring, imagination-numbing standardization.


DragGAN consists of two main components: 1) feature-based motion supervision that drives the handle points toward their target positions, and 2) a new point-tracking approach that leverages discriminative GAN features to keep localizing the handle points.
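The point-tracking half can be illustrated with a toy nearest-neighbour search in feature space. This is a simplified sketch of the idea, not the authors' code: in the paper the features come from an intermediate generator layer, and the search is restricted to a small patch around the previous handle position, as below.

```python
# Toy sketch of point tracking via nearest-neighbour search in feature space.
# feat is an H x W grid of feature vectors (plain lists); target is the
# feature vector recorded at the handle point before the image was updated.
# We look for the position, within a small window around the old handle,
# whose feature is closest to that recorded vector.

def track_handle(feat, target, old_pos, radius=2):
    h, w = len(feat), len(feat[0])
    y0, x0 = old_pos
    best, best_dist = old_pos, float("inf")
    for y in range(max(0, y0 - radius), min(h, y0 + radius + 1)):
        for x in range(max(0, x0 - radius), min(w, x0 + radius + 1)):
            # squared L2 distance between candidate feature and the target
            d = sum((a - b) ** 2 for a, b in zip(feat[y][x], target))
            if d < best_dist:
                best, best_dist = (y, x), d
    return best

# Tiny 4x4 grid of 2-D features; the distinctive feature [1.0, 1.0] has
# moved from (1, 1) to (2, 2) after one motion-supervision step.
feat = [[[0.0, 0.0] for _ in range(4)] for _ in range(4)]
feat[2][2] = [1.0, 1.0]

assert track_handle(feat, [1.0, 1.0], (1, 1)) == (2, 2)
```

The local window is what keeps the tracking cheap and stable: the handle is assumed to move only a little per optimization step, so a full-image search is unnecessary.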


There's also an older work called PuppetGAN http://ai.bu.edu/puppetgan/


It's similar but different.

Your paper makes an existing body a puppet.

The other one adjusts features.


I'm thinking that soon video and images are going to be just dead weight in journalism, adding nothing other than decoration.


It seems to be a pretty common thing now on news network websites that at the top of the story, or perhaps somewhere in the middle, there is a video. But you click on the video, and it's something entirely unrelated to the article. I feel like this has been going on for quite a while; it has nothing to do with synthetic media, just an annoying pattern I've noticed.


Stock photos are already like that. To me they're worse than having no pictures at all. Generated images are better than nothing because you can make them very specific.


I think there has to be some kind of cryptographic signature solution, like "The BBC verifies that this image is authentic".


We already have most of the required cryptographic primitives for this. PKI, trusted timestamping, secure hardware with remote attestation are some of the necessary building blocks. All that's really missing are camera sensors with built-in cryptography.

And societal care. Our society seems to really like to whine about "the danger of deepfake images", but our actions reveal that we don't really give a crap, as we could solve this problem today if we really wanted to.


I think it would also need to have some kind of embedded version control from camera through Photoshop so you know what was changed.


If you trust how the bbc tells the story, you also trust them when they say the image is true. Not a technical problem IMO.


Sure, but then the image gets copied, shared, reposted, etc. If images had cryptographic provenance included, sites like Reddit or Twitter could verify and display it, or the lack thereof.

I would use the same infrastructure as HTTPS, so an image could be signed by bbc.co.uk. That way if you trust that domain you could trust the document, wherever you come across it.


But why? If it is on the BBC's website, the BBC is already saying it is good. Same for any other outlet. The technical approach is interesting, but not needed IMO in this case.


Like I said, because you might come across the image on Reddit or Twitter but have no idea if it's real or not.


Why just the video and images? Half of the stories on the Internet are created out of whole cloth by AI now, never mind the embedded media...


Best demo would be to take some famous photo, find its position in gan-space then move it around.


Would love to see this for architecture!


What do we need human actors for at this point? Everything can be generated by AI now.


Please try to make an original movie that meets the standard of, say, The Godfather, with AI.


I understand your point, but I think it would be easier with AI than without. Many movies are not made to the standard of The Godfather because they wouldn't sell like MCU Movie #53, and the more humans you include in the creation, the more likely you are to run into the current system's restrictions.

Making a movie as beloved as the Godfather would still be challenging of course.


As long as they don't suck the air out of funding for real movies I can live with it, but I'd still be sad that people are being trained to like auto-generated junk. Like how people are losing their ability to concentrate on long-form content due to overexposure to addictive short-form content.


Of course it would be easier. I agree. I just take issue with the "why humans" thing because if anything, the recent advancements highlight just how big the human element really is.

Can you imitate a Bach prelude? Sure. And only people who aren't actually familiar with his music would be impressed.

Much of AI approaching "human performance" is it approaching the lowest bar. There's a Wittgenstein thing going on here: that an LLM can ace the LSAT or GMAT is mostly an indictment of those tests.

A little off topic.


>That an LLM can ace the LSAT or GMAT is mostly an indictment of those tests.

These kinds of comments are always the funniest. You can just tell the person who makes them has never looked at those tests, never mind attempted them.


I scored 159 on the LSAT in 2014, so I am not claiming the tests are easy. I am pointing out that when an AI aces them, it says more about the test than anything else.


No it doesn't lol because you can insert any test you like into your equation. GPT-4 performs well above average on almost anything you throw at it.

"Says more about [insert test]" is not an intelligent argument. It doesn't even make sense. Can you tell me exactly what this mysterious thing is ?

If you have this secret test for "true" intelligence and understanding the entire world is missing on then please share it with us and get your acclaim.


We must be speaking past each other. I am not out for acclaim and sorry for any confusion. Everything you say is exactly the point I'm trying to make, evidently clumsily. The "mysterious thing" is the human element. I don't know what else to call it? Humans that ace tests prove only that they are good at acing tests. Not that they're good at running businesses or practicing law. Not creating films (in this example), or music, etc.

I am not knocking the advancements, the capabilities are incredible. But machines have been doing what humans cannot since the dawn of time. I'm just pointing out what I think (thought?) was obvious: machines will soon be able to do just about everything that doesn't really matter.

PS. Are you familiar with Wittgenstein's ruler? Ask ChatGPT about it.


I didn't say test to mean just standardized tests lol. I meant that as problems you throw at it.

Sure seems good enough at law that multiple of the biggest law firms have partnered with the OpenAI-backed Harvey https://twitter.com/ai__pub/status/1644735555752853504 https://www.lawnext.com/2023/04/harvey-ai-raises-21m-in-a-se...

And then there's what Microsoft is doing with GPT-4 in medicine. https://arstechnica.com/information-technology/2023/04/gpt-4...

There is no "human element" lol. That's the point. That's how you know the argument has no ground. People resort to the "human element" when they have nothing to actually say, and because "human element" has no meaning, the goalposts for it just keep getting moved further and further. Apparently now we're at "make The Godfather".


I'm not a critic of AI or moving any goal posts. I'm not lobbing comments in a vacuum. I was responding directly to the comical proposal that we don't need actors anymore, to which my Godfather comment has every relevance. Thanks anyway!


Guess I just don't think your comment has as much relevance as you think it does. Remove the "with AI" and nothing actually changes.

"Please try to make an original movie that meets the standard of The Godfather, without AI," and let's see how well that goes.

Is the human that fails this task also missing the "human element" ?



Please try to make an original movie that meets the standard of The Godfather, with or without AI.


> What do we need human actors for at this point? Everything can be generated by AI now.

For blockbusters? If the tech still isn't there, it may be soon, and we won't need actors.

For cinema where we care it was made by humans, for humans? Actors will always be needed. Also, theater still exists and people enjoy it.


Theater is a niche now. And whenever I go to the theater with my wife we are the only members of the audience under 50. Doesn't look like this form of art will survive very long. I don't think people care very much whether characters are played by real humans. It used to be that dangerous stunts were performed live. Nobody cares that they were replaced by CGI. Nobody cares that Tom Cruise doesn't really jump out of the burning helicopter.


What do you mean, "a niche"?

I mean, sure. Reading is a niche, too. HN is a niche as well. Almost everything you care about is a niche.

But back to acting: Broadway and off Broadway exist. Maybe it's not doing so well, I wouldn't know: I don't live in the US... but theaters exist in my city. Both big and indie plays are conducted by young people, for young people. People watch them. People act in them.

It's mistaken to believe that tech will replace things that people value other human beings doing. Theater and cinema -- barring blockbusters -- are not "processes" to "optimize". They exist for their own sake. People love watching other people act.

Want to know what else people love doing: acting themselves! Acting classes are everywhere.

So excuse my extreme skepticism: human actors aren't going anywhere.

Maybe Tom Cruise in Top Gun 5 will be auto-generated by AI, who cares? Those blockbusters sure are within reach of AI, since it's all about the thrills and no-one really cares about the acting behind all those CGI scenes.

> Doesn't look like this form of art will survive very long.

Art is more resilient than you give it credit for. Art has been with us -- mankind -- since our beginnings, and it will never be gone. It's something humans crave doing.


According to this logic marvel superhero movies will soon be the only existing form of art.


Not even close by several orders of magnitude across two dozen disciplines. But yea, we're heading there.


The only thing that's new is the interactive interface; the rest of it is old tech. You can use it on artbreeder.com, and Photoshop even has it in its neural face filter. GANs are not feasible for a variety of reasons: you need models specific to your subject (that's why they switch to an elephant model to manipulate the elephant), and they are not style-agnostic. But it's a great demo, released at the right time, just before the summit of the hype curve. I bet one VC is dumb enough to throw millions at them.


With ChatGPT, the only thing new was the chat interface. In fact, even Sam Altman mentioned this on Lex Fridman's podcast, IIRC: he said what most surprised him was the outsized effect the interface had on bringing LLMs to the forefront of public consciousness, despite the existing maturity of the underlying GPT models. At least in that case it was OpenAI adding interactivity to its own existing models. But similarly, from a more holistic viewpoint, OpenAI productized existing research from Google. Transformer models were "old tech" since Google published "Attention Is All You Need" in 2017... and yet, when OpenAI managed to turn them into a usable product, they suddenly became the first mover and the company to beat. So I'm not convinced that only a "dumb" investor would fund a team with a proven ability to productize "old tech."


The "I don't understand the technology but will ramble about stuff until they just give up reading the comment" approach ¯\_(ツ)_/¯


Are you referring to my comment? I'm certainly no expert on AI, and if I'm misunderstanding the technology I'd like to know. What is wrong about what I wrote?


Yes, I am referring to your comment and I am not going to explain why everyone and their mother jumped ship from GANs and went all in on transformers. Well, there is still the Alan Turing Institute in the UK but even they gave up and are into NFTs now :D


This reads a lot like "Dropbox is trivial, rsync already exists".



