The semantic understanding feels much richer than diffusion-based modeling, e.g. the trees on the shore growing to match the manipulated reflection, the sun changing shape as it's moved up on the horizon, the horse's leg following proper biomechanics as its position is changed. I haven't gotten such a cohesive world model when doing text-guided inpainting with Stable Diffusion etc. This feels like it could very conceivably be guided by an animation rig with temporally consistent results.
Temporal consistency for a guided scene is a separate problem. It was kind of solved a couple of years ago. [0] It can be used with animation rigs and simplistic tagged geometries, and it even works in near real-time. "Kind of" because training a model from scratch on a large dataset is not something you want to do for the actual job; what you want is a good style transfer mechanism that can extract features from as few references as possible.
Has anyone built the "online photoshop that incorporates all of the latest AI image editing tools asap and sells access as a premium subscription with lots of GPU access for smooth editing" business yet? I'd be curious to know.
Search on Twitter, the artists in the Adobe dataset are also livid (justifiably) because they didn't consent to their work being used for AI training. The Adobe license agreement is broad enough that Adobe is covered legally, but it isn't enthusiastic consent in any sense of the word. Many, many artists would never have submitted their work to Adobe if generative AI had been a known possibility, so using Adobe's product is really not any better at respecting creators' wishes.
It makes way fewer visual mistakes (like the wrong number of limbs) than Stable Diffusion, or even Bing's DALL-E (~3). The latter is still the best at understanding your prompt, though.
I think less effort has gone into image editing compared to image generation so far. That said, we're building some photorealistic image editing tools at https://www.faceshape.com, focused on face editing for now. Current models don't perform as well, but the next generation, currently in training, should.
I'm always curious to know what kind of AI image editing people are interested in. Can you share what kind of edits you'd like to do? There are the usual edits like background removal or object removal, but those are more general tools that are getting incorporated into lots of apps natively (say, Google Photos).
Tried your app, I liked the product idea but think the execution could use much more work. Personally, I am obsessed with FaceApp’s filters. If you could make an app with even more interesting filters but with the same (or better) realistic quality, I’d definitely use it :)
You can edit your photos with the models and tools for image creation, because those tools can work on a source image (img2img, inpainting, outpainting) as well as just a prompt and blank canvas.
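For example, here's a minimal inpainting sketch using Hugging Face's diffusers library (the checkpoint name, file names, and prompt are illustrative assumptions, not recommendations):

    # Minimal sketch: editing an existing photo via inpainting with diffusers.
    import torch
    from diffusers import StableDiffusionInpaintPipeline
    from PIL import Image

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    source = Image.open("photo.png").convert("RGB")  # the image to edit
    mask = Image.open("mask.png")                    # white = region to repaint

    edited = pipe(prompt="a red brick wall",
                  image=source, mask_image=mask).images[0]
    edited.save("edited.png")

The same pipeline pointed at a blank canvas with a full mask degenerates into plain text-to-image generation, which is why the editing and creation tools share so much machinery.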
More seriously, I think that digital photos, and especially low-res surveillance camera coverage, will soon be inadmissible in any reasonable court, because tools like this make it possible to forge such evidence in very natural-looking ways.
Being able to fake something really doesn't matter all that much: you'd still need to get the fake video into the surveillance camera system, do so in the window between committing the crime and the police arriving, leave no trace, and hope that whoever you're trying to incriminate doesn't have an alibi.
Fakes will be relevant for Twitter, TikTok and Co., where random videos are posted and distributed without sources, heavily edited and compressed, such that it is impossible to tell if that video ever started out as a real video or a fake. But in court the whole thing starts to fall apart the moment they ask where that video came from.
Ah, but for law enforcement types to say "hey, look what we found on the surveillance video server!" is a potential problem. The power asymmetry in society is ripe for exploitation by the powerful. They just need some goons - of which there are plenty - to do the dirty work.
There are probably ways to embed cryptographic signatures within images. Any device that creates images from the real world could have a secret key that signs every image it creates, so the image can later be validated as coming from that device.
We will still need a centralized party that holds the secret keys for validation, though.
What about the recording device itself? Refeed a video/frame/hash back to a security camera and it tells you if it was originally sourced from that specific camera or not.
This is the right approach, but there’s lots of complexity around transcoding. In the courtroom that’s less of an issue if you can get the original unmodified outputs, but broader applications need to think through what it means to be “verified”
You don't need a centralized database. You have 3 companies, 3 different keys. They catch Bob stealing on 3 different cameras. Each piece of footage can be independently verified as authentic and not doctored. This would prevent anonymous found footage suggesting someone committed a crime.
You definitely don't want one single leakable entity.
What shall we do with millions upon millions of existing mobile phones, and also surveillance cameras and dashcams?
Some of them possibly could be updated, but this will take time. Securing the keys within them is also going to be a problem; not all of them have a TPM.
A private key is generated on device and never leaves it. It sits inside a TPM or equivalent.
The public key is pushed to a well-known site, visible to all.
Every shot is signed by hashing the bits into a reasonably short string (say, using SHA-512) and then signing that hash with the private key.
Anyone can now verify the signature with the public key and compare the recovered hash with the hash they computed from the bits.
The problem, of course, is that any transformation whatsoever breaks the signature. You can't adjust levels and contrast, you can't even crop. Maybe it's a good property.
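A minimal sketch of that flow, using the Python cryptography package and Ed25519 (which hashes with SHA-512 internally); in practice the private key would live in the TPM rather than in process memory:

    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
    from cryptography.exceptions import InvalidSignature

    device_key = Ed25519PrivateKey.generate()  # generated on-device, never leaves it
    public_key = device_key.public_key()       # published to a well-known site

    image_bytes = open("shot.raw", "rb").read()  # the exact captured bits
    signature = device_key.sign(image_bytes)     # hash-and-sign in one step

    # Anyone holding the public key can check the exact bits:
    try:
        public_key.verify(signature, image_bytes)
        print("authentic")
    except InvalidSignature:
        print("tampered")  # any crop or levels tweak lands here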
Why would you trust the key if it was generated on the device? Anyone could generate a key. You would need a centralized server to sign that key, in which case you have to trust a centralized company / government agency.
You have always been able to forge photographic evidence in natural-looking ways (most easily by using completely real photography with a misrepresented time or context).
And, of course, the easiest to falsify evidence (eyewitness testimony) is still admissible.
That's why you have to provide support for provenance, all of which is subject to examination and counterevidence.
Seeing how slowly laws move, the cynic in me says: it should be inadmissible in any reasonable court, but it will continue to be admissible, and it will take a major set of incidents before changes are enacted across many countries.
Neat concept. I wonder if something like that can be applied to diffusion models (the new kid on the block that is outshining GANs right now) - especially since the technique doesn't seem to be too dependent on the generative image implementation.
Also, it's interesting that they're submitting to SIGGRAPH - kind of expected this to be in a more ML-ish conference.
As a technology this is tremendously impressive, straight out of an SF movie. However, I wonder how it will impact our culture, fashion, standards of beauty, etc., when more and more artists accept the generated output instead of creating their own. Just as in music, the invention of MIDI, synths, and sequencers brought new styles but also boring, imagination-numbing standardization.
DragGAN consists of two main components: 1) a feature-based motion supervision that drives the handle point to move towards the target position, and 2) a new point tracking approach that leverages the discriminative GAN features to keep localizing the position of the handle points.
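For anyone curious what the motion supervision term looks like in practice, here's a hedged PyTorch sketch paraphrased from that description; the bilinear sampling helper and tensor shapes are my assumptions, and it's simplified to a single point per handle where the paper supervises a whole patch around it:

    import torch
    import torch.nn.functional as F

    def sample_feat(feat, pts):
        """Bilinearly sample feature map feat (1,C,H,W) at float (row,col) points (N,2)."""
        _, _, H, W = feat.shape
        xy = torch.stack([pts[:, 1] / (W - 1), pts[:, 0] / (H - 1)], dim=-1) * 2 - 1
        grid = xy.view(1, -1, 1, 2)  # grid_sample wants normalized (x, y) in [-1, 1]
        return F.grid_sample(feat, grid, align_corners=True).squeeze(-1)  # (1,C,N)

    def motion_supervision_loss(feat, handles, targets):
        """Pull the feature at each handle point one unit step toward its target."""
        loss = 0.0
        for p, t in zip(handles, targets):
            d = (t - p) / ((t - p).norm() + 1e-8)               # unit step direction
            f_ref = sample_feat(feat, p.unsqueeze(0)).detach()  # frozen reference
            f_step = sample_feat(feat, (p + d).unsqueeze(0))    # shifted sample
            loss = loss + (f_step - f_ref).abs().mean()         # L1, per the paper
        return loss

Minimizing this with respect to the generator's latent code nudges the content at each handle one step toward its target; the point tracking component then re-finds each handle by nearest-neighbor search in the same feature space before the next step.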
It seems to be a pretty common thing now on news network websites that at the top of the story, or perhaps somewhere in the middle, there is a video. But you click on the video, and it's something entirely unrelated to the article. I feel like this has been going on for quite a while; nothing to do with synthetic media, just an observation of an annoying pattern I've noticed.
Stock photos are already like that. To me they're worse than having no pictures at all. Generated images are better than nothing because you can make them very specific.
We already have most of the required cryptographic primitives for this. PKI, trusted timestamping, secure hardware with remote attestation are some of the necessary building blocks. All that's really missing are camera sensors with built-in cryptography.
And societal care. Our society seems to really like to whine about "the danger of deepfake images", but our actions reveal that we don't really give a crap, as we could solve this problem today if we really wanted to.
Sure, but then the image gets copied, shared, reposted, etc. If images had cryptographic provenance included, sites like Reddit or Twitter could verify and display that, or the lack thereof.
I would use the same infrastructure as HTTPS, so an image could be signed by bbc.co.uk. That way if you trust that domain you could trust the document, wherever you come across it.
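A rough sketch of what that verification could look like, assuming a detached signature and an RSA certificate (real provenance schemes like C2PA embed a signed manifest in the file instead):

    import socket, ssl
    from cryptography import x509
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding

    def domain_public_key(domain):
        # Fetch the domain's TLS leaf certificate and extract its public key.
        ctx = ssl.create_default_context()
        with socket.create_connection((domain, 443)) as sock:
            with ctx.wrap_socket(sock, server_hostname=domain) as tls:
                der = tls.getpeercert(binary_form=True)
        return x509.load_der_x509_certificate(der).public_key()

    def verify_image(domain, image_bytes, signature):
        key = domain_public_key(domain)  # assumes an RSA key for this sketch
        key.verify(signature, image_bytes, padding.PKCS1v15(), hashes.SHA256())
        # raises InvalidSignature if the bytes were altered after signing

Reusing the actual TLS key is a simplification here; you'd more likely want a dedicated signing certificate chained to the same CA infrastructure.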
But why? If it's on the BBC's website, the BBC is already saying it's good. Same for any other media outlet. The technical approach is interesting but not needed, IMO, in this case.
I understand your point, but I think it would be easier with AI than without. Many movies are not made to the standard of The Godfather because they don't sell like MCU Movie #53, and if you include more humans in the creation, you're more likely to run into the current system's restrictions.
Making a movie as beloved as the Godfather would still be challenging of course.
As long as they don't suck the air out of funding for real movies I can live with it, but I'd still be sad that people are being trained to like auto-generated junk. Like how people are losing their ability to concentrate on long-form content due to overexposure to addictive short-form content.
Of course it would be easier. I agree. I just take issue with the "why humans" thing because if anything, the recent advancements highlight just how big the human element really is.
Can you imitate a Bach prelude? Sure. And only people who aren't actually familiar with his music would be impressed.
Much of AI approaching "human performance" is it approaching the lowest bar. There's a Wittgenstein thing going on here: that an LLM can ace the LSAT or GMAT is mostly an indictment of those tests.
I scored 159 on the LSAT in 2014, so I am not claiming the tests are easy. I am pointing out that when an AI aces them, it says more about the test than anything else.
No it doesn't lol because you can insert any test you like into your equation. GPT-4 performs well above average on almost anything you throw at it.
"Says more about [insert test]" is not an intelligent argument. It doesn't even make sense. Can you tell me exactly what this mysterious thing is ?
If you have this secret test for "true" intelligence and understanding that the entire world is missing out on, then please share it with us and get your acclaim.
We must be speaking past each other. I am not out for acclaim and sorry for any confusion. Everything you say is exactly the point I'm trying to make, evidently clumsily. The "mysterious thing" is the human element. I don't know what else to call it? Humans that ace tests prove only that they are good at acing tests. Not that they're good at running businesses or practicing law. Not creating films (in this example), or music, etc.
I am not knocking the advancements, the capabilities are incredible. But machines have been doing what humans cannot since the dawn of time. I'm just pointing out what I think (thought?) was obvious: machines will soon be able to do just about everything that doesn't really matter.
PS: are you familiar with Wittgenstein's ruler? Ask ChatGPT about it.
There is no "human element" lol. That's the point. That's how you know the argument has no ground. People resort to "human element" when they have nothing to actually say. and because "human element" has no meaning, the goal posts for it just keeps getting moved further and further. apparently now we're at "make the godfather".
I'm not a critic of AI or moving any goal posts. I'm not lobbing comments in a vacuum. I was responding directly to the comical proposal that we don't need actors anymore, to which my Godfather comment has every relevance. Thanks anyway!
Theater is a niche now. And whenever I go to the theater with my wife we are the only members of the audience under 50. Doesn't look like this form of art will survive very long. I don't think people care very much whether characters are played by real humans. It used to be that dangerous stunts were performed live. Nobody cares that they were replaced by CGI. Nobody cares that Tom Cruise doesn't really jump out of the burning helicopter.
I mean, sure. Reading is a niche, too. HN is a niche as well. Almost everything you care about is a niche.
But back to acting: Broadway and off-Broadway exist. Maybe they're not doing so well, I wouldn't know: I don't live in the US... but theaters exist in my city. Both big and indie plays are put on by young people, for young people. People watch them. People act in them.
It's mistaken to believe that tech will replace things that people value other human beings doing. Theater and cinema -- barring blockbusters -- are not "processes" to "optimize". They exist for their own sake. People love watching other people act.
Want to know what else people love doing? Acting themselves! Acting classes are everywhere.
So excuse my extreme skepticism: human actors aren't going anywhere.
Maybe Tom Cruise in Top Gun 5 will be auto-generated by AI, who cares? Those blockbusters sure are within reach of AI, since it's all about the thrills and no-one really cares about the acting behind all those CGI scenes.
> Doesn't look like this form of art will survive very long.
Art is more resilient than you give it credit for. Art has been with us -- mankind -- since our beginnings, and it will never be gone. It's something humans crave doing.
The only thing that's new is the interactive interface; the rest of it is old tech... You can use it on artbreeder.com. Photoshop even has it in its face neural filter. GANs are not feasible for a variety of reasons: you need models specific to your subject (i.e. why they switch to an elephant model to manipulate the elephant), and they are also not style-agnostic. But it's a great demo, released at the right time, just before the summit of the hype curve. I bet one VC is dumb enough to throw millions at them.
With ChatGPT, the only thing new was the chat interface. In fact, even Sam Altman mentioned this on Lex Fridman's podcast, IIRC - he said what he was most surprised about was the outsized effect the interface had on bringing LLMs to the forefront of public consciousness, despite the existing maturity of the underlying GPT models. At least in that case it was OpenAI adding interactivity to its own existing models. But similarly, from a more holistic viewpoint, OpenAI productized existing research from Google. Transformer models have been "old tech" since Google published "Attention is all you need" in 2017... and yet, when OpenAI managed to turn them into a usable product, suddenly they became the first movers and the company to beat. So I'm not convinced that only a "dumb" investor would fund an effort with a proven ability to productize "old tech."
Are you referring to my comment? I'm certainly no expert on AI, and if I'm misunderstanding the technology I'd like to know. What is wrong about what I wrote?
Yes, I am referring to your comment, and I am not going to explain why everyone and their mother jumped ship from GANs and went all in on transformers. Well, there is still the Alan Turing Institute in the UK, but even they gave up and are into NFTs now :D