The Apple Psi System [pdf] (apple.com)
168 points by lavp on Aug 8, 2021 | 98 comments


This is incredibly annoying. They're providing all of this information which says that this program that runs on our devices is incredibly safe if we're not bad people. That the people at Apple and then law enforcement don't see anything about our photos until there is a human-set threshold hit by a human-programmed algorithm. Technically, this should be sound, since the proof has checked out with multiple cryptography experts. But none of that matters.

The Apple PSI System is spyware.

They are providing all of this info to justify putting spyware on our devices. They are attempting to put spyware on our devices to see if we can be sent to jail. That's all that matters. That is the end effect.

Apple is justifying putting SPYWARE ON ALL OF THEIR PHONES. Any discussion of the technical merits of a SPYWARE system implemented against you is missing the point. It should not exist.


> until there is a human-set threshold hit by a human-programmed algorithm.

Humans at Apple thought nobody would have percent signs in their wifi names.

Programming is hard, not because knowing how to tell the computer what you want is hard, but because thinking about all the ways people will use your software forever is hard.


Yup, the horrible thing is not the current system itself, but the precedent they set.

And if it's a semi-exact match algorithm like they say, it won't actually prevent any new abuse from happening.


The precedent was set long ago by Facebook, Google, Microsoft, Dropbox and all of the other cloud image providers. From my understanding, the difference is that they scan in the cloud after the images are uploaded, which requires that they decrypt your images in the cloud, rather than what Apple wants to do: scan on your device before uploading, so they don't have to decrypt in the cloud. None of this scanning happens if the images aren't destined for iCloud. This is moving the scan for the cloud service, which everyone is already doing, to your phone. That's the only difference that I see. Am I seeing it wrong?


For now, yes. But people are worried that that's just the first step to get the snooping service onto your phone. Because why stop here? Soon it may be scanning photos that aren't destined for iCloud at all.

And why draw the line at photos? Videos, music, text files, anything could be interesting for governments, film studios and record companies. Why do you have an MP4 of the latest Spider-Man movie on your phone? Did you legally buy this Miles Davis album? Is this an image of a rainbow flag on your phone, my dear Russian citizen?

It's just a small step from here to constantly scanning all your local files on iOS, iPadOS and macOS because someone, somewhere might want to take a look at them.


It is a closed-source system. Nothing is stopping them from doing it right now without telling anyone, and it might take years for someone to notice. All we can do is trust them and hope for their best intentions. Speculation is not a strong argument in black-box systems.


> It is a closed-source system. Nothing is stopping them from doing it right now without telling anyone, and it might take years for someone to notice. All we can do is trust them and hope for their best intentions. Speculation is not a strong argument in black-box systems.

Yes, but ignoring the bad things that they tell you they're doing because they might be doing even worse things that they aren't telling you isn't a great stand to take.


If they move on to the speculated things, then we should take a stand. The current approach is an improvement over what has already been happening.


Yes, they can do it right now, but that would be illegal and couldn't be used for any good. Now they make it legal; that's the difference.


It has been legal for at least 10 years already. On-device scanning applies only to photos which are headed for the cloud.

https://www.govinfo.gov/app/details/USCODE-2011-title18/USCO...


The precedent is scanning on the user's device, yes. As far as I understand, they are trying to limit the stream of subpoenas for users' data by encrypting everything, without being called pedo enablers or something like that. But what they are doing is in effect showing governments and anti-freedom activists that it is in fact possible to do this on users' devices, thus building a foundation for further expansion of such scanning to all data, sent or stored, in all apps, at the operating system level, and not only of their own initiative, but as a result of legislation.

And then why would you not fight other things that are currently considered to be societal ills? It's the next logical step. There are plenty of countries where you could get real prison sentences for writing or saying things, all with "well intentioned" justifications of course. We are steadily moving towards a "digital gulag", and Apple is paving the way. All of this also paints those who value freedom as pedo enablers, and forces other companies to implement similar processes.


Sure, this particular move strictly improves privacy. But it has some terrible implications:

1. That there are circumstances in which it's okay to design an electronic device to work against its owner: an accusation is all it takes to ruin your life. Existing cloud services only scan their own computers, of course.

2. Because it provides more privacy than existing cloud services, people (esp. governments) might come to see this as an acceptable compromise between user privacy and law enforcement power.


I don't agree that it improves privacy.

Pre-emptive searches of one's private life should never be ok. Even if it's done by some "blind" algorithm.

And we should stand firmly that our phones and personal electronic devices should fall within the legal boundaries of private life.

We've already accepted some bad legal precedent with third-party doctrine and related loopholes, no-expectation-of-privacy justifications, non-negotiable ToSes that surrender ownership of data, and sharing of private data obtained under the pretense that it's required for whatever service you're signing up for.

With Apple running privacy-invading software on the user's device, how long until a judge declares that there's no expectation of privacy for information you put into your phone? After all, Apple is a third party, and you were aware that they are scanning all files, which creates new linked files owned and decryptable by Apple.

There might be a short-term technical gain, but there's a long term legal and privacy loss.


That difference is a very significant one.

The scanning either happens on devices that the company owns, the cloud servers, or it happens on a device that the consumer owns, without the consumer being able to control it.


The consumer can control it by not enabling iCloud photo sync, right?


The consumer can't stop an application from automatically running without crippling an unrelated process? That doesn't sound right.

Apple already can run this server-side, on their own devices, and not by installing a backdoor on a device someone purchased.


> They're providing all of this information which says that this program that runs on our devices is incredibly safe if we're not bad people.

That's not far from saying "only criminals have something to hide".

Seeing this come from Apple, a company which has won a lot of popularity with its stance on privacy, is absolutely astounding and definitely makes one wonder whether there is an ulterior motive.


Stop defending Apple! They have always handed over their Chinese users' data to the authorities. They were always willing to compromise user privacy to the state.

They once had a marketing stunt where they wouldn't unlock an iPhone for the FBI, but only after they had already handed over decrypted backups from the very same phone.


The marketing stunt was police throwing a fit and not the other way around.


We need to draw the line somewhere. Maybe this is ok. But where does it end? What if we have a Neuralink-like device? Is there ever a line too far?


The line is here. And this is absolutely not OK.

It is ripe for abuse by governments, which Apple absolutely has a history of bowing to.

Saudi Arabia WILL use this to track dissidents and homosexuals. China WILL use this for whatever they need, on a daily basis. Hell even the US will use this, I'm sure.

The only way this would be OK is if the input data (the CSAM database) were cryptographically signed and no government, no single entity, not even Apple, could change its content, with the only signing key split into 10 parts to be held personally by Bruce Schneier, the Pope, Linus Torvalds, the Orthodox Patriarch, Keanu Reeves, a couple of head Rabbis and a few of whatever their equivalent in Islam is, and they had to personally review the images one by one and certify that perceptual hash 0FB89C8A7DF6AA1945B is indeed CSAM content and agree to collectively sign it for addition.

This would only work if the full iOS source were fully published and compiled as a reproducible build, so everyone could confirm the scanning code does what it's supposed to and is not altered in any subsequent update.

P.S. Don't nitpick on the names, it was a deliberately absurd list of people either with a good reputation or with a lot to lose in the respectively chosen afterlife.


This is the line and it's not ok. I am astounded that anyone here is entertaining the idea that it is somehow ok. Take out the "CSAM" noun from the equation - it really poisons the argument with emotion. "Apple devices check your files against government blacklist" is the headline, and not enough of us are saying "no."


Well said. I knew it wouldn’t take long for discourse to switch to the morality, technical implementation etc. That is of course no accident.


If I don't have root access to the device, it is not my device.


If you have root access it's also not your device, because you probably don't have BUP, BSP and SMM/SV mode access. And if you have those it's still not yours, because you don't have access to the EDA source (or at least not much more detail than the basic silicon floorplan).

This race to the bottom of 'ownership' ends nowhere. Do you really own your device if it uses NDA or licensed parts? (be it software or hardware)

Do you really own your device if it depends on communications with other devices that you do not own? What about the first communication with a cell tower? The eNodeB? The RAN? The HLB? Or if you are communicating with someone, should you own their device as well?

Say your boundary is communications, what communications are we talking about? SPI? I2C? MII? The interface between the baseband and the application processor? The matrix scanner on a physical keyboard?

I'm not saying it's amazing to have an appliance where you don't control every aspect, but it's also not realistic to have a free-for-all at scale either. At the end of the day, when you live in a diverse society, not every device you pay for is a device that is your device. And that can be fine.

(Diverse doesn't universally mean religion, shades of skin or nationality - we're talking about carpenters, mechanics, teachers, artists, bus drivers, bakers, none of those will ever 'own' a device or software stack at scale; if you put a bunch of people together, organise them and specialise, not everyone will 'do tech' to the same degree, if any)


This is way worse spyware than I initially thought.

- They have a database of file hashes.

- They can’t validate its contents by design.

- They can’t explain who supplies it in detail.

- The suppliers of these data work with / close to the US gov.

- They match your own files on your own device against this database that contains who knows what.

- When it matches a couple of times, they alert the authorities.

- There is zero fucking visibility for both Apple and you by design and this is a very good thing for Apple.

- I assume the source of this data can update / add new hashes in time and your device will happily comply.

And their only concern is to say that the algorithm is so perfect, they can not see what’s happening and there won’t be false positives (hopefully).

You know what, I trust your ability to do it properly. No need to explain more.

Problem is that the very thing you are building is fucked up by design. And obviously Apple does not address it in any way.

But think about the children…


So if I understand correctly, they'll have a db of hashes which are probably tagged with some info on what's in the picture?

Who is supplying the hashes? It seems like a LOT of power will be concentrated there. They can basically decide who gets in trouble with seemingly no accountability whatsoever?


This already happens server-side now so what you mention here does not actually change.


Not on iCloud. They have all the content already encrypted on your phone. Can't fingerprint after that. Them adding this as an on-device process is due to that.

And the fact that Apple has been talking about privacy all this time (and rightly so) means their past mitigations make this more painful. Any other vendor could just do it on the server, add a line to their Privacy Policy, and nobody would know.


This is false. Up until now, iCloud photos have been encrypted in transit and at rest but can be decrypted with Apple's own key. There is no E2EE.

Apple has been talking a lot about putting all ML and data handling on your own device so that the data does not need to leave your device without E2EE.


I guess you are correct, they are not e2e encrypted. So part of my reasoning does not hold up, they could do this on the server if they wanted.

That I'd prefer, to be honest.


Apple does not scan iCloud for CSAM because they feel that blanket scanning of iCloud violates end-user privacy.

That’s the whole reason for their research into differential privacy.


> Apple has confirmed that it’s automatically scanning images backed up to iCloud to ferret out child abuse images.

https://nakedsecurity.sophos.com/2020/01/09/apples-scanning-...


Read the original telegraph article cited.

https://www.telegraph.co.uk/technology/2020/01/08/apple-scan...

>Update January 9, 2020

>This story originally said Apple screens photos when they are uploaded to iCloud, Apple's cloud storage service. Ms Horvath and Apple's disclaimer did not mention iCloud, and the company has not specified how it screens material, saying this information could help criminals.

This confirms the reporting from NYT

https://www.nytimes.com/2021/08/05/technology/apple-iphones-...

>U.S. law requires tech companies to flag cases of child sexual abuse to the authorities. Apple has historically flagged fewer cases than other companies. Last year, for instance, Apple reported 265 cases to the National Center for Missing & Exploited Children, while Facebook reported 20.3 million, according to the center’s statistics. That enormous gap is due in part to Apple’s decision not to scan for such material, citing the privacy of its users


I am thinking of the children, and what worries me is: wouldn't scanning for existing porn sharply increase the value of new porn and stimulate abuse of children for money?


> But think about the children…

Actually, I am thinking of the children. Most child sexual abuse happens inside the home, by people who personally know the child, and this crypto mumbo jumbo does nothing to address this!!!

What we actually need is for every parent to be required to install surveillance equipment, cameras, sensors and whatnot into their home.

Then *something*something* AI/ML can filter out all the excess information, but send the child abuse to the authorities…

(/s)


Of course Apple knows where the hashes come from and they know whom they send flagged account info to. You’re drawing conclusions from false assumptions.


While it is good that Apple seeks formal security analysis over its proposed system, there are some fundamental security assumptions baked in the system:

- It assumes that the server will not tamper with its dataset (i.e., the list of CSAM). So it is OK to disclose the information if the client has enough matches. But in reality, nothing prevents a malicious server from adding arbitrary content to the list.

- It fails to consider the vulnerabilities of the perceptual hash. This includes false positives and adversarial collision attacks (https://arxiv.org/abs/2011.09473).

Another potential long-term issue is that it is unclear how long Apple will store the safety vouchers. As a storage service, Apple may store them forever. The system is based on elliptic-curve cryptography. Although that is the current state-of-the-art encryption technique, it will be broken when quantum computers become a reality. So it is possible that every encrypted safety voucher can be decrypted in the next 50 years.


Some apps like WhatsApp save images from messages automatically to the camera roll.

So a theoretical attacker could send you a message on WA with a set of adversarial collision images (not even child pornography, potentially), and effectively SWAT you.


This is already possible on both iOS (iCloud Photos) and Android (Google Photos) as both scan for CSAM server side.


Google Photos does not upload photos in the WhatsApp folder by default; you have to enable it. The default is no, so no...


And everyone will be on the list as pedophiles.


> Privacy for the server: A malicious client should learn nothing about the server’s dataset X ⊆ U other than its size. In particular, it is important that the client learn nothing about the intersection size |id(Y̅ ∩ X)|. Otherwise, the client can use that to extract information about X by adding test items to its list Y̅, and checking if the intersection size changes.

If one of your pictures is a false positive hash collision, you'll have no idea until your front door gets broken down.

> Privacy for the client: Let X be the server’s input from which pdata is derived. A malicious server must learn nothing about the client’s Y̅ beyond the output of ftPSIAD with respect to this set X.

Apple can't check whether a hash match is a false positive or not, because they only get the matching hashes and not the pictures that triggered them. So if you have a bunch of false positives, your front door is getting broken down, with no opportunity for a human to realize the problem and intervene.

> The protocol need not provide correct output against a malicious client. That is, the protocol need not prevent a malicious client from causing the server to obtain an incorrect ftPSI-AD output when the protocol terminates. The reason for this is that a malicious client can always choose to hide some of its data from the PSI system in order to cause an undercount of the intersection.

Their protocol isn't (and can't be) secure against the one attack that the people that this system is supposed to catch would actually commit.

> Moreover, a malicious client that attempts to cause an overcount of the intersection will be detected by mechanisms outside of the cryptographic protocol.

This seems eerie to me but I can't put my finger on why.


"So if you have a bunch of false positives, your front door is getting broken down, with no opportunity for a human to realize the problem and intervene."

Well, technically speaking, once enough security vouchers have been submitted and the stated threshold has been reached, a report will be sent to an Apple employee (somewhere). The vouchers will then, combined, be decryptable and contain grayscale low-res versions of the original images for confirmation, in which case NCMEC (the National Center for Missing & Exploited Children) will be alerted, and then law enforcement.

I'm just pointing out that your door being broken down requires multiple false flags, and a human will get a chance to "realize the problem and intervene" before it goes to the FBI or whatever. Not saying I like this system, just making a nitpick on your criticism. I don't really know how else you could have a human intervene without "breaking down your door."


> ...be decryptable and contain grayscale low-res versions of the original image for confirmation...

But what happens when the false positives are erroneously confirmed as legitimate CSAM? What's the system in place for removing all the security vouchers on someone's account because the vision system flagged a bunch of false positives? What's the process for unfucking a person's life because the employee confirming the CSAM was in a bad mood that day? Is Apple going to pay the legal bills of someone they effectively SWATed?

For the ones of people this system might actually catch with legitimate CSAM there will be at least as many false positives slipping through to ruinous consequences. Law enforcement shouldn't be trusted as a backstop against abuse because LEOs and DAs are incentivized for "good numbers" and "results", not for actually meting out justice. If someone flagged by false positives gets to the stage of law enforcement being involved their lives will be ruined.


> "For the ones of people this system might actually catch with legitimate CSAM there will be at least as many false positives slipping through to ruinous consequences."

How are you calculating this?


Apple claims without math that it's a 1-in-a-trillion chance that your account will be incorrectly flagged.
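
For intuition, a number like that would presumably fall out of a tail calculation over a per-image false-match rate and the match threshold. Here is a minimal sketch with entirely made-up numbers (the rate, photo count and threshold are placeholders, not Apple's figures):

```python
# How an account-level figure could be derived from a per-image false-match
# rate plus a match threshold. All parameters here are hypothetical.
from math import comb

def account_flag_probability(per_image_fp, photos, threshold, terms=50):
    """Upper tail of Binomial(photos, per_image_fp) starting at `threshold`.
    The terms shrink so fast that summing the first ~50 is plenty."""
    top = min(photos, threshold + terms)
    return sum(comb(photos, k) * per_image_fp ** k * (1 - per_image_fp) ** (photos - k)
               for k in range(threshold, top + 1))

# Hypothetical: 10,000 photos, a one-in-a-million per-image false-match rate,
# and a 30-match threshold. The threshold is what pushes the account-level
# probability so far below the per-image rate.
print(account_flag_probability(per_image_fp=1e-6, photos=10_000, threshold=30))
```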

And even then... what happens? Does the FBI actually smash down your door with a SWAT team for CSAM? In most cases, No. They go and arrest you the normal way. Unless you had a ludicrous amount of false flags (in the thousands) showing you were a dealer and had a previous history.


An Apple employee and a district judge who has been convinced there is probable cause.


I guess they have to call them "employees" in California now. Do they do this before or after becoming Facebook moderators?


> If one of your pictures is a false positive hash collision, you'll have no idea until your front door gets broken down.

This is a completely false statement. Besides the fact that it requires more than one match for an account to be flagged, and that flagged accounts are reviewed manually by Apple before a report is sent off to NCMEC, thereby catching false positives, you're notified if your account is flagged and can file a challenge.


Where is the cryptographic proof that the only thing you are scanning for is CSAM?

This is just nice window dressing.


It's ok, because they're comparing against "a database of known CSAM image hashes provided by NCMEC"

You can trust them. You have to trust them.

Oh, "provided by NCMEC and other child-safety organizations." [1]

Other unnamed organizations. Don't worry about who they are, its not relevant. Stop thinking so hard. Stop Screeching. Trust Apple.

[1] https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni...


That's the big point they don't address. Apple can't possibly check every hash of every file some obscure organization is sending to their database, especially in foreign countries.

If some men in black walk into the offices of some nice little child protecting service in Ohio and demand that they put some additional hashes into the database because otherwise something bad could happen to them or their families, does Apple really think they would decline?

It doesn't matter how secure the system is, the vulnerability is that virtually anyone can input virtually anything into the database without Apple even knowing and expose selected users that way.

This failure point is now at the heart of iOS and macOS and it's baffling that Apple doesn't see that or doesn't want to see it.

My guess is that they're somehow forced to implement this and try to talk their way out of it with some strange PR pieces which are only convincing their most naive users.


Something weird is definitely going on. I'd normally chalk it up to a power-mad hubris that seems to be common among technocrats, but this is the latest example of a corporation voluntarily providing politically motivated non-profits the ability to control their platforms. This has been going on for a while on financial networks - in the pursuit of the elusive, but totally real, neonazi resurgence. Paypal recently announced such a "partnership". Microsoft has allowed some UK based group to censor their search results for a long time now. That was originally sold under the banner of "bbbut the children!", but the only reason I know about it is because they started blacklisting sites that archived video related to the war in Syria.


Totally agree. The CSAM angle is just a Trojan horse for something else. What better to disguise it in than some altruistic, ethical and moral heartstring pull that makes any opponent seem like a pedo.


To make things worse, NCMEC is not even a government agency but a non-profit lobby group. So normal government oversight may be lacking.


I'm sure the crypto part is well implemented.

However, I predict that within the next 2 years, the Chinese government will force Apple to use its own database of "objectionable" content and will require that the on-device photo roll be scanned, not just iCloud (they already have access to that).

And probably not just China, because any LEA and secret services would love to have the ability to use such a system for ad-hoc searches: dear Apple, for those users, please extend the database source to this new one we maintain and enable scanning of all on-device pictures.


> I'm sure the crypto part is well implemented.

If it is, that's bad for human rights. The crypto part protects Apple and their under-specified list sources from accountability.

Without the crypto the system would just send you the database of bad hashes and snitch on you based on it. Your privacy would be no worse, but at least researchers would have some chance of detecting when the system was being used off-label to enable genocide.


Of potential interest besides the OP link, there is also the paper "A Concrete-Security Analysis of the Apple PSI Protocol", also known as the "Alternative Security Proof of the Apple PSI System."

https://www.apple.com/child-safety/pdf/Alternative_Security_...

Basically it's a second opinion on the mathematics from a different perspective. The original post link is the formal proof by Apple employees and Stanford; the alternative proof is from the University of California.


For clarity: the "official" proof has Dan Boneh's name on it, and the "alternative" proof has Mihir Bellare's name on it.


What does this clarify?


That Dan Boneh and Mihir Bellare worked on proofs of this.


This document is a deflection from the main concern -- while it's commendable that they took the effort to prove the cryptographic properties of their system, what good is that when your hash database is a government-controlled black box? Can't exactly publish a white paper on that, huh?


The crypto seems sound, but a lot of the tricky questions are waved aside with the blanket statement of "mechanisms outside of the cryptographic protocol."

The part that worries me the most is that no one outside of Apple can verify that the hash set they're pushing hasn't been tampered with (Section 4, Remark 5). This would allow them, for example, to add leaked product image hashes to hunt down and prosecute people who share info about their products before release.

In fact, the system seems designed to be impossible to audit, with only a subset of the whole hash set being pushed to clients, so that researchers can't even tell when more hashes have been added. As a consequence of that design, they acknowledge that a "small number" of matches will be missed (false negatives), and justify that with an argument that it improves performance (Section 2, Remark 3).

False positives, on the other hand, will be common (as detailed in Section 5, "Duplicate images"): simply copying a file on two client devices that don't share a cloud owner ID will count towards the threshold, and the paper again falls back on "mechanisms outside of the cryptographic protocol".

And last but not least, let's spare a thought for the Apple employees that will be required to sift through potentially traumatizing imagery (assuming the company doesn't outsource that to a third party.)


You don't own your apple devices anyways. It is a happy walled garden for those who don't care too much, those who lost all hope and those who can lie to themselves very well.

Not that I am an open source extremist, but the moment we can't control the way our own machines run is the moment they stop belonging to us. Supporting a true open source phone OS might be a good idea even if you don't use it. Because one day you might have to.


Police departments are a business. They are incentivized to secure the most convictions for the least amount of work. Finding people that possess CSAM files is easy and gets convictions with relatively little work. It does not, however, do much to deter child abuse. Police departments and CPS routinely ignore calls for investigations into alleged acts of CSA. Take the Sophie Long case for instance. A question for the reader: is it more important to spend resources stopping CSAM or CSA?

Is it really worth giving up our fundamental privacy rights when the police already routinely ignore CSA?


Yeah, what happens when someone abuses this for easy "untraceable" swatting? What if the next iOS malware uploaded some CSAM-matching images to iCloud unless you pay 5 BTC in the next 24 hours?


There's also the other side - assuming NCMEC and law enforcement are good actors is a flawed premise (as has been proven time and again). What if the CSAM database has non-CSAM images there, or non-CSAM hashes?

One could argue which is more likely, but the fact remains that the entire premise of this system is flawed and it seems the only way to play this game is to not use iCloud photos at all (though it's possible that malware could bypass that and turn on iCloud sync as well as upload the photos too...)


This is to facilitate the easy approval of court orders for the spy agencies to get the contents of someone's phone. It doesn't matter if the photo is illegal or not.


One theory I've seen advanced for why the claimed reports to the NCMEC are many orders of magnitude larger than the prosecutions is the conjecture that the whole thing is a CIA operation to collect kompromat on massive numbers of people.

The alternative, that the database is just stuffed full of non-illegal images, seems more likely, but I dunno if it's that much more comforting.


This is already possible today on both iCloud Photos and Google Photos. Both scan for CSAM server side.


All other flaws aside, the fact that the hashes are not auditable by users on their own computers is deeply wrong and undermines trust in the algorithm, however sound it may be. Moreover, this black box provides Apple plausible deniability: if something turns ugly, they will say, we're just the middle man, we only relay the officially approved database, we can't vet it.


They do vet it. They manually review every flagged account before the process goes further.


You misunderstand the purpose of the review. Per current case law, if Apple automatically handed the data over rather than reviewing it themselves, the reading of the report would constitute an unlawful search.

A human at Apple needs to inspect it in order to quash your Fourth Amendment protection against a warrantless search.

Apple inspecting your images destroys your privacy, the extra step of review doesn't protect your privacy -- it's there to destroy the constitutional protection of your privacy.


Time to notify the attorneys general. I can tell you personally that plaintiff firms are already gearing up to launch class actions against Apple on this. If you oppose it, notify your AG as well.

Link: https://www.naag.org/find-my-ag/


Hashes can be of any file.

How long will it take before your hard drives are scanned for matching hashes of copyrighted material?


Thanks for the read. Pretty cool. The following is my layman's understanding, mostly because it helps me to understand things by typing. Please criticize if I miss stuff.

Noteworthy is Dan Boneh's contribution to this. Given his reputation in the crypto community, it seems he really does believe in it, despite all the recent controversy.

They discuss and build upon several security properties, but the most contentious point-- privacy/leakage/scanning of your photos-- is addressed as follows:

Firstly, some background: the private set intersection[1] (PSI) technique in general permits 2 sets to be compared with both parties learning *only* the intersection. In a nutshell, Apple uses this concept such that, if the number of intersecting elements is greater than a threshold, they're notified.

There are several modifications (Shamir's secret sharing, cuckoo tables, PKI-ish schemes) to create what Apple calls the ftPSI-AD protocol, optimizing for desired properties: performance, integrity, and most notably to me, minimizing false positives. That is, flagging innocent people is minimized at the cost of some real child-pornographic images slipping by.
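
To make the threshold part concrete: the standard way to get "the server learns nothing until t matches" is Shamir-style secret sharing, where each matching voucher leaks one share of a per-account key. Below is a generic toy version of that idea, assuming a made-up prime field, threshold and share layout; it is not Apple's actual ftPSI-AD construction.

```python
# Toy t-of-n Shamir secret sharing to illustrate the threshold mechanism:
# the key can only be reconstructed once at least `threshold` shares exist.
import secrets

PRIME = 2**127 - 1          # toy prime field, chosen for readability only

def make_shares(secret, threshold, count):
    """Split `secret` into `count` shares; any `threshold` of them recover it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, count + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

account_key = secrets.randbelow(PRIME)   # pretend per-account decryption key
shares = make_shares(account_key, threshold=30, count=100)  # one share per matched image

print(reconstruct(shares[:30]) == account_key)  # True: threshold reached
print(reconstruct(shares[:29]) == account_key)  # False: below threshold, key stays hidden
```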

A couple of noteworthy things still raise red flags--

1) they prove that, for honest servers <--> malicious clients, and vice-versa, privacy is not violated for either party, but to me this considers the client as the phone and Apple the server. I'd argue that the phone is actually "Mallory", and you are the client.

You might be honest, but how do you trust the Apple + phone short of reverse engineering it? This is the biggest hole to me, and so I don't fully understand this proof (or, perhaps I have the parties mixed up).

2) Several things are handwaved and/or left "variable to implementation". E.g. Section 5, on "near-duplicate images" that may count twice towards this threshold--

>> Several solutions to this were considered, but ultimately, this issue is addressed by a mechanism outside of the cryptographic protocol

What the?? Hello?? Perhaps this is addressed in another whitepaper, given this is a theory/protocol heavy paper, but this does not instill confidence.

Or, take this bit from remark 3--

>> If needed, these false negatives can be eliminated with a tweak to the data structure used

Uh, I thought not sending innocent people to jail was a pretty critical property. You're telling me the server/Apple, who controls the Cuckoo table, can just change this on a whim? How would I hold them responsible/be notified of this?

These "variations" are remarked on several times in the paper. Again, not exactly confidence building.

Overall, while I really applaud this effort, and I'm not as outraged as I initially was, I'm only slightly less so and have a handful of more questions than before.

Again, please correct me if my annoyance might be misguided, given these technical details.

[1]: https://en.wikipedia.org/wiki/Private_set_intersection



Dan invented some stuff to liberate humanity, and invented some stuff to oppress humanity. Such is the burden of the cryptographer.


What are the storage and processing requirements for this functionality, and how will they evolve over time? Can it be made so that it is only loaded on my device if I use the relevant services? Other than the impact on privacy it also has an impact on property, even though I know that with Apple that ship sailed a long time ago. My last Apple purchase was a couple of days before I heard about this, and it will be my last. Not like they'll care, but I do, as I only accepted their walled garden in exchange for privacy, however naive that proves me to be…


Even assuming the protocol is sound, Apple could choose a set of innocent photos that are frequently found on people's devices, in addition to whatever photos they deem unlawful. Since they don't show the threshold to the user, they can set it arbitrarily low. The client will never know which photos were used to determine they were guilty.

It's still an interesting read from a cryptography point of view though.


Assuming perfectly trustworthy governments and perfectly flawless programmers, it is all good. Now where do I get me some of those in this imperfect world?


This is absolutely earth shattering work in the world of differential privacy. Kudos to this research team for spending the time to work it out and publish.

Just wow.


Others have done a good job addressing the political and social issues with this sort of endeavour, but I'm puzzled by something else. Isn't this just trivially bypassed? Just changing a few bits in the image won't be noticed by the human eye but the hash will be wildly different.


You're missing the fact that it uses perceptual hashing, not cryptographic hashing. Minor changes to an image result in a similar or identical perceptual hash. Here's an article and a big discussion about it from the other day: https://news.ycombinator.com/item?id=28091750
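
To see the difference concretely, here is a toy comparison. This is not Apple's NeuralHash, just a classic average-hash-style perceptual hash on a made-up 8x8 grayscale grid; the point is only that a one-level pixel change leaves the perceptual hash untouched while the cryptographic digest changes completely.

```python
# Toy perceptual hash (threshold against the mean) vs. a cryptographic hash.
import hashlib

def average_hash(pixels):
    """Toy aHash: one bit per pixel, set if the pixel is above the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return hex(int(''.join('1' if p > mean else '0' for p in flat), 2))

def sha256(pixels):
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()[:16]

# An 8x8 grayscale "image": a bright square on a dark background.
img = [[200 if 2 <= x <= 5 and 2 <= y <= 5 else 30 for x in range(8)] for y in range(8)]

# "Change a few bits": nudge one corner pixel by one gray level.
tweaked = [row[:] for row in img]
tweaked[0][0] += 1

print(average_hash(img), average_hash(tweaked))  # identical perceptual hashes
print(sha256(img), sha256(tweaked))              # completely different digests
```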


Good to know, thanks for the link!


How does one learn enough crypto to understand this paper?


Dan Boneh’s CS255 course reader and CS355 lecture notes should be a good beginning.


The actual scheme is relatively simple; the exact details and terminology make it hard for a layman to understand.

I'll give you an ELI18 description of a basic private set intersection:

I have a database of image fingerprints which I want you to test your images against and tell me if there are matches. I can assume that you're going to faithfully run my protocol because I use DRM to control the software that runs on your computing device. The obvious way to accomplish the matching would be for me to just send you the database of fingerprints-- they are hashes after all and don't tell you anything about the images other than letting you match them-- and for you to tell me about the matches.

But I don't want to tell you the database or tell you when an image matches, because if I do you'll realize that I'm targeting images connected with a particular ethnicity, which I intend to mass murder. So, instead I tell you I want to search for child porn and I get you to agree to the following protocol, and you're foolish enough to let me keep the hashes secret for no obvious reason.

The first building block we need is an encryption scheme which is additively homomorphic, such as (lifted) ElGamal encryption. With this special encryption scheme the following properties hold: Enc(Data1, key) + Enc(Data2, key) = Enc(Data1+Data2, key) and x*Enc(Data1, key) = Enc(Data1*x, key). Or, in English: the sum of two ciphertexts gives you a ciphertext of the sum of the plaintexts, and a ciphertext multiplied by a value gives you a ciphertext for the plaintext multiplied by that value.

With that in hand we can build a private set intersection.

(1) I pick a private key, send you the public key, and I encrypt each of the hashes in my database. I send you the encryptions-- which, thanks to the encryption, teach you nothing about the database except an upper bound on its size.

(2) For each database entry you take the hash of the image you want to test, encrypt it with the same key and subtract it from the encrypted database entry. If they matched you have an encryption of zero (which you can't tell is zero, due to encryption); if they didn't match you have an encryption of some non-zero value-- the difference between the image hash and the database entry. You then pick a new random number and multiply the result by it. You now either have an encryption of a totally random number (if there was no match) or an encryption of zero (since random*0=0).

(3) You send that to me, I decrypt it... and if it decrypts to zero I add you to the list of people to be executed at some time in the future. If it doesn't decrypt to zero I learn absolutely nothing about your hash, other than it didn't match, because the result is literally a random number.
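
For anyone who wants to poke at those three steps, here is a toy sketch using "lifted" ElGamal over a deliberately small prime field. The group parameters, hash values and function names are all made up for illustration; this is the textbook building block described above, not the Apple protocol.

```python
# Toy additively homomorphic PSI: the server learns only whether each blinded
# difference decrypts to zero; the client learns nothing about the database.
import secrets

P = 2**127 - 1   # NOT a secure group choice, just a big prime for the demo
G = 3            # demo generator

def keygen():
    x = secrets.randbelow(P - 2) + 1          # server's private key
    return x, pow(G, x, P)                    # (private, public)

def encrypt(pk, m):
    """Lifted ElGamal: Enc(m) = (g^r, g^m * pk^r), additively homomorphic."""
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), (pow(G, m, P) * pow(pk, r, P)) % P

def sub(ca, cb):
    """Ciphertext of the difference of the two plaintexts."""
    return (ca[0] * pow(cb[0], P - 2, P)) % P, (ca[1] * pow(cb[1], P - 2, P)) % P

def scale(c, k):
    """Ciphertext of k times the plaintext (blinds any non-zero value)."""
    return pow(c[0], k, P), pow(c[1], k, P)

def is_zero(sk, c):
    """Server decrypts to g^m and only checks whether m == 0."""
    return (c[1] * pow(pow(c[0], sk, P), P - 2, P)) % P == 1

# Step 1: server encrypts its (made-up) hash database and sends it over.
sk, pk = keygen()
encrypted_db = [encrypt(pk, h) for h in (1111, 2222, 3333)]

# Step 2: client subtracts the encryption of its own image hash and blinds it.
client_hash = 2222
responses = [scale(sub(entry, encrypt(pk, client_hash)), secrets.randbelow(P - 2) + 1)
             for entry in encrypted_db]

# Step 3: server decrypts; a zero means a match, anything else is random noise.
print([is_zero(sk, r) for r in responses])    # [False, True, False]
```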

The Apple scheme makes a number of elaborations on this basic idea to improve efficiency (the database is a cuckoo hash table, so instead of sending you one encryption per database entry per image of mine I only need to send you a few encryptions per image-- however much fanout the hash table has), to make it so that matches result in leaking a decryption key so they can decrypt the image, and additional complexity to make it so that the matching isn't fully triggered unless you have more than some threshold number of matching images, and to partially obscure the exact number of sub-threshold matches.


Topic should be renamed the Apple PSI System or Apple Private Set Intersection System (not Psi or something related to tire pressure.)


The HN style guide lowercases acronyms like PSI to Psi, unfortunately.


Source? I can think of a number that aren’t, and a bunch currently on the frontpage (as specific as “CSAM”, and as widely used as “IP”).


HN automatically converts to titlecase on submit, if you click edit and change the title to how you want it won’t mess with it again.


(Mods still might, if they’re in the mood.)


Tire pressure is measured in PSI, or Pounds per Square Inch.


show me the code or STFU


You need enough crypto to understand it.


Can they scan for face recognition? For specific faces?



