john-dev - Re: Implementing compatibility with the 3PC protocol

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJ9ii1F6KomRb8jgCmyL4DavQcKKo1sqB6iP-tkPXVwkh_Wm7g@mail.gmail.com>
Date: Wed, 6 Sep 2023 21:40:24 -0400
From: Matt Weir <cweir@...edu>
To: john-dev@...ts.openwall.com
Subject: Re: Implementing compatibility with the 3PC protocol

Hi Håvard,
   I'm still trying to get my head around your paper, so I apologize if you
addressed these questions, but there are two things that I'm really
struggling with:

1) You mentioned bcrypt as a possible worthwhile target hashing algorithm a
couple of times but all of the examples are with unsalted hashes. How does
your vector approach work with salted hashes? Would all the vector
sub-hashes use the same salt?

2) You also mentioned that your algorithm can provide perfect k-anonymity.
I hate to be really nitpicky but based on my understanding of k-anonymity
the way you implement your vectors doesn't provide that property
-- A quick "is this using k-anonymity" test I use is to see if the degree
is specified. Aka this dataset has 5-anonymity where every record occurs at
least five times. I don't see that occurring with your  approach where
every hash is different.
-- The good news is it appears the property you are referencing is closer
to "differential privacy" which is a much stronger security guarantee if
you can prove it. With differential privacy you are basically saying that
if you add another real user's password to the dataset that the resulting
dataset you submit to the server would be indistinguishable from the case
where you didn't add it.
-- Full disclaimer, I'm not confident you actually achieve differential
privacy (modeling human generated password habits is surprisingly
difficult). But that's the proof I'd recommend trying to prove, and that
I'd really be interested in seeing.
-- Total side note: Have I Been Pwned does not do k-anonymity either.
They are the other big name in this space that I know throws around that
term a lot. I talked with one of the architects of it and they said they
realized they are not really using k-anonymity, but they just liked how
that term sounded.

Once again though, this is an interesting paper! There was obviously a ton
of work your team put into this, and I appreciate you continuing to try and
develop proof of concepts and share it with the wider password security
community. Thanks again for sharing it!

Cheers,
Matt

On Tue, Sep 5, 2023 at 11:06 PM Matt Weir <cweir@...edu> wrote:

> Hi Håvard,
>    First of all I want to say thank you for having your team present at
> PasswordsCon. I was bummed I missed that presentation, so I'd also like to
> thank you for posting your paper so I can read it! While I want to really
> dig into that paper and have some questions, I probably need to read it
> (vs. skim it) so I don't just waste your time. For now though I'd like to
> dig into your request since I think it might align with something that has
> been on my wishlist for a while: A dynamic hash mode that does partial
> matches of the hash.
>
> Backing up, it would be very helpful in a number of different use-cases to
> say "only compare X bytes of the target hash, starting at byte Y", and then
> output cracks that match that partial hash.
>
> Use-Cases:
> -- Vanity Hashes: Aka I want to have a hash that starts with "31337"
>
> -- All those poorly formed or collected password files that have "junk" in
> some of their positions (E.g: Original LinkedIn leak), or are partial
> hashes themselves
>
> Now Håvard, I realize this isn't a direct analogy to your vectors, but at
> the same time it could dramatically cut down on the generated hashlist size
> you'd need to create and check against if you didn't have to perform
> cracking attacks against every possible combination generated by that
> vector. So instead of 2^32 possible hashes for a single MD5 "submission"
> you could only check half the hash and have 2^16 possible hashes that JtR
> would need to load up. That's the number of hashes, not the keyspace,
> (which would be 16^16) which should still be good enough to reduce false
> positives. Worst case you have a two-pass system where you validate the
> plaintext against the second half of the hash.
>
> Admittedly, this doesn't get into the feasibility of implementing a
> partial hash check dynamic mode, (Solar highlighted the design challenges),
> but my real point is that this feature might be helpful for 3PC and it
> certainly would be helpful in general for the other use-cases I mentioned.
>
> Cheers,
> Matt Weir
>
> On Tue, Sep 5, 2023 at 5:03 PM Lukas Odzioba <lukas.odzioba@...il.com>
> wrote:
>
>> On September 5th 2023 at 21:22 Solar Designer wrote:
>>
>>>
>>> You can manage to efficiently integrate this into individual formats for
>>> a proof-of-concept or a specialized use case, but not easily integrate
>>> it into JtR as a whole.
>>>
>>
>> In general I agree with the above, but if the PoC is all you need and it
>> should be fairly easy to figure out what to do, I would
>> treat formats as a blackbox backend and create a new frontend with
>> communication, candidate preparation and multithreading support.
>> If you look how formats integrate with John you can use similar approach
>> and connect calculating f(password, salt) to anything you want.
>> GPU stuff would need more plumbing, but is doable. Such PoC frankenstein
>> would be a separate binary living in John's source tree.
>>
>> Thanks,
>> Lukas
>>
>

Content of type "text/html" skipped
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.