Password cracking

From Rixort Wiki
Latest revision as of 13:50, 26 August 2018

Initial steps

Steps required for password cracking software:

  1. Identify which columns contain the username and the password (hashed or otherwise). It may be easier to convert the data to a standard internal representation before processing.
  2. Identify the algorithm used.
  3. Identify whether a salt is used.
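The first step can be sketched in Python, assuming the dump is a CSV file whose username, hash and (optional) salt column positions are already known. `HashRecord`, `load_records` and the column indexes are hypothetical names chosen for illustration. The stored hash is kept verbatim because some formats, such as bcrypt strings, are case-sensitive.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class HashRecord:
    """Hypothetical internal representation: one user and their stored password."""
    username: str
    password_hash: str
    salt: Optional[str] = None


def load_records(lines: Iterable[str], user_col: int, hash_col: int,
                 salt_col: Optional[int] = None) -> List[HashRecord]:
    """Convert CSV rows into HashRecords, given 0-based column indexes."""
    records = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        salt = row[salt_col] if salt_col is not None else None
        # Strip whitespace but keep case: some hash encodings are case-sensitive.
        records.append(HashRecord(row[user_col].strip(), row[hash_col].strip(), salt))
    return records
```

For a dump with rows such as `1,alice,5f4dcc3b5aa765d61d8327deb882cf99`, calling `load_records(open(path, newline=""), user_col=1, hash_col=2)` would yield one `HashRecord` per user, so later stages never need to know the original column layout.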

What happens next depends on the results:

  1. If no salt is used (e.g. plain MD5), consult a pre-computed lookup table.
  2. If a small salt is used (say, fewer than 4 bits), consult a pre-computed lookup table that covers every possible salt value.
  3. If a sensible algorithm is used (e.g. bcrypt with a large salt), try dictionary words, then common words, then variants of both, and finally brute force.
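The stages above can be sketched in Python as follows. This is an illustration under stated assumptions, not a complete cracker: the helper names are hypothetical, the variant rules are deliberately tiny, and because bcrypt is not in Python's standard library, PBKDF2-HMAC-SHA256 from `hashlib` stands in as the slow, salted algorithm.

```python
import hashlib
import itertools
from typing import Callable, Dict, Iterable, Iterator, Optional


def lookup_unsalted(table: Dict[str, str], md5_hex: str) -> Optional[str]:
    """Unsalted hash (e.g. plain MD5): a pre-computed hash -> password table applies."""
    return table.get(md5_hex.lower())


def candidates(dictionary: Iterable[str], common: Iterable[str],
               charset: str, max_len: int) -> Iterator[str]:
    """Guess order for strong algorithms: dictionary, common words,
    simple variants of both, then brute force up to max_len characters."""
    dictionary, common = list(dictionary), list(common)
    yield from dictionary
    yield from common
    for word in dictionary + common:
        # Only a few illustrative variants; real tools use far richer rules.
        yield word.capitalize()
        for suffix in ("1", "123"):
            yield word + suffix
    for length in range(1, max_len + 1):
        for combo in itertools.product(charset, repeat=length):
            yield "".join(combo)


def first_match(verify: Callable[[str], bool],
                guesses: Iterable[str]) -> Optional[str]:
    """Return the first guess that verify() accepts, or None."""
    for guess in guesses:
        if verify(guess):
            return guess
    return None


def pbkdf2_verifier(salt: bytes, expected: bytes,
                    iterations: int = 1000) -> Callable[[str], bool]:
    """Stand-in for bcrypt, which hashlib does not provide: PBKDF2-HMAC-SHA256."""
    def verify(guess: str) -> bool:
        return hashlib.pbkdf2_hmac("sha256", guess.encode(), salt, iterations) == expected
    return verify
```

For the small-salt case, the same table idea would apply if the table were keyed on the (salt, hash) pair and built once for each possible salt value, which is why it only stays practical while the salt is tiny.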

Identifying an algorithm

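  • Length: 32 hexadecimal characters (16 bytes) is likely to be MD5, although other 128-bit hashes such as MD4 and NTLM have the same length.
  • Characters: 0-9a-fA-F only means the value is hex-encoded, which is consistent with MD5 but not specific to it.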
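The two heuristics can be combined in a small Python function. The length table and `guess_algorithm` are illustrative assumptions rather than a complete catalogue; the bcrypt check relies on bcrypt hashes normally being stored as `$2a$`, `$2b$` or `$2y$` strings instead of hex.

```python
import re
from typing import List

# Hex digest lengths (in characters) of some common unsalted hashes.
# Not exhaustive, and several algorithms share a length, so a match
# only narrows the field.
HEX_LENGTHS = {
    32: ["MD5", "MD4", "NTLM"],
    40: ["SHA-1"],
    64: ["SHA-256", "SHA3-256"],
    128: ["SHA-512", "SHA3-512"],
}

HEX_RE = re.compile(r"[0-9a-fA-F]+")


def guess_algorithm(value: str) -> List[str]:
    """Return plausible algorithms for a stored hash string."""
    value = value.strip()
    if value.startswith(("$2a$", "$2b$", "$2y$")):
        return ["bcrypt"]  # Modular Crypt Format prefix; cost and salt follow
    if HEX_RE.fullmatch(value):
        return list(HEX_LENGTHS.get(len(value), []))
    return []
```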

Lookup tables

  • How should these be delivered? Plain text file, SQLite database, Lightning Memory-Mapped Database (LMDB), something else?
  • What options does the chosen language support?
  • Which options are the most efficient?
  • Can lookup tables be built entirely in memory and then flushed to disk? Regular flushing, as SQLite does when every insert is committed separately, prevents data loss but is slower because of the frequent I/O. (Answer: yes, put all the inserts in one large transaction and commit at the end.)
  • There is a trade-off between table size (and generation time) and coverage. It may not be worth building lookup tables for anything beyond dictionary words and common passwords.
  • For very fast hashes such as MD5, it may be quicker to brute-force everything than to use a table.
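With Python's standard `sqlite3` module, the single-transaction approach might look like this. The `md5` table layout and the function names are assumptions for illustration. Using the connection as a context manager commits once when the block finishes (or rolls back on error), so rows are not committed and synced one by one.

```python
import hashlib
import sqlite3
from typing import Iterable, Optional


def build_md5_table(conn: sqlite3.Connection, words: Iterable[str]) -> None:
    """Fill a hypothetical MD5 -> plaintext table inside a single transaction."""
    conn.execute("CREATE TABLE IF NOT EXISTS md5 (hash TEXT PRIMARY KEY, plain TEXT)")
    with conn:  # one commit for all rows, instead of one per insert
        conn.executemany(
            "INSERT OR IGNORE INTO md5 (hash, plain) VALUES (?, ?)",
            ((hashlib.md5(w.encode()).hexdigest(), w) for w in words),
        )


def lookup(conn: sqlite3.Connection, md5_hex: str) -> Optional[str]:
    """Return the plaintext for a hex MD5 digest, or None if it is not in the table."""
    row = conn.execute("SELECT plain FROM md5 WHERE hash = ?",
                       (md5_hex.lower(),)).fetchone()
    return row[0] if row else None
```

`INSERT OR IGNORE` skips duplicate words, which are common when several word lists are merged.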

Possible contents of lookup tables:

  • Dictionary words
  • Common words not in dictionary (e.g. TV shows)
  • Simple combinations, such as a dictionary word followed by '1', '123', etc.
  • Every combination of a-z, A-Z and 0-9 between 6 and 12 characters long.
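A quick calculation shows why the last item sits at the expensive end of the trade-off. This sketch assumes a 62-symbol alphabet and counts only the 16-byte raw MD5 digests, ignoring the plaintexts and any index. Under those assumptions the total is over 3 × 10^21 entries and more than 5 × 10^22 bytes of digests.

```python
import string

# Alphabet from the last item above: a-z, A-Z and 0-9 (62 symbols).
CHARSET = string.ascii_letters + string.digits
DIGEST_BYTES = 16  # one raw MD5 digest; plaintexts would add more on top


def keyspace(alphabet_size: int, min_len: int, max_len: int) -> int:
    """Number of strings of length min_len..max_len over the alphabet."""
    return sum(alphabet_size ** n for n in range(min_len, max_len + 1))


entries = keyspace(len(CHARSET), 6, 12)
print(f"{entries:.3e} entries, at least {entries * DIGEST_BYTES:.3e} bytes of digests")
```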

Performance

Approaches to compare: vectorisation (e.g. SIMD), concurrency and parallelisation (multiple cores or machines).
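One way to parallelise brute force in CPython is to split the keyspace by first character and hand each slice to a separate process, because the global interpreter lock stops threads from running this pure-Python loop on several cores at once. This is a sketch of parallelisation only (not vectorisation), using MD5 and the hypothetical helpers `search_prefix` and `crack_parallel`.

```python
import hashlib
import itertools
from concurrent.futures import ProcessPoolExecutor
from typing import Optional


def search_prefix(target_hex: str, charset: str, prefix: str,
                  length: int) -> Optional[str]:
    """Try every candidate of `length` characters that starts with `prefix`."""
    for tail in itertools.product(charset, repeat=length - len(prefix)):
        candidate = prefix + "".join(tail)
        if hashlib.md5(candidate.encode()).hexdigest() == target_hex:
            return candidate
    return None


def crack_parallel(target_hex: str, charset: str, length: int,
                   workers: int = 4) -> Optional[str]:
    """Search one first-character slice of the keyspace per task, across processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(search_prefix, target_hex, charset, c, length)
                   for c in charset]
        for future in futures:
            result = future.result()
            if result is not None:
                # Remaining slices still finish when the pool shuts down.
                return result
    return None
```

Call `crack_parallel` from under an `if __name__ == "__main__":` guard, which `ProcessPoolExecutor` needs on platforms that start workers with spawn. Returning early does not cancel the other slices; the pool waits for them on shutdown.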

Languages

Language choice depends on performance, available libraries and existing knowledge. Obvious initial candidates are:

  • C
  • CPython (reference implementation of Python)
  • PyPy (an alternative Python implementation written in RPython, a restricted subset of Python, with a JIT compiler; often faster than CPython but sometimes behind in supported Python versions)
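A rough way to compare the two Python candidates is to run the same loop under CPython and PyPy. This sketch (the `md5_rate` function is hypothetical) times short MD5 hashes including the loop and string-conversion overhead, so it measures a whole cracking loop rather than the hash function alone; a C version would need its own harness.

```python
import hashlib
import platform
import time


def md5_rate(n: int = 200_000) -> float:
    """Rough MD5 hashes per second for short candidates on this interpreter."""
    start = time.perf_counter()
    for i in range(n):
        hashlib.md5(str(i).encode()).digest()
    elapsed = time.perf_counter() - start
    return n / elapsed


print(platform.python_implementation(), f"{md5_rate():,.0f} MD5/s")
```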

Libraries

Many crypto libraries are ultimately wrappers around OpenSSL.
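In Python, a quick check shows which OpenSSL build (or compatible library, such as LibreSSL on some systems) the interpreter reports, and which digests `hashlib` exposes. Output varies by platform and build, so none is shown here.

```python
import hashlib
import ssl

# Library the ssl module was built against; hashlib normally uses the same
# OpenSSL when available, with built-in fallbacks for some core digests.
print(ssl.OPENSSL_VERSION)

# Digests hashlib guarantees everywhere vs. those this particular build exposes.
print(sorted(hashlib.algorithms_guaranteed))
print(sorted(hashlib.algorithms_available))
```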

Python