man-pages/man7/passphrase-encoding.7ssl.html


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML><HEAD><TITLE>Man page of PASSPHRASE-ENCODING</TITLE>
</HEAD><BODY>
<H1>PASSPHRASE-ENCODING</H1>
Section: OpenSSL (7SSL)<BR>Updated: 2021-03-22<BR><A HREF="#index">Index</A>
<A HREF="/cgi-bin/man/man2html">Return to Main Contents</A><HR>


<A NAME="lbAB">&nbsp;</A>
<H2>NAME</H2>

passphrase-encoding - How diverse parts of OpenSSL treat pass phrases character encoding
<A NAME="lbAC">&nbsp;</A>
<H2>DESCRIPTION</H2>


In a modern world with all sorts of character encodings, the treatment of pass
phrases has become increasingly complex.
This manual page attempts to give an overview over how this problem is
currently addressed in different parts of the OpenSSL library.
<A NAME="lbAD">&nbsp;</A>
<H3>The general case</H3>


The OpenSSL library doesn't treat pass phrases in any special way as a general
rule, and trusts the application or user to choose a suitable character set
and stick to that throughout the lifetime of affected objects.
This means that for an object that was encrypted using a pass phrase encoded in
<FONT SIZE="-1">ISO-8859-1,</FONT> that object needs to be decrypted using a pass phrase encoded in
<FONT SIZE="-1">ISO-8859-1.</FONT>
Using the wrong encoding is expected to cause a decryption failure.
<A NAME="lbAE">&nbsp;</A>
<H3>PKCS#12</H3>


PKCS#12 is a bit different regarding pass phrase encoding.
The standard stipulates that the pass phrase shall be encoded as an <FONT SIZE="-1">ASN.1</FONT>
BMPString, which consists of the code points of the basic multilingual plane,
encoded in big endian (<FONT SIZE="-1">UCS-2 BE</FONT>).
<P>

OpenSSL tries to adapt to this requirements in one of the following manners:
<DL COMPACT>
<DT id="1">1.<DD>
Treats the received pass phrase as <FONT SIZE="-1">UTF-8</FONT> encoded and tries to re-encode it to
<FONT SIZE="-1">UTF-16</FONT> (which is the same as <FONT SIZE="-1">UCS-2</FONT> for characters U+0000 to U+D7FF and U+E000
to U+FFFF, but becomes an expansion for any other character), or failing that,
proceeds with step 2.
<DT id="2">2.<DD>
Assumes that the pass phrase is encoded in <FONT SIZE="-1">ASCII</FONT> or <FONT SIZE="-1">ISO-8859-1</FONT> and
opportunistically prepends each byte with a zero byte to obtain the <FONT SIZE="-1">UCS-2</FONT>
encoding of the characters, which it stores as a BMPString.


<P>


Note that since there is no check of your locale, this may produce <FONT SIZE="-1">UCS-2 /
UTF-16</FONT> characters that do not correspond to the original pass phrase characters
for other character sets, such as any <FONT SIZE="-1">ISO-8859-X</FONT> encoding other than
<FONT SIZE="-1">ISO-8859-1</FONT> (or for Windows, <FONT SIZE="-1">CP 1252</FONT> with exception for the extra ``graphical''
characters in the 0x80-0x9F range).
</DL>
<P>

OpenSSL versions older than 1.1.0 do variant 2 only, and that is the reason why
OpenSSL still does this, to be able to read files produced with older versions.
<P>

It should be noted that this approach isn't entirely fault free.
<P>

A pass phrase encoded in <FONT SIZE="-1">ISO-8859-2</FONT> could very well have a sequence such as
0xC3 0xAF (which is the two characters ``<FONT SIZE="-1">LATIN CAPITAL LETTER A WITH BREVE''</FONT>
and ``<FONT SIZE="-1">LATIN CAPITAL LETTER Z WITH DOT ABOVE''</FONT> in <FONT SIZE="-1">ISO-8859-2</FONT> encoding), but would
be misinterpreted as the perfectly valid <FONT SIZE="-1">UTF-8</FONT> encoded code point U+00EF (<FONT SIZE="-1">LATIN
SMALL LETTER I WITH DIAERESIS</FONT>) <I>if the pass phrase doesn't contain anything that
would be invalid </I><FONT SIZE="-1"><I>UTF-8</I></FONT><I></I>.
A pass phrase that contains this kind of byte sequence will give a different
outcome in OpenSSL 1.1.0 and newer than in OpenSSL older than 1.1.0.
<P>


<PRE>
 0x00 0xC3 0x00 0xAF                    # OpenSSL older than 1.1.0
 0x00 0xEF                              # OpenSSL 1.1.0 and newer

</PRE>


<P>

On the same accord, anything encoded in <FONT SIZE="-1">UTF-8</FONT> that was given to OpenSSL older
than 1.1.0 was misinterpreted as <FONT SIZE="-1">ISO-8859-1</FONT> sequences.
<A NAME="lbAF">&nbsp;</A>
<H3><FONT SIZE="-1">OSSL_STORE</FONT></H3>


<B><A HREF="/cgi-bin/man/man2html?7+ossl_store">ossl_store</A></B>(7) acts as a general interface to access all kinds of objects,
potentially protected with a pass phrase, a <FONT SIZE="-1">PIN</FONT> or something else.
This <FONT SIZE="-1">API</FONT> stipulates that pass phrases should be <FONT SIZE="-1">UTF-8</FONT> encoded, and that any
other pass phrase encoding may give undefined results.
This <FONT SIZE="-1">API</FONT> relies on the application to ensure <FONT SIZE="-1">UTF-8</FONT> encoding, and doesn't check
that this is the case, so what it gets, it will also pass to the underlying
loader.
<A NAME="lbAG">&nbsp;</A>
<H2>RECOMMENDATIONS</H2>


This section assumes that you know what pass phrase was used for encryption,
but that it may have been encoded in a different character encoding than the
one used by your current input method.
For example, the pass phrase may have been used at a time when your default
encoding was <FONT SIZE="-1">ISO-8859-1</FONT> (i.e. ``nai.ve'' resulting in the byte sequence 0x6E 0x61
0xEF 0x76 0x65), and you're now in an environment where your default encoding
is <FONT SIZE="-1">UTF-8</FONT> (i.e. ``nai.ve'' resulting in the byte sequence 0x6E 0x61 0xC3 0xAF 0x76
0x65).
Whenever it's mentioned that you should use a certain character encoding, it
should be understood that you either change the input method to use the
mentioned encoding when you type in your pass phrase, or use some suitable tool
to convert your pass phrase from your default encoding to the target encoding.
<P>

Also note that the sub-sections below discuss human readable pass phrases.
This is particularly relevant for PKCS#12 objects, where human readable pass
phrases are assumed.
For other objects, it's as legitimate to use any byte sequence (such as a
sequence of bytes from `/dev/urandom` that's been saved away), which makes any
character encoding discussion irrelevant; in such cases, simply use the same
byte sequence as it is.
<A NAME="lbAH">&nbsp;</A>
<H3>Creating new objects</H3>


For creating new pass phrase protected objects, make sure the pass phrase is
encoded using <FONT SIZE="-1">UTF-8.</FONT>
This is default on most modern Unixes, but may involve an effort on other
platforms.
Specifically for Windows, setting the environment variable
<TT>&quot;OPENSSL_WIN32_UTF8&quot;</TT> will have anything entered on [Windows] console prompt
converted to <FONT SIZE="-1">UTF-8</FONT> (command line and separately prompted pass phrases alike).
<A NAME="lbAI">&nbsp;</A>
<H3>Opening existing objects</H3>


For opening pass phrase protected objects where you know what character
encoding was used for the encryption pass phrase, make sure to use the same
encoding again.
<P>

For opening pass phrase protected objects where the character encoding that was
used is unknown, or where the producing application is unknown, try one of the
following:
<DL COMPACT>
<DT id="3">1.<DD>
Try the pass phrase that you have as it is in the character encoding of your
environment.
It's possible that its byte sequence is exactly right.
<DT id="4">2.<DD>
Convert the pass phrase to <FONT SIZE="-1">UTF-8</FONT> and try with the result.
Specifically with PKCS#12, this should open up any object that was created
according to the specification.
<DT id="5">3.<DD>
Do a nai.ve (i.e. purely mathematical) <FONT SIZE="-1">ISO-8859-1</FONT> to <FONT SIZE="-1">UTF-8</FONT> conversion and try
with the result.
This differs from the previous attempt because <FONT SIZE="-1">ISO-8859-1</FONT> maps directly to
U+0000 to U+00FF, which other non-UTF-8 character sets do not.


<P>


This also takes care of the case when a <FONT SIZE="-1">UTF-8</FONT> encoded string was used with
OpenSSL older than 1.1.0.
(for example, <TT>&quot;i.&quot;</TT>, which is 0xC3 0xAF when encoded in <FONT SIZE="-1">UTF-8,</FONT> would become 0xC3
0x83 0xC2 0xAF when re-encoded in the nai.ve manner.
The conversion to BMPString would then yield 0x00 0xC3 0x00 0xA4 0x00 0x00, the
erroneous/non-compliant encoding used by OpenSSL older than 1.1.0)
</DL>
<A NAME="lbAJ">&nbsp;</A>
<H2>SEE ALSO</H2>


<B><A HREF="/cgi-bin/man/man2html?7+evp">evp</A></B>(7),
<B><A HREF="/cgi-bin/man/man2html?7+ossl_store">ossl_store</A></B>(7),
<B><A HREF="/cgi-bin/man/man2html?3+EVP_BytesToKey">EVP_BytesToKey</A></B>(3), <B><A HREF="/cgi-bin/man/man2html?3+EVP_DecryptInit">EVP_DecryptInit</A></B>(3),
<B><A HREF="/cgi-bin/man/man2html?3+PEM_do_header">PEM_do_header</A></B>(3),
<B><A HREF="/cgi-bin/man/man2html?3+PKCS12_parse">PKCS12_parse</A></B>(3), <B><A HREF="/cgi-bin/man/man2html?3+PKCS12_newpass">PKCS12_newpass</A></B>(3),
<B><A HREF="/cgi-bin/man/man2html?3+d2i_PKCS8PrivateKey_bio">d2i_PKCS8PrivateKey_bio</A></B>(3)
<A NAME="lbAK">&nbsp;</A>
<H2>COPYRIGHT</H2>


Copyright 2018-2020 The OpenSSL Project Authors. All Rights Reserved.
<P>

Licensed under the OpenSSL license (the ``License'').  You may not use
this file except in compliance with the License.  You can obtain a copy
in the file <FONT SIZE="-1">LICENSE</FONT> in the source distribution or at
&lt;<A HREF="https://www.openssl.org/source/license.html">https://www.openssl.org/source/license.html</A>&gt;.
<P>

<HR>
<A NAME="index">&nbsp;</A><H2>Index</H2>
<DL>
<DT id="6"><A HREF="#lbAB">NAME</A><DD>
<DT id="7"><A HREF="#lbAC">DESCRIPTION</A><DD>
<DL>
<DT id="8"><A HREF="#lbAD">The general case</A><DD>
<DT id="9"><A HREF="#lbAE">PKCS#12</A><DD>
<DT id="10"><A HREF="#lbAF"><FONT SIZE="-1">OSSL_STORE</FONT></A><DD>
</DL>
<DT id="11"><A HREF="#lbAG">RECOMMENDATIONS</A><DD>
<DL>
<DT id="12"><A HREF="#lbAH">Creating new objects</A><DD>
<DT id="13"><A HREF="#lbAI">Opening existing objects</A><DD>
</DL>
<DT id="14"><A HREF="#lbAJ">SEE ALSO</A><DD>
<DT id="15"><A HREF="#lbAK">COPYRIGHT</A><DD>
</DL>
<HR>
This document was created by
<A HREF="/cgi-bin/man/man2html">man2html</A>,
using the manual pages.<BR>
Time: 00:06:09 GMT, March 31, 2021
</BODY>
</HTML>