xtquant.xtbson.bson36

BSON (Binary JSON) encoding and decoding.

The mapping from Python types to BSON types is as follows:

=======================================  =============  ===================
Python Type                              BSON Type      Supported Direction
=======================================  =============  ===================
None                                     null           both
bool                                     boolean        both
int [1]                                  int32 / int64  py -> bson
bson.int64.Int64                         int64          both
float                                    number (real)  both
str                                      string         both
list                                     array          both
dict / SON                               object         both
datetime.datetime [2] [3]                date           both
bson.regex.Regex                         regex          both
compiled re [4]                          regex          py -> bson
bson.binary.Binary                       binary         both
bson.objectid.ObjectId                   oid            both
bson.dbref.DBRef                         dbref          both
None                                     undefined      bson -> py
bson.code.Code                           code           both
str                                      symbol         bson -> py
bytes [5]                                binary         both
=======================================  =============  ===================


  1. A Python int will be saved as a BSON int32 or BSON int64 depending on its size. A BSON int32 will always decode to a Python int. A BSON int64 will always decode to a bson.int64.Int64.

  2. datetime.datetime instances will be rounded to the nearest millisecond when saved.

  3. All datetime.datetime instances are treated as naive. Clients should always use UTC.

  4. bson.regex.Regex instances and regular expression objects from re.compile() are both saved as BSON regular expressions. BSON regular expressions are decoded as bson.regex.Regex instances.

  5. The bytes type is encoded as BSON binary with subtype 0. It will be decoded back to bytes. 
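
The notes on int, datetime.datetime and compiled re can be illustrated with a small standard-library sketch. It does not import this package; the helper names bson_int_type, datetime_to_millis and bson_regex_flags are hypothetical and only mirror the logic of the pure-Python encoders in the source listing on this page. Note that the pure-Python helper in that listing uses integer division on the microseconds, so this sketch drops sub-millisecond precision rather than rounding; the C extension may behave differently.

```python
import calendar
import datetime
import re
import struct

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def bson_int_type(value):
    """Return which BSON integer type a Python int would be stored as."""
    if INT32_MIN <= value <= INT32_MAX:
        return "int32"
    try:
        # "<q" is a little-endian signed 64-bit integer.
        struct.pack("<q", value)
    except struct.error:
        raise OverflowError("BSON can only handle up to 8-byte ints")
    return "int64"


def datetime_to_millis(dtm):
    """Convert a datetime to whole milliseconds since the Unix epoch (UTC)."""
    if dtm.utcoffset() is not None:
        # Aware datetimes are shifted to UTC first.
        dtm = dtm - dtm.utcoffset()
    return calendar.timegm(dtm.timetuple()) * 1000 + dtm.microsecond // 1000


def bson_regex_flags(pattern):
    """Build the BSON flag letters for a compiled regular expression."""
    letters = (
        (re.IGNORECASE, "i"),
        (re.LOCALE, "l"),
        (re.MULTILINE, "m"),
        (re.DOTALL, "s"),
        (re.UNICODE, "u"),
        (re.VERBOSE, "x"),
    )
    return "".join(letter for flag, letter in letters if pattern.flags & flag)


print(bson_int_type(1))        # int32
print(bson_int_type(2**31))    # int64
print(datetime_to_millis(datetime.datetime(1970, 1, 1, 0, 0, 1, 999999)))  # 1999
print(bson_regex_flags(re.compile("^a", re.IGNORECASE | re.MULTILINE)))    # imu
```

The "u" flag appears in the last result because Python 3 sets re.UNICODE on every str pattern unless re.ASCII is requested.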

   1# Copyright 2009-present MongoDB, Inc.
   2#
   3# Licensed under the Apache License, Version 2.0 (the "License");
   4# you may not use this file except in compliance with the License.
   5# You may obtain a copy of the License at
   6#
   7# http://www.apache.org/licenses/LICENSE-2.0
   8#
   9# Unless required by applicable law or agreed to in writing, software
  10# distributed under the License is distributed on an "AS IS" BASIS,
  11# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  12# See the License for the specific language governing permissions and
  13# limitations under the License.
  14
  15"""BSON (Binary JSON) encoding and decoding.
  16
  17The mapping from Python types to BSON types is as follows:
  18
  19=======================================  =============  ===================
  20Python Type                              BSON Type      Supported Direction
  21=======================================  =============  ===================
  22None                                     null           both
  23bool                                     boolean        both
  24int [#int]_                              int32 / int64  py -> bson
  25`bson.int64.Int64`                       int64          both
  26float                                    number (real)  both
  27str                                      string         both
  28list                                     array          both
  29dict / `SON`                             object         both
  30datetime.datetime [#dt]_ [#dt2]_         date           both
  31`bson.regex.Regex`                       regex          both
  32compiled re [#re]_                       regex          py -> bson
  33`bson.binary.Binary`                     binary         both
  34`bson.objectid.ObjectId`                 oid            both
  35`bson.dbref.DBRef`                       dbref          both
  36None                                     undefined      bson -> py
  37`bson.code.Code`                         code           both
  38str                                      symbol         bson -> py
  39bytes [#bytes]_                          binary         both
  40=======================================  =============  ===================
  41
  42.. [#int] A Python int will be saved as a BSON int32 or BSON int64 depending
  43   on its size. A BSON int32 will always decode to a Python int. A BSON
  44   int64 will always decode to a :class:`~bson.int64.Int64`.
  45.. [#dt] datetime.datetime instances will be rounded to the nearest
  46   millisecond when saved
  47.. [#dt2] all datetime.datetime instances are treated as *naive*. clients
  48   should always use UTC.
  49.. [#re] :class:`~bson.regex.Regex` instances and regular expression
  50   objects from ``re.compile()`` are both saved as BSON regular expressions.
  51   BSON regular expressions are decoded as :class:`~bson.regex.Regex`
  52   instances.
  53.. [#bytes] The bytes type is encoded as BSON binary with
  54   subtype 0. It will be decoded back to bytes.
  55"""
  56
  57import calendar
  58import datetime
  59import itertools
  60import platform
  61import re
  62import struct
  63import sys
  64import uuid
  65from codecs import utf_8_decode as _utf_8_decode
  66from codecs import utf_8_encode as _utf_8_encode
  67from collections import abc as _abc
  68
  69from .binary import (
  70    ALL_UUID_SUBTYPES,
  71    CSHARP_LEGACY,
  72    JAVA_LEGACY,
  73    OLD_UUID_SUBTYPE,
  74    STANDARD,
  75    UUID_SUBTYPE,
  76    Binary,
  77    UuidRepresentation,
  78)
  79from .code import Code
  80from .codec_options import DEFAULT_CODEC_OPTIONS, CodecOptions, _raw_document_class
  81from .dbref import DBRef
  82from .decimal128 import Decimal128
  83from .errors import InvalidBSON, InvalidDocument, InvalidStringData
  84from .int64 import Int64
  85from .max_key import MaxKey
  86from .min_key import MinKey
  87from .objectid import ObjectId
  88from .regex import Regex
  89from .son import RE_TYPE, SON
  90from .timestamp import Timestamp
  91from .tz_util import utc
  92
  93try:
  94    from . import _cbson
  95
  96    _USE_C = True
  97except ImportError:
  98    _USE_C = False
  99
 100
 101EPOCH_AWARE = datetime.datetime.fromtimestamp(0, utc)
 102EPOCH_NAIVE = datetime.datetime.utcfromtimestamp(0)
 103
 104
 105BSONNUM = b"\x01"  # Floating point
 106BSONSTR = b"\x02"  # UTF-8 string
 107BSONOBJ = b"\x03"  # Embedded document
 108BSONARR = b"\x04"  # Array
 109BSONBIN = b"\x05"  # Binary
 110BSONUND = b"\x06"  # Undefined
 111BSONOID = b"\x07"  # ObjectId
 112BSONBOO = b"\x08"  # Boolean
 113BSONDAT = b"\x09"  # UTC Datetime
 114BSONNUL = b"\x0A"  # Null
 115BSONRGX = b"\x0B"  # Regex
 116BSONREF = b"\x0C"  # DBRef
 117BSONCOD = b"\x0D"  # Javascript code
 118BSONSYM = b"\x0E"  # Symbol
 119BSONCWS = b"\x0F"  # Javascript code with scope
 120BSONINT = b"\x10"  # 32bit int
 121BSONTIM = b"\x11"  # Timestamp
 122BSONLON = b"\x12"  # 64bit int
 123BSONDEC = b"\x13"  # Decimal128
 124BSONMIN = b"\xFF"  # Min key
 125BSONMAX = b"\x7F"  # Max key
 126
 127
 128_UNPACK_FLOAT_FROM = struct.Struct("<d").unpack_from
 129_UNPACK_INT = struct.Struct("<i").unpack
 130_UNPACK_INT_FROM = struct.Struct("<i").unpack_from
 131_UNPACK_LENGTH_SUBTYPE_FROM = struct.Struct("<iB").unpack_from
 132_UNPACK_LONG_FROM = struct.Struct("<q").unpack_from
 133_UNPACK_TIMESTAMP_FROM = struct.Struct("<II").unpack_from
 134
 135
 136def get_data_and_view(data):
 137    if isinstance(data, (bytes, bytearray)):
 138        return data, memoryview(data)
 139    view = memoryview(data)
 140    return view.tobytes(), view
 141
 142
 143def _raise_unknown_type(element_type, element_name):
 144    """Unknown type helper."""
 145    raise InvalidBSON(
 146        "Detected unknown BSON type %r for fieldname '%s'. Are "
 147        "you using the latest driver version?" % (chr(element_type).encode(), element_name)
 148    )
 149
 150
 151def _get_int(data, view, position, dummy0, dummy1, dummy2):
 152    """Decode a BSON int32 to python int."""
 153    return _UNPACK_INT_FROM(data, position)[0], position + 4
 154
 155
 156def _get_c_string(data, view, position, opts):
 157    """Decode a BSON 'C' string to python str."""
 158    end = data.index(b"\x00", position)
 159    return _utf_8_decode(view[position:end], opts.unicode_decode_error_handler, True)[0], end + 1
 160
 161
 162def _get_float(data, view, position, dummy0, dummy1, dummy2):
 163    """Decode a BSON double to python float."""
 164    return _UNPACK_FLOAT_FROM(data, position)[0], position + 8
 165
 166
 167def _get_string(data, view, position, obj_end, opts, dummy):
 168    """Decode a BSON string to python str."""
 169    length = _UNPACK_INT_FROM(data, position)[0]
 170    position += 4
 171    if length < 1 or obj_end - position < length:
 172        raise InvalidBSON("invalid string length")
 173    end = position + length - 1
 174    if data[end] != 0:
 175        raise InvalidBSON("invalid end of string")
 176    return _utf_8_decode(view[position:end], opts.unicode_decode_error_handler, True)[0], end + 1
 177
 178
 179def _get_object_size(data, position, obj_end):
 180    """Validate and return a BSON document's size."""
 181    try:
 182        obj_size = _UNPACK_INT_FROM(data, position)[0]
 183    except struct.error as exc:
 184        raise InvalidBSON(str(exc))
 185    end = position + obj_size - 1
 186    if data[end] != 0:
 187        raise InvalidBSON("bad eoo")
 188    if end >= obj_end:
 189        raise InvalidBSON("invalid object length")
 190    # If this is the top-level document, validate the total size too.
 191    if position == 0 and obj_size != obj_end:
 192        raise InvalidBSON("invalid object length")
 193    return obj_size, end
 194
 195
 196def _get_object(data, view, position, obj_end, opts, dummy):
 197    """Decode a BSON subdocument to opts.document_class or bson.dbref.DBRef."""
 198    obj_size, end = _get_object_size(data, position, obj_end)
 199    if _raw_document_class(opts.document_class):
 200        return (opts.document_class(data[position : end + 1], opts), position + obj_size)
 201
 202    obj = _elements_to_dict(data, view, position + 4, end, opts)
 203
 204    position += obj_size
 205    # If DBRef validation fails, return a normal doc.
 206    if (
 207        isinstance(obj.get("$ref"), str)
 208        and "$id" in obj
 209        and isinstance(obj.get("$db"), (str, type(None)))
 210    ):
 211        return (DBRef(obj.pop("$ref"), obj.pop("$id", None), obj.pop("$db", None), obj), position)
 212    return obj, position
 213
 214
 215def _get_array(data, view, position, obj_end, opts, element_name):
 216    """Decode a BSON array to python list."""
 217    size = _UNPACK_INT_FROM(data, position)[0]
 218    end = position + size - 1
 219    if data[end] != 0:
 220        raise InvalidBSON("bad eoo")
 221
 222    position += 4
 223    end -= 1
 224    result = []
 225
 226    # Avoid doing global and attribute lookups in the loop.
 227    append = result.append
 228    index = data.index
 229    getter = _ELEMENT_GETTER
 230    decoder_map = opts.type_registry._decoder_map
 231
 232    while position < end:
 233        element_type = data[position]
 234        # Just skip the keys.
 235        position = index(b"\x00", position) + 1
 236        try:
 237            value, position = getter[element_type](
 238                data, view, position, obj_end, opts, element_name
 239            )
 240        except KeyError:
 241            _raise_unknown_type(element_type, element_name)
 242
 243        if decoder_map:
 244            custom_decoder = decoder_map.get(type(value))
 245            if custom_decoder is not None:
 246                value = custom_decoder(value)
 247
 248        append(value)
 249
 250    if position != end + 1:
 251        raise InvalidBSON("bad array length")
 252    return result, position + 1
 253
 254
 255def _get_binary(data, view, position, obj_end, opts, dummy1):
 256    """Decode a BSON binary to bson.binary.Binary or python UUID."""
 257    length, subtype = _UNPACK_LENGTH_SUBTYPE_FROM(data, position)
 258    position += 5
 259    if subtype == 2:
 260        length2 = _UNPACK_INT_FROM(data, position)[0]
 261        position += 4
 262        if length2 != length - 4:
 263            raise InvalidBSON("invalid binary (st 2) - lengths don't match!")
 264        length = length2
 265    end = position + length
 266    if length < 0 or end > obj_end:
 267        raise InvalidBSON("bad binary object length")
 268
 269    # Convert UUID subtypes to native UUIDs.
 270    if subtype in ALL_UUID_SUBTYPES:
 271        uuid_rep = opts.uuid_representation
 272        binary_value = Binary(data[position:end], subtype)
 273        if (
 274            (uuid_rep == UuidRepresentation.UNSPECIFIED)
 275            or (subtype == UUID_SUBTYPE and uuid_rep != STANDARD)
 276            or (subtype == OLD_UUID_SUBTYPE and uuid_rep == STANDARD)
 277        ):
 278            return binary_value, end
 279        return binary_value.as_uuid(uuid_rep), end
 280
 281    # Decode subtype 0 to 'bytes'.
 282    if subtype == 0:
 283        value = data[position:end]
 284    else:
 285        value = Binary(data[position:end], subtype)
 286
 287    return value, end
 288
 289
 290def _get_oid(data, view, position, dummy0, dummy1, dummy2):
 291    """Decode a BSON ObjectId to bson.objectid.ObjectId."""
 292    end = position + 12
 293    return ObjectId(data[position:end]), end
 294
 295
 296def _get_boolean(data, view, position, dummy0, dummy1, dummy2):
 297    """Decode a BSON true/false to python True/False."""
 298    end = position + 1
 299    boolean_byte = data[position:end]
 300    if boolean_byte == b"\x00":
 301        return False, end
 302    elif boolean_byte == b"\x01":
 303        return True, end
 304    raise InvalidBSON("invalid boolean value: %r" % boolean_byte)
 305
 306
 307def _get_date(data, view, position, dummy0, opts, dummy1):
 308    """Decode a BSON datetime to python datetime.datetime."""
 309    return _millis_to_datetime(_UNPACK_LONG_FROM(data, position)[0], opts), position + 8
 310
 311
 312def _get_code(data, view, position, obj_end, opts, element_name):
 313    """Decode a BSON code to bson.code.Code."""
 314    code, position = _get_string(data, view, position, obj_end, opts, element_name)
 315    return Code(code), position
 316
 317
 318def _get_code_w_scope(data, view, position, obj_end, opts, element_name):
 319    """Decode a BSON code_w_scope to bson.code.Code."""
 320    code_end = position + _UNPACK_INT_FROM(data, position)[0]
 321    code, position = _get_string(data, view, position + 4, code_end, opts, element_name)
 322    scope, position = _get_object(data, view, position, code_end, opts, element_name)
 323    if position != code_end:
 324        raise InvalidBSON("scope outside of javascript code boundaries")
 325    return Code(code, scope), position
 326
 327
 328def _get_regex(data, view, position, dummy0, opts, dummy1):
 329    """Decode a BSON regex to bson.regex.Regex or a python pattern object."""
 330    pattern, position = _get_c_string(data, view, position, opts)
 331    bson_flags, position = _get_c_string(data, view, position, opts)
 332    bson_re = Regex(pattern, bson_flags)
 333    return bson_re, position
 334
 335
 336def _get_ref(data, view, position, obj_end, opts, element_name):
 337    """Decode (deprecated) BSON DBPointer to bson.dbref.DBRef."""
 338    collection, position = _get_string(data, view, position, obj_end, opts, element_name)
 339    oid, position = _get_oid(data, view, position, obj_end, opts, element_name)
 340    return DBRef(collection, oid), position
 341
 342
 343def _get_timestamp(data, view, position, dummy0, dummy1, dummy2):
 344    """Decode a BSON timestamp to bson.timestamp.Timestamp."""
 345    inc, timestamp = _UNPACK_TIMESTAMP_FROM(data, position)
 346    return Timestamp(timestamp, inc), position + 8
 347
 348
 349def _get_int64(data, view, position, dummy0, dummy1, dummy2):
 350    """Decode a BSON int64 to bson.int64.Int64."""
 351    return Int64(_UNPACK_LONG_FROM(data, position)[0]), position + 8
 352
 353
 354def _get_decimal128(data, view, position, dummy0, dummy1, dummy2):
 355    """Decode a BSON decimal128 to bson.decimal128.Decimal128."""
 356    end = position + 16
 357    return Decimal128.from_bid(data[position:end]), end
 358
 359
 360# Each decoder function's signature is:
 361#   - data: bytes
 362#   - view: memoryview that references `data`
 363#   - position: int, beginning of object in 'data' to decode
 364#   - obj_end: int, end of object to decode in 'data' if variable-length type
 365#   - opts: a CodecOptions
 366_ELEMENT_GETTER = {
 367    ord(BSONNUM): _get_float,
 368    ord(BSONSTR): _get_string,
 369    ord(BSONOBJ): _get_object,
 370    ord(BSONARR): _get_array,
 371    ord(BSONBIN): _get_binary,
 372    ord(BSONUND): lambda u, v, w, x, y, z: (None, w),  # Deprecated undefined
 373    ord(BSONOID): _get_oid,
 374    ord(BSONBOO): _get_boolean,
 375    ord(BSONDAT): _get_date,
 376    ord(BSONNUL): lambda u, v, w, x, y, z: (None, w),
 377    ord(BSONRGX): _get_regex,
 378    ord(BSONREF): _get_ref,  # Deprecated DBPointer
 379    ord(BSONCOD): _get_code,
 380    ord(BSONSYM): _get_string,  # Deprecated symbol
 381    ord(BSONCWS): _get_code_w_scope,
 382    ord(BSONINT): _get_int,
 383    ord(BSONTIM): _get_timestamp,
 384    ord(BSONLON): _get_int64,
 385    ord(BSONDEC): _get_decimal128,
 386    ord(BSONMIN): lambda u, v, w, x, y, z: (MinKey(), w),
 387    ord(BSONMAX): lambda u, v, w, x, y, z: (MaxKey(), w),
 388}
 389
 390
 391if _USE_C:
 392
 393    def _element_to_dict(data, view, position, obj_end, opts):
 394        return _cbson._element_to_dict(data, position, obj_end, opts)
 395
 396else:
 397
 398    def _element_to_dict(data, view, position, obj_end, opts):
 399        """Decode a single key, value pair."""
 400        element_type = data[position]
 401        position += 1
 402        element_name, position = _get_c_string(data, view, position, opts)
 403        try:
 404            value, position = _ELEMENT_GETTER[element_type](
 405                data, view, position, obj_end, opts, element_name
 406            )
 407        except KeyError:
 408            _raise_unknown_type(element_type, element_name)
 409
 410        if opts.type_registry._decoder_map:
 411            custom_decoder = opts.type_registry._decoder_map.get(type(value))
 412            if custom_decoder is not None:
 413                value = custom_decoder(value)
 414
 415        return element_name, value, position
 416
 417
 418def _raw_to_dict(data, position, obj_end, opts, result):
 419    data, view = get_data_and_view(data)
 420    return _elements_to_dict(data, view, position, obj_end, opts, result)
 421
 422
 423def _elements_to_dict(data, view, position, obj_end, opts, result=None):
 424    """Decode a BSON document into result."""
 425    if result is None:
 426        result = opts.document_class()
 427    end = obj_end - 1
 428    while position < end:
 429        key, value, position = _element_to_dict(data, view, position, obj_end, opts)
 430        result[key] = value
 431    if position != obj_end:
 432        raise InvalidBSON("bad object or element length")
 433    return result
 434
 435
 436def _bson_to_dict(data, opts):
 437    """Decode a BSON string to document_class."""
 438    data, view = get_data_and_view(data)
 439    try:
 440        if _raw_document_class(opts.document_class):
 441            return opts.document_class(data, opts)
 442        _, end = _get_object_size(data, 0, len(data))
 443        return _elements_to_dict(data, view, 4, end, opts)
 444    except InvalidBSON:
 445        raise
 446    except Exception:
 447        # Change exception type to InvalidBSON but preserve traceback.
 448        _, exc_value, exc_tb = sys.exc_info()
 449        raise InvalidBSON(str(exc_value)).with_traceback(exc_tb)
 450
 451
 452if _USE_C:
 453    _bson_to_dict = _cbson._bson_to_dict
 454
 455
 456_PACK_FLOAT = struct.Struct("<d").pack
 457_PACK_INT = struct.Struct("<i").pack
 458_PACK_LENGTH_SUBTYPE = struct.Struct("<iB").pack
 459_PACK_LONG = struct.Struct("<q").pack
 460_PACK_TIMESTAMP = struct.Struct("<II").pack
 461_LIST_NAMES = tuple((str(i) + "\x00").encode("utf8") for i in range(1000))
 462
 463
 464def gen_list_name():
 465    """Generate "keys" for encoded lists in the sequence
 466    b"0\x00", b"1\x00", b"2\x00", ...
 467
 468    The first 1000 keys are returned from a pre-built cache. All
 469    subsequent keys are generated on the fly.
 470    """
 471    for name in _LIST_NAMES:
 472        yield name
 473
 474    counter = itertools.count(1000)
 475    while True:
 476        yield (str(next(counter)) + "\x00").encode("utf8")
 477
 478
 479def _make_c_string_check(string):
 480    """Make a 'C' string, checking for embedded NUL characters."""
 481    if isinstance(string, bytes):
 482        if b"\x00" in string:
 483            raise InvalidDocument("BSON keys / regex patterns must not " "contain a NUL character")
 484        try:
 485            _utf_8_decode(string, None, True)
 486            return string + b"\x00"
 487        except UnicodeError:
 488            raise InvalidStringData("strings in documents must be valid " "UTF-8: %r" % string)
 489    else:
 490        if "\x00" in string:
 491            raise InvalidDocument("BSON keys / regex patterns must not " "contain a NUL character")
 492        return _utf_8_encode(string)[0] + b"\x00"
 493
 494
 495def _make_c_string(string):
 496    """Make a 'C' string."""
 497    if isinstance(string, bytes):
 498        try:
 499            _utf_8_decode(string, None, True)
 500            return string + b"\x00"
 501        except UnicodeError:
 502            raise InvalidStringData("strings in documents must be valid " "UTF-8: %r" % string)
 503    else:
 504        return _utf_8_encode(string)[0] + b"\x00"
 505
 506
 507def _make_name(string):
 508    """Make a 'C' string suitable for a BSON key."""
 509    # Keys can only be text in python 3.
 510    if "\x00" in string:
 511        raise InvalidDocument("BSON keys / regex patterns must not " "contain a NUL character")
 512    return _utf_8_encode(string)[0] + b"\x00"
 513
 514
 515def _encode_float(name, value, dummy0, dummy1):
 516    """Encode a float."""
 517    return b"\x01" + name + _PACK_FLOAT(value)
 518
 519
 520def _encode_bytes(name, value, dummy0, dummy1):
 521    """Encode a python bytes."""
 522    # Python3 special case. Store 'bytes' as BSON binary subtype 0.
 523    return b"\x05" + name + _PACK_INT(len(value)) + b"\x00" + value
 524
 525
 526def _encode_mapping(name, value, check_keys, opts):
 527    """Encode a mapping type."""
 528    if _raw_document_class(value):
 529        return b"\x03" + name + value.raw
 530    data = b"".join([_element_to_bson(key, val, check_keys, opts) for key, val in value.items()])
 531    return b"\x03" + name + _PACK_INT(len(data) + 5) + data + b"\x00"
 532
 533
 534def _encode_dbref(name, value, check_keys, opts):
 535    """Encode bson.dbref.DBRef."""
 536    buf = bytearray(b"\x03" + name + b"\x00\x00\x00\x00")
 537    begin = len(buf) - 4
 538
 539    buf += _name_value_to_bson(b"$ref\x00", value.collection, check_keys, opts)
 540    buf += _name_value_to_bson(b"$id\x00", value.id, check_keys, opts)
 541    if value.database is not None:
 542        buf += _name_value_to_bson(b"$db\x00", value.database, check_keys, opts)
 543    for key, val in value._DBRef__kwargs.items():
 544        buf += _element_to_bson(key, val, check_keys, opts)
 545
 546    buf += b"\x00"
 547    buf[begin : begin + 4] = _PACK_INT(len(buf) - begin)
 548    return bytes(buf)
 549
 550
 551def _encode_list(name, value, check_keys, opts):
 552    """Encode a list/tuple."""
 553    lname = gen_list_name()
 554    data = b"".join([_name_value_to_bson(next(lname), item, check_keys, opts) for item in value])
 555    return b"\x04" + name + _PACK_INT(len(data) + 5) + data + b"\x00"
 556
 557
 558def _encode_text(name, value, dummy0, dummy1):
 559    """Encode a python str."""
 560    value = _utf_8_encode(value)[0]
 561    return b"\x02" + name + _PACK_INT(len(value) + 1) + value + b"\x00"
 562
 563
 564def _encode_binary(name, value, dummy0, dummy1):
 565    """Encode bson.binary.Binary."""
 566    subtype = value.subtype
 567    if subtype == 2:
 568        value = _PACK_INT(len(value)) + value
 569    return b"\x05" + name + _PACK_LENGTH_SUBTYPE(len(value), subtype) + value
 570
 571
 572def _encode_uuid(name, value, dummy, opts):
 573    """Encode uuid.UUID."""
 574    uuid_representation = opts.uuid_representation
 575    binval = Binary.from_uuid(value, uuid_representation=uuid_representation)
 576    return _encode_binary(name, binval, dummy, opts)
 577
 578
 579def _encode_objectid(name, value, dummy0, dummy1):
 580    """Encode bson.objectid.ObjectId."""
 581    return b"\x07" + name + value.binary
 582
 583
 584def _encode_bool(name, value, dummy0, dummy1):
 585    """Encode a python boolean (True/False)."""
 586    return b"\x08" + name + (value and b"\x01" or b"\x00")
 587
 588
 589def _encode_datetime(name, value, dummy0, dummy1):
 590    """Encode datetime.datetime."""
 591    millis = _datetime_to_millis(value)
 592    return b"\x09" + name + _PACK_LONG(millis)
 593
 594
 595def _encode_none(name, dummy0, dummy1, dummy2):
 596    """Encode python None."""
 597    return b"\x0A" + name
 598
 599
 600def _encode_regex(name, value, dummy0, dummy1):
 601    """Encode a python regex or bson.regex.Regex."""
 602    flags = value.flags
 603    # Python 3 common case
 604    if flags == re.UNICODE:
 605        return b"\x0B" + name + _make_c_string_check(value.pattern) + b"u\x00"
 606    elif flags == 0:
 607        return b"\x0B" + name + _make_c_string_check(value.pattern) + b"\x00"
 608    else:
 609        sflags = b""
 610        if flags & re.IGNORECASE:
 611            sflags += b"i"
 612        if flags & re.LOCALE:
 613            sflags += b"l"
 614        if flags & re.MULTILINE:
 615            sflags += b"m"
 616        if flags & re.DOTALL:
 617            sflags += b"s"
 618        if flags & re.UNICODE:
 619            sflags += b"u"
 620        if flags & re.VERBOSE:
 621            sflags += b"x"
 622        sflags += b"\x00"
 623        return b"\x0B" + name + _make_c_string_check(value.pattern) + sflags
 624
 625
 626def _encode_code(name, value, dummy, opts):
 627    """Encode bson.code.Code."""
 628    cstring = _make_c_string(value)
 629    cstrlen = len(cstring)
 630    if value.scope is None:
 631        return b"\x0D" + name + _PACK_INT(cstrlen) + cstring
 632    scope = _dict_to_bson(value.scope, False, opts, False)
 633    full_length = _PACK_INT(8 + cstrlen + len(scope))
 634    return b"\x0F" + name + full_length + _PACK_INT(cstrlen) + cstring + scope
 635
 636
 637def _encode_int(name, value, dummy0, dummy1):
 638    """Encode a python int."""
 639    if -2147483648 <= value <= 2147483647:
 640        return b"\x10" + name + _PACK_INT(value)
 641    else:
 642        try:
 643            return b"\x12" + name + _PACK_LONG(value)
 644        except struct.error:
 645            raise OverflowError("BSON can only handle up to 8-byte ints")
 646
 647
 648def _encode_timestamp(name, value, dummy0, dummy1):
 649    """Encode bson.timestamp.Timestamp."""
 650    return b"\x11" + name + _PACK_TIMESTAMP(value.inc, value.time)
 651
 652
 653def _encode_long(name, value, dummy0, dummy1):
 654    """Encode a python long (python 2.x)"""
 655    try:
 656        return b"\x12" + name + _PACK_LONG(value)
 657    except struct.error:
 658        raise OverflowError("BSON can only handle up to 8-byte ints")
 659
 660
 661def _encode_decimal128(name, value, dummy0, dummy1):
 662    """Encode bson.decimal128.Decimal128."""
 663    return b"\x13" + name + value.bid
 664
 665
 666def _encode_minkey(name, dummy0, dummy1, dummy2):
 667    """Encode bson.min_key.MinKey."""
 668    return b"\xFF" + name
 669
 670
 671def _encode_maxkey(name, dummy0, dummy1, dummy2):
 672    """Encode bson.max_key.MaxKey."""
 673    return b"\x7F" + name
 674
 675
 676# Each encoder function's signature is:
 677#   - name: utf-8 bytes
 678#   - value: a Python data type, e.g. a Python int for _encode_int
 679#   - check_keys: bool, whether to check for invalid names
 680#   - opts: a CodecOptions
 681_ENCODERS = {
 682    bool: _encode_bool,
 683    bytes: _encode_bytes,
 684    datetime.datetime: _encode_datetime,
 685    dict: _encode_mapping,
 686    float: _encode_float,
 687    int: _encode_int,
 688    list: _encode_list,
 689    str: _encode_text,
 690    tuple: _encode_list,
 691    type(None): _encode_none,
 692    uuid.UUID: _encode_uuid,
 693    Binary: _encode_binary,
 694    Int64: _encode_long,
 695    Code: _encode_code,
 696    DBRef: _encode_dbref,
 697    MaxKey: _encode_maxkey,
 698    MinKey: _encode_minkey,
 699    ObjectId: _encode_objectid,
 700    Regex: _encode_regex,
 701    RE_TYPE: _encode_regex,
 702    SON: _encode_mapping,
 703    Timestamp: _encode_timestamp,
 704    Decimal128: _encode_decimal128,
 705    # Special case. This will never be looked up directly.
 706    _abc.Mapping: _encode_mapping,
 707}
 708
 709
 710_MARKERS = {
 711    5: _encode_binary,
 712    7: _encode_objectid,
 713    11: _encode_regex,
 714    13: _encode_code,
 715    17: _encode_timestamp,
 716    18: _encode_long,
 717    100: _encode_dbref,
 718    127: _encode_maxkey,
 719    255: _encode_minkey,
 720}
 721
 722
 723_BUILT_IN_TYPES = tuple(t for t in _ENCODERS)
 724
 725
 726def _name_value_to_bson(
 727    name, value, check_keys, opts, in_custom_call=False, in_fallback_call=False
 728):
 729    """Encode a single name, value pair."""
 730    # First see if the type is already cached. KeyError will only ever
 731    # happen once per subtype.
 732    try:
 733        return _ENCODERS[type(value)](name, value, check_keys, opts)
 734    except KeyError:
 735        pass
 736
 737    # Second, fall back to trying _type_marker. This has to be done
 738    # before the loop below since users could subclass one of our
 739    # custom types that subclasses a python built-in (e.g. Binary)
 740    marker = getattr(value, "_type_marker", None)
 741    if isinstance(marker, int) and marker in _MARKERS:
 742        func = _MARKERS[marker]
 743        # Cache this type for faster subsequent lookup.
 744        _ENCODERS[type(value)] = func
 745        return func(name, value, check_keys, opts)
 746
 747    # Third, check if a type encoder is registered for this type.
 748    # Note that subtypes of registered custom types are not auto-encoded.
 749    if not in_custom_call and opts.type_registry._encoder_map:
 750        custom_encoder = opts.type_registry._encoder_map.get(type(value))
 751        if custom_encoder is not None:
 752            return _name_value_to_bson(
 753                name, custom_encoder(value), check_keys, opts, in_custom_call=True
 754            )
 755
 756    # Fourth, test each base type. This will only happen once for
 757    # a subtype of a supported base type. Unlike in the C-extensions, this
 758    # is done after trying the custom type encoder because checking for each
 759    # subtype is expensive.
 760    for base in _BUILT_IN_TYPES:
 761        if isinstance(value, base):
 762            func = _ENCODERS[base]
 763            # Cache this type for faster subsequent lookup.
 764            _ENCODERS[type(value)] = func
 765            return func(name, value, check_keys, opts)
 766
 767    # As a last resort, try using the fallback encoder, if the user has
 768    # provided one.
 769    fallback_encoder = opts.type_registry._fallback_encoder
 770    if not in_fallback_call and fallback_encoder is not None:
 771        return _name_value_to_bson(
 772            name, fallback_encoder(value), check_keys, opts, in_fallback_call=True
 773        )
 774
 775    raise InvalidDocument("cannot encode object: %r, of type: %r" % (value, type(value)))
 776
 777
def _element_to_bson(key, value, check_keys, opts):
    """Encode a single key, value pair."""
    if not isinstance(key, str):
        raise InvalidDocument("documents must have only string keys, key was %r" % (key,))
    if check_keys:
        if key.startswith("$"):
            raise InvalidDocument("key %r must not start with '$'" % (key,))
        if "." in key:
            raise InvalidDocument("key %r must not contain '.'" % (key,))

    name = _make_name(key)
    return _name_value_to_bson(name, value, check_keys, opts)


def _dict_to_bson(doc, check_keys, opts, top_level=True):
    """Encode a document to BSON."""
    if _raw_document_class(doc):
        return doc.raw
    try:
        elements = []
        if top_level and "_id" in doc:
            elements.append(_name_value_to_bson(b"_id\x00", doc["_id"], check_keys, opts))
        for key, value in doc.items():
            if not top_level or key != "_id":
                elements.append(_element_to_bson(key, value, check_keys, opts))
    except AttributeError:
        raise TypeError("encoder expected a mapping type but got: %r" % (doc,))

    encoded = b"".join(elements)
    return _PACK_INT(len(encoded) + 5) + encoded + b"\x00"


if _USE_C:
    _dict_to_bson = _cbson._dict_to_bson


def _millis_to_datetime(millis, opts):
    """Convert milliseconds since epoch UTC to datetime."""
    diff = ((millis % 1000) + 1000) % 1000
    seconds = (millis - diff) // 1000
    micros = diff * 1000
    if opts.tz_aware:
        dt = EPOCH_AWARE + datetime.timedelta(seconds=seconds, microseconds=micros)
        if opts.tzinfo:
            dt = dt.astimezone(opts.tzinfo)
        return dt
    else:
        return EPOCH_NAIVE + datetime.timedelta(seconds=seconds, microseconds=micros)


def _datetime_to_millis(dtm):
    """Convert datetime to milliseconds since epoch UTC."""
    if dtm.utcoffset() is not None:
        dtm = dtm - dtm.utcoffset()
    return int(calendar.timegm(dtm.timetuple()) * 1000 + dtm.microsecond // 1000)


_CODEC_OPTIONS_TYPE_ERROR = TypeError("codec_options must be an instance of CodecOptions")


def encode(document, check_keys=False, codec_options=DEFAULT_CODEC_OPTIONS):
    """Encode a document to BSON.

    A document can be any mapping type (like :class:`dict`).

    Raises :class:`TypeError` if `document` is not a mapping type.
    Raises :class:`~bson.errors.InvalidDocument` if `document` contains
    keys that are not instances of :class:`str`, or otherwise cannot be
    converted to :class:`BSON`.

    :Parameters:
      - `document`: mapping type representing a document
      - `check_keys` (optional): check if keys start with '$' or
        contain '.', raising :class:`~bson.errors.InvalidDocument` in
        either case
      - `codec_options` (optional): An instance of
        :class:`~bson.codec_options.CodecOptions`.

    .. versionadded:: 3.9
    """
    if not isinstance(codec_options, CodecOptions):
        raise _CODEC_OPTIONS_TYPE_ERROR

    return _dict_to_bson(document, check_keys, codec_options)


def decode(data, codec_options=DEFAULT_CODEC_OPTIONS):
    """Decode BSON to a document.

    By default, returns a BSON document represented as a Python
    :class:`dict`. To use a different :class:`MutableMapping` class,
    configure a :class:`~bson.codec_options.CodecOptions`::

        >>> import collections  # From Python standard library.
        >>> import bson
        >>> from bson.codec_options import CodecOptions
        >>> data = bson.encode({'a': 1})
        >>> decoded_doc = bson.decode(data)
        >>> type(decoded_doc)
        <class 'dict'>
        >>> options = CodecOptions(document_class=collections.OrderedDict)
        >>> decoded_doc = bson.decode(data, codec_options=options)
        >>> type(decoded_doc)
        <class 'collections.OrderedDict'>

    :Parameters:
      - `data`: the BSON to decode. Any bytes-like object that implements
        the buffer protocol.
      - `codec_options` (optional): An instance of
        :class:`~bson.codec_options.CodecOptions`.

    .. versionadded:: 3.9
    """
    if not isinstance(codec_options, CodecOptions):
        raise _CODEC_OPTIONS_TYPE_ERROR

    return _bson_to_dict(data, codec_options)


def decode_all(data, codec_options=DEFAULT_CODEC_OPTIONS):
    """Decode BSON data to multiple documents.

    `data` must be a bytes-like object implementing the buffer protocol that
    provides concatenated, valid, BSON-encoded documents.

    :Parameters:
      - `data`: BSON data
      - `codec_options` (optional): An instance of
        :class:`~bson.codec_options.CodecOptions`.

    .. versionchanged:: 3.9
       Supports bytes-like objects that implement the buffer protocol.

    .. versionchanged:: 3.0
       Removed `compile_re` option: PyMongo now always represents BSON regular
       expressions as :class:`~bson.regex.Regex` objects. Use
       :meth:`~bson.regex.Regex.try_compile` to attempt to convert from a
       BSON regular expression to a Python regular expression object.

       Replaced `as_class`, `tz_aware`, and `uuid_subtype` options with
       `codec_options`.
    """
    data, view = get_data_and_view(data)
    if not isinstance(codec_options, CodecOptions):
        raise _CODEC_OPTIONS_TYPE_ERROR

    data_len = len(data)
    docs = []
    position = 0
    end = data_len - 1
    use_raw = _raw_document_class(codec_options.document_class)
    try:
        while position < end:
            obj_size = _UNPACK_INT_FROM(data, position)[0]
            if data_len - position < obj_size:
                raise InvalidBSON("invalid object size")
            obj_end = position + obj_size - 1
            if data[obj_end] != 0:
                raise InvalidBSON("bad eoo")
            if use_raw:
                docs.append(
                    codec_options.document_class(data[position : obj_end + 1], codec_options)
                )
            else:
                docs.append(_elements_to_dict(data, view, position + 4, obj_end, codec_options))
            position += obj_size
        return docs
    except InvalidBSON:
        raise
    except Exception:
        # Change exception type to InvalidBSON but preserve traceback.
        _, exc_value, exc_tb = sys.exc_info()
        raise InvalidBSON(str(exc_value)).with_traceback(exc_tb)


if _USE_C:
    decode_all = _cbson.decode_all


def _decode_selective(rawdoc, fields, codec_options):
    if _raw_document_class(codec_options.document_class):
        # If document_class is RawBSONDocument, use vanilla dictionary for
        # decoding command response.
        doc = {}
    else:
        # Else, use the specified document_class.
        doc = codec_options.document_class()
    for key, value in rawdoc.items():
        if key in fields:
            if fields[key] == 1:
                doc[key] = _bson_to_dict(rawdoc.raw, codec_options)[key]
            else:
                doc[key] = _decode_selective(value, fields[key], codec_options)
        else:
            doc[key] = value
    return doc


def _convert_raw_document_lists_to_streams(document):
    cursor = document.get("cursor")
    if cursor:
        for key in ("firstBatch", "nextBatch"):
            batch = cursor.get(key)
            if batch:
                stream = b"".join(doc.raw for doc in batch)
                cursor[key] = [stream]


def _decode_all_selective(data, codec_options, fields):
    """Decode BSON data to a single document while using user-provided
    custom decoding logic.

    `data` must be a bytes object representing a valid, BSON-encoded document.

    :Parameters:
      - `data`: BSON data
      - `codec_options`: An instance of
        :class:`~bson.codec_options.CodecOptions` with user-specified type
        decoders. If no decoders are found, this method is the same as
        ``decode_all``.
      - `fields`: Map of document namespaces where data that needs
        to be custom decoded lives or None. For example, to custom decode a
        list of objects in 'field1.subfield1', the specified value should be
        ``{'field1': {'subfield1': 1}}``. If ``fields`` is an empty map or
        None, this method is the same as ``decode_all``.

    :Returns:
      - `document_list`: Single-member list containing the decoded document.

    .. versionadded:: 3.8
    """
    if not codec_options.type_registry._decoder_map:
        return decode_all(data, codec_options)

    if not fields:
        return decode_all(data, codec_options.with_options(type_registry=None))

    # Decode documents for internal use.
    from .raw_bson import RawBSONDocument

    internal_codec_options = codec_options.with_options(
        document_class=RawBSONDocument, type_registry=None
    )
    _doc = _bson_to_dict(data, internal_codec_options)
    return [
        _decode_selective(
            _doc,
            fields,
            codec_options,
        )
    ]


def decode_iter(data, codec_options=DEFAULT_CODEC_OPTIONS):
    """Decode BSON data to multiple documents as a generator.

    Works similarly to the decode_all function, but yields one document at a
    time.

    `data` must be a bytes object containing concatenated, valid,
    BSON-encoded documents.

    :Parameters:
      - `data`: BSON data
      - `codec_options` (optional): An instance of
        :class:`~bson.codec_options.CodecOptions`.

    .. versionchanged:: 3.0
       Replaced `as_class`, `tz_aware`, and `uuid_subtype` options with
       `codec_options`.

    .. versionadded:: 2.8
    """
    if not isinstance(codec_options, CodecOptions):
        raise _CODEC_OPTIONS_TYPE_ERROR

    position = 0
    end = len(data) - 1
    while position < end:
        obj_size = _UNPACK_INT_FROM(data, position)[0]
        elements = data[position : position + obj_size]
        position += obj_size

        yield _bson_to_dict(elements, codec_options)


def decode_file_iter(file_obj, codec_options=DEFAULT_CODEC_OPTIONS):
    """Decode BSON data from a file to multiple documents as a generator.

    Works similarly to the decode_all function, but reads from the file object
    in chunks and parses BSON in chunks, yielding one document at a time.

    :Parameters:
      - `file_obj`: A file object containing BSON data.
      - `codec_options` (optional): An instance of
        :class:`~bson.codec_options.CodecOptions`.

    .. versionchanged:: 3.0
       Replaced `as_class`, `tz_aware`, and `uuid_subtype` options with
       `codec_options`.

    .. versionadded:: 2.8
    """
    while True:
        # Read size of next object.
        size_data = file_obj.read(4)
        if not size_data:
            break  # Finished with file normally.
        elif len(size_data) != 4:
            raise InvalidBSON("cut off in middle of objsize")
        obj_size = _UNPACK_INT_FROM(size_data, 0)[0] - 4
        elements = size_data + file_obj.read(max(0, obj_size))
        yield _bson_to_dict(elements, codec_options)


def is_valid(bson):
    """Check that the given bytes represent valid :class:`BSON` data.

    Raises :class:`TypeError` if `bson` is not an instance of
    :class:`bytes`. Returns ``True`` if `bson` is valid :class:`BSON`,
    ``False`` otherwise.

    :Parameters:
      - `bson`: the data to be validated
    """
    if not isinstance(bson, bytes):
        raise TypeError("BSON data must be an instance of a subclass of bytes")

    try:
        _bson_to_dict(bson, DEFAULT_CODEC_OPTIONS)
        return True
    except Exception:
        return False


class BSON(bytes):
    """BSON (Binary JSON) data.

    .. warning:: Using this class to encode and decode BSON adds a performance
       cost. For better performance use the module level functions
       :func:`encode` and :func:`decode` instead.
    """

    @classmethod
    def encode(cls, document, check_keys=False, codec_options=DEFAULT_CODEC_OPTIONS):
        """Encode a document to a new :class:`BSON` instance.

        A document can be any mapping type (like :class:`dict`).

        Raises :class:`TypeError` if `document` is not a mapping type.
        Raises :class:`~bson.errors.InvalidDocument` if `document` contains
        keys that are not instances of :class:`str`, or otherwise cannot be
        converted to :class:`BSON`.

        :Parameters:
          - `document`: mapping type representing a document
          - `check_keys` (optional): check if keys start with '$' or
            contain '.', raising :class:`~bson.errors.InvalidDocument` in
            either case
          - `codec_options` (optional): An instance of
            :class:`~bson.codec_options.CodecOptions`.

        .. versionchanged:: 3.0
           Replaced `uuid_subtype` option with `codec_options`.
        """
        return cls(encode(document, check_keys, codec_options))

    def decode(self, codec_options=DEFAULT_CODEC_OPTIONS):
        """Decode this BSON data.

        By default, returns a BSON document represented as a Python
        :class:`dict`. To use a different :class:`MutableMapping` class,
        configure a :class:`~bson.codec_options.CodecOptions`::

            >>> import collections  # From Python standard library.
            >>> import bson
            >>> from bson.codec_options import CodecOptions
            >>> data = bson.BSON.encode({'a': 1})
            >>> decoded_doc = bson.BSON(data).decode()
            >>> type(decoded_doc)
            <class 'dict'>
            >>> options = CodecOptions(document_class=collections.OrderedDict)
            >>> decoded_doc = bson.BSON(data).decode(codec_options=options)
            >>> type(decoded_doc)
            <class 'collections.OrderedDict'>

        :Parameters:
          - `codec_options` (optional): An instance of
            :class:`~bson.codec_options.CodecOptions`.

        .. versionchanged:: 3.0
           Removed `compile_re` option: PyMongo now always represents BSON
           regular expressions as :class:`~bson.regex.Regex` objects. Use
           :meth:`~bson.regex.Regex.try_compile` to attempt to convert from a
           BSON regular expression to a Python regular expression object.

           Replaced `as_class`, `tz_aware`, and `uuid_subtype` options with
           `codec_options`.
        """
        return decode(self, codec_options)


def has_c():
    """Is the C extension installed?"""
    return _USE_C

EPOCH_AWARE = datetime.datetime(1970, 1, 1, 0, 0, tzinfo=<xtquant.xtbson.bson36.tz_util.FixedOffset object>)
EPOCH_NAIVE = datetime.datetime(1970, 1, 1, 0, 0)
BSONNUM = b'\x01'
BSONSTR = b'\x02'
BSONOBJ = b'\x03'
BSONARR = b'\x04'
BSONBIN = b'\x05'
BSONUND = b'\x06'
BSONOID = b'\x07'
BSONBOO = b'\x08'
BSONDAT = b'\t'
BSONNUL = b'\n'
BSONRGX = b'\x0b'
BSONREF = b'\x0c'
BSONCOD = b'\r'
BSONSYM = b'\x0e'
BSONCWS = b'\x0f'
BSONINT = b'\x10'
BSONTIM = b'\x11'
BSONLON = b'\x12'
BSONDEC = b'\x13'
BSONMIN = b'\xff'
BSONMAX = b'\x7f'
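
Each constant above is the one-byte type marker that precedes an element's key in the encoded document. As a minimal sketch, assuming only the standard library, the block below re-creates the millisecond conversion from the listing and uses the BSONDAT marker to build a single date element. The names `datetime_to_millis`, `millis_to_datetime`, and `encode_date_element` are hypothetical and are not part of this module.

```python
import calendar
import datetime
import struct

BSONDAT = b"\x09"  # same value as the BSONDAT constant listed above
EPOCH_NAIVE = datetime.datetime(1970, 1, 1)


def datetime_to_millis(dtm):
    """Datetime -> integer milliseconds since the UTC epoch."""
    if dtm.utcoffset() is not None:
        dtm = dtm - dtm.utcoffset()
    # Integer division drops anything below one millisecond.
    return int(calendar.timegm(dtm.timetuple()) * 1000 + dtm.microsecond // 1000)


def millis_to_datetime(millis):
    """Integer milliseconds -> naive datetime (the tz_aware=False branch)."""
    # The double modulo keeps diff in 0..999 even for negative inputs.
    diff = ((millis % 1000) + 1000) % 1000
    seconds = (millis - diff) // 1000
    return EPOCH_NAIVE + datetime.timedelta(seconds=seconds, microseconds=diff * 1000)


def encode_date_element(key, dtm):
    """Hypothetical helper: marker + NUL-terminated key + little-endian int64."""
    payload = struct.pack("<q", datetime_to_millis(dtm))
    return BSONDAT + key.encode("utf-8") + b"\x00" + payload


# A pre-epoch timestamp exercises the negative-millisecond path.
when = datetime.datetime(1969, 12, 31, 23, 59, 59, 999500)
millis = datetime_to_millis(when)
element = encode_date_element("t", when)
print(millis, element, millis_to_datetime(millis))
```

The round trip returns a datetime with the sub-millisecond part removed, which is why values stored as BSON dates may not compare equal to the original Python object.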
def get_data_and_view(data):
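def get_data_and_view(data):
    # bytes and bytearray are returned as-is, together with a view over them.
    if isinstance(data, (bytes, bytearray)):
        return data, memoryview(data)
    # Any other buffer-protocol object is copied into a bytes object once.
    view = memoryview(data)
    return view.tobytes(), view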
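
The function has no docstring, so a short usage sketch may help. The block below repeats the function body so that it runs on its own, then calls it with a `bytes` object and with an `array.array` (chosen here only as an example of another buffer-protocol type).

```python
import array


def get_data_and_view(data):
    if isinstance(data, (bytes, bytearray)):
        return data, memoryview(data)
    view = memoryview(data)
    return view.tobytes(), view


raw = b"\x05\x00\x00\x00\x00"  # the empty BSON document

# bytes input: the very same object comes back, no copy is made.
same, view = get_data_and_view(raw)

# Other buffer types: the content is copied into a new bytes object.
copied, arr_view = get_data_and_view(array.array("B", raw))

print(same is raw, type(copied).__name__, copied == raw)
```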
def gen_list_name():
def gen_list_name():
    """Generate "keys" for encoded lists in the sequence
    b"0\x00", b"1\x00", b"2\x00", ...

    The first 1000 keys are returned from a pre-built cache. All
    subsequent keys are generated on the fly.
    """
    for name in _LIST_NAMES:
        yield name

    counter = itertools.count(1000)
    while True:
        yield (str(next(counter)) + "\x00").encode("utf8")

Generate "keys" for encoded lists in the sequence b"0\x00", b"1\x00", b"2\x00", ...

The first 1000 keys are returned from a pre-built cache. All subsequent keys are generated on the fly.
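
The cache-then-generate pattern can be tried in isolation. This is a sketch under one stated assumption: a three-entry tuple (`_CACHED_NAMES`, a hypothetical name) stands in for the module's 1000-entry `_LIST_NAMES` so the switch from cached to generated keys is easy to see.

```python
import itertools

# Assumption: a tiny cache replaces the real 1000-entry _LIST_NAMES.
_CACHE_SIZE = 3
_CACHED_NAMES = tuple((str(i) + "\x00").encode("utf8") for i in range(_CACHE_SIZE))


def gen_list_name_sketch():
    # Serve the pre-built keys first.
    for name in _CACHED_NAMES:
        yield name
    # Then build each further key on demand, without an upper bound.
    counter = itertools.count(_CACHE_SIZE)
    while True:
        yield (str(next(counter)) + "\x00").encode("utf8")


first_five = list(itertools.islice(gen_list_name_sketch(), 5))
print(first_five)
```

Because the generator never terminates, callers need to bound it themselves, for example with `itertools.islice` or by zipping it against the list being encoded.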

def encode( document, check_keys=False, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))):

Encode a document to BSON.

A document can be any mapping type (like dict).

Raises TypeError if document is not a mapping type. Raises ~bson.errors.InvalidDocument if document contains keys that are not instances of str, or otherwise cannot be converted to BSON.

:Parameters:

  • document: mapping type representing a document
  • check_keys (optional): check if keys start with '$' or contain '.', raising ~bson.errors.InvalidDocument in either case
  • codec_options (optional): An instance of ~bson.codec_options.CodecOptions.

New in version 3.9.
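
To make the byte layout concrete, here is a minimal sketch of the framing performed by `_dict_to_bson`: a little-endian int32 length, the concatenated elements, and a terminating NUL. It assumes int32 values only and uses a stand-in exception class. `encode_int32_document` and `InvalidDocumentSketch` are hypothetical names, not part of this module, and the `_id`-first ordering of the real encoder is left out.

```python
import struct


class InvalidDocumentSketch(Exception):
    """Stand-in for bson.errors.InvalidDocument."""


def encode_int32_document(doc, check_keys=False):
    """Encode a mapping of str -> small int using the BSON int32 element."""
    elements = []
    for key, value in doc.items():
        if not isinstance(key, str):
            raise InvalidDocumentSketch("documents must have only string keys")
        if check_keys and (key.startswith("$") or "." in key):
            raise InvalidDocumentSketch("key %r is not allowed" % (key,))
        # Element = type marker 0x10 + NUL-terminated key + int32 payload.
        elements.append(b"\x10" + key.encode("utf-8") + b"\x00" + struct.pack("<i", value))
    body = b"".join(elements)
    # +5 covers the 4-byte length prefix and the trailing NUL byte.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


encoded = encode_int32_document({"a": 1})
print(encoded)
```

The length prefix counts itself, which is why the smallest possible document (no elements) is five bytes long.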

def decode( data, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))):

Decode BSON to a document.

By default, returns a BSON document represented as a Python dict. To use a different MutableMapping class, configure a ~bson.codec_options.CodecOptions::

>>> import collections  # From Python standard library.
>>> import bson
>>> from bson.codec_options import CodecOptions
>>> data = bson.encode({'a': 1})
>>> decoded_doc = bson.decode(data)
>>> type(decoded_doc)
<class 'dict'>
>>> options = CodecOptions(document_class=collections.OrderedDict)
>>> decoded_doc = bson.decode(data, codec_options=options)
>>> type(decoded_doc)
<class 'collections.OrderedDict'>

:Parameters:

  • data: the BSON to decode. Any bytes-like object that implements the buffer protocol.
  • codec_options (optional): An instance of ~bson.codec_options.CodecOptions.

New in version 3.9.
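
The effect of choosing a different document class can be illustrated without the library. The following is a sketch only: it handles int32 elements and nothing else, and `decode_int32_document` with its `document_class` parameter is a hypothetical stand-in for the `document_class` field of `CodecOptions`.

```python
import collections
import struct


def decode_int32_document(data, document_class=dict):
    """Decode one BSON document whose elements are all int32 values."""
    (size,) = struct.unpack_from("<i", data, 0)
    if size != len(data) or data[-1] != 0:
        raise ValueError("not a single well-formed BSON document")
    doc = document_class()
    position = 4          # skip the length prefix
    end = size - 1        # stop before the trailing NUL
    while position < end:
        if data[position] != 0x10:
            raise ValueError("only int32 elements are handled in this sketch")
        # The key is a NUL-terminated string right after the type marker.
        name_end = data.index(b"\x00", position + 1)
        key = data[position + 1:name_end].decode("utf-8")
        doc[key] = struct.unpack_from("<i", data, name_end + 1)[0]
        position = name_end + 5  # NUL byte plus 4 payload bytes
    return doc


data = b"\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00"  # {'a': 1}
plain = decode_int32_document(data)
ordered = decode_int32_document(data, document_class=collections.OrderedDict)
print(plain, type(ordered).__name__)
```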

def decode_all( data, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))):

Decode BSON data to multiple documents.

data must be a bytes-like object implementing the buffer protocol that provides concatenated, valid, BSON-encoded documents.

:Parameters:

  • data: BSON data
  • codec_options (optional): An instance of ~bson.codec_options.CodecOptions.

Changed in version 3.9: Supports bytes-like objects that implement the buffer protocol.

Changed in version 3.0: Removed compile_re option: PyMongo now always represents BSON regular expressions as ~bson.regex.Regex objects. Use ~bson.regex.Regex.try_compile() to attempt to convert from a BSON regular expression to a Python regular expression object.

Replaced as_class, tz_aware, and uuid_subtype options with codec_options.
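
The walk over concatenated documents can be sketched with the standard library alone. The block below mirrors the two checks visible in the source ("invalid object size" and "bad eoo") but returns the raw byte slices instead of decoded documents. `split_documents` is a hypothetical name, `ValueError` stands in for `InvalidBSON`, and the minimum-size guard is an addition that the listed source does not have.

```python
import struct


def split_documents(data):
    """Split concatenated BSON documents into one bytes object per document."""
    docs = []
    position = 0
    end = len(data) - 1
    while position < end:
        (obj_size,) = struct.unpack_from("<i", data, position)
        # Added guard (not in the listed source): a document is at least 5 bytes.
        if obj_size < 5 or len(data) - position < obj_size:
            raise ValueError("invalid object size")
        obj_end = position + obj_size - 1
        if data[obj_end] != 0:
            raise ValueError("bad eoo")
        docs.append(data[position:obj_end + 1])
        position += obj_size
    return docs


doc_a = b"\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00"  # {'a': 1}
empty = b"\x05\x00\x00\x00\x00"                            # {}
parts = split_documents(doc_a + empty)
print(parts)
```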

def decode_iter( data, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))):

Decode BSON data to multiple documents as a generator.

Works similarly to the decode_all function, but yields one document at a time.

data must be a bytes object containing concatenated, valid, BSON-encoded documents.

:Parameters:

  • data: BSON data
  • codec_options (optional): An instance of ~bson.codec_options.CodecOptions.

Changed in version 3.0: Replaced as_class, tz_aware, and uuid_subtype options with codec_options.

New in version 2.8.
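
What the generator form buys is laziness: nothing past the current document is examined until the caller asks for the next item. The sketch below shows this with raw slices rather than decoded documents. `iter_documents` is a hypothetical name and the trailing bytes are deliberately invalid.

```python
import struct


def iter_documents(data):
    """Yield each length-prefixed document slice, one at a time."""
    position = 0
    end = len(data) - 1
    while position < end:
        (obj_size,) = struct.unpack_from("<i", data, position)
        yield data[position:position + obj_size]
        position += obj_size


empty = b"\x05\x00\x00\x00\x00"
stream = empty * 3 + b"garbage!"

# Only the first document is read; the invalid tail is never touched.
first = next(iter_documents(stream))
print(first)
```

A consumer that stops early therefore never pays for, or trips over, the rest of the buffer, whereas a list-building function processes all of it up front.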

def decode_file_iter( file_obj, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))):

Decode BSON data from a file to multiple documents as a generator.

Works similarly to the decode_all function, but reads from the file object in chunks and parses BSON in chunks, yielding one document at a time.

:Parameters:

  • file_obj: A file object containing BSON data.
  • codec_options (optional): An instance of ~bson.codec_options.CodecOptions.

Changed in version 3.0: Replaced as_class, tz_aware, and uuid_subtype options with codec_options.

New in version 2.8.
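
The read loop can be exercised against an in-memory file. This sketch assumes a binary file object and yields the raw bytes of each document instead of decoding them. `iter_file_documents` is a hypothetical name and `ValueError` stands in for `InvalidBSON`.

```python
import io
import struct


def iter_file_documents(file_obj):
    """Yield the raw bytes of each BSON document stored in a binary file."""
    while True:
        size_data = file_obj.read(4)
        if not size_data:
            break  # clean end of file
        if len(size_data) != 4:
            raise ValueError("cut off in middle of objsize")
        (obj_size,) = struct.unpack("<i", size_data)
        # The length prefix counts itself, so 4 bytes are already consumed.
        yield size_data + file_obj.read(max(0, obj_size - 4))


empty = b"\x05\x00\x00\x00\x00"
docs = list(iter_file_documents(io.BytesIO(empty * 2)))
print(docs)
```

Only one document is held in memory at a time, which suits large dump files. The file should be opened in binary mode (`"rb"`) so that `read` returns `bytes`.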

def is_valid(bson):

Check that the given bytes represent valid BSON data.

Raises TypeError if bson is not an instance of bytes. Returns True if bson is valid BSON, False otherwise.

:Parameters:

  • bson: the data to be validated
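
The function has two distinct outcomes for bad input: a wrong type raises, while wrong content returns False. The sketch below reproduces that control flow with a stand-in decoder that checks only the framing. `check_framing` and `is_valid_sketch` are hypothetical names, and the real function runs a full decode rather than this shallow check.

```python
import struct


def check_framing(data):
    """Stand-in decoder: verifies only the length prefix and trailing NUL."""
    (size,) = struct.unpack_from("<i", data, 0)
    if size != len(data) or data[-1] != 0:
        raise ValueError("bad framing")


def is_valid_sketch(data, decoder=check_framing):
    # Type errors are reported loudly, before any decoding is attempted.
    if not isinstance(data, bytes):
        raise TypeError("BSON data must be an instance of a subclass of bytes")
    try:
        decoder(data)
        return True
    except Exception:
        # Any decoding failure is folded into a plain False.
        return False


results = [
    is_valid_sketch(b"\x05\x00\x00\x00\x00"),  # empty document
    is_valid_sketch(b"\x05\x00\x00\x00\x01"),  # missing terminator
    is_valid_sketch(b""),                      # too short for a length prefix
]
print(results)
```

Note that a `bytearray` fails the type check even though `decode` accepts any bytes-like object, so callers holding other buffer types need to convert with `bytes(...)` first.
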
class BSON(builtins.bytes):

BSON (Binary JSON) data.

Using this class to encode and decode BSON adds a performance cost. For better performance, use the module-level functions encode() and decode() instead.

@classmethod
def encode(cls, document, check_keys=False, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))):
    @classmethod
    def encode(cls, document, check_keys=False, codec_options=DEFAULT_CODEC_OPTIONS):
        """Encode a document to a new :class:`BSON` instance.

        A document can be any mapping type (like :class:`dict`).

        Raises :class:`TypeError` if `document` is not a mapping type,
        or contains keys that are not instances of
        :class:`basestring` (:class:`str` in python 3). Raises
        :class:`~bson.errors.InvalidDocument` if `document` cannot be
        converted to :class:`BSON`.

        :Parameters:
          - `document`: mapping type representing a document
          - `check_keys` (optional): check if keys start with '$' or
            contain '.', raising :class:`~bson.errors.InvalidDocument` in
            either case
          - `codec_options` (optional): An instance of
            :class:`~bson.codec_options.CodecOptions`.

        .. versionchanged:: 3.0
           Replaced `uuid_subtype` option with `codec_options`.
        """
        return cls(encode(document, check_keys, codec_options))

Encode a document to a new BSON instance.

A document can be any mapping type (like dict).

Raises TypeError if document is not a mapping type, or if it contains keys that are not instances of basestring (str in Python 3). Raises ~bson.errors.InvalidDocument if document cannot be converted to BSON.

:Parameters:

  • document: mapping type representing a document
  • check_keys (optional): check if keys start with '$' or contain '.', raising ~bson.errors.InvalidDocument in either case
  • codec_options (optional): An instance of ~bson.codec_options.CodecOptions.

Changed in version 3.0: Replaced uuid_subtype option with codec_options.
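
The check_keys rule described above can be pictured with a small standalone sketch. The helper name `find_invalid_keys` is hypothetical and is not part of this module; it only mirrors the documented rule (keys that start with '$' or contain '.'). As an assumption of this sketch, it also descends into nested mappings and lists, and it returns the offending key paths instead of raising ~bson.errors.InvalidDocument as the real encoder does.

```python
from collections.abc import Mapping


def find_invalid_keys(document, path=""):
    """Return the paths of keys that start with '$' or contain '.'.

    Hypothetical helper for illustration only; it is not part of bson.
    """
    bad = []
    for key, value in document.items():
        here = f"{path}/{key}"
        if key.startswith("$") or "." in key:
            bad.append(here)
        # Assumption of this sketch: nested documents and lists are checked too.
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, Mapping):
                bad.extend(find_invalid_keys(child, here))
    return bad


print(find_invalid_keys({"a": 1, "$set": {"x.y": 2}, "items": [{"$k": 0}]}))
```

A document for which this helper returns an empty list would pass the documented check; any non-empty result corresponds to the case where encode() with check_keys=True is documented to raise.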

def decode(self, codec_options=CodecOptions(document_class=dict, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None))):
    def decode(self, codec_options=DEFAULT_CODEC_OPTIONS):
        """Decode this BSON data.

        By default, returns a BSON document represented as a Python
        :class:`dict`. To use a different :class:`MutableMapping` class,
        configure a :class:`~bson.codec_options.CodecOptions`::

            >>> import collections  # From Python standard library.
            >>> import bson
            >>> from bson.codec_options import CodecOptions
            >>> data = bson.BSON.encode({'a': 1})
            >>> decoded_doc = bson.BSON(data).decode()
            >>> type(decoded_doc)
            <class 'dict'>
            >>> options = CodecOptions(document_class=collections.OrderedDict)
            >>> decoded_doc = bson.BSON(data).decode(codec_options=options)
            >>> type(decoded_doc)
            <class 'collections.OrderedDict'>

        :Parameters:
          - `codec_options` (optional): An instance of
            :class:`~bson.codec_options.CodecOptions`.

        .. versionchanged:: 3.0
           Removed `compile_re` option: PyMongo now always represents BSON
           regular expressions as :class:`~bson.regex.Regex` objects. Use
           :meth:`~bson.regex.Regex.try_compile` to attempt to convert from a
           BSON regular expression to a Python regular expression object.

           Replaced `as_class`, `tz_aware`, and `uuid_subtype` options with
           `codec_options`.
        """
        return decode(self, codec_options)

Decode this BSON data.

By default, returns a BSON document represented as a Python dict. To use a different MutableMapping class, configure a ~bson.codec_options.CodecOptions:

>>> import collections  # From Python standard library.
>>> import bson
>>> from bson.codec_options import CodecOptions
>>> data = bson.BSON.encode({'a': 1})
>>> decoded_doc = bson.BSON(data).decode()
>>> type(decoded_doc)
<class 'dict'>
>>> options = CodecOptions(document_class=collections.OrderedDict)
>>> decoded_doc = bson.BSON(data).decode(codec_options=options)
>>> type(decoded_doc)
<class 'collections.OrderedDict'>

:Parameters:

  • codec_options (optional): An instance of ~bson.codec_options.CodecOptions.

Changed in version 3.0: Removed compile_re option: PyMongo now always represents BSON regular expressions as ~bson.regex.Regex objects. Use ~bson.regex.Regex.try_compile() to attempt to convert from a BSON regular expression to a Python regular expression object.

Replaced as_class, tz_aware, and uuid_subtype options with codec_options.
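
To make the role of the document class more concrete, here is a minimal standalone sketch that decodes the BSON bytes for {'a': 1} by hand with the standard struct module. The function `decode_int32_document` and its `document_class` parameter are hypothetical stand-ins that only imitate the idea behind CodecOptions(document_class=...); this is not how the module's decoder is implemented, and it handles int32 values only. The byte layout follows the public BSON specification.

```python
import struct
from collections import OrderedDict

# BSON bytes for {'a': 1}: int32 total length, element type 0x10 (int32),
# key as a NUL-terminated string, little-endian int32 value, trailing NUL.
DATA = b"\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00"


def decode_int32_document(data, document_class=dict):
    """Decode a BSON document whose values are all int32 (illustrative only)."""
    (length,) = struct.unpack_from("<i", data, 0)
    if length != len(data) or data[-1] != 0:
        raise ValueError("malformed BSON document")
    doc = document_class()
    pos = 4
    while pos < length - 1:
        if data[pos] != 0x10:
            raise ValueError("this sketch only handles int32 elements")
        end = data.index(b"\x00", pos + 1)  # end of the key cstring
        key = data[pos + 1:end].decode("utf-8")
        (value,) = struct.unpack_from("<i", data, end + 1)
        doc[key] = value
        pos = end + 5  # skip the NUL and the 4 value bytes
    return doc


plain = decode_int32_document(DATA)
ordered = decode_int32_document(DATA, document_class=OrderedDict)
print(type(plain).__name__, type(ordered).__name__, plain)
```

The only thing that changes between the two calls is the mapping type that is instantiated and filled, which is the same effect the doctest above demonstrates through codec_options.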

Inherited Members
builtins.bytes
capitalize
center
count
endswith
expandtabs
find
fromhex
hex
index
isalnum
isalpha
isascii
isdigit
islower
isspace
istitle
isupper
join
ljust
lower
lstrip
maketrans
partition
replace
removeprefix
removesuffix
rfind
rindex
rjust
rpartition
rsplit
rstrip
split
splitlines
startswith
strip
swapcase
title
translate
upper
zfill
def has_c():
def has_c():
    """Is the C extension installed?"""
    return _USE_C

Is the C extension installed?