xtquant.xtbson.bson36
BSON (Binary JSON) encoding and decoding.
The mapping from Python types to BSON types is as follows:
=======================================  =============  ===================
Python Type                              BSON Type      Supported Direction
=======================================  =============  ===================
None                                     null           both
bool                                     boolean        both
int [1]                                  int32 / int64  py -> bson
bson.int64.Int64                         int64          both
float                                    number (real)  both
str                                      string         both
list                                     array          both
dict / SON                               object         both
datetime.datetime [2] [3]                date           both
bson.regex.Regex                         regex          both
compiled re [4]                          regex          py -> bson
bson.binary.Binary                       binary         both
bson.objectid.ObjectId                   oid            both
bson.dbref.DBRef                         dbref          both
None                                     undefined      bson -> py
bson.code.Code                           code           both
str                                      symbol         bson -> py
bytes [5]                                binary         both
=======================================  =============  ===================
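To make the int row of the table concrete, here is a minimal sketch of how a Python int can be routed to a BSON int32 or int64 by value range. It mirrors the range check in this module's `_encode_int`, but the helper name `encode_int_element` is hypothetical and the code relies only on the standard library `struct` module. Treat it as an illustration, not as the module's public API.

```python
import struct


def encode_int_element(name: str, value: int) -> bytes:
    """Encode one BSON element (type byte + key + payload) for a Python int.

    Hypothetical helper for illustration only.
    """
    key = name.encode("utf-8") + b"\x00"  # element names are NUL-terminated
    if -2147483648 <= value <= 2147483647:
        # Fits in 32 bits: type byte 0x10, little-endian int32 payload.
        return b"\x10" + key + struct.pack("<i", value)
    try:
        # Otherwise: type byte 0x12, little-endian int64 payload.
        return b"\x12" + key + struct.pack("<q", value)
    except struct.error:
        raise OverflowError("BSON can only handle up to 8-byte ints")


small = encode_int_element("a", 1)      # encoded as int32
large = encode_int_element("a", 2**31)  # too big for int32, encoded as int64
print(small.hex())
print(large.hex())
```

The same Python type therefore produces two different BSON types, which is why the table lists int as "py -> bson" only: on the way back, an int64 is decoded as `bson.int64.Int64` rather than a plain int.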
[1] A Python int will be saved as a BSON int32 or BSON int64 depending on its size. A BSON int32 will always decode to a Python int. A BSON int64 will always decode to a bson.int64.Int64.
[2] datetime.datetime instances are truncated to millisecond precision when saved.
[3] All datetime.datetime instances are treated as naive. Clients should always use UTC.
[4] bson.regex.Regex instances and regular expression objects from re.compile() are both saved as BSON regular expressions. BSON regular expressions are decoded as bson.regex.Regex instances.
[5] The bytes type is encoded as BSON binary with subtype 0. It will be decoded back to bytes.
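The datetime notes above (millisecond precision, naive values treated as UTC) can be sketched with the standard library alone. The two functions below follow the arithmetic of this module's `_datetime_to_millis` and the non-tz-aware path of `_millis_to_datetime`; the public-looking names `datetime_to_millis` and `millis_to_datetime` are assumptions made for this example.

```python
import calendar
import datetime


def datetime_to_millis(dtm: datetime.datetime) -> int:
    """Milliseconds since the Unix epoch (UTC); sub-millisecond digits are dropped."""
    if dtm.utcoffset() is not None:
        # Aware datetimes are shifted to UTC before conversion.
        dtm = dtm - dtm.utcoffset()
    # timegm() interprets the time tuple as UTC; floor division discards
    # the microseconds that do not fill a whole millisecond.
    return calendar.timegm(dtm.timetuple()) * 1000 + dtm.microsecond // 1000


def millis_to_datetime(millis: int) -> datetime.datetime:
    """Naive UTC datetime for a millisecond count."""
    epoch = datetime.datetime(1970, 1, 1)
    return epoch + datetime.timedelta(milliseconds=millis)


original = datetime.datetime(2020, 1, 2, 3, 4, 5, 678999)
millis = datetime_to_millis(original)
restored = millis_to_datetime(millis)
print(original)
print(restored)  # same instant, but the trailing 999 microseconds are gone
```

A round trip is therefore lossy below one millisecond, so code that compares a datetime before and after storage should truncate the original first.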
# Copyright 2009-present MongoDB, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""BSON (Binary JSON) encoding and decoding.

The mapping from Python types to BSON types is as follows:

=======================================  =============  ===================
Python Type                              BSON Type      Supported Direction
=======================================  =============  ===================
None                                     null           both
bool                                     boolean        both
int [#int]_                              int32 / int64  py -> bson
`bson.int64.Int64`                       int64          both
float                                    number (real)  both
str                                      string         both
list                                     array          both
dict / `SON`                             object         both
datetime.datetime [#dt]_ [#dt2]_         date           both
`bson.regex.Regex`                       regex          both
compiled re [#re]_                       regex          py -> bson
`bson.binary.Binary`                     binary         both
`bson.objectid.ObjectId`                 oid            both
`bson.dbref.DBRef`                       dbref          both
None                                     undefined      bson -> py
`bson.code.Code`                         code           both
str                                      symbol         bson -> py
bytes [#bytes]_                          binary         both
=======================================  =============  ===================

.. [#int] A Python int will be saved as a BSON int32 or BSON int64 depending
   on its size. A BSON int32 will always decode to a Python int. A BSON
   int64 will always decode to a :class:`~bson.int64.Int64`.
.. [#dt] datetime.datetime instances are truncated to millisecond
   precision when saved.
.. [#dt2] All datetime.datetime instances are treated as *naive*. Clients
   should always use UTC.
.. [#re] :class:`~bson.regex.Regex` instances and regular expression
   objects from ``re.compile()`` are both saved as BSON regular expressions.
   BSON regular expressions are decoded as :class:`~bson.regex.Regex`
   instances.
.. [#bytes] The bytes type is encoded as BSON binary with
   subtype 0. It will be decoded back to bytes.
"""

import calendar
import datetime
import itertools
import platform
import re
import struct
import sys
import uuid
from codecs import utf_8_decode as _utf_8_decode
from codecs import utf_8_encode as _utf_8_encode
from collections import abc as _abc

from .binary import (
    ALL_UUID_SUBTYPES,
    CSHARP_LEGACY,
    JAVA_LEGACY,
    OLD_UUID_SUBTYPE,
    STANDARD,
    UUID_SUBTYPE,
    Binary,
    UuidRepresentation,
)
from .code import Code
from .codec_options import DEFAULT_CODEC_OPTIONS, CodecOptions, _raw_document_class
from .dbref import DBRef
from .decimal128 import Decimal128
from .errors import InvalidBSON, InvalidDocument, InvalidStringData
from .int64 import Int64
from .max_key import MaxKey
from .min_key import MinKey
from .objectid import ObjectId
from .regex import Regex
from .son import RE_TYPE, SON
from .timestamp import Timestamp
from .tz_util import utc

try:
    from . import _cbson

    _USE_C = True
except ImportError:
    _USE_C = False


EPOCH_AWARE = datetime.datetime.fromtimestamp(0, utc)
EPOCH_NAIVE = datetime.datetime.utcfromtimestamp(0)


BSONNUM = b"\x01"  # Floating point
BSONSTR = b"\x02"  # UTF-8 string
BSONOBJ = b"\x03"  # Embedded document
BSONARR = b"\x04"  # Array
BSONBIN = b"\x05"  # Binary
BSONUND = b"\x06"  # Undefined
BSONOID = b"\x07"  # ObjectId
BSONBOO = b"\x08"  # Boolean
BSONDAT = b"\x09"  # UTC Datetime
BSONNUL = b"\x0A"  # Null
BSONRGX = b"\x0B"  # Regex
BSONREF = b"\x0C"  # DBRef
BSONCOD = b"\x0D"  # Javascript code
BSONSYM = b"\x0E"  # Symbol
BSONCWS = b"\x0F"  # Javascript code with scope
BSONINT = b"\x10"  # 32bit int
BSONTIM = b"\x11"  # Timestamp
BSONLON = b"\x12"  # 64bit int
BSONDEC = b"\x13"  # Decimal128
BSONMIN = b"\xFF"  # Min key
BSONMAX = b"\x7F"  # Max key


_UNPACK_FLOAT_FROM = struct.Struct("<d").unpack_from
_UNPACK_INT = struct.Struct("<i").unpack
_UNPACK_INT_FROM = struct.Struct("<i").unpack_from
_UNPACK_LENGTH_SUBTYPE_FROM = struct.Struct("<iB").unpack_from
_UNPACK_LONG_FROM = struct.Struct("<q").unpack_from
_UNPACK_TIMESTAMP_FROM = struct.Struct("<II").unpack_from


def get_data_and_view(data):
    if isinstance(data, (bytes, bytearray)):
        return data, memoryview(data)
    view = memoryview(data)
    return view.tobytes(), view


def _raise_unknown_type(element_type, element_name):
    """Unknown type helper."""
    raise InvalidBSON(
        "Detected unknown BSON type %r for fieldname '%s'. Are "
        "you using the latest driver version?" % (chr(element_type).encode(), element_name)
    )


def _get_int(data, view, position, dummy0, dummy1, dummy2):
    """Decode a BSON int32 to python int."""
    return _UNPACK_INT_FROM(data, position)[0], position + 4


def _get_c_string(data, view, position, opts):
    """Decode a BSON 'C' string to python str."""
    end = data.index(b"\x00", position)
    return _utf_8_decode(view[position:end], opts.unicode_decode_error_handler, True)[0], end + 1


def _get_float(data, view, position, dummy0, dummy1, dummy2):
    """Decode a BSON double to python float."""
    return _UNPACK_FLOAT_FROM(data, position)[0], position + 8


def _get_string(data, view, position, obj_end, opts, dummy):
    """Decode a BSON string to python str."""
    length = _UNPACK_INT_FROM(data, position)[0]
    position += 4
    if length < 1 or obj_end - position < length:
        raise InvalidBSON("invalid string length")
    end = position + length - 1
    if data[end] != 0:
        raise InvalidBSON("invalid end of string")
    return _utf_8_decode(view[position:end], opts.unicode_decode_error_handler, True)[0], end + 1


def _get_object_size(data, position, obj_end):
    """Validate and return a BSON document's size."""
    try:
        obj_size = _UNPACK_INT_FROM(data, position)[0]
    except struct.error as exc:
        raise InvalidBSON(str(exc))
    end = position + obj_size - 1
    if data[end] != 0:
        raise InvalidBSON("bad eoo")
    if end >= obj_end:
        raise InvalidBSON("invalid object length")
    # If this is the top-level document, validate the total size too.
    if position == 0 and obj_size != obj_end:
        raise InvalidBSON("invalid object length")
    return obj_size, end


def _get_object(data, view, position, obj_end, opts, dummy):
    """Decode a BSON subdocument to opts.document_class or bson.dbref.DBRef."""
    obj_size, end = _get_object_size(data, position, obj_end)
    if _raw_document_class(opts.document_class):
        return (opts.document_class(data[position : end + 1], opts), position + obj_size)

    obj = _elements_to_dict(data, view, position + 4, end, opts)

    position += obj_size
    # If DBRef validation fails, return a normal doc.
    if (
        isinstance(obj.get("$ref"), str)
        and "$id" in obj
        and isinstance(obj.get("$db"), (str, type(None)))
    ):
        return (DBRef(obj.pop("$ref"), obj.pop("$id", None), obj.pop("$db", None), obj), position)
    return obj, position


def _get_array(data, view, position, obj_end, opts, element_name):
    """Decode a BSON array to python list."""
    size = _UNPACK_INT_FROM(data, position)[0]
    end = position + size - 1
    if data[end] != 0:
        raise InvalidBSON("bad eoo")

    position += 4
    end -= 1
    result = []

    # Avoid doing global and attribute lookups in the loop.
    append = result.append
    index = data.index
    getter = _ELEMENT_GETTER
    decoder_map = opts.type_registry._decoder_map

    while position < end:
        element_type = data[position]
        # Just skip the keys.
        position = index(b"\x00", position) + 1
        try:
            value, position = getter[element_type](
                data, view, position, obj_end, opts, element_name
            )
        except KeyError:
            _raise_unknown_type(element_type, element_name)

        if decoder_map:
            custom_decoder = decoder_map.get(type(value))
            if custom_decoder is not None:
                value = custom_decoder(value)

        append(value)

    if position != end + 1:
        raise InvalidBSON("bad array length")
    return result, position + 1


def _get_binary(data, view, position, obj_end, opts, dummy1):
    """Decode a BSON binary to bson.binary.Binary or python UUID."""
    length, subtype = _UNPACK_LENGTH_SUBTYPE_FROM(data, position)
    position += 5
    if subtype == 2:
        length2 = _UNPACK_INT_FROM(data, position)[0]
        position += 4
        if length2 != length - 4:
            raise InvalidBSON("invalid binary (st 2) - lengths don't match!")
        length = length2
    end = position + length
    if length < 0 or end > obj_end:
        raise InvalidBSON("bad binary object length")

    # Convert UUID subtypes to native UUIDs.
    if subtype in ALL_UUID_SUBTYPES:
        uuid_rep = opts.uuid_representation
        binary_value = Binary(data[position:end], subtype)
        if (
            (uuid_rep == UuidRepresentation.UNSPECIFIED)
            or (subtype == UUID_SUBTYPE and uuid_rep != STANDARD)
            or (subtype == OLD_UUID_SUBTYPE and uuid_rep == STANDARD)
        ):
            return binary_value, end
        return binary_value.as_uuid(uuid_rep), end

    # Decode subtype 0 to 'bytes'.
    if subtype == 0:
        value = data[position:end]
    else:
        value = Binary(data[position:end], subtype)

    return value, end


def _get_oid(data, view, position, dummy0, dummy1, dummy2):
    """Decode a BSON ObjectId to bson.objectid.ObjectId."""
    end = position + 12
    return ObjectId(data[position:end]), end


def _get_boolean(data, view, position, dummy0, dummy1, dummy2):
    """Decode a BSON true/false to python True/False."""
    end = position + 1
    boolean_byte = data[position:end]
    if boolean_byte == b"\x00":
        return False, end
    elif boolean_byte == b"\x01":
        return True, end
    raise InvalidBSON("invalid boolean value: %r" % boolean_byte)


def _get_date(data, view, position, dummy0, opts, dummy1):
    """Decode a BSON datetime to python datetime.datetime."""
    return _millis_to_datetime(_UNPACK_LONG_FROM(data, position)[0], opts), position + 8


def _get_code(data, view, position, obj_end, opts, element_name):
    """Decode a BSON code to bson.code.Code."""
    code, position = _get_string(data, view, position, obj_end, opts, element_name)
    return Code(code), position


def _get_code_w_scope(data, view, position, obj_end, opts, element_name):
    """Decode a BSON code_w_scope to bson.code.Code."""
    code_end = position + _UNPACK_INT_FROM(data, position)[0]
    code, position = _get_string(data, view, position + 4, code_end, opts, element_name)
    scope, position = _get_object(data, view, position, code_end, opts, element_name)
    if position != code_end:
        raise InvalidBSON("scope outside of javascript code boundaries")
    return Code(code, scope), position


def _get_regex(data, view, position, dummy0, opts, dummy1):
    """Decode a BSON regex to bson.regex.Regex or a python pattern object."""
    pattern, position = _get_c_string(data, view, position, opts)
    bson_flags, position = _get_c_string(data, view, position, opts)
    bson_re = Regex(pattern, bson_flags)
    return bson_re, position


def _get_ref(data, view, position, obj_end, opts, element_name):
    """Decode (deprecated) BSON DBPointer to bson.dbref.DBRef."""
    collection, position = _get_string(data, view, position, obj_end, opts, element_name)
    oid, position = _get_oid(data, view, position, obj_end, opts, element_name)
    return DBRef(collection, oid), position


def _get_timestamp(data, view, position, dummy0, dummy1, dummy2):
    """Decode a BSON timestamp to bson.timestamp.Timestamp."""
    inc, timestamp = _UNPACK_TIMESTAMP_FROM(data, position)
    return Timestamp(timestamp, inc), position + 8


def _get_int64(data, view, position, dummy0, dummy1, dummy2):
    """Decode a BSON int64 to bson.int64.Int64."""
    return Int64(_UNPACK_LONG_FROM(data, position)[0]), position + 8


def _get_decimal128(data, view, position, dummy0, dummy1, dummy2):
    """Decode a BSON decimal128 to bson.decimal128.Decimal128."""
    end = position + 16
    return Decimal128.from_bid(data[position:end]), end


# Each decoder function's signature is:
#   - data: bytes
#   - view: memoryview that references `data`
#   - position: int, beginning of object in 'data' to decode
#   - obj_end: int, end of object to decode in 'data' if variable-length type
#   - opts: a CodecOptions
_ELEMENT_GETTER = {
    ord(BSONNUM): _get_float,
    ord(BSONSTR): _get_string,
    ord(BSONOBJ): _get_object,
    ord(BSONARR): _get_array,
    ord(BSONBIN): _get_binary,
    ord(BSONUND): lambda u, v, w, x, y, z: (None, w),  # Deprecated undefined
    ord(BSONOID): _get_oid,
    ord(BSONBOO): _get_boolean,
    ord(BSONDAT): _get_date,
    ord(BSONNUL): lambda u, v, w, x, y, z: (None, w),
    ord(BSONRGX): _get_regex,
    ord(BSONREF): _get_ref,  # Deprecated DBPointer
    ord(BSONCOD): _get_code,
    ord(BSONSYM): _get_string,  # Deprecated symbol
    ord(BSONCWS): _get_code_w_scope,
    ord(BSONINT): _get_int,
    ord(BSONTIM): _get_timestamp,
    ord(BSONLON): _get_int64,
    ord(BSONDEC): _get_decimal128,
    ord(BSONMIN): lambda u, v, w, x, y, z: (MinKey(), w),
    ord(BSONMAX): lambda u, v, w, x, y, z: (MaxKey(), w),
}


if _USE_C:

    def _element_to_dict(data, view, position, obj_end, opts):
        return _cbson._element_to_dict(data, position, obj_end, opts)

else:

    def _element_to_dict(data, view, position, obj_end, opts):
        """Decode a single key, value pair."""
        element_type = data[position]
        position += 1
        element_name, position = _get_c_string(data, view, position, opts)
        try:
            value, position = _ELEMENT_GETTER[element_type](
                data, view, position, obj_end, opts, element_name
            )
        except KeyError:
            _raise_unknown_type(element_type, element_name)

        if opts.type_registry._decoder_map:
            custom_decoder = opts.type_registry._decoder_map.get(type(value))
            if custom_decoder is not None:
                value = custom_decoder(value)

        return element_name, value, position


def _raw_to_dict(data, position, obj_end, opts, result):
    data, view = get_data_and_view(data)
    return _elements_to_dict(data, view, position, obj_end, opts, result)


def _elements_to_dict(data, view, position, obj_end, opts, result=None):
    """Decode a BSON document into result."""
    if result is None:
        result = opts.document_class()
    end = obj_end - 1
    while position < end:
        key, value, position = _element_to_dict(data, view, position, obj_end, opts)
        result[key] = value
    if position != obj_end:
        raise InvalidBSON("bad object or element length")
    return result


def _bson_to_dict(data, opts):
    """Decode a BSON string to document_class."""
    data, view = get_data_and_view(data)
    try:
        if _raw_document_class(opts.document_class):
            return opts.document_class(data, opts)
        _, end = _get_object_size(data, 0, len(data))
        return _elements_to_dict(data, view, 4, end, opts)
    except InvalidBSON:
        raise
    except Exception:
        # Change exception type to InvalidBSON but preserve traceback.
        _, exc_value, exc_tb = sys.exc_info()
        raise InvalidBSON(str(exc_value)).with_traceback(exc_tb)


if _USE_C:
    _bson_to_dict = _cbson._bson_to_dict


_PACK_FLOAT = struct.Struct("<d").pack
_PACK_INT = struct.Struct("<i").pack
_PACK_LENGTH_SUBTYPE = struct.Struct("<iB").pack
_PACK_LONG = struct.Struct("<q").pack
_PACK_TIMESTAMP = struct.Struct("<II").pack
_LIST_NAMES = tuple((str(i) + "\x00").encode("utf8") for i in range(1000))


def gen_list_name():
    """Generate "keys" for encoded lists in the sequence
    b"0\x00", b"1\x00", b"2\x00", ...

    The first 1000 keys are returned from a pre-built cache. All
    subsequent keys are generated on the fly.
    """
    for name in _LIST_NAMES:
        yield name

    counter = itertools.count(1000)
    while True:
        yield (str(next(counter)) + "\x00").encode("utf8")


def _make_c_string_check(string):
    """Make a 'C' string, checking for embedded NUL characters."""
    if isinstance(string, bytes):
        if b"\x00" in string:
            raise InvalidDocument("BSON keys / regex patterns must not " "contain a NUL character")
        try:
            _utf_8_decode(string, None, True)
            return string + b"\x00"
        except UnicodeError:
            raise InvalidStringData("strings in documents must be valid " "UTF-8: %r" % string)
    else:
        if "\x00" in string:
            raise InvalidDocument("BSON keys / regex patterns must not " "contain a NUL character")
        return _utf_8_encode(string)[0] + b"\x00"


def _make_c_string(string):
    """Make a 'C' string."""
    if isinstance(string, bytes):
        try:
            _utf_8_decode(string, None, True)
            return string + b"\x00"
        except UnicodeError:
            raise InvalidStringData("strings in documents must be valid " "UTF-8: %r" % string)
    else:
        return _utf_8_encode(string)[0] + b"\x00"


def _make_name(string):
    """Make a 'C' string suitable for a BSON key."""
    # Keys can only be text in python 3.
    if "\x00" in string:
        raise InvalidDocument("BSON keys / regex patterns must not " "contain a NUL character")
    return _utf_8_encode(string)[0] + b"\x00"


def _encode_float(name, value, dummy0, dummy1):
    """Encode a float."""
    return b"\x01" + name + _PACK_FLOAT(value)


def _encode_bytes(name, value, dummy0, dummy1):
    """Encode a python bytes."""
    # Python3 special case. Store 'bytes' as BSON binary subtype 0.
    return b"\x05" + name + _PACK_INT(len(value)) + b"\x00" + value


def _encode_mapping(name, value, check_keys, opts):
    """Encode a mapping type."""
    if _raw_document_class(value):
        return b"\x03" + name + value.raw
    data = b"".join([_element_to_bson(key, val, check_keys, opts) for key, val in value.items()])
    return b"\x03" + name + _PACK_INT(len(data) + 5) + data + b"\x00"


def _encode_dbref(name, value, check_keys, opts):
    """Encode bson.dbref.DBRef."""
    buf = bytearray(b"\x03" + name + b"\x00\x00\x00\x00")
    begin = len(buf) - 4

    buf += _name_value_to_bson(b"$ref\x00", value.collection, check_keys, opts)
    buf += _name_value_to_bson(b"$id\x00", value.id, check_keys, opts)
    if value.database is not None:
        buf += _name_value_to_bson(b"$db\x00", value.database, check_keys, opts)
    for key, val in value._DBRef__kwargs.items():
        buf += _element_to_bson(key, val, check_keys, opts)

    buf += b"\x00"
    buf[begin : begin + 4] = _PACK_INT(len(buf) - begin)
    return bytes(buf)


def _encode_list(name, value, check_keys, opts):
    """Encode a list/tuple."""
    lname = gen_list_name()
    data = b"".join([_name_value_to_bson(next(lname), item, check_keys, opts) for item in value])
    return b"\x04" + name + _PACK_INT(len(data) + 5) + data + b"\x00"


def _encode_text(name, value, dummy0, dummy1):
    """Encode a python str."""
    value = _utf_8_encode(value)[0]
    return b"\x02" + name + _PACK_INT(len(value) + 1) + value + b"\x00"


def _encode_binary(name, value, dummy0, dummy1):
    """Encode bson.binary.Binary."""
    subtype = value.subtype
    if subtype == 2:
        value = _PACK_INT(len(value)) + value
    return b"\x05" + name + _PACK_LENGTH_SUBTYPE(len(value), subtype) + value


def _encode_uuid(name, value, dummy, opts):
    """Encode uuid.UUID."""
    uuid_representation = opts.uuid_representation
    binval = Binary.from_uuid(value, uuid_representation=uuid_representation)
    return _encode_binary(name, binval, dummy, opts)


def _encode_objectid(name, value, dummy0, dummy1):
    """Encode bson.objectid.ObjectId."""
    return b"\x07" + name + value.binary


def _encode_bool(name, value, dummy0, dummy1):
    """Encode a python boolean (True/False)."""
    return b"\x08" + name + (value and b"\x01" or b"\x00")


def _encode_datetime(name, value, dummy0, dummy1):
    """Encode datetime.datetime."""
    millis = _datetime_to_millis(value)
    return b"\x09" + name + _PACK_LONG(millis)


def _encode_none(name, dummy0, dummy1, dummy2):
    """Encode python None."""
    return b"\x0A" + name


def _encode_regex(name, value, dummy0, dummy1):
    """Encode a python regex or bson.regex.Regex."""
    flags = value.flags
    # Python 3 common case
    if flags == re.UNICODE:
        return b"\x0B" + name + _make_c_string_check(value.pattern) + b"u\x00"
    elif flags == 0:
        return b"\x0B" + name + _make_c_string_check(value.pattern) + b"\x00"
    else:
        sflags = b""
        if flags & re.IGNORECASE:
            sflags += b"i"
        if flags & re.LOCALE:
            sflags += b"l"
        if flags & re.MULTILINE:
            sflags += b"m"
        if flags & re.DOTALL:
            sflags += b"s"
        if flags & re.UNICODE:
            sflags += b"u"
        if flags & re.VERBOSE:
            sflags += b"x"
        sflags += b"\x00"
        return b"\x0B" + name + _make_c_string_check(value.pattern) + sflags


def _encode_code(name, value, dummy, opts):
    """Encode bson.code.Code."""
    cstring = _make_c_string(value)
    cstrlen = len(cstring)
    if value.scope is None:
        return b"\x0D" + name + _PACK_INT(cstrlen) + cstring
    scope = _dict_to_bson(value.scope, False, opts, False)
    full_length = _PACK_INT(8 + cstrlen + len(scope))
    return b"\x0F" + name + full_length + _PACK_INT(cstrlen) + cstring + scope


def _encode_int(name, value, dummy0, dummy1):
    """Encode a python int."""
    if -2147483648 <= value <= 2147483647:
        return b"\x10" + name + _PACK_INT(value)
    else:
        try:
            return b"\x12" + name + _PACK_LONG(value)
        except struct.error:
            raise OverflowError("BSON can only handle up to 8-byte ints")


def _encode_timestamp(name, value, dummy0, dummy1):
    """Encode bson.timestamp.Timestamp."""
    return b"\x11" + name + _PACK_TIMESTAMP(value.inc, value.time)


def _encode_long(name, value, dummy0, dummy1):
    """Encode a 64-bit integer (bson.int64.Int64)."""
    try:
        return b"\x12" + name + _PACK_LONG(value)
    except struct.error:
        raise OverflowError("BSON can only handle up to 8-byte ints")


def _encode_decimal128(name, value, dummy0, dummy1):
    """Encode bson.decimal128.Decimal128."""
    return b"\x13" + name + value.bid


def _encode_minkey(name, dummy0, dummy1, dummy2):
    """Encode bson.min_key.MinKey."""
    return b"\xFF" + name


def _encode_maxkey(name, dummy0, dummy1, dummy2):
    """Encode bson.max_key.MaxKey."""
    return b"\x7F" + name


# Each encoder function's signature is:
#   - name: utf-8 bytes
#   - value: a Python data type, e.g. a Python int for _encode_int
#   - check_keys: bool, whether to check for invalid names
#   - opts: a CodecOptions
_ENCODERS = {
    bool: _encode_bool,
    bytes: _encode_bytes,
    datetime.datetime: _encode_datetime,
    dict: _encode_mapping,
    float: _encode_float,
    int: _encode_int,
    list: _encode_list,
    str: _encode_text,
    tuple: _encode_list,
    type(None): _encode_none,
    uuid.UUID: _encode_uuid,
    Binary: _encode_binary,
    Int64: _encode_long,
    Code: _encode_code,
    DBRef: _encode_dbref,
    MaxKey: _encode_maxkey,
    MinKey: _encode_minkey,
    ObjectId: _encode_objectid,
    Regex: _encode_regex,
    RE_TYPE: _encode_regex,
    SON: _encode_mapping,
    Timestamp: _encode_timestamp,
    Decimal128: _encode_decimal128,
    # Special case. This will never be looked up directly.
    _abc.Mapping: _encode_mapping,
}


_MARKERS = {
    5: _encode_binary,
    7: _encode_objectid,
    11: _encode_regex,
    13: _encode_code,
    17: _encode_timestamp,
    18: _encode_long,
    100: _encode_dbref,
    127: _encode_maxkey,
    255: _encode_minkey,
}


_BUILT_IN_TYPES = tuple(t for t in _ENCODERS)


def _name_value_to_bson(
    name, value, check_keys, opts, in_custom_call=False, in_fallback_call=False
):
    """Encode a single name, value pair."""
    # First see if the type is already cached. KeyError will only ever
    # happen once per subtype.
    try:
        return _ENCODERS[type(value)](name, value, check_keys, opts)
    except KeyError:
        pass

    # Second, fall back to trying _type_marker. This has to be done
    # before the loop below since users could subclass one of our
    # custom types that subclasses a python built-in (e.g. Binary)
    marker = getattr(value, "_type_marker", None)
    if isinstance(marker, int) and marker in _MARKERS:
        func = _MARKERS[marker]
        # Cache this type for faster subsequent lookup.
        _ENCODERS[type(value)] = func
        return func(name, value, check_keys, opts)

    # Third, check if a type encoder is registered for this type.
    # Note that subtypes of registered custom types are not auto-encoded.
    if not in_custom_call and opts.type_registry._encoder_map:
        custom_encoder = opts.type_registry._encoder_map.get(type(value))
        if custom_encoder is not None:
            return _name_value_to_bson(
                name, custom_encoder(value), check_keys, opts, in_custom_call=True
            )

    # Fourth, test each base type. This will only happen once for
    # a subtype of a supported base type. Unlike in the C-extensions, this
    # is done after trying the custom type encoder because checking for each
    # subtype is expensive.
    for base in _BUILT_IN_TYPES:
        if isinstance(value, base):
            func = _ENCODERS[base]
            # Cache this type for faster subsequent lookup.
            _ENCODERS[type(value)] = func
            return func(name, value, check_keys, opts)

    # As a last resort, try using the fallback encoder, if the user has
    # provided one.
    fallback_encoder = opts.type_registry._fallback_encoder
    if not in_fallback_call and fallback_encoder is not None:
        return _name_value_to_bson(
            name, fallback_encoder(value), check_keys, opts, in_fallback_call=True
        )

    raise InvalidDocument("cannot encode object: %r, of type: %r" % (value, type(value)))


def _element_to_bson(key, value, check_keys, opts):
    """Encode a single key, value pair."""
    if not isinstance(key, str):
        raise InvalidDocument("documents must have only string keys, " "key was %r" % (key,))
    if check_keys:
        if key.startswith("$"):
            raise InvalidDocument("key %r must not start with '$'" % (key,))
        if "." in key:
            raise InvalidDocument("key %r must not contain '.'" % (key,))

    name = _make_name(key)
    return _name_value_to_bson(name, value, check_keys, opts)


def _dict_to_bson(doc, check_keys, opts, top_level=True):
    """Encode a document to BSON."""
    if _raw_document_class(doc):
        return doc.raw
    try:
        elements = []
        if top_level and "_id" in doc:
            elements.append(_name_value_to_bson(b"_id\x00", doc["_id"], check_keys, opts))
        for key, value in doc.items():
            if not top_level or key != "_id":
                elements.append(_element_to_bson(key, value, check_keys, opts))
    except AttributeError:
        raise TypeError("encoder expected a mapping type but got: %r" % (doc,))

    encoded = b"".join(elements)
    return _PACK_INT(len(encoded) + 5) + encoded + b"\x00"


if _USE_C:
    _dict_to_bson = _cbson._dict_to_bson


def _millis_to_datetime(millis, opts):
    """Convert milliseconds since epoch UTC to datetime."""
    diff = ((millis % 1000) + 1000) % 1000
    seconds = (millis - diff) // 1000
    micros = diff * 1000
    if opts.tz_aware:
        dt = EPOCH_AWARE + datetime.timedelta(seconds=seconds, microseconds=micros)
        if opts.tzinfo:
            dt = dt.astimezone(opts.tzinfo)
        return dt
    else:
        return EPOCH_NAIVE + datetime.timedelta(seconds=seconds, microseconds=micros)


def _datetime_to_millis(dtm):
    """Convert datetime to milliseconds since epoch UTC."""
    if dtm.utcoffset() is not None:
        dtm = dtm - dtm.utcoffset()
    return int(calendar.timegm(dtm.timetuple()) * 1000 + dtm.microsecond // 1000)


_CODEC_OPTIONS_TYPE_ERROR = TypeError("codec_options must be an instance of CodecOptions")


def encode(document, check_keys=False, codec_options=DEFAULT_CODEC_OPTIONS):
    """Encode a document to BSON.

    A document can be any mapping type (like :class:`dict`).

    Raises :class:`TypeError` if `document` is not a mapping type,
    or contains keys that are not instances of
    :class:`basestring` (:class:`str` in python 3). Raises
    :class:`~bson.errors.InvalidDocument` if `document` cannot be
    converted to :class:`BSON`.

    :Parameters:
      - `document`: mapping type representing a document
      - `check_keys` (optional): check if keys start with '$' or
        contain '.', raising :class:`~bson.errors.InvalidDocument` in
        either case
      - `codec_options` (optional): An instance of
        :class:`~bson.codec_options.CodecOptions`.

    .. versionadded:: 3.9
    """
    if not isinstance(codec_options, CodecOptions):
        raise _CODEC_OPTIONS_TYPE_ERROR

    return _dict_to_bson(document, check_keys, codec_options)


def decode(data, codec_options=DEFAULT_CODEC_OPTIONS):
    """Decode BSON to a document.

    By default, returns a BSON document represented as a Python
    :class:`dict`. To use a different :class:`MutableMapping` class,
    configure a :class:`~bson.codec_options.CodecOptions`::

        >>> import collections  # From Python standard library.
        >>> import bson
        >>> from bson.codec_options import CodecOptions
        >>> data = bson.encode({'a': 1})
        >>> decoded_doc = bson.decode(data)
        >>> type(decoded_doc)
        <class 'dict'>
        >>> options = CodecOptions(document_class=collections.OrderedDict)
        >>> decoded_doc = bson.decode(data, codec_options=options)
        >>> type(decoded_doc)
        <class 'collections.OrderedDict'>

    :Parameters:
      - `data`: the BSON to decode. Any bytes-like object that implements
        the buffer protocol.
      - `codec_options` (optional): An instance of
        :class:`~bson.codec_options.CodecOptions`.

    .. versionadded:: 3.9
    """
    if not isinstance(codec_options, CodecOptions):
        raise _CODEC_OPTIONS_TYPE_ERROR

    return _bson_to_dict(data, codec_options)


def decode_all(data, codec_options=DEFAULT_CODEC_OPTIONS):
    """Decode BSON data to multiple documents.

    `data` must be a bytes-like object implementing the buffer protocol that
    provides concatenated, valid, BSON-encoded documents.

    :Parameters:
      - `data`: BSON data
      - `codec_options` (optional): An instance of
        :class:`~bson.codec_options.CodecOptions`.

    .. versionchanged:: 3.9
       Supports bytes-like objects that implement the buffer protocol.

    .. versionchanged:: 3.0
       Removed `compile_re` option: PyMongo now always represents BSON regular
       expressions as :class:`~bson.regex.Regex` objects. Use
       :meth:`~bson.regex.Regex.try_compile` to attempt to convert from a
       BSON regular expression to a Python regular expression object.

       Replaced `as_class`, `tz_aware`, and `uuid_subtype` options with
       `codec_options`.
    """
    data, view = get_data_and_view(data)
    if not isinstance(codec_options, CodecOptions):
        raise _CODEC_OPTIONS_TYPE_ERROR

    data_len = len(data)
    docs = []
    position = 0
    end = data_len - 1
    use_raw = _raw_document_class(codec_options.document_class)
    try:
        while position < end:
            obj_size = _UNPACK_INT_FROM(data, position)[0]
            if data_len - position < obj_size:
                raise InvalidBSON("invalid object size")
            obj_end = position + obj_size - 1
            if data[obj_end] != 0:
                raise InvalidBSON("bad eoo")
            if use_raw:
                docs.append(
                    codec_options.document_class(data[position : obj_end + 1], codec_options)
                )
            else:
                docs.append(_elements_to_dict(data, view, position + 4, obj_end, codec_options))
            position += obj_size
        return docs
    except InvalidBSON:
        raise
    except Exception:
        # Change exception type to InvalidBSON but preserve traceback.
        _, exc_value, exc_tb = sys.exc_info()
        raise InvalidBSON(str(exc_value)).with_traceback(exc_tb)


if _USE_C:
    decode_all = _cbson.decode_all


def _decode_selective(rawdoc, fields, codec_options):
    if _raw_document_class(codec_options.document_class):
        # If document_class is RawBSONDocument, use vanilla dictionary for
        # decoding command response.
        doc = {}
    else:
        # Else, use the specified document_class.
        doc = codec_options.document_class()
    for key, value in rawdoc.items():
        if key in fields:
            if fields[key] == 1:
                doc[key] = _bson_to_dict(rawdoc.raw, codec_options)[key]
            else:
                doc[key] = _decode_selective(value, fields[key], codec_options)
        else:
            doc[key] = value
    return doc


def _convert_raw_document_lists_to_streams(document):
    cursor = document.get("cursor")
    if cursor:
        for key in ("firstBatch", "nextBatch"):
            batch = cursor.get(key)
            if batch:
                stream = b"".join(doc.raw for doc in batch)
                cursor[key] = [stream]


def _decode_all_selective(data, codec_options, fields):
    """Decode BSON data to a single document while using user-provided
    custom decoding logic.

    `data` must be a string representing a valid, BSON-encoded document.

    :Parameters:
      - `data`: BSON data
      - `codec_options`: An instance of
        :class:`~bson.codec_options.CodecOptions` with user-specified type
        decoders. If no decoders are found, this method is the same as
        ``decode_all``.
      - `fields`: Map of document namespaces where data that needs
        to be custom decoded lives or None. For example, to custom decode a
        list of objects in 'field1.subfield1', the specified value should be
        ``{'field1': {'subfield1': 1}}``. If ``fields`` is an empty map or
        None, this method is the same as ``decode_all``.

    :Returns:
      - `document_list`: Single-member list containing the decoded document.

    .. versionadded:: 3.8
    """
    if not codec_options.type_registry._decoder_map:
        return decode_all(data, codec_options)

    if not fields:
        return decode_all(data, codec_options.with_options(type_registry=None))

    # Decode documents for internal use.
    from .raw_bson import RawBSONDocument

    internal_codec_options = codec_options.with_options(
        document_class=RawBSONDocument, type_registry=None
    )
    _doc = _bson_to_dict(data, internal_codec_options)
    return [
        _decode_selective(
            _doc,
            fields,
            codec_options,
        )
    ]


def decode_iter(data, codec_options=DEFAULT_CODEC_OPTIONS):
    """Decode BSON data to multiple documents as a generator.

    Works similarly to the decode_all function, but yields one document at a
    time.

    `data` must be a string of concatenated, valid, BSON-encoded
    documents.

    :Parameters:
      - `data`: BSON data
      - `codec_options` (optional): An instance of
        :class:`~bson.codec_options.CodecOptions`.

    .. versionchanged:: 3.0
       Replaced `as_class`, `tz_aware`, and `uuid_subtype` options with
       `codec_options`.

    .. versionadded:: 2.8
    """
    if not isinstance(codec_options, CodecOptions):
        raise _CODEC_OPTIONS_TYPE_ERROR

    position = 0
    end = len(data) - 1
    while position < end:
        obj_size = _UNPACK_INT_FROM(data, position)[0]
        elements = data[position : position + obj_size]
        position += obj_size

        yield _bson_to_dict(elements, codec_options)


def decode_file_iter(file_obj, codec_options=DEFAULT_CODEC_OPTIONS):
    """Decode bson data from a file to multiple documents as a generator.

    Works similarly to the decode_all function, but reads from the file object
    in chunks and parses bson in chunks, yielding one document at a time.

    :Parameters:
      - `file_obj`: A file object containing BSON data.
1072 - `codec_options` (optional): An instance of 1073 :class:`~bson.codec_options.CodecOptions`. 1074 1075 .. versionchanged:: 3.0 1076 Replaced `as_class`, `tz_aware`, and `uuid_subtype` options with 1077 `codec_options`. 1078 1079 .. versionadded:: 2.8 1080 """ 1081 while True: 1082 # Read size of next object. 1083 size_data = file_obj.read(4) 1084 if not size_data: 1085 break # Finished with file normaly. 1086 elif len(size_data) != 4: 1087 raise InvalidBSON("cut off in middle of objsize") 1088 obj_size = _UNPACK_INT_FROM(size_data, 0)[0] - 4 1089 elements = size_data + file_obj.read(max(0, obj_size)) 1090 yield _bson_to_dict(elements, codec_options) 1091 1092 1093def is_valid(bson): 1094 """Check that the given string represents valid :class:`BSON` data. 1095 1096 Raises :class:`TypeError` if `bson` is not an instance of 1097 :class:`str` (:class:`bytes` in python 3). Returns ``True`` 1098 if `bson` is valid :class:`BSON`, ``False`` otherwise. 1099 1100 :Parameters: 1101 - `bson`: the data to be validated 1102 """ 1103 if not isinstance(bson, bytes): 1104 raise TypeError("BSON data must be an instance of a subclass of bytes") 1105 1106 try: 1107 _bson_to_dict(bson, DEFAULT_CODEC_OPTIONS) 1108 return True 1109 except Exception: 1110 return False 1111 1112 1113class BSON(bytes): 1114 """BSON (Binary JSON) data. 1115 1116 .. warning:: Using this class to encode and decode BSON adds a performance 1117 cost. For better performance use the module level functions 1118 :func:`encode` and :func:`decode` instead. 1119 """ 1120 1121 @classmethod 1122 def encode(cls, document, check_keys=False, codec_options=DEFAULT_CODEC_OPTIONS): 1123 """Encode a document to a new :class:`BSON` instance. 1124 1125 A document can be any mapping type (like :class:`dict`). 1126 1127 Raises :class:`TypeError` if `document` is not a mapping type, 1128 or contains keys that are not instances of 1129 :class:`basestring` (:class:`str` in python 3). 
Raises 1130 :class:`~bson.errors.InvalidDocument` if `document` cannot be 1131 converted to :class:`BSON`. 1132 1133 :Parameters: 1134 - `document`: mapping type representing a document 1135 - `check_keys` (optional): check if keys start with '$' or 1136 contain '.', raising :class:`~bson.errors.InvalidDocument` in 1137 either case 1138 - `codec_options` (optional): An instance of 1139 :class:`~bson.codec_options.CodecOptions`. 1140 1141 .. versionchanged:: 3.0 1142 Replaced `uuid_subtype` option with `codec_options`. 1143 """ 1144 return cls(encode(document, check_keys, codec_options)) 1145 1146 def decode(self, codec_options=DEFAULT_CODEC_OPTIONS): 1147 """Decode this BSON data. 1148 1149 By default, returns a BSON document represented as a Python 1150 :class:`dict`. To use a different :class:`MutableMapping` class, 1151 configure a :class:`~bson.codec_options.CodecOptions`:: 1152 1153 >>> import collections # From Python standard library. 1154 >>> import bson 1155 >>> from .codec_options import CodecOptions 1156 >>> data = bson.BSON.encode({'a': 1}) 1157 >>> decoded_doc = bson.BSON(data).decode() 1158 <type 'dict'> 1159 >>> options = CodecOptions(document_class=collections.OrderedDict) 1160 >>> decoded_doc = bson.BSON(data).decode(codec_options=options) 1161 >>> type(decoded_doc) 1162 <class 'collections.OrderedDict'> 1163 1164 :Parameters: 1165 - `codec_options` (optional): An instance of 1166 :class:`~bson.codec_options.CodecOptions`. 1167 1168 .. versionchanged:: 3.0 1169 Removed `compile_re` option: PyMongo now always represents BSON 1170 regular expressions as :class:`~bson.regex.Regex` objects. Use 1171 :meth:`~bson.regex.Regex.try_compile` to attempt to convert from a 1172 BSON regular expression to a Python regular expression object. 1173 1174 Replaced `as_class`, `tz_aware`, and `uuid_subtype` options with 1175 `codec_options`. 
1176 """ 1177 return decode(self, codec_options) 1178 1179 1180def has_c(): 1181 """Is the C extension installed?""" 1182 return _USE_C
def gen_list_name():
    """Generate "keys" for encoded lists in the sequence
    b"0\x00", b"1\x00", b"2\x00", ...

    The first 1000 keys are returned from a pre-built cache. All
    subsequent keys are generated on the fly.
    """
    for name in _LIST_NAMES:
        yield name

    counter = itertools.count(1000)
    while True:
        yield (str(next(counter)) + "\x00").encode("utf8")
Generate "keys" for encoded lists in the sequence b"0\x00", b"1\x00", b"2\x00", ...
The first 1000 keys are returned from a pre-built cache. All subsequent keys are generated on the fly.
def encode(document, check_keys=False, codec_options=DEFAULT_CODEC_OPTIONS):
    """Encode a document to BSON.

    A document can be any mapping type (like :class:`dict`).

    Raises :class:`TypeError` if `document` is not a mapping type,
    or contains keys that are not instances of
    :class:`basestring` (:class:`str` in python 3). Raises
    :class:`~bson.errors.InvalidDocument` if `document` cannot be
    converted to :class:`BSON`.

    :Parameters:
      - `document`: mapping type representing a document
      - `check_keys` (optional): check if keys start with '$' or
        contain '.', raising :class:`~bson.errors.InvalidDocument` in
        either case
      - `codec_options` (optional): An instance of
        :class:`~bson.codec_options.CodecOptions`.

    .. versionadded:: 3.9
    """
    if not isinstance(codec_options, CodecOptions):
        raise _CODEC_OPTIONS_TYPE_ERROR

    return _dict_to_bson(document, check_keys, codec_options)
Encode a document to BSON.
A document can be any mapping type (like `dict`).
Raises `TypeError` if `document` is not a mapping type, or contains keys that are not instances of `basestring` (`str` in Python 3). Raises `bson.errors.InvalidDocument` if `document` cannot be converted to `BSON`.
:Parameters:
- `document`: mapping type representing a document
- `check_keys` (optional): check if keys start with '$' or contain '.', raising `bson.errors.InvalidDocument` in either case
- `codec_options` (optional): An instance of `bson.codec_options.CodecOptions`.
New in version 3.9.
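The `check_keys` rule can be illustrated without the encoder itself. The sketch below only reports offending keys instead of raising; `find_rejected_keys` is a hypothetical helper, and it assumes nested mappings are subject to the same rule as top-level keys.

```python
from collections.abc import Mapping


def find_rejected_keys(document, prefix=""):
    """Return paths of keys that start with '$' or contain '.'.

    Hypothetical helper: paths use '/' as a separator so that a dot
    inside a key stays visible.
    """
    rejected = []
    for key, value in document.items():
        path = prefix + key
        if key.startswith("$") or "." in key:
            rejected.append(path)
        # Assumption: nested documents are checked the same way.
        if isinstance(value, Mapping):
            rejected.extend(find_rejected_keys(value, path + "/"))
    return rejected


problems = find_rejected_keys({"ok": 1, "$set": {"a.b": 2}})
```

A pre-check like this can be handy for producing a friendlier error message before handing a document to `encode` with `check_keys=True`.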
def decode(data, codec_options=DEFAULT_CODEC_OPTIONS):
    """Decode BSON to a document.

    By default, returns a BSON document represented as a Python
    :class:`dict`. To use a different :class:`MutableMapping` class,
    configure a :class:`~bson.codec_options.CodecOptions`::

        >>> import collections  # From Python standard library.
        >>> import bson
        >>> from bson.codec_options import CodecOptions
        >>> data = bson.encode({'a': 1})
        >>> decoded_doc = bson.decode(data)
        >>> type(decoded_doc)
        <class 'dict'>
        >>> options = CodecOptions(document_class=collections.OrderedDict)
        >>> decoded_doc = bson.decode(data, codec_options=options)
        >>> type(decoded_doc)
        <class 'collections.OrderedDict'>

    :Parameters:
      - `data`: the BSON to decode. Any bytes-like object that implements
        the buffer protocol.
      - `codec_options` (optional): An instance of
        :class:`~bson.codec_options.CodecOptions`.

    .. versionadded:: 3.9
    """
    if not isinstance(codec_options, CodecOptions):
        raise _CODEC_OPTIONS_TYPE_ERROR

    return _bson_to_dict(data, codec_options)
Decode BSON to a document.
By default, returns a BSON document represented as a Python `dict`. To use a different `MutableMapping` class, configure a `bson.codec_options.CodecOptions`::
>>> import collections  # From Python standard library.
>>> import bson
>>> from bson.codec_options import CodecOptions
>>> data = bson.encode({'a': 1})
>>> decoded_doc = bson.decode(data)
>>> type(decoded_doc)
<class 'dict'>
>>> options = CodecOptions(document_class=collections.OrderedDict)
>>> decoded_doc = bson.decode(data, codec_options=options)
>>> type(decoded_doc)
<class 'collections.OrderedDict'>
:Parameters:
- `data`: the BSON to decode. Any bytes-like object that implements the buffer protocol.
- `codec_options` (optional): An instance of `bson.codec_options.CodecOptions`.
New in version 3.9.
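To make the role of `document_class` concrete, here is a deliberately tiny decoder written with the standard library only. It is a sketch under strong assumptions, not the module's decoder: `decode_int32_document` is a hypothetical function that handles only int32 elements (BSON type `0x10`), and the sample bytes are a hand-built encoding of `{'a': 1}`.

```python
import collections
import struct


def decode_int32_document(data, document_class=dict):
    """Decode a BSON document whose elements are all int32 values."""
    # A document starts with its total size as a little-endian int32
    # and ends with a single NUL byte.
    (size,) = struct.unpack_from("<i", data, 0)
    if size != len(data) or data[-1] != 0:
        raise ValueError("malformed document")

    doc = document_class()
    position = 4
    end = size - 1
    while position < end:
        if data[position] != 0x10:
            raise ValueError("only int32 elements are handled in this sketch")
        # The element name is a NUL-terminated string after the type byte.
        name_end = data.index(b"\x00", position + 1)
        name = data[position + 1:name_end].decode("utf-8")
        (value,) = struct.unpack_from("<i", data, name_end + 1)
        doc[name] = value
        position = name_end + 1 + 4
    return doc


raw = b"\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00"
plain = decode_int32_document(raw)
ordered = decode_int32_document(raw, document_class=collections.OrderedDict)
```

The decoder never names a concrete mapping type: it instantiates whatever class it is given and fills it key by key, which is the same idea the `document_class` option expresses.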
def decode_all(data, codec_options=DEFAULT_CODEC_OPTIONS):
    """Decode BSON data to multiple documents.

    `data` must be a bytes-like object implementing the buffer protocol that
    provides concatenated, valid, BSON-encoded documents.

    :Parameters:
      - `data`: BSON data
      - `codec_options` (optional): An instance of
        :class:`~bson.codec_options.CodecOptions`.

    .. versionchanged:: 3.9
       Supports bytes-like objects that implement the buffer protocol.

    .. versionchanged:: 3.0
       Removed `compile_re` option: PyMongo now always represents BSON regular
       expressions as :class:`~bson.regex.Regex` objects. Use
       :meth:`~bson.regex.Regex.try_compile` to attempt to convert from a
       BSON regular expression to a Python regular expression object.

       Replaced `as_class`, `tz_aware`, and `uuid_subtype` options with
       `codec_options`.
    """
    data, view = get_data_and_view(data)
    if not isinstance(codec_options, CodecOptions):
        raise _CODEC_OPTIONS_TYPE_ERROR

    data_len = len(data)
    docs = []
    position = 0
    end = data_len - 1
    use_raw = _raw_document_class(codec_options.document_class)
    try:
        while position < end:
            obj_size = _UNPACK_INT_FROM(data, position)[0]
            if data_len - position < obj_size:
                raise InvalidBSON("invalid object size")
            obj_end = position + obj_size - 1
            if data[obj_end] != 0:
                raise InvalidBSON("bad eoo")
            if use_raw:
                docs.append(
                    codec_options.document_class(data[position : obj_end + 1], codec_options)
                )
            else:
                docs.append(_elements_to_dict(data, view, position + 4, obj_end, codec_options))
            position += obj_size
        return docs
    except InvalidBSON:
        raise
    except Exception:
        # Change exception type to InvalidBSON but preserve traceback.
        _, exc_value, exc_tb = sys.exc_info()
        raise InvalidBSON(str(exc_value)).with_traceback(exc_tb)
Decode BSON data to multiple documents.
`data` must be a bytes-like object implementing the buffer protocol that provides concatenated, valid, BSON-encoded documents.
:Parameters:
- `data`: BSON data
- `codec_options` (optional): An instance of `bson.codec_options.CodecOptions`.
Changed in version 3.9: Supports bytes-like objects that implement the buffer protocol.
Changed in version 3.0: Removed `compile_re` option: PyMongo now always represents BSON regular expressions as `bson.regex.Regex` objects. Use `bson.regex.Regex.try_compile()` to attempt to convert from a BSON regular expression to a Python regular expression object.
Replaced `as_class`, `tz_aware`, and `uuid_subtype` options with `codec_options`.
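The framing that makes concatenated documents separable can be shown in isolation. The following sketch splits a buffer into raw per-document byte strings without decoding their elements; `split_documents` is a hypothetical name, it raises `ValueError` where the module raises `InvalidBSON`, and its minimum-size check is an addition of this sketch.

```python
import struct


def split_documents(data):
    """Split concatenated BSON documents into one bytes object each."""
    documents = []
    position = 0
    data_len = len(data)
    while position < data_len:
        if data_len - position < 4:
            raise ValueError("cut off in middle of objsize")
        # Each document begins with its total size (little-endian int32).
        (obj_size,) = struct.unpack_from("<i", data, position)
        # 5 bytes is the smallest possible document: size prefix plus NUL.
        if obj_size < 5 or data_len - position < obj_size:
            raise ValueError("invalid object size")
        obj_end = position + obj_size - 1
        # The last byte of every document is the end-of-object marker.
        if data[obj_end] != 0:
            raise ValueError("bad eoo")
        documents.append(bytes(data[position:obj_end + 1]))
        position += obj_size
    return documents


doc_a = b"\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00"  # {'a': 1}
empty = b"\x05\x00\x00\x00\x00"  # {}
chunks = split_documents(doc_a + empty + doc_a)
```

Since every document carries its own length, no delimiter between documents is needed, which is why a plain concatenation is a valid input.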
def decode_iter(data, codec_options=DEFAULT_CODEC_OPTIONS):
    """Decode BSON data to multiple documents as a generator.

    Works similarly to the decode_all function, but yields one document at a
    time.

    `data` must be a string of concatenated, valid, BSON-encoded
    documents.

    :Parameters:
      - `data`: BSON data
      - `codec_options` (optional): An instance of
        :class:`~bson.codec_options.CodecOptions`.

    .. versionchanged:: 3.0
       Replaced `as_class`, `tz_aware`, and `uuid_subtype` options with
       `codec_options`.

    .. versionadded:: 2.8
    """
    if not isinstance(codec_options, CodecOptions):
        raise _CODEC_OPTIONS_TYPE_ERROR

    position = 0
    end = len(data) - 1
    while position < end:
        obj_size = _UNPACK_INT_FROM(data, position)[0]
        elements = data[position : position + obj_size]
        position += obj_size

        yield _bson_to_dict(elements, codec_options)
Decode BSON data to multiple documents as a generator.
Works similarly to the `decode_all` function, but yields one document at a time.
`data` must be a string of concatenated, valid, BSON-encoded documents.
:Parameters:
- `data`: BSON data
- `codec_options` (optional): An instance of `bson.codec_options.CodecOptions`.
Changed in version 3.0: Replaced `as_class`, `tz_aware`, and `uuid_subtype` options with `codec_options`.
New in version 2.8.
def decode_file_iter(file_obj, codec_options=DEFAULT_CODEC_OPTIONS):
    """Decode bson data from a file to multiple documents as a generator.

    Works similarly to the decode_all function, but reads from the file object
    in chunks and parses bson in chunks, yielding one document at a time.

    :Parameters:
      - `file_obj`: A file object containing BSON data.
      - `codec_options` (optional): An instance of
        :class:`~bson.codec_options.CodecOptions`.

    .. versionchanged:: 3.0
       Replaced `as_class`, `tz_aware`, and `uuid_subtype` options with
       `codec_options`.

    .. versionadded:: 2.8
    """
    while True:
        # Read size of next object.
        size_data = file_obj.read(4)
        if not size_data:
            break  # Finished with file normally.
        elif len(size_data) != 4:
            raise InvalidBSON("cut off in middle of objsize")
        obj_size = _UNPACK_INT_FROM(size_data, 0)[0] - 4
        elements = size_data + file_obj.read(max(0, obj_size))
        yield _bson_to_dict(elements, codec_options)
Decode BSON data from a file to multiple documents as a generator.
Works similarly to the `decode_all` function, but reads from the file object in chunks and parses BSON in chunks, yielding one document at a time.
:Parameters:
- `file_obj`: A file object containing BSON data.
- `codec_options` (optional): An instance of `bson.codec_options.CodecOptions`.
Changed in version 3.0: Replaced `as_class`, `tz_aware`, and `uuid_subtype` options with `codec_options`.
New in version 2.8.
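The chunked reading pattern can be tried against an in-memory file. This sketch yields the raw bytes of each document rather than decoded mappings; `iter_raw_documents` is a hypothetical name, `ValueError` stands in for `InvalidBSON`, and the explicit truncated-body check is an addition of this sketch.

```python
import io
import struct


def iter_raw_documents(file_obj):
    """Yield the raw bytes of each BSON document read from a binary file."""
    while True:
        # Read the 4-byte size prefix of the next document.
        size_data = file_obj.read(4)
        if not size_data:
            break  # Clean end of file.
        if len(size_data) != 4:
            raise ValueError("cut off in middle of objsize")
        (obj_size,) = struct.unpack("<i", size_data)
        if obj_size < 5:
            raise ValueError("invalid object size")
        # The size counts the prefix itself, so read only the remainder.
        body = file_obj.read(obj_size - 4)
        if len(body) != obj_size - 4:
            raise ValueError("cut off in middle of document")
        yield size_data + body


doc_a = b"\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00"  # {'a': 1}
empty = b"\x05\x00\x00\x00\x00"  # {}
raw_docs = list(iter_raw_documents(io.BytesIO(doc_a + empty)))
```

Only one document is held in memory at a time, so the same loop would work on a file opened with `open(path, "rb")` that is far larger than available RAM.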
def is_valid(bson):
    """Check that the given string represents valid :class:`BSON` data.

    Raises :class:`TypeError` if `bson` is not an instance of
    :class:`str` (:class:`bytes` in python 3). Returns ``True``
    if `bson` is valid :class:`BSON`, ``False`` otherwise.

    :Parameters:
      - `bson`: the data to be validated
    """
    if not isinstance(bson, bytes):
        raise TypeError("BSON data must be an instance of a subclass of bytes")

    try:
        _bson_to_dict(bson, DEFAULT_CODEC_OPTIONS)
        return True
    except Exception:
        return False
class BSON(bytes):
    """BSON (Binary JSON) data.

    .. warning:: Using this class to encode and decode BSON adds a performance
      cost. For better performance use the module level functions
      :func:`encode` and :func:`decode` instead.
    """

    @classmethod
    def encode(cls, document, check_keys=False, codec_options=DEFAULT_CODEC_OPTIONS):
        """Encode a document to a new :class:`BSON` instance.

        A document can be any mapping type (like :class:`dict`).

        Raises :class:`TypeError` if `document` is not a mapping type,
        or contains keys that are not instances of
        :class:`basestring` (:class:`str` in python 3). Raises
        :class:`~bson.errors.InvalidDocument` if `document` cannot be
        converted to :class:`BSON`.

        :Parameters:
          - `document`: mapping type representing a document
          - `check_keys` (optional): check if keys start with '$' or
            contain '.', raising :class:`~bson.errors.InvalidDocument` in
            either case
          - `codec_options` (optional): An instance of
            :class:`~bson.codec_options.CodecOptions`.

        .. versionchanged:: 3.0
           Replaced `uuid_subtype` option with `codec_options`.
        """
        return cls(encode(document, check_keys, codec_options))

    def decode(self, codec_options=DEFAULT_CODEC_OPTIONS):
        """Decode this BSON data.

        By default, returns a BSON document represented as a Python
        :class:`dict`. To use a different :class:`MutableMapping` class,
        configure a :class:`~bson.codec_options.CodecOptions`::

            >>> import collections  # From Python standard library.
            >>> import bson
            >>> from bson.codec_options import CodecOptions
            >>> data = bson.BSON.encode({'a': 1})
            >>> decoded_doc = bson.BSON(data).decode()
            >>> type(decoded_doc)
            <class 'dict'>
            >>> options = CodecOptions(document_class=collections.OrderedDict)
            >>> decoded_doc = bson.BSON(data).decode(codec_options=options)
            >>> type(decoded_doc)
            <class 'collections.OrderedDict'>

        :Parameters:
          - `codec_options` (optional): An instance of
            :class:`~bson.codec_options.CodecOptions`.

        .. versionchanged:: 3.0
           Removed `compile_re` option: PyMongo now always represents BSON
           regular expressions as :class:`~bson.regex.Regex` objects. Use
           :meth:`~bson.regex.Regex.try_compile` to attempt to convert from a
           BSON regular expression to a Python regular expression object.

           Replaced `as_class`, `tz_aware`, and `uuid_subtype` options with
           `codec_options`.
        """
        return decode(self, codec_options)
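BSON (Binary JSON) data.
Warning: Using this class to encode and decode BSON adds a performance cost. For better performance use the module level functions `encode` and `decode` instead.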
    @classmethod
    def encode(cls, document, check_keys=False, codec_options=DEFAULT_CODEC_OPTIONS):
        """Encode a document to a new :class:`BSON` instance.

        A document can be any mapping type (like :class:`dict`).

        Raises :class:`TypeError` if `document` is not a mapping type,
        or contains keys that are not instances of
        :class:`basestring` (:class:`str` in python 3). Raises
        :class:`~bson.errors.InvalidDocument` if `document` cannot be
        converted to :class:`BSON`.

        :Parameters:
          - `document`: mapping type representing a document
          - `check_keys` (optional): check if keys start with '$' or
            contain '.', raising :class:`~bson.errors.InvalidDocument` in
            either case
          - `codec_options` (optional): An instance of
            :class:`~bson.codec_options.CodecOptions`.

        .. versionchanged:: 3.0
           Replaced `uuid_subtype` option with `codec_options`.
        """
        return cls(encode(document, check_keys, codec_options))
Encode a document to a new `BSON` instance.
A document can be any mapping type (like `dict`).
Raises `TypeError` if `document` is not a mapping type, or contains keys that are not instances of `basestring` (`str` in Python 3). Raises `bson.errors.InvalidDocument` if `document` cannot be converted to `BSON`.
:Parameters:
- `document`: mapping type representing a document
- `check_keys` (optional): check if keys start with '$' or contain '.', raising `bson.errors.InvalidDocument` in either case
- `codec_options` (optional): An instance of `bson.codec_options.CodecOptions`.
Changed in version 3.0: Replaced `uuid_subtype` option with `codec_options`.
    def decode(self, codec_options=DEFAULT_CODEC_OPTIONS):
        """Decode this BSON data.

        By default, returns a BSON document represented as a Python
        :class:`dict`. To use a different :class:`MutableMapping` class,
        configure a :class:`~bson.codec_options.CodecOptions`::

            >>> import collections  # From Python standard library.
            >>> import bson
            >>> from bson.codec_options import CodecOptions
            >>> data = bson.BSON.encode({'a': 1})
            >>> decoded_doc = bson.BSON(data).decode()
            >>> type(decoded_doc)
            <class 'dict'>
            >>> options = CodecOptions(document_class=collections.OrderedDict)
            >>> decoded_doc = bson.BSON(data).decode(codec_options=options)
            >>> type(decoded_doc)
            <class 'collections.OrderedDict'>

        :Parameters:
          - `codec_options` (optional): An instance of
            :class:`~bson.codec_options.CodecOptions`.

        .. versionchanged:: 3.0
           Removed `compile_re` option: PyMongo now always represents BSON
           regular expressions as :class:`~bson.regex.Regex` objects. Use
           :meth:`~bson.regex.Regex.try_compile` to attempt to convert from a
           BSON regular expression to a Python regular expression object.

           Replaced `as_class`, `tz_aware`, and `uuid_subtype` options with
           `codec_options`.
        """
        return decode(self, codec_options)
Decode this BSON data.
By default, returns a BSON document represented as a Python `dict`. To use a different `MutableMapping` class, configure a `bson.codec_options.CodecOptions`::
>>> import collections  # From Python standard library.
>>> import bson
>>> from bson.codec_options import CodecOptions
>>> data = bson.BSON.encode({'a': 1})
>>> decoded_doc = bson.BSON(data).decode()
>>> type(decoded_doc)
<class 'dict'>
>>> options = CodecOptions(document_class=collections.OrderedDict)
>>> decoded_doc = bson.BSON(data).decode(codec_options=options)
>>> type(decoded_doc)
<class 'collections.OrderedDict'>
:Parameters:
- `codec_options` (optional): An instance of `bson.codec_options.CodecOptions`.
Changed in version 3.0: Removed `compile_re` option: PyMongo now always represents BSON regular expressions as `bson.regex.Regex` objects. Use `bson.regex.Regex.try_compile()` to attempt to convert from a BSON regular expression to a Python regular expression object.
Replaced `as_class`, `tz_aware`, and `uuid_subtype` options with `codec_options`.
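The shape of the `BSON` class, a `bytes` subclass with a classmethod constructor and an instance-level `decode`, can be illustrated with a stand-in format. The sketch below uses JSON purely so that it runs with the standard library; `JSONBytes` is a hypothetical class and says nothing about how BSON itself is serialized.

```python
import json


class JSONBytes(bytes):
    """Encoded data that is still an ordinary bytes object."""

    @classmethod
    def encode(cls, document):
        # Build the payload with a plain function, then wrap it in cls so
        # that subclasses get instances of their own type.
        return cls(json.dumps(document).encode("utf-8"))

    def decode(self):
        # This method shadows bytes.decode, so the text decoding step has
        # to call the base implementation explicitly.
        return json.loads(bytes.decode(self, "utf-8"))


payload = JSONBytes.encode({"a": 1})
round_trip = payload.decode()
```

As with `BSON`, the wrapper adds a method call and an extra object on top of the plain encode and decode functions, which is one way to read the performance warning on the class.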
Inherited Members
- builtins.bytes
- capitalize
- center
- count
- endswith
- expandtabs
- find
- fromhex
- hex
- index
- isalnum
- isalpha
- isascii
- isdigit
- islower
- isspace
- istitle
- isupper
- join
- ljust
- lower
- lstrip
- maketrans
- partition
- replace
- removeprefix
- removesuffix
- rfind
- rindex
- rjust
- rpartition
- rsplit
- rstrip
- split
- splitlines
- startswith
- strip
- swapcase
- title
- translate
- upper
- zfill
Is the C extension installed?