What is CBOR? Complete Guide to Concise Binary Object Representation
Overview
CBOR (Concise Binary Object Representation) is a binary data serialization format standardized in RFC 8949, which obsoletes the original specification, RFC 7049. Like Protocol Buffers, CBOR is designed to be compact, fast, and suitable for constrained environments. CBOR data itself is binary and not human-readable, but the specification defines a diagnostic notation that renders it as text for debugging and documentation.
What is CBOR?
Basic Definition
CBOR is a binary encoding format that represents structured data in a compact way. It's designed to be:
- Concise: Typically smaller than the equivalent JSON or XML
- Fast: Quick to encode and decode
- Self-describing: No schema required
- Extensible: Supports custom data types
- Interoperable: Works across different platforms and languages
- Binary format: Not human-readable (except diagnostic notation)
Key Features
Feature | CBOR | JSON | Protocol Buffers |
---|---|---|---|
Schema Required | No | No | Yes |
Human Readable | No (Binary) | Yes | No |
Size Efficiency | High | Low | Very High |
Parsing Speed | Fast | Medium | Very Fast |
Data Types | Rich (8 major types, extensible via tags) | Limited (6 types) | Rich (custom) |
CBOR Data Types
CBOR supports a rich set of data types organized into major types:
Major Types
// Major Type 0: Unsigned integers (0-23, 24-255, 16-bit, 32-bit, 64-bit)
0, 1, 23, 24, 255, 65535, 4294967295
// Major Type 1: Negative integers
-1, -24, -256, -65536
// Major Type 2: Byte strings
h'48656c6c6f' // "Hello" in hex
// Major Type 3: Text strings
"Hello, World!"
// Major Type 4: Arrays
[1, 2, 3, "hello", true]
// Major Type 5: Maps (objects)
{"name": "John", "age": 30, "active": true}
// Major Type 6: Semantic tags
1(1609459200) // Unix timestamp tag
// Major Type 7: Floats, simple values, break
true, false, null, undefined, 3.14159
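Major types 0 and 1 share one encoding rule: the initial byte carries the major type, and the integer is stored either in that byte or in the 1, 2, 4, or 8 bytes that follow. A minimal sketch of that rule is shown here; `encodeInt` is a hypothetical helper written for illustration, not part of any CBOR library, and it only accepts JavaScript safe integers.

```javascript
// Sketch only: `encodeInt` is a hypothetical helper, not a library API.
function encodeInt(value) {
  if (!Number.isSafeInteger(value)) {
    throw new RangeError('only safe integers are handled in this sketch');
  }
  // Major type 0 stores n directly; major type 1 stores -1 - n.
  const majorType = value < 0 ? 1 : 0;
  const argument = value < 0 ? -1 - value : value;
  const high = majorType << 5; // the major type occupies the top 3 bits

  if (argument < 24) {
    return Buffer.from([high | argument]); // value fits in the initial byte
  }
  if (argument <= 0xff) {
    return Buffer.from([high | 24, argument]); // 1-byte argument follows
  }
  if (argument <= 0xffff) {
    const out = Buffer.alloc(3);
    out[0] = high | 25; // 2-byte argument follows
    out.writeUInt16BE(argument, 1);
    return out;
  }
  if (argument <= 0xffffffff) {
    const out = Buffer.alloc(5);
    out[0] = high | 26; // 4-byte argument follows
    out.writeUInt32BE(argument, 1);
    return out;
  }
  const out = Buffer.alloc(9);
  out[0] = high | 27; // 8-byte argument follows
  out.writeBigUInt64BE(BigInt(argument), 1);
  return out;
}

for (const n of [0, 23, 24, 255, 65535, 4294967295, -1, -24, -256, -65536]) {
  console.log(n, encodeInt(n).toString('hex'));
}
```

Note how -24 still fits in a single byte (0x37) while -25 needs two, mirroring the 23/24 boundary on the unsigned side.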
CBOR Diagnostic Notation
While CBOR data is stored in binary format and is not human-readable, the CBOR specification defines a diagnostic notation that provides a human-readable representation of CBOR data. This notation is primarily used for:
- Documentation: Explaining CBOR data structures in specifications
- Debugging: Understanding the content of CBOR messages during development
- Testing: Writing test cases with readable CBOR data representations
Diagnostic Notation Examples
// Binary CBOR data (hex): 0x83010203
// Diagnostic notation: [1, 2, 3]
// Binary CBOR data (hex): 0xA26161016162820203
// Diagnostic notation: {"a": 1, "b": [2, 3]}
// Binary CBOR data with a tag (hex): 0xC11A5FEE6600
// Diagnostic notation: 1(1609459200) // Unix timestamp
// Binary CBOR data with a tag (hex): 0xD8207368747470733A2F2F6578616D706C652E636F6D
// Diagnostic notation: 32("https://example.com") // URI tag
Important Note
The diagnostic notation is NOT the actual CBOR format - it's just a human-readable way to represent what the binary CBOR data contains. When working with CBOR in applications, you're always dealing with the compact binary representation.
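To make the relationship between the two forms concrete, the sketch below walks a buffer of CBOR bytes and prints the corresponding diagnostic notation. It is an illustration under stated assumptions rather than a complete decoder: `toDiagnostic` is a hypothetical helper, it expects a Node.js `Buffer`, and it handles only definite-length items with arguments up to 32 bits and no floats.

```javascript
// Sketch only: `toDiagnostic` is a hypothetical helper, not a library API.
function toDiagnostic(bytes) {
  let pos = 0;

  // Returns the next `length` bytes and advances the cursor.
  function take(length) {
    if (pos + length > bytes.length) throw new Error('truncated input');
    const slice = bytes.subarray(pos, pos + length);
    pos += length;
    return slice;
  }

  // Reads the argument encoded by the additional information field.
  function readArgument(info) {
    if (info < 24) return info;
    const width = { 24: 1, 25: 2, 26: 4 }[info];
    if (!width) throw new Error(`additional info ${info} is not handled here`);
    return take(width).readUIntBE(0, width);
  }

  function readItem() {
    const initial = take(1)[0];
    const majorType = initial >> 5;
    const info = initial & 0x1f;
    if (majorType === 7) {
      const simple = { 20: 'false', 21: 'true', 22: 'null', 23: 'undefined' }[info];
      if (!simple) throw new Error('floats and other simple values are not handled here');
      return simple;
    }
    const arg = readArgument(info);
    switch (majorType) {
      case 0: return String(arg);
      case 1: return String(-1 - arg);
      case 2: return `h'${take(arg).toString('hex')}'`;
      case 3: return JSON.stringify(take(arg).toString('utf8'));
      case 4: {
        const items = [];
        for (let i = 0; i < arg; i++) items.push(readItem());
        return `[${items.join(', ')}]`;
      }
      case 5: {
        const pairs = [];
        for (let i = 0; i < arg; i++) {
          const key = readItem();
          pairs.push(`${key}: ${readItem()}`);
        }
        return `{${pairs.join(', ')}}`;
      }
      default: // major type 6: tag number followed by the tagged item
        return `${arg}(${readItem()})`;
    }
  }

  const text = readItem();
  if (pos !== bytes.length) throw new Error('trailing bytes after the first item');
  return text;
}

console.log(toDiagnostic(Buffer.from('83010203', 'hex')));
console.log(toDiagnostic(Buffer.from('a26161016162820203', 'hex')));
```

For real work, a tool such as the CBOR Playground or a full library is a better choice; the point here is only that diagnostic notation is derived from the bytes, not stored in them.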
CBOR vs Other Formats
Size Comparison Example
Let's compare the same data in different formats:
// JSON (59 bytes minified; more with the whitespace shown here)
{
"name": "Alice",
"age": 25,
"active": true,
"scores": [95, 87, 92]
}
// CBOR (40 bytes) - Diagnostic notation (human-readable representation)
{
"name": "Alice",
"age": 25,
"active": true,
"scores": [95, 87, 92]
}
// Actual CBOR binary data (hex):
// A4646E616D6565416C69636563616765181966616374697665F5667363...
// Protocol Buffers (≈20 bytes with schema)
// Requires .proto definition
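The sizes can be checked by encoding the sample object by hand. The sketch below is a deliberately small encoder covering only what this example needs (non-negative integers below 2^32, text strings, booleans, arrays, and plain objects); `head` and `encode` are hypothetical helpers, not a library API, and the JSON size is measured on the minified output of `JSON.stringify`.

```javascript
// Sketch only: `head` and `encode` are hypothetical helpers, not library APIs.
// Builds the initial byte plus argument for arguments below 2^32.
function head(majorType, n) {
  if (!Number.isInteger(n) || n < 0 || n > 0xffffffff) {
    throw new RangeError('argument out of range for this sketch');
  }
  const high = majorType << 5;
  if (n < 24) return Buffer.from([high | n]);
  if (n <= 0xff) return Buffer.from([high | 24, n]);
  if (n <= 0xffff) return Buffer.from([high | 25, n >>> 8, n & 0xff]);
  return Buffer.from([high | 26, n >>> 24, (n >>> 16) & 0xff, (n >>> 8) & 0xff, n & 0xff]);
}

function encode(value) {
  if (typeof value === 'boolean') return Buffer.from([value ? 0xf5 : 0xf4]);
  if (Number.isInteger(value) && value >= 0) return head(0, value);
  if (typeof value === 'string') {
    const utf8 = Buffer.from(value, 'utf8');
    return Buffer.concat([head(3, utf8.length), utf8]);
  }
  if (Array.isArray(value)) {
    return Buffer.concat([head(4, value.length), ...value.map((v) => encode(v))]);
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.entries(value);
    const parts = entries.flatMap(([k, v]) => [encode(k), encode(v)]);
    return Buffer.concat([head(5, entries.length), ...parts]);
  }
  throw new TypeError('type not handled in this sketch');
}

const data = { name: 'Alice', age: 25, active: true, scores: [95, 87, 92] };
const json = Buffer.from(JSON.stringify(data), 'utf8');
const cborBytes = encode(data);

console.log('JSON bytes:', json.length);      // 59
console.log('CBOR bytes:', cborBytes.length); // 40
console.log('CBOR hex:', cborBytes.toString('hex'));
```

Most of the saving comes from dropping quotes, colons, commas, and brackets: each string costs one length byte instead of two quote characters, and each container costs one header byte.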
Working with CBOR
Encoding Example (JavaScript)
const cbor = require('cbor');
// Data to encode
const data = {
name: "Alice",
age: 25,
active: true,
scores: [95, 87, 92],
timestamp: new Date()
};
// Encode to CBOR
const encoded = cbor.encode(data);
console.log('CBOR bytes:', encoded.length);
console.log('CBOR hex:', encoded.toString('hex'));
// Decode from CBOR
const decoded = cbor.decode(encoded);
console.log('Decoded:', decoded);
Encoding Example (Python)
import cbor2
import datetime
# Data to encode
data = {
'name': 'Alice',
'age': 25,
'active': True,
'scores': [95, 87, 92],
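# Use a timezone-aware datetime: cbor2 rejects naive datetimes unless a default timezone is configured
'timestamp': datetime.datetime.now(datetime.timezone.utc)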
}
# Encode to CBOR
encoded = cbor2.dumps(data)
print(f'CBOR bytes: {len(encoded)}')
print(f'CBOR hex: {encoded.hex()}')
# Decode from CBOR
decoded = cbor2.loads(encoded)
print(f'Decoded: {decoded}')
Streaming Example
const cbor = require('cbor');
const fs = require('fs');
// Create a CBOR encoder stream
const encoder = new cbor.Encoder();
const output = fs.createWriteStream('data.cbor');
encoder.pipe(output);
// Stream multiple objects
encoder.write({id: 1, name: "Alice"});
encoder.write({id: 2, name: "Bob"});
encoder.write({id: 3, name: "Charlie"});
encoder.end();
// Read back with a decoder stream once the file has been fully written
output.on('finish', () => {
  const decoder = new cbor.Decoder();
  const input = fs.createReadStream('data.cbor');
  input.pipe(decoder);
  decoder.on('data', (obj) => {
    console.log('Decoded object:', obj);
  });
});
CBOR Binary Format Structure
Basic Structure
CBOR uses a simple encoding scheme where each data item starts with an initial byte:
Initial Byte = Major Type (3 bits) + Additional Information (5 bits)
Bits:   7 6 5 | 4 3 2 1 0
        ------+-----------
        Major | Additional
        Type  | Information
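The split described above is a shift and a mask. The following sketch applies it to a few initial bytes; `splitInitialByte` and `MAJOR_TYPE_NAMES` are hypothetical names chosen for illustration.

```javascript
// Sketch only: `splitInitialByte` is a hypothetical helper, not a library API.
const MAJOR_TYPE_NAMES = [
  'unsigned integer', 'negative integer', 'byte string', 'text string',
  'array', 'map', 'tag', 'simple value or float',
];

function splitInitialByte(byte) {
  const majorType = byte >> 5;        // top 3 bits
  const additionalInfo = byte & 0x1f; // low 5 bits
  return { majorType, additionalInfo, name: MAJOR_TYPE_NAMES[majorType] };
}

for (const byte of [0x18, 0x64, 0x83, 0xa1, 0xf5]) {
  console.log(`0x${byte.toString(16)}`, splitInitialByte(byte));
}
```

Additional information values 0 to 23 are used directly as the argument, 24 to 27 announce a 1, 2, 4, or 8 byte argument, and 31 marks an indefinite-length item (or the break byte in major type 7).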
Encoding Examples
// Positive integer 42
// Major type 0, additional info 24 (1-byte follows)
0x18, 0x2A
// Text string "CBOR"
// Major type 3, length 4
0x64, 0x43, 0x42, 0x4F, 0x52
// Array [1, 2, 3]
// Major type 4, length 3, then elements
0x83, 0x01, 0x02, 0x03
// Map {"a": 1}
// Major type 5, length 1, then key-value pairs
0xA1, 0x61, 0x61, 0x01
Advanced CBOR Features
Semantic Tags
A tag wraps a data item with a registered number that tells the decoder how to interpret it:
const cbor = require('cbor');
// Common tags (Tagged is a class, so it must be constructed with `new`)
const taggedData = {
  // Tag 0: Standard date/time string
  datetime: new cbor.Tagged(0, "2023-12-25T10:30:00Z"),
  // Tag 1: Epoch-based date/time
  timestamp: new cbor.Tagged(1, 1703505000),
  // Tag 2: Positive bignum
  bigint: new cbor.Tagged(2, Buffer.from([0x01, 0x00, 0x00, 0x00, 0x00])),
  // Tag 33: Text string that is already base64url-encoded
  // (tag 21 applies to byte strings and only hints at a later base64url conversion)
  base64url: new cbor.Tagged(33, "SGVsbG8gV29ybGQ"),
  // Tag 32: URI
  uri: new cbor.Tagged(32, "https://example.com")
};
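On the wire, a tag is just a major type 6 head carrying the tag number, followed immediately by the item it applies to. The sketch below builds two tagged items by hand to show that layout; `head` and `tagged` are hypothetical helpers, not library APIs, and the encoder handles only arguments below 2^32.

```javascript
// Sketch only: `head` and `tagged` are hypothetical helpers, not library APIs.
// Builds the initial byte plus argument for arguments below 2^32.
function head(majorType, n) {
  if (!Number.isInteger(n) || n < 0 || n > 0xffffffff) {
    throw new RangeError('argument out of range for this sketch');
  }
  const high = majorType << 5;
  if (n < 24) return Buffer.from([high | n]);
  if (n <= 0xff) return Buffer.from([high | 24, n]);
  if (n <= 0xffff) return Buffer.from([high | 25, n >>> 8, n & 0xff]);
  return Buffer.from([high | 26, n >>> 24, (n >>> 16) & 0xff, (n >>> 8) & 0xff, n & 0xff]);
}

// A tagged item is the tag head (major type 6) followed by the encoded item.
function tagged(tagNumber, encodedItem) {
  return Buffer.concat([head(6, tagNumber), encodedItem]);
}

// Tag 1 wrapping an unsigned integer: 1(1609459200)
const timestamp = tagged(1, head(0, 1609459200));
console.log(timestamp.toString('hex')); // c11a5fee6600

// Tag 32 wrapping a text string: 32("https://example.com")
const uriText = Buffer.from('https://example.com', 'utf8');
const uri = tagged(32, Buffer.concat([head(3, uriText.length), uriText]));
console.log(uri.toString('hex'));
```

Because the tag is a prefix, a decoder that does not recognize a tag number can still skip over or pass through the enclosed item.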
Indefinite-Length Items
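Arrays, maps, byte strings, and text strings can also be encoded with an indefinite length. The item opens with additional information 31 and closes with a "break" byte (0xFF), so a sender can start transmitting before it knows the final size. The break is part of the framing, not an element of the array or map:
const cbor = require('cbor');
// Indefinite-length array, diagnostic notation: [_ 1, 2, 3, 4, 5]
// 0x9F opens the array, 0xFF (break) closes it
const indefiniteArray = Buffer.from('9f0102030405ff', 'hex');
console.log(cbor.decode(indefiniteArray));
// Indefinite-length map, diagnostic notation: {_ "key1": "value1", "key2": "value2"}
// 0xBF opens the map, 0xFF (break) closes it
const indefiniteMap = Buffer.from(
  'bf646b6579316676616c756531646b6579326676616c756532ff',
  'hex'
);
console.log(cbor.decode(indefiniteMap));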
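The wire framing for an indefinite-length array is small enough to write by hand: an opening byte 0x9F, the elements, then the break byte 0xFF. The sketch below emits that framing from a generator, so the element count never has to be known in advance. `encodeIndefiniteArray` and `readings` are hypothetical names, and to keep the example short it only accepts integers 0 to 23, each of which encodes as a single byte.

```javascript
// Sketch only: `encodeIndefiniteArray` is a hypothetical helper, not a library API.
function* encodeIndefiniteArray(values) {
  yield Buffer.from([0x9f]); // major type 4, additional information 31
  for (const value of values) {
    if (!Number.isInteger(value) || value < 0 || value > 23) {
      throw new RangeError('this sketch only encodes integers 0-23');
    }
    yield Buffer.from([value]); // major type 0, value in the initial byte
  }
  yield Buffer.from([0xff]); // break: closes the array
}

// A source whose length is not known up front.
function* readings() {
  yield 1;
  yield 2;
  yield 3;
  yield 4;
  yield 5;
}

const chunks = [...encodeIndefiniteArray(readings())];
console.log(Buffer.concat(chunks).toString('hex')); // 9f0102030405ff
```

In a real sender each yielded chunk could be written to a socket as soon as it is produced; collecting them into one buffer here is only for display.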
Use Cases and Applications
IoT and Constrained Devices
// Sensor data transmission
const sensorData = {
deviceId: "sensor-001",
temperature: 23.5,
humidity: 65.2,
battery: 87,
timestamp: Date.now()
};
// CBOR is ideal for IoT due to small size
const cborData = cbor.encode(sensorData);
// Transmit over LoRaWAN, NB-IoT, etc.
Web APIs
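// Express.js middleware for CBOR
app.use('/api/cbor', (req, res, next) => {
  if (req.headers['content-type'] !== 'application/cbor') {
    return next();
  }
  // Collect chunks and join them once, instead of re-allocating on every chunk
  const chunks = [];
  req.on('data', (chunk) => chunks.push(chunk));
  req.on('end', () => {
    try {
      req.body = cbor.decode(Buffer.concat(chunks));
    } catch (err) {
      // Malformed or truncated CBOR: reject the request instead of crashing
      return res.status(400).send('Invalid CBOR payload');
    }
    next();
  });
});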
// API endpoint
app.post('/api/cbor/data', (req, res) => {
// Process CBOR data
const result = processData(req.body);
// Respond with CBOR
res.setHeader('Content-Type', 'application/cbor');
res.send(cbor.encode(result));
});
Configuration Files
// config.cbor - Binary configuration
const fs = require('fs');
const cbor = require('cbor');
const config = {
server: {
host: "localhost",
port: 8080,
ssl: true
},
database: {
url: "mongodb://localhost:27017",
options: {
maxPoolSize: 10,
serverSelectionTimeoutMS: 5000
}
},
features: {
authentication: true,
logging: true,
metrics: false
}
};
// Save as CBOR
fs.writeFileSync('config.cbor', cbor.encode(config));
// Load CBOR config
const loadedConfig = cbor.decode(fs.readFileSync('config.cbor'));
Performance Considerations
Encoding Performance
const Benchmark = require('benchmark');
const cbor = require('cbor');
const suite = new Benchmark.Suite();
const testData = {
users: Array.from({length: 1000}, (_, i) => ({
id: i,
name: `User ${i}`,
email: `user${i}@example.com`,
active: i % 2 === 0,
scores: [Math.random() * 100, Math.random() * 100]
}))
};
suite
.add('JSON.stringify', () => {
JSON.stringify(testData);
})
.add('CBOR.encode', () => {
cbor.encode(testData);
})
.on('complete', function() {
console.log('Fastest is ' + this.filter('fastest').map('name'));
})
.run();
Memory Usage
// Memory-efficient streaming for large datasets
const stream = require('stream');
const cbor = require('cbor');
class CBORProcessor extends stream.Transform {
constructor() {
super({ objectMode: true });
}
_transform(chunk, encoding, callback) {
try {
// Process each CBOR object
const processed = this.processObject(chunk);
this.push(cbor.encode(processed));
callback();
} catch (error) {
callback(error);
}
}
processObject(obj) {
// Your processing logic here
return obj;
}
}
Best Practices
1. Choose Appropriate Data Types
// Good: Use appropriate numeric types
const data = {
count: 42, // Small integer
price: 19.99, // Float
id: BigInt(123456789012345) // Big integer
};
// Avoid: Everything as strings
const badData = {
count: "42", // Should be number
price: "19.99", // Should be number
id: "123456789012345" // Could be BigInt
};
2. Use Semantic Tags
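// Good: Use tags to mark values that carry special meaning
const eventData = {
  eventId: "evt-123",
  // Tag 1: epoch-based date/time, in seconds
  timestamp: new cbor.Tagged(1, Math.floor(Date.now() / 1000)),
  // Tag 32: URI
  location: new cbor.Tagged(32, "https://maps.example.com/location/123"),
  // Tag 21: raw bytes that a JSON converter should render as base64url
  metadata: new cbor.Tagged(21, metadataBuffer)
};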
3. Handle Errors Gracefully
function safeCBORDecode(buffer) {
try {
return cbor.decode(buffer);
} catch (error) {
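// The message text matched here is library- and version-specific;
// confirm what your decoder actually throws for truncated input.
if (error.message.includes('Unexpected end of CBOR data')) {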
console.error('Incomplete CBOR data received');
return null;
}
throw error;
}
}
Common Pitfalls
1. Indefinite Length Confusion
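// Wrong: The break marker is framing, not an array element
const wrongArray = [cbor.BREAK, 1, 2, 3]; // Don't do this
// Right: Let the encoder emit the indefinite-length framing (0x9F ... 0xFF).
// With the `cbor` package, one documented approach is to attach
// Encoder.encodeIndefinite as the value's encodeCBOR method
// (verify this against the docs for your library version):
const rightArray = [1, 2, 3];
rightArray.encodeCBOR = cbor.Encoder.encodeIndefinite;
const encoded = cbor.encodeOne(rightArray);
// Note: Canonical (deterministic) encoding forbids indefinite lengths,
// so the two cannot be combined.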
2. Tag Misuse
// Wrong: Using wrong tag for data type
const wrongDate = new cbor.Tagged(2, "2023-12-25"); // Tag 2 is for bignums
// Right: Correct tag for date
const rightDate = new cbor.Tagged(0, "2023-12-25T00:00:00Z");
Conclusion
CBOR is an excellent choice for applications that need:
- Compact binary encoding without schema requirements
- Rich data type support beyond JSON's limitations
- Fast parsing in constrained environments
- Self-describing format for flexible data exchange
- Streaming capabilities for large datasets
While Protocol Buffers might be more efficient for high-performance applications with stable schemas, CBOR offers a great balance of efficiency, flexibility, and ease of use, making it ideal for IoT, web APIs, and configuration files.
Further Reading
- RFC 8949: Concise Binary Object Representation (CBOR)
- CBOR.io - Official website
- CBOR Playground - Online CBOR encoder/decoder
- IANA CBOR Tags Registry