JA4H: HTTP Client Fingerprinting
Introduction
JA4H (JA4 HTTP) is a network fingerprinting technique that identifies and classifies HTTP clients based on the structure and ordering of their HTTP request headers, rather than relying on easily-modified header values. This makes JA4H particularly effective for detecting malicious tools, bots, and automated clients that attempt to masquerade as legitimate browsers.
Key Advantage: JA4H operates on the structural patterns of HTTP requests, so value-level evasion such as randomizing the User-Agent string does not change the fingerprint. Adding, removing, or reordering headers does change it, and an unexpected change of that kind is itself a useful signal.
Skill Level: Intermediate
Prerequisites:
- Understanding of HTTP protocol basics
- Familiarity with network packet capture
- Basic Python programming knowledge
- Understanding of JA4 fundamentals (TLS fingerprinting)
Learning Objectives:
- Understand HTTP header structure and ordering
- Construct JA4H fingerprints from HTTP requests
- Implement JA4H fingerprinting in Python
- Detect anomalous HTTP clients
- Integrate JA4H with security tools
Why JA4H Matters
Traditional HTTP client identification relies on:
- User-Agent strings - Easily spoofed
- IP addresses - Can be rotated or proxied
- Cookie values - Session-specific and changeable
JA4H solves these problems by fingerprinting:
- Header order (which headers appear in what sequence)
- Header presence (which headers are included)
- Request method and HTTP version
- Specific header characteristics (Cookie, Referer, Accept-Language)
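As a rough illustration of why structure is harder to spoof than values, the sketch below reduces two requests to their ordered header names. The helper name `header_structure` and both sample header blocks are assumptions made for this example, not part of any JA4H tooling.

```python
def header_structure(raw_headers):
    """Return the ordered header names, ignoring the values entirely."""
    names = []
    for line in raw_headers.strip().splitlines():
        if ':' in line:
            names.append(line.split(':', 1)[0].strip())
    return names


# Hypothetical client sending its default User-Agent
original = """Host: example.com
User-Agent: python-requests/2.31.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive"""

# Same client, but with a spoofed browser-like User-Agent value
spoofed = """Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive"""

# The values differ, but the structure (names and order) is identical
print(header_structure(original) == header_structure(spoofed))  # True
```

Because only names and order are compared, the spoofed value has no effect on the result.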
Real-World Applications
- Bot Detection: Identify automated tools pretending to be browsers
- Malware C2: Detect command-and-control traffic patterns
- API Abuse: Identify unauthorized API clients
- Threat Hunting: Find malicious HTTP clients in network logs
- Compliance: Ensure only approved HTTP clients access services
Understanding JA4H Components
The JA4H Fingerprint Format
<method><version><cookie><referer><header_count><lang>_<header_hash>
Example: ge11cr08en_53b50f4ec784
Component Breakdown
1. HTTP Method (2 chars)
Abbreviated HTTP request method:
- ge = GET
- po = POST
- pu = PUT
- de = DELETE
- he = HEAD
- op = OPTIONS
- pa = PATCH
- co = CONNECT
- tr = TRACE
2. HTTP Version (2 chars)
Protocol version used:
- 10 = HTTP/1.0
- 11 = HTTP/1.1
- 20 = HTTP/2
- 30 = HTTP/3
3. Cookie Presence (1 char)
Indicates if Cookie header exists:
- c = Cookie present
- n = No cookie
4. Referer Presence (1 char)
Indicates if Referer header exists:
- r = Referer present
- n = No referer
5. Header Count (2 chars)
Total number of headers (00-99):
- 10 = 10 headers
- 15 = 15 headers
- Max is 99 (if more than 99, use 99)
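The two-digit, capped count described above can be sketched in one line. The function name `header_count_field` is a hypothetical helper introduced only for this example.

```python
def header_count_field(header_names):
    """Format the header count as two digits, capped at 99."""
    return f"{min(len(header_names), 99):02d}"


print(header_count_field(['Host', 'Accept']))  # 02
print(header_count_field(['X-Test'] * 150))    # 99
```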
6. Accept-Language (2 chars)
First 2 characters of Accept-Language value:
- en = English (en-US, en-GB, etc.)
- fr = French
- de = German
- 00 = No Accept-Language header
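A minimal sketch of the language field, assuming the value is lowercased and that a missing or empty header maps to 00 as listed above. The helper name `language_field` and the zero-padding of one-character values are assumptions for illustration.

```python
def language_field(accept_language):
    """Return the 2-character language code, or '00' when the header is absent."""
    if not accept_language:
        return '00'
    # Lowercase for consistency; pad short values so the field stays 2 chars wide
    return accept_language.strip()[:2].lower().ljust(2, '0')


print(language_field('en-US,en;q=0.9'))  # en
print(language_field('FR-fr'))           # fr
print(language_field(None))              # 00
```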
7. Header Hash (12 chars)
SHA-256 hash of header names in order (first 12 chars):
- Concatenate all header names in the order they appear, separated by commas
- Compute SHA-256 hash
- Take first 12 hexadecimal characters
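To make the order sensitivity concrete, the sketch below hashes the same four header names in two different orders. It assumes the comma-joined form used by the Python implementation in this guide; the helper name `header_hash` and the sample orderings are illustrative.

```python
import hashlib


def header_hash(header_names):
    """First 12 hex characters of SHA-256 over the comma-joined header names."""
    joined = ','.join(header_names)
    return hashlib.sha256(joined.encode()).hexdigest()[:12]


# Same set of headers, different order
order_a = ['Host', 'Connection', 'User-Agent', 'Accept']
order_b = ['Host', 'User-Agent', 'Accept', 'Connection']

hash_a = header_hash(order_a)
hash_b = header_hash(order_b)

print(len(hash_a))       # 12
print(hash_a != hash_b)  # True: reordering changes the hash
```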
Step-by-Step: Constructing a JA4H Fingerprint
Example HTTP Request
GET /api/data HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
Accept: application/json, text/plain, */*
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br
Referer: https://example.com/
Cookie: session=abc123; user_id=456
Connection: keep-alive
Step 1: Extract Method and Version
- Method: GET → ge
- Version: HTTP/1.1 → 11
Step 2: Check for Cookie and Referer
- Cookie: Present → c
- Referer: Present → r
Step 3: Count Headers
Headers present:
- Host
- User-Agent
- Accept
- Accept-Language
- Accept-Encoding
- Referer
- Cookie
- Connection
Count: 8 → 08
Step 4: Extract Accept-Language
- Accept-Language: en-US,en;q=0.9
- First 2 chars: en
Step 5: Compute Header Hash
Header names in order:
Host,User-Agent,Accept,Accept-Language,Accept-Encoding,Referer,Cookie,Connection
Python Implementation:
import hashlib


def compute_ja4h_header_hash(headers_list):
    """
    Compute JA4H header hash.

    Args:
        headers_list: List of header names in order

    Returns:
        First 12 characters of SHA-256 hash
    """
    # Join header names with commas
    header_string = ','.join(headers_list)
    # Compute SHA-256 hash
    hash_value = hashlib.sha256(header_string.encode()).hexdigest()
    # Return first 12 characters
    return hash_value[:12]


# Example
headers = ['Host', 'User-Agent', 'Accept', 'Accept-Language',
           'Accept-Encoding', 'Referer', 'Cookie', 'Connection']
header_hash = compute_ja4h_header_hash(headers)
print(f"Header Hash: {header_hash}")  # Example: 53b50f4ec784
Step 6: Assemble the Fingerprint
ge11cr08en_53b50f4ec784
Breakdown:
- ge = GET method
- 11 = HTTP/1.1
- c = Cookie present
- r = Referer present
- 08 = 8 headers
- en = English language
- 53b50f4ec784 = Header hash
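Going the other way, a fingerprint in this layout can be split back into its fields by position. The sketch below assumes the fixed-width prefix described in this guide (10 characters, then an underscore, then the hash); `decode_ja4h` is a hypothetical helper name.

```python
from typing import Dict


def decode_ja4h(fingerprint: str) -> Dict[str, object]:
    """Split a fingerprint of the form described above into named fields."""
    prefix, _, header_hash = fingerprint.partition('_')
    if len(prefix) != 10 or not header_hash:
        raise ValueError(f"Unexpected JA4H layout: {fingerprint!r}")
    return {
        'method': prefix[0:2],
        'version': prefix[2:4],
        'has_cookie': prefix[4] == 'c',
        'has_referer': prefix[5] == 'r',
        'header_count': int(prefix[6:8]),
        'language': prefix[8:10],
        'header_hash': header_hash,
    }


fields = decode_ja4h('ge11cr08en_53b50f4ec784')
print(fields['method'], fields['header_count'], fields['has_cookie'])  # ge 8 True
```

Decoding by position is convenient for filtering logs, for example selecting every request that carried no Cookie and no Referer.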
Complete Python Implementation
import hashlib
from typing import Dict, List


class JA4H:
    """JA4H HTTP Client Fingerprinting Implementation"""

    # HTTP method abbreviations
    METHOD_MAP = {
        'GET': 'ge',
        'POST': 'po',
        'PUT': 'pu',
        'DELETE': 'de',
        'HEAD': 'he',
        'OPTIONS': 'op',
        'PATCH': 'pa',
        'CONNECT': 'co',
        'TRACE': 'tr'
    }

    # HTTP version mapping
    VERSION_MAP = {
        'HTTP/1.0': '10',
        'HTTP/1.1': '11',
        'HTTP/2': '20',
        'HTTP/2.0': '20',
        'HTTP/3': '30',
        'HTTP/3.0': '30'
    }

    @staticmethod
    def compute_fingerprint(
        method: str,
        version: str,
        headers: List[str],
        header_values: Dict[str, str]
    ) -> str:
        """
        Compute complete JA4H fingerprint.

        Args:
            method: HTTP method (GET, POST, etc.)
            version: HTTP version (HTTP/1.1, HTTP/2, etc.)
            headers: List of header names in order they appear
            header_values: Dict of header name: value pairs

        Returns:
            JA4H fingerprint string
        """
        # Method abbreviation ('xx' for unknown methods)
        method_abbr = JA4H.METHOD_MAP.get(method.upper(), 'xx')
        # Version code ('00' for unknown versions)
        version_code = JA4H.VERSION_MAP.get(version.upper(), '00')
        # Header names are case-insensitive (and lowercase in HTTP/2),
        # so compare on lowercased names
        lower_names = {h.lower() for h in headers}
        # Cookie presence
        cookie_flag = 'c' if 'cookie' in lower_names else 'n'
        # Referer presence
        referer_flag = 'r' if 'referer' in lower_names else 'n'
        # Header count (max 99)
        header_count = min(len(headers), 99)
        header_count_str = f"{header_count:02d}"
        # Accept-Language first 2 chars (case-insensitive lookup)
        lower_values = {k.lower(): v for k, v in header_values.items()}
        accept_lang = lower_values.get('accept-language', '')
        if accept_lang:
            # Pad so the field is always 2 characters wide
            lang_code = accept_lang[:2].lower().ljust(2, '0')
        else:
            lang_code = '00'
        # Compute header hash
        header_string = ','.join(headers)
        header_hash = hashlib.sha256(header_string.encode()).hexdigest()[:12]
        # Assemble fingerprint
        fingerprint = (
            f"{method_abbr}{version_code}{cookie_flag}{referer_flag}"
            f"{header_count_str}{lang_code}_{header_hash}"
        )
        return fingerprint

    @staticmethod
    def parse_http_request(request_text: str) -> Dict:
        """
        Parse HTTP request text and extract components.

        Args:
            request_text: Raw HTTP request as string

        Returns:
            Dict with method, version, headers, and header_values
        """
        # splitlines() handles both LF and CRLF line endings
        lines = request_text.strip().splitlines()
        # Parse request line
        request_line = lines[0].strip() if lines else ''
        parts = request_line.split()
        method = parts[0] if len(parts) > 0 else 'GET'
        version = parts[2] if len(parts) > 2 else 'HTTP/1.1'
        # Parse headers
        headers = []
        header_values = {}
        for line in lines[1:]:
            # A blank line ends the header section; what follows is the body
            if not line.strip():
                break
            if ':' in line:
                name, value = line.split(':', 1)
                name = name.strip()
                value = value.strip()
                headers.append(name)
                header_values[name] = value
        return {
            'method': method,
            'version': version,
            'headers': headers,
            'header_values': header_values
        }


# Example Usage
if __name__ == "__main__":
    # Sample HTTP request
    http_request = """GET /api/data HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
Accept: application/json, text/plain, */*
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br
Referer: https://example.com/
Cookie: session=abc123; user_id=456
Connection: keep-alive"""
    # Parse request
    parsed = JA4H.parse_http_request(http_request)
    # Compute fingerprint
    ja4h_fp = JA4H.compute_fingerprint(
        parsed['method'],
        parsed['version'],
        parsed['headers'],
        parsed['header_values']
    )
    print(f"JA4H Fingerprint: {ja4h_fp}")
    # Output has the form ge11cr08en_<12 hex chars>,
    # e.g. ge11cr08en_53b50f4ec784 (example hash value)
Capturing HTTP Traffic
Using Wireshark
- Start Capture: Select network interface
- Filter: http or tcp.port == 80
- Follow Stream: Right-click packet → Follow → HTTP Stream
- Extract Headers: Copy request headers
Using tcpdump
# Capture HTTP traffic to a file (-A has no effect when writing with -w)
sudo tcpdump -i eth0 'tcp port 80' -w http_capture.pcap
# Read capture and print packet payloads as ASCII
tcpdump -r http_capture.pcap -A
Using Python + Scapy
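from scapy.all import sniff, TCP, Raw

# Assumes the JA4H class from "Complete Python Implementation" is in scope

# Request-line prefixes; the trailing space avoids matching arbitrary payloads
HTTP_METHODS = ('GET ', 'POST ', 'PUT ', 'DELETE ', 'HEAD ',
                'OPTIONS ', 'PATCH ', 'CONNECT ', 'TRACE ')


def process_http_packet(packet):
    """Extract HTTP headers from packet"""
    if not (packet.haslayer(TCP) and packet.haslayer(Raw)):
        return
    payload = packet[Raw].load.decode('utf-8', errors='ignore')
    # Check if HTTP request
    if not payload.startswith(HTTP_METHODS):
        return
    print("="*60)
    print("HTTP Request Captured:")
    print(payload[:500])  # Print first 500 chars
    print("="*60)
    try:
        # Parse and fingerprint
        parsed = JA4H.parse_http_request(payload)
        ja4h = JA4H.compute_fingerprint(
            parsed['method'],
            parsed['version'],
            parsed['headers'],
            parsed['header_values']
        )
        print(f"JA4H Fingerprint: {ja4h}\n")
    except Exception as e:
        # Report malformed requests instead of silently dropping them
        print(f"Could not fingerprint request: {e}")


# Capture HTTP traffic. No TCP reassembly is done here, so only requests
# whose headers fit in a single segment are fingerprinted correctly.
print("Capturing HTTP traffic... Press Ctrl+C to stop")
sniff(filter="tcp port 80", prn=process_http_packet, store=0)
Practical Applications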
1. Bot Detection
Different HTTP clients have distinct header orderings:
Browser (Chrome):
Host, Connection, User-Agent, Accept, Accept-Encoding, Accept-Language
Python Requests Library:
Host, User-Agent, Accept-Encoding, Accept, Connection
cURL:
Host, User-Agent, Accept
2. Malware Identification
Many malware families use consistent HTTP patterns:
# Known malware fingerprints database
# (placeholder values for illustration, not real signatures)
MALWARE_FINGERPRINTS = {
    'ge11nn05en_a1b2c3d4e5f6': 'Cobalt Strike',
    'po11cn08en_f6e5d4c3b2a1': 'Metasploit',
    'ge10nn03en_9876543210ab': 'Custom Malware'
}


def check_malware(ja4h_fingerprint):
    """Check if fingerprint matches known malware"""
    # dict.get returns None when the fingerprint is not in the database
    return MALWARE_FINGERPRINTS.get(ja4h_fingerprint)


# Example
fp = "ge11nn05en_a1b2c3d4e5f6"
malware_name = check_malware(fp)
if malware_name:
    print(f"ALERT: Detected {malware_name} traffic!")
3. API Client Validation
Ensure only authorized clients access your API:
# Whitelist of approved API clients (placeholder values)
APPROVED_FINGERPRINTS = [
    'ge11cr10en_authorized1',
    'po11cr12en_authorized2',
    'ge20cr08en_authorized3'
]


def validate_api_client(ja4h_fingerprint):
    """Validate API client against whitelist"""
    return ja4h_fingerprint in APPROVED_FINGERPRINTS


# In API middleware
def api_request_handler(request):
    # compute_ja4h_from_request and process_request are application-specific
    # helpers that are not defined in this guide
    ja4h = compute_ja4h_from_request(request)
    if not validate_api_client(ja4h):
        return {"error": "Unauthorized client"}, 403
    # Process authorized request
    return process_request(request)
Integration with Security Tools
Zeek/Bro Integration
# JA4H extraction in Zeek
# ja4h_headers is not a built-in field, so add it to the HTTP record first
redef record HTTP::Info += {
    ja4h_headers: string &optional;
};

event http_header(c: connection, is_orig: bool, name: string, value: string)
    {
    if ( is_orig && c?$http )  # Client headers
        {
        # Store headers in order (Zeek passes name in upper case)
        if ( c$http?$ja4h_headers )
            c$http$ja4h_headers += "," + name;
        else
            c$http$ja4h_headers = name;
        }
    }

event http_message_done(c: connection, is_orig: bool, stat: http_message_stat)
    {
    if ( is_orig && c?$http && c$http?$ja4h_headers )
        {
        # Compute JA4H fingerprint
        # compute_ja4h is a placeholder for a user-defined function
        local ja4h = compute_ja4h(c$http$method, c$http$ja4h_headers);
        print fmt("JA4H: %s for %s", ja4h, c$id$orig_h);
        }
    }
Suricata Rule Example
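# The metadata keyword only tags the alert; it does not match on JA4H.
# As written, this rule alerts on every GET request.
alert http any any -> any any (msg:"Suspicious JA4H Fingerprint"; \
    http.method; content:"GET"; \
    metadata:ja4h ge11nn03en_malicious123; \
    classtype:trojan-activity; \
    sid:1000001; rev:1;)
Elasticsearch Storage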
from elasticsearch import Elasticsearch


def store_ja4h_in_elasticsearch(fingerprint_data):
    """Store JA4H fingerprint in Elasticsearch"""
    es = Elasticsearch(['http://localhost:9200'])
    doc = {
        'timestamp': fingerprint_data['timestamp'],
        'source_ip': fingerprint_data['source_ip'],
        'destination': fingerprint_data['destination'],
        'ja4h': fingerprint_data['ja4h'],
        'method': fingerprint_data['method'],
        'user_agent': fingerprint_data['user_agent'],
        'threat_score': fingerprint_data.get('threat_score', 0)
    }
    es.index(index='ja4h-fingerprints', document=doc)
Advanced Techniques
1. Temporal Analysis
Track how fingerprints change over time:
from collections import defaultdict
from datetime import datetime


class JA4HTracker:
    def __init__(self):
        self.fingerprints = defaultdict(list)

    def track(self, ip_address, ja4h, timestamp=None):
        """Track fingerprints per IP over time"""
        if timestamp is None:
            timestamp = datetime.now()
        self.fingerprints[ip_address].append({
            'ja4h': ja4h,
            'timestamp': timestamp
        })

    def detect_fingerprint_switching(self, ip_address, time_window_seconds=300):
        """Detect if an IP switches fingerprints rapidly"""
        # .get avoids creating empty entries for unknown IPs
        entries = self.fingerprints.get(ip_address, [])
        if len(entries) < 2:
            return False
        # Check for multiple unique fingerprints in time window.
        # total_seconds() is used because .seconds ignores whole days.
        now = datetime.now()
        recent = [e for e in entries
                  if (now - e['timestamp']).total_seconds() < time_window_seconds]
        unique_fps = set(e['ja4h'] for e in recent)
        return len(unique_fps) > 3  # More than 3 different fingerprints is suspicious


# Usage
tracker = JA4HTracker()
tracker.track('192.168.1.100', 'ge11cr08en_abc123')
tracker.track('192.168.1.100', 'po11cr10en_def456')
tracker.track('192.168.1.100', 'ge20cr05en_ghi789')
tracker.track('192.168.1.100', 'de11nn03en_jkl012')
if tracker.detect_fingerprint_switching('192.168.1.100'):
    print("ALERT: Rapid fingerprint switching detected!")
2. Machine Learning Classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
import pandas as pd

CATEGORICAL_COLS = ['method', 'version', 'has_cookie', 'has_referer', 'language']
FEATURE_COLS = ['method', 'version', 'has_cookie', 'has_referer',
                'header_count', 'language']


def extract_features(ja4h):
    """Split the fixed-width JA4H prefix into raw feature values"""
    return {
        'method': ja4h[:2],
        'version': ja4h[2:4],
        'has_cookie': ja4h[4:5],
        'has_referer': ja4h[5:6],
        'header_count': int(ja4h[6:8]),
        'language': ja4h[8:10]
    }


def train_ja4h_classifier(training_data):
    """
    Train ML model to classify HTTP clients.

    training_data format:
    [
        {'ja4h': 'ge11cr08en_abc123', 'label': 'legitimate'},
        {'ja4h': 'po11nn03en_def456', 'label': 'malware'},
        ...
    ]
    """
    # Extract features from JA4H
    df = pd.DataFrame([extract_features(row['ja4h']) for row in training_data])
    # Encode each categorical feature with its own encoder so the same
    # mapping can be reused at prediction time
    feature_encoders = {}
    for col in CATEGORICAL_COLS:
        feature_encoders[col] = LabelEncoder()
        df[col] = feature_encoders[col].fit_transform(df[col])
    # Encode labels with a separate encoder
    label_encoder = LabelEncoder()
    y = label_encoder.fit_transform([row['label'] for row in training_data])
    # Train model
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(df[FEATURE_COLS], y)
    return model, feature_encoders, label_encoder


# Example: Predict new fingerprint
def classify_ja4h(model, feature_encoders, label_encoder, ja4h):
    """Classify a JA4H fingerprint"""
    features = extract_features(ja4h)
    # transform() raises ValueError for values not seen during training
    for col in CATEGORICAL_COLS:
        features[col] = feature_encoders[col].transform([features[col]])[0]
    row = pd.DataFrame([features])[FEATURE_COLS]
    prediction = model.predict(row)
    return label_encoder.inverse_transform(prediction)[0]
Common Fingerprint Examples
Popular Web Browsers
The fingerprints in this section are illustrative placeholders; real values vary with client version and request context.
Chrome/Chromium:
ge11cr15en_a1b2c3d4e5f6
- GET, HTTP/1.1, Cookie, Referer, 15 headers, English
Firefox:
ge11cr13en_f6e5d4c3b2a1
- GET, HTTP/1.1, Cookie, Referer, 13 headers, English
Safari:
ge11cr14en_9876543210ab
- GET, HTTP/1.1, Cookie, Referer, 14 headers, English
Common HTTP Libraries
Python Requests:
ge11nn05en_abc123def456
- GET, HTTP/1.1, No cookie, No referer, 5 headers, English
cURL:
ge11nn03en_fedcba987654
- GET, HTTP/1.1, No cookie, No referer, 3 headers, English
Wget:
ge10nn04en_147258369abc
- GET, HTTP/1.0, No cookie, No referer, 4 headers, English
Troubleshooting
Common Issues
Issue 1: Inconsistent fingerprints from same client
- Cause: Dynamic header insertion (e.g., A/B testing, feature flags)
- Solution: Focus on stable headers, create fingerprint variants
Issue 2: Too many unique fingerprints
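- Cause: Header names embed timestamps or session IDs, so the header hash changes on every request
- Solution: Exclude the variable headers before hashing; the raw JA4H_r variant, which shows the unhashed header list, helps identify them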
Issue 3: False positives
- Cause: Legitimate clients with unusual configurations
- Solution: Build comprehensive baseline, use whitelisting
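For Issue 1, one way to "focus on stable headers" is to drop headers known to come and go before hashing. The sketch below assumes a hand-maintained exclusion set; the header names in `VOLATILE_HEADERS` and the function `stable_header_hash` are hypothetical and would need to reflect what is actually observed in your traffic.

```python
import hashlib

# Hypothetical headers observed to appear only on some requests
VOLATILE_HEADERS = {'x-experiment-id', 'x-feature-flags'}


def stable_header_hash(header_names, volatile=VOLATILE_HEADERS):
    """Hash only the headers that are not in the volatile set."""
    stable = [h for h in header_names if h.lower() not in volatile]
    return hashlib.sha256(','.join(stable).encode()).hexdigest()[:12]


with_flag = ['Host', 'User-Agent', 'X-Experiment-Id', 'Accept']
without_flag = ['Host', 'User-Agent', 'Accept']

# Both requests collapse to the same variant hash
print(stable_header_hash(with_flag) == stable_header_hash(without_flag))  # True
```

The trade-off is that an excluded header can no longer distinguish clients, so the exclusion set is best kept small.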
Best Practices
- Baseline Normal Traffic
  - Capture fingerprints from known-good clients
  - Document expected fingerprints for each service
  - Update baseline as client software updates
- Combine with Other Signals
  - Use JA4H + JA4 (TLS) for stronger identification
  - Cross-reference with IP reputation
  - Check User-Agent string consistency
- Monitor for Changes
  - Alert on new fingerprints
  - Track fingerprint evolution over time
  - Investigate rapid switching between fingerprints
- Privacy Considerations
  - JA4H can uniquely identify clients
  - Implement data retention policies
  - Follow applicable privacy regulations
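The baselining and "alert on new fingerprints" practices can be sketched with a simple counter. The class name `FingerprintBaseline` and the sample fingerprints are assumptions for illustration; a production system would persist the baseline rather than hold it in memory.

```python
from collections import Counter


class FingerprintBaseline:
    """Track known fingerprints and flag ones not seen before."""

    def __init__(self, known_good):
        # Seed the baseline with fingerprints captured from known-good clients
        self.counts = Counter(known_good)

    def observe(self, ja4h):
        """Record a fingerprint; return True if it was not in the baseline."""
        is_new = ja4h not in self.counts
        self.counts[ja4h] += 1
        return is_new


baseline = FingerprintBaseline([
    'ge11cr08en_aaaaaaaaaaaa',  # placeholder fingerprints
    'po11cr10en_bbbbbbbbbbbb',
])

print(baseline.observe('ge11cr08en_aaaaaaaaaaaa'))  # False: already baselined
print(baseline.observe('ge11nn03en_cccccccccccc'))  # True: new fingerprint
print(baseline.observe('ge11nn03en_cccccccccccc'))  # False: alerted once already
```

Alerting only on first sight keeps the noise down while the per-fingerprint counts remain available for tracking how the baseline evolves.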
Key Takeaways
✅ JA4H fingerprints HTTP clients by header structure, not content
✅ Resistant to simple evasion (spoofing the User-Agent value does not change the fingerprint)
✅ Effective for bot detection and malware identification
✅ Combine with JA4 (TLS) for robust client identification
✅ Implement proper baseline and whitelist management
✅ Useful for API security and threat hunting
Next Steps
- Practice: Capture and fingerprint your own HTTP traffic
- Experiment: Try spoofing and see what changes the fingerprint
- Integrate: Add JA4H to your security monitoring pipeline
- Advanced: Combine with JA4T (TCP) and JA4SSH for multi-layer analysis
- Contribute: Share fingerprints with the community
Related Techniques: