Quick-Labs
JA4
JA4H - HTTP Client Fingerprinting

JA4H: HTTP Client Fingerprinting

Introduction

JA4H (JA4 HTTP) is a network fingerprinting technique that identifies and classifies HTTP clients based on the structure and ordering of their HTTP request headers, rather than relying on easily-modified header values. This makes JA4H particularly effective for detecting malicious tools, bots, and automated clients that attempt to masquerade as legitimate browsers.

Key Advantage: JA4H operates on the structural patterns of HTTP requests, making it resistant to simple evasion techniques like randomizing User-Agent strings or adding/removing individual headers.

Skill Level: Intermediate

Prerequisites:

  • Understanding of HTTP protocol basics
  • Familiarity with network packet capture
  • Basic Python programming knowledge
  • Understanding of JA4 fundamentals (TLS fingerprinting)

Learning Objectives:

  • Understand HTTP header structure and ordering
  • Construct JA4H fingerprints from HTTP requests
  • Implement JA4H fingerprinting in Python
  • Detect anomalous HTTP clients
  • Integrate JA4H with security tools

Why JA4H Matters

Traditional HTTP client identification relies on:

  • User-Agent strings - Easily spoofed
  • IP addresses - Can be rotated or proxied
  • Cookie values - Session-specific and changeable

JA4H solves these problems by fingerprinting:

  • Header order (which headers appear in what sequence)
  • Header presence (which headers are included)
  • Request method and HTTP version
  • Specific header characteristics (Cookie, Referer, Accept-Language)

Real-World Applications

  1. Bot Detection: Identify automated tools pretending to be browsers
  2. Malware C2: Detect command-and-control traffic patterns
  3. API Abuse: Identify unauthorized API clients
  4. Threat Hunting: Find malicious HTTP clients in network logs
  5. Compliance: Ensure only approved HTTP clients access services

Understanding JA4H Components

The JA4H Fingerprint Format

<method><version><cookie><referer><header_count><lang>_<header_hash>

Example: ge11c110en_53b50f4ec784

Component Breakdown

1. HTTP Method (2 chars)

Abbreviated HTTP request method:

  • ge = GET
  • po = POST
  • pu = PUT
  • de = DELETE
  • he = HEAD
  • op = OPTIONS
  • pa = PATCH
  • co = CONNECT
  • tr = TRACE

2. HTTP Version (2 chars)

Protocol version used:

  • 10 = HTTP/1.0
  • 11 = HTTP/1.1
  • 20 = HTTP/2
  • 30 = HTTP/3

Indicates if Cookie header exists:

  • c = Cookie present
  • n = No cookie

4. Referer Presence (1 char)

Indicates if Referer header exists:

  • r = Referer present
  • n = No referer

5. Header Count (2 chars)

Total number of headers (00-99):

  • 10 = 10 headers
  • 15 = 15 headers
  • Max is 99 (if more than 99, use 99)

6. Accept-Language (2 chars)

First 2 characters of Accept-Language value:

  • en = English (en-US, en-GB, etc.)
  • fr = French
  • de = German
  • 00 = No Accept-Language header

7. Header Hash (12 chars)

SHA-256 hash of header names in order (first 12 chars):

  • Concatenate all header names
  • Compute SHA-256 hash
  • Take first 12 hexadecimal characters

Step-by-Step: Constructing a JA4H Fingerprint

Example HTTP Request

GET /api/data HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
Accept: application/json, text/plain, */*
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br
Referer: https://example.com/
Cookie: session=abc123; user_id=456
Connection: keep-alive

Step 1: Extract Method and Version

  • Method: GET → ge
  • Version: HTTP/1.1 → 11
  • Cookie: Present → c
  • Referer: Present → r

Step 3: Count Headers

Headers present:

  1. Host
  2. User-Agent
  3. Accept
  4. Accept-Language
  5. Accept-Encoding
  6. Referer
  7. Cookie
  8. Connection

Count: 8 → 08

Step 4: Extract Accept-Language

  • Accept-Language: en-US,en;q=0.9
  • First 2 chars: en

Step 5: Compute Header Hash

Header names in order:

Host,User-Agent,Accept,Accept-Language,Accept-Encoding,Referer,Cookie,Connection

Python Implementation:

import hashlib
 
def compute_ja4h_header_hash(headers_list):
    """
    Compute JA4H header hash.
    
    Args:
        headers_list: List of header names in order
    
    Returns:
        First 12 characters of SHA-256 hash
    """
    # Join header names with commas
    header_string = ','.join(headers_list)
    
    # Compute SHA-256 hash
    hash_value = hashlib.sha256(header_string.encode()).hexdigest()
    
    # Return first 12 characters
    return hash_value[:12]
 
# Example
headers = ['Host', 'User-Agent', 'Accept', 'Accept-Language', 
           'Accept-Encoding', 'Referer', 'Cookie', 'Connection']
 
header_hash = compute_ja4h_header_hash(headers)
print(f"Header Hash: {header_hash}")  # Example: 53b50f4ec784

Step 6: Assemble the Fingerprint

ge11cr08en_53b50f4ec784

Breakdown:

  • ge = GET method
  • 11 = HTTP/1.1
  • c = Cookie present
  • r = Referer present
  • 08 = 8 headers
  • en = English language
  • 53b50f4ec784 = Header hash

Complete Python Implementation

import hashlib
from typing import List, Dict, Optional
from urllib.parse import urlparse
 
class JA4H:
    """JA4H HTTP Client Fingerprinting Implementation"""
    
    # HTTP method abbreviations
    METHOD_MAP = {
        'GET': 'ge',
        'POST': 'po',
        'PUT': 'pu',
        'DELETE': 'de',
        'HEAD': 'he',
        'OPTIONS': 'op',
        'PATCH': 'pa',
        'CONNECT': 'co',
        'TRACE': 'tr'
    }
    
    # HTTP version mapping
    VERSION_MAP = {
        'HTTP/1.0': '10',
        'HTTP/1.1': '11',
        'HTTP/2': '20',
        'HTTP/2.0': '20',
        'HTTP/3': '30',
        'HTTP/3.0': '30'
    }
    
    @staticmethod
    def compute_fingerprint(
        method: str,
        version: str,
        headers: List[str],
        header_values: Dict[str, str]
    ) -> str:
        """
        Compute complete JA4H fingerprint.
        
        Args:
            method: HTTP method (GET, POST, etc.)
            version: HTTP version (HTTP/1.1, HTTP/2, etc.)
            headers: List of header names in order they appear
            header_values: Dict of header name: value pairs
        
        Returns:
            JA4H fingerprint string
        """
        # Method abbreviation
        method_abbr = JA4H.METHOD_MAP.get(method.upper(), 'xx')
        
        # Version code
        version_code = JA4H.VERSION_MAP.get(version, '00')
        
        # Cookie presence
        cookie_flag = 'c' if 'Cookie' in headers or 'cookie' in headers else 'n'
        
        # Referer presence
        referer_flag = 'r' if 'Referer' in headers or 'referer' in headers else 'n'
        
        # Header count (max 99)
        header_count = min(len(headers), 99)
        header_count_str = f"{header_count:02d}"
        
        # Accept-Language first 2 chars
        accept_lang = header_values.get('Accept-Language', 
                                       header_values.get('accept-language', ''))
        if accept_lang:
            lang_code = accept_lang[:2].lower()
        else:
            lang_code = '00'
        
        # Compute header hash
        header_string = ','.join(headers)
        header_hash = hashlib.sha256(header_string.encode()).hexdigest()[:12]
        
        # Assemble fingerprint
        fingerprint = (
            f"{method_abbr}{version_code}{cookie_flag}{referer_flag}"
            f"{header_count_str}{lang_code}_{header_hash}"
        )
        
        return fingerprint
    
    @staticmethod
    def parse_http_request(request_text: str) -> Dict:
        """
        Parse HTTP request text and extract components.
        
        Args:
            request_text: Raw HTTP request as string
        
        Returns:
            Dict with method, version, headers, and header_values
        """
        lines = request_text.strip().split('\n')
        
        # Parse request line
        request_line = lines[0].strip()
        parts = request_line.split()
        method = parts[0] if len(parts) > 0 else 'GET'
        version = parts[2] if len(parts) > 2 else 'HTTP/1.1'
        
        # Parse headers
        headers = []
        header_values = {}
        
        for line in lines[1:]:
            if ':' in line:
                name, value = line.split(':', 1)
                name = name.strip()
                value = value.strip()
                headers.append(name)
                header_values[name] = value
        
        return {
            'method': method,
            'version': version,
            'headers': headers,
            'header_values': header_values
        }
 
# Example Usage
if __name__ == "__main__":
    # Sample HTTP request
    http_request = """GET /api/data HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
Accept: application/json, text/plain, */*
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br
Referer: https://example.com/
Cookie: session=abc123; user_id=456
Connection: keep-alive"""
    
    # Parse request
    parsed = JA4H.parse_http_request(http_request)
    
    # Compute fingerprint
    ja4h_fp = JA4H.compute_fingerprint(
        parsed['method'],
        parsed['version'],
        parsed['headers'],
        parsed['header_values']
    )
    
    print(f"JA4H Fingerprint: {ja4h_fp}")
    # Output: ge11cr08en_53b50f4ec784 (example)

Capturing HTTP Traffic

Using Wireshark

  1. Start Capture: Select network interface
  2. Filter: http or tcp.port == 80
  3. Follow Stream: Right-click packet → Follow → HTTP Stream
  4. Extract Headers: Copy request headers

Using tcpdump

# Capture HTTP traffic
sudo tcpdump -i eth0 -A 'tcp port 80' -w http_capture.pcap
 
# Read capture and extract HTTP
tcpdump -r http_capture.pcap -A

Using Python + Scapy

from scapy.all import sniff, TCP, Raw
import re
 
def process_http_packet(packet):
    """Extract HTTP headers from packet"""
    if packet.haslayer(TCP) and packet.haslayer(Raw):
        try:
            payload = packet[Raw].load.decode('utf-8', errors='ignore')
            
            # Check if HTTP request
            if payload.startswith(('GET', 'POST', 'PUT', 'DELETE', 'HEAD')):
                print("="*60)
                print("HTTP Request Captured:")
                print(payload[:500])  # Print first 500 chars
                print("="*60)
                
                # Parse and fingerprint
                parsed = JA4H.parse_http_request(payload)
                ja4h = JA4H.compute_fingerprint(
                    parsed['method'],
                    parsed['version'],
                    parsed['headers'],
                    parsed['header_values']
                )
                print(f"JA4H Fingerprint: {ja4h}\n")
        except Exception as e:
            pass
 
# Capture HTTP traffic
print("Capturing HTTP traffic... Press Ctrl+C to stop")
sniff(filter="tcp port 80", prn=process_http_packet, store=0)

Practical Applications

1. Bot Detection

Different HTTP clients have distinct header orderings:

Browser (Chrome):

Host, Connection, User-Agent, Accept, Accept-Encoding, Accept-Language

Python Requests Library:

Host, User-Agent, Accept-Encoding, Accept, Connection

cURL:

Host, User-Agent, Accept

2. Malware Identification

Many malware families use consistent HTTP patterns:

# Known malware fingerprints database
MALWARE_FINGERPRINTS = {
    'ge11nn05en_a1b2c3d4e5f6': 'Cobalt Strike',
    'po11cn08en_f6e5d4c3b2a1': 'Metasploit',
    'ge10nn03en_9876543210ab': 'Custom Malware'
}
 
def check_malware(ja4h_fingerprint):
    """Check if fingerprint matches known malware"""
    if ja4h_fingerprint in MALWARE_FINGERPRINTS:
        return MALWARE_FINGERPRINTS[ja4h_fingerprint]
    return None
 
# Example
fp = "ge11nn05en_a1b2c3d4e5f6"
malware_name = check_malware(fp)
if malware_name:
    print(f"ALERT: Detected {malware_name} traffic!")

3. API Client Validation

Ensure only authorized clients access your API:

# Whitelist of approved API clients
APPROVED_FINGERPRINTS = [
    'ge11cr10en_authorized1',
    'po11cr12en_authorized2',
    'ge20cr08en_authorized3'
]
 
def validate_api_client(ja4h_fingerprint):
    """Validate API client against whitelist"""
    return ja4h_fingerprint in APPROVED_FINGERPRINTS
 
# In API middleware
def api_request_handler(request):
    ja4h = compute_ja4h_from_request(request)
    
    if not validate_api_client(ja4h):
        return {"error": "Unauthorized client"}, 403
    
    # Process authorized request
    return process_request(request)

Integration with Security Tools

Zeek/Bro Integration

# JA4H extraction in Zeek
event http_header(c: connection, is_orig: bool, name: string, value: string)
{
    if (is_orig)  # Client headers
    {
        # Store headers in order
        if (c$http?$ja4h_headers)
            c$http$ja4h_headers += "," + name;
        else
            c$http$ja4h_headers = name;
    }
}

event http_message_done(c: connection, is_orig: bool, stat: http_message_stat)
{
    if (is_orig && c$http?$ja4h_headers)
    {
        # Compute JA4H fingerprint
        local ja4h = compute_ja4h(c$http$method, c$http$ja4h_headers);
        print fmt("JA4H: %s for %s", ja4h, c$id$orig_h);
    }
}

Suricata Rule Example

alert http any any -> any any (msg:"Suspicious JA4H Fingerprint"; 
    http.method; content:"GET"; 
    metadata:ja4h "ge11nn03en_malicious123"; 
    classtype:trojan-activity; 
    sid:1000001; rev:1;)

Elasticsearch Storage

from elasticsearch import Elasticsearch
 
def store_ja4h_in_elasticsearch(fingerprint_data):
    """Store JA4H fingerprint in Elasticsearch"""
    es = Elasticsearch(['http://localhost:9200'])
    
    doc = {
        'timestamp': fingerprint_data['timestamp'],
        'source_ip': fingerprint_data['source_ip'],
        'destination': fingerprint_data['destination'],
        'ja4h': fingerprint_data['ja4h'],
        'method': fingerprint_data['method'],
        'user_agent': fingerprint_data['user_agent'],
        'threat_score': fingerprint_data.get('threat_score', 0)
    }
    
    es.index(index='ja4h-fingerprints', document=doc)

Advanced Techniques

1. Temporal Analysis

Track how fingerprints change over time:

from collections import defaultdict
from datetime import datetime
 
class JA4HTracker:
    def __init__(self):
        self.fingerprints = defaultdict(list)
    
    def track(self, ip_address, ja4h, timestamp=None):
        """Track fingerprints per IP over time"""
        if timestamp is None:
            timestamp = datetime.now()
        
        self.fingerprints[ip_address].append({
            'ja4h': ja4h,
            'timestamp': timestamp
        })
    
    def detect_fingerprint_switching(self, ip_address, time_window_seconds=300):
        """Detect if an IP switches fingerprints rapidly"""
        entries = self.fingerprints[ip_address]
        
        if len(entries) < 2:
            return False
        
        # Check for multiple unique fingerprints in time window
        recent = [e for e in entries 
                  if (datetime.now() - e['timestamp']).seconds < time_window_seconds]
        
        unique_fps = set(e['ja4h'] for e in recent)
        
        return len(unique_fps) > 3  # More than 3 different fingerprints is suspicious
 
# Usage
tracker = JA4HTracker()
tracker.track('192.168.1.100', 'ge11cr08en_abc123')
tracker.track('192.168.1.100', 'po11cr10en_def456')
tracker.track('192.168.1.100', 'ge20cr05en_ghi789')
tracker.track('192.168.1.100', 'de11nn03en_jkl012')
 
if tracker.detect_fingerprint_switching('192.168.1.100'):
    print("ALERT: Rapid fingerprint switching detected!")

2. Machine Learning Classification

from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
import pandas as pd
 
def train_ja4h_classifier(training_data):
    """
    Train ML model to classify HTTP clients.
    
    training_data format:
    [
        {'ja4h': 'ge11cr08en_abc123', 'label': 'legitimate'},
        {'ja4h': 'po11nn03en_def456', 'label': 'malware'},
        ...
    ]
    """
    df = pd.DataFrame(training_data)
    
    # Extract features from JA4H
    df['method'] = df['ja4h'].str[:2]
    df['version'] = df['ja4h'].str[2:4]
    df['has_cookie'] = df['ja4h'].str[4:5]
    df['has_referer'] = df['ja4h'].str[5:6]
    df['header_count'] = df['ja4h'].str[6:8]
    df['language'] = df['ja4h'].str[8:10]
    
    # Encode categorical features
    label_encoder = LabelEncoder()
    for col in ['method', 'version', 'has_cookie', 'has_referer', 'language']:
        df[col] = label_encoder.fit_transform(df[col])
    
    # Prepare training data
    X = df[['method', 'version', 'has_cookie', 'has_referer', 
            'header_count', 'language']]
    y = label_encoder.fit_transform(df['label'])
    
    # Train model
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X, y)
    
    return model, label_encoder
 
# Example: Predict new fingerprint
def classify_ja4h(model, label_encoder, ja4h):
    """Classify a JA4H fingerprint"""
    features = {
        'method': label_encoder.transform([ja4h[:2]])[0],
        'version': label_encoder.transform([ja4h[2:4]])[0],
        'has_cookie': label_encoder.transform([ja4h[4:5]])[0],
        'has_referer': label_encoder.transform([ja4h[5:6]])[0],
        'header_count': int(ja4h[6:8]),
        'language': label_encoder.transform([ja4h[8:10]])[0]
    }
    
    prediction = model.predict([list(features.values())])
    return label_encoder.inverse_transform(prediction)[0]

Common Fingerprint Examples

Chrome/Chromium:

ge11cr15en_a1b2c3d4e5f6
- GET, HTTP/1.1, Cookie, Referer, 15 headers, English

Firefox:

ge11cr13en_f6e5d4c3b2a1
- GET, HTTP/1.1, Cookie, Referer, 13 headers, English

Safari:

ge11cr14en_9876543210ab
- GET, HTTP/1.1, Cookie, Referer, 14 headers, English

Common HTTP Libraries

Python Requests:

ge11nn05en_abc123def456
- GET, HTTP/1.1, No cookie, No referer, 5 headers, English

cURL:

ge11nn03en_fedcba987654
- GET, HTTP/1.1, No cookie, No referer, 3 headers, English

Wget:

ge10nn04en_147258369abc
- GET, HTTP/1.0, No cookie, No referer, 4 headers, English

Troubleshooting

Common Issues

Issue 1: Inconsistent fingerprints from same client

  • Cause: Dynamic header insertion (e.g., A/B testing, feature flags)
  • Solution: Focus on stable headers, create fingerprint variants

Issue 2: Too many unique fingerprints

  • Cause: Headers include timestamps or session IDs
  • Solution: Use JA4H_r (raw) variant that excludes variable headers

Issue 3: False positives

  • Cause: Legitimate clients with unusual configurations
  • Solution: Build comprehensive baseline, use whitelisting

Best Practices

  1. Baseline Normal Traffic

    • Capture fingerprints from known-good clients
    • Document expected fingerprints for each service
    • Update baseline as client software updates
  2. Combine with Other Signals

    • Use JA4H + JA4 (TLS) for stronger identification
    • Cross-reference with IP reputation
    • Check User-Agent string consistency
  3. Monitor for Changes

    • Alert on new fingerprints
    • Track fingerprint evolution over time
    • Investigate rapid switching between fingerprints
  4. Privacy Considerations

    • JA4H can uniquely identify clients
    • Implement data retention policies
    • Follow applicable privacy regulations

Key Takeaways

✅ JA4H fingerprints HTTP clients by header structure, not content ✅ Resistant to simple evasion (User-Agent spoofing doesn't work) ✅ Effective for bot detection and malware identification ✅ Combine with JA4 (TLS) for robust client identification ✅ Implement proper baseline and whitelist management ✅ Useful for API security and threat hunting

Next Steps

  1. Practice: Capture and fingerprint your own HTTP traffic
  2. Experiment: Try spoofing and see what changes the fingerprint
  3. Integrate: Add JA4H to your security monitoring pipeline
  4. Advanced: Combine with JA4T (TCP) and JA4SSH for multi-layer analysis
  5. Contribute: Share fingerprints with the community

Related Techniques: