โ† Back to Demo Index

๐ŸŒ Beyond Language Models

Galton Flow Everywhere: Classification, Attention, RL, and More

๐Ÿ–ผ๏ธ Use Case 1

Image Classification

๐Ÿ” Use Case 2

Attention Mechanism

๐ŸŽฎ Use Case 3

RL Policy

๐ŸŒŠ Universal

The Pattern

๐Ÿ–ผ๏ธ Image Classification via 2D Flow Fields

Instead of CNN → softmax over 10 digits, we create a 2D probability landscape where probes naturally flow toward the correct digit's region.

Watch as probes drop randomly and flow through the learned geometry toward digit classes arranged in a circle!


💡 How It Works

2D Flow Field: Each image creates a unique 2D velocity field via a learned SDF network.

Class Centers: The 10 digit classes are arranged in a circle (learnable positions).

Probe Integration: Probes start at random positions and flow toward the correct digit's region using RK2 integration.

Soft Assignment: Gaussian windows around class centers convert probe positions → class probabilities.
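The four steps above can be sketched in a few lines of NumPy. This is a toy: the hand-coded pull toward one class stands in for the learned SDF network, and the circle radius, Gaussian width, and step counts are illustrative rather than the values used in examples/mnist_classifier.py.

```python
import numpy as np

rng = np.random.default_rng(0)

# 10 class centers arranged on a unit circle (learnable in the real model)
angles = 2 * np.pi * np.arange(10) / 10
centers = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # (10, 2)

def velocity(p, target=3):
    """Stand-in field that pulls every probe toward class 3's center.
    (A learned SDF network would produce this field from the image.)"""
    return centers[target] - p

def rk2_step(p, dt=0.1):
    k1 = velocity(p)
    k2 = velocity(p + 0.5 * dt * k1)   # midpoint (RK2) integration
    return p + dt * k2

probes = rng.uniform(-1, 1, size=(256, 2))   # probes dropped at random
for _ in range(50):
    probes = rk2_step(probes)

# Soft assignment: Gaussian windows around centers -> class probabilities
d2 = ((probes[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # (256, 10)
w = np.exp(-d2 / (2 * 0.1 ** 2))
probs = (w / w.sum(axis=1, keepdims=True)).mean(axis=0)
```

The spread of `probes` just before the soft assignment doubles as a confidence signal: tightly clustered probes mean a confident prediction.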

🔗 Key Benefits:
  • Uncertainty Quantification: Probe spread = classification confidence
  • Interpretability: Visualize the decision landscape
  • Smooth Gradients: Continuous flow → stable training
  • Same Loss: Standard cross-entropy still works!

💻 Run It Yourself

See the full implementation with CNN encoder, 2D SDF flow field, and training loop:

📂 examples/mnist_classifier.py
python examples/mnist_classifier.py

# Creates model with:
# - SimpleCNN encoder (image → embedding)
# - FlowField2D (embedding → 2D velocity field)
# - Learnable class centers in a circle
# - RK2 integration for probe trajectories
# - Visualization functions included!

๐Ÿ” Attention Mechanism as Geometric Routing

Attention is fundamentally about routing information. Instead of computing routing weights via softmax(QK^T), let them emerge from geometric flow.

Probes flow from a uniform distribution across the sequence toward relevant key positions!


💡 Flow-Based Attention

Standard Attention:

Q, K, V = projections(x)
scores = Q @ K.T / √d       # O(L²) dot products
weights = softmax(scores)    # Normalize
output = weights @ V

Flow Attention:

Q, K, V = projections(x)
for each query:
    field = sdf_network(query, K)  # Create routing field
    probes = integrate(field)       # Flow to relevant keys
    weights = probe_density(probes) # Where did they land?
output = weights @ V                # Same as standard!

🔗 Why This Helps:
  • Sparse Routing: Probes only visit relevant keys
  • Hierarchical: Coarse routing → fine routing
  • Adaptive: More probes for hard queries
  • Interpretable: Visualize probe trajectories!
  • Drop-in Replacement: Same API as standard attention
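The flow-attention loop above can be made concrete with a small NumPy toy. Everything here is illustrative: a hand-built field of Gaussian wells, with depths set by the Q·K scores, stands in for the learned AttentionFlowField, keys sit at integer positions on the sequence axis, and the bucket width is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_attention(Q, K, V, n_probes=64, steps=40, dt=0.05):
    """Toy probe-routed attention: probes drift toward high-scoring keys."""
    L, d = K.shape
    key_pos = np.arange(L, dtype=float)          # keys live at positions 0..L-1
    W = np.zeros((Q.shape[0], L))                # routing weights per query
    for i, q in enumerate(Q):
        attract = np.exp(K @ q / np.sqrt(d))     # well depth per key
        x = rng.uniform(0.0, L - 1.0, n_probes)  # probes uniform over the sequence

        def vel(x):                              # gradient of the Gaussian wells
            diff = key_pos[None, :] - x[:, None]
            return (attract * diff * np.exp(-diff ** 2)).sum(axis=1)

        for _ in range(steps):                   # RK2 midpoint integration
            x = x + dt * vel(x + 0.5 * dt * vel(x))

        # soft buckets around key positions: probe density -> routing weights
        w = np.exp(-(x[:, None] - key_pos) ** 2 / 0.1).sum(axis=0)
        W[i] = w / w.sum()
    return W @ V, W

L, d = 6, 8
Q = rng.normal(size=(L, d))
K = rng.normal(size=(L, d))
V = rng.normal(size=(L, d))
out, W = flow_attention(Q, K, V)
```

Note that the projections and the final `weights @ V` are unchanged; only the softmax step is replaced by probe density.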

💻 Run It Yourself

Full implementation with Q/K/V projections and probe-based routing:

📂 examples/attention_flow.py
python examples/attention_flow.py

# Creates FlowAttention module:
# - Standard Q, K, V projections
# - AttentionFlowField (SDF for routing)
# - Per-query probe integration
# - Soft bucket assignment to keys
# - Visualize attention patterns!

🎮 RL Policy as Flow Through Action Space

Policy networks map states to action distributions. Instead of state → logits → softmax, create a flow field over action space.

The agent "feels out" the action landscape: probes flow toward good actions, and uncertainty guides exploration!


💡 Policy as Flow

Standard Policy:

logits = policy_network(state)
action_probs = softmax(logits)
action = sample(action_probs)

Flow Policy:

field = sdf_network(state, action_space)
probes = integrate(field)         # Flow to good actions
action_probs = probe_density()    # Where did they land?
action = sample(action_probs)     # Same sampling!

🔗 RL-Specific Benefits:
  • Uncertainty → Exploration: Probe spread = state uncertainty → explore more
  • Smooth Gradients: Continuous flow → stable policy updates
  • Interpretable: "Why did the agent choose that action?" → visualize the flow!
  • Adaptive Compute: Simple decisions = few probes (fast), critical decisions = more probes (careful)
  • Natural Smoothing: Flow naturally spreads to nearby actions
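A runnable toy of the flow policy above (NumPy). The preference vector `pref` stands in for the state-conditioned field that `sdf_network(state, action_space)` would produce; the Gaussian wells, bucket width, and step counts are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
action_pos = np.arange(4, dtype=float)        # 4 discrete actions on a line

def flow_policy(pref, n_probes=256, steps=150, dt=0.25):
    """pref[j] = attraction of action j (stand-in for the learned flow field)."""
    x = rng.uniform(-0.5, 3.5, n_probes)      # probes dropped across action space

    def vel(x):                               # one Gaussian well per action
        diff = action_pos[None, :] - x[:, None]
        return (pref * diff * np.exp(-diff ** 2)).sum(axis=1)

    for _ in range(steps):                    # RK2 midpoint integration
        x = x + dt * vel(x + 0.5 * dt * vel(x))

    # soft buckets around action positions -> action probabilities
    w = np.exp(-(x[:, None] - action_pos) ** 2 / 0.1).sum(axis=0)
    return w / w.sum()

probs = flow_policy(np.array([0.2, 3.0, 0.2, 0.2]))   # this state favors action 1
action = rng.choice(4, p=probs)               # same sampling step as a softmax policy
```

The last line is the point: downstream sampling (and therefore REINFORCE/PPO-style losses over log-probabilities) is unchanged.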

💻 Run It Yourself

Complete RL policy implementation with GridWorld environment:

📂 examples/rl_policy_flow.py
python examples/rl_policy_flow.py

# Creates FlowPolicy with:
# - State encoder network
# - PolicyFlowField (1D action space)
# - Probe integration with RK2
# - GridWorld environment included
# - Training loop sketch for REINFORCE/PPO

🌊 The Universal Pattern

The core insight, probability as geometric flow, applies anywhere you use softmax.

All three examples follow the same template. This isn't just for LLMs; it's a fundamental rethinking of categorical probability!

🎯 The Template

# Traditional approach (everywhere)
logits = neural_network(input)
probabilities = softmax(logits)
choice = sample(probabilities)

# Galton approach (universal)
context = neural_network(input)
field = sdf_network(context, choice_space)
probes = integrate(field)
probabilities = probe_density(probes)
choice = sample(probabilities)
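The template can be exercised end-to-end with a toy field (NumPy). `galton_choice`, its Gaussian-well field, and every constant are invented stand-ins for `sdf_network` and `probe_density`; the second call shows how probe spread reports uncertainty when two choices compete.

```python
import numpy as np

rng = np.random.default_rng(0)

def galton_choice(strength, n_probes=256, steps=150, dt=0.2):
    """Toy template: strength[j] is the pull of choice j, standing in for
    the field a learned sdf_network would produce from the context."""
    pos = np.arange(len(strength), dtype=float)       # 1D choice space
    x = rng.uniform(pos[0] - 0.5, pos[-1] + 0.5, n_probes)

    def vel(x):                                       # one Gaussian well per choice
        diff = pos[None, :] - x[:, None]
        return (strength * diff * np.exp(-diff ** 2)).sum(axis=1)

    for _ in range(steps):                            # RK2 midpoint integration
        x = x + dt * vel(x + 0.5 * dt * vel(x))

    w = np.exp(-(x[:, None] - pos) ** 2 / 0.1).sum(axis=0)   # probe_density
    return w / w.sum(), x.std()                       # probabilities, probe spread

# One dominant choice: probes collapse into one region (confident, low spread)
p_uni, spread_uni = galton_choice(np.array([0.1, 5.0, 0.1, 0.1]))
# Two rival choices: probes split between regions (uncertain, high spread)
p_bi, spread_bi = galton_choice(np.array([3.0, 0.1, 0.1, 3.0]))
```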

💡 What You Gain Across All Domains:
  • 🎯 Uncertainty: Probe spread = confidence (no post-hoc entropy calculations)
  • 🔍 Interpretability: Visualize decision landscapes and understand why choices were made
  • ⚡ Adaptive Compute: Confident decisions = few probes (fast), uncertain = more probes (careful)
  • 🎨 Smooth Optimization: Continuous flow = stable gradients and better training dynamics
  • 🌊 Physical Intuition: Decisions as flow (not algebra), natural and interpretable

🚀 More Use Cases

The pattern extends to:

  • Mixture of Experts: Router network → flow → expert selection
  • Hierarchical Classification: Cascaded flow (coarse → fine)
  • Structured Prediction: Flow over structured spaces (parsing, graphs)
  • Neural Architecture Search: Flow through architecture space
  • Multi-Modal Fusion: Multiple modalities create combined flow field
  • Seq2Seq Decoding: Replace decoder softmax with flow
  • Recommendation Systems: User state → flow over item space
  • Graph Neural Networks: Message passing via geometric flow

See docs/use-cases.md for detailed patterns and code!

🧪 Try It on Your Domain

Step-by-step:

  1. Identify where you use softmax (classification, routing, sampling, etc.)
  2. Define your choice space (classes, tokens, actions, items, etc.)
  3. Create an SDF network that takes your context as input
  4. Integrate probes through the learned field (RK2, adaptive steps)
  5. Assign probes to choices via soft buckets (Gaussian windows)
  6. Train with same loss (cross-entropy, policy gradient, etc.)

The geometry handles the rest!
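Step 4's adaptive integration can be sketched by stopping as soon as the probe cloud settles, so the step count adapts to the landscape instead of being fixed. A NumPy toy (the field, tolerance, and step budget are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
pos = np.arange(4, dtype=float)                  # 4 choices on a line

def adaptive_flow(strength, n_probes=256, max_steps=400, dt=0.2, tol=1e-3):
    """Integrate until the probe cloud stops moving, then read out probabilities."""
    x = rng.uniform(-0.5, 3.5, n_probes)

    def vel(x):                                  # one Gaussian well per choice
        diff = pos[None, :] - x[:, None]
        return (strength * diff * np.exp(-diff ** 2)).sum(axis=1)

    for step in range(1, max_steps + 1):
        x_new = x + dt * vel(x + 0.5 * dt * vel(x))   # RK2 midpoint
        moved = np.abs(x_new - x).max()
        x = x_new
        if moved < tol:                          # cloud has settled: stop early
            break

    w = np.exp(-(x[:, None] - pos) ** 2 / 0.1).sum(axis=0)
    return w / w.sum(), step

p_easy, steps_easy = adaptive_flow(np.array([0.1, 5.0, 0.1, 0.1]))  # one clear winner
p_hard, steps_hard = adaptive_flow(np.array([3.0, 0.1, 0.1, 3.0]))  # two rivals
```

Both calls finish well under the step budget here; in a trained model the same early-exit test is what lets confident decisions spend less compute than ambiguous ones.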

💬 Final Thought:

"For decades, we've calculated probabilities algebraically. But in the physical world, probability flows. Water finds its level. Particles settle. Energy minimizes.

Galton Lab isn't just for language models. It's a new way of thinking about uncertainty in AI, one where the geometry does the work, the physics guides the bits, and probability emerges naturally from flow."

— Explore the code, run the examples, try it on your domain!
