This is an unfiltered "think-aloud" trace of a complete research project—capturing every question, pivot, disaster, and breakthrough across 1,603 timestamped events. This interactive flow diagram shows how an initial attempt to reproduce a real LLM phenomenon in a controlled synthetic system evolved into a broader investigation of how learned representations influence fine-tuning generalization.
Research Journey
The project began with a straightforward goal: create a 2D equirectangular map of cities with populations over 100,000. Inspired by the paper "Language Models Represent Space and Time," it quickly evolved into a deeper question: what world representations emerge when a model is trained on data downstream of the world, and how do those representations adapt during fine-tuning?
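The equirectangular projection mentioned above is the simplest map projection: longitude and latitude map linearly to pixel coordinates. A minimal sketch (the function name and canvas size are illustrative, not taken from the project's code):

```python
def equirectangular(lat: float, lon: float,
                    width: int = 2048, height: int = 1024) -> tuple[float, float]:
    """Project (lat, lon) in degrees onto a width x height canvas."""
    x = (lon + 180.0) / 360.0 * width    # -180..180 maps to 0..width
    y = (90.0 - lat) / 180.0 * height    # 90..-90 maps to 0..height (y grows downward)
    return x, y

# A city at (0, 0) lands at the canvas center.
print(equirectangular(0.0, 0.0))  # -> (1024.0, 512.0)
```

Plotting every city above the population threshold through this mapping yields the 2D world map the project started from.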
What followed was a six-week journey involving:
Data engineering: Building custom datasets for distance prediction and binary classification tasks
Model training: Configuring and training Qwen2.5 transformer models with custom tokenizers
Debugging nightmares: Discovering mode collapse, fixing class imbalance, and battling parsing bugs
Disaster recovery: Losing uncommitted implementations for 11 task metrics to a bad git checkout and rebuilding them
System evolution: Expanding to 10+ different spatial reasoning tasks (distance, compass, crossing, inside, nearest, circlecount, triangle, randomwalk, threshold, location)
Code refactoring: Restructuring the evaluation framework to be task-agnostic (reads task types from the dataset instead of hardcoding them)
Critical breakthroughs: Uncovering fundamental evaluation bugs, including one in which the system was comparing prompts against answers
Scientific process: The complete cycle of observing unexpected results, forming hypotheses, designing experiments to test them, analyzing outcomes, and iteratively refining understanding
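The task-agnostic refactor described above can be sketched as a metric registry: each dataset record carries its own task field, and the evaluator dispatches on it rather than hardcoding task names. All identifiers here are hypothetical illustrations, not the project's actual code:

```python
def exact_match(pred: str, answer: str) -> float:
    """Score categorical answers (e.g., compass directions) exactly."""
    return float(pred.strip() == answer.strip())

def within_tolerance(pred: str, answer: str, tol: float = 0.05) -> float:
    """Score numeric answers (e.g., distances) within a relative tolerance."""
    try:
        p, a = float(pred), float(answer)
    except ValueError:
        return 0.0
    return float(abs(p - a) <= tol * max(abs(a), 1e-9))

# Registry keyed by the task name stored in the dataset itself.
METRICS = {
    "distance": within_tolerance,
    "compass": exact_match,
    "inside": exact_match,
}

def evaluate(records):
    """Score each prediction against its answer, dispatching per record's task."""
    scores = {}
    for r in records:
        metric = METRICS[r["task"]]  # task type comes from the data, not the code
        scores.setdefault(r["task"], []).append(metric(r["pred"], r["answer"]))
    return {task: sum(v) / len(v) for task, v in scores.items()}
```

Note that the metric always compares the model's prediction against the answer; the evaluation bug mentioned above arose from comparing the wrong pair of strings.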
This visualization contains the complete trajectory across all 1,603 events.
Event Types
Research progress is tracked using event types shown in the interactive diagram.
Training transformer models to learn spatial representations of world geography—from city visualizations to geodesic distance prediction and multi-task spatial reasoning. Hover over nodes to see the evolving research context at each moment.