Growing AI like a Child, at Scale


Humans never "learn" intelligence. Humans develop intelligence. Biological life on this planet takes heavy advantage of intelligent primitives embedded in its genes. Cats never "learn" to backflip. Birds never "learn" to fly. In the same way, humans never "learn" to cognize. Humans are born with a set of core cognitive abilities that sets the foundation for our perception and action in the physical world.

Our core cognition unfolds along a specific developmental trajectory as we grow into adulthood. Here, we seek to do the same for our machines, drawing on the rich literature in developmental psychology, and in particular Piaget's theory of cognitive development, to design our growing-up curriculum. In addition, we also want to learn from the current successes of machine intelligence, specifically scaling laws.

Instead of putting growing up and scaling up in opposing camps, we argue that the next step towards human-like artificial general intelligence is to grow AI like a child, at scale, and we call for an open-source, collaborative community effort to take it.

Updates

Vision Language Models Know Law of Conservation without Understanding More-or-Less

TLDR: We find that Vision Language Models know the law of conservation but fail at quantitative (more-or-less) understanding, in a pattern opposite to human intuitive biases.

Vision Language Models See What You Want but not What You See

TLDR: VLMs score high on intentionality understanding but low on perspective-taking, challenging traditional cognitive developmental views on the relationship between these two theory-of-mind abilities.

Probing Mechanical Reasoning in Large Vision Language Models

TLDR: We investigated understanding of system stability, pulley and gear systems, seesaw-like systems and the principle of leverage, and fluid systems, and we observed diverse yet consistent behaviors across VLMs.

In-Progress

  • We reason that scaling up to an effectively unlimited supply of training material for embodied VLMs requires a suite of embodied core cognition scenarios for data generation, which we call GalacSuite. We have identified 2517 core cognition scenarios from our previous work, and we are leveraging MuJoCo to build GalacSuite as an embodied, physics-based scenario suite; a minimal sketch of what one such scenario could look like follows below.
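
To make the MuJoCo direction concrete, here is a minimal sketch, assuming nothing about GalacSuite's actual (not yet released) scenario format: it defines a toy occlusion scene and simulates it with the open-source `mujoco` Python bindings. The names `SCENE_XML` and `run_scenario` are illustrative placeholders, not GalacSuite code.

```python
# Minimal sketch of a physics-based core-cognition scenario (hypothetical,
# not GalacSuite's actual format): a ball is pushed so that it rolls behind
# a static occluder, an object-permanence-style scene.
import mujoco
import numpy as np

SCENE_XML = """
<mujoco model="occlusion_scenario">
  <option gravity="0 0 -9.81"/>
  <worldbody>
    <geom name="floor" type="plane" size="2 2 0.1" rgba="0.8 0.8 0.8 1"/>
    <!-- Static occluder the ball will pass behind. -->
    <geom name="occluder" type="box" pos="0 0.3 0.15" size="0.3 0.02 0.15" rgba="0.2 0.2 0.9 1"/>
    <!-- Free-moving ball, given an initial sideways push in run_scenario(). -->
    <body name="ball" pos="-0.8 0 0.05">
      <freejoint/>
      <geom name="ball_geom" type="sphere" size="0.05" density="1000" rgba="0.9 0.2 0.2 1"/>
    </body>
  </worldbody>
</mujoco>
"""

def run_scenario(n_steps: int = 500) -> np.ndarray:
    """Simulate the scene and return the ball's trajectory as an (n_steps, 3) array."""
    model = mujoco.MjModel.from_xml_string(SCENE_XML)
    data = mujoco.MjData(model)

    # The ball's free joint is the only joint, so qvel[0] is its x linear velocity;
    # push it along +x so it rolls past the occluder.
    data.qvel[0] = 1.0

    ball_id = model.body("ball").id
    trajectory = np.zeros((n_steps, 3))
    for t in range(n_steps):
        mujoco.mj_step(model, data)
        trajectory[t] = data.xpos[ball_id]
    return trajectory

if __name__ == "__main__":
    traj = run_scenario()
    print("final ball position:", traj[-1])
```

In a full suite, the recorded trajectories (plus rendered frames) would serve as the generated training data for each scenario; the single-file layout above is only for illustration.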

Feel free to connect with us if you are interested in these projects and want to join, or if you have some really amazing ideas you would like to propose or collaborate with us on. We are always happy to chat!