In the last decade, deep learning has revolutionized various components of the conventional robot autonomy stack, including aspects of perception, navigation, and manipulation. There have been numerous advances in perfecting individual tasks such as scene understanding, visual localization, end-to-end navigation, and grasping, which have given us a critical understanding of how to create individual architectures for a specific task. This brings us to the question of whether this disjoint learning of models for robotic tasks is effective in the real world, and whether it is scalable. More generally, is training task-specific models on task-specific datasets beneficial to architecting robot intelligence as a whole? In this paper, we argue that multimodal learning, or joint multi-task learning, is an effective strategy for enabling robots to excel across multiple domains. We describe how multimodal learning can facilitate generalization to unseen scenarios by utilizing domain-specific cues from auxiliary tasks, and we discuss some of the current mechanisms that can be employed to design multimodal frameworks for robot autonomy.
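To make the joint multi-task pattern mentioned above concrete, the sketch below shows its basic structure: a single shared encoder feeds several task-specific heads, so that auxiliary tasks shape the shared representation used by the primary task. This is a minimal, framework-free illustration of the general idea, not an implementation from any particular system; all names (`SharedEncoder`, `TaskHead`, `MultiTaskModel`) and the toy computations are hypothetical.

```python
# Minimal sketch of joint multi-task learning: one shared encoder,
# multiple task-specific heads. In practice the encoder and heads would
# be neural networks and the per-task losses would be combined during
# training; here toy arithmetic stands in for those computations.

class SharedEncoder:
    """Maps a raw input to a shared feature vector used by every task."""
    def __call__(self, x):
        # Placeholder "feature extraction": a scaled copy of the input.
        return [v * 0.5 for v in x]

class TaskHead:
    """A lightweight, task-specific head on top of the shared features."""
    def __init__(self, name, weight):
        self.name = name
        self.weight = weight  # stands in for task-specific parameters

    def __call__(self, features):
        return sum(features) * self.weight

class MultiTaskModel:
    """Shared encoder plus one head per task, evaluated jointly."""
    def __init__(self, heads):
        self.encoder = SharedEncoder()
        self.heads = heads

    def forward(self, x):
        features = self.encoder(x)  # computed once, shared by all heads
        return {head.name: head(features) for head in self.heads}

# A primary task ("segmentation") trained jointly with an auxiliary
# task ("depth") that influences the shared features.
model = MultiTaskModel([TaskHead("segmentation", 1.0),
                        TaskHead("depth", 0.1)])
outputs = model.forward([1.0, 2.0, 3.0])
```

Because every head consumes the same features, gradients from the auxiliary task would, in a trained version of this sketch, regularize the shared encoder, which is the mechanism by which auxiliary cues can aid generalization.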