Reinforcement learning algorithms can autonomously learn to search a design space for high-performance solutions. However, modern engineering often entails the use of computationally intensive simulation, which can lead to slower design timelines with highly iterative approaches such as reinforcement learning. This work provides a reinforcement learning framework that leverages models of varying fidelity to enable an effective solution search while reducing overall computational needs. Specifically, it utilizes models of varying fidelity while training the agent, iteratively progressing from low- to high fidelity. To demonstrate the effectiveness of the proposed framework, we apply it to two multimodal multi-objective constrained mixed integer nonlinear design problems involving the components of a ground and aerial vehicle. Specifically, for each problem, we utilize a high-fidelity and a low-fidelity deep neural network surrogate model, trained on performance data generated from underlying ground truth models. A tradeoff between solution quality and the proportion of low-fidelity surrogate model usage is observed. Specifically, high-quality solutions are achieved with substantial reductions in computational expense, showcasing the effectiveness of the framework for design problems where the use of just a high-fidelity model is infeasible. This solution quality-computational efficiency tradeoff is contextualized by visualizing the exploration behavior of the design agents.