In this paper we propose an information-theoretic framework for studying coding and throughput optimization in multi-layered packet transmission systems. Our approach assumes that the system is divided into two separate layers: each codeword forms a packet at the physical layer, and the network-layer code spans these packets. At the receiver, the network layer assumes that each decoded packet arriving from the physical layer is either error-free or marked as deleted. Thus, although packet loss may be caused by, for example, decoding errors, congestion, or channel conditions, the network layer treats every lost packet as an erasure regardless of its cause. This allows us to model the system at the network layer as transmission over a memoryless erasure channel. We study throughput optimization and code design across the layers under a total code length constraint, while also taking into account imperfections in network-layer transmission. We use random coding error exponents to obtain results that do not depend on the specific coding scheme used. The proposed framework also provides a means for investigating how important physical-layer phenomena, such as the channel model and lower-layer error-correction coding, enter the packet erasure model. The approach extends to fading channels and to networks of multiple nodes. Moreover, by viewing the two layers of coding as a concatenated coding scheme, layer-by-layer and joint cross-layer rate optimization can be compared, as outlined in this paper.
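The kind of cross-layer trade-off described above can be illustrated with a minimal Python sketch. The assumptions here are ours, not the paper's: the physical layer is a binary symmetric channel (BSC) with crossover probability p and uniform inputs; a packet is one codeword of length n and rate R (bits per channel use); the packet erasure probability ε is upper-bounded by Gallager's random coding bound exp(-n E_r(R)); and the network layer is treated as an ideal erasure code achieving the erasure-channel capacity 1 - ε, so the network-layer code length and the total code length constraint are not modeled. All function names are hypothetical.

```python
import math


def e0_bsc(rho, p):
    """Gallager's E_0(rho) in nats for a BSC with crossover p and uniform inputs."""
    s = 1.0 / (1.0 + rho)
    return rho * math.log(2) - (1.0 + rho) * math.log(p ** s + (1.0 - p) ** s)


def random_coding_exponent(rate_bits, p, grid=200):
    """E_r(R) = max over rho in [0, 1] of E_0(rho) - rho * R, with R converted to nats.

    The maximization is approximated on a uniform grid of rho values.
    """
    rate_nats = rate_bits * math.log(2)
    return max(e0_bsc(k / grid, p) - (k / grid) * rate_nats for k in range(grid + 1))


def packet_erasure_bound(n, rate_bits, p):
    """Random coding upper bound on the packet error probability, seen as an erasure."""
    return min(1.0, math.exp(-n * random_coding_exponent(rate_bits, p)))


def throughput(n, rate_bits, p):
    """Bits per channel use delivered, assuming (hypothetically) an ideal
    network-layer erasure code that achieves capacity 1 - eps."""
    return rate_bits * (1.0 - packet_erasure_bound(n, rate_bits, p))


def best_rate(n, p, steps=1000):
    """Grid search for the physical-layer rate that maximizes throughput."""
    rates = [k / steps for k in range(1, steps)]
    return max(rates, key=lambda r: throughput(n, r, p))


if __name__ == "__main__":
    p = 0.05
    for n in (100, 1000, 10000):
        r = best_rate(n, p)
        print(f"n={n}: best R={r:.3f}, throughput={throughput(n, r, p):.4f}")
```

Raising the physical-layer rate R increases the number of bits carried per packet but also increases the erasure probability ε seen by the network layer; `best_rate` searches for the rate that balances the two for a given packet length n. Because the sketch assumes an ideal network-layer code, it captures only the physical-layer side of the trade-off, not the split of a fixed total code length between the two layers.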