GPT, or Generative Pre-trained Transformer, is a large language model (LLM) that generates human-like text. LLMs are instances of foundation models: models pre-trained with self-supervision on large amounts of unlabeled data, learning patterns that can then be adapted to many downstream tasks. Applied specifically to text and text-like data such as code, LLMs are trained on vast datasets that can reach petabytes in size; GPT-3, for example, was trained on roughly 45 terabytes of text data and has 175 billion parameters.

Three key components make up an LLM: data, architecture, and training. The enormous body of text is processed by the transformer architecture, a type of neural network designed to handle sequences of data and to capture context by relating each word to every other word in a sentence. This gives the model a comprehensive grasp of sentence structure and meaning.

During the training phase, the model learns to predict the next word in a sequence, refining its internal parameters with every attempt so that its predictions better align with the actual text. This iterative learning process steadily improves the model's ability to generate coherent and relevant text.
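
To make the idea of "relating each word to every other word" concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation inside a transformer. The embeddings and projection matrices are random placeholders rather than real model weights, and the dimensions are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, d_model = 4, 8                      # 4 tokens, 8-dimensional embeddings
x = rng.normal(size=(seq_len, d_model))      # stand-in token embeddings

# Learned projections (random here) map each token to query, key, and value vectors.
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Each token's query is compared against every token's key,
# so every word is related to every other word in the sequence.
scores = Q @ K.T / np.sqrt(d_model)

# Softmax turns the scores into attention weights that sum to 1 for each token.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# The output for each token is a weighted mix of all the value vectors,
# i.e. a context-aware representation of that token.
output = weights @ V
print(weights.round(2))   # each row shows how much one token attends to the others
```

Printing the weight matrix shows one row per token, with larger entries marking the other positions that token draws most of its context from.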
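
The next-word prediction objective can likewise be sketched in a few lines. The toy model below (an embedding layer followed by a linear layer, standing in for a full transformer) and the vocabulary size and token ids are invented for illustration; only the training loop structure reflects how LLMs are actually optimized.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 50, 16
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),   # token ids -> vectors
    nn.Linear(d_model, vocab_size),      # vectors -> scores over the vocabulary
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# A toy "sentence" of token ids: the inputs are every token except the last,
# and each target is simply the token that follows it.
tokens = torch.tensor([3, 17, 42, 8, 29])
inputs, targets = tokens[:-1], tokens[1:]

for step in range(100):
    logits = model(inputs)           # predicted scores for each next token
    loss = loss_fn(logits, targets)  # how far predictions are from the actual next words
    optimizer.zero_grad()
    loss.backward()                  # compute gradients of the loss
    optimizer.step()                 # refine parameters to better align predictions
```

Each pass through the loop nudges the parameters so the model assigns higher probability to the word that actually came next, which is the same iterative refinement described above, just at a vastly smaller scale.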