
Writing a Quine the Dumb Way

2024-09-21

What is a Quine

A Quine is a program that, when executed, prints its own source code. Such a program is also known as a “self-replicating program”, and its self-reference is fascinatingly similar to the way biological organisms reproduce.

The following two examples of Quines were copied from Wikipedia.

Example 1: Python

c = 'c = %r; print(c %% c)'; print(c % c)

Example 2: Java

public class Quine
{
  public static void main(String[] args)
  {
    char q = 34;      // Quotation mark character
    String[] l = {    // Array of source code
    "public class Quine",
    "{",
    "  public static void main(String[] args)",
    "  {",
    "    char q = 34;      // Quotation mark character",
    "    String[] l = {    // Array of source code",
    "    ",
    "    };",
    "    for (int i = 0; i < 6; i++)           // Print opening code",
    "        System.out.println(l[i]);",
    "    for (int i = 0; i < l.length; i++)    // Print string array",
    "        System.out.println(l[6] + q + l[i] + q + ',');",
    "    for (int i = 7; i < l.length; i++)    // Print this code",
    "        System.out.println(l[i]);",
    "  }",
    "}",
    };
    for (int i = 0; i < 6; i++)           // Print opening code
        System.out.println(l[i]);
    for (int i = 0; i < l.length; i++)    // Print string array
        System.out.println(l[6] + q + l[i] + q + ',');
    for (int i = 7; i < l.length; i++)    // Print this code
        System.out.println(l[i]);
  }
}

However, the code in these two examples isn’t very easy to understand. If I wanted to write a Quine in a third language, such as C++ or JavaScript, I would need to rethink the logic.

So I wanted a universal method for writing a Quine in the most intuitive way possible. The resulting Quine may not be the shortest or the most efficient, but it is easy to follow and easy to port to other languages.

The Quine Pattern

The two Quines in the previous section look very different, but both follow the same structure:

  1. Header: There might be some imports or class definitions.
  2. A String: Since it acts somewhat like genetic material, I will refer to it here as DNA. Its content should contain the program code before this string, plus the program code after this string.
  3. Footer:
    1. Extract the header from the DNA string and print it.
    2. Print the DNA string itself.
    3. Extract the footer from the DNA string and print it.

Although it sounds simple, in common programming languages, representing strings requires extra escaping for newlines and quotation marks. For example, a newline in a string must be written as \n, and quotation marks need a backslash in front of them. These add unnecessary trouble, which is why the code above is hard to read.

However, these escape sequences are essentially a form of encoding. So my idea is simply to store the contents of the DNA string in plain hexadecimal, and then decode them when they need to be printed.

This way, no weird hacks are needed.
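
A minimal sketch of the difference, using an arbitrary sample fragment that does not come from the programs above. The escaped literal form needs a backslash sequence, while the hex form uses only the characters 0-9 and a-f:

```python
# Arbitrary sample text containing quotes and a newline (an assumption for illustration).
fragment = "say('hi')\n"

# The literal form: Python has to escape the newline as backslash-n.
literal = repr(fragment)

# The hex form: two hex digits per byte, nothing that ever needs escaping.
encoded = fragment.encode('utf-8').hex()

print(literal)
print(encoded)

# Decoding restores the original text exactly.
restored = bytes.fromhex(encoded).decode('utf-8')
print(restored == fragment)
```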

Furthermore, it doesn’t strictly have to be hexadecimal; Base64, Base32, or even using nucleotide letters like real DNA would work.
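
As a sketch of those alternatives: Base64 is available in the Python standard library, and a nucleotide-style encoding can be built by hand. The helper names and the two-bits-per-letter mapping below are assumptions chosen for illustration, not a standard scheme:

```python
import base64

# One possible mapping: each letter carries two bits (A=0, C=1, G=2, T=3).
LETTERS = 'ACGT'

def to_nucleotides(data: bytes) -> str:
    """Encode each byte as four letters, highest two bits first."""
    return ''.join(LETTERS[(b >> shift) & 3] for b in data for shift in (6, 4, 2, 0))

def from_nucleotides(text: str) -> bytes:
    """Inverse of to_nucleotides: rebuild one byte from every four letters."""
    out = bytearray()
    for i in range(0, len(text), 4):
        value = 0
        for ch in text[i:i + 4]:
            value = (value << 2) | LETTERS.index(ch)
        out.append(value)
    return bytes(out)

sample = "dna = '".encode('utf-8')
print(base64.b64encode(sample).decode('ascii'))
print(to_nucleotides(sample))
print(from_nucleotides(to_nucleotides(sample)) == sample)
```

Whatever alphabet is chosen, it must not contain the quote character of the host language or the separator placed between the two halves of the string.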

Nifty Little Tool

To encode the string into hexadecimal, I wrote a small Python tool that reads a string and outputs its hex form:

import sys

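s = sys.stdin.read()
# Drop the single trailing newline that editors and shells usually append.
# endswith() also copes with empty input, where s[-1] would raise IndexError.
if s.endswith('\n'):
  s = s[:-1]
print(s.encode('utf-8').hex())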

Save the above code as hexencode.py, and you can use it in the shell like this:

cat input.txt | python3 hexencode.py
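
To check the tool's output, a decoding counterpart is handy. The following is a sketch with hypothetical function names (hex_encode and hex_decode are not part of the tool above); it round-trips a short fragment:

```python
def hex_encode(text: str) -> str:
    """Same transformation as the tool: UTF-8 bytes written as hex digits."""
    return text.encode('utf-8').hex()

def hex_decode(hex_text: str) -> str:
    """Reverse the encoding; strip() tolerates a trailing newline from the shell."""
    return bytes.fromhex(hex_text.strip()).decode('utf-8')

encoded = hex_encode("dna = '")
print(encoded)
print(hex_decode(encoded))
```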

Start Coding

Python is relatively simple to write, so let’s start with Python.

First, define the DNA string. We don’t know the content of the “DNA” yet, so let’s use emoji as placeholders. The string has two parts, a head and a tail, so let the head be a cat and the tail be a snake:

dna = '🐱,🐍'

Then extract the head and tail:

head, tail = dna.split(',')

Since we plan to use hex encoding, we need to decode the head and tail from hex back to their original form here:

head = bytes.fromhex(head).decode('utf-8')
tail = bytes.fromhex(tail).decode('utf-8')

Finally, we join the head, DNA, and tail together and output them:

print(head + dna + tail)

The program now looks like this:

dna = '🐱,🐍'
head, tail = dna.split(',')
head = bytes.fromhex(head).decode('utf-8')
tail = bytes.fromhex(tail).decode('utf-8')
print(head + dna + tail)

The program text before the cat 🐱 is:

dna = '

Using the nifty little tool to encode it, we get:

646e61203d2027

Replace the 🐱 with this string.

Next, the program text after the snake 🐍 is:

'
head, tail = dna.split(',')
head = bytes.fromhex(head).decode('utf-8')
tail = bytes.fromhex(tail).decode('utf-8')
print(head + dna + tail)

Using the nifty little tool to encode it, we get:

270a686561642c207461696c203d20646e612e73706c697428272c27290a68656164203d2062797465732e66726f6d6865782868656164292e6465636f646528277574662d3827290a7461696c203d2062797465732e66726f6d686578287461696c292e6465636f646528277574662d3827290a7072696e742868656164202b20646e61202b207461696c29

Replace the 🐍 with this string.
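
The two replacements are mechanical, so they can also be scripted. The following is a sketch rather than part of the workflow described here: build_quine is a hypothetical helper name, and it assumes the template contains the placeholder exactly once and that the finished program prints one final newline.

```python
def build_quine(template: str, placeholder: str = '🐱,🐍') -> str:
    """Fill the placeholder with the hex of the text before and after it."""
    # Unpacking fails loudly unless the placeholder occurs exactly once.
    before, after = template.split(placeholder)
    # Mirror the encoding tool: ignore one trailing newline.
    stripped = after[:-1] if after.endswith('\n') else after
    head = before.encode('utf-8').hex()
    tail = stripped.encode('utf-8').hex()
    # The program's own print() supplies the final newline again.
    return before + head + ',' + tail + stripped + '\n'

template = (
    "dna = '🐱,🐍'\n"
    "head, tail = dna.split(',')\n"
    "head = bytes.fromhex(head).decode('utf-8')\n"
    "tail = bytes.fromhex(tail).decode('utf-8')\n"
    "print(head + dna + tail)\n"
)
quine = build_quine(template)
print(quine, end='')
```

Because the helper only looks at the text around the placeholder, the same function should work on a template written in any other language.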

Finally, we get the following code:

dna = '646e61203d2027,270a686561642c207461696c203d20646e612e73706c697428272c27290a68656164203d2062797465732e66726f6d6865782868656164292e6465636f646528277574662d3827290a7461696c203d2062797465732e66726f6d686578287461696c292e6465636f646528277574662d3827290a7072696e742868656164202b20646e61202b207461696c29'
head, tail = dna.split(',')
head = bytes.fromhex(head).decode('utf-8')
tail = bytes.fromhex(tail).decode('utf-8')
print(head + dna + tail)

At this point, a Quine is complete. Run it and give it a try.
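
“Giving it a try” can be automated as well. The sketch below uses a hypothetical helper, is_quine, which writes a program to a temporary file, runs it with the current Python interpreter, and compares the output with the source. It is demonstrated on the Wikipedia one-liner from the beginning of this article, with the trailing newline a saved file would have:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def is_quine(source: str) -> bool:
    """Run the Python source in a child process and compare stdout with the source."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / 'quine.py'
        path.write_text(source, encoding='utf-8')
        result = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True, text=True, encoding='utf-8', check=True,
        )
    return result.stdout == source

wikipedia_quine = "c = 'c = %r; print(c %% c)'; print(c % c)\n"
print(is_quine(wikipedia_quine))   # the one-liner reproduces itself
print(is_quine("print('hi')\n"))   # an ordinary program does not
```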

Extending to Other Languages

Let’s use C++ as an example here.

First, write a similar template:

#include <iostream>
#include <string>

void split(std::string input, std::string &first, std::string &second);
std::string hex_decode(std::string hex);

int main() {
  std::string dna = "🐱,🐍";

  std::string head, tail;
  split(dna, head, tail);
  head = hex_decode(head);
  tail = hex_decode(tail);

  std::cout << head << dna << tail << std::endl;
}

// The annoying thing is that the C++ standard library doesn't provide split and hex decode functions.
// I'm too lazy to write them myself, so I'll just let AI generate them:

void split(std::string input, std::string &first, std::string &second) {
    size_t commaPos = input.find(',');
    if (commaPos != std::string::npos) {
        first = input.substr(0, commaPos);
        second = input.substr(commaPos + 1);
    } else {
        first = input;
        second = "";
    }
}

std::string hex_decode(std::string input) {
    std::string output;
    output.reserve(input.size() / 2);
    for (size_t i = 0; i < input.size(); i += 2) {
        std::string byteString = input.substr(i, 2);
        char byte = static_cast<char>(std::strtol(byteString.c_str(), nullptr, 16));
        output.push_back(byte);
    }
    return output;
}

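Then, take the program text before the 🐱 and the program text after the 🐍, encode each into hex using the nifty little tool, and replace the two placeholders with the results. The encoded text must match the file byte for byte, so every comment, space, and newline counts. We get: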

#include <iostream>
#include <string>

void split(std::string input, std::string &first, std::string &second);
std::string hex_decode(std::string hex);

int main() {
  std::string dna = "23696e636c756465203c696f73747265616d3e0a23696e636c756465203c737472696e673e0a0a766f69642073706c6974287374643a3a737472696e6720696e7075742c207374643a3a737472696e67202666697273742c207374643a3a737472696e6720267365636f6e64293b0a7374643a3a737472696e67206865785f6465636f6465287374643a3a737472696e6720686578293b0a0a696e74206d61696e2829207b0a20207374643a3a737472696e6720646e61203d2022,223b0a0a20207374643a3a737472696e6720686561642c207461696c3b0a202073706c697428646e612c20686561642c207461696c293b0a202068656164203d206865785f6465636f64652868656164293b0a20207461696c203d206865785f6465636f6465287461696c293b0a0a20207374643a3a636f7574203c3c2068656164203c3c20646e61203c3c207461696c203c3c207374643a3a656e646c3b0a7d0a0a2f2f2054686520616e6e6f79696e67207468696e6720697320746861742074686520432b2b207374616e64617264206c69627261727920646f65736e27742070726f766964652073706c697420616e6420686578206465636f64652066756e6374696f6e732e0a2f2f2049276d20746f6f206c617a7920746f207772697465207468656d206d7973656c662c20736f2049276c6c206a757374206c65742041492067656e6572617465207468656d3a0a0a766f69642073706c6974287374643a3a737472696e6720696e7075742c207374643a3a737472696e67202666697273742c207374643a3a737472696e6720267365636f6e6429207b0a2020202073697a655f7420636f6d6d61506f73203d20696e7075742e66696e6428272c27293b0a2020202069662028636f6d6d61506f7320213d207374643a3a737472696e673a3a6e706f7329207b0a20202020202020206669727374203d20696e7075742e73756273747228302c20636f6d6d61506f73293b0a20202020202020207365636f6e64203d20696e7075742e73756273747228636f6d6d61506f73202b2031293b0a202020207d20656c7365207b0a20202020202020206669727374203d20696e7075743b0a20202020202020207365636f6e64203d2022223b0a202020207d0a7d0a0a7374643a3a737472696e67206865785f6465636f6465287374643a3a737472696e6720696e70757429207b0a202020207374643a3a737472696e67206f75747075743b0a202020206f75747075742e7265736572766528696e7075742e73697a652829202f2032293b0a20202020666f72202873697a655f742069203d20303b2069203c20696e7075742e73697a6528293b2069202b3d203229207b0a20202020202020207374643a3a737472696e672062797465537472696e67203d20696e7075742e73756273747228692c2032293b0a2020202020202020636861722062797465203d207374617469635f636173743c636861723e287374643a3a737472746f6c2862797465537472696e672e635f73747228292c206e756c6c7074722c20313629293b0a20202020202020206f75747075742e707573685f6261636b2862797465293b0a202020207d0a2020202072657475726e206f75747075743b0a7d";

  std::string head, tail;
  split(dna, head, tail);
  head = hex_decode(head);
  tail = hex_decode(tail);

  std::cout << head << dna << tail << std::endl;
}

// The annoying thing is that the C++ standard library doesn't provide split and hex decode functions.
// I'm too lazy to write them myself, so I'll just let AI generate them:

void split(std::string input, std::string &first, std::string &second) {
    size_t commaPos = input.find(',');
    if (commaPos != std::string::npos) {
        first = input.substr(0, commaPos);
        second = input.substr(commaPos + 1);
    } else {
        first = input;
        second = "";
    }
}

std::string hex_decode(std::string input) {
    std::string output;
    output.reserve(input.size() / 2);
    for (size_t i = 0; i < input.size(); i += 2) {
        std::string byteString = input.substr(i, 2);
        char byte = static_cast<char>(std::strtol(byteString.c_str(), nullptr, 16));
        output.push_back(byte);
    }
    return output;
}

Save this C++ source code as quine.cpp, and then use the following command to verify that its output is identical to the original source code:

g++ quine.cpp && ./a.out | diff quine.cpp -

Multi-language Quine

There is another type of Quine where a program written in Language A outputs source code in Language B, and this Language B source code, when run, outputs the original Language A code, forming a cycle from A to B back to A. This process can even include more languages, such as A -> B -> C -> D -> E -> A.

There is a project on GitHub that constructs a cycle of over 100 languages.

With our “dumb method” of hexadecimal encoding, escape characters are no longer a concern, so we can easily construct such a cycle for any number of languages.

Next, let’s combine the Python and C++ examples above to construct a C++ program that outputs a Python program, which when run, outputs the original C++ program.

First, write the Python version template, which is the same as above:

dna = '🐱,🐍'
head, tail = dna.split(',')
head = bytes.fromhex(head).decode('utf-8')
tail = bytes.fromhex(tail).decode('utf-8')
print(head + dna + tail)

Then, write a C++ program that outputs the above template:

#include <iostream>
#include <string>

std::string py =
    "dna = '🐱,🐍'\n"
    "head, tail = dna.split(',')\n"
    "head = bytes.fromhex(head).decode('utf-8')\n"
    "tail = bytes.fromhex(tail).decode('utf-8')\n"
    "print(head + dna + tail)\n";

int main() {
    std::cout << py;
    return 0;
}

Then, take the program text before the 🐱 and the program text after the 🐍 in this C++ code, encode each into hex using the nifty little tool, and replace the two placeholders with the results. We get:

#include <iostream>
#include <string>

std::string py =
    "dna = '23696e636c756465203c696f73747265616d3e0a23696e636c756465203c737472696e673e0a0a7374643a3a737472696e67207079203d0a2020202022646e61203d2027,275c6e220a2020202022686561642c207461696c203d20646e612e73706c697428272c27295c6e220a202020202268656164203d2062797465732e66726f6d6865782868656164292e6465636f646528277574662d3827295c6e220a20202020227461696c203d2062797465732e66726f6d686578287461696c292e6465636f646528277574662d3827295c6e220a20202020227072696e742868656164202b20646e61202b207461696c295c6e223b0a0a696e74206d61696e2829207b0a202020207374643a3a636f7574203c3c2070793b0a2020202072657475726e20303b0a7d'\n"
    "head, tail = dna.split(',')\n"
    "head = bytes.fromhex(head).decode('utf-8')\n"
    "tail = bytes.fromhex(tail).decode('utf-8')\n"
    "print(head + dna + tail)\n";

int main() {
    std::cout << py;
    return 0;
}

Save this code as quine.cpp, and verify the result:

g++ quine.cpp
./a.out > quine.py
python3 quine.py > quine2.cpp
diff quine.cpp quine2.cpp
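
The same relay can be rehearsed without a compiler. In the sketch below, both stages are written in Python purely so that it runs anywhere; stage A plays the role of the C++ program and only prints stage B. The names fill and run are hypothetical helpers, and run executes the code in-process instead of launching an interpreter.

```python
import contextlib
import io

PLACEHOLDER = '🐱,🐍'

# Stage A: a program whose only job is to print stage B.
# Raw string, so each \n below stays as the two characters backslash and n,
# exactly as it would appear in stage A's source file.
STAGE_A_TEMPLATE = r'''py = (
    "dna = '🐱,🐍'\n"
    "head, tail = dna.split(',')\n"
    "head = bytes.fromhex(head).decode('utf-8')\n"
    "tail = bytes.fromhex(tail).decode('utf-8')\n"
    "print(head + dna + tail)\n"
)
print(py, end='')
'''

def fill(template: str) -> str:
    """Replace the placeholder with the hex of the text before and after it."""
    before, after = template.split(PLACEHOLDER)
    stripped = after[:-1] if after.endswith('\n') else after
    return before + before.encode('utf-8').hex() + ',' + stripped.encode('utf-8').hex() + after

def run(source: str) -> str:
    """Execute Python source in-process and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()

stage_a = fill(STAGE_A_TEMPLATE)
stage_b = run(stage_a)
print(stage_b != stage_a)       # the two programs are different texts
print(run(stage_b) == stage_a)  # stage B prints stage A again
```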

Done.


Mistivia - https://mistivia.com