Write Quine The Hard Way
2024-09-21
What Is a Quine
A quine is a program that, when executed, produces its own source code as output. Quines are also known as "self-replicating programs". This self-referential trait, reminiscent of how living creatures reproduce, is what makes them so intriguing.
The following two examples of Quine programs are copied from Wikipedia.
Example 1: Python
c = 'c = %r; print(c %% c)'; print(c % c)
Example 2: Java
public class Quine
{
  public static void main(String[] args)
  {
    char q = 34; // Quotation mark character
    String[] l = { // Array of source code
    "public class Quine",
    "{",
    "  public static void main(String[] args)",
    "  {",
    "    char q = 34; // Quotation mark character",
    "    String[] l = { // Array of source code",
    "    ",
    "    };",
    "    for (int i = 0; i < 6; i++) // Print opening code",
    "        System.out.println(l[i]);",
    "    for (int i = 0; i < l.length; i++) // Print string array",
    "        System.out.println(l[6] + q + l[i] + q + ',');",
    "    for (int i = 7; i < l.length; i++) // Print this code",
    "        System.out.println(l[i]);",
    "  }",
    "}",
    };
    for (int i = 0; i < 6; i++) // Print opening code
        System.out.println(l[i]);
    for (int i = 0; i < l.length; i++) // Print string array
        System.out.println(l[6] + q + l[i] + q + ',');
    for (int i = 7; i < l.length; i++) // Print this code
        System.out.println(l[i]);
  }
}
However, neither example is easy to understand. If I wanted to write a quine in another language, such as C++ or JavaScript, I would have to work it out again from scratch.
So I set out to find a universal, intuitive way to write quines. The result may not be the shortest or the most efficient quine. However, it should be straightforward and easy to understand, and it should carry over to nearly any programming language.
The Rules of Quines
Although the two quines above look quite different, both can be broken down into the same parts:
- The beginning, which may contain imports, class definitions, and similar boilerplate;
- A string that acts somewhat like a gene, so I will call it DNA here. It contains the program text that comes before this string, together with the program text that comes after it;
- The ending, which:
  - extracts the beginning from the DNA string and prints it;
  - prints the DNA string itself;
  - extracts the ending from the DNA string and prints it.
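To see these three parts side by side, here is an illustrative Python transliteration of the Java quine (a sketch written for this purpose, not one of the Wikipedia examples). As in the Java version, `q` holds the double-quote character (code 34) and the list `l` is the DNA. The first loop prints the beginning, the second prints every DNA entry wrapped in quotes, and the third prints the ending. Comments are left out on purpose: every line of code, comments included, would also have to appear inside the list.

```python
q = chr(34)
l = [
    "q = chr(34)",
    "l = [",
    "    ",
    "]",
    "for i in range(2): print(l[i])",
    "for s in l: print(l[2] + q + s + q + ',')",
    "for i in range(3, len(l)): print(l[i])",
]
for i in range(2): print(l[i])
for s in l: print(l[2] + q + s + q + ',')
for i in range(3, len(l)): print(l[i])
```

The entry `"    "` plays the same role as `l[6]` in the Java code: it is the indentation printed in front of each quoted entry.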
It looks simple, but in most programming languages, newlines and quotation marks inside a string literal must be escaped: a newline is written as \n, and a quotation mark must be preceded by a backslash. As a result, the way the DNA is written in the source differs from the text it holds at run time. The program then has to re-create those escapes when it prints the string, or work around them, as the Java version does with char q = 34. This extra layer of complexity is why the Wikipedia examples above are hard to understand.
However, escaping is essentially a form of encoding. So my approach is simply to encode the DNA string in plain hexadecimal, which contains no quotes, backslashes, or newlines, and to decode it only when it needs to be printed.
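To see how much escaping the Python one-liner hides, here it is split into named steps. This split version is no longer a quine. It only shows what the `%` formatting does: `%r` inserts `repr(c)`, which re-quotes the template exactly as it is written in the source, and `%%` collapses to a single `%`.

```python
# The template ("DNA") for the whole one-line program.
c = 'c = %r; print(c %% c)'

# %r -> repr(c), i.e. the template wrapped in quotes again;
# %% -> a literal %, which restores "c % c" in the output.
line = c % c

print(line)  # c = 'c = %r; print(c %% c)'; print(c % c)
```

The printed line is exactly the original one-liner. Every quote and percent sign in it had to be planned around `repr` and `%` escaping.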
This way, there is no need for strange hacks.
Moreover, it doesn't necessarily have to be hexadecimal encoding; using base64, base32, or even encoding with nucleotide symbols like real DNA are all possible options.
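As a sketch of that claim, the snippet below encodes a short line of code three ways: hexadecimal, base64, and a nucleotide encoding. The nucleotide scheme (`BASES`, `to_nucleotides`, `from_nucleotides`) is a hypothetical one made up for illustration, with two bits per letter and four letters per byte. Each encoding is checked to round-trip and to avoid quotes, backslashes, newlines, and the comma that later separates the head from the tail.

```python
import base64

# Hypothetical nucleotide alphabet: each base carries 2 bits, 4 bases per byte.
BASES = "ACGT"


def to_nucleotides(data: bytes) -> str:
    # Emit the 2-bit groups of each byte from most to least significant.
    return "".join(BASES[(byte >> shift) & 0b11]
                   for byte in data for shift in (6, 4, 2, 0))


def from_nucleotides(text: str) -> bytes:
    # Reassemble every group of 4 bases into one byte.
    values = [BASES.index(base) for base in text]
    return bytes(
        (values[i] << 6) | (values[i + 1] << 4) | (values[i + 2] << 2) | values[i + 3]
        for i in range(0, len(values), 4)
    )


snippet = "print('hi')\n"
data = snippet.encode("utf-8")
schemes = {
    "hex": (data.hex(), bytes.fromhex),
    "base64": (base64.b64encode(data).decode("ascii"), base64.b64decode),
    "nucleotides": (to_nucleotides(data), from_nucleotides),
}

for name, (encoded, decode) in schemes.items():
    # Round-trip back to the original text...
    assert decode(encoded).decode("utf-8") == snippet
    # ...and contain none of the characters that would need escaping.
    assert not set(encoded) & set("'\"\\\n,")
    print(name, encoded)
```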
A Handy Tool
To encode strings into hexadecimal, I wrote a small Python tool that reads text from standard input and prints its hexadecimal form:
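import sys

# Read the whole fragment from standard input.
s = sys.stdin.read()
# Drop one trailing newline (editors usually add one at the end of a file);
# the quine's print() or std::endl adds it back when the program runs.
if s.endswith('\n'):
    s = s[:-1]
print(s.encode('utf-8').hex())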
Save the above code as hexencode.py, and then you can use it in the shell like this:
cat input.txt | python3 hexencode.py
Start Writing Code
Python is relatively easy to write, so let's start with Python.
First, define the DNA string. We don't know its content yet, so we use emoji as placeholders. Since the string contains two parts, the head and the tail, let's mark the head with a cat (🐱) and the tail with a snake (🐍):
dna = '🐱,🐍'
Then extract the head and tail:
head, tail = dna.split(',')
Since we intend to use hexadecimal encoding, we need to decode both the head and the tail from hexadecimal back to their original form:
head = bytes.fromhex(head).decode('utf-8')
tail = bytes.fromhex(tail).decode('utf-8')
Finally, we concatenate the head, DNA, and tail, and output them together:
print(head + dna + tail)
The current program is as follows:
dna = '🐱,🐍'
head, tail = dna.split(',')
head = bytes.fromhex(head).decode('utf-8')
tail = bytes.fromhex(tail).decode('utf-8')
print(head + dna + tail)
The part before the 🐱 is:
dna = '
Using the handy tool to encode, we get:
646e61203d2027
Replace the 🐱 with this string.
Next, the part after the snake 🐍 tail is:
'
head, tail = dna.split(',')
head = bytes.fromhex(head).decode('utf-8')
tail = bytes.fromhex(tail).decode('utf-8')
print(head + dna + tail)
Using the handy tool to encode, we get:
270a686561642c207461696c203d20646e612e73706c697428272c27290a68656164203d2062797465732e66726f6d6865782868656164292e6465636f646528277574662d3827290a7461696c203d2062797465732e66726f6d686578287461696c292e6465636f646528277574662d3827290a7072696e742868656164202b20646e61202b207461696c29
Replace the 🐍 with this string.
Finally, we get the following code:
dna = '646e61203d2027,270a686561642c207461696c203d20646e612e73706c697428272c27290a68656164203d2062797465732e66726f6d6865782868656164292e6465636f646528277574662d3827290a7461696c203d2062797465732e66726f6d686578287461696c292e6465636f646528277574662d3827290a7072696e742868656164202b20646e61202b207461696c29'
head, tail = dna.split(',')
head = bytes.fromhex(head).decode('utf-8')
tail = bytes.fromhex(tail).decode('utf-8')
print(head + dna + tail)
At this point, the quine is complete. Give it a run and see!
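The manual encode-and-replace steps above can also be automated. Below is a minimal sketch, assuming each placeholder appears exactly once in the template with 🐱 before 🐍. `build_quine` is a hypothetical helper, not part of the handy tool. It trims the tail's final newline the same way hexencode.py does, then runs the generated program to check that it prints itself.

```python
import contextlib
import io


def build_quine(template: str, head_mark: str = "🐱", tail_mark: str = "🐍") -> str:
    """Fill the placeholders with the template's own hex-encoded head and tail."""
    head, rest = template.split(head_mark, 1)
    tail = rest.split(tail_mark, 1)[1]
    if tail.endswith("\n"):
        tail = tail[:-1]  # same trimming as hexencode.py; print() re-adds it
    filled = template.replace(head_mark, head.encode("utf-8").hex(), 1)
    return filled.replace(tail_mark, tail.encode("utf-8").hex(), 1)


TEMPLATE = (
    "dna = '🐱,🐍'\n"
    "head, tail = dna.split(',')\n"
    "head = bytes.fromhex(head).decode('utf-8')\n"
    "tail = bytes.fromhex(tail).decode('utf-8')\n"
    "print(head + dna + tail)\n"
)

quine = build_quine(TEMPLATE)

# Run the generated program and compare what it prints with its own source.
output = io.StringIO()
with contextlib.redirect_stdout(output):
    exec(quine, {})
print(output.getvalue() == quine)  # True
```

Since the function only looks at the placeholders, it should also be able to fill the C++ templates below. Checking their output still requires a compiler, though.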
Extend to Other Languages
Let's use C++ as an example.
First, write a similar template:
#include <iostream>
#include <string>

void split(std::string input, std::string &first, std::string &second);
std::string hex_decode(std::string hex);

int main() {
  std::string dna = "🐱,🐍";
  
  std::string head, tail;
  split(dna, head, tail);
  head = hex_decode(head);
  tail = hex_decode(tail);
  
  std::cout << head << dna << tail << std::endl;
}

void split(std::string input, std::string &first, std::string &second) {
    size_t commaPos = input.find(',');
    if (commaPos != std::string::npos) {
        first = input.substr(0, commaPos);
        second = input.substr(commaPos + 1);
    } else {
        first = input;
        second = "";
    }
}

std::string hex_decode(std::string input) {
    std::string output;
    output.reserve(input.size() / 2);
    for (size_t i = 0; i < input.size(); i += 2) {
        std::string byteString = input.substr(i, 2);
        char byte = static_cast<char>(std::strtol(byteString.c_str(), nullptr, 16));
        output.push_back(byte);
    }
    return output;
}
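The C++ hex_decode walks the string two digits at a time and converts each pair with base 16. For comparison, here is the same loop written in Python and checked against the built-in `bytes.fromhex`. It is a sketch for illustration only; the Python quine itself just calls `bytes.fromhex`.

```python
def hex_decode(hex_text: str) -> bytes:
    """Decode hex two digits at a time, like the C++ hex_decode loop."""
    out = bytearray()
    for i in range(0, len(hex_text), 2):
        # hex_text[i:i + 2] plays the role of input.substr(i, 2);
        # int(..., 16) plays the role of std::strtol(..., nullptr, 16).
        out.append(int(hex_text[i:i + 2], 16))
    return bytes(out)


head_hex = "646e61203d2027"
print(hex_decode(head_hex))                             # b"dna = '"
print(hex_decode(head_hex) == bytes.fromhex(head_hex))  # True
```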
Then, encode the program text before the 🐱 and the program text after the 🐍 into hexadecimal using the handy tool, and substitute the results for the placeholders to get:
#include <iostream>
#include <string>

void split(std::string input, std::string &first, std::string &second);
std::string hex_decode(std::string hex);

int main() {
  std::string dna = "23696e636c756465203c696f73747265616d3e0a23696e636c756465203c737472696e673e0a0a766f69642073706c6974287374643a3a737472696e6720696e7075742c207374643a3a737472696e67202666697273742c207374643a3a737472696e6720267365636f6e64293b0a7374643a3a737472696e67206865785f6465636f6465287374643a3a737472696e6720686578293b0a0a696e74206d61696e2829207b0a20207374643a3a737472696e6720646e61203d2022,223b0a20200a20207374643a3a737472696e6720686561642c207461696c3b0a202073706c697428646e612c20686561642c207461696c293b0a202068656164203d206865785f6465636f64652868656164293b0a20207461696c203d206865785f6465636f6465287461696c293b0a20200a20207374643a3a636f7574203c3c2068656164203c3c20646e61203c3c207461696c203c3c207374643a3a656e646c3b0a7d0a0a766f69642073706c6974287374643a3a737472696e6720696e7075742c207374643a3a737472696e67202666697273742c207374643a3a737472696e6720267365636f6e6429207b0a2020202073697a655f7420636f6d6d61506f73203d20696e7075742e66696e6428272c27293b0a2020202069662028636f6d6d61506f7320213d207374643a3a737472696e673a3a6e706f7329207b0a20202020202020206669727374203d20696e7075742e73756273747228302c20636f6d6d61506f73293b0a20202020202020207365636f6e64203d20696e7075742e73756273747228636f6d6d61506f73202b2031293b0a202020207d20656c7365207b0a20202020202020206669727374203d20696e7075743b0a20202020202020207365636f6e64203d2022223b0a202020207d0a7d0a0a7374643a3a737472696e67206865785f6465636f6465287374643a3a737472696e6720696e70757429207b0a202020207374643a3a737472696e67206f75747075743b0a202020206f75747075742e7265736572766528696e7075742e73697a652829202f2032293b0a20202020666f72202873697a655f742069203d20303b2069203c20696e7075742e73697a6528293b2069202b3d203229207b0a20202020202020207374643a3a737472696e672062797465537472696e67203d20696e7075742e73756273747228692c2032293b0a2020202020202020636861722062797465203d207374617469635f636173743c636861723e287374643a3a737472746f6c2862797465537472696e672e635f73747228292c206e756c6c7074722c20313629293b0a20202020202020206f75747075742e707573685f6261636b2862797465293b0a202020207d0a2020202072657475726e206f75747075743b0a7d";
  
  std::string head, tail;
  split(dna, head, tail);
  head = hex_decode(head);
  tail = hex_decode(tail);
  
  std::cout << head << dna << tail << std::endl;
}

void split(std::string input, std::string &first, std::string &second) {
    size_t commaPos = input.find(',');
    if (commaPos != std::string::npos) {
        first = input.substr(0, commaPos);
        second = input.substr(commaPos + 1);
    } else {
        first = input;
        second = "";
    }
}

std::string hex_decode(std::string input) {
    std::string output;
    output.reserve(input.size() / 2);
    for (size_t i = 0; i < input.size(); i += 2) {
        std::string byteString = input.substr(i, 2);
        char byte = static_cast<char>(std::strtol(byteString.c_str(), nullptr, 16));
        output.push_back(byte);
    }
    return output;
}
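Save this C++ source code as quine.cpp, then use the following command to check that its output matches the original source code. (std::strtol is declared in <cstdlib>; the program relies on <string> pulling that header in, which GCC's libstdc++ does but the C++ standard does not guarantee.)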
g++ quine.cpp && ./a.out | diff quine.cpp -
Ouroboros
Another kind of quine works like this: a program written in language A outputs source code in language B, which, when executed, outputs the original code in language A. This forms a loop from A to B and back to A. The loop can even pass through more languages, such as A to B to C to D to E and back to A.
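With the hex approach, a relay is the same quine template with a different final print. Below is a minimal sketch in which Python stands in for both A and B, an assumption made so it runs without a compiler. Program A is the Python template with its last line changed to print a one-line program B, and B is just a `print(...)` call holding A's source. The names `TEMPLATE_A`, `fill`, and `run` are illustrative and do not come from the article.

```python
import contextlib
import io

# Program A: the Python template, but it prints program B instead of itself.
TEMPLATE_A = (
    "dna = '🐱,🐍'\n"
    "head, tail = dna.split(',')\n"
    "head = bytes.fromhex(head).decode('utf-8')\n"
    "tail = bytes.fromhex(tail).decode('utf-8')\n"
    "print('print(' + repr(head + dna + tail) + ')')\n"
)


def fill(template: str) -> str:
    """Replace the 🐱/🐍 placeholders with the hex-encoded head and tail."""
    head, rest = template.split("🐱", 1)
    tail = rest.split("🐍", 1)[1]
    tail = tail[:-1] if tail.endswith("\n") else tail  # print() re-adds it
    encoded = head.encode("utf-8").hex() + "," + tail.encode("utf-8").hex()
    return template.replace("🐱,🐍", encoded, 1)


def run(source: str) -> str:
    """Execute Python source and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


program_a = fill(TEMPLATE_A)
program_b = run(program_a)           # A prints B
print(program_b == program_a)       # False: B is a different program
print(run(program_b) == program_a)  # True: B prints A back
```

In the C++ and Python relay that follows, the same idea applies, except that B's template is embedded in A as a C++ string literal.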
There is a project on GitHub that has constructed a loop with more than 100 languages.
With this "simple method" of hexadecimal encoding, we don't need to worry about escape characters anymore, making it easy to construct quines relayed in any number of languages.
Next, let's combine Python and C++, and create an example where a C++ program outputs a Python program, which outputs the original C++ code.
First, write the Python version of the template:
dna = '🐱,🐍'
head, tail = dna.split(',')
head = bytes.fromhex(head).decode('utf-8')
tail = bytes.fromhex(tail).decode('utf-8')
print(head + dna + tail)
Then, write a C++ program that outputs the template above:
#include <iostream>
#include <string>

std::string py =
    "dna = '🐱,🐍'\n"
    "head, tail = dna.split(',')\n"
    "head = bytes.fromhex(head).decode('utf-8')\n"
    "tail = bytes.fromhex(tail).decode('utf-8')\n"
    "print(head + dna + tail)\n";

int main() {
    std::cout << py;
    return 0;
}
Then, encode the program text before the 🐱 and after the 🐍 in this C++ code into hexadecimal using the handy tool, and replace it to get:
#include <iostream>
#include <string>

std::string py =
    "dna = '23696e636c756465203c696f73747265616d3e0a23696e636c756465203c737472696e673e0a0a7374643a3a737472696e67207079203d0a2020202022646e61203d2027,275c6e220a2020202022686561642c207461696c203d20646e612e73706c697428272c27295c6e220a202020202268656164203d2062797465732e66726f6d6865782868656164292e6465636f646528277574662d3827295c6e220a20202020227461696c203d2062797465732e66726f6d686578287461696c292e6465636f646528277574662d3827295c6e220a20202020227072696e742868656164202b20646e61202b207461696c295c6e223b0a0a696e74206d61696e2829207b0a202020207374643a3a636f7574203c3c2070793b0a2020202072657475726e20303b0a7d'\n"
    "head, tail = dna.split(',')\n"
    "head = bytes.fromhex(head).decode('utf-8')\n"
    "tail = bytes.fromhex(tail).decode('utf-8')\n"
    "print(head + dna + tail)\n";

int main() {
    std::cout << py;
    return 0;
}
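Note that the C++ escape sequences around the placeholders need no special handling: they are encoded as ordinary bytes. As a quick check, the snippet below re-encodes the last few characters of the head and the first few of the tail and compares them with the hex strings above. The variable names are just for illustration.

```python
# End of the head: four spaces, the opening C++ quote, then the Python code up to
# the opening Python quote.
head_end = "    \"dna = '"
# Start of the tail: the closing Python quote, the two characters \ and n of
# the C++ escape sequence, the closing C++ quote, then a real newline.
tail_start = "'" + "\\n" + '"' + "\n"

print(head_end.encode("utf-8").hex())    # 2020202022646e61203d2027
print(tail_start.encode("utf-8").hex())  # 275c6e220a
```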
Save this code as quine.cpp and verify the result:
g++ quine.cpp
./a.out > quine.py
python3 quine.py > quine2.cpp
diff quine.cpp quine2.cpp
Job done.
Email: i (at) mistivia (dot) com