Using Artificial Intelligence to Write Self-Modifying/Improving Programs
2013-01-27 modified 2013-03-04 | by KORY BECKER
This article is the first in a series of three. See also: Part 1, Part 2, Part 3.
Introduction
Is it possible for a computer program to write its own programs? Could human software developers be replaced
one day by the very computers that they master? Just like the farmer, the assembly line worker, and the
telephone operator, could software developers be next? While this kind of idea seems far-fetched, it may actually
be closer than we think. This article describes an experiment to produce an AI program, capable of developing its
own programs, using a genetic algorithm implementation with self-modifying and self-improving code.
+-+-+>-<[++++>+++++<+<>++]>[-[---.--[[-.++++[+++..].]]]]
hello
The above programming code was created by an artificial intelligence program, designed to write programs with
self-modifying and self-improving code. The program created the above result in 29 minutes. The programming
language is brainfuck. Why this programming language? Read on.
All code for the AI program is available at GitHub.
An AI Hobby
It's been somewhat of a hobby for me, dabbling with artificial intelligence programs in an attempt to write a
program that can, itself, write programs. Of course, I'm not referring to programs that take subsets of program
instructions or code blocks and combine them together or otherwise optimize to produce an end result. I'm
referring to starting from complete scratch, with the AI having absolutely no knowledge whatsoever about how to
program in the target language. The AI must learn, on its own, how to create a fully functioning program for a
specific purpose.
I initially began this venture in the late 1990s by attempting to create programs with simple if/then/else
statements to output programs in the BASIC programming language. This was a difficult task for a multitude of
reasons. First, using if/then/else conditionals to write a random program doesn't seem very intelligent at all.
Second, the number of computer instructions available in BASIC is far too great. Even more troublesome, some
of the instructions are downright dangerous (Shell("format c:"))! I also attempted to generate programs using C,
C++, and a few other languages. However, this naive approach never produced a working child program.
The failure was due not just to the simple if/then/else statements, but also to the fact that the selected
programming languages were intended to be usable by humans, not computers, and were thus far more complicated
for an AI to automate.
While the ultimate goal would be to produce a computer program capable of writing its own word processing
software, image editing tool, web browser, or disk defragmenter, I was more interested in a simple proof-of-concept that demonstrated the idea was possible.
instruction is just 1 byte, it's easy to map each gene to a programming code (note, 1 double = 8 bytes; still
equivalent to one slot in the array).
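One way to picture the gene-to-instruction mapping is the Python sketch below. It is only an illustration: it assumes each gene is a double in the range [0, 1) and that each of the eight instructions owns an equal slice of that range. The names INSTRUCTIONS and decode are hypothetical and are not taken from the AI program's source.

```python
# Hypothetical mapping: the eight brainfuck instructions, one per equal
# slice of the interval [0, 1).
INSTRUCTIONS = "><+-.,[]"


def decode(genome):
    """Turn a list of doubles (genes) into a brainfuck program string."""
    n = len(INSTRUCTIONS)
    # int(g * n) selects the slice; min() guards against a gene equal to 1.0.
    return "".join(INSTRUCTIONS[min(int(g * n), n - 1)] for g in genome)


if __name__ == "__main__":
    print(decode([0.0, 0.13, 0.3, 0.99]))
```

Because every gene decodes to some instruction, any array of doubles is a candidate program, which is what lets the genetic algorithm start from purely random genomes.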
How It Works
The AI program works as follows:
1. A genome consists of an array of doubles.
2. Each gene corresponds to an instruction in the brainf-ck programming language.
3. Start with a population of random genomes.
4. Decode each genome into a resulting program by converting each double into its corresponding instruction and
execute the program.
5. Get each program's fitness score, based upon the output it writes to the console (if any), and rank them.
6. Mate the best genomes together using roulette selection, crossover, and mutation to produce a new
generation.
7. Repeat the process with the new generation until the target fitness score is achieved.
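Steps 6 and 7 can be sketched roughly as follows. This is a minimal Python illustration of roulette selection, one-point crossover, and replacement mutation, not the AI program's actual code. The function names, the mutation_rate default, and the requirement that scores are non-negative are all assumptions made for the example.

```python
import random


def roulette_pick(population, scores, rng):
    """Pick one genome with probability proportional to its (non-negative) score."""
    total = sum(scores)
    if total <= 0:
        return rng.choice(population)  # no signal yet: pick uniformly
    spin = rng.uniform(0, total)
    running = 0.0
    for genome, score in zip(population, scores):
        running += score
        if running > spin:
            return genome
    return population[-1]  # only reached if spin lands exactly on the total


def crossover(mum, dad, rng):
    """One-point crossover: swap the tails of two equal-length genomes."""
    cut = rng.randrange(1, len(mum))
    return mum[:cut] + dad[cut:], dad[:cut] + mum[cut:]


def next_generation(population, scores, rng, mutation_rate=0.01):
    """Breed a new population of the same size from the scored one."""
    children = []
    while len(children) < len(population):
        a = roulette_pick(population, scores, rng)
        b = roulette_pick(population, scores, rng)
        for child in crossover(a, b, rng):
            # Replacement mutation: occasionally overwrite a gene at random.
            child = [rng.random() if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
    return children[:len(population)]


if __name__ == "__main__":
    rng = random.Random(1)
    pop = [[rng.random() for _ in range(8)] for _ in range(4)]
    print(len(next_generation(pop, [1, 2, 3, 4], rng)))
```

Calling next_generation in a loop, re-scoring the population each time, gives the repeat-until-target-fitness cycle described in step 7.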
Since the fitness method is the most computationally expensive part (it has to execute the program code for each
member in the population, which probably includes infinite loops and other nasty stuff), the AI program uses the
Parallel.ForEach method, found in .NET 4.5. In this manner, it can evaluate the fitness of multiple
genomes in the population at once, in each generation. This allows the program to utilize maximal CPU
resources and take advantage of multiple CPU cores. The program also saves its state every 10,000
generations, so that if the program or PC is shut down, it can continue searching from where it left off.
Of course, initially most generated programs won't even compile, let alone output text to the console. These are
simply discarded in favor of programs that at least output something, which are then guided and evolved until the
output is closer and closer to the desired solution.
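To make "closer and closer" concrete, here is one possible fitness function, written as a Python sketch. It is an assumption for illustration rather than the scoring the AI program necessarily uses: each output character earns up to 256 points, losing one point for every step its character code is away from the target character, so a program that prints nothing scores 0.

```python
def fitness(output, target):
    """Score console output against the target text (hypothetical scoring).

    Each position contributes 256 minus the distance between the two
    character codes; positions with no output contribute nothing.
    """
    score = 0
    for got, want in zip(output, target):
        score += 256 - abs(ord(got) - ord(want))
    return score


if __name__ == "__main__":
    print(fitness("hj", "hi"), "of", 256 * len("hi"))
```

Under this scoring, the target fitness for a string is 256 times its length, and an output that is off by one character code in one position scores just one point below that, which gives the genetic algorithm a gradient to climb.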
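> Increment the pointer.
< Decrement the pointer.
+ Increment the byte at the pointer.
- Decrement the byte at the pointer.
. Output the byte at the pointer.
, Input a byte and store it in the byte at the pointer.
[ Jump forward past the matching ] if the byte at the pointer is zero.
] Jump backward to the matching [ unless the byte at the pointer is zero.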
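The instruction set is small enough that an interpreter fits in a few dozen lines. The Python sketch below is a hypothetical stand-in for the simulation interpreter, not its actual code. It assumes 8-bit cells that wrap around, a pointer that wraps at the ends of the tape, a step limit to escape infinite loops, an optional output limit (max_output) that stops the run once enough characters have been printed, and that execution simply ends at an unmatched bracket. The `,` instruction and any unknown characters are ignored here for simplicity.

```python
def find_match(code, pos, step):
    """Scan from the bracket at pos (step=+1 or -1) to its partner, or None."""
    depth = 0
    i = pos
    while 0 <= i < len(code):
        if code[i] == "[":
            depth += step
        elif code[i] == "]":
            depth -= step
        if depth == 0:
            return i
        i += step
    return None


def run_bf(code, max_steps=100_000, max_output=None, tape_len=30_000):
    """Run a brainfuck program and return whatever it printed before stopping."""
    tape = [0] * tape_len
    ptr = pc = steps = 0
    out = []
    while pc < len(code) and steps < max_steps:
        op = code[pc]
        steps += 1
        if op == ">":
            ptr = (ptr + 1) % tape_len
        elif op == "<":
            ptr = (ptr - 1) % tape_len
        elif op == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif op == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif op == ".":
            out.append(chr(tape[ptr]))
            if max_output is not None and len(out) >= max_output:
                break
        elif op == "[" and tape[ptr] == 0:
            pc = find_match(code, pc, +1)
            if pc is None:
                break  # unmatched '[': stop, keep the output so far
        elif op == "]" and tape[ptr] != 0:
            pc = find_match(code, pc, -1)
            if pc is None:
                break  # unmatched ']': stop, keep the output so far
        pc += 1
    return "".join(out)


if __name__ == "__main__":
    print(run_bf("+[+++++-+>++>++-++++++<<]>++.[+.]", max_output=2))
```

Because brackets are matched lazily during execution, a program with a syntax error late in its code can still be scored on the output it produced before reaching that point.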
Results?
hi
The AI successfully wrote a program to output hi after 5,700 generations in about 1 minute. It produced the
following code:
+[+++++-+>++>++-++++++<<]>++.[+.]-.,-#>>]<]
While the above code contains parsing errors, such as non-matching brackets, our simulation interpreter
computes the result up until the program fails, so in the above case, the syntax error (which is later on in the
code, after a solution is found) doesn't impact the fitness.
You can try pasting the above code into a brainf-ck interpreter. Click Start Debugger, ignore the warnings, then
click Run To Breakpoint. Note the output.
If we trim off the excess code, we see the following syntactically-valid code:
+[+++++-+>++>++-++++++<<]>++.[+.]
You can view the screenshots below, taken while the program was running:
This is the history graph, plotting the fitness score over time. You can see how the AI learned how to program in
the target language and achieve the desired solution.
hello
The AI successfully wrote a program to output hello after 252,000 generations in about 29 minutes. It
produced the following code:
+-+-+>-<[++++>+++++<+<>++]>[-[---.--[[-.++++[+++..].+]],]<-+<+,.+>[[.,],+<.+-<,--+.]],+]
[[[.+.,,+].-
During the generation process, the AI came pretty close to a solution, but a couple letters were bound to each
other, within a loop. The AI was able to overcome this by creating an inner-loop, within the problematic one, that
successfully output the correct character, and continued processing.
Hi!
The AI successfully wrote a program to output Hi! after 1,219,400 generations in about 2 hours and 7 minutes.
It produced the following code:
>-----------<++[[++>++<+][]>-.+[+++++++++++++++++++++++++++++><+++.<><-->>>+].]
This is actually one of my favorites. Run it and you can see why (click Start Debugger and Run to Breakpoint).
It's almost as if the computer knows what it's doing. It's interesting to note how generating this program took quite
a bit longer than the prior two. This is likely due to the characters used, which include a capital letter and a
symbol. The other two examples used characters that are much closer in value in the ASCII system, which would
be easier for the AI to find.
reddit
The AI successfully wrote a program to output reddit after 195,000 generations in about 22 minutes. It
produced the following code:
+[+++++>++<]+>++++++[[++++++.-------------.-.-+.+++++.+++++],.,+,-+-,+>+.++<<+<><+]
-[-<>.]>+.-.+..]<
This one was a challenge. It may have been tricky due to its length, or possibly due to the location of the d's. The
AI would repeatedly get stuck within a local maximum. A local maximum is when a genetic algorithm finds the
best fitness that it can see within its current parameters, even though a better fitness may exist. The AI is unable
to get out of its hole and achieve the better fitness because doing so would require the fitness to drop before
increasing again, which is generally against the rules of a genetic algorithm.
I was able to resolve this issue by including additional diversity in the mutation function. Previously, the mutation
worked by simply altering a single instruction in the genome. Mutation was enhanced to include not just mutating
a single bit (replacement mutation), but also shifting the bits up (insertion mutation), and shifting down (deletion
mutation). This extra diversity allowed the AI to keep moving.
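The three mutation types can be sketched in Python as follows. This is an illustration only: it assumes a fixed-length genome of doubles, so an insertion drops the last gene and a deletion appends a fresh random gene to keep the length constant. The function name mutate and its kind parameter are hypothetical.

```python
import random


def mutate(genome, rng, kind=None):
    """Return a mutated copy of the genome; the length never changes."""
    g = list(genome)
    pos = rng.randrange(len(g))
    if kind is None:
        kind = rng.choice(("replace", "insert", "delete"))
    if kind == "replace":
        # Replacement: overwrite one gene in place.
        g[pos] = rng.random()
    elif kind == "insert":
        # Insertion: add a new gene at pos, shifting later genes up,
        # and drop the last gene to preserve the length.
        g.insert(pos, rng.random())
        g.pop()
    else:
        # Deletion: remove the gene at pos, shifting later genes down,
        # and pad the end with a fresh random gene.
        del g[pos]
        g.append(rng.random())
    return g


if __name__ == "__main__":
    rng = random.Random(42)
    print(mutate([0.1, 0.2, 0.3, 0.4], rng, kind="insert"))
```

Insertion and deletion move a whole run of instructions by one position in a single step, a change that replacement mutation alone could only reach through many separate lucky edits.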
hello world
This was produced after 580,900 generations in about 2 hours. It produced the following code:
-><[>-<+++]->>++++[++++++++++++++++++<+]>.---.+-+++++++..+++.+>+<><+[+><><>+++++++++.+-<
-++++[++[.--------.+++.------],.-----],,.>.+<<<[><<>]<++>+.[]+[.[+]],[[.]..,<]]],]<+]],[
]],[[+[,+[]-<.,.],--+]-++-[,<.+-<[-<]-><>-]-<>+[-,-[<.>][--+<>+<><++<><-,,-,[,[.>]]<-+[.
>+[<.<],]<<<>].[--+[<<->--],-+>]-,[,
If you trim off the excess, the actual code that prints the text is much shorter:
-><[>-<+++]->>++++[++++++++++++++++++<+]>.---.+-+++++++..+++.+>+<><+[+><><>+++++++++.+-<
-++++[++[.--------.+++.------],.-----]]
+[>+<+++]+>------------.+<+++++++++++++++++++++++++++++++.>++++++++++++++++++++++++++++++
++++.+++.+++++++.-----------------.--<.>--.+++++++++++..---<.>-.+++++++++++++.--------.--
----------.+++++++++++++.+++++.]-+,.-<[><[[[[[[<-[+[>[<-<-[+[,]-,,-[>[+[-<-,.<]]]<-+<[]+<
.,,>[<,<[.]>[<,<<-.]><,,,--[.--.-
If you trim off the excess, the actual code that prints the text is shorter:
+[>+<+++]+>------------.+<+++++++++++++++++++++++++++++++.>++++++++++++++++++++++++++++++
++++.+++.+++++++.-----------------.--<.>--.+++++++++++..---<.>-.+++++++++++++.--------.--
----------.+++++++++++++.+++++.
In the above run, the AI was supplied with a starting program array size of 300 instructions (i.e., 300 bytes, or
rather 2,400 bytes since 1 double = 8 bytes). The full length of the program code was not needed by the AI. It
was able to write the program with just 209 instructions.
Note, this solution took 10 hours to complete. However, keep in mind, with an AI program doing the
programming, rather than a human, the amount of time required to complete a program is of less concern. The AI
can simply be left running in the background, while the human works on other tasks. I also expect the
computation time to be dramatically reduced as computers get faster in the coming years.
The Future
This experiment was a proof-of-concept that an AI program could develop its own computer programs that
perform a specific task. In that regard, it was a success. The AI was able to start with absolutely no knowledge of
the target programming language and successfully learn how to generate a valid computer program, which when
executed, solved a particular task.
As with all genetic algorithms, there was work involved with designing the fitness function. The fitness function is
equivalent to describing to the AI what you're looking for. In this way, creating the fitness function is itself a bit
like programming (on the part of the human). If it were possible for the AI to develop its own fitness function, this
would be a step forward. In the meantime, it may still be possible to grow this project to create more complex
child programs, such as those that take user input and compute results.
Ten years ago, this program would not have succeeded within any reasonable amount of time. Five years ago,
this program would likely have taken days to weeks, possibly even longer. Today, the execution took a matter of
minutes. Tomorrow, the program might run in milliseconds. As computers grow faster and more powerful, larger
and larger search spaces can be computed. I can't wait.
If you've found this interesting and want to learn more, download the full source code at GitHub or contact Kory
Becker. Read my tutorial on using genetic algorithms and neural networks in C# .NET. Program executables in
this article are compiled with Brainfuck.NET Compilator.
Update 1/5/2015
Want to see what else the AI can do? Me too! Read more in the follow-up articles: Pushing the Limits of Self-Programming Artificial Intelligence and Self-Programming Artificial Intelligence Learns to Use Functions.