Computer Programming

Computers can only understand binary language (sequences of instructions made of 1s and 0s) called machine code or machine language.

To command a computer you need to speak its language.
Not all computers “speak the same way”: there are different technical implementations and representations of instructions.

The set of instructions that a machine can understand is called the instruction set (the range of instructions that a CPU can execute).

The Central Processing Unit (CPU), also called processor, is the electronic component that executes instructions.
It is one of the most important parts of any computer.
Every CPU has a set of built-in commands (the instruction set); these “basic” operations are hardwired into the CPU.
CPUs only understand those operations encoded in binary code, the low-level machine code language (native code).
Instructions are combined in sequence to make what is known as a program.

In computer science, an Instruction Set Architecture (ISA) is an abstract model of a computer.
A device that executes instructions described by an ISA, such as a CPU, is called an implementation.

The only way you can interact with the hardware is through the instruction set of the processor.
The ISA specifies what the processor is capable of doing.

It is basically the interface between the hardware and the software.
It defines the supported data types, the registers, how the hardware manages main memory, key features (such as the memory consistency, addressing modes, virtual memory), which instructions a microprocessor can execute, and the input/output model of a family of implementations of the ISA.

It can be viewed as a “programmer’s manual”, the technical description of how it works and what you can do with it.

Each operation to perform from an instruction set is identified by a binary code known as an opcode (Operation Code).
The opcode is usually the first part of an instruction (its leading bits), although the exact layout depends on the ISA.
It’s a unique code that identifies a specific operation.

On traditional architectures, an instruction includes an opcode that specifies the operation to perform AND zero or more operand specifiers, which may be registers, memory addresses, or literal data the operation will use or manipulate.
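
To make the opcode/operand split concrete, here is a minimal sketch that decodes an instruction word. The 16-bit layout, the opcode numbers and the mnemonics are all made up for illustration; they do not belong to any real ISA.

```python
# Hypothetical 16-bit instruction layout (an assumption, not a real ISA):
#   bits 15-12: opcode, bits 11-8: register number, bits 7-0: literal value
OPCODES = {0x1: "LOAD", 0x2: "ADD", 0x3: "STORE"}  # illustrative names only

def decode(instruction: int) -> tuple[str, int, int]:
    """Split a 16-bit instruction word into (mnemonic, register, literal)."""
    opcode = (instruction >> 12) & 0xF    # top 4 bits
    register = (instruction >> 8) & 0xF   # next 4 bits
    literal = instruction & 0xFF          # low 8 bits
    return OPCODES.get(opcode, "UNKNOWN"), register, literal

print(decode(0x2305))  # prints: ('ADD', 3, 5)
```

Real instruction sets use many different layouts, and some have variable-length instructions, so this only shows the general idea of packing fields into bits.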

In Very Long Instruction Word (VLIW) architectures, multiple simultaneous opcodes and operands are specified in a single instruction.

The number of operands is one of the factors that may give an indication about the performance of the instruction set.

A word is the fixed-sized piece of data handled as a unit by the processor.

The number of bits in a word (word size) is an important characteristic of any specific processor design or computer architecture: it determines how many bits the processor can handle in a single operation.
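
As a rough illustration of what word size means in practice, the sketch below prints the largest unsigned value that fits in a word of a given width, and shows that a result that does not fit wraps around. The helper name wrap_to_word is hypothetical.

```python
def wrap_to_word(value: int, word_bits: int) -> int:
    """Keep only the low word_bits bits, as a fixed-width register would."""
    return value & ((1 << word_bits) - 1)

for bits in (8, 16, 32, 64):
    largest = (1 << bits) - 1             # all bits set to 1
    overflowed = wrap_to_word(largest + 1, bits)
    print(f"{bits:>2}-bit word: max {largest}, max + 1 wraps to {overflowed}")
```

Python integers have arbitrary precision, which is why the masking has to be done explicitly here; in hardware the extra bit is simply lost (or reported through a carry flag).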

Computer Architecture

The von Neumann architecture is a computer architecture based on a 1945 description by John von Neumann, and by others, in the First Draft of a Report on the EDVAC (Electronic Discrete Variable Automatic Computer), one of the earliest electronic computers.

The report is an incomplete 101-page document written by hand by John von Neumann.
It contains the first published description of the logical design of a computer using the stored-program concept, which has controversially come to be known as the von Neumann architecture.

The document describes a design architecture for an electronic digital computer with these components:
– A Processing Unit with both an Arithmetic Logic Unit and processor registers
– A Control Unit that includes an Instruction Register and a Program Counter
– Memory that stores data and instructions
– External mass storage
– Input and output mechanisms

The von Neumann architecture is not perfect: an instruction fetch and a data operation cannot occur at the same time since they share a common bus. This is referred to as the von Neumann bottleneck, which limits the performance of the corresponding system.

A stored-program digital computer keeps both program instructions and data in read–write, random-access memory (RAM).

The vast majority of modern computers use the same memory for both data and program instructions, but place caches between the CPU and memory. The caches closest to the CPU are split into separate instruction and data caches, so that most instruction and data fetches use separate buses (split cache architecture).

If based on the von Neumann architecture, processors contain at least a Control Unit (CU), an Arithmetic Logic Unit (ALU), and processor registers.

Every modern processor includes very small super-fast memory banks, called registers.
The registers are the fastest accessible memory location for the CPU and sit on the top of the memory hierarchy.
They can be read and written at high speed since they are internal to the CPU.
They are much smaller in size than local memory (size of a word: usually 64 or 32 bits) and are used to store machine instructions, memory addresses, and certain other values.

Data is loaded from the main memory to the registers (via the CPU cache) after which it undergoes various arithmetic operations.

The manipulated data is then written back to the memory via the CPU cache.

The CPU’s cache memory is dedicated to holding (inside or close to the CPU) the most commonly used memory words, in order to avoid slower accesses to main memory (RAM).

Most CPUs have a hierarchy of multiple cache levels, with specific instruction and data caches at Level 1.
The L1 cache or first-level cache is the closest to the CPU, making it the type of cache with the highest speed and lowest latency of the entire cache hierarchy.

Instruction cache: used to speed up executable instruction fetch
Data cache: used to speed up data fetch and store
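
The benefit of a cache can be sketched with a toy model. The class below is a simplified direct-mapped cache that only counts hits and misses; the class name, the slot count and the access pattern are assumptions chosen for illustration, and real caches work on whole lines of bytes rather than single words.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: each address maps to exactly one slot."""

    def __init__(self, slots: int) -> None:
        self.slots = slots
        self.tags = [None] * slots    # which address each slot currently holds
        self.hits = 0
        self.misses = 0

    def read(self, address: int, memory: list[int]) -> int:
        slot = address % self.slots
        if self.tags[slot] == address:
            self.hits += 1            # fast path: the word is already cached
        else:
            self.misses += 1          # slow path: go to main memory
            self.tags[slot] = address
        return memory[address]

memory = list(range(100, 164))        # 64 words of pretend RAM
cache = DirectMappedCache(slots=8)
for _ in range(10):                   # a loop that rereads the same 4 words
    for address in (0, 1, 2, 3):
        cache.read(address, memory)
print(cache.hits, cache.misses)       # prints: 36 4
```

Only the first pass over the four words goes to main memory; the nine following passes are served from the cache.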

Instruction Cycle

A program is a sequence of instructions in memory.

The CPU executes operations through a cycle known as “Fetch, Decode, and Execute”.

The most important registers (Control Unit) are:
Program Counter (PC), which holds the memory address of the next instruction to be fetched for execution
Instruction Register (IR), which holds the instruction currently being executed

1. Fetch the instruction from memory into the Instruction Register
2. Change the Program Counter register to point to the next instruction
3. Decode the instruction
      Determine the type of instruction (opcode)
      If the instruction operand is a word in memory: 
         Determine where it is located (memory address)
         Retrieve the data from memory into a register
4. Execute the instruction (ALU)
5. Go to step 1 to begin executing the next instruction

The operation code tells the ALU what operation to perform; the operands are the values used in that operation.
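
The cycle described above can be sketched as a small simulator. This is a toy machine, not a real CPU: the four opcodes, the single accumulator register and the (opcode, operand) tuple format are assumptions made for the example.

```python
def run(program: list[tuple[str, int]], memory: list[int]) -> int:
    """Run a toy program; returns the accumulator when HALT is reached."""
    pc = 0            # Program Counter: index of the next instruction
    acc = 0           # a single general-purpose register (accumulator)
    while True:
        ir = program[pc]          # 1. fetch into the Instruction Register
        pc += 1                   # 2. point the PC at the next instruction
        opcode, operand = ir      # 3. decode into opcode and operand
        if opcode == "LOAD":      # 4. execute
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        # 5. loop back to fetch the next instruction

memory = [7, 35, 0]
program = [("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", 0)]
print(run(program, memory), memory)   # prints: 42 [7, 35, 42]
```

Here the operand is always a memory address, which keeps the decode step trivial; a real control unit has to handle several addressing modes.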

Technology Evolution

Since the invention of the transistor (electronic switch) in 1947 by John Bardeen, Walter Brattain and William Shockley, and of the Integrated Circuit in 1958-1959 by Jack Kilby and Robert Noyce, the development of the computer industry has never stopped.
Advances in technology have revolutionized computers, leading to smaller, faster, better products at lower prices.

Manufacturers have packed more and more transistors per chip every year, meaning larger memories and more powerful processors.

The latest processors contain billions of transistors.

Moore’s law is the observation that the number of transistors in an Integrated Circuit doubles about every two years.
It is an observation and projection of a historical trend since 1965.
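
As arithmetic, a doubling every two years means the count is multiplied by 2 raised to the power (years / 2). The sketch below applies that formula; the starting count of one million transistors is an arbitrary illustrative number, not a historical data point.

```python
def projected_transistors(start_count: float, years: float,
                          doubling_period: float = 2.0) -> float:
    """Project a transistor count assuming one doubling per doubling_period years."""
    return start_count * 2 ** (years / doubling_period)

# 20 years at one doubling every 2 years = 10 doublings = a factor of 1024.
print(projected_transistors(1_000_000, 20))   # prints: 1024000000.0
```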

While Moore’s law will probably continue to hold for some years, it has limits:
First, a transistor cannot be shrunk indefinitely: its size is ultimately bounded by the size of atoms.
Second, there are problems of power consumption and heat dissipation.

Smaller transistors make it possible to run at higher clock frequencies, but running at a higher frequency also requires a higher voltage, which increases power consumption.
That is, going faster (clock speed) means having more heat to get rid of.

The solution is the multi-core processor architecture: two identical CPUs on a chip consume less power than one CPU at twice the speed.
That is one of the reasons why processors have more and more cores and larger caches rather than higher clock speeds.
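
A back-of-the-envelope sketch of this trade-off uses the common approximation that dynamic power is proportional to capacitance × voltage² × frequency. The numbers are assumptions in arbitrary units: in particular, the 30% voltage increase needed to double the frequency is purely illustrative.

```python
def dynamic_power(capacitance: float, voltage: float, frequency: float) -> float:
    """Approximate dynamic power: P = C * V^2 * f (arbitrary units)."""
    return capacitance * voltage ** 2 * frequency

# One core at twice the frequency, assuming it needs 1.3x the voltage.
one_fast_core = dynamic_power(1.0, 1.3, 2.0)
# Two cores at the base frequency and base voltage.
two_slow_cores = 2 * dynamic_power(1.0, 1.0, 1.0)

print(f"one fast core:  {one_fast_core:.2f}")   # prints: one fast core:  3.38
print(f"two slow cores: {two_slow_cores:.2f}")  # prints: two slow cores: 2.00
```

Under these assumptions both designs offer the same nominal number of cycles per second, but the two-core version draws noticeably less power.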

Taking advantage of these multiprocessors poses great challenges to programmers: it requires the knowledge to explicitly control and manage parallel execution.

CPU Core

Before multi-core processor architecture, computers only had one CPU: the processor could only perform one instruction at a time.
A CPU core is a physical hardware processor with all the architecture that comes with it.
We now have multiple processors grouped inside one Integrated Circuit (single chip), running independently: This is real hardware parallelism (as long as the Operating System uses it).
The design is far more advanced: it requires a different architecture to orchestrate all this (controllers, buses, memory access, etc.).

This technology has allowed Machine Virtualization (standard practice in enterprise IT architecture), which is the foundation of Cloud Computing.

It allows the hardware elements of a single computer (processors, memory, storage, and more) to be divided into multiple virtual computers, commonly called Virtual Machines (VM). Each VM runs its own Operating System (OS) and behaves like an independent computer, even though it is running on just a portion of the actual underlying computer hardware.

The more cores there are in a CPU, the more tasks it can work on at the same time, provided the software is able to use them.

CPU Thread

Simultaneous MultiThreading (SMT) is a technique for improving the overall efficiency of CPUs with hardware multithreading.
SMT makes better use of the resources provided by modern processor architectures.

When SMT is operational, the Operating System sees the processor as having “double the cores” (Logical Processors).
Two logical cores can work through tasks more efficiently than a native single-threaded core, by taking advantage of idle time when the core would formerly be waiting for other tasks to complete.
It improves CPU throughput (usage optimization).
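
One way to see the logical processors is to ask the Operating System how many it exposes. In Python, os.cpu_count() reports the number of logical CPUs, so on a machine with SMT enabled it is typically higher than the number of physical cores.

```python
import os

# Number of logical processors the OS reports; None if it cannot be determined.
logical = os.cpu_count()
print(f"Logical processors visible to the OS: {logical}")
```

The standard library does not report the physical core count, so the result cannot be split into cores and threads without extra tooling.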

CPU Clock Speed

The clock speed is the number of cycles your CPU executes per second, measured in GHz (gigahertz).
A cycle is the basic unit that measures a CPU’s speed.
During each cycle, billions of transistors within the processor open and close.

A CPU with a clock speed of 3.4 GHz executes 3.4 billion cycles per second. (Older CPUs had speeds measured in megahertz (MHz), or millions of cycles per second.)

Sometimes, multiple instructions are completed in a single clock cycle.
In other cases, one instruction might be handled over multiple clock cycles.
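
This is why clock speed alone does not determine how long a program takes. A common rule of thumb is: time = instructions × average cycles per instruction / clock rate. The sketch below applies it; the instruction count and the cycles-per-instruction values are made-up figures for illustration.

```python
def execution_time_seconds(instructions: int,
                           cycles_per_instruction: float,
                           clock_hz: float) -> float:
    """Estimate run time from instruction count, average CPI and clock rate."""
    return instructions * cycles_per_instruction / clock_hz

# Same 3.4 GHz clock, same program, different average cycles per instruction.
print(execution_time_seconds(6_800_000_000, 1.0, 3.4e9))  # prints: 2.0
print(execution_time_seconds(6_800_000_000, 0.5, 3.4e9))  # prints: 1.0
```

A processor that completes two instructions per cycle on average (0.5 cycles per instruction) finishes in half the time at the same clock speed.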

How Do We Communicate With The Processor?

Unless we are supernatural aliens coming from another galaxy, we use programming languages
(created by skillful and talented people)

Programming languages are often categorized as low-level, mid-level or high-level depending on
“how close you are to the hardware”.

Low-Level Programming Languages

Low-level programming languages are hardware-dependent and machine-centered (tied to the hardware, providing operations matching the hardware’s capabilities).

Low-level programs can execute faster than high-level programs, with a smaller memory footprint.

Assembly language (asm) is any low-level programming language with a very strong correspondence between the instructions in the language and the processor’s instruction set.

Assembly is very close to machine code but is “more readable” and uses mnemonics.
You need strong technical knowledge to use it (interaction with the hardware); Assembly is not easy.

The statements are made up of opcodes and operands (processor registers, memory addresses, etc.), which are translated into machine code (instructions that the processor understands).

One line of assembly generally corresponds to one machine code instruction.

Assembly code is converted into executable machine code by a utility program referred to as an assembler.

Each assembly language is specific to a particular computer architecture: it is not portable to a different type of architecture.

Mid-Level, High-Level Programming Languages

Most programming is done using high-level compiled or interpreted languages, which are easier for humans to understand, write and debug, and which do not require knowledge of the system (hardware) running the program.

These languages need to be compiled (translated into system-specific machine code) by a compiler, or run through other system-specific compiled programs.

High-level programming languages are generally hardware-independent and problem-centered (providing operations supporting general problem-solving).
Programmers can move hardware-independent code from one computer to another fairly easily.

Delroy A. Brinkerhoff, Ph.D.:

The C programming language is deemed a mid-level language because it allows programmers more access to the hardware than other higher-level languages.

We can locate C++ at two different places in this spectrum.

First, it represents a mid-level language because it retains C’s access to the hardware.
But second, it also represents a high-level language because it supports object-orientation, a problem-centered approach to programming.

The combination of high- and mid-level features makes C++ a popular choice for writing Operating Systems, games and large industrial applications.

Computers can’t directly execute programs written in high-level languages,
so there must be some way of translating a program written in a high-level language into machine language.

Two kinds of computer programs perform the necessary translation: compilers and interpreters.

A compiler is a program that translates other programs written in a high-level programming language like C or C++ into machine code or machine language.

Some languages such as Java and C# take a different route.
Compilers for these languages translate the high-level source code into an intermediate form (a representation that lies somewhere between the high-level and true machine code) called virtual machine code.

The virtual machine code then becomes the input to another program called an interpreter or Virtual Machine (VM), a program that simulates a hardware CPU. Note that this VM is a software component dedicated to running virtual machine code (a runtime environment for applications); it is different from Hardware Virtualization.

Other languages, such as JavaScript and Perl, are traditionally described as completely interpreted.
In this model, these languages don’t use a separate compiler at all.
The interpreter reads the source code, written in the high-level language, and interprets the instructions one at a time.
That is, the interpreter itself carries out each instruction in the program.

Compiling and running a program written in a language that produces machine code
The compiler reads the C/C++ source code from a file that ends with .c or .cpp and produces a machine code file that is executable.
See C/C++ Compiler Operations

Compiling and running a program written in a language that produces virtual machine code
Languages like Java and C# are hybrid languages because they use both a compiler and a Virtual Machine.
They first compile the source code to virtual machine code, that is, to machine code for a virtual computer (a computer that doesn’t exist but is simulated by another computer).
After compiling the source code, a Virtual Machine (VM) executes the code by simulating the actions of a real computer.
The Operating System loads the VM into main memory and runs it.
It is the VM that reads and runs the virtual machine code.
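
The same idea can be observed from Python: CPython, the reference implementation, also compiles source code to bytecode for its own virtual machine, and the standard dis module can list those instructions. This is shown only as an accessible illustration of virtual machine code; Java and C# use their own bytecode formats, and the exact opcode names printed here vary between Python versions.

```python
import dis

def add(a, b):
    return a + b

# List the virtual machine instructions CPython generated for add().
for instruction in dis.get_instructions(add):
    print(instruction.opname, instruction.argrepr)
```

None of these instructions are understood by the hardware CPU; it is the VM, itself a machine code program, that carries them out.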

Running a program written in a purely interpreted language
Languages like JavaScript and Perl do not compile the source code at all.
As with the hybrid languages (Java and C#), the Operating System runs the interpreter or VM.
The interpreter reads the source code file and executes the program one statement at a time without translating the whole program to any other language.
Web browsers incorporate interpreters for some languages (like JavaScript) while the Operating System runs the interpreters for other languages (like Perl) as application programs.
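
A minimal sketch of that statement-at-a-time behaviour is shown below. The mini-language (lines of the form "name = a + b") and the function name interpret are invented for the example; real interpreters parse a full grammar.

```python
def interpret(source: str) -> dict[str, int]:
    """Execute 'name = a + b' style statements one line at a time."""
    variables: dict[str, int] = {}

    def value_of(token: str) -> int:
        # A token is either a known variable name or an integer literal.
        return variables[token] if token in variables else int(token)

    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        target, expression = (part.strip() for part in line.split("="))
        terms = [value_of(term.strip()) for term in expression.split("+")]
        variables[target] = sum(terms)    # carry out the statement right away
    return variables

program = """
x = 2 + 3
y = x + 10
"""
print(interpret(program))   # prints: {'x': 5, 'y': 15}
```

No translated version of the program is ever produced: each statement is read, understood and carried out by the interpreter itself before it moves on to the next one.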

High-Level Programming Languages Advantages and Disadvantages

Each approach to running a program written in a high-level programming language has advantages and disadvantages.

Programs written in fully compiled languages (e.g., C and C++) generally execute faster than programs written in partially compiled languages (e.g., Java and C#) and run much faster than programs written in fully interpreted languages (e.g., JavaScript and Perl).

To give some idea of the difference in performance, let’s say that a C++ program, once compiled, executes in time 1.
A program in a hybrid language (compiled and interpreted) will generally run in time 3 to 10.
In a purely interpreted language, the same program runs in a time of about 100.

Contemporary versions of the Java and C# VMs use a Just In Time (JIT) compiler that compiles some of the virtual machine code to machine code while running it.
JIT compilers can reduce run time to about 1.5 times that of purely compiled language systems.

“How does Java compare in terms of speed to C or C++ or C# or Python? The answer depends greatly on the type of application you’re running. No benchmark is perfect, but The Computer Language Benchmarks Game is a good starting point.”

On the other hand, once we compile a program written in purely compiled languages, we can’t easily move the resulting executable machine code to a different platform (e.g., you can’t run a Windows program on an Apple computer).

In contrast, we can easily move programs we write in interpreted languages between different computers.

Interpreted programs are portable because they run on a VM or interpreter.
From the hardware and Operating System’s perspective, the interpreter is the running program.
Interpreters and VMs are written in purely compiled languages, so they are not portable, but the programs that they run are.
Once we install the interpreter on a system, we can move interpretable programs to the system and run them without further processing.

Execution speed is not the only criterion to take into consideration: there is also the speed and ease of development.

Here is an article about Python speed.