Linear Algebra in R, create and invert matrices

This is a simple test case of creating a random 2×2 matrix, inverting it, and multiplying the matrix by its inverse. We will also use MS Excel to check our computations.

Create a 2×2 matrix

> mNP <- matrix(rnorm(4),nrow=2,ncol=2)

This command uses the matrix() function. Its first argument, the data, uses the rnorm() function to generate 4 random variates from the standard normal distribution, and the next two arguments specify that those are to be arranged in a matrix of 2 rows by 2 columns. Now let’s display the matrix:

> mNP
          [,1]       [,2]
[1,] 0.5644179 -0.4694577
[2,] 0.7707571  0.1500823

Next, we use the solve() function to invert the matrix, and display the output:

> mNP_inv <- solve(mNP)
> mNP_inv
           [,1]     [,2]
[1,]  0.3360953 1.051306
[2,] -1.7260381 1.263961

Finally, we use the %*% operator to perform the matrix multiplication and check that we arrive at the identity matrix I:

> mNP %*% mNP_inv
     [,1] [,2]
[1,]    1    0
[2,]    0    1

If you want to repeat the last operation in Excel as a check, suppose that the first matrix sits in cells A1, B1, A2, B2 and the second one sits in cells D1, E1, D2, E2. The entries of their product would be:
Top left, top right
=A1*D1+B1*D2    =A1*E1+B1*E2
Bottom left, bottom right
=A2*D1+B2*D2    =A2*E1+B2*E2
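
The same check can be sketched in plain Python, with no libraries. This is only an illustration and not part of the original test case: the helper names invert_2x2 and multiply_2x2 are my own, and the matrix values are the rounded ones printed by R above, so the results agree with R only to about six decimals.

```python
# Values copied from the R session above (rounded to 7 decimals).
m = [[0.5644179, -0.4694577],
     [0.7707571,  0.1500823]]


def invert_2x2(a):
    """Invert a 2x2 matrix with the closed-form adjugate formula."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("matrix is singular and cannot be inverted")
    return [[ a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det,  a[0][0] / det]]


def multiply_2x2(a, b):
    """Multiply two 2x2 matrices entry by entry, like the Excel formulas."""
    return [[a[0][0] * b[0][0] + a[0][1] * b[1][0],   # =A1*D1+B1*D2
             a[0][0] * b[0][1] + a[0][1] * b[1][1]],  # =A1*E1+B1*E2
            [a[1][0] * b[0][0] + a[1][1] * b[1][0],   # =A2*D1+B2*D2
             a[1][0] * b[0][1] + a[1][1] * b[1][1]]]  # =A2*E1+B2*E2


m_inv = invert_2x2(m)
product = multiply_2x2(m, m_inv)

for row in product:
    # Round away floating-point noise before printing.
    print([round(value, 6) for value in row])
```

The closed-form inverse only works for the 2×2 case; for anything larger, solve() in R (or a numerical library) is the practical choice.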

What I learned by casually studying Python for ten days

How difficult is Python

Looking back to what my generation considers “programming”, a term later changed to “development”, we see a gradual shift of programming languages, from tools that help us talk to a machine in its native language (which is the instruction set of its processor(s)), to a toolset that comes ever closer to understanding more business terms, and needs less delving into the binary reality of a processor. What has remained the same? The need to implement certain functionality, whether it be displaying a scatter plot on screen, or calculating a standard deviation, or handling files, anything that may be required.

From that point of view, programming has become “easier”, while development has become “harder”. This is not a controversial statement. It is easier by now to create an array that will hold data, for example. Less and less complexity has to be dealt with, whether in working with files, managing memory, or parallelizing. At the same time, the plethora of available tools and the complexity of the modern IT ecosystem, along with (to a degree) the simplicity of the tools themselves, put the developer in the position of conducting a full-size orchestra. Knowledge of procedural programming and of the libraries/tools relevant to the business, plus intimacy with and “instinct” for the data at hand, are all considered necessary assets.

This is where Python, and similar solutions like R, stand. Working with files and memory, or writing your program logic, is not very unlike using a previous-generation language. Many features, like argv[0] to get the path/name of the program being run, are very similar to ANSI C. Their true strength, and their complexity, lies in being skilled with the data and the available functions at hand. This may take much more time to learn than just going through files and printing the infamous “Hello World”. Back to the original question, and keeping in mind the title of this article: Python is easy to “program”, yet can be infinitely hard to “develop”.
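
To make the argv[0] comparison concrete, here is a minimal sketch. The helper name script_name is my own, for illustration. One difference from C worth knowing: in Python, sys.argv[0] holds the name of the script being run rather than the interpreter executable.

```python
import sys


def script_name(argv=None):
    """Return the first command-line entry, like argv[0] in a C main()."""
    argv = sys.argv if argv is None else argv
    return argv[0] if argv else ""


if __name__ == "__main__":
    # The remaining entries are the arguments passed on the command line.
    print("running as:", script_name())
    print("arguments:", sys.argv[1:])
```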

Who should learn Python

Python is a general-purpose programming language that is widely used as an analytical tool and is commonly associated with Machine Learning, Artificial Intelligence and Big Data. If engineering in those domains sounds interesting to you, it is probably time to start. While it may or may not be part of every solution, it is a very common tool to use, along with R.

Simple things to get you started

First, install the environment. The packages are located here: . If readers of this site ask for it, I can write up a step-by-step installation.

If you plan to use graphics, choose a website that offers graphics functions, and become familiar with it. I started using one that I believe I discovered through my Google feed.

How to get inspired

Online courses are a good way to get started with the language (I am not affiliated with the site I used, but I did find their content useful). It is highly recommended to actually work through the exercises rather than just scroll through the code. It took me a while to realize that indentation (tabs or spaces) indicates nested blocks, for example. So going through the simplest examples and working your way up is the way to go.
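
Here is a small sketch of what nesting by indentation looks like. The function name describe and the sample list are made up for the example.

```python
def describe(numbers):
    """Label each number; the indentation decides what belongs to which block."""
    labels = []
    for n in numbers:                 # loop body: indented one level
        if n % 2 == 0:                # if body: indented two levels
            labels.append(f"{n} is even")
        else:
            labels.append(f"{n} is odd")
    return labels                     # back at function level: runs once, after the loop


print(describe([1, 2, 3]))
```

Indenting the return line one level further would place it inside the loop, and the function would stop after the first number. That is exactly the kind of thing the exercises teach faster than reading does.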

Also, if I can create a plot from a CSV, and host the result online, then so can you! Have a look at my article here: Make a bar-chart from a CSV in Python

Make a bar-chart from a CSV in Python

This test case was implemented in Python 3.6.5 running on an Ubuntu Linux 18.04 64-bit virtual machine. To carry it out, you will need to create an account with the online plotting service (the code below uses the plotly library) and create the credentials file on the host you will be running Python from. All instructions are on their web site.

Step 1:
Suppose we have a CSV whose first column we want to use as the X-axis of our plot, and two further columns which we want as the data on the Y-axis. It could be something like this:

~$ cat /home/nikolas/categories.csv
SciFi-Fantasy , 31.550787 , 68.449219
Spirituality , 83.411890 , 16.588112
Home-Improvement , 47.082787 , 52.917217
Gaming , 2.256584 , 97.743423
Mountain-Bike-Touring , 40.905171 , 59.094826
Korean-Culture , 71.040140 , 28.959862
Health-Safety , 32.872467 , 67.127533
Religion , 37.452973 , 62.547028
Fashion , 98.597282 , 1.402729

Step 2:
Load the CSV into a data frame with the read_csv function of the pandas library, and display the data of each column using the iloc indexer:

Python 3.6.5 (default, Apr 1 2018, 05:46:30)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> mycatg=pd.read_csv('/home/nikolas/categories.csv',sep=',',header=None)
>>> mycatg.iloc[:,0]
0 SciFi-Fantasy
1 Spirituality
2 Home-Improvement
3 Gaming
4 Mountain-Bike-Touring
5 Korean-Culture
6 Health-Safety
7 Religion
8 Fashion
Name: 0, dtype: object
>>> mycatg.iloc[:,1]
0 31.550787
1 83.411890
2 47.082787
3 2.256584
4 40.905171
5 71.040140
6 32.872467
7 37.452973
8 98.597282
Name: 1, dtype: float64
>>> mycatg.iloc[:,2]
0 68.449219
1 16.588112
2 52.917217
3 97.743423
4 59.094826
5 28.959862
6 67.127533
7 62.547028
8 1.402729
Name: 2, dtype: float64
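
As a library-free cross-check of the column layout, the same three columns can be pulled apart with Python's built-in csv module. This is a sketch rather than the method used in this test case: read_columns is a name I made up, and only three rows of the file are embedded so that the example runs on its own. Note that the file has spaces around each comma, so the text fields carry that padding unless it is stripped.

```python
import csv
import io

# Three rows copied from categories.csv above; the real file would be opened
# with open(path, newline="") instead of io.StringIO.
SAMPLE = """SciFi-Fantasy , 31.550787 , 68.449219
Spirituality , 83.411890 , 16.588112
Gaming , 2.256584 , 97.743423
"""


def read_columns(handle):
    """Return (labels, series_a, series_b) from a three-column CSV."""
    labels, series_a, series_b = [], [], []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        name, a, b = (field.strip() for field in row)  # drop padding spaces
        labels.append(name)
        series_a.append(float(a))
        series_b.append(float(b))
    return labels, series_a, series_b


labels, series_a, series_b = read_columns(io.StringIO(SAMPLE))
print(labels)
print(series_a)
print(series_b)
```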

Step 3:

Include the libraries

import plotly.plotly as py
import plotly.graph_objs as go

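Then define the axis data, addressing the columns of the data frame in the same way as above:

# First column holds the category labels; columns 1 and 2 hold the two series
yaxis1 = go.Bar(
    x=mycatg.iloc[:,0],
    y=mycatg.iloc[:,1],
    name='Category A'
)
yaxis2 = go.Bar(
    x=mycatg.iloc[:,0],
    y=mycatg.iloc[:,2],
    name='Category B'
)

data = [yaxis1, yaxis2]
# Default layout; add options such as a title here if needed
layout = go.Layout()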

then perform the plot itself.

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='barplot in,')

Step 4:

That is it. The plot is created as an SVG file online. In this case the graph was created here:

How to install a Linux Virtual Machine


You need two pieces of software to get a Linux machine running on your laptop: the virtualization software, and an image of the operating system you plan to install in your Virtual Machine (VM). To keep this article as short as possible, issues of hardware requirements and licensing for production use are left out.

Step 1

Identify the combination of virtualization software and Linux platform you need. This demo will install Oracle VM VirtualBox and deploy Ubuntu 18.04 64-bit on a laptop running Windows 10 Home.

Step 2

Download the VirtualBox binaries here:

VirtualBox Download
Select your platform (“Windows hosts” in our case), choose Save, let it download and then run the executable.
Oracle VM installation
Go through the installation steps. To start with, you can leave everything at its default. Keep disk space in mind.

Download the Linux distribution:

Download the Linux release
Review the release notes, and observe the system requirements. Since you will be running the OS as a VM, its requirements come on top of those of your host operating system and your virtualization software.
Ubuntu Desktop download
Choose to save the file. Ideally, keep its download path simple (e.g. c:\VMs\UBU1804). Avoid long paths or complicated names.
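
Since disk space matters both for the download and for the virtual disk created later, a quick check of the free space on the target drive can save a failed install. This is a small sketch using Python's standard library: the helper name has_room and the 30 GB figure are placeholders of mine, not requirements taken from the Ubuntu release notes.

```python
import shutil


def has_room(path, required_gb):
    """Return True if the drive holding `path` has at least required_gb free."""
    free_gb = shutil.disk_usage(path).free / (1024 ** 3)
    return free_gb >= required_gb


if __name__ == "__main__":
    # Hypothetical figure: replace it with the sum of the ISO size and the
    # virtual disk size you plan to allocate.
    needed_gb = 30
    print("enough space:", has_room(".", needed_gb))
```

Run it from the folder where you intend to keep the VM files, or pass that folder instead of ".".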

Once done, it is time to launch VirtualBox and create the virtual machine. Find the “Oracle VM VirtualBox” icon on your desktop or program group and double-click it to launch.

When it has launched, click on the left-most icon in the toolbar (“New”), then provide a name for your VM, the type and the version. These should match the Linux distro you have downloaded.

Creating the VM
Type a name of your choice, then select in the Type and Version lists the distribution you have already downloaded.
Memory selection
This entry should match the requirements of the distro (see above) but also the capacity of your machine.

In the next three pages of the install process you need to create a Virtual Hard Disk. The simplest choice is to select VHD (Virtual Hard Disk)/Fixed size. The distro notes should state a minimum disk requirement (see above).

Minimum disk should be the requirement of the Linux distribution of choice, or larger
…this might take some time, be patient!

When this is finished, you will see your newly created machine in the list. Click on the green “Start” button to launch it for the first time.

The first time you launch your VM, it will ask for a start-up disk. This will be the Linux distribution file that we downloaded:

The start-up disk for your Linux Virtual Machine is the distribution you have already downloaded. Notice the .ISO extension of the file

From then on, there will be a Welcome dialogue (similar to any first-time setup, such as those that come with smartphones).

When running this dialogue within Oracle VirtualBox, the “Try Ubuntu” option allows you to keep running from the .ISO file, while the “Install Ubuntu” option will use the Virtual Hard Disk we created previously and install the operating system there. In both cases, our laptop’s existing OS will not be affected.
Ready VM
The result of your effort: a fully functional Linux machine, with Internet access and device connectivity (USB headsets, mouse), just like your “real” laptop.

My take, as a starter, on “Why Big Data”?

Whether in human or machine intelligence, one can think of two main categories of problem. The first is the kind of problem that has a deterministic, rule-based solution. The second is the kind where a decision, or outcome, cannot be derived from a mathematical formula or from a correlation of factors that are fairly constant in number and of equal weight to each other. How are those problems solved? By data: lots and lots of data. While how we go from X to Y (whether Y is a category, a yes/no answer or a prediction) may not be known, we have enough (X,Y) pairs to feel confident that we can apply different models and decide which one fits the data set most closely, with the least amount of error or uncertainty.
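
To make “apply different models and decide which fits the closest” concrete, here is a toy sketch in plain Python. Everything in it is illustrative: the (X,Y) pairs are made up, the two candidate models (a constant and a straight line) are just the simplest pair I could pick, and the error is measured on the same points used for fitting, whereas a real project would normally score the models on data held back for that purpose.

```python
def mean_squared_error(predict, points):
    """Average squared gap between the model's prediction and the observed y."""
    return sum((predict(x) - y) ** 2 for x, y in points) / len(points)


def fit_constant(points):
    """Model 1: always predict the mean of y."""
    mean_y = sum(y for _, y in points) / len(points)
    return lambda x: mean_y


def fit_line(points):
    """Model 2: ordinary least-squares straight line y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Made-up (X, Y) pairs, for illustration only.
points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]

candidates = {"constant": fit_constant, "line": fit_line}
errors = {name: mean_squared_error(fit(points), points)
          for name, fit in candidates.items()}
best = min(errors, key=errors.get)
print(errors)
print("best fit:", best)
```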

Current technology, both in hardware processing and in software solutions, has allowed us to design systems that can store and analyze such datasets in a manner much more economical and scalable than before. Big Data is everything that surrounds those datasets. The data itself, the technology and software solutions to store it efficiently at scale, the procedures to unify different data sources and generalize or prepare the data for decision making, the intuition of the Data Scientists who understand the nature of the data, and the choice of tools to be used for a certain application, are all parts of the Big Data revolution.

It would be interesting to hear in the comments which kinds of problem you would place in each of the two categories (or possibly in a different one). Thank you in advance.

How to install Java JDK 1.8 on Windows 10 Home

The Java Development Kit and Runtime Environment are distributed by Oracle. “Java 8” refers to release 1.8. Java 10, the latest release (June 2018), is the first I have seen under the new naming convention. Put simply, a Java installation allows Java programs to run on our Windows 10 computer. There are two main flavors of Java: the JDK, which allows development in Java, and the JRE, which allows the execution of JARs. In this case we will install the JDK, as it will allow us to build our Spark installation in the future.

To find the executable for the install, search online for “java jdk 1.8 download” and find the correct page on Oracle’s site. It will look something like this:

JAVA SE Development kit (JDK)
Dialog to select the version for your operating system. The download link will become active after we have hit Accept for the licence agreement.

In the setup dialogs, select the file path of the Java installation. I prefer not to install the Java home under “Program Files”, as I am not sure whether all sorts of code will be able to handle the space in the path, or whether it will have to be written as PROGRA~1. The Java path will also have to be added to the machine’s PATH, which is another reason to keep the tree short.

Once the installation wizard has finished, we need to create the environment variable JAVA_HOME, and add the path to the java.exe program to the computer’s PATH system variable.

The way to do this in Windows 10 is to click on the magnifying glass button, right next to the Start menu:

Getting access to environment variables
Click on the magnifier glass, and type “environment”

Windows will offer “Edit the system environment variables (Control Panel)”, so we select that. Click on “New…” to add the JAVA_HOME user variable for your account, and then append %JAVA_HOME%\bin to the PATH.

Accessing Windows 10 Environment Variables
Click on “Environment Variables…”
It is important to set the user variables correctly for the user who will be accessing the Java home.

A machine can have more than one JRE/JDK installed. Changing JAVA_HOME and PATH is all it takes to “activate” a certain version of Java. To confirm that the installation was successful, and that we are indeed using the version we need, we open a Windows command prompt (again we can use the magnifier and type “cmd”) and then type java -version. This validates both that the correct version of Java is active and that the system PATH reaches the Java executable.

Java installation finish
We ran this command from the user’s home path, and verified that Java 1.8 is available.

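A word of caution: if Oracle software is already installed, it may have placed its own Java path at the beginning of the PATH string. Since PATH is searched in order, that Java would then be found before the one we just installed.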
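
To see which Java would be picked up first, the PATH entries can be listed in search order. The sketch below is illustrative only: java_dirs_on_path is a helper name of my own, and it simply looks for a file called java.exe (or java) in each PATH directory.

```python
import os


def java_dirs_on_path(path_value=None, exe_names=("java.exe", "java")):
    """List PATH directories that contain a Java launcher, in search order."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        if any(os.path.isfile(os.path.join(directory, name)) for name in exe_names):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    print("JAVA_HOME =", os.environ.get("JAVA_HOME", "(not set)"))
    for position, directory in enumerate(java_dirs_on_path(), start=1):
        # The first entry listed is normally the one a plain "java -version" uses.
        print(position, directory)
```

If the first directory listed is not the one under your JAVA_HOME, move %JAVA_HOME%\bin ahead of it in the PATH.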

Use an online spreadsheet to feed data to R

Instead of creating the data frame programmatically, why not use an existing spreadsheet that is available online? A simple HTTP file server, free and Open Source, is HFS.

Once HFS is installed, uploading the spreadsheet is as simple as dragging and dropping an existing spreadsheet.

Once the file is in place, we can get its URL directly by right-clicking on the file and selecting “Copy URL address – Ctrl+C”.

The first time you read a spreadsheet into R this way, some more software is needed.

Download and install Perl on the machine. For Windows, Strawberry Perl can be used.

Next, the package to read the spreadsheet needs to be installed into R. The package name is “gdata”, so we install it and then load it:

install.packages("gdata")
library(gdata)

At this point, we are ready to load the spreadsheet directly into a new data frame:

dfXLBAN <- read.xls("")
trying URL ''
Content type 'application/octet-stream' length 8786 bytes
downloaded 8786 bytes

Let’s display the data frame:

Date Name Bananas
1 1 Radka 2
2 2 Radka 3
3 3 Radka 4
4 4 Radka 5
5 5 Radka 1
6 6 Radka 1
7 7 Radka 7
8 1 Natalie 6
9 2 Natalie 2
10 3 Natalie 3
11 4 Natalie 4
12 5 Natalie 2
13 6 Natalie 1
14 7 Natalie 8

Since the spreadsheet contains one extra column, we need to discard the Date column in order to have the data frame exactly match our example with inline data:
dfB2 <- data.frame(dfXLBAN$Name,dfXLBAN$Bananas)

dfXLBAN.Name dfXLBAN.Bananas
1 Radka 2
2 Radka 3
3 Radka 4
4 Radka 5
5 Radka 1
6 Radka 1
7 Radka 7
8 Natalie 6
9 Natalie 2
10 Natalie 3
11 Natalie 4
12 Natalie 2
13 Natalie 1
14 Natalie 8

The plot function will be exactly the same as in the previous post.