Choose Your Database Wisely and my IMDB project

I am currently learning a lot about Python. Python is one of a suite of free tools that can be used for data analysis. Courses teach you how to interact with a database in the way that takes the least possible effort, and they hype the fact that SQLite is built in to Python’s core libraries.

In Python, the database of the least possible effort is SQLite. So, armed with my new knowledge I charged forward using Python on my desktop against a SQLite database for millions of records.

I was so “in love” with the idea of all these free tools, that I forgot about the free tools for the desktop that I had used before – Microsoft SQL Server Express. As I did estimates of how long it would take to process 20.4 million records I was a little disappointed in SQLite, but unsure if that was causing the performance problems. Perhaps it was the disk use (100%) of the program.

There really wasn’t much to the program, though, so if the problem was the file access throttling the hard drive I was fairly stuck.

Eventually, even after processing millions of records I decided to install my old standby – Microsoft SQL Server Express – which like SQLite is free. These are two vastly different databases. SQLite is what it says – lite – and can be used on a wide variety of portable devices. SQL Server has a specific product for smaller or compact devices, but SQL Server Express is basically full SQL Server minus some features.

The people at Microsoft have done thousands of hours of programming to make the SQL Server engine perform. Quickly, I was able to rip through a million records to the SQL Server database, using virtually the same code I had been using against SQLite. Python was not the problem.

Falling in love with a way of doing things, and with the tool for doing them, was the problem. Now, instead of multiple computers with multiple instances of the program running, I have one program running on one computer. In a relatively short period of time (I can be more precise later, with the datetime stamps) it has already processed 4.5 million records.

Now, instead of considering long lengths of time just to process the actors.list.txt file, I can consider getting through all the files in a relatively short period of time and focus on analyzing the data and figuring out master data structures.

Just to give a picture, against SQLite it took 1:25 seconds to process 1000 records from the actors.list.txt file to the SQLite database. It is too hard to even time 1000 records processing against Microsoft SQL Server Express – it takes 10 seconds to process 10,000 records.
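To put those two timings side by side, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes that “1:25” means 1 minute 25 seconds (85 seconds), that the observed rates would hold for the whole 20.4 million records, and the function names are made up for this example.

```python
# Rough throughput arithmetic from the timings quoted above.
# Assumption: "1:25" means 1 minute 25 seconds, i.e. 85 seconds.

TOTAL_RECORDS = 20400000  # the 20.4 million records mentioned earlier


def records_per_second(records, seconds):
    """Observed throughput for one timed batch."""
    return records / float(seconds)


def estimated_seconds(total_records, rate):
    """Time to process everything if the observed rate holds."""
    return total_records / rate


sqlite_rate = records_per_second(1000, 85)       # SQLite batch
sqlserver_rate = records_per_second(10000, 10)   # SQL Server Express batch

sqlite_days = estimated_seconds(TOTAL_RECORDS, sqlite_rate) / 86400.0
sqlserver_hours = estimated_seconds(TOTAL_RECORDS, sqlserver_rate) / 3600.0

print("SQLite:     %.1f records/s, about %.1f days" % (sqlite_rate, sqlite_days))
print("SQL Server: %.1f records/s, about %.1f hours" % (sqlserver_rate, sqlserver_hours))
```

Under those assumptions the difference is roughly 20 days against a little under 6 hours for the full data set.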

I get no money from Microsoft or anyone else (no ads on my site at present). These are just the cold hard facts. Free tools are great and I would use them to teach people. It is the path of least resistance to getting someone started on a database. Microsoft SQL Server Express involves installs, and it takes a serious commitment to get through the install process; however, the time invested in that is returned nearly immediately given the speed of storing information in the database.

Don’t fall in love with your tool!

IMDB Data Search Project

There are some things that you do almost every day that are really annoying. So you are watching show A. You see an actor. You know that face. You know that voice.

Finally, it occurs to you that you know which show you have seen that person in. You know show B.

So, you hop on to IMDB (Internet Movie Database) and you look up show B and go to the cast – then the whole cast, because the actor isn’t one of the top names. You still don’t know their name, so you start clicking on cast members and eventually – aha – you have found them. Then you scan their filmography to see if they are in show A.

It takes a bunch of time. I know, it isn’t exactly a world-crushing problem. I do watch shows on TV. And for some reason it really bothers me when I know I’ve seen an actor’s face before and I can’t figure it out. Sometimes days later I’ll figure it out – because I didn’t look it up or whatever.

Despite the name – IMDB isn’t all that great to search. It is good at single searches – show me this show, show me everything this actor was in, etc. But if you have to do something equivalent to a SQL Join to find out about x and y at the same time – it fails you.
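To make the “SQL Join” idea concrete, here is a minimal sketch of the kind of query I have in mind. The cast_member table, its columns, and the sample rows are all hypothetical and exist only for this illustration; the real IMDB files are not laid out like this. It uses an in-memory SQLite database simply because that needs no setup.

```python
import sqlite3

# Hypothetical schema for illustration only; the real IMDB list files
# are not laid out like this.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cast_member (actor TEXT, show TEXT)")
conn.executemany(
    "INSERT INTO cast_member (actor, show) VALUES (?, ?)",
    [
        ("Actor One", "Show A"),
        ("Actor One", "Show B"),
        ("Actor Two", "Show A"),
        ("Actor Three", "Show B"),
    ],
)


def actors_in_both(connection, show_a, show_b):
    """Return actors credited in both shows, using a self-join."""
    rows = connection.execute(
        """
        SELECT DISTINCT a.actor
        FROM cast_member AS a
        JOIN cast_member AS b ON a.actor = b.actor
        WHERE a.show = ? AND b.show = ?
        ORDER BY a.actor
        """,
        (show_a, show_b),
    )
    return [row[0] for row in rows]


print(actors_in_both(conn, "Show A", "Show B"))
```

The table is joined to itself on the actor name, so one row of the result stands for “this person appears in show A and also in show B”, which is exactly the question the website makes you answer by clicking around.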

How to fix this? I don’t necessarily want to replace IMDB. What I would like to do is be able to find out information quickly and easily with specialized queries. IMDB isn’t really optimized for searching, either. There are lots of graphics, stuff that all takes time to load. I know “The Internet is fast”, but it isn’t always. And even when it is it still takes time to load all the images they are in love with at IMDB. Google’s page is a page optimized for querying.

So, that’s my charter. Nice and simple. Make IMDB searchable and really useful for finding out information. And maybe do some data analysis on the side.

What are the steps to this project?

  1. Set up an account at IMDB
  2. Find the data (if available – and fortunately for us it is available)
  3. Download the data – using account information as necessary
  4. Expand the Data – the data source are compressed .gz files
  5. Data cleansing – and unfortunately the data isn’t in that great a form
  6. Load the data into a SQL Database – I’m a huge fan of SQL for data manipulation
  7. Develop an understanding of the data and the types of questions people ask – for example – who are the people that overlap show A and show B.
  8. Develop queries to deliver the information
  9. Develop web pages to run the queries and a web page to guide the users to the right place
  10. Done – well, easier said than done, certainly.

This article will cover 1, 2, 3, and 4. It involves a surprisingly small number of lines of code.

Now, of course, everything changes. In this era things change quickly, so some things in this article might not be exactly as I describe them here, but if you are adept you can figure your way through it.

#1 Set up an account on IMDB.

Now you can use Facebook and other things, but I recommend setting up the IMDB account – this way you can type in the login data if the FTP site requests it.

imdb_initial_page

The circled area on the top right of the screenshot above of the IMDB site is the login area. Click on “Other Sign in Options”.

imdb_createaccount

Click the circled button, “Create a New Account”.

imdb_createaccount2

I think at this point in time – we all know what to do on a create account page like this – even if we are not programmers!

When you have entered your data, press the “Create your IMDb account” button and it should return you to the start page for IMDB. You do not need a “Pro” membership and you don’t need to spend any money.

#2 Find the Data

Now, in this case I did some Google searches before I located the data. Some people will write web crawlers and rip through the pages of the website. In this case, however, the IMDB organization does not permit you to do this. They can’t necessarily stop you easily; however, you should respect the rules of your data source. In any case, they may not allow you to rip through all their web pages and gather data (which is a pain in the butt), but they do provide you with the data.

Go to this ((((link)))). Technically this is an FTP site, and it brings you here:

imdb_ftp

#3 Download the data

Now, if there were hundreds of files perhaps we would develop a program to pull the files down locally, but given the relatively small number – around 48 files – a little manual downloading isn’t going to hurt. Even repeated once a month it isn’t much of an ordeal, and we can always create an automation program to download the files later. Perhaps a later update will download the files directly from the FTP site; however, this could present some difficulties. As I manually downloaded the files, the site would periodically ask for credentials, but it wouldn’t ask every time. That makes it hard to code. At this point, hard to code and of little benefit means there is no reason to code this.
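If that automation ever does become worth writing, a first attempt might look something like the sketch below. It uses Python’s standard ftplib module; the function name, the host, the directory, and the credentials are all placeholders I made up, and it assumes the server accepts a normal login up front rather than prompting at random as I saw in the browser.

```python
import os
from ftplib import FTP


def download_gz_files(ftp, dest_folder):
    """Download every .gz file in the FTP server's current directory.

    `ftp` is an already-connected, already-logged-in ftplib.FTP object
    (or anything with the same nlst/retrbinary methods).
    """
    if not os.path.exists(dest_folder):
        os.makedirs(dest_folder)
    downloaded = []
    for name in ftp.nlst():
        if not name.endswith(".gz"):
            continue
        target = os.path.join(dest_folder, name)
        with open(target, "wb") as out:
            # retrbinary calls out.write once per block received
            ftp.retrbinary("RETR " + name, out.write)
        downloaded.append(target)
    return downloaded


# Usage (not run here). Host, directory and credentials are placeholders:
# ftp = FTP("ftp.example.org")
# ftp.login("user", "password")
# ftp.cwd("/path/to/lists")
# download_gz_files(ftp, "C:\\IMDB")
# ftp.quit()
```

Passing in the connection rather than creating it inside the function keeps the login details in one place and makes the download loop easy to try out against a stand-in object.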

Once loaded locally in your downloads folder – collect these files in to a folder called IMDB – in the case of the assumptions in my program C:\IMDB

I’m in Windows 10 – so the screen shots and code may be somewhat specific to Windows. It shouldn’t be too bad. Below are the original .gz compressed files in File Explorer and cmd.

imdb_foldercontents

(NOTE: Only a partial listing in the Windows File Explorer.)

imdb_cmd_directory

(NOTE: The flexibility of using the command line in Windows. The dir /w command inside of a directory/folder allows you to see more files at one time. You can do this in Windows File Explorer as well; however, it isn’t as space efficient (shown below))

imdb_foldercontents2

In order to get to the command line in Windows 10, Press the Windows button (yes, it is really just the start button renamed) and type cmd in the “Ask me anything” entry. Then navigate to the C:\IMDB directory by typing cd.. at the command line until you are at the C:\ and then type cd IMDB

This isn’t really intended to teach the command line. I’m writing a couple of things because it is handy – and we will need it later in order to run our Python program. So, how to navigate on the Command Line and Execute a Python program will be upcoming.

#4 Expand the Data

Now we come to the heart of this posting – the programming. The language I have chosen is Python. I’m sure there are lots of reasons to choose Python, but I’m choosing it for one main reason – compact and easy code.

I’m using Python 2.7 for this article. This code may work in Python 3 and it may not. I’m not familiar with Python 3 at present. You can easily find how to install Python 2.7 on the web. This link is a good place to start. When you install it – you must add it to your path if you are in Windows. In my experience the install program doesn’t seem to add Python’s install directory to the path properly and I have had to manually update it. Adding a path is a relatively simple task, so I’ll leave it to you. Remember, frustration is often a part of the application development process.

The development platform is also a question. In Windows 10 I am suggesting you choose Notepad++. In a course that I took in Coursera this was the recommendation, and I fought it for a while by using Editplus 3, but in the end Notepad++ has worked better for me.

You can easily search for Notepad++ on the web and install it – it is a free application. Here is a link to help you get going – if it still works by the time you use this article.

Once you have Notepad++ installed, open it, create a new file (using the new-file icon), and then add the following source code (the source code will be explained):

NOTES ABOUT PYTHON CODE (when copy pasting or editing):

  1. indenting by spaces indicates that the code is part of a coding block – or scope
  2. # sign is a comment character
  3. In your Notepad++ editor – make sure indents are set to a number of spaces or your scope will not work properly
#Build a Better IMDB - AJ 20161126
#need to process files - original .gz files from IMDB
# 0) May make code to automate downloading of the files and do a difference
# 1) Get list of files in the IMDB directory
# 2) Create a Date Folder, Original File Folder under the Date Folder, Unzipped File Folder
# 3) Main Process
# a) Unzip File - and save copy in the Unzipped Folder
# b) Copy Original File to Original File Folder
# c) Loop to next file
import gzip
import os
from os import listdir
from os import path
from os import makedirs
from os.path import isfile, join
from shutil import copyfile

def getfiles(homefolder):
    files = [f for f in listdir(homefolder) if isfile(join(homefolder, f))]
    return files

def decompress(filein, fileout, fileend):
    try:
        with gzip.open(filein, 'rb') as f:
            file_content = f.read()
            #print file_content
            f.close()
        print (fileout)
        outf = open(fileout, 'w')
        outf.write(file_content)
        outf.close()
        print ("filein: " + filein)
        print ("fileout: " + fileend)
        copyfile(filein, fileend)
        os.remove(filein)
    except OSError as e:
        print ("Error %s - %s" % (e.filename, e.strerror))
    except:
        pass

#main program logic/data
infolder = "C:\IMDB\\"
zipfolder = "C:\IMDB\Unzip\\"
archivefolder = "C:\IMDB\Original\\"
fs = getfiles(infolder)

#setup the folders
if not path.exists(zipfolder): makedirs(zipfolder)
if not path.exists(archivefolder): makedirs(archivefolder)
# Loop through files and dump to screen
for file in fs:
    decompress(infolder + file, zipfolder + file.replace(".gz",".txt"), archivefolder + file)

Save the file in Notepad++ by using the save-file icon.

notepadplusplussavefiledialog

In the save file dialog, click the create folder icon (circled in red above in the upper right corner of the dialog). A new folder will show up; type in IMDBCode (or whatever you want to call it). I’m using the root of the C drive for both my IMDB data folder and the IMDBCode folder.

notepadplusplussavefiledialog2

Once you have your folder created click in to it – and name the code you copied in as IMDB_InitialDataProcess.py . It is very important to include the file extension – as the Notepad++ editor will recognize the file as Python code and properly color the text as required (which is always nice).

OK, now I’m going to do a couple of screen shots to show you how the code should look before going in to detail about the code. This is because of the whole indent thing indicating scope. You could type the code (or copy it from above) and it might look perfect, and it won’t work.

 

imdb_initialdata_sourcecode1

OK. So, the code above goes from the relatively easy to understand to something that is going to take a bit of work. First off, the green lines are comments – with the leading #. It isn’t a pound sign. It isn’t sharp. It is just a number sign – shift+3.

Line 10 starts a block of code of imports. If you are familiar with .NET programming languages or Java then you understand what an import does. If not, well, here is a brief explanation.

All code depends on libraries. We write code on the shoulders of giants, and we use the giants’ code in our own code. Basically, someone writes a whole suite of functionality and then we use it. There is something that I would call base functionality, which you can use without any import statement. Other libraries extend the functionality of the language for a specific area, and those have to be imported.

So, we import gzip to gain the functionality to compress and uncompress files. Now, we can be more precise in what we import. In line 12 we write “from os import listdir” – this allows us to use listdir directly without preceding it with the library and sublibrary, etc. This makes for concise lines of code.

On line 15 we have “from os.path import isfile, join” – which takes two pieces of functionality from the library at once. Of course, this means we could have reduced lines 12, 13, and 14 to a single line. We could even have potentially reduced the import lines further, since line 15 imports additional pieces of the os library.
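As a small sketch of that reduction (this is not how the listing above is written, just what the shorter form could look like):

```python
# Lines 12, 13 and 14 of the listing collapsed into a single import.
from os import listdir, path, makedirs

# os.path is a separate module, so its names still need their own line...
from os.path import isfile, join

# ...or they can be reached through the "path" name imported above.
print(path.isfile is isfile)
print(path.join is join)
```

Both comparisons show that path.isfile and isfile are the same function under two names, so which style to use is mostly a matter of how much typing you want to do at each call.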

Libraries can make programming languages extremely versatile.

The next segment of code starts on line 18. “def getfiles(homefolder):” – This is the definition of a function. The function takes in 1 parameter – homefolder. It returns a list of files in the folder.

But, sadly, the code I found on the web, while beautiful and concise, is not very easy to read.

 

files = [f for f in listdir(homefolder) if isfile(join(homefolder, f))]

(the above is a single line of code with line wrap)

What the above line says is: Give me the directory listing for this folder. Then check if these are files and return only files – not directories. This is much more compressed code than I would write. I’ll get to this level after a while – it just takes comfort and more experience writing code in Python. I basically use this code as a black box as I know what it does and I know it works. Below is how I would have written it:

imdb_getfiles2

You can see there are a whole lot more lines of code in this function. For every line of code a programmer writes there is typically at least one error (depending on the complexity of the line of code). In the above code, the first line (after the function definition) says – get me the list of objects in the directory parameter. Then I define an empty list. For every object in the directory list – check if it is a file. If it is a file – add it to the files list. Finally, return the files list.
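In case the screenshot of the longer version does not come through, here is a reconstruction written from the description above. It is a sketch that follows the same steps, not necessarily character for character what was in the image, and the name getfiles_long is just something I picked to keep it apart from the one-line version.

```python
from os import listdir
from os.path import isfile, join


def getfiles_long(homefolder):
    # Get the list of objects (files and folders) in the directory
    entries = listdir(homefolder)
    # Start with an empty list of files
    files = []
    # Check each object; keep it only if it is a file
    for entry in entries:
        if isfile(join(homefolder, entry)):
            files.append(entry)
    return files
```

It should return the same list as the one-line getfiles for any folder, just with each step spelled out.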

imdb_initialdata_sourcecode2

Now, the nature of Python is that you define a lot of functionality in a program before you use it. Previously we defined a function to return all the files in a directory. This function is more complex. It performs these tasks:

  1. Unzip the file
  2. Read the unzipped file and close the unzipped file
  3. Write the unzipped file to the output folder and close it
  4. Copy the original file to the processed folder
  5. Delete the file from the source folder

These pieces of functionality are wrapped in a try..except clause. When you deal with files – errors can happen. If you don’t catch these errors your program will crash. In this case we are just printing an error statement – or just passing. If somehow I downloaded a file (or a program loaded a file) that wasn’t a zipped file then we would just skip it – and we would skip moving the file and deleting it – so that we could see what files failed to process. The with statement is great for file processing – as when the block of code ends – it automatically closes the file. In this case, the gzip operation opens the zipped file and makes the stream available to the program as the variable “f”.

On line 25 we read the entire (uncompressed) file contents in to a variable. Line 27 closes the compressed file. Now, even though we have the with statement, we want to copy the original file and then delete it in the original location – and we can’t do that if the file is open.

There are a few print statements to print information for those times you are watching the program to make sure that it is operating correctly. Line 29 opens a new file for writing, then on line 30 we write the uncompressed file contents to the new file, and finally on line 31 we close this new file. Line 34 copies the original file to the end (or processed) folder. Finally, line 35 deletes the file that was processed.
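Reading the whole uncompressed file in to one variable works, but some of these files are large. A possible variation, sketched below, copies the data across in chunks using shutil.copyfileobj instead. This is not the code used in this article: the name decompress_streaming is made up, it writes the bytes exactly as stored (binary mode, so no line-ending translation on Windows), and it deliberately leaves out the try..except so any error stops the run.

```python
import gzip
import os
import shutil


def decompress_streaming(filein, fileout, fileend):
    """Unzip filein to fileout, archive the original to fileend, then delete it."""
    # Copy in chunks so the whole uncompressed file never sits in memory.
    with gzip.open(filein, "rb") as src, open(fileout, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Both files are closed here, so the original can be copied and removed.
    shutil.copyfile(filein, fileend)
    os.remove(filein)
```

It takes the same three arguments as decompress, so it could be swapped in to the loop at the bottom of the program to try it out.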

imdb_initialdata_sourcecode3

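Above is our final section of code – if I were in Java I would call this the main function. Lines 41, 42, and 43 set up the folders that we are going to use. An interesting note here is that we are only doubling up the last slash, where in a language like Java we would double up all of them. This works in Python 2.7 only because \I, \U, and \O are not recognized escape sequences in an ordinary string, so those backslashes are kept as typed; a folder name starting with a letter such as n or t would be silently turned into a newline or tab, and in Python 3 the \U would be a syntax error. Doubling every backslash is the safer habit, even though the shorter form is easier to read.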
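For anyone who wants to avoid thinking about which backslashes need doubling, here is a short sketch of three ways to spell the same Windows folder. The variable names are only for illustration; ntpath is the standard library module that implements Windows path rules, and it can be imported on any operating system.

```python
import ntpath  # Windows path rules, importable on any platform

# Three ways to write the same Windows folder.
doubled = "C:\\IMDB\\Unzip\\"
raw = r"C:\IMDB\Unzip" + "\\"   # a raw string cannot end in a single backslash
joined = ntpath.join("C:\\", "IMDB", "Unzip", "")

# Compare the three spellings with each other.
print(doubled == raw == joined)
```

On Windows itself, os.path.join behaves the same way as ntpath.join, so the third form is probably the one to reach for when building paths from pieces.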

Line 44 – calls the function defined earlier called getfiles – and sends it the infolder parameter.

Line 45 checks if the zipfolder exists – and if it doesn’t then it creates it.

Line 46 is virtually identical to line 45, except that it does it for the archivefolder – where we will store the files after they have been processed.

Finally, we have a standard for loop – for every element in the fs (list of files) it runs the decompress function (which we discussed earlier).

Then, well, we are done with the program. It has served the purpose for which it was designed. This code could be adapted in many ways for many different situations – for ETL tool type functionality.

That concludes the first four steps. Step 5 will be repeated over and over again for 48 files, so it may be covered in many posts. Still, if there is standardization in the file formatting (even if the standard is not pleasant), after solving the first file there should be a lot of code reuse.

Update and final run instructions

To run the code you need to run your saved file from the Windows command prompt. Click on “Ask me anything” on your Windows 10 task bar and type cmd – you can either press [enter] or double-click the top of your results.

commandpromptopen

cmd_runcodeimdb

When you open the command prompt (as shown above) it starts you in the directory of the user currently logged in to the computer. Type in cd.. [enter] to move back one level in the command prompt. This moves you to the Users area. Type in cd.. [enter] again and this places you at the root directory. Next we will move to the directory where we saved our code. Type cd IMDBCode [enter].

Now, if the path is set correctly in your computer for Python 2.7 – you can run the Python code by typing in the file name – IMDB_InitialDataProcess.py [enter] and the program will begin to run. Assuming the files are in the C:\IMDB folder – they will be uncompressed, saved as text files and moved/deleted.

Just to show the expected output:

dataprocess_endproduct1

This folder (IMDB) used to contain 48 files – now it contains two directories. All the files have been deleted. If there were any files remaining those files would be files that failed to be processed.

dataprocess_endproduct2

Above – we can see the contents of the “Original” folder under IMDB – contains 48 files – with a disk space of 1.71 GB.

dataprocess_endproduct3

Now, finally, we can see that the “Unzip” folder contains 48 files – with a disk space of 6.89GB.

 

Juxtaposition

I like the word juxtaposition: “The fact of two things being seen or placed close together with contrasting effect”.

It is also the title of a book that I read a long time ago by Piers Anthony. Sadly, I don’t remember much about the book at the moment.

Facebook is an interesting ‘place’ for juxtapositions. Three things are in juxtaposition for me at the moment.

  1. I created a spreadsheet of my articles and number of words and discovered that I have written 81157 words (before this article) – which according to baseline information is enough to be a novel.
  2. Scrolling through Facebook I hit on an article about one of the two focuses of this blog – AI – and it claims that there will be 3 billion of them in the next 5 years. At least the Facebook text on Futurism claimed that there would be 3 billion AI in the next 5 years. I think this is a bit optimistic even for me – and I am fairly optimistic about AI and the future.
  3. “Traumatised [English Spelling] family lay suicide dad to rest with wife and three children who he murdered” – the other focus is the inhumanity of humans.

I’ve written 115 (a couple not published) articles on a variety of topics. I’m going to shift my focus from writing about technology and the future, and writing about the depths of humanity – to writing my novel “The Morrigan”.

Based on information in item #2 – I may need to hurry up if I want to finish before some of the things in it become a reality.

Finally, #3 makes me sad. The grandmother on the mother’s side forgives her son-in-law for killing her daughter and her three grandchildren. So, he ends up buried with the family. There is a clash here of what people think. The article noted Women and Equalities Secretary Angela Rayner said “Hawe is no victim. We have to call out murder and domestic violence. He was selfish and committed a despicable crime.”

I think Angela Rayner has the right of it. He was selfish in taking others during his suicide. Not for being a suicidal person. People that commit suicide are not selfish in taking their lives. It is often a feeling of complete loss, utter failure, and that there can never be a return to the way things should be that pushes the hand to kill one’s self. I know. I feel these things in my now 2 years of unemployment. I often feel that somehow I have become unemployable and I am fearing the future in which the money runs out and there is no way to pay the expenses. Nothing I do seems to have an effect on my employment situation.

This man, however, did not just commit suicide, no matter his feelings about never getting to where he needed to be in life. He decided to rob his children and wife of their chance to attain happiness in life at the same time as deciding that he no longer had a chance for happiness.

This is a strongly anti-woman action to take. He assumes he is the only one to lead the house. He assumes that they would never find their way without him. People find a way. Or they don’t. But if you kill yourself then you forfeit the right to knowledge of the future. Unless, of course, in a fit of spite, you kill the people you care about so that you know their future would never be any better than your future – as a corpse.

Asshole.

This ties back to the 3 billion AI in the next 5 years. If any of them are codops (Computerized Doppelgangers) – copies of human minds – what prevents them from doing horrible things to other codops, living humans, or anything else? We will need to be very careful about who attains the ability to have codops – even more careful than vetting police officers in HR as I discussed in the article previous to this.

One could well imagine thousands of copies of family members going about separate AI codop lives – and the father, perhaps even just one copy, one codop of the father, deciding that there is no hope for the future, turning terminator-like, and destroying all the codops of his family as well as the physical wife and children. Thousands of lives lost.

It is possible, however, that codop lives might never be lost forever. Through diligent backups of systems, once you are a codop you are a class, and individual codops are objects instantiated from that class. You might destroy the codop objects, but the codop class can always be found and re-instantiated with a loss of the more recent memories.

Lately, I haven’t been able to sleep – something that I had always been able to do with ease. I could just flip a switch in my brain and go to sleep. There has been some research recently confirming that there is such a thing in the brain that changes status and takes us from wakefulness to sleep.

I think more and more I am aware that money – and therefore time, is running out. But I have no idea what to do to make money – certainly not the money I was making before becoming unemployed. Two of my friends have indicated that people might be reluctant to hire me not just because of the employment gap, but because my resume shows a lot of knowledge and I might be considered a threat.

 

Programming in Java Minecraft Forge – makePyramids

So, I’ve been exploring and having fun with Minecraft, with Forge and Java – in making new functionality.  As I have mentioned earlier there is a great book on the topic, “Minecraft Modding with Forge” by Arun Gupta and Aditya Gupta.

In order to make mods for Minecraft you need to install Java, Eclipse, and Forge. This is covered completely in the book, which is only $15 on Kindle. So, I’m not going to cover all that here. Also, you can mostly figure it out yourself with some quick searches of the internet.

This post targets new programmers to Java, Minecraft, and Eclipse.

What I want to cover is creating a mod in Minecraft with a custom command that has several parameters.  This is also covered in the book, but I have my own slant on the process, possible errors and the code used.

Once you have Eclipse hooked up to Forge, create a new package (make something up if you have to, as the name is not validated: com.yoururl.forge.mods or whatever!). Right-click on your package and create a main class. Step by step shown below.

To create a package you have to right-click on the folder src/main/java.

createpackage

Then enter in the name of your package and click on Finish.

nameJavaPackage

 

2016-02-28_16-57-49

Enter the code as shown above, adding to the original package statement and the creation of the Main class. For now, ignore any errors that appear as you type; they should be resolved after everything has been typed and you have pressed control+shift+o. You don’t have to type in the import statements: if you are properly connected to Forge, pressing control+shift+o after typing your executable statements will add them automatically.

Right-click on your package and click on New->Class from the context menu.

New_Class

In the dialog box enter in the name for your class.  Classes always start with a capital letter (by convention although you can violate this) as shown below – our class will be called MakePyramid.  Then press Finish and Eclipse will create the class file and fill in the base class stub.

EnterClassName

The code will look something like this.  Just a reminder that the package at the top will be the package that you created.

StubMainClass

Of course, this doesn’t do much at the moment except exist as a file.  So we’ll just have to make it do something!

I’m not going to teach anything about OOP. It is a big topic and I can’t treat it properly in a simple example such as this. A brief explanation is that there are objects out there that perform similar functions. These functions have a signature of methods and that signature is called an Interface. If all objects that perform similar functions have the same interface it makes it easier for programmers to write code and know what to expect and where to expect it, and it tells the programmer the bare minimum of functionality needed for this all to work.

Our MakePyramid class is going to implement the ICommand interface.  We would use this for any command that we are going to create.  In order to implement an interface you simply add “implements InterfaceName” after the class name separated by a space.  In this case, we will type implements ICommand after a MakePyramid as shown below.

AddImplementsToClass_Part1

That red squiggly line – that is an error being detected by the Eclipse IDE (Integrated Development Environment). So perhaps you might be thinking – “great example man, barely even 1 line of code and we have an error. What am I supposed to do now?”

Well, fortunately, this error will be one of the easiest to clear. Later (in other programming) there will be syntax errors that befuddle you and the last hurdle – logic errors that look like they should work, but don’t. Followed by code that looks hopelessly broken, but hey look at that it is working. Finally, when you are a true coder you’ll say things like, “uh oh” which will scare the nontechnical and “I’m surprised that ever worked.”

To fix our error we’ll just press control+shift+o and the appropriate import will be added to the code as we did in the Main class.  At this point I’ll take a little time to explain it in a little more detail.

We build things on the shoulders of giants – or at least other coders, anyway.  When we write our code we heavily rely on classes written by other people.  Some are a part of Java, others are a part of libraries created for specific purposes.  This interface that we are implementing belongs to a library that isn’t referenced (or imported) in to the code.  So, it looks like an error.

Now, sometimes the Eclipse environment has a little difficulty.  In my case control+shift+o works almost all the time, but not this time.  There are other ways to fix the problem.  You can know the library where your interface (or class) comes from and manually type in the import statement.  Also, you can hover over the error and Eclipse will offer you options on how to fix the problem.

FixICommandInterface

Click on the top line to import the correct library. Note that creating your own new ICommand interface is one of the options below, as are multiple other libraries that contain Command in the name of an interface. So, if we were using something else we would need to be careful which interface library we imported, or change our code. It is also good to note that if you forgot the interface name but knew part of it you could simply type it and see if Eclipse can find it for you.

Here is what the code looks like after you have clicked on the top link in the help pop-up.

AddImplementsToClass_Part2

Behold, we have fixed one error only to discover another error! Man, the writer of this article writes some bad code.

In reality we are just going through the process of adding an interface. Earlier I indicated that an interface was a signature. This signature is composed of publicly exposed methods, each with specific parameter and return data types. We haven’t added them yet, so our class has an error: it says it implements the ICommand interface, but it doesn’t contain the interface’s parts, the methods.
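To make the idea of a signature concrete, here is a minimal sketch in plain Java. Command and HelloCommand are hypothetical stand-ins I made up for illustration; the real ICommand interface from Forge has more methods than this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ICommand: it only declares WHAT a command offers.
interface Command {
    String getCommandName();
    List<String> getCommandAliases();
}

// A class that says "implements Command" must supply every method above,
// otherwise the compiler (and Eclipse) reports an error on the class.
class HelloCommand implements Command {
    @Override
    public String getCommandName() {
        return "hello";
    }

    @Override
    public List<String> getCommandAliases() {
        List<String> aliases = new ArrayList<>();
        aliases.add("hello");
        return aliases;
    }
}
```

If you delete either method from HelloCommand, the red squiggly line comes back on the class name, which is the same situation our class is in right now.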

In order to fix this we will again hover over the error and click on the link for the fix we want applied to the code. We could look up the interface and type in the code ourselves, but programming is about making things easier to do and taking the path of least resistance, or least code.

AddImplementsToClass_Part3

Click on the link “Add unimplemented methods” and Eclipse will automatically add the stub of the interface’s signature into your code.

ICommandInterfacestub

Above you see the code that was generated. Automatically, the correct imports are added to the file for any libraries referenced as the data types in the individual methods.

This class and these methods are where we will add our code.  First, let’s run Minecraft from Eclipse and see what happens.

PressStartToRunMinecraft

To start Minecraft, press the start button in Eclipse (shown above). It will take a little while to start up, so just be a little patient. If this is the first time going into Minecraft from Eclipse, you will need to create a new world and make it a creative world. A lot of code won’t run if you are not in creative mode (mostly on purpose, as you can prevent non-creative mode access to your commands).

Minecraft_EnterCommand

Above you see a screen shot of Minecraft with the command entered.  To enter commands in Minecraft you press the “/” and then type in the command.

Minecraft_UnknownCommand

There is another error!  Well, not really an error.  Even though we have created the class we still have a ways to go before the command will work.  We have some code to enter in our new class MakePyramid and one line of code to add in to the Main class.  Close out of Minecraft and let’s get coding!

While we implemented the ICommand interface, we didn’t put code in any of the methods. In addition, even when this class MakePyramid is working, we don’t have to put code in every method that the interface created in the class. For some of the methods it is fine for them just to exist.

We have to add one more method. It is called a constructor, and it runs every time a class (the code we are creating) is instantiated and made into an object at run time. A class can have many instances of objects. Before we create our constructor we are going to add a variable declaration to the class. This is a variable that is available to all the methods inside the class and, depending on how we declare it, can be seen outside of the class or not. We are also going to add another variable that we will discuss and use later. Both of these variables will be visible only internally to our class and usable by all methods of our class.

AddClassScopeVariables

The code in the red box is what you have to add. private means that the scope of these variables is private to this class. private is always in reference to the containing object, which is the class in this case. Two additional imports are at the top of the screen. You may have to hit control+shift+o to get them to appear. Since I copied and pasted these variables into place, Eclipse automatically added the appropriate import statements.

The first variable is of a List type, specifically an ArrayList, which allows us to add and remove elements from the list. In Minecraft, you can call commands in multiple ways. Sometimes the long name makepyramid is too much of a bother to the end user, and they might want to just type mp. The problem with this scenario is that players will likely want to install multiple mods. If there are two commands that are mp, what runs when the user types it in? Or does it cause Minecraft to crash? So, personally, I would leave off implementing a smaller name for the command as long as your command isn’t crazy long.

Our next step is to add a line of code using the aliases variable in the constructor. Don’t get too hung up on the names of functions, as they can and do change over time. Functionality is deprecated between versions of Forge or Java and replaced with hopefully better functionality, or at least more descriptive method names.

2016-02-28_19-05-54

This is our constructor – and a single line of code to add “makepyramid” string to the aliases list.  If we wanted a second name we would repeat this line of code and change the string being added to the aliases list.  Now we can do a lot more in the constructor of a class – just because this one only has one line of code and all it does is add a value to a list – doesn’t mean that is all you can do.
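For readers who can’t see the screenshots, here is a rough sketch of a class-level variable plus a constructor that fills it. PyramidCommandSketch is a hypothetical name, and this is plain Java written for illustration rather than a copy of the MakePyramid class.

```java
import java.util.ArrayList;
import java.util.List;

class PyramidCommandSketch {
    // private: visible to every method in this class, hidden from outside code.
    private final List<String> aliases;

    // The constructor has the same name as the class and no return type.
    // It runs once each time "new PyramidCommandSketch()" is executed.
    PyramidCommandSketch() {
        aliases = new ArrayList<>();
        aliases.add("makepyramid");
        // A second name would be one more line, for example aliases.add("mp");
    }

    List<String> getCommandAliases() {
        return aliases;
    }
}
```

Each object created with new gets its own aliases list, which is what “a class can have many instances” means in practice.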

The first method the interface added, in order, is compareTo. We don’t need to do anything in this method.

The next method is getCommandName. We change this method – and have it return “makepyramid” – the name of the command we want the end user to type.  Code shown below:

getCommandName

You will notice that there is an @Override above the method declaration. This means that we are overriding the functionality already provided in the interface, if there is any.

The above method is a function, because it returns a value. The same is true of getCommandAliases, which we will change shortly: the programmer who has written code calling getCommandAliases is expecting to get a list of the ways to call this command. Its return statement (the only statement in the method) returns aliases, whose data type matches List. If the data type didn’t match, the return statement would cause a compile error. For example, if I had it return “makepyramid” as a string, Eclipse would flag a type mismatch, because a String cannot be converted to a List. Technically, the constructor we wrote earlier is a subroutine: a method that does some work but does not return any data to the code calling it, which in the case of a constructor is the new keyword. When you use the new keyword you are creating a new instance of a class, which causes the constructor of the class (if present) to be run.
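Here is a small sketch of the difference between the two kinds of method. ReturnTypeSketch and addAlias are names I invented for the example; only getCommandAliases comes from the interface.

```java
import java.util.ArrayList;
import java.util.List;

class ReturnTypeSketch {
    private final List<String> aliases = new ArrayList<>();

    // "Subroutine": void means nothing is handed back to the caller.
    void addAlias(String name) {
        aliases.add(name);
    }

    // "Function": the declared return type (List<String>) must match
    // what the return statement hands back.
    List<String> getCommandAliases() {
        return aliases;
        // return "makepyramid";  // would not compile: a String is not a List
    }
}
```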

The next method we are going to write code in is the getCommandUsage function.  This function is called when the end user enters the command /help makepyramid or generically /help commandname .

 

getCommandUsage

A couple of notes here from actually writing code. If you mess up certain methods, your /help will stop working in Minecraft. An error in some other sections of code will result in a “not authorized” message appearing, which is misleading when in fact you just have a problem in your code.

In the above getCommandUsage – we return a string – that tells the user the command “makepyramid” and then items inside of less-than and greater-than symbols.  These are the parameters that the end user enters that tell the command more specifically what to do.  In this case we want the user to type in a name of a material (or number) then the coordinates that they want the pyramid to start from (x,y,z) and then finally the base size of the pyramid – the biggest layer of the pyramid is the bottom and this also indicates how many layers up the pyramid will go.
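As a sketch of what such a usage string can look like: the parameter labels below are my own guesses for illustration, and the exact wording in the screenshot may differ.

```java
class UsageSketch {
    // Hypothetical wording; the angle brackets mark values the player fills in.
    static String getCommandUsage() {
        return "makepyramid <material> <x> <y> <z> <baseSize>";
    }
}
```

The command name comes first, followed by five placeholders in the order the player is expected to type them.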

The next step is to change the code in the getCommandAliases method to return the aliases list that we created in the class and added the value “makepyramid” to in the constructor MakePyramid method.  In addition to changing the return statement I typically remove the TODO comment from the method.  This means that I know for sure that I changed code in this method of the interface.  Code shown below:

getCommandAliases

Now comes the interesting method.  There is a lot of code  (well compared to what we have been doing).  The code is nested.  It isn’t really that hard to understand.

As part of the ICommand interface the processCommand method was created.  This is the workhorse of our class.  When our command is executed by the user typing it in – this is the code that runs.

First I am going to show you the code, then I will do a walk through.

processCommand_method

From top to bottom:  The first line of code is the @Override – which indicates to override any existing functionality in the original base class.  Next is the declaration of our method – which is a subroutine because it does not return any data (the void tells you that).  The scope is public – which means that any other code can see this method.  It throws an exception – which then would need to be dealt with in any code calling this function.  Unfortunately, I think the only catch they have is to say “There was an unexpected error”, instead of forwarding on the actual error message.

I created a custom method called validArgs(). It returns a boolean (true/false), and the args passed to the command are passed on to it. The if statement says “if (the opposite of what validArgs returns) return”. So, validArgs returns true if all the arguments are valid, and the ! sign reverses that to false. Only when the if condition is true, meaning validArgs returned false, will return be called. The definition of the validArgs function will be discussed in detail later.

You don’t need braces after an if statement in order for code to execute. You only need them if more than 1 statement is to be executed. In this case, the only statement to be executed is return; and so the minimalist approach to coding would be to eliminate the braces.
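Here is a stripped-down sketch of that pattern. The validArgs below only counts the arguments, and the built field exists purely so you can see which path ran; neither is how the real class does it.

```java
class GuardClauseSketch {
    boolean built = false;

    // Hypothetical stand-in: the real validArgs checks much more than the count.
    boolean validArgs(String[] args) {
        return args != null && args.length == 5;
    }

    void processCommand(String[] args) {
        // No braces needed: the if controls exactly one statement.
        if (!validArgs(args)) return;

        // Only reached when validArgs returned true.
        built = true;
    }
}
```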

The next section of code is a series of variable declarations and assignments. These variables are assigned either directly from the values the user input or through some small mathematical operation on them.

Of interest here is that there is more than 1 variable declaration per line. Perhaps other courses only teach that each line of text holds one executable statement. This isn’t true. Since Java statements are terminated by semicolons rather than line breaks (unlike Visual Basic .NET, for example), you can put multiple executable statements on a single line.
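A tiny sketch of both ways to put more than one declaration on a line. The variable names are made up for the example.

```java
class OneLineManyStatements {
    static int area(int xStart, int xEnd, int zStart, int zEnd) {
        // Two declarations on one physical line; each semicolon ends a statement.
        int width = xEnd - xStart; int depth = zEnd - zStart;
        // Several variables of the same type can also share one declaration.
        int a = width, b = depth;
        return a * b;
    }
}
```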

Why would you do this? Well, it does make the code ever so slightly more difficult to read; however, you don’t want your methods (subroutines or functions) to have an excessive line count. The longer the code in a single routine, the harder it becomes to read and understand. This is also the reasoning behind the validArgs() method. I could have simply validated all the input right in processCommand(), but this makes the code really long. What the other programmer (or your future self) needs to know is that all the information was validated, and you get this from the if statement containing validArgs(). It is much more important for the future coder (whoever she is) to know the main logic of the processCommand method.

Next we start our for statements and blocks of code.  The for statement has 3 major sections.

for (int y = yStart; y < yEnd; y++)

The first section, int y = yStart;, declares a variable y of type int (integer) and gives it its starting value. Next is what I would call a “while” statement: as long as this section is true, the for block of code will continue to be called. Finally we have y++, the increment for the variable declared in the first section. y++ is equivalent to y = y + 1; it is a shorthand understood by all C#, C++, C, and Java programmers. You can also write a for statement with a decrementing statement, y--, which is equal to y = y - 1;
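To see the three sections at work outside of Minecraft, here is a small sketch that just collects the values y takes. ForLoopSketch and its two methods are invented for the example.

```java
class ForLoopSketch {
    static String countUp(int yStart, int yEnd) {
        StringBuilder sb = new StringBuilder();
        //   initializer  ; condition; increment
        for (int y = yStart; y < yEnd; y++) {
            sb.append(y).append(' ');
        }
        return sb.toString().trim();
    }

    static String countDown(int yStart, int yEnd) {
        StringBuilder sb = new StringBuilder();
        // Same shape, but the condition and the step run the other way.
        for (int y = yStart; y > yEnd; y--) {
            sb.append(y).append(' ');
        }
        return sb.toString().trim();
    }
}
```

Calling countUp(64, 67) gives "64 65 66", because the loop stops as soon as y < yEnd is false. Calling countDown(66, 63) gives "66 65 64".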

If we wanted to make a pyramid that was upside down, we’d have some work to do, but you can see from this for statement how to do it.  There would have to be a few other changes as well, but nothing too difficult.  Now, the trick would be to make a processCommand that both makes pyramids that have the smallest point to the top or bottom without increasing the number of lines of code.  Maybe next week.

There are two more for statements following the first one, and they all basically operate the same.  Next up of relevance is the statement that actually creates the blocks in Minecraft.

((EntityPlayer)s).worldObj.setBlockState(new BlockPos(x,y,z), block.getBlockState().getBaseState());

The funny thing about the above code is that it isn’t what I would expect it to be. Personally, I would have had block = new Block(world,coordinates); where the block object would already have the attributes/properties necessary for creating the object, or in this case the material. Anyway, moving on to what the code actually is…

(MoreSpecificObject)LessSpecificObject – this is a cast from one object type to another. Objects follow something called inheritance. All objects (no matter what type) inherit from the class Object.

Casting can be dangerous (read: can cause your code to crash) if the object you are casting is not the type you are casting it into. In the validArgs() method I do a quick check to see if ICommandSender s is in fact an EntityPlayer class object. If it isn’t, it throws an error. If it is, then the code continues on.

EntityPlayer extends EntityLivingBase. EntityLivingBase is an extension of the Entity class. The sender we receive could be an EntityLivingBase, but it doesn’t have to be. It was described earlier that objects can implement interfaces. ICommandSender is an interface (hence the I prefix). Therefore, any object implementing the ICommandSender interface could be sent as the input to processCommand(). It could even be an object that doesn’t have Entity as a base class at all.
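The check-then-cast idea can be sketched with a few stand-in types. Sender, Player and CommandBlockSender are hypothetical and far simpler than the real Forge classes; the sketch returns null where my real code shows an error.

```java
// Hypothetical stand-ins; the real Forge types have many more members.
interface Sender {
    String getName();
}

class Player implements Sender {
    public String getName() { return "Steve"; }
    String getWorldName() { return "overworld"; }
}

class CommandBlockSender implements Sender {
    public String getName() { return "@"; }
}

class CastSketch {
    static String worldOf(Sender s) {
        // Check first: casting a non-Player to Player throws ClassCastException.
        if (!(s instanceof Player)) {
            return null;
        }
        // Safe now; the cast exposes the Player-only method.
        return ((Player) s).getWorldName();
    }
}
```

Both classes can be passed in as a Sender, but only one of them survives the cast, which is why the check comes first.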

An object can implement multiple interfaces in Java. This is standard in most OO (Object Oriented) languages; VB.NET and C# allow it too, even though a class there can inherit from only one base class.
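A quick sketch of one class implementing two interfaces. Named, Positioned and Marker are names I made up for the example.

```java
interface Named {
    String getName();
}

interface Positioned {
    int getY();
}

// One class, two interfaces: it must provide the methods of both.
class Marker implements Named, Positioned {
    public String getName() { return "marker"; }
    public int getY() { return 64; }
}
```

The same Marker object can be handed to code that expects a Named and to code that expects a Positioned.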

So… (EntityPlayer)s casts object s as the EntityPlayer class. After that point a dot operator will bring up the methods and properties of the EntityPlayer class (which should have all the methods and properties of the Entity and EntityLivingBase classes, plus the new EntityPlayer methods and properties).

Based on the names of the properties and methods, we are getting the world object that s belongs to. Then we are calling the world’s setBlockState method, which apparently creates a block if you send it a new BlockPos and a block state. We don’t pass the block object itself: we call getBlockState on it and then getBaseState on the result. The block object is the class-level variable of type Block that we declared some time ago.

It is a little less than intuitive, unfortunately.  Once understood, though, you can use this code to make blocks of any type and put them pretty much any place.

After two of the for blocks close there are a few statements increasing or decreasing some values.  Each layer higher (increase in y) in a pyramid decreases the end of the x and z lengths and starts the x and z one position over.

After those statements it goes back up to the beginning of the first for loop and off it goes until we don’t go any higher.

Now, our pyramids will end up with a block of 4 blocks as the top of the pyramid.  It is a little inelegant.  An upgrade would be to create 4 custom blocks that combined look like 1 block centered on the top.
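The layering logic can be tried outside of Minecraft with a sketch that collects coordinates instead of placing blocks. Everything here is an assumption for illustration: the names are hypothetical, and I assume each layer starts one block further in and ends one block sooner on both x and z until nothing is left.

```java
import java.util.ArrayList;
import java.util.List;

class PyramidLayersSketch {
    // Returns every {x, y, z} position a solid pyramid would occupy.
    static List<int[]> positions(int xStart, int yStart, int zStart, int baseSize) {
        List<int[]> out = new ArrayList<>();
        int xEnd = xStart + baseSize, zEnd = zStart + baseSize;
        int xs = xStart, zs = zStart;
        // Keep going up while the current layer still has room for a block.
        for (int y = yStart; xs < xEnd && zs < zEnd; y++) {
            for (int x = xs; x < xEnd; x++) {
                for (int z = zs; z < zEnd; z++) {
                    out.add(new int[] {x, y, z});
                }
            }
            // Next layer up: start one block further in, end one block sooner.
            xs++; zs++; xEnd--; zEnd--;
        }
        return out;
    }
}
```

Under these assumptions a base size of 6 gives layers 6, 4 and 2 blocks wide, so the top is a 2 by 2 square of four blocks, while an odd base size such as 5 ends in a single block.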

NOTE: if you are a VB coder and haven’t ever coded in C# or Java, you may be a bit confused about some of the most minor of statements: the variable declaration. In VB.NET you would write a variable declaration as Dim x as String. However, in C# or Java you would write this as String x; . In addition, the parameters in a method signature are written the same way: variable type, space, variable name.

Now, in processCommand we called code that we have not defined yet. It is a good idea to keep functions, methods, and subroutines small enough to see their logic in 1 or at most 2 screens. After that it is best to break the functionality into component pieces and call those pieces in your main function or method.

In processCommand we called a custom function called validArgs.  validArgs returns a boolean.  It is called in an if statement and if the if statement is true it exits our processCommand function because there is a problem.

NOTE: validArgs returns true if validations are successful. It returns false if there was a problem. This should be standard. However, to keep our code clean on the calling side, so that we don’t have to code an empty true branch and a return in the false branch (which adds lines of code and decreases readability), we look for the opposite of validArgs. If validArgs returns true, we take the opposite of that, false, and the code continues down our processCommand method. If validArgs returns false, we take the opposite of that, true, and then execute the return and exit the processCommand method.

validArgs

validArgs defines some local variables, and then performs a series of if statements.  Since most of the information passed to our functionality are numbers, we validate that these values are in fact numbers.  If anything fails we return false and show a message to the end user.

In searching the internet I found the NumberUtils.isDigits function which accurately performs what we need it to do.  Validating a value to be a number isn’t as easy as you would think.  In Visual Basic and even in VB.NET (and therefore C#) the isNumeric functionality will let values pass that are not necessarily numbers but can be interpreted as numbers.  Values such as scientific notation can be interpreted as numbers – and contain letters such as e.  The problem is that when you try to perform math on these values your code will explode and you will have unhappy users.

I had to add an import – import org.apache.commons.lang3.math.NumberUtils; – in order to use this function.  You may find another solution to this – or write your own.  Back in my Visual Basic coding time – I had to write my own function in frustration as nothing really performed well in the base Visual Basic library.
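If you would rather write your own, a digits-only check is short. This is a hand-written sketch, not the Apache Commons implementation, and DigitsSketch is a made-up name.

```java
class DigitsSketch {
    // True only when every character is one of '0' to '9'.
    // A sign, a decimal point or an 'e' makes the whole value invalid.
    static boolean isDigits(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }
}
```

Note that a check this strict also rejects negative values such as -12, so negative coordinates would need extra handling.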

Finally, after we do our numeric validations, we call another function. You can tell from the screenshot above that the validArgs function is quite tall. Adding in the functionality for validBlock would have made it longer and less readable. So the related functionality for validating a block is contained in one function, which can then be reused anywhere. Again, we are using the coding pattern of a function that returns a boolean, checking for the opposite of the boolean and exiting our main function (returning false) if there is something wrong. This code is shown below.

validBlock

This is a fairly simple function and I won’t go in to too much detail.  Just note that there is only one error message and we define it in the beginning – no matter what goes wrong in the function.  There is also a try…catch utilization here – in case something really goes wrong with our code – we prevent a crash – at a price of some performance.  The try… catch – is defined by the try and braces which contains the code that might cause a crash.  The catch area with braces is the code that will run in case the code in the try block crashes.
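The shape of that function, one error message defined up front and a try…catch around the risky part, can be sketched like this. MaterialSketch is a hypothetical list of names; the real validBlock asks Minecraft about the block instead.

```java
import java.util.Locale;

// Hypothetical list of allowed names; the real code asks Minecraft instead.
enum MaterialSketch { STONE, SAND, GLASS }

class ValidBlockSketch {
    static boolean validBlock(String name) {
        // One error message, defined once, whatever goes wrong below.
        String error = "Not a valid block: " + name;
        try {
            // valueOf throws IllegalArgumentException for an unknown name,
            // and name.toUpperCase throws NullPointerException when name is null.
            MaterialSketch.valueOf(name.toUpperCase(Locale.ROOT));
            return true;
        } catch (RuntimeException e) {
            // Either failure ends up here instead of crashing the caller.
            System.out.println(error);
            return false;
        }
    }
}
```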

There are three last methods left – but they are small or not changed from the signature of the interface.

LastThreeMethods

canCommandSenderUseCommand – this was definitely changed from the original code.  In fact, if you don’t change it nothing will work.  In this case we are returning true if a user called this function.  The direct implication is that perhaps there is a way to have an NPC call your custom function.  Maybe a subject for some later tutorial or article.

The last two functions – addTabCompletionOptions and isUsernameIndex are left completely stock from the original interface.

That’s it. The next article will be about integrating a technology into Minecraft that you wouldn’t expect.

The code is uploaded on Git – here.

Have fun and ask questions.  If I don’t know the answers I’ll do what I can to figure them out.

Arrgggh.  As usual for programmers – we rush through things and sometimes forget to put in vital instructions.  Or not too vital as another programmer can figure out what we did and fix the problem.  In this case, though, this is meant for programmers that don’t know programming yet – and so, figuring out this problem that is minor might actually turn in to a monumental task.

Git_Pyramid

In the file list shown above from Git – there are 4 files.  README.md – just a couple lines of text.  Not really anything – you can get inventive in your own version and be highly descriptive if you want.  There is Main.java file which was discussed in detail.  This is what runs in Minecraft and creates your command.  MakePyramid.java is the class that is called for your command and referenced in the Main.java/Main class.

So, what is SErr.java?  Well, the code is shown below.

SErr

This is what I would call a static class in VB.NET or C#, but it isn’t really a static class. It is a class with a static method. What is a static method? It is a method that can be called directly on the class, without creating an object first, so it does not depend on data stored in any one object. It gets more complex, as a class can mix static and non-static methods and fields (a static field holds one value shared by the whole class rather than one per object).

What does SErr class and the s method do?  Well, it is my attempt to reduce the amount of code everywhere in the application.  If this class did not exist the line of code s.addChatMessage… would be repeated in our MakePyramid class every time we displayed a message.  Not a deal breaker, but it does make the code unnecessarily long and harder to read.

Could the s method have existed in the MakePyramid class?  Yes, yes it could.  No, no, it shouldn’t.

In the real world you would be making a mod with many commands. Each of these commands would need to send messages to the end user indicating errors. If you put the s method in MakePyramid, you make that class required for all your other commands. But what if you decide that you don’t want this MakePyramid functionality? Then you might delete the MakePyramid class and cause errors in the rest of your code. Or, just to be clear: if SErr.s exists outside of any one command, it can be called equally by all the other mods/custom commands, and it will be called the same way in each class.

It is also in a separate class because like functions should be in like classes.  Sending an error message isn’t a like function of the MakePyramid class.  It is something it does, but it isn’t a direct expectation of the MakePyramid class.
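Here is a sketch of the same idea with stand-in types. ChatTarget, ErrSketch and RecordingTarget are hypothetical names, and the real SErr works against Minecraft’s own sender type rather than this little interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for whoever receives chat messages.
interface ChatTarget {
    void addChatMessage(String text);
}

// No objects of this class are ever created; callers write ErrSketch.s(...).
final class ErrSketch {
    private ErrSketch() { }  // prevents "new ErrSketch()"

    static void s(ChatTarget target, String message) {
        target.addChatMessage(message);
    }
}

// A target that just remembers what it was told, handy for trying this out.
class RecordingTarget implements ChatTarget {
    final List<String> seen = new ArrayList<>();

    public void addChatMessage(String text) {
        seen.add(text);
    }
}
```

Any command class can now write ErrSketch.s(target, "Invalid size") without knowing or caring that MakePyramid exists.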

Your code, until this class has been added, won’t compile.  Use the instructions earlier to create a new class called SErr – and copy the code in to it.  Save it and ensure there are no syntax errors.  Now your code should compile.

 

Enjoy.  Unless I find another problem. 🙂  I guess writing instructional content has inherent problems.

Coursera review: The Data Scientist’s Toolbox

Facebook can be a wonderful thing.  I have liked the BBC, CNN, and many other news outlets on my Facebook account.  I don’t watch the news on TV, I get upset about it on Facebook.  Really, it is much more efficient and upsetting. 🙂  But you can’t get too upset as the next post is some funny cat video.

Advertising is moving to Facebook and in talking about Data Science, Facebook and their advertisers must have some really good algorithms at work.

I remember when we all used to laugh about the inappropriate ads that would show up on the web and Facebook.  Now, these ads are targeted and they know us well.

An ad for Coursera’s Facebook page is how I came to know about the Data Science certificate. Since then I’ve read articles about the course, and its enticingly low cost ($29) caught my interest. [NOTE: all other courses are $49 each]

The Data Scientist’s Toolbox course contains an overview of the Data Scientist’s job, takes you through installing R, creating and using a Github account with Git GUI and Git Bash, and the steps in Data Analysis.

I went through the course very quickly; although, I had to wait for the official course end to get my certificate. (Coursera datascitoolbox 2015)

The course is composed of video lectures, dedicated forums where many classmates post questions and answers, quizzes, grading other students’ work, and projects.

The fairly simple course project was to show that you had successfully installed R and RStudio and established a Github account.

The lecturer is very good, easy to understand and while not vocal about it you can tell he enjoys the field and teaching.  Enjoying teaching is something I wish all my college professors had when I was in college.

When I was finished with the course I decided I really wanted more.  I enjoyed it a lot.  The amount of time to complete the course was far less than their estimates (4-9 hours a week).  So I went on to the R Programming course immediately afterward starting that course mid-course.

While I thought the course was easy and the project was easy, I graded 4 other students’ work and 1 of them was not able to properly use Github. So, what you will get out of this course depends greatly on what you take in with you. Even someone with advanced programming knowledge will learn at least a little something from this course.

There was only one thing lacking in an overview course like this one: it didn’t show off the R programming language’s capabilities or say why you would choose R over other technologies.

I’ve finished the second course “R Programming” and started “Getting and Cleaning Data” and “Exploratory Data Analysis“.  I have to say that it isn’t until the fourth course that you really see why you would choose R for data analysis over using Excel, SQL Server Analysis Services or other options out there.

Attribute #2 of a Codop

A codop is you – only electronic.

I am a business analyst, programmer and a project manager. The concept of the codop is enticing from both the programmer’s and the project manager’s point of view.

As a programmer there are repeating structures in various programming languages.  Arrays, vectors, data tables, collections, SQL table rows, objects and finally classes.  Classes really get the mind going.

For my non-programming readers a class is like a template with properties or attributes and methods or actions that class can perform.  An object is an instance of a class in Object Oriented Programming (OOP – no really OOP, and try to make sure your programming doesn’t perform like POO).

OK, sorry I try to be funny sometimes and it just doesn’t work, but I keep trying.

Initially, there would only be enough computing power to support one computerized doppelganger of a person.

Computing power increases over time and the price decreases over time – which at some point would allow for a recording of a person – to be in an array.  You can imagine a physicist that needs to explore multiple avenues to create the Theory of Everything, suddenly being able to explore all those possibilities at once.  The different codops can even communicate and meet and understand their progress in different areas and alter course if required.

Or maybe your clones will be like these.