Cloning your first repo to Amazon SageMaker Studio Lab

Austin Lasseter
8 min read · Jun 10, 2022

Amazon SageMaker Studio Lab is a free machine learning (ML) development environment that provides the compute, storage (up to 15GB), and security — all at no cost — for anyone to learn and experiment with ML! It’s a real game changer for ML students and enthusiasts.

SageMaker Studio Lab (SMSL for short) provides an amazing integrated development environment (IDE) for programmers and data scientists, built on the popular JupyterLab interface. JupyterLab is an interactive environment for working with notebooks, code and data: it lets you use text editors, terminals, data file viewers, and other custom components side by side with notebooks in a tabbed work area.

One of the first hurdles for a new user of SMSL and JupyterLab is getting code from GitHub into the IDE. This blog post will walk you through those steps.

Getting to know Terminal in SMSL

For starters, fire up your SMSL instance and go to the “Launcher” tab. Open up a new terminal session like this:

A new tab will open up. Try typing the commands pwd and ls -l and you’ll see that the directory and files listed in the terminal match what’s in your left navbar.

The Terminal is a command line system that can help you quickly take control of your operating system and make changes without using the point-and-click GUI. It’s a power move and you’ll want to get familiar with it as fast as you can, so you can run with the big dogs 🐺. Read about some basic terminal commands here.
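If you'd like a safe place to practice first, here is a minimal sketch of a few everyday commands. It works inside a throwaway folder created by mktemp, and the folder and file names (notes, todo.txt, .hidden-example) are made up for illustration.

```shell
# Work in a throwaway folder so nothing important is touched.
practice_dir="$(mktemp -d)"
cd "$practice_dir"

pwd                      # print the folder you are currently in
mkdir notes              # create a new folder
touch notes/todo.txt     # create an empty file inside it
touch .hidden-example    # names that start with a dot are hidden
ls -l                    # long listing; hidden files are left out
ls -la                   # add -a to include the hidden files too
cd notes                 # move into the new folder
cd ..                    # move back up one level
```

The difference between ls -l and ls -la comes up again later, when we look for the hidden .gitignore file.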

Now we’re going to learn how to use the Terminal to bring some code down from GitHub and onto our SMSL server.

Pulling code from GitHub to SMSL

In your browser, navigate to GitHub and find a repository that you’d like to work with. If you want, you can try using this introductory repository that I posted on GitHub: https://github.com/plotly-dash-apps/001-intro-to-python

In the top right corner, click on the “Fork” button like this:

What is forking, you say? 🍴 Forking creates a copy of the repository that you manage. Forks let you make changes to a project without affecting the original repository. Try exploring the repo a little, and get to know what’s in the files and directories.

Now click on the green “code” button — it will give you a dialogue box with an HTTPS option. Click on the “copy” icon.

Back in SMSL, use the cd command to navigate to a folder where you’d like your code to live (or try creating a new folder for your code using mkdir). We’re going to use the version control system called “git” to keep track of our code. You can create a folder with a name like github-projects to store all your code (you’ll be cloning a lot of repos from github very soon!). But be careful to NEVER clone a repo inside of another cloned repo — this is bad mojo and will have unpleasant downstream consequences. Every cloned repo should be independent of every other.
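Here is one way that setup could look in the Terminal. Treat it as a sketch: the github-projects folder in your home directory is just the example name from above, and the check uses git rev-parse --is-inside-work-tree, which prints "true" only when the current folder already belongs to a git repo.

```shell
# Assumption: all cloned repos live in ~/github-projects.
projects_dir="$HOME/github-projects"
mkdir -p "$projects_dir"    # -p means: no error if it already exists
cd "$projects_dir"

# Guard against cloning a repo inside another cloned repo.
if [ "$(git rev-parse --is-inside-work-tree 2>/dev/null)" = "true" ]; then
  echo "Careful: this folder is already inside a git repo. Clone somewhere else."
else
  echo "OK to clone here: $(pwd)"
fi
```

If you see the "Careful" message, cd up and out of that repo before running git clone.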

To clone, run git clone with the HTTPS URL you copied. The example below uses my fork, so swap in the URL of your own fork:

git clone https://github.com/austinlasseter/001-intro-to-python.git

You should see all the code appear as you expected! (Read more about using git for version control here.) You’ll see a one-to-one correspondence between the folders and files in GitHub (the remote repo) and your SMSL file browser (the local repo). Compare like this:

One file which you won’t see displayed in the SMSL file browser is the .gitignore file. This file specifies intentionally untracked files that Git should ignore. Because its name starts with a dot, it’s “hidden”: to view and edit it from the interface, click on the git → gitignore option in the command ribbon at the top of SMSL. In the Terminal, ls -a will list it along with any other hidden files. You can leave it as-is for now.
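To see what a .gitignore actually does, you can experiment in a scratch repo. This is only a sketch: the repo lives in a temporary folder, and the two ignore rules and the file names are invented for the example, not taken from the intro-to-python repo.

```shell
# Scratch repo in a temporary folder; safe to delete afterwards.
demo_repo="$(mktemp -d)"
cd "$demo_repo"
git init --quiet

# Hypothetical ignore rules: notebook checkpoints and log files.
printf '%s\n' '.ipynb_checkpoints/' '*.log' > .gitignore
mkdir .ipynb_checkpoints
touch .ipynb_checkpoints/draft.ipynb debug.log analysis.py

ls -a                          # -a reveals .gitignore and other dot-files
cat .gitignore                 # print the ignore rules
git status --short             # lists .gitignore and analysis.py only
git check-ignore -v debug.log  # shows which rule ignores this file
```

Notice that debug.log and the checkpoints folder never show up in git status, so they can't be committed by accident.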

Try opening the notebook lecture/intro-to-python-solutions.ipynb and running a few of the cells. Experimentation is the best way to learn, but you can also check out some great tutorials online, like this one. When you’re done running the notebook, save your work using the 💾 icon.

Pushing code from SMSL back to GitHub

So at this point, we have some changes and newly saved work in our “local” repo and we want to push them back to the “remote” repo in GitHub. Before we can do this, we need a GitHub token that we can use from SMSL.

The very first thing to do is set up a personal access token in GitHub, which will allow you to push changes to your cloned repos. GitHub stopped accepting account passwords for git operations in August 2021, so authenticating takes a little more setup than it used to (and is also more secure). Follow the steps outlined here. Also note: the GitHub CLI installers described in that guide aren’t an option on Studio Lab, so you can ignore those instructions (later in this post we’ll install gh through conda instead).

Go to the top right corner of your Github profile and choose “Settings.”

Under “Settings”, in the left navbar, scroll down to “Developer Settings.”

Now find “Personal Access tokens.”

Give it a name (called “Note”) and an Expiration. Consider setting the expiration for 3–12 months from now (it’s not recommended to leave it open forever, but you don’t want to have to reset it every few days either).

You should also set “scope” — at a minimum, select “repo”. For authentication with the gh library, the minimum required scopes are ‘repo’, ‘read:org’, ‘workflow’.

You may also need this to avoid an error about scope:

This will be sufficient for our class. Then generate the token.

Make sure to copy your personal access token now. You won’t be able to see it again! It’s a good idea to save your tokens and passwords using a password manager like Lastpass or 1Password. Once you have a token, you can enter it instead of your password when performing git operations over HTTPS — we’ll do this in just a moment.

Let’s push our changes from local to remote. In terminal, enter the following commands:

git add .

The add command updates the index (the staging area) using the current content found in the working tree, to prepare the content staged for the next commit. Don’t forget the “dot” . at the end! It stands for the current folder, so git stages every change in that folder and everything below it.

git commit -m "type your message here"

The commit command creates a new commit containing the current contents of the index and the given log message describing the changes.

git push

The push command updates remote refs using local refs, while sending objects necessary to complete the given refs.
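If you want to watch those three steps happen without touching your real repo, here is a sketch you can run anywhere. The big assumption: instead of GitHub, it uses an empty "bare" repo in a temporary folder as the remote, so no network or token is needed. The file name hello.py and the commit message are made up.

```shell
# Stand-in for GitHub: an empty bare repo in a temporary folder.
sandbox="$(mktemp -d)"
git init --quiet --bare "$sandbox/remote.git"
# git may warn that the cloned repo is empty; that is expected here.
git clone --quiet "$sandbox/remote.git" "$sandbox/local"
cd "$sandbox/local"

# Repo-local identity, so the commit works before any global setup.
git config user.email "you@example.com"
git config user.name "Your Name"

echo "print('hello')" > hello.py
git status --short              # "?? hello.py": new, not yet staged
git add .                       # stage everything under this folder
git diff --cached --name-only   # what the next commit will contain
git commit --quiet -m "Add hello.py"
git log --oneline               # the new commit, newest first
git push --quiet origin HEAD    # send the current branch to the remote
git ls-remote --heads origin    # the remote now lists the pushed branch
```

Running git status --short and git diff --cached --name-only before each commit is a cheap habit that tells you exactly what you are about to record.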

When you run git push for the first time, you may get the following error message:

Simply run the two suggested commands as indicated, and then try git push again.

git config --global user.email "you@example.com"
git config --global user.name "Your Name"
git push
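Not sure whether git already knows who you are? The small helper below is one way to check. It's a sketch: show_git_setting is a made-up function name, and it relies on git config --global --get exiting with an error when a setting has never been saved.

```shell
# Print one global git setting, or a placeholder if it was never set.
# (show_git_setting is a hypothetical helper, not a git command.)
show_git_setting() {
  value="$(git config --global --get "$1")" || value="(not set)"
  echo "$1 = $value"
}

show_git_setting user.name
show_git_setting user.email
```

If either line says "(not set)", run the matching git config --global command from above.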

When you finally run git push it will look a lot like this:

When prompted for the password, use Ctrl+V to paste the token that you created earlier in GitHub. When you paste the token you won’t see anything on screen; that’s a security feature to protect passwords. Just press Enter and everything will be okay. You should now see output like this:

Mouse on over to github in your browser, and you should see your updates and commit message now appear in your remote repo, like this:

Storing your GitHub token on SMSL for future use

Of course, copying and pasting the token every time gets old fast. There’s a shortcut you can add, so that SMSL will store your token for you. You can use the gh library (GitHub’s command line tool) to remember your token. Just follow these easy steps in the Terminal in Studio Lab.

Type the following command to install the gh library.

conda install -c conda-forge gh -y

You’ll see the following output:

Once the installation is complete, type this command:

gh auth login

Use the arrow and enter keys to navigate a series of prompts, like this:

Now you’ve added your token — this will make pulling and pushing code a lot easier! Let’s try it out. Make a simple change inside the repo, save your work, and run the following commands:

git add .
git commit -m 'your message here'
git push

If your token is successfully installed, SMSL won’t challenge you for your username and password anymore. Your commit should now push seamlessly, like this:
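If the push still asks for a password, a quick check can tell you where things went wrong. This sketch wraps two real commands, command -v (is gh installed?) and gh auth status (is it logged in?), in a helper whose name, check_gh_login, is made up for this post.

```shell
# Report whether gh is installed and logged in.
# (check_gh_login is a hypothetical helper name.)
check_gh_login() {
  if ! command -v gh >/dev/null 2>&1; then
    echo "gh is not installed yet; run the conda install step first"
  elif gh auth status >/dev/null 2>&1; then
    echo "gh is logged in"
  else
    echo "gh is installed but not logged in; run: gh auth login"
  fi
}

check_gh_login
```

For the full details, including which account and scopes are in use, run gh auth status on its own.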

Conclusion

Good work! You’ve cloned your first repo from GitHub to SMSL and pushed your changes back. Now try a few others and pretty soon you’ll be a pro at this!

Remember that for SMSL, you can find fixes to problems and submit your own questions to the SageMaker team here.

Be sure to check out my next blog post in this series, about displaying Plotly visualizations in SMSL.

Happy programming to you!
