Week 9

Table of Contents

Due Dates & Links
Lab Tasks
- Using a Script to Run Many Files
- Working with Many Files, Using Java
  - getLinks on a Directory
  - Understanding Files Programmatically
Skill Demonstration 2

Due Dates & Links

Quiz 9 (will post Monday evening) - Due 11:59am March 2, 2022
Skill Demonstration 2 - Due 5pm March 4, 2022

Lab Tasks

Using a Script to Run Many Files

Starter Code

Clone the latest version of our shared markdown-parse on ieng6:

https://github.com/ucsd-cse15l-w22/markdown-parse

Notice a new directory called test-files/ that wasn’t there in past weeks. There are a lot of files in test-files/! From the quiz, we know there are over 1000 files, and around 650 of them are test input .md files.

Run make, then run time bash script.sh in this repo.

Write down in notes: What did it do? How long did it take? (time should give some explicit data here). Copy/paste the output.

Let’s reflect on this output a bit.

1. We don’t know if the output is correct. All we’ve done is shown what the output of our program happens to be on these inputs. We don’t know what the expected output is. At least for the provided code, we have learned that it doesn’t cause any errors or infinite loops.
2. Most of the runs print out [], indicating our program didn’t find any links in them. And indeed, most of them don’t. However, we don’t know that for each time our program printed [], there are truly no links in the corresponding .md file!
3. It’s difficult to tell which output corresponds to which input file.
4. We can only access and read the output by scrolling back in our terminal, which is annoying. If we lose the terminal we’d have to run the command again to get the output back; if we ran more commands we’d have even longer scrollback (and most terminals have a limit for how much they’ll show).
5. If we make an edit and run the program on all of these inputs again, it would be infeasible to tell if small changes to the output happened.

Fix #3 first. Add some code to script.sh to print out the name of each test file before its output. (Hint: echo might be useful)
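One possible shape for this fix is sketched below. The contents of script.sh aren't shown here, so this assumes it loops over the .md files in test-files/ and runs the parser on each one. The function name run_all is hypothetical, and the command to run is passed in as arguments so the sketch doesn't depend on any particular setup.

```shell
# Hypothetical helper (not part of the provided script.sh):
# print each test file's name, then run a command on that file.
# Usage: run_all DIRECTORY COMMAND [ARGS...]
run_all() {
  dir="$1"
  shift                          # the remaining arguments are the command to run
  for file in "$dir"/*.md; do
    [ -e "$file" ] || continue   # skip the unexpanded pattern if nothing matches
    echo "$file"                 # label: the lines that follow belong to this file
    "$@" "$file"                 # run the command with the file name appended
  done
}
```

Under these assumptions, run_all test-files java MarkdownParse would print each file name followed by that file's output. In your own script.sh, the same effect likely comes from a single echo line inside the existing loop.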

Write in notes: copy/paste the output after making this change and running the script. You can use Ctrl-C (press it multiple times or hold it down) to stop the running script once you’ve seen a bit of output.

Let’s fix #4 next. We’d like to save the output to a file. One option is to scroll up and do a looooong copy/paste of the terminal output. This is not a bad idea, but there’s a dedicated tool in bash and similar command-line tools for doing this that we can use instead. The tool is called output redirection, and it works by telling a command that we want its output to go to a file rather than just be printed at the terminal. We trigger it by adding > some-filename after a command. So in this case we might run:

bash script.sh > results.txt

This will still take the same amount of time to run, but when you’re done, you should be able to vim results.txt or cat results.txt and see the results. The name results.txt isn’t special, and you could pick a different name each time.

A word of warning: when a command uses output redirection, it deletes and recreates the target file each time. Run the following commands in order to see this effect:

echo "hello" > another-result.txt

cat another-result.txt

echo "overwrite it!" > another-result.txt

cat another-result.txt

Write in notes: show the output of the above commands.

If you want to append, instead of recreating, the target file, use >> instead. Try the same four commands as above using >> instead of > and write in notes what you see instead.
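To see the difference between > and >> without touching any of your real files, here is a minimal sketch that counts lines in a scratch file. The scratch file comes from mktemp, and the variable names are just illustrative.

```shell
scratch=$(mktemp)                 # throwaway file so nothing real is overwritten

echo "first run"  >  "$scratch"   # > empties the file, then writes one line
echo "second run" >> "$scratch"   # >> keeps what is there and adds to the end
appended_lines=$(( $(wc -l < "$scratch") ))

echo "third run"  >  "$scratch"   # > again: the two earlier lines are gone
overwritten_lines=$(( $(wc -l < "$scratch") ))

echo "line count after >>: $appended_lines"
echo "line count after a later >: $overwritten_lines"
rm -f "$scratch"
```

The practical consequence for this lab: rerunning bash script.sh > results.txt replaces your earlier results, so pick a new file name if you want to keep both.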

With Your Code

Next, and also on ieng6, make a clone (or update an existing clone) of your repository for markdown-parse in your home directory. Then, copy script.sh and test-files/ into your markdown-parse directory.

It’s likely that a sequence of commands like this will be useful, but don’t copy them directly! They may or may not match how you’ve set things up. Think about what each of them means before running them.

cd ~ # go back to your home directory

git clone ... your-markdown-parse ...

# these commands assume that the provided course one is stored in
# cse15l-markdown-parse and yours was cloned to your-markdown-parse

cp -r cse15l-markdown-parse/test-files your-markdown-parse/
# The -r option above stands for "recursive", which means that files and other
# directories inside the given directory are copied recursively

cp cse15l-markdown-parse/script.sh your-markdown-parse/

Use lots of cd and ls and pwd and git status as appropriate to confirm that you’ve moved the files correctly. It’s really good practice to do this all at the terminal, so make use of it!

Write in notes: take note of all the commands you ended up running to get the files moved over.

Once you’re done with this, run script.sh in your repository, and use output redirection to store its results.

Write in notes: what happened when you ran script.sh in your repository? Did you get any exceptions? Did you get an infinite loop? If you got an exception or infinite loop, spend 10-15 minutes trying to debug it. If you’re totally stuck on a particular file, rename it to something without .md at the end (use the mv command, and ask your tutor for help if you’re not sure how!) so you can make progress.
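As a rough illustration of the renaming trick, the sketch below sets a file aside by adding a suffix, which assumes script.sh selects its inputs with a pattern like *.md. The helper names skip_test and restore_test and the .skip suffix are made up for this example.

```shell
# Hypothetical helpers: set a troublesome test file aside, and bring it back.
skip_test() {
  mv "$1" "$1.skip"      # e.g. 14.md becomes 14.md.skip, which *.md won't match
}

restore_test() {
  mv "$1.skip" "$1"      # undo: 14.md.skip becomes 14.md again
}
```

For example, skip_test test-files/14.md would hide that file from the next run, and restore_test test-files/14.md would put it back once you’ve fixed the bug.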

Comparing Two Implementations

Now you have two results.txt files, ideally each containing the name and results for every one of these tests. One is the file you just generated, and the other is the one from our provided implementation that you made in an earlier step.

These we can compare line-by-line. In fact, there are programs to help us do so! There is a program called diff that is for just the purpose of showing the differences between files.

diff takes two files as arguments and shows their differences in a stylized format. I did this on two different implementations that I happened to have checked out (remember, your paths and filenames might be different!) and got this result:

[cs15lwi22@ieng6-202]:~:438$ diff student-mdparse/results.txt markdown-parse/results.txt 
92c92
< []
---
> [/foo]
... lots more output here ...

This means that on line 92 of the results.txt in the student-mdparse directory, the line contained [], while on line 92 of markdown-parse/results.txt, the line contained [/foo]. If we look at line 92 in those files, that’s the test output for the file 14.md (good thing I added code to print out the name of the file!). We can look at that file to get a picture of what’s going on, because the discrepancy is interesting:

[cs15lwi22@ieng6-202]:~:440$ cat markdown-parse/test-files/14.md 
\*not emphasized*
\<br/> not a tag
\[not a link](/foo)
\`not code`
1\. not a list
\* not a list
\# not a heading
\[foo]: /url "not a reference"
\&ouml; not a character entity

So it looks like in this case, the student implementation correctly identified this as not a link, while the provided implementation identified it as a link! The input uses \ before a [ to escape it, so it shouldn’t be treated as an open bracket for a link but rather as just an open bracket character.

Whew! That was a deep dive to figure all of that out! We:

1. Generated output from each implementation using our script
2. Put the output into a results file using output redirection
3. Used diff to see the differences
4. Checked in the files to find which input file it was referring to
5. Looked at the input file to use our judgment to tell what the expected output should be
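The steps above can be tried on a tiny scale with the sketch below. It builds two four-line stand-in results files in a temporary directory rather than using real output, and the file names student.txt and provided.txt are placeholders.

```shell
work=$(mktemp -d)

# Stand-ins for two results files: a file-name label, then that file's output.
printf '%s\n' 'test-files/1.md' '[]' 'test-files/14.md' '[]'     > "$work/student.txt"
printf '%s\n' 'test-files/1.md' '[]' 'test-files/14.md' '[/foo]' > "$work/provided.txt"

# diff exits with status 1 when the files differ, so "|| true" keeps going.
diff_out=$(diff "$work/student.txt" "$work/provided.txt" || true)
echo "$diff_out"

# diff reports a change on line 4; the label just above it names the test file.
label=$(sed -n '3p' "$work/provided.txt")
echo "differing output belongs to: $label"
```

On your real results files the line numbers will be much larger, but the same idea applies: use sed -n, or open the file in vim and jump to the line, to find the file name printed just before the differing output.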

Your implementation probably won’t have exactly the same diff as above! What you should do as a team is find at least three differences between your implementation and the provided one. Try to find at least one where your implementation is incorrect, and one where the provided implementation is incorrect (and one more of your choice).

Write down in notes: Indicate which test files had different results; show the test files, their names, the differing results, and which implementation was correct (or if neither implementation was correct).

Working with Many Files, Using Java

In the last section we saw how to work with a lot of files using a bash script and some command-line tools. This isn’t the only way to manage a lot of files! Programming languages also have tools for working with and managing files and directories. This part you can do either on ieng6 or your own computer.

getLinks on a Directory

The provided implementation of markdown-parse has a new overloaded getLinks method that takes a File parameter, which could represent a file or a directory. (Blame Java for the fact that a class called File can represent either…)

First, change the main method of MarkdownParse so that if the command-line argument is a directory, the getLinks method that takes a File is called, and if the command-line argument is just a single file, the existing behavior is maintained. Check that this is working. (Note that make test isn’t very helpful here; just run make and then run the program from the command line. Could you write a unit test for this?)

You should be able to run, for example

java MarkdownParse test-file.md
# Produces output for a single file

java MarkdownParse test-files/
# Produces output for all files in test-files/

Write in notes: Make a commit and push with your updated version of the main method. (If you can’t push for some reason, copy the code into your notes).

Write in notes: Use time with your updated main method to get all the links for the files in test-files/. Did this take more or less time than using script.sh? Why might that be?

Review the code for getLinks(File). Discuss any lines you are confused about with your group and your lab tutor. You should have some questions! Take one question that’s unresolved from your discussion and ask it on Piazza, signed with your group name.

Understanding Files Programmatically

One problem we have with using these 650 files as tests for our particular purpose is that we aren’t sure what the expected values are. We could review them all manually, but let’s think about whether there’s a better way. (Say it takes 1 minute to review each one and write down the expected output for it – how long would it take to write down all of them?)

Write down in notes: Brainstorm some ways we can do better than manual review of all 650 files to determine expected values.

One observation we can make is that files without any [ cannot have any links. The expected output for files without any [ at all must be an empty list. Files with [ might have links, and probably warrant some closer review. We could say the same for files with(out) ], (, and ). Beyond that, things get a little murky because we get into complex questions like the ones we’ve seen in lectures and past tests.

Let’s modify getLinks(File) so that it reports which .md files contain all four of these characters: [, ], (, and ). Only those files could possibly contain a link, so they are the set of files we need to consider. Add code to getLinks(File) that prints out the filename whenever the file’s contents include all four characters.

Try it!

Write down in notes: Make this code change and write down which test files might possibly have a link in them. How many are there? How many did we indicate our implementation reports as having a link in the quiz? What does this mean for where we could focus our efforts in checking for the correct results?
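If you want to cross-check the count from your Java change, the same filter can be approximated at the command line. This is a different technique from the one the lab asks for (grep in a shell loop rather than code in getLinks(File)), and the function name might_have_link is made up for this sketch.

```shell
# Hypothetical helper: print each .md file in DIRECTORY whose contents include
# all four of the characters [ ] ( ), anywhere in the file.
# Usage: might_have_link DIRECTORY
might_have_link() {
  for f in "$1"/*.md; do
    [ -e "$f" ] || continue      # skip the unexpanded pattern if nothing matches
    # -F treats the pattern as a plain string; -q prints nothing, only sets status
    if grep -qF '[' "$f" && grep -qF ']' "$f" &&
       grep -qF '(' "$f" && grep -qF ')' "$f"; then
      echo "$f"
    fi
  done
}
```

Running might_have_link test-files | wc -l would then give a count to compare against what your Java code prints. If the two numbers disagree, that’s a good prompt to look at how each one reads the files.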

Skill Demonstration 2

For the second skill demonstration, you will create a video screencast of yourself doing some tasks related to editing and debugging entirely in a remote terminal.

You should complete the following tasks, all recorded as part of your screencast:

1. Show your face on a webcam and your picture ID (ideally your student ID).
2. Log into your course-specific account on ieng6.
3. Clone this repository: skill-demo-2-starter
4. In that repository, run make test, which causes an infinite loop.
5. Open the makefile using vim. Add a new rule debug-test that will run the tests using jdb instead of java. (Use the material/videos from Week 8 if you’re not sure how.)
6. Use make debug-test to determine which test is causing the infinite loop. Demonstrate this by suspending the program in jdb and showing the stack trace. Say out loud in the video which test is triggering the infinite loop, and which methods in LinkedList are on the stack.
7. Use jdb to show the local variables in the method/loop that was running at the time the program was suspended. Use step to move forward in the program until it reaches the same line again.
8. Say out loud in the video what you think the problem is that’s causing the infinite loop.
9. Exit the debugger, then open the code file LinkedList.java with vim and edit it to fix the infinite loop while still passing all the other tests.
10. Re-run make test and show all the tests passing successfully.

Your video should be no longer than 10 minutes. You may need to practice it a few times to get it right and get it to under 10 minutes. It’s impossible for us to enforce that you don’t discuss the bug that causes the infinite loop, but you’ll learn more if you try to figure it out on your own.

You’re free to use any and all course notes/code/videos/labs/etc. for help, along with your prior work. Your video must be entirely your own work.

Submit your video to Skill Demonstration 2 by 5pm on March 4.