NPRG041 C++ programming (cs/en) 2016/17 @ e-x-a.org

Infopage about the C++ lab of NPRG041.

Taught by: Miroslav Kratochvil, contact: exa.exa@gmail.com (please include the course name/code in the subject)

Meeting point: MS MFF, fall semester 2017
- English group: every Tuesday at 17:20 in SW1
- Czech group: every Wednesday at 10:40 in SW2
Reading material:
- Very probably any good book about C++ will do. There are some in the faculty library. Because MFF students are generally more apprehending than the common programming crowd, I highly suggest you go directly to the Standard . (click on the 2014 Programming Language C++ link in Published standards)
- Cplusplus.com — especially following sections:
  - Containers (list, vector, set, map, string, ...)
  - Commandline I/O in iostream
  - File I/O and in-memory I/O
  - complex
  - ...
When in any doubt, mail me.

Lab requirements

To get the credit, you have to attend the course reasonably (at least accordingly to your knowledge of the topic), do the homework (see below) and finish an individual project (also below).

You will be assigned several points for finishing the homework. These will be added to the points from the final test, therefore improving your final grade for the whole course.

Depending on many factors, students from Erasmus programs may need a completely different set of rules to fit into different time limits. If you are an ERASMUS student, contact us!

Homework

Assignment A — CString

Wrap a C-style string functionality in a STL-style string container. You are not allowed to use STL though: All memory management and required data structures need to be implemented by you.

The string class should support:

methods .empty(), .clear() and .length() with standard behavior
operator+ and operator+= that work as string concatenation
multiplication using operator* by a scalar value N, which works as concatenating the string N times with itself
operator[] that accesses n-th character of the array
comparison operators ==, !=, <, ...etc
rule of three (i.e. reasonable default and copy constructor, assignment using operator=, and destructor), class MUST NOT leak memory, free unallocated memory, or cause access to unallocated memory.
transparent assignment and construction from const char*
method .c_str() that provides a const char* that points to string data (and can be used in C functions)
method .sub(first,n) that returns a substring that starts at position first and contains at most n characters (or less, if there aren’t sufficient characters in string). Argument n should default to “take all characters to the end of the string”.
PHP-style operator+ that works with simple types that can be converted to string, e.g. CString("aaa")+3 == CString("aaa3"); the accepted types should include int, float and const char*.
being a const (i.e. the constant methods should be marked constant, non-changed reference parameters also, etc.)

Hints:

Choice of memory allocation “API” is yours, you may use new and delete as well as malloc() and free().
Your implementation can use a single pointer that points to the beginning of allocated string. The other implementation with three pointers (begin, end of string, end of allocated memory, as used in std::string) is also possible, but not required.
The implementation should not crash, except for reasonable cases that you document in comments (examples: out-of-bounds index given to operator[], or str.sub(x) with too large x)
You do not have to implement string operations by hand (but you can), use functions from <cstring> (see here, but make sure you understand their semantic, esp. the conditions when they crash).
You do not have to implement integer to string (and similar) conversions; use C-style conversion using snprintf().
The assignment (and the whole C++ course) is about programming nicely, not about algorithms or speed. Make sure that your code is easy to use, readable and portable:
- Consider using a code formatter tool (astyle).
- Use appropriate and portable types! E.g. int is NOT a good return type for your string::length(), for at least 2 good reasons.
- Add comments where needed, make sure your variable names make sense and the code constructions are semantic.
- Check if your code works on another platform (e.g. ssh to the computer laboratory and use GCC to compile it).
- Avoid compiler warnings (with GCC you may turn on more warnings with -Wall parameter, which may point you to many problems in the code)

Save your solution to a .h file and (optionally) accompanying .cpp file, and submit them using the SIS study group interface — corresponding fields are marked DU1 (czech) or HW1 (english). Your score and comments to the solution will be available at the same place.

DEADLINE: Sunday November 19th, 23:59:59.999

UPDATE: If you are not sure whether your solution really works as expected, you can test it with the test program. (I have assumed that your class is called CString and is available in file CString.h, change that if required.)

UPDATE OF UPDATE: There was a mistake in the original test program on line 22 (there was = instead of originally intended +=). The result was that the program printed out lots of non-ascii garbage instead of what I originally intended (randomly looking strings of numbers and letters).

Assignment B — Simple Grep

You certainly know the grep utility from UNIX systems. The assignment is to write a simple reimplementation of that. Let’s call it sgrep.

sgrep receives a single command-line argument, which is a regular expression. Then it processes the input line-by-line and prints out the lines that match the regex.

Your program should accept a simple subset of the regular expression patterns:

Single characters, like a, z, or 5
Single character choices, like [abxyz] or [a-z] (which means the same thing as [abcdefgh....xyz]) or [a-zA-Z] (which basically means “any english letter”).
Negation of character choice — [^abc] which matches any character that is not a, b or c.
“Any” character, the .
Escaped characters — any character prefixed by \ is not a special character, but the literal character. Backlash is entered as \\. E.g. \. means a literal dot in output, not “any character” as above.
Kleene star — a* may match empty string, or a, or aaaa, or generally any number of repetitions of a
Alternatives — ab|cd may match either ab or cd
Parentheses ( and ) for grouping stuff
All these patterns concatenated and combined, like a(b|c|d) or a[bcd]*

After parsing the regular expression from the argument (argv argument of main() ), your programm should read the standard input line-by-line, decide which lines match the given regular expression (WHOLE line must match the expression, unlike grep), and print them out of they do.

Example usage input:

$ ./sgrep 'aba*cd'       --on windows, it would be:  # sgrep.exe 'aba*cd'
xxx
abcd
abaacd
xabcdx
abd

Corresponding output:

abcd
abaacd

If there are parentheses in the input, the program should instead print the matched content of the parentheses, one by one, separated by a newline:

Example input:

$ ./sgrep 'a([bc]([de]*))'
ab
acdedddeee
ade

Corresponding output:

b           --first line, first paren
            --first line, second paren is empty!
cdedddeee   --second line, first paren
dedddeee    --second line, second paren

The main requirement of assignment B is to never use manual memory management. Preferably not even raw pointers nor anything that can get broken easily. Exploit STL as much as you can, preferably smart pointers and other data structures, to get the safest and cleanest code possible. Except for the STL regex library, of course.

Your program should also behave as a reasonable command-line utility:

never crash
do not leak memory during the processing
behave silently — output only the exactly required output, unless there is an non-recoverable error that needs to be reported (e.g. a wrongly formatted regex pattern on input)
behave reasonably in case of mis-invocation — report the error to standard error output and return a corresponding exitcode

On the other hand, you may ignore several practical and efficiency-related measures that are present in traditional grep implementations:

You can load whole lines into memory, ignoring the fact that multiple-terabyte lines are a common sight nowadays. Do not load the whole input into memory though (inputs in the test cases will be larger than the memory available).
Your regular expression matching doesn’t have to be particularly fast, but it should not be extremely slow either. Find a balance between speed and code readability that fits you.
You may use “unbounded” recursion, of depth roughly equal to the length of supplied regular expression (ignoring the fact that evil users will give you really long regular expressions).
You do not have to compile the regular expression to a finite state machine (or any similar machine) — interpreting it is OK.
If the content of the parentheses is ambiguous (e.g. when matching pattern (a*)(a*) on string aaaa), print out any valid matching.
Do not handle any kind of complicated character encodings — all input and regex characters will be plain ASCII single-byte characters with ASCII value lower or equal to 127.
Do not try to solve problems inherent to regular expressions — for example, matching a pattern like .*.*.*.*.*.*a.* in a word abbbbbbbbbbbbbbbbbbbb is just slow; you don’t need to worry about it.
You may reject regular expressions that match stuff in a really wrong way (like a**).

Points for solution will also be derived from the usual criteria:

Readability/maintainability/portability of the code
- Use an automatic code formatting tool to keep the formatting concise.
- Try compiling the code for other platforms
- Add comments to code when its complete meaning is not instantly obvious
- Avoid compiler warnings (compile with -Wall)
- Avoid repetitive code, use the SPOT rule (see here), exploit generics and/or macros to remove the repetitions.
Appropriate types of all data
Reasonable solutions of all underspecified/ambiguous corner cases from the assignment. E.g. when given pattern [a-], you may either match both characters - and a, or report a syntax error that - is used wrongly (and should be used as [a\-] instead), or use any other reasonable behavior as long as you describe it in the comments.

Submit your solution to SIS using the study group interface (as usual). The corresponding field for the .cpp file (no headers this time!) is DU2 - sgrep.cpp for Czech group, and HW2 - sgrep.cpp for English group.

DEADLINE: 2017/12/17

UPDATE: Because this is a C++ lab (not a formal grammars lab), recommended structure of tokens/parsers/matchers is provided here.

UPDATE 2: checking your solution. To verify that your solution works as expected, you can compare its output to the output of standard UNIX utilities grep and sed:

When matching a regular expression without parentheses, output of your ./sgrep regexregex should be the same as the output of grep -E ^regexregex$ (except for minor differencies or incompatibilities between used regex syntaxes, which you can fix by hand).
When matching a regular expression with parentheses, the output of ./sgrep regexregex should be the same as the output of sed -E -ne 's/^regexregex$/\1\n\2\n\3\n...\9/p' (again modulo some syntax incompatibilities and the adjustment of backreferences (\1, \2, ...) by hand)

UPDATE 3: You may submit the assignment solution after the deadline, with appropriate score penalization. The penalization for delay of n full days after 2017/12/17 is computed as floor(n*n/8) (so a few days delay is mostly OK).

Project

Each student has to create an individual project and submit it to get the credit. Topic is variable, but should be discussed and agreed upon before the end of November. Size of the project is not an issue and largely depends on the topic, around 500 lines of (neat) C++ code is a good guideline (on the other hand, packing the same functionality into a smaller program using e.g. advanced language features is even better).

Bad topics:

Boring libraries
Boring games
Anything that does not behave C++-ish (e.g. code without a single class construction)
Anything that requires complicated or very specific user interface (e.g. bank account simulator, unless the interface is solved neatly in some novelty C++ way)
Anything that has 100000 implementations already hanging around the internet
Also, there’s already too many of checkers, worms, tetris, pacman, snake, tictactoe, etc.

Good topics:

Small fast games (fun!)
Convenience programming libraries (convenient!)
Convenient data structures with demos (useful!)
Physics-like simulations of something (collisions etc. are cool)
Networking (internet!)
Compilers, transpilers, virtual machines, typecheckers or interpreters for small programming languages.

Deadlines:

Topic agreed upon, written down in SIS: November 30th, 2017. Send me an e-mail to make sure the topic is confirmed and added to SIS.
“Beta” version showing some work done: April 14th, 2018
Final version incl. documentation: May 15th, 2018

Submission guidelines:

Make the code portable — it should not depend on the platform unless it is, by design, tied to that platform. Windows-specific projects should work on Windows in the computer lab, all other projects should work on Linuxes in the computer lab.
Do not over-engineer, avoid feature creep. The simplest project that satisfies the following conditions will do:
- There’s a reasonable amount of C++ that shows you know what you’re doing
- It does not crash, leak or otherwise torture memory; it doesn’t contain any inefficiencies that could be fixed by better C++. (Repeat: const references! avoid allocation!)
- It provably does what the topic says, and it can be demonstrated on data of reasonable size.
- It has sufficient comments. If you are unsure if you should add comment somewhere, try to tear the surrounding program block or function out of context, and ask yourself if anyone can still fully understand what it does (and why). If not, it needs comments.
- If the topic is a data structure, include a comparison with a naive approach or some C++ counterpart (std::chrono provides timers). Note that your data structure does not need to “win” the comparison (that’s the topic of the HPC course), you should only provide a reasonable testing framework to assess the performance.
If you think the project is ready, either pack it up and mail it to me, or send me a git link.
Having received your program, I should be able to convince myself that it works in less than around 15 minutes. You can help it a lot:
- Include a file INSTALL (.md) that describes how to make it work on a fitting computer configuration. (Best: How to make it work on computers in MFF’s computer lab.)
- Include a file DEMO or TUTORIAL that describes what should I do with the compiled program to see all the important functionality as quickly as possible.
- If the DEMO requires some data for the demonstration, include it! (Advice: If I’m forced to create testing data by hand, it will take more than 15 minutes. Also, result will contain lots of hideous corner cases.)
- If there’s lot of source code, include INTERNALS file that describes which parts of the functionality can be found in which part/file of the code. This is sometimes also called “programmers documentation”. Imagine it as a signpost, compare with artist’s impression thereof.
- If the documentation files are not very big, pack their contents into one nice README with corresponding sections.

Lab timeline

Week 1 (2017/10/3,4)

Introduction to (and necessity of) C and low-level programming in general. How to compile C to assembly, how does linker link the symbols to produce executables.

Source code here. (Source from the labs will always appear there. Also, if you are new to programming on UNIX, you will certainly want to examine the makefile from 01/asm2/Makefile !)

Week 2 (2017/10/10,11)

Plain C programming practice (with pointers).

Sad face about how modern CPUs won’t ever process strings automagically.

Source code can be found on the usual place.

Week 3 (2017/10/17,18)

C++ leap: Encapsulation of various stuff in nice interfaces, a word about references.

Week 4 (2017/10/24,25)

Constructors and destructors, RAII (do NOT use new if you really don’t want heap memory), a tiny hint of resource ownership. We have applied the Rule of Three (copy constructor+copy assignment+destructor) to make sure that our list class will be handled as we expect in all cases, which makes it perfectly encapsulated (and basically as safe as we want to). Source code is on the usual place.

Week 5 (2017/10/31, 2017/11/01)

A bit about templates, STL containers and iterators.

Week 6 (2017/11/07,08)

A bit of I/O. Iterators on your own class.

Czech class was cancelled due to a sport event. The material from english class is available for study as always, I suggest you take a look at it. We’ll eventually run through it at the lab.

Week 7 (2017/11/14,15)

Run-time polymorphism, virtual, vtables, sneaky peek on smart pointers.

Czech class will be taught by Vlastimil Dort this week.

Week 8 (2017/11/20,21)

Conversion operators, proxy classes and their usage: get/set-style properties, safe array indexing.

Week 9 (2017/11/28,29)

Recursive Descent Parsing —- a straight-forward way to convert a formal grammar to a tokenizers and parsers. AST as a really vital data structure.

Week 10 (2017/12/5,6)

Using the previous approach to write an interpreter. Using preprocessor to generate repetitive code.

Week 11 (2017/12/12,13)

UNIXifying the interpreter, smart pointers and move references.

Week 12 (2016/12/19,20)

A bit of practice —- implementing the postfix calc, various goodies from STL iterator/algorithm library.

We still have 2 labs left now, so if you want to practice something specific, send me an e-mail.

Week 13,14

Practice for the exam.

Extra stuff we managed to fit in the czech lab: Computing with templates, low-level program loading (dynamically linked libraries), simple HTTP-like server.

Bonus material

Do not, never, ever use reinterpret_cast for converting pointers. Using a pointer converted by reinterpret_cast works by accident and may cause undefined behavior, EXCEPT in the case you reinterpret the pointer back EXACTLY to the original type (so you basically don’t do any conversion), provided the byte sizes of all values in the conversion chain are EQUAL. There are two legal usages: one for printing out a pointer address value as an integer, which you are usually not ever supposed to do; and one for converting binary data (e.g. creating floats from mantissa and exponent bits by hand), which you usually do not want to do. You want static_cast, or possibly dynamic_cast when downcasting base class pointers to derived class pointers. If you still want to use reinterpret_cast, remember there are architectures where sizeof(char*) != sizeof(int*).
The Art Of UNIX Programming (online version) by E.S.Raymond is a brilliant material not only about the origins of current computing practice and the C programming language (that emerged together with UNIX and gave rise to C++), but also contains almost all the most important insights about software development gained in last 50 years. Surprisingly, all of them still apply. Sometimes a bit philosophical. If you are developing a large system (or a part of some) for the first time, this book will tell you what design mistakes you are almost certainly going to make, what are they called, and how to avoid them.
Doug Lea’s Malloc — a webpage about the original default allocator of most C/C++ libraries, which makes an extremely good read about allocators. (anyway — you probably use that guy’s code several times per second just by silently watching this website)
If you would like to create a game for the semestral project and have no idea where to start with the graphics, try OpenGL. Easiest way to do it is using e.g. GLUT, example programs are available here and here . Compile with gcc glut.cpp -o glut -lGL -lGLU -lglut . You might also want to see SDL that is traditionally used for portable games, or a newer alternative SFML. Following sites provide a good introduction to modern OpenGL: https://open.gl/ and https://learnopengl.com/.
If you would like your game to be a bit more advanced (and save yourself a lot of coding), use a graphics/game engine like Magnum (from a former MFF student), Ogre3D or Urho3D.
If you like template programming but the syntax seems unwieldy, use this :]

...