Starting from April 2015, I was occupied with doing research for my Master’s thesis in one of the globally recognized companies from the Stuttgart industrial area. Unfortunately, I can’t give many details due to confidentiality issues. I am writing this post right after I am done with my work at the company, as a testament to this exhilarating period, before I start to rest for 6 weeks.

I worked really hard from the beginning, and even harder from September on. The long days and nights of intense algorithm-smithing are finally over, and I felt that compiling these statistics would make a nice souvenir.

Although I was working on different types of physics problems in general, I was developing software at the same time, and all of the ideas were realized through code. So if I were to say that there is a correlation between lines of code typed and mental effort, the statement would be 100% true. I can directly measure the effort I spent by counting the lines of code over time, and I am in luck, since I have been using git as my version control system from the very beginning.

I analyzed my commit history to find possible candidates for quantifying my effort. You can obtain some crude stats by running git log, or through a graphical tool such as gitg, which gives you the number of lines inserted and deleted per commit. But just stating these could give the wrong idea, because some of the files I committed, such as SVG or EPS files which contain graphical information, would exaggerate the numbers. I had to analyze only the files that contained code that I had typed. I therefore used the GitPython interface to analyze each diff in each commit, and excluded the contribution if the file was renamed or had an undesired extension. I also chose to exclude deleted lines and whitespace, and to allow only inserted lines into the stats.
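The filtering rule above can be sketched in a few lines. This is not the original script: FileChange is a hypothetical stand-in for a diff entry, and the excluded extensions are just the two named in the text. With GitPython, the path and rename information would come from the diff objects of each commit instead.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Assumption: only the two extensions named in the post are excluded.
EXCLUDED_EXTENSIONS = {".svg", ".eps"}


@dataclass
class FileChange:
    """Hypothetical stand-in for one file touched by a commit."""
    path: str
    renamed: bool = False


def counts_towards_stats(change: FileChange) -> bool:
    """Return True if this change should contribute to the statistics."""
    if change.renamed:
        return False  # renames are not typed code
    suffix = PurePosixPath(change.path).suffix.lower()
    return suffix not in EXCLUDED_EXTENSIONS


# Made-up file names, for illustration only.
changes = [
    FileChange("solver/assemble.m"),
    FileChange("figures/network.svg"),
    FileChange("post/plot.py", renamed=True),
]
kept = [c.path for c in changes if counts_towards_stats(c)]
print(kept)  # ['solver/assemble.m']
```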

For example, if a new file is created and 10 lines are inserted, then the contribution is 10. If 5 of these lines are later deleted, the contribution of that change is 0. If 3 of the remaining lines are then edited, meaning that 3 lines are deleted and another 3 are inserted, the contribution is 3.
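The counting rule in this example can be reproduced on plain unified diff text. The sketch below is an illustration under assumptions, not the original script: count_inserted_lines and make_diff are hypothetical helpers, and the diffs are built by hand to mirror the three steps of the example.

```python
def count_inserted_lines(diff_text: str) -> int:
    """Count added, non-blank lines inside the hunks of a unified diff."""
    count = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            in_hunk = False  # a new file section starts with header lines
        elif line.startswith("@@"):
            in_hunk = True   # hunk header: content lines follow
        elif in_hunk and line.startswith("+") and line[1:].strip():
            count += 1       # inserted line that is not only whitespace
    return count


def make_diff(hunk_header: str, body_lines: list) -> str:
    """Build a minimal unified diff for a made-up file f.m."""
    header = ["--- a/f.m", "+++ b/f.m", hunk_header]
    return "\n".join(header + body_lines) + "\n"


# Step 1: a new file with 10 inserted lines.
create = make_diff("@@ -0,0 +1,10 @@", [f"+x{i} = {i};" for i in range(10)])
# Step 2: 5 of those lines are deleted.
delete = make_diff(
    "@@ -1,10 +1,5 @@",
    [f" x{i} = {i};" for i in range(5)] + [f"-x{i} = {i};" for i in range(5, 10)],
)
# Step 3: 3 of the remaining lines are edited (3 deleted, 3 inserted).
edit = make_diff(
    "@@ -1,5 +1,5 @@",
    [f"-x{i} = {i};" for i in range(3)]
    + [f"+y{i} = {i};" for i in range(3)]
    + [f" x{i} = {i};" for i in range(3, 5)],
)

contributions = [count_inserted_lines(d) for d in (create, delete, edit)]
print(contributions)  # [10, 0, 3]
```

Only lines after a hunk header are counted, so the "+++ b/f.m" file header never leaks into the total.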

In the graphs below, I used the cumulative sum of all contributions in each repo, which should represent the total effort spent on a project in a single place. I have excluded file renames, but there is no way to cleverly exclude the refactorings, which should correspond to the instantaneous jumps that happen from time to time.
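A cumulative sum of this kind takes very little code. The following is a minimal sketch with made-up (date, inserted lines) pairs, not the data behind the graphs:

```python
from datetime import date
from itertools import accumulate

# Made-up per-commit contributions, deliberately out of order.
commits = [
    (date(2015, 5, 20), 1200),
    (date(2015, 4, 10), 40),
    (date(2015, 6, 15), 900),
]

commits.sort()  # chronological order
dates = [d for d, _ in commits]
totals = list(accumulate(n for _, n in commits))

print(totals)  # [40, 1240, 2140]
```

Plotting totals against dates gives a curve that can only go up, and a single large commit, such as a refactoring, shows up as a vertical jump.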

Main Thesis Code

My main task was to read literature and create a physical model by reverse-engineering (because the literature was mostly verbal and did not contain algorithmic details) and developing code from scratch. I had to do it in MATLAB, which would be reasonable for most, but I would have used some other language if it were up to me. In fact, I wrote the postprocessor in Python later on.

Inserted lines of code (95% MATLAB, 5% Python) versus time. The jumps correspond to refactorings.

The total number of lines inserted adds up to a whopping 30000! I have worked on such big projects in the past, but they were in C++, which results in much more bloated code. Even without the refactorings, it is a considerable amount, considering that all of it is numerical algorithms for physics, written in MATLAB. The current number of lines of code in this repository is in fact 15000.

The pace of development stays roughly the same overall. I started on April 4th, initially with just literature study. After that, you can see my first attempts at implementation through May and June. Since I didn’t commit changes during that period, you can see two steep jumps. The network at the company was very restrictive, and I had to send myself the changes via email in order to commit at home where I had access. I was too lazy to do that every day, and you can see all the changes adding up in those jumps.

I was taking it easy from July to the last week of August, because it was summer. Nobody should have to work with such good weather outside. After that, I went into panic mode and restarted a crazy pace.

There is a steep region towards the end of October, which corresponds to my last two weeks at the company; they were very intense. My obligations decreased after that, and I did not work much on the algorithms, but created test cases and such.

Thesis Text

My plan was to write the text alongside the development, but I now understand that this is impractical and unnecessary. A lot of the theory changed during the last months, and I am glad I did not have to rephrase anything.

I used LaTeX for this part, with my own package of macros for equations, lazyeqn, and Inkscape for all figures. Things that I discovered while writing this thesis:

  • pdflatex got an upgrade while I was not looking. The last time I attempted to use it was in 2011, when I was first learning TeX. I did not prefer it back then because it could not include EPS images. I found out that it now can, which makes compilation a lot easier. Converting to DVI first is still a lot faster because it does not include images, but you get a more WYSIWYG feeling with pdflatex.
  • Inkscape SVGs can directly be included using the svg package! And the text still gets reinterpreted through LaTeX, which means that you can type out equations in Inkscape, and they get converted to equations in the end result. This is what I used to do with xfig, but it is old and it sucks. It doesn’t even have Bezier curves. Inkscape increased my figure productivity tenfold.
Inserted lines of code (LaTeX) versus time

Everyone leaves this part for the last month and hates it, but this is the part that I love the most. Converting hastily handwritten notes to perfectly set pages is a great feeling. I enjoyed it even though I was on a tight schedule.

The current number of lines of code in this repository is around 3200. But this is TeX, and we can get much more interesting stats. The following are generated using texcount.

  • Words in text: 9007
  • Words in headers: 259
  • Words outside text (captions, etc.): 1574
  • Number of headers: 67
  • Number of floats/tables/figures: 28
  • Number of math inlines: 434
  • Number of math displayed: 120

And of course, the number of pages is 80. I should mention that the text is still not done, and there is a lot that I should add, so this part should get an update later.

HiWi

In the midst of all this work, I decided to take a job as a working student (HiWi, hilfswilliger) at a completely separate institute at the university, totally unrelated to my thesis. It basically started in July with 40 hours per month, and I increased it to 80 hours in October, and started working on two projects at once (not counting my thesis of course).

The commits started so late because I was prototyping initially and did not need to create a repository. Everything I did for the institute was in Python, which, thank god, enabled me to finish everything in time. Python is the best for prototyping.

Inserted lines of code (Python) versus time for Projects 1 and 2 as a HiWi

I should mention that

1 line of Python = 3 lines of MATLAB = 10 lines of C

roughly. So you might have an idea of the extent of work even though the total insertions add up to 6000. Eventually, I was finished with two Python packages for two projects, complete with respective API, scripts and documentation, one week ago on Friday.
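To put a number on that rule of thumb, here is the arithmetic for the 6000 inserted Python lines. The ratios are the rough ones stated above, not measurements, and equivalent_lines is a hypothetical helper:

```python
# Rough ratios from the rule of thumb above (not measured values).
LINES_PER_PYTHON_LINE = {"Python": 1, "MATLAB": 3, "C": 10}


def equivalent_lines(python_lines: int) -> dict:
    """Scale a Python line count to its rough equivalent in each language."""
    return {
        lang: python_lines * ratio
        for lang, ratio in LINES_PER_PYTHON_LINE.items()
    }


hiwi_equivalent = equivalent_lines(6000)
print(hiwi_equivalent)  # {'Python': 6000, 'MATLAB': 18000, 'C': 60000}
```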

I have a feeling that if I had also used Python for my thesis codebase, I would have finished much earlier, but alas.

Total

I merged all these datasets to give a definitive summary of my work pace.
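One way to merge such datasets is to interleave the per-repository series by date and then accumulate. This is a sketch with made-up numbers and hypothetical variable names, assuming each series is already sorted chronologically:

```python
import heapq
from datetime import date
from itertools import accumulate

# Made-up (date, inserted lines) series, one per repository, each sorted.
thesis_code = [(date(2015, 4, 10), 40), (date(2015, 6, 15), 900)]
thesis_text = [(date(2015, 10, 1), 300)]
hiwi = [(date(2015, 9, 1), 500), (date(2015, 10, 20), 250)]

# heapq.merge interleaves already-sorted inputs into one sorted stream.
merged = list(heapq.merge(thesis_code, thesis_text, hiwi))
running_total = list(accumulate(n for _, n in merged))

print(running_total)  # [40, 940, 1440, 1740, 1990]
```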

Inserted lines of code versus time

The total number of lines inserted adds up to 40000. It is a number that I should frame and put up on my wall, because it was a test for me as much as it was a task. I finished everything successfully amid other problems, like trying to finish a thesis in a foreign country while having accommodation problems. My computer also died in the last month. I had to make an extra effort to stay focused. It was an interesting period in which I learned many things, so I could say it was a good experience altogether, especially since I am finished now.

I have proved myself on topics and projects presented by other people. Now, I can’t wait to start working on my own projects.

Update:

I finally presented the thesis on April 26th, 2016. It was delayed due to some complications, and I had to do some additional work, which I call the “second part” below. Now, I am completely finished and have an MSc degree in Computational Mechanics.

The final thesis is 115 pages long, typed in LaTeX, 12pt and A4 paper. The second part basically doubled the previous numbers:

  • Words in text: 19265
  • Words in headers: 324
  • Words outside text (captions, etc.): 2680
  • Number of headers: 83
  • Number of floats/tables/figures: 53
  • Number of math inlines: 848
  • Number of math displayed: 231

So I have learned by experience that it takes approximately one year to write a book on something :)

I generated a word-cloud as a bonus:

Word-cloud for the thesis

You could guess that the topic was pore-networks. Interestingly, the second part, which deals with the continuum-mechanics-based Theory of Porous Media, does not appear so much.

This chapter is now finished. Now, it is time to go up, up and away.