Leadership: The Scientific Approach

I've recently been reading a book called Managing Behavior in Organizations by Jerald Greenberg, and I want to share the ideas about leadership that I've picked up from it.

What is leadership?

In order to understand how to become a leader, we should first define what leadership is. Leadership is the ability of an individual to influence others in ways that help to reach group or organizational goals. The essential goal of a leader is to create a purpose or mission for the organization and a strategy for attaining it (whereas the goal of a manager is to implement this strategy).

You are now probably asking: "how do leaders influence others?" According to "the theory", they use position power, personal power, or both. Position power comes from the posts individuals hold, i.e. individuals can influence others because such power is associated with their jobs. It is available to anyone who holds a particular position. Position power has four different sources:

  • Legitimate power - individuals gain such power when others recognise and accept their authority;
  • Reward power - the power to control the rewards others receive, e.g. a supervisor can reward others by recommending a pay raise;
  • Coercive power - the capacity to control punishment;
  • Information power - power gained by having access to valuable data or knowledge.

Another source of power comes from unique qualities of an individual. Such power is called personal power. There are four sources of personal power:

  • Rational persuasion - the ability to provide logical arguments and factual evidence in support of a position;
  • Expert power - the power individuals gain when others recognise their expertise on a topic;
  • Referent power - the power individuals gain because they are liked and admired by others;
  • Charisma - power that comes from an engaging and magnetic personality.

Luckily, the book provides some tips on how to strengthen your powers:

  • You can increase your information power by expanding your network of communication contacts and keeping in touch. The more contacts you have, the more information will be accessible to you; and the more information you have, the more people will count on you;
  • Take on responsibilities that are unique. You will gain more power if you are the only one who can perform certain tasks;
  • Perform fewer routine tasks and take on novel ones instead. If you do only routine tasks, you are easily replaceable, whereas those who perform novel tasks are indispensable;
  • Be involved in organisational decisions by joining task forces and making contact with senior people. The more important others consider your input to be, the more power you will have;
  • Perform activities that are the organisation's top priorities.

What does it take to be a successful leader?

Up until this point we have looked at the kinds of power leaders use to influence others. But what makes leaders successful? What behaviour leads to a leader's success?

Leaders are likely to be most successful when they demonstrate high concern for both people (showing consideration) and production (initiating structure).

In simple words, a successful leader (1) cares about you as a person and (2) gives you advice, answers your questions, and shows you what is expected of you. In fact, we can plot leadership effectiveness on a two-dimensional diagram (called the managerial grid):

Managerial Grid

Managerial Grid. Effective leaders should demonstrate high amounts of both dimensions.

In this diagram you can see five green dots that represent the different management styles: "country club", impoverished, task, "middle of the road" and team management. Team management is considered to be the ideal management style, and it is the style observed among very successful leaders. The diagram is mainly useful for two things: determining a manager's position in the grid (i.e. determining his/her management style), and helping him/her to train certain skills in order to reach the ideal management style (grid training).

LPC contingency theory: different leaders for different situations

According to contingency theories, certain leadership styles are most effective under certain conditions. One example is LPC contingency theory. It states that the most important personal characteristic of a leader is esteem (liking) for the least preferred co-worker (LPC). To evaluate LPC, take the person with whom the leader has the most trouble working. Leaders who perceive this person in negative terms (low LPC) are primarily concerned with carrying out the task itself. Leaders who perceive this person in positive terms (high LPC) tend to accomplish the task by developing good relationships with the group. I believe this can be related to management styles: low LPC leaders will probably show the task management style, whereas high LPC leaders will probably prefer the "country club" management style. LPC contingency theory, though, states that LPC is relatively fixed and cannot be changed, whereas the managerial grid suggests otherwise.

When is a certain type of leader most valuable? According to LPC contingency theory, it depends on situational control. It's not clear from the book what exactly this means (nor was I able to find a definition on the Internet), but it seems to describe whether everyone knows what to do, how readily subordinates follow commands, and how much power the leader has. When situational control is low, the group does not like the leader, and when situational control is high, the leader is well liked by the group. LPC contingency theory states that low LPC leaders are best when situational control is either very low or very high. When situational control is low, a leader who can give clear orders fits best; and when situational control is high, the power of the leader is not challenged, so it is perfectly acceptable for the leader to focus on tasks.

High LPC leaders are best when situational control is moderate. A good example would be a research lab, where relations with colleagues are good, but the power of a leader is somewhat limited (you cannot force innovations out of people). In such situations a leader who gives clear orders will probably not be appropriate, whereas a collaborative leader, i.e. high LPC, would likely be more effective.

Apparently, you can match a certain leader type to a certain situation in order to boost effectiveness. Read more about this in the Wikipedia article on the Fiedler contingency model.

Situational leadership theory: leaders should adapt to situation

Situational leadership theory is another contingency theory stating that leaders are effective when they select the right leadership style for the situation they face. The situation depends on two major attributes of followers:

  • task behaviour - knowledge and skills followers have for specific task, or how much guidance they need, and
  • relationship behaviour - willingness of followers to work without taking directions from others, or their need for emotional support.

Yes, these are the same two dimensions that every effective leader shows, but now they are applied to followers instead. We can draw almost the same diagram as before, except that the axes now show how much directive or supportive behaviour followers need from the leader:

Situational Leadership

Situational Leadership. Best leadership style depends on how much support or directions followers need. (Picture adapted from robertjrgraham.com)

As you can see from this diagram, scientists identify four different situations depending on behaviour of followers:

  • High directive and low supportive (S1): in situations where followers need a lot of direction but don't need support, a directing leader, who simply directs his/her followers, is best;
  • High directive and high supportive (S2): in situations where followers need both direction and support, a coaching leader works best. In this case the leader needs to direct, but in a selling style, so that followers are talked into following the directions;
  • High supportive, but low directive (S3): when followers do not need direction but need a lot of support, a supporting leader does the job. Followers already have good expertise in what they are doing, and the leader just needs to motivate them to do the job;
  • Low supportive and low directive (S4): in cases where followers have both the expertise and the motivation to do the job, a delegating leadership style is best. Instead of giving orders, leaders should delegate tasks and monitor progress.

In summary, situational leadership theory states that leaders should identify the situation, choose the right management style, and implement it.

Develop the leader inside you

The good news is that anyone can improve her/his leadership skills! In fact, there is a name for the systematic process of training people to expand their leadership capacity: leadership development. Most companies focus their efforts on the following three major areas:

  • Developing social interaction between people and close ties within organisation;
  • Developing trusting relationships between individuals;
  • Developing common values and shared visions with others.

The main focus here is the development of emotional intelligence. The following are the most widely used leadership development techniques:

  • 360-degree feedback is a technique that nearly all Fortune 500 companies rely on. The idea is to collect feedback from multiple sources around you: your subordinates, peers and supervisors. During this process leaders can get an idea of what others think about them. The problem with this technique is that collecting feedback and taking appropriate action are two different things. Many people, when they encounter negative feedback, defend themselves psychologically by dismissing it or simply ignoring it.
  • Networking is intended to keep leaders from becoming too isolated from other departments. Specifically, it helps leaders learn whom they should ask for information when they need to solve problems. Peer relationships also promote cooperation.
  • Executive coaching is a method for improving a leader's performance. It usually includes an assessment of the leader's strengths and weaknesses and a plan for improvement. The method usually follows these steps: define what will be done and how, assess individual performance (e.g. by using 360-degree feedback), customise the plan in consultation with the leader's immediate supervisor, implement the plan. Such coaching can be done either for groups or for individuals. It was found that the combination of the two increases leaders' productivity by 88 percent.
  • Mentoring is a method in which leaders receive guidance from more experienced colleagues (called mentors).

That's it, folks! If you managed to read this far, you now know the contents of the entire section of the book! If you find this material engaging, I recommend reading the book itself. I also believe that every one of us should seek to improve our leadership skills, as with them we will have more successful careers and better relationships with colleagues and friends!

Split and Concatenate Videos with FFmpeg: It's Trivial!

The main idea of my future amazing system is to split a video file into pieces, send them to workers to be re-encoded, and then concatenate the pieces back into a single video file. What tool comes to mind for such a task? For me it's FFmpeg. It's an astonishing tool for decoding, encoding, resizing and performing other manipulations on video files. You can cut and concatenate videos as well! How cool is that?
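
A rough sketch of that split / process / join flow, with the actual re-encoding replaced by a stand-in. The names `reencode` and `transcode` are hypothetical, and a local thread pool is only a placeholder for real remote workers; the point is merely that the results must come back in piece order before they can be concatenated.

```python
from concurrent.futures import ThreadPoolExecutor


def reencode(piece):
    """Stand-in for the real work a worker would do on one piece."""
    return piece.upper()


def transcode(pieces, workers=4):
    """Process pieces in parallel and return the results in the original order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order even if a later piece
        # finishes first, which is what the concatenation step needs.
        return list(pool.map(reencode, pieces))


if __name__ == "__main__":
    print(transcode(["piece0", "piece1", "piece2"]))
```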

The ffmpeg that comes with Ubuntu is actually avconv. Since I wanted the true version of FFmpeg, I first downloaded the source code:

git clone git://source.ffmpeg.org/ffmpeg.git ffmpeg

Then I installed a couple of dev packages in order to enable a couple of FFmpeg features:

sudo apt-get install yasm libfaac-dev libfaad-dev libx264-dev \
    libxvidcore-dev libmp3lame-dev libtheora-dev libopenjpeg-dev

Later I enabled all these features, along with debug information, through the configure script:

./configure --enable-shared --enable-gpl --enable-nonfree \
    --enable-libfaac --enable-libx264 --enable-libxvid \
    --enable-libmp3lame --enable-libtheora --enable-libopenjpeg \
    --disable-stripping --enable-debug=3 --extra-cflags="-gstabs+" \

Finally, I've made the last step in order to build everything:


According to the work of Y. Sambe et al., "High-speed distributed video transcoding for multiple rates and formats", a good result can be achieved when you split the video between GOPs. This makes sense, since every GOP should start with an i-frame (a frame that contains all the information, not just differences between frames). But in video files the Decoding Time Stamp (DTS) and the Presentation Time Stamp (PTS) may differ, which may introduce some problems. The authors state that this may lead to a situation where, despite the i-frame being first in the GOP, it may not be the frame that is played first. They call such a GOP an Open-GOP. To me it seems a bit strange. I haven't confirmed this yet, but it doesn't make sense to me to play an i-frame after a b-frame. The authors continue that, because of the existence of Open-GOPs and for several other reasons, it is good to split videos in such a way that every piece (except the first one) has one additional GOP at the beginning (which is the last GOP of the previous piece). They did some tests and showed that this has a slight effect on the resulting video quality after transcoding.
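
The overlap scheme can be sketched roughly as follows. This is only an illustration under stated assumptions: the GOP start times are assumed to be already known and given as a plain list, and `plan_pieces` is a hypothetical helper, not something taken from the paper or from FFmpeg.

```python
def plan_pieces(gop_starts, end_time, gops_per_piece):
    """Return (decode_from, keep_from, keep_to) tuples, one per piece.

    gop_starts     -- sorted start times (seconds) of every GOP (assumed known)
    end_time       -- total duration of the video in seconds
    gops_per_piece -- how many GOPs each worker is responsible for
    """
    bounds = list(gop_starts) + [end_time]
    pieces = []
    for first in range(0, len(gop_starts), gops_per_piece):
        last = min(first + gops_per_piece, len(gop_starts))
        # Every piece except the first also carries the previous GOP,
        # so the decoder has context for frames at the piece boundary.
        lead = max(first - 1, 0)
        pieces.append((bounds[lead], bounds[first], bounds[last]))
    return pieces


if __name__ == "__main__":
    starts = [i * 0.5 for i in range(8)]  # a GOP every 0.5 s, 4 s of video
    for piece in plan_pieces(starts, 4.0, 4):
        print(piece)
```

Each worker would decode from `decode_from` but keep only the output between `keep_from` and `keep_to`, so the extra GOP serves as context and is not duplicated in the joined result.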

For testing purposes, let's try splitting videos so that every piece would contain complete GOPs. For this, we need to know how big the GOP of the video is. There is a tool called ffprobe that shows various information about streams in a video container, but to my disappointment this tool cannot show the GOP size. In order to make it show this information, I needed to add a single line of code to ffprobe.c:

static void show_stream(WriterContext *w, AVFormatContext *fmt_ctx, int stream_idx)
        case AVMEDIA_TYPE_VIDEO:
            print_int("width",        dec_ctx->width);
            print_int("height",       dec_ctx->height);
            print_int("has_b_frames", dec_ctx->has_b_frames);
            print_int("gop_size",     dec_ctx->gop_size); // A single line is all I need...

After recompiling and then launching ffprobe, I've learned the details about my video clip:

Duration: 00:01:00.00
Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1920x818, 1239 kb/s, 24 fps
Audio: aac (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s
codec_long_name=H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10

Good, so now I know that there should be exactly two i-frames per second. This means that it should be possible to split the video cleanly into pieces 2 seconds long. To test this little theory, I wrote a small Python script that generates an ffmpeg command for splitting the video:

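if __name__=="__main__":
	# Build a single ffmpeg command with one stream-copied output per 2-second piece.
	s="ffmpeg -i video.mp4 \\\n"
	for i in range(0,60,2):
		s+="-vcodec copy -acodec copy -ss 00:00:"+str(i).zfill(2)
		s+=" -t 00:00:02 out"+str(i)+".mp4 "
		# The original listing was cut off after "if i"; the line continuation
		# and the final print below are a guess at what followed.
		if i<58:
			s+="\\\n"
	print(s)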

And also a script that concatenates these pieces back into a single video file:

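import os
if __name__=="__main__":
	# Write the list file read by ffmpeg's concat demuxer, one piece per line.
	with open("list.tmp","w") as f:
		for i in range(0,60,2):
			f.write("file 'out"+str(i)+".mp4'\n")
	os.system("ffmpeg -f concat -i list.tmp -c copy joined.mp4")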
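
As a possible variation, both scripts could be written as functions that return argument lists, which sidesteps shell quoting and line-continuation issues. This is a sketch rather than the scripts actually used: `split_command` and `concat_list` are hypothetical names, and the file names mirror the ones above.

```python
import os
import subprocess


def split_command(source, total_seconds, piece_seconds):
    """Build one ffmpeg invocation with a stream-copied output per piece."""
    args = ["ffmpeg", "-i", source]
    for start in range(0, total_seconds, piece_seconds):
        args += ["-vcodec", "copy", "-acodec", "copy",
                 "-ss", str(start), "-t", str(piece_seconds),
                 "out%d.mp4" % start]
    return args


def concat_list(total_seconds, piece_seconds):
    """Return the text of the list file read by ffmpeg's concat demuxer."""
    return "".join("file 'out%d.mp4'\n" % start
                   for start in range(0, total_seconds, piece_seconds))


# Only touch ffmpeg when the input file is actually present.
if __name__ == "__main__" and os.path.exists("video.mp4"):
    subprocess.check_call(split_command("video.mp4", 60, 2))
    with open("list.tmp", "w") as handle:
        handle.write(concat_list(60, 2))
    subprocess.check_call(["ffmpeg", "-f", "concat", "-i", "list.tmp",
                           "-c", "copy", "joined.mp4"])
```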

I've uploaded the resulting video (you can also see the original video) to YouTube:

As you can see, this method of splitting and concatenating greatly reduces the quality at the splitting points (every 2 seconds). There is no degradation in audio quality; even so, this level of video quality is unacceptable for production use.

In fact, after the video was split I was expecting that the duration of video file pieces would be exactly 2 seconds. Instead it turned out to be like this:

for e in $(ls out*.mp4 | sort -V); do echo -n $e; ffprobe $e 2>&1 | grep Duration; done;
out0.mp4  Duration: 00:00:02.02, start: 0.000000, bitrate: 1250 kb/s
out2.mp4  Duration: 00:00:02.00, start: 0.020000, bitrate: 1838 kb/s
out4.mp4  Duration: 00:00:02.00, start: 0.017007, bitrate: 1871 kb/s
out6.mp4  Duration: 00:00:02.00, start: 0.012993, bitrate: 1179 kb/s
out8.mp4  Duration: 00:00:02.00, start: 0.010000, bitrate: 1719 kb/s
out10.mp4  Duration: 00:00:02.00, start: 0.008005, bitrate: 1217 kb/s
out12.mp4  Duration: 00:00:02.00, start: 0.005011, bitrate: 1336 kb/s
out14.mp4  Duration: 00:00:02.02, start: 0.000998, bitrate: 1329 kb/s
out16.mp4  Duration: 00:00:02.00, start: 0.020998, bitrate: 1366 kb/s
out18.mp4  Duration: 00:00:02.00, start: 0.019002, bitrate: 1421 kb/s
out20.mp4  Duration: 00:00:02.00, start: 0.016009, bitrate: 1136 kb/s
out22.mp4  Duration: 00:00:02.00, start: 0.011995, bitrate: 418 kb/s
out24.mp4  Duration: 00:00:02.00, start: 0.010000, bitrate: 411 kb/s
out26.mp4  Duration: 00:00:02.00, start: 0.007007, bitrate: 486 kb/s
out28.mp4  Duration: 00:00:02.00, start: 0.002993, bitrate: 598 kb/s
out30.mp4  Duration: 00:00:02.02, start: 0.000000, bitrate: 649 kb/s
out32.mp4  Duration: 00:00:02.00, start: 0.020000, bitrate: 776 kb/s
out34.mp4  Duration: 00:00:02.00, start: 0.018005, bitrate: 331 kb/s
out36.mp4  Duration: 00:00:02.00, start: 0.015011, bitrate: 322 kb/s
out38.mp4  Duration: 00:00:02.00, start: 0.010000, bitrate: 281 kb/s
out40.mp4  Duration: 00:00:02.00, start: 0.008005, bitrate: 137 kb/s
out42.mp4  Duration: 00:00:02.00, start: 0.005011, bitrate: 196 kb/s
out44.mp4  Duration: 00:00:02.02, start: 0.000998, bitrate: 350 kb/s
out46.mp4  Duration: 00:00:02.00, start: 0.020998, bitrate: 455 kb/s
out48.mp4  Duration: 00:00:02.00, start: 0.019002, bitrate: 1176 kb/s
out50.mp4  Duration: 00:00:02.00, start: 0.016009, bitrate: 1230 kb/s
out52.mp4  Duration: 00:00:02.00, start: 0.011995, bitrate: 817 kb/s
out54.mp4  Duration: 00:00:02.00, start: 0.010000, bitrate: 744 kb/s
out56.mp4  Duration: 00:00:02.00, start: 0.007007, bitrate: 729 kb/s
out58.mp4  Duration: 00:00:02.00, start: 0.002993, bitrate: 414 kb/s

This sounds fishy, doesn't it?
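
To put a number on it, the listing can be parsed and summed. The sketch below assumes the line format shown above (`Duration: HH:MM:SS.ss, start: S.ssssss`); the helper names are made up for illustration.

```python
import re

DURATION = re.compile(
    r"Duration: (\d+):(\d+):(\d+(?:\.\d+)?), start: (-?\d+(?:\.\d+)?)")


def parse_duration_line(line):
    """Return (duration_seconds, start_seconds) from an ffprobe 'Duration:' line."""
    match = DURATION.search(line)
    if match is None:
        raise ValueError("no Duration field in: %r" % line)
    hours, minutes, seconds, start = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds), float(start)


def summarise(lines):
    """Sum the reported durations and collect the non-zero start offsets."""
    parsed = [parse_duration_line(line) for line in lines]
    total = sum(duration for duration, _ in parsed)
    offsets = [start for _, start in parsed if start != 0.0]
    return total, offsets


if __name__ == "__main__":
    sample = [
        "out0.mp4  Duration: 00:00:02.02, start: 0.000000, bitrate: 1250 kb/s",
        "out2.mp4  Duration: 00:00:02.00, start: 0.020000, bitrate: 1838 kb/s",
    ]
    print(summarise(sample))
```

Fed the whole listing, the total should come to roughly 60.08 seconds (four pieces of 2.02 s and twenty-six of 2.00 s) rather than 60, and most pieces report a small non-zero start offset.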

Björn recommended that I use the libavcodec library directly instead of using ffmpeg. This sounded like a solution, so I spent a couple of days reading libavcodec code. But what I found was not very pleasing.

There is a function called av_seek_frame() (it is actually part of libavformat rather than libavcodec). However, it is not very reliable. First, you cannot specify the frame number you want to jump to. Moreover, according to the blog post Picture Go Back, it is not possible to reliably jump to the frame you want:

I repeatedly tried to seek forward and backwards to different frames -- frame 5000, 10,000, and 15,000 in divx, avi, and other video formats. Each time, the resulting location is close, but not exact. FFmpeg thinks it knows the frame number after seeking, but usually it is off. Frankly, when I want to jump to frame 5000, I want to be at frame 5000 and not 5015, 4079, or some other nearby frame.

So I thought that maybe I could just scan the file without decoding it and check where the GOPs begin. I did not find any field that provides this kind of information, but since all GOPs should start with an i-frame, I could try to cut just before each i-frame. However, I have to decode a frame in order to learn whether it's an i-frame or not, and I don't really want to do that. And I really don't want to develop my own tool, because it would not stand a chance against ffmpeg in terms of supported formats, even if I used libavcodec.
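
One thing that may be worth trying here: containers usually flag keyframe packets, and ffprobe can print those packet flags without decoding any frames. The sketch below assumes an ffprobe build that supports `-show_entries packet=pts_time,flags` and prints the two fields in that order; the function names are hypothetical. Note that a keyframe flag does not by itself guarantee a closed GOP.

```python
import subprocess


def parse_keyframe_times(csv_text):
    """Extract timestamps of packets whose flags field contains 'K' (keyframe)."""
    times = []
    for line in csv_text.splitlines():
        fields = line.strip().split(",")
        if len(fields) < 2 or "K" not in fields[1]:
            continue
        try:
            times.append(float(fields[0]))
        except ValueError:
            pass  # e.g. "N/A" when a packet carries no timestamp
    return times


def keyframe_times(path):
    """Ask ffprobe for packet metadata of the first video stream only."""
    output = subprocess.check_output([
        "ffprobe", "-v", "error", "-select_streams", "v:0",
        "-show_entries", "packet=pts_time,flags",
        "-of", "csv=p=0", path,
    ])
    return parse_keyframe_times(output.decode("utf-8"))
```

Calling `keyframe_times("video.mp4")` would then give candidate cut points, to be checked against what a player actually shows.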

And my research continues... Now I'm thinking of looking into VLC to see if it can cut accurately and, if so, whether it can be used as a library. Another option is to try to implement a new option in ffmpeg that would copy the video but split the file cleanly into pieces, so that it plays back smoothly after the pieces are joined back into a single video file.

Edit: I have to mention that I've found another way to split videos using ffmpeg:

ffmpeg -i movie.mp4 -f segment -c copy -segment_time 120 -map 0 out%03d.mp4

After splitting a video using this method and joining the pieces back together, artefacts still appear at the split points. Small time gaps appear between the segments, and the total length increases as well. At least no frames are dropped, which suggests that there is a slight bug somewhere. It may be a good idea to report this to the FFmpeg community and see what they think.

To visualise the behaviour, I performed a simple test. I ended up with a video that ffprobe reports as 00:00:08.05 long (it also reported STTS errors twice), although it actually contains around 1 minute of video. What I did then was:

ffmpeg -i joined.mp4 -c copy fixed.mp4

Then ffprobe reported that the duration of the file is 00:01:07.40 (and still reported an STTS error once). Here is the resulting video:

A Massive Choice of Technology

Once you have decided what you want to do for your distributed systems project, you have a broad selection of tools out there that may or may not help you. Actually, there are so many that after wasting several hours just looking through them, you can get a headache. This is what happened to me, so if you are looking into it, let me reduce your burden by summarizing my thoughts. There are a couple of approaches to developing a distributed system. First, you can use an existing framework or platform (such as the famous Apache Hadoop) to manage a big portion of the work for you. In fact, that would be the preferable approach, as it helps to avoid bugs and reduce development time. There are a number of such frameworks to choose from:

  • Apache Hadoop is the blockbuster project that contains a distributed file system and a map-reduce-like distributed computation model. It's written in Java and therefore requires your code to be in Java too (I suppose Scala would fit too). However, it is still possible to code in different languages if you use the Hadoop Streaming API. Hadoop provides a map-reduce programming model, where the data is first divided into groups and assigned to workers, and the results are later collected and "reduced".
  • The Disco project is a Hadoop MapReduce alternative developed by Nokia Research Center. It is written in Erlang, but users usually write algorithms for it in Python.
  • Spring Batch is part of the Spring project and is used to distribute the workload across computers. It seems to fit whenever you want to use Java EE and split the work according to a master-workers programming model.
  • Gearman is yet another framework for developing distributed systems. It may be worth a look.
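
To make the map-reduce model concrete, here is a minimal single-process sketch of a word count. It does not use Hadoop or any of the frameworks above; the function names are illustrative, and in a real system each chunk would be handled by a different worker.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    pairs = []
    for chunk in chunks:  # each chunk would go to a different worker
        pairs.extend(map_phase(chunk))
    return dict(reduce_phase(key, values)
                for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["split the video", "join the video"]))
```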

Keep in mind that this list is very short (maybe too short). There is a lot of research in this field, e.g. many researchers try to escape the map-reduce paradigm and write systems that improve performance for more difficult computational tasks. Such systems tend to use directed acyclic graphs or similar models. Examples of these systems are Dryad and Spark.

If you do not want to use a framework, or the framework does not fit the task you want to solve, you can build your own architecture. For this it may be a good idea to use an actor model or some kind of message passing library. A couple of examples of actor model frameworks:

  • Akka is the most famous one and runs on the JVM;
  • Pykka for those using Python;
  • Theron for those using C++.
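
For a feel of what the actor model means, here is a bare-bones sketch using only the Python standard library. It does not reflect the API of Akka, Pykka or Theron; the `Actor` and `Counter` classes are made up for illustration.

```python
import queue
import threading


class Actor:
    """A tiny actor: private state, a mailbox, and one thread that drains it."""

    _STOP = object()

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def tell(self, message):
        """Send a message without waiting for it to be handled."""
        self._mailbox.put(message)

    def stop(self):
        """Let the actor finish queued messages, then wait for its thread."""
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self.on_receive(message)

    def on_receive(self, message):
        raise NotImplementedError


class Counter(Actor):
    def __init__(self):
        self.total = 0  # only ever modified by the actor's own thread
        super().__init__()

    def on_receive(self, message):
        self.total += message


if __name__ == "__main__":
    counter = Counter()
    for value in range(5):
        counter.tell(value)
    counter.stop()
    print(counter.total)
```

Because messages are handled one at a time by a single thread, the actor's state needs no locks; the mailbox is the only shared object.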

Message passing libraries:

If nothing touches your heart, then at least you may want some tools for building your network protocols: tools that help to serialize and de-serialize the objects you send over the network, such as:

  • Protobuf supports Java, C++ and Python and has very good documentation;
  • Apache Thrift supports many languages (including Python, Ruby, Java, C++), but its documentation is not as good as Protobuf's.
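
For comparison, a hand-rolled protocol needs at least some message framing. The sketch below is one common approach, assumed here purely for illustration: each message is JSON prefixed with a 4-byte length. It uses only the standard library, and the function names are hypothetical; it is not how Protobuf or Thrift encode data.

```python
import json
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order


def encode_message(obj):
    """Serialise obj as JSON and prefix it with its length in bytes."""
    payload = json.dumps(obj).encode("utf-8")
    return HEADER.pack(len(payload)) + payload


def decode_messages(buffer):
    """Split a byte buffer into complete messages plus any leftover bytes."""
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # the rest of this message has not arrived yet
        body = buffer[offset + HEADER.size:end]
        messages.append(json.loads(body.decode("utf-8")))
        offset = end
    return messages, buffer[offset:]


if __name__ == "__main__":
    stream = encode_message({"piece": 3}) + encode_message({"piece": 4})
    print(decode_messages(stream))
```

The leftover bytes returned by `decode_messages` would be kept and prepended to the next chunk read from the socket, since TCP does not preserve message boundaries.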

So far I don't really know what I will be using for my project; however, my heart leans towards the custom protocol thingie! :)