Python and Grad Thesis

I just read two blog posts: The Waiting Time Paradox and The Inspection Paradox is Everywhere.

One interesting thing they state:

The difference is substantial: In this dataset, the average user has 44 friends; the average friend has 104, more than twice as many. And the probability that your friend is more popular than you is 76%.

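The reason is that if you pick a friend, they already have at least one friend: you. People with many friends show up in more friend lists, so they get picked more often. That is a distortion that appears in a network context.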
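To see the effect on a small scale, here is a minimal sketch in Python. The `friends` graph is a made-up toy example (one popular "hub" and four others), not the dataset quoted above, and the variable names are my own.

```python
# Hypothetical toy friendship graph (undirected): every edge appears on both sides.
friends = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub", "b"},
    "b": {"hub", "a"},
    "c": {"hub"},
    "d": {"hub"},
}

# Number of friends per user.
degree = {user: len(their_friends) for user, their_friends in friends.items()}

# Average over users: every user is counted exactly once.
avg_user = sum(degree.values()) / len(degree)

# Average over friends: a user is counted once for every friend list
# they appear in, so popular users are counted more often.
friend_degrees = [
    degree[friend]
    for their_friends in friends.values()
    for friend in their_friends
]
avg_friend = sum(friend_degrees) / len(friend_degrees)

print(avg_user, avg_friend)  # 2.0 2.6
```

Even in this tiny graph the average friend has more friends (2.6) than the average user (2.0), only because the hub is counted four times in the second average.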

My Grad Thesis

Well, I was a network engineering major. When we were preparing grad thesis topics, the professors liked to recommend topics related to networks, at least in name. So I got a topic involving a graph database: analyzing similarity between patents.

I didn’t finish it in the end. I couldn’t say why, but I always felt that the topic was wrong. However, the professor pushed another student to finish it the next year.

Looking back now, with more knowledge, I think I understand the reason.

Similarity is not transitive

Well, patents are built on technologies. If two patents use the same technology, their texts share many words. But they may describe two completely different products.
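A small sketch of what "not transitive" means here, assuming similarity is measured by word overlap (Jaccard similarity). The three word sets and the 0.3 threshold are invented for illustration; they are not from my thesis data.

```python
def jaccard(a, b):
    """Jaccard similarity: shared words divided by all words in either set."""
    return len(a & b) / len(a | b)

# Hypothetical keyword sets for three patents.
patent_a = {"lithium", "battery", "electrode", "coating"}
patent_b = {"electrode", "coating", "sensor", "glucose"}
patent_c = {"sensor", "glucose", "wearable", "strap"}

THRESHOLD = 0.3  # assumed cut-off for calling two patents "similar"

def similar(a, b):
    return jaccard(a, b) >= THRESHOLD

print(similar(patent_a, patent_b))  # True  (2 shared words out of 6)
print(similar(patent_b, patent_c))  # True  (2 shared words out of 6)
print(similar(patent_a, patent_c))  # False (no shared words)
```

A is similar to B and B is similar to C, yet A and C have nothing in common. Following "similar" edges through a graph can therefore connect patents that are unrelated.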

NLP is a hard area

A similar article the professor recommended to me was about “using a knowledge base to analyze people’s relations in a novel”.

In that article, the author extracted who is whose father, daughter, son, etc.

However, things are not that simple: the novel never states those relations that clearly. So you have to use NLP (natural language processing), which is a totally different area. I think the author mostly built the data manually.

Python is not a language for deduction

There are languages designed specifically for that: Coq and Prolog. Also, I think neo4j is not a real database, just a tool used to earn some citations.

A call for help from a senior

Well, that concludes my opinions.

I also had a senior who struggled with her grad thesis. She was doing simulation, using Python.

Python is not a tool for simulation. Simulation processes large amounts of data, and speed is key.

To do that, you will need C++. But C++ is a hard language; I have spent years on it and I’m still just at the beginning.

I remember talking with her about using Cython, but Cython is just another language with a syntax similar to Python.

The end?

Python is a glue language. What is it good for? Convenient tooling, websites, data analysis and visualization.

To work on complex, deep topics, you need deep knowledge.
