
Hadoop Interview Questions




Below are some questions that have recently been asked at top IT companies. If you are appearing for a Hadoop Administrator interview, you should definitely be prepared for them.


Comment below with what you think the answers to these questions could be.





What is your cluster configuration?
How many master and slave nodes are there in your cluster?
What are the components in your cluster?
Do you have a NoSQL database?
What is Hive?
When would you use HBase, and when would you use Hive?
Do you have any experience with shell scripting?
Have you ever come across garbage collection issues or out-of-memory errors?
How much memory have you given to your DataNodes?
How many jobs can you run in your cluster simultaneously?
How much data is incorporated every day in your cluster?
What is Kafka? What is Flume?
What is the difference between Kafka and Flume?
If you shut down your cluster, will you still be able to reach your Cloudera Manager page?
Suppose you shut down the Cloudera agent completely, and when you go to the Cloudera Manager page it gives you a "page not found" error. How will you bring up Cloudera Manager and your cluster?
What is performance tuning?
If a job fails, how will you troubleshoot it?
What type of scheduling are you using?
How many queues are there for scheduling jobs?
How do you decide which job will go to which queue?
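Questions about queues and how many jobs can run at once usually come down to Capacity Scheduler arithmetic: each queue is guaranteed `capacity` percent of the cluster and may borrow idle resources up to `maximum-capacity` percent. The sketch below is a simplified model that assumes percentage-based capacities for sibling queues directly under `root` and counts memory only (vCores are ignored); the queue names and sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    capacity_pct: float      # yarn.scheduler.capacity.root.<name>.capacity
    max_capacity_pct: float  # yarn.scheduler.capacity.root.<name>.maximum-capacity


def queue_shares(cluster_mb: int, queues: list[Queue]) -> dict[str, tuple[int, int]]:
    """Return {queue name: (guaranteed MB, ceiling MB)} for sibling queues under root."""
    total = sum(q.capacity_pct for q in queues)
    if abs(total - 100.0) > 1e-6:
        raise ValueError(f"sibling queue capacities must sum to 100, got {total}")
    shares = {}
    for q in queues:
        if not q.capacity_pct <= q.max_capacity_pct <= 100:
            raise ValueError(f"{q.name}: need capacity <= maximum-capacity <= 100")
        guaranteed = int(cluster_mb * q.capacity_pct / 100)  # always available to the queue
        ceiling = int(cluster_mb * q.max_capacity_pct / 100)  # reachable only if others are idle
        shares[q.name] = (guaranteed, ceiling)
    return shares


# Hypothetical queues on a cluster with 100 GB of YARN memory.
print(queue_shares(102400, [Queue("etl", 60, 100), Queue("adhoc", 30, 50), Queue("default", 10, 20)]))
```

Deciding which job goes to which queue then becomes a question of which team or workload needs a guarantee versus which can live on borrowed capacity.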


Explain in depth how HDFS works in HA (high availability) mode.
How is write permission on the JournalNodes controlled?
What kind of fencing have you configured on your cluster?
How does epoch fencing work?
How does SSH fencing work?
How does shell (bash) fencing work?
What is .bashrc? What is written in it?
Which variables are set in .bashrc?
Which file is used for user management?
What are the two important files used for user management in Hadoop?
If I have to make changes for 1,000 users on a Hadoop cluster, which user management file will I use? (This is a pure Linux file.)
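For the epoch fencing question, the idea behind the Quorum Journal Manager is that a NameNode becoming active gets a new, higher epoch number accepted by a majority of JournalNodes, and each JournalNode then rejects writes carrying an older epoch, so the previously active NameNode can no longer write edits. The toy model below only illustrates that rule; the class names are hypothetical, and it skips real QJM steps such as segment recovery and RPC handling.

```python
from typing import Optional


class JournalNode:
    """Toy JournalNode that remembers the highest epoch it has promised."""

    def __init__(self) -> None:
        self.promised_epoch = 0
        self.edits: list[tuple[int, str]] = []

    def new_epoch(self, epoch: int) -> bool:
        # Accept only a strictly higher epoch; promising it fences older writers.
        if epoch <= self.promised_epoch:
            return False
        self.promised_epoch = epoch
        return True

    def journal(self, epoch: int, edit: str) -> bool:
        # Reject any write tagged with an epoch older than the current promise.
        if epoch < self.promised_epoch:
            return False
        self.edits.append((epoch, edit))
        return True


class Writer:
    """Toy NameNode acting as the single writer to the JournalNode quorum."""

    def __init__(self, name: str, journals: list[JournalNode]) -> None:
        self.name = name
        self.journals = journals
        self.epoch: Optional[int] = None

    def become_active(self) -> bool:
        # Propose an epoch higher than any JournalNode has promised so far.
        proposed = max(j.promised_epoch for j in self.journals) + 1
        acks = sum(j.new_epoch(proposed) for j in self.journals)
        if acks > len(self.journals) // 2:  # a majority must accept
            self.epoch = proposed
            return True
        return False

    def write(self, edit: str) -> bool:
        if self.epoch is None:
            return False
        acks = sum(j.journal(self.epoch, edit) for j in self.journals)
        return acks > len(self.journals) // 2


jns = [JournalNode() for _ in range(3)]
nn1 = Writer("nn1", jns)
nn1.become_active()
nn2 = Writer("nn2", jns)  # failover: nn2 takes over with a higher epoch
nn2.become_active()
print(nn1.write("mkdir /a"), nn2.write("mkdir /b"))  # prints: False True
```

Because the old writer is rejected by the JournalNodes themselves, the quorum protects the edit log even if SSH or shell fencing of the old NameNode fails.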

Explain in detail how YARN works.
How is a container created, and on what basis?
How are containers allocated, how do we decide the container size based on the job and RAM, and how do YARN slots get allocated?
When a job is supposed to run, who provides the client with the details of the resource configuration, or of where on the DataNodes to run the job?
What scheduling is used in YARN?
What algorithm is used for YARN scheduling?
What types of schedulers are there?
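For container sizing, a common working model is that the scheduler rounds each memory request up to a multiple of `yarn.scheduler.minimum-allocation-mb` and rejects a request above `yarn.scheduler.maximum-allocation-mb`; the number of containers a node can host is then bounded by `yarn.nodemanager.resource.memory-mb`. The sketch assumes that rounding behaviour (as with the Capacity Scheduler and its default memory-only resource calculator; other schedulers and versions can round differently) and ignores vCores. The node sizes are hypothetical.

```python
def normalize_request_mb(request_mb: int, min_alloc_mb: int, max_alloc_mb: int) -> int:
    """Round a memory request up to the next multiple of the minimum allocation."""
    if request_mb > max_alloc_mb:
        # YARN typically rejects such a request rather than silently capping it.
        raise ValueError(f"{request_mb} MB exceeds maximum allocation of {max_alloc_mb} MB")
    steps = max(1, -(-request_mb // min_alloc_mb))  # ceiling division, at least one step
    return steps * min_alloc_mb


def containers_per_node(node_mb: int, request_mb: int, min_alloc_mb: int, max_alloc_mb: int) -> int:
    """Memory-only upper bound on concurrent containers of this size on one NodeManager."""
    return node_mb // normalize_request_mb(request_mb, min_alloc_mb, max_alloc_mb)


# Hypothetical node: 64 GB for YARN, 1 GB minimum and 8 GB maximum allocation.
for request in (1500, 3000, 8192):
    print(request, normalize_request_mb(request, 1024, 8192),
          containers_per_node(65536, request, 1024, 8192))
```

Note how a 3000 MB request wastes 72 MB per container after rounding; choosing request sizes close to multiples of the minimum allocation packs nodes more tightly.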

Explain how Kafka works.
What errors did you face during Kafka installation?
What details do you have to take care of while installing Kafka?
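Kafka's core idea is a topic split into partitions, each an append-only log in which every record gets a sequential offset; records with the same key land in the same partition, and consumers track their own position by offset. The toy model below illustrates only that. It uses `zlib.crc32` as a stand-in for the murmur2 hash that Kafka's default partitioner applies to keys, and it has no brokers, replication or consumer-group rebalancing; the class names are hypothetical.

```python
import zlib


class ToyTopic:
    """A topic as a list of partitions, each an append-only list of (key, value) records."""

    def __init__(self, partitions: int) -> None:
        self.logs: list[list[tuple[bytes, bytes]]] = [[] for _ in range(partitions)]

    def partition_for(self, key: bytes) -> int:
        # Kafka's default partitioner uses murmur2; crc32 is only a stand-in here.
        return zlib.crc32(key) % len(self.logs)

    def produce(self, key: bytes, value: bytes) -> tuple[int, int]:
        """Append a record and return its (partition, offset)."""
        partition = self.partition_for(key)
        self.logs[partition].append((key, value))
        return partition, len(self.logs[partition]) - 1

    def fetch(self, partition: int, offset: int, max_records: int = 10) -> list[tuple[bytes, bytes]]:
        return self.logs[partition][offset:offset + max_records]


class ToyConsumer:
    """Tracks its own next offset per partition, as a consumer-group member would."""

    def __init__(self, topic: ToyTopic) -> None:
        self.topic = topic
        self.offsets = [0] * len(topic.logs)

    def poll(self, partition: int) -> list[tuple[bytes, bytes]]:
        records = self.topic.fetch(partition, self.offsets[partition])
        self.offsets[partition] += len(records)  # "commit" what was read
        return records


topic = ToyTopic(partitions=3)
p, first = topic.produce(b"user-42", b"login")
_, second = topic.produce(b"user-42", b"logout")  # same key, same partition
consumer = ToyConsumer(topic)
print(consumer.poll(p))  # both records, in offset order
```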

Which Linux messaging service have you used?
Is your cluster in an HA environment?
Which components need to have HA configured?

What technologies do you use for Hadoop security?
Which tool have you used for Hadoop cluster governance?

Have you worked with or used Atlas?
Have you configured Sentry or Ranger?

Give an example of row-level filtering done in Ranger on files, tables and columns; for instance, a user in the US should be able to view only one particular view.

How was the rule written?
What logic was used, and what different kinds of logic are used for writing these security filters?
Tell me about the filter you wrote for security at the database level.

There are 5 locations, all in the same group. How do you write the filter so that a person from one location can access only that location's data, even though everyone is in the same group?
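In Ranger, a Hive row-level filter policy attaches a filter expression, written as a SQL WHERE clause, to users or groups. When all five locations share one group, a group-level filter alone cannot tell them apart, so one approach (an assumption here, to be checked against your Hive and Ranger versions) is a single expression that looks up each user's location in a mapping table, for example `location IN (SELECT location FROM user_location WHERE username = current_user())`. The Python below only models the effect of such a filter; the `user_location` table, its columns and the sample users are hypothetical.

```python
def apply_row_filter(rows: list[dict], username: str,
                     user_location: list[tuple[str, str]]) -> list[dict]:
    """Return only the rows whose 'location' is mapped to this user.

    Models a Ranger row filter expression such as:
        location IN (SELECT location FROM user_location WHERE username = current_user())
    """
    allowed = {loc for user, loc in user_location if user == username}
    return [row for row in rows if row["location"] in allowed]


# Hypothetical data: all analysts are in the same group, which grants table access;
# the row filter then narrows each analyst to their own location.
sales = [
    {"id": 1, "location": "US"},
    {"id": 2, "location": "UK"},
    {"id": 3, "location": "IN"},
    {"id": 4, "location": "US"},
]
mapping = [("alice", "US"), ("bob", "UK"), ("chitra", "IN")]
print([r["id"] for r in apply_row_filter(sales, "alice", mapping)])  # prints [1, 4]
```

A user missing from the mapping table sees no rows, which is usually the safer default.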

Have you set up a cluster from scratch?
What is the biggest cluster you have set up?

Explain RAID.
What is the combination of RAID 1 and RAID 0?
How would RAID 0 give redundancy?
Does striping happen in RAID 0?
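The RAID questions reduce to how data is laid out. RAID 0 stripes consecutive blocks round-robin across disks with no copies, so it provides no redundancy; RAID 1 mirrors every block; RAID 10 (RAID 1+0) stripes across mirrored pairs. The sketch below computes block placement for RAID 0 and the usable capacity and guaranteed fault tolerance for each level, assuming equal-sized disks, RAID 1 as an n-way mirror and RAID 10 built from two-disk mirror pairs.

```python
def raid0_location(block: int, disks: int) -> tuple[int, int]:
    """Map a logical stripe unit to (disk index, stripe row) under round-robin RAID 0."""
    return block % disks, block // disks


def raid_summary(level: str, disks: int, disk_tb: float) -> tuple[float, int]:
    """Return (usable TB, number of disk failures the array is guaranteed to survive)."""
    if level == "0":
        return disks * disk_tb, 0          # striping only: any failure loses data
    if level == "1":
        return disk_tb, disks - 1          # n-way mirror: every disk holds a full copy
    if level == "10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 here needs an even number of disks, at least 4")
        # Stripes across two-disk mirror pairs; it survives more than one failure
        # only if the failed disks are in different pairs.
        return disks // 2 * disk_tb, 1
    raise ValueError(f"unsupported RAID level: {level}")


for level, n in (("0", 4), ("1", 2), ("10", 4)):
    print(f"RAID {level} with {n} x 4 TB:", raid_summary(level, n, 4.0))
print("RAID 0, block 5 of 4 disks ->", raid0_location(5, 4))
```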

How do you use RAID in an operating system?
Explain LVM.
Explain how the ext3 file system works.
Explain how any Linux file system works.
Have you worked on server-level configuration on Linux?
Do you know DNS?
Explain how DNS works.
How do you get hostnames resolved in a Hadoop cluster?
What is used for name resolution in a small cluster? (/etc/hosts)
For a big cluster, we use Amazon DNS or create our own DNS server.
Explain how this DNS server is configured.
Have you worked on NFS storage or Samba?



Suppose you are given a bare-metal box on which you have to install a Hadoop cluster. What details do you have to take care of?
Explain in detail everything you will do to set up the operating system and then Hadoop.

What are the prerequisites at the operating system level?
What is swappiness?
What is swap space, and how is it used in Linux?
What is the full form of SELinux?
What is the full form of THP?
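Two OS prerequisites that are easy to check are `vm.swappiness` and Transparent Huge Pages (THP). Cloudera's guidance, for example, is to set swappiness to a low value such as 1 and to disable THP. The sketch parses the kernel's setting files; the paths are the usual ones on current Linux kernels (RHEL 6 used a `redhat_transparent_hugepage` directory instead), and the threshold of 1 is an assumption you can change.

```python
import re
from pathlib import Path


def selected_thp_mode(text: str) -> str:
    """Return the bracketed (active) mode from text like 'always madvise [never]'."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"unexpected THP setting: {text!r}")
    return match.group(1)


def prereq_warnings(swappiness_text: str, thp_text: str, max_swappiness: int = 1) -> list[str]:
    """Compare raw kernel settings against the assumed Hadoop recommendations."""
    warnings = []
    swappiness = int(swappiness_text.strip())
    if swappiness > max_swappiness:
        warnings.append(f"vm.swappiness is {swappiness}, expected <= {max_swappiness}")
    mode = selected_thp_mode(thp_text)
    if mode != "never":
        warnings.append(f"transparent hugepages is '{mode}', expected 'never'")
    return warnings


if __name__ == "__main__":
    swap = Path("/proc/sys/vm/swappiness")
    thp = Path("/sys/kernel/mm/transparent_hugepage/enabled")
    if swap.exists() and thp.exists():
        for warning in prereq_warnings(swap.read_text(), thp.read_text()):
            print("WARNING:", warning)
```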

In Red Hat, the NTP service has been replaced by something else with greater features. What is it called?

What is an FQDN?
Is it one of the prerequisites for Hadoop?
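An FQDN (fully qualified domain name) is the complete host name including its domain, such as `nn1.example.com`, and Hadoop services generally expect every host to resolve consistently to one. The sketch below checks `/etc/hosts`-style text for entries whose canonical name is not fully qualified, assuming the commonly recommended layout `IP  FQDN  short-alias` (the first name after the address is the canonical one); the host names are hypothetical. On a live host, `hostname -f` should print the same FQDN.

```python
import ipaddress


def hosts_entries_without_fqdn(hosts_text: str) -> list[str]:
    """Return canonical names in /etc/hosts-style text that are not fully qualified."""
    bad = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        ip, canonical = fields[0], fields[1]
        if ipaddress.ip_address(ip).is_loopback:
            continue  # localhost lines are not cluster hosts
        if "." not in canonical.rstrip("."):
            bad.append(canonical)
    return bad


sample = """\
127.0.0.1   localhost localhost.localdomain
10.0.0.11   nn1.example.com nn1
10.0.0.12   dn1 dn1.example.com   # short name listed first
"""
print(hosts_entries_without_fqdn(sample))
```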

Explain what SELinux is and what kind of security it provides to the kernel.
How does SELinux take care of security at the user level?
What kinds of warnings and errors do you get when you disable SELinux?

Once you install Hadoop and then enable SELinux, what errors do you get in Cloudera?
What best practices are followed to keep the Hadoop cluster up and running?

Suppose you need to set up a 50-node cluster. How much time will you take to set up the complete Hadoop cluster, including the operating system?
How will you deploy the OS on 10 nodes?
What common issues have you troubleshot on the cluster, and how did you troubleshoot them?
What are the counters in Hadoop jobs?
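Hadoop counters are named tallies that the framework aggregates across all tasks of a job: built-in ones such as the `TaskCounter` values `MAP_INPUT_RECORDS` and `REDUCE_OUTPUT_RECORDS`, plus user-defined ones that a Java job increments through `context.getCounter(...)`. The toy word count below mimics a few built-in task counters and one hypothetical user-defined counter, `EMPTY_LINES`; it runs in a single process with no combiner and is not the MapReduce API.

```python
from collections import Counter


def word_count_with_counters(lines: list[str]) -> tuple[dict[str, int], Counter]:
    """Toy word count that tallies counters named after Hadoop's built-in TaskCounter values."""
    counters: Counter = Counter()
    mapped = []
    for line in lines:  # map phase: one input record per line
        counters["MAP_INPUT_RECORDS"] += 1
        words = line.split()
        if not words:
            counters["EMPTY_LINES"] += 1  # hypothetical user-defined counter
        for word in words:
            mapped.append((word.lower(), 1))
            counters["MAP_OUTPUT_RECORDS"] += 1
    totals: Counter = Counter()
    for word, one in mapped:  # shuffle/reduce: group by key and sum
        counters["REDUCE_INPUT_RECORDS"] += 1
        totals[word] += one
    counters["REDUCE_INPUT_GROUPS"] = len(totals)
    counters["REDUCE_OUTPUT_RECORDS"] = len(totals)
    return dict(totals), counters


result, job_counters = word_count_with_counters(["a b a", "", "b c"])
print(result)
print(dict(job_counters))
```

When troubleshooting, comparing counters such as map output records against reduce input records is a quick way to spot lost or skewed data.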


Reviewed by Admin on March 29, 2018

2 comments:

  1. Great content.
    Thank you for these Hadoop Admin Interview questions admin.

  2. These questions seems valid to me..
    Good job.

