nlp - How do I get started with information extraction?


When it comes to information extraction, I am a newbie. Over the past several days, I have read a lot of academic papers and have ordered a book on NLP. I want to know how I can create a system like Flipgad Com (hopefully not from scratch). They crawl the websites of more than 60,000 companies to collect job openings. How do I get started?

I am open to learning any programming language. Has anyone used MALLET / GATE / MinorThird or RoadRunner? Ideally, I want to be able to train a system with a data set specific to my domain and have it extract information based on that.

Thanks

A fast way to extract job offers from websites is to use a web service such as Dapar. You can very easily set it up to extract data using its visual editor, and it works very well on tables in your targeted websites.
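If you would rather see what table extraction looks like in code, here is a minimal sketch using only the Python standard library. It assumes the job listings sit in a plain HTML table whose first row is a header; the names TableRowParser and extract_job_rows are made up for this illustration, and this is not how any particular web service does it.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_job_rows(html):
    """Return one dict per data row, keyed by the header row (assumed first)."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]
```

Real career pages are much messier than this (nested tables, listings built by JavaScript), so treat it as a starting point for experiments rather than a finished scraper.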

To learn information extraction, I suggest starting with LingPipe, a Java framework for information extraction, so you do not need to learn the architectural specifications of frameworks such as GATE or Apache UIMA first. On the LingPipe website, you will find a lot of tutorials that will help you learn various information extraction approaches. After that, I suggest getting to know GATE and UIMA.
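To give a feel for the simplest of those approaches, here is a rule-based sketch in plain Python that pulls a few fields out of a job posting with regular expressions. The field labels ("Title:", "Location:"), the patterns, and the function name extract_fields are all assumptions made for this example; it does not use any of the frameworks mentioned above.

```python
import re

# Hypothetical patterns for a posting that labels its fields line by line.
LABELLED = re.compile(r"^(Title|Location)\s*:\s*(.+)$", re.MULTILINE | re.IGNORECASE)
SALARY = re.compile(r"\$\s?(\d{1,3}(?:,\d{3})*)")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_fields(text):
    """Return a dict with whichever of title, location, salary, email were found."""
    fields = {
        match.group(1).lower(): match.group(2).strip()
        for match in LABELLED.finditer(text)
    }
    salary = SALARY.search(text)
    if salary:
        # Drop the thousands separators before converting to an integer.
        fields["salary"] = int(salary.group(1).replace(",", ""))
    email = EMAIL.search(text)
    if email:
        fields["email"] = email.group(0)
    return fields
```

Hand-written rules like these break as soon as the layout changes, which is one reason trainable toolkits exist, but they are a quick way to get a baseline on your own data.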

If you want to build such a website, you will also need a web crawler framework, a web search engine, and a way to provide a search service on top of the retrieval engine.
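To make the search part less abstract, here is a toy sketch of the core idea behind a search engine: an inverted index that maps each word to the documents containing it. Everything here (tokenize, build_index, search, and the document ids) is invented for the illustration; a real search engine adds ranking, stemming, persistence and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

For example, after indexing a handful of crawled postings with build_index, search(index, "java developer") returns only the postings that mention both words.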

Update:

For Python, it's best to start with:

